Construction methods and practice of high-quality datasets for telecommunications large model training
XIAO Wenbin¹, LI Yufei², HUANG Yixiao¹, MA Wenda²
¹ China Mobile Communications Group Guangdong Co., Ltd., Guangzhou 510150, China; ² Institute of Artificial Intelligence, China Academy of Information and Communications Technology, Beijing 100191, China
XIAO Wenbin, LI Yufei, HUANG Yixiao, MA Wenda. Construction methods and practice of high-quality datasets for telecommunications large model training[J]. Information and Communications Technology and Policy, 2026, 52(5): 41-49.
ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein GAN[C]// Proceedings of the 34th International Conference on Machine Learning. Sydney: PMLR, 2017:214-223.
WEI J, WANG X, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[C]//Advances in Neural Information Processing Systems 35. New Orleans: Curran Associates, Inc., 2022:24824-24837.
KIRKPATRICK J, PASCANU R, RABINOWITZ N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences, 2017, 114(13):3521-3526. doi: 10.1073/pnas.1611835114
BARBOULE C, HUYNH V P. TelcoLM: collecting data, adapting, and benchmarking language models for the telecommunication domain[J]. arXiv Preprint, arXiv:2412.15891, 2024.