[1] KAPLAN J, MCCANDLISH S, HENIGHAN T, et al. Scaling laws for neural language models[J]. arXiv preprint, arXiv:2001.08361, 2020.
[2] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: NIPS, 2017: 5998-6008.
[3] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. OpenAI, 2018.
[4] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[5] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
[6] DEEPSEEK-AI, GUO D, YANG D, et al. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning[EB/OL]. 2025[2025-07-08]. https://arxiv.org/pdf/2501.12948.
[7] SCHICK T, DWIVEDI-YU J, DESSI R, et al. Toolformer: language models can teach themselves to use tools[EB/OL]. 2023[2025-07-08]. https://doi.org/10.48550/arXiv.2302.04761.
[8] WAN Z, WANG X, LIU C, et al. Efficient large language models: a survey[J]. arXiv preprint, arXiv:2312.03863, 2023.
[9] WANG S, LI D. Research progress on network performance optimization of distributed machine learning systems[J]. Chinese Journal of Computers, 2022, 45(7): 1384-1411. (in Chinese)
[10] JIANG Z, LIN H, ZHONG Y, et al. MegaScale: scaling large language model training to more than 10,000 GPUs[C]//Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24). Santa Clara: USENIX Association, 2024.
[11] Llama Team, AI @ Meta. The Llama 3 herd of models[EB/OL]. (2024-07-23)[2025-07-08]. https://Llama.meta.com/.
[12] XU Y. Advanced storage capacity exhibits six new features in three aspects[N]. People's Posts and Telecommunications News, 2024-10-31(005). (in Chinese)
[13] GUO L, WANG S P, QUAN W, et al. Research on the development of intelligent computing networks for large models[J]. Telecommunications Science, 2024, 40(6): 137-145. (in Chinese)
doi: 10.11959/j.issn.1000-0801.2024147
[14] China Academy of Information and Communications Technology. China network transport capacity development report (2024)[R]. 2024. (in Chinese)
[15] JLL. Data centers 2024 global outlook[R]. 2024.
[16] International Energy Agency (IEA). Data centres and data transmission networks[EB/OL]. (2025-01-03)[2025-07-08]. https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks.
[17] ZHANG Z X, WEN Y B, LYU H Q, et al. AI computing systems for large language models training[J]. Journal of Computer Science and Technology, 2025, 40(1): 6-41.
doi: 10.1007/s11390-024-4178-1
[18] SemiAnalysis. 100,000 H100 clusters: power, network topology, Ethernet vs InfiniBand, reliability, failures, checkpointing[R]. 2024.
[19] Apple. Apple intelligence foundation language models[R]. 2024.
[20] Yale University. Scaling laws for post-training quantized large language models[R]. 2024.
[21] DeepSeek-AI. DeepSeek-V3 technical report[R]. 2024.