[1] Dean J. The deep learning revolution and its implications for computer architecture and chip design [J]. arXiv:1911.05289, 2019.
[2] Amdahl G M. Validity of the single processor approach to achieving large-scale computing capabilities [C]//AFIPS Spring Joint Computer Conference, 1967.
[3] Gustafson J L. Reevaluating Amdahl's law [J]. Communications of the ACM, 1988, 31(5): 532-533.
[4] Ristov S, Prodan R, Gusev M, et al. Superlinear speedup in HPC systems: why and when? [C]//2016 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 2016.
[5] Shi Y. Reevaluating Amdahl's law and Gustafson's law [R]. Philadelphia: Computer Sciences Department, Temple University, 1996.
[6] Gusev M, Ristov S. A superlinear speedup region for matrix multiplication [J]. Concurrency and Computation: Practice and Experience, 2014, 26(11).
[7] Djinevski L, Ristov S, Gusev M. Superlinear speedup for matrix multiplication in GPU devices [J]. Advances in Intelligent Systems and Computing, 2013, 207: 285-294.
[8] Gusev M, Ristov S. Superlinear speedup in Windows Azure cloud [C]//1st IEEE International Conference on Cloud Networking (CLOUDNET), 2012.
[9] Augonnet C, Thibault S, Namyst R, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures [J]. Concurrency and Computation: Practice and Experience, 2011, 23(2): 187-198.
[10] Anchev N, Gusev M, Ristov S. Intel vs AMD: matrix multiplication performance [C]//36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, 2013: 182-187.
[11] Liu T Y, Chen W, Wang T F, et al. Distributed Machine Learning: Algorithms, Theory, and Practice [M]. Beijing: China Machine Press, 2018.
[12] You Y, Zhang Z, Hsieh C J, et al. Fast deep neural network training on distributed systems and cloud TPUs [J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(11): 2449-2462.
[13] Goyal P, Dollár P, Girshick R, et al. Accurate, large minibatch SGD: training ImageNet in 1 hour [J]. arXiv:1706.02677, 2017.
[14] Akiba T, Suzuki S, Fukuda K. Extremely large minibatch SGD: training ResNet-50 on ImageNet in 15 minutes [J]. arXiv:1711.04325, 2017.
[15] Jia X, Song S, He W, et al. Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes [J]. arXiv:1807.11205, 2018: 1-9.
[16] Mikami H, Suganuma H, et al. ImageNet/ResNet-50 training in 224 seconds [J]. arXiv:1811.05233v1, 2018.
[17] Yamazaki M, Kasagi A, Tabuchi A, et al. Yet another accelerated SGD: ResNet-50 training on ImageNet in 74.7 seconds [J]. arXiv:1903.12650v1, 2019.
[18] Dekel O, Gilad-Bachrach R, Shamir O, et al. Optimal distributed online prediction using mini-batches [J]. Journal of Machine Learning Research, 2012, 13: 165-202.
[19] Yu H, Yang S, Zhu S. Parallel restarted SGD with faster convergence and less communication: demystifying why model averaging works for deep learning [C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 5693-5700.
[20] Stich S U. Local SGD converges fast and communicates little [C]//International Conference on Learning Representations, 2019.
[21] Haddadpour F, Kamani M M, Mahdavi M, et al. Local SGD with periodic averaging: tighter analysis and adaptive synchronization [J]. arXiv:1910.13598, 2019: 1-24.
[22] Shen S, Xu L, Liu J, et al. Faster distributed deep net training: computation and communication decoupled stochastic gradient descent [C]//Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), 2019.
[23] Liu C, Wei J, Wang Y, et al. Optimizing deep learning frameworks incrementally to get linear speedup: a comparison between IPoIB and RDMA verbs [C]//2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2018.
[24] Ouyang S, Dong D, Xu Y, et al. Communication optimization strategies for distributed deep learning: a survey [J]. Journal of Parallel and Distributed Computing, 2021, 149: 52-65.
[25] Wang Y T. "New infrastructure" drives a comprehensive upgrade of artificial intelligence infrastructure [J]. Communications World (通信世界), 2020(7): 20-21.