Research on the development of compute-storage collaboration driven by large model inference
ZHOU Lan, CHEN Lei
Informatization and Industrialization Integration Research Institute, China Academy of Information and Communications Technology, Beijing 100191, China
ZHOU Lan, CHEN Lei. Research on the development of compute-storage collaboration driven by large model inference[J]. Information and Communications Technology and Policy, 2025, 51(10): 2-6.