Information and Communications Technology and Policy

Information and Communications Technology and Policy ›› 2026, Vol. 52 ›› Issue (2): 18-23. doi: 10.12267/j.issn.2096-5931.2026.02.003

Prefill-decode (PD) separation architecture and implementation based on a wide-area distributed inference network

WANG Feifei1, DENG Heng1, TANG Jing1, WANG Wei1, SU Yue2   

  1. Research Institute of China Telecom Corporation Limited, Beijing 102209, China
  2. Cloud Computing and Digitalization Research Institute, China Academy of Information and Communications Technology, Beijing 100191, China
  • Received: 2025-12-20 Online: 2026-02-25 Published: 2026-03-06

Abstract:

With the explosive growth in demand for large model inference, traditional centralized or static multi-data-center deployment models face severe challenges in latency, data compliance, and resource elasticity. This paper proposes a cloud-edge collaborative wide-area distributed inference network architecture, focusing on building a new intelligent-computing service system for the emerging computing-power internet. The architecture introduces a prefill-decode separation mechanism: the latency-sensitive prefill stage is offloaded to edge nodes closer to data sources, while the high-throughput decode stage is deployed in the central cloud, enabling secure collaboration over a wide-area network.
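The prefill-decode split described in the abstract can be illustrated with a minimal sketch. This is a toy simulation under stated assumptions, not the paper's implementation: the class and function names (`EdgeNode`, `CloudNode`, `KVCache`, `run_pd_request`) are hypothetical, and the "model" is a placeholder that merely increments token IDs. It only shows the control flow: prefill builds the KV cache at the edge near the data source, the cache crosses the wide-area network, and decode extends the sequence in the central cloud.

```python
# Toy sketch of prefill-decode (PD) separation across edge and cloud.
# All names are illustrative; real systems would move actual attention
# KV tensors and run a model, not the increment placeholder used here.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Key-value cache produced by prefill and consumed by decode."""
    tokens: list = field(default_factory=list)


class EdgeNode:
    def prefill(self, prompt_tokens: list) -> KVCache:
        # Latency-sensitive stage: process the whole prompt once,
        # close to the data source, producing the KV cache.
        return KVCache(tokens=list(prompt_tokens))


class CloudNode:
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        # High-throughput stage: autoregressive generation in the
        # central cloud, reusing the cache shipped from the edge.
        out = []
        for _ in range(max_new_tokens):
            nxt = cache.tokens[-1] + 1  # placeholder for model sampling
            cache.tokens.append(nxt)
            out.append(nxt)
        return out


def run_pd_request(prompt: list, max_new_tokens: int = 4) -> list:
    cache = EdgeNode().prefill(prompt)   # 1) prefill at the edge
    # 2) KV cache transfer over the wide-area network (elided here)
    return CloudNode().decode(cache, max_new_tokens)  # 3) decode in cloud
```

The point of the split is that the two stages have different resource profiles: prefill is compute-bound and latency-sensitive, while decode is memory-bandwidth-bound and benefits from centralized batching, which is why the architecture places them on different tiers of the network.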

Key words: wide-area distributed inference, prefill-decode separation, large model inference
