Information and Communications Technology and Policy

Information and Communications Technology and Policy, 2024, Vol. 50, Issue 12: 13-20. doi: 10.12267/j.issn.2096-5931.2024.12.003

Special topic: Artificial Intelligence Empowering New Industrialization

Analysis of large language model architecture evolution (大语言模型核心架构演进态势分析)

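WANG Yuntao (王蕴韬)

  1. Artificial Intelligence Research Institute, China Academy of Information and Communications Technology, Beijing 100191, China
  • Received: 2024-11-04 Online: 2024-12-25 Published: 2025-01-02
  • About the author:
    WANG Yuntao is deputy chief engineer of the Artificial Intelligence Research Institute, China Academy of Information and Communications Technology, and a senior engineer, with long-standing work in artificial intelligence technology and industrial applications, standards and international cooperation, and policy research support.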

Abstract:

This paper systematically reviews the major innovation directions built on the Transformer architecture. It examines the evolution of large language model algorithms along three dimensions: innovation within the Transformer architecture itself, fusion of the Transformer with other architectures, and innovation in non-Transformer algorithms. It closes with an outlook on the future development of foundation models.

Keywords: large model architecture, Transformer, attention mechanism, architectural innovation
