Information and Communications Technology and Policy

Information and Communications Technology and Policy, 2026, Vol. 52, Issue (5): 32-40. doi: 10.12267/j.issn.2096-5931.2026.05.005

Research on platform tool systems and implementation practices for dataset construction

ZHANG Yunlong1, TONG Jinrui2, XIANG Yong1, ZHANG Zhiqiang1, YAO Guihua1, YUAN Bo2   

  1. China Telecom Artificial Intelligence Technology Co., Ltd., Beijing 264001, China
  2. Institute of Artificial Intelligence, China Academy of Information and Communications Technology, Beijing 100191, China
  • Received: 2026-04-20 Online: 2026-05-25 Published: 2026-05-28

Abstract:

This paper focuses on the demand for high-quality datasets in large model development and proposes an enterprise-level platform methodology covering resource management, automated processing, quality evaluation, version traceability, and secure sharing. By integrating visual workflows, high-performance filtering, hybrid scheduling, and a multidimensional evaluation framework, the platform addresses key challenges in large-scale data processing, including efficiency, compliance, and governance, and provides a practical solution for high-quality dataset construction.

Key words: multimodal operator fusion, operator workflow orchestration, high-quality dataset, data governance, data annotation, data evaluation
