信息通信技术与政策 (Information and Communications Technology and Policy)

Information and Communications Technology and Policy, 2019, Vol. 45, Issue 7: 44-50.

Research on a Semi-Automatic Labeling Method for Machine Learning Datasets*

  

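  • Publication date: 2019-07-15  Release date: 2020-11-26
  • About the author:
    Lü Bo (吕博): Engineer, Broadband Network Research Department, Technology and Standards Research Institute, China Academy of Information and Communications Technology
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 61671159)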

Abstract: Based on the teacher-student model, this paper proposes a semi-automatic dataset labeling method that addresses three problems of manual labeling in supervised learning: heavy workload, inconsistent data quality, and a high barrier of domain expertise. In a cloud-based experiment, the method achieved semi-automatic labeling of clock synchronization pattern classification data and, in addition, automatic assessment of the difficulty of the dataset. The difficulty assessment can be used to guide the optimization and evaluation of machine learning models.

Key words: machine learning, data annotation, teacher-student model
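
The abstract does not spell out the labeling procedure, so the following is only a minimal sketch of one common way a teacher model can drive semi-automatic labeling, not the paper's actual algorithm. It assumes a teacher that returns class probabilities, accepts a label automatically when the top probability reaches a threshold, routes the remaining samples to a human annotator, and uses mean prediction entropy as a rough difficulty score. The names `semi_auto_label` and `toy_teacher`, the threshold of 0.9, and the entropy-based score are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Probs = Sequence[float]


def entropy(probs: Probs) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def semi_auto_label(
    samples: Sequence[float],
    teacher: Callable[[float], Probs],
    threshold: float = 0.9,  # assumed confidence cut-off, not from the paper
) -> Tuple[List[Tuple[float, int]], List[float], float]:
    """Split samples into auto-labeled and human-review sets.

    Returns (auto_labeled, needs_review, difficulty), where difficulty is
    the mean teacher entropy over all samples (an assumed proxy).
    """
    auto_labeled: List[Tuple[float, int]] = []
    needs_review: List[float] = []
    total_entropy = 0.0
    for x in samples:
        probs = teacher(x)
        total_entropy += entropy(probs)
        # Index of the most probable class according to the teacher.
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            auto_labeled.append((x, best))  # confident: accept teacher label
        else:
            needs_review.append(x)  # uncertain: send to a human annotator
    difficulty = total_entropy / len(samples) if samples else 0.0
    return auto_labeled, needs_review, difficulty


def toy_teacher(x: float) -> Probs:
    """Hypothetical two-class teacher: a logistic function of the input."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [1.0 - p, p]


if __name__ == "__main__":
    auto, review, difficulty = semi_auto_label([-5.0, -0.2, 0.1, 4.0], toy_teacher)
    print(auto, review, round(difficulty, 3))
```

In this sketch the human-reviewed samples would be added back to the training set before the student model is trained. A higher mean entropy suggests a harder dataset under the stated assumption.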