| [1] |
CHENG Le. Supply status and development strategies of high-quality scenario datasets in China[J]. People's Tribune, 2025(5):68-72. (in Chinese)
|
| [2] |
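China Academy of Information and Communications Technology. Guide to building high-quality datasets for artificial intelligence[R], 2025. (in Chinese)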
|
| [3] |
BALTRUŠAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2):423-443.
doi: 10.1109/TPAMI.2018.2798607
pmid: 29994351
|
| [4] |
SETTLES B. Active learning literature survey[R], 2009.
|
| [5] |
SHORTEN C, KHOSHGOFTAAR T M. A survey on image data augmentation for deep learning[J]. Journal of Big Data, 2019, 6(1):1-48.
doi: 10.1186/s40537-018-0162-3
|
| [6] |
China Academy of Information and Communications Technology. Guidelines for building high-quality datasets[R], 2025. (in Chinese)
|
| [7] |
RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International Conference on Machine Learning. New York: PMLR, 2021:8748-8763.
|
| [8] |
HUO Y, ZHANG M, LIU G, et al. WenLan: bridging vision and language by large-scale multi-modal pre-training[J]. IEEE Transactions on Multimedia, 2022, 25:1131-1143.
|
| [9] |
SUN C, MYERS A, VONDRICK C, et al. VideoBERT: a joint model for video and language representation learning[C]// IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019:7464-7473.
|
| [10] |
LUO H, JI L, ZHONG M, et al. UniVL: a unified video and language pre-training model for multimodal understanding and generation[J]. arXiv Preprint, arXiv:2002.06353, 2020.
|
| [11] |
RADFORD A, KIM J W, XU T, et al. Robust speech recognition via large-scale weak supervision[C]// Proceedings of the 40th International Conference on Machine Learning. Honolulu: PMLR, 2023:28492-28518.
|
| [12] |
BAEVSKI A, ZHOU Y, MOHAMED A, et al. Wav2vec 2.0: a framework for self-supervised learning of speech representations[J]. Advances in Neural Information Processing Systems, 2020, 33:12449-12460.
|
| [13] |
CHEN K, WANG J, PANG J, et al. MMDetection: open MMLab detection toolbox and benchmark[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019:9269-9278.
|
| [14] |
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of NAACL-HLT. Stroudsburg: ACL, 2019:4171-4186.
|
| [15] |
SUN Y, WANG S, LI Y, et al. ERNIE 2.0: a continual pre-training framework for language understanding[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020, 34(5):8968-8975.
|
| [16] |
LEWIS D D, GALE W A. A sequential algorithm for training text classifiers[C]// ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 1994:3-12.
|
| [17] |
KONYUSHKOVA K, SZNITMAN R, FUA P. Learning active learning from data[C]// Advances in Neural Information Processing Systems. Red Hook: Curran Associates, Inc., 2017, 30.
|
| [18] |
SENER O, SAVARESE S. Active learning for convolutional neural networks: a core-set approach[C]// International Conference on Learning Representations. Online: OpenReview, 2018.
|
| [19] |
RATNER A, BACH S H, EHRENBERG H, et al. Snorkel: rapid training data creation with weak supervision[C]// Proceedings of the VLDB Endowment. New York: VLDB Endowment, 2017, 11(3):269-282.
|
| [20] |
RATNER A J, EHRENBERG H R, HUSSAIN Z, et al. Learning to compose domain-specific transformations for data augmentation[J]. Advances in Neural Information Processing Systems, 2017, 30.
|
| [21] |
FU B, LI W, MA S, et al. Graph-based weak label denoising for entity typing[C]// Proceedings of the Web Conference. New York: ACM Press, 2021:932-942.
|
| [22] |
SAMBASIVAN N, KAPANIA S, HIGHFILL H, et al. "Everyone wants to do the model work, not the data work": data cascades in high-stakes AI[C]// Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. New York: ACM Press, 2021:1-15.
|
| [23] |
NORTHCUTT C G, ATHALYE A, MUELLER J. Pervasive label errors in test sets destabilize machine learning benchmarks[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021, 35(11):9651-9660.
|
| [24] |
NORTHCUTT C G, JIANG L, CHUANG I L. Confident learning: estimating uncertainty in dataset labels[J]. Journal of Artificial Intelligence Research, 2021, 70:1373-1411.
doi: 10.1613/jair.1.12125
|
| [25] |
GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[C]// Advances in Neural Information Processing Systems. Red Hook: Curran Associates, Inc., 2014, 27.
|
| [26] |
KINGMA D P, WELLING M. Auto-encoding variational Bayes[C]// International Conference on Learning Representations. Online: OpenReview, 2014.
|
| [27] |
HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33:6840-6851.
|
| [28] |
WONG T T, GUO N. Finite element simulation for computer-aided synthesis of defect images[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70:1-12.
|
| [29] |
YOON J, JARRETT D, VAN DER SCHAAR M. Time-series generative adversarial networks[C]// Advances in Neural Information Processing Systems. Red Hook: Curran Associates, Inc., 2019, 32.
|
| [30] |
LE GUENNEC A, MALINOWSKI S, TAVENARD R. Data augmentation for time series classification using convolutional neural networks[C]// ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. Berlin: Springer, 2016.
|
| [31] |
FISHER A, RUDIN C, DOMINICI F. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously[J]. Journal of Machine Learning Research, 2019, 20(177):1-81.
|
| [32] |
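China Academy of Information and Communications Technology. Research report on the development of industrial data spaces[R], 2024. (in Chinese)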
|
| [33] |
LEI Y, YANG B, JIANG X, et al. Applications of machine learning to machine fault diagnosis: a review and roadmap[J]. Mechanical Systems and Signal Processing, 2020, 138:106587.
doi: 10.1016/j.ymssp.2019.106587
|
| [34] |
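Artificial Intelligence Industry Alliance (AIIA). Research report on the artificial intelligence data annotation industry[R], 2024. (in Chinese)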
|