信息通信技术与政策

信息通信技术与政策, 2025, Vol. 51, Issue (1): 33-37. DOI: 10.12267/j.issn.2096-5931.2025.01.005

Special Topic: Cybersecurity

Security challenges and response mechanisms for trustworthy large language model agents

ZHANG Xi1, LI Chaozhuo1, XU Nuo1, ZHANG Litian2

  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Cyber Science and Technology, Beihang University, Beijing 100191, China
  • Received: 2024-06-28  Online: 2025-01-25  Published: 2025-02-14
  • Corresponding author: LI Chaozhuo, distinguished associate researcher at the School of Cyberspace Security, Beijing University of Posts and Telecommunications, mainly engaged in research on trustworthy large language models, graph neural networks, and recommender systems.
  • About the authors:
    ZHANG Xi, professor and doctoral supervisor at the School of Cyberspace Security, Beijing University of Posts and Telecommunications, mainly engaged in research on information content security and artificial intelligence security;
    XU Nuo, master's student at the School of Cyberspace Security, Beijing University of Posts and Telecommunications, mainly engaged in research on large language model security;
    ZHANG Litian, doctoral student at the School of Cyber Science and Technology, Beihang University, mainly engaged in research on large language model security, large-model-based multi-agent systems, and multimodal machine learning.



Abstract:

As agents driven by large language models are deployed ever more widely across domains, their potential security risks have become increasingly prominent. This paper systematically reviews the security and trustworthiness issues faced by agents built on large language models, including information leakage, model attacks, hallucinated outputs, ethical and moral risks, and legal compliance hazards. Through an in-depth analysis of the causes and impacts of these risks, it surveys existing protective measures and technical approaches and offers recommendations for building trustworthy large language model agents, providing a reference for related research and practice.

Key words: trustworthy large language model agent, security, defense

CLC number: