Information and Communications Technology and Policy, 2025, Vol. 51, Issue 10: 73-86. DOI: 10.12267/j.issn.2096-5931.2025.10.011
WANG Ling1, YAN Kun2, NIE Peng2
Received: 2025-05-10
Online: 2025-10-25
Published: 2025-11-06
WANG Ling, YAN Kun, NIE Peng. A review of multimodal deepfake detection technology[J]. Information and Communications Technology and Policy, 2025, 51(10): 73-86.
| Dataset | Modalities | Sample size | Annotation granularity | Application scenario |
|---|---|---|---|---|
| VoxCeleb | Audio | 100 000+ clips | Speaker identity labels | Speaker identification |
| VoxCeleb2 | Audio | 1 000 000+ clips | Speaker identity labels | Extended speaker identification |
| FakeAVCeleb | Video + audio | 20 000 | Real/fake labels | Label-level detection |
| DefakeAVMiT | Video + audio | 7 020 | Real/fake labels | Label-level detection |
| PolyGlotFake | Video + audio | 15 238 | Real/fake labels + multilingual | Label-level detection |
| Deepfake-Eval-2024 | Video + audio + image | 101.5 h audio/video + 1 975 images | Real/fake labels + multilingual | Label-level detection |
| LAV-DF | Video + audio | 136 304 | Real/fake labels + timestamps | Localization detection |
| DGM4 | Image + text | 230 000 | Real/fake labels + manipulation-type labels + localization coordinates | Multi-type localization detection |
| MMTT | Image + text | 128 303 | Real/fake labels + manipulation-explanation text | Explainable forgery detection |
| MMTD-Set | Image + text + mask | — | Real/fake labels + masks + descriptions | Explainable forgery detection |
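The annotation granularities in the table range from clip-level real/fake labels to timestamps, masks, and explanation text. A minimal sketch of a unified record type that could hold all of these annotation levels (the class and field names are illustrative, not part of any of the datasets' actual formats):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ForgerySample:
    """Hypothetical unified annotation record for multimodal forgery datasets."""
    media_id: str
    modalities: tuple                 # e.g. ("video", "audio")
    is_fake: bool                     # clip-level real/fake label
    fake_segments: list = field(default_factory=list)  # [(start_s, end_s)], LAV-DF-style timestamps
    mask_path: Optional[str] = None   # spatial localization mask, MMTD-Set-style
    explanation: Optional[str] = None # natural-language rationale, MMTT-style

# A localization-annotated clip: fake, with one forged segment at 2.4-3.1 s
sample = ForgerySample("clip_0001", ("video", "audio"), True,
                       fake_segments=[(2.4, 3.1)])
```

Datasets supporting only label-level detection would leave the last three fields empty; explainability datasets fill the mask or explanation fields as well.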
| Method | Dataset | Metric | Performance | Core innovation | Year |
|---|---|---|---|---|---|
| Lewis et al. | DFDC | ACC | 61.95% | Facial spectrum analysis via DCT | 2020 |
| M2TR | SR-DF / FF++ | ACC | 91.2% / 99.5% | Multi-scale Transformer capturing forgery features at different scales | 2021 |
| Salvi et al. | — | — | — | Single-modality (video) feature extraction and training | 2023 |
| AVoiD-DF | DefakeAVMiT | ACC | 83.70% | Audio-visual joint learning with spatio-temporal feature fusion | 2023 |
| Multimodaltrace | FakeAVCeleb | ACC | 92.9% | Cross-modal multi-level hybrid learning | 2023 |
| Muppalla et al. | FakeAVCeleb | ACC | 90.51% | Cross-modal multi-task learning | 2023 |
| Shirley et al. | Mixed datasets | ACC | 96.8% | Multimodal fusion with attention mechanism | 2024 |
| Gandhi et al. | Mixed datasets | ACC | 94% | Combined facial feature extraction and Mel-spectrogram analysis | 2024 |
| FRADE | FakeAVCeleb | AUC | 93.1% | Audio-distilled cross-modal interaction | 2024 |
| MCAN | Twitter | ACC | 80.9% / 89.9% | Stacked co-attention mechanism | 2021 |
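The Lewis et al. row relies on DCT-based facial spectrum analysis: generated faces often carry anomalous energy in the high-frequency DCT bands. A minimal pure-Python sketch of an orthonormal 2D DCT-II and a high-frequency energy ratio as one such cue (the 8×8 block and the energy-ratio heuristic are illustrative, not the authors' actual pipeline):

```python
import math

def dct2(block):
    """Orthonormal 2D DCT-II of a square pixel block (list of lists)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

def high_freq_ratio(coeffs):
    """Share of spectral energy in the high-frequency half (u + v >= n)."""
    n = len(coeffs)
    total = sum(c * c for row in coeffs for c in row)
    high = sum(coeffs[u][v] ** 2
               for u in range(n) for v in range(n) if u + v >= n)
    return high / total if total else 0.0

flat = [[128.0] * 8 for _ in range(8)]         # flat patch: energy is all DC
print(round(high_freq_ratio(dct2(flat)), 4))   # 0.0
```

A real detector would compute such statistics over many face patches and feed them to a classifier rather than thresholding a single ratio.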
| Method | Dataset | Metric | Performance | Core innovation | Year |
|---|---|---|---|---|---|
| Jabeen et al. | CASIA v2.0 | ACC / Precision | 93.04% / 85.47% | Error level analysis | 2020 |
| UMMAFormer | LAV-DF | AP@0.95 / AR@100 | 77.72% / 97.34% | Multimodal adaptation for temporal data | 2023 |
| DGM4 | DGM4 | ACC / Precision | 93.44% / 70.9% | Multi-task detection with shallow and deep reasoning | 2023 |
| Triaridis et al. | — | F1 | 0.750 (early) / 0.751 (late) | Late fusion vs. early fusion | 2024 |
| Shuai et al. | FF++ | ACC | above 70% | Two-stream network | 2023 |
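Localization methods such as UMMAFormer are scored with AP/AR at a temporal IoU threshold (AP@0.95 above means a predicted segment counts only if it overlaps a ground-truth forged segment with IoU ≥ 0.95). A minimal sketch of the underlying IoU computation and a greedy segment matcher (a simplified stand-in for the full score-ranked AP protocol):

```python
def temporal_iou(a, b):
    """IoU of two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_segments(preds, gts, thresh=0.95):
    """Greedily match predictions to ground truth at an IoU threshold;
    returns the number of true positives (no confidence ranking)."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: temporal_iou(p, g), default=None)
        if best is not None and temporal_iou(p, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    return tp

print(round(temporal_iou((0.0, 2.0), (1.0, 3.0)), 3))   # 0.333
print(match_segments([(1.0, 2.0)], [(1.0, 2.0)]))       # 1
```

Precision and recall then follow as tp / len(preds) and tp / len(gts); averaging precision over a ranked prediction list yields AP.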
| Method | Dataset | Metric | Performance | Core innovation | Year |
|---|---|---|---|---|---|
| FakeBench | FakeBench | ACC | 78.03% (GPT-4V) | Explainable detection | 2024 |
| FakeShield | MMTD-Set | ACC | 93% (deepfake) / 99% (AIGC) | Explainable detection | 2024 |
| SIDA | SID-Set | ACC | 87.3% | Multi-class explainable detection | 2024 |
| VLForgery | VLF | ACC | 93.82% (partial synthesis) / above 88% (localization) | Explainable detection of generated images with MLLMs | 2025 |
| ForgeryGPT | DGM | ACC | 81.6% | Deep semantic understanding and explainability analysis | 2024 |
| Yu et al. | FF++ / DFD | AUC | 99.53% / 99.64% | Explainable detection integrating external knowledge | 2025 |
| FakeVLM | FakeClue composite dataset | ACC | 96.3% (FF++) | Large model producing natural-language explanations of forgery traces | 2025 |
| Haq et al. | PDD | ACC | 93.7% | Annotation based on emotion-transition timing | 2024 |
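ACC and AUC, the two metrics used throughout these tables, answer different questions: ACC depends on a fixed decision threshold, while AUC measures how well fake samples are ranked above real ones regardless of threshold. A self-contained sketch of both (the 0.5 threshold and the toy scores are illustrative):

```python
def accuracy(scores, labels, thresh=0.5):
    """Fraction of correct decisions at a fixed threshold."""
    correct = sum((s >= thresh) == bool(l) for s, l in zip(scores, labels))
    return correct / len(labels)

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    probability a random fake outranks a random real (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.1]   # detector "fakeness" outputs
labels = [1, 1, 0, 0]            # 1 = fake, 0 = real
print(accuracy(scores, labels))  # 1.0
print(auc(scores, labels))       # 1.0
```

This threshold-dependence is why the tables report AUC for some methods (e.g. FRADE, Yu et al.) and ACC for others; the two are not directly comparable.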
| [1] | GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. |
| [2] | ZHANG L, LU T L, DU Y H. A survey of deepfake detection methods for face videos[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 1-26. DOI: 10.3778/j.issn.1673-9418.2205035. |
| [3] | RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[J/OL]. arXiv Preprint, arXiv:1511.06434, 2016. http://arxiv.org/abs/1511.06434. |
| [4] | ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 2242-2251. |
| [5] | SUWAJANAKORN S, SEITZ S M, KEMELMACHER-SHLIZERMAN I. Synthesizing Obama: learning lip sync from audio[J]. ACM Transactions on Graphics, 2017, 36(4): 1-13. |
| [6] | THIES J, ZOLLHOFER M, STAMMINGER M, et al. Face2Face: real-time face capture and reenactment of RGB videos[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 2387-2395. |
| [7] | KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 8107-8116. |
| [8] | SIAROHIN A, LATHUILIÈRE S, TULYAKOV S, et al. First order motion model for image animation[J/OL]. arXiv Preprint, arXiv:2003.00196, 2020. http://arxiv.org/abs/2003.00196. |
| [9] | LIU W, PIAO Z, TU Z, et al. Liquid warping GAN with attention: a unified framework for human image synthesis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5115-5133. |
| [10] | PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al. A lip sync expert is all you need for speech to lip generation in the wild[C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020: 484-492. |
| [11] | WU F, LIU L, HAO F, et al. Text-to-image synthesis based on object-guided joint-decoding transformer[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022: 18092-18101. |
| [12] | SAUER A, KARRAS T, LAINE S, et al. StyleGAN-T: unlocking the power of GANs for fast large-scale text-to-image synthesis[C]// Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023: 30105-30118. |
| [13] | LIU X, REN J, SIAROHIN A, et al. HyperHuman: hyper-realistic human generation with latent structural diffusion[J/OL]. arXiv Preprint, arXiv:2310.08579, 2024. http://arxiv.org/abs/2310.08579. |
| [14] | WANG Q, BAI X, WANG H, et al. InstantID: zero-shot identity-preserving generation in seconds[J/OL]. arXiv Preprint, arXiv:2401.07519, 2024. http://arxiv.org/abs/2401.07519. |
| [15] | ZHANG C, WANG C, ZHANG J, et al. DREAM-Talk: diffusion-based realistic emotional audio-driven method for single image talking face generation[J/OL]. arXiv Preprint, arXiv:2312.13578, 2023. http://arxiv.org/abs/2312.13578. |
| [16] | LI Y, LYU S. Exposing deepfake videos by detecting face warping artifacts[J/OL]. arXiv Preprint, arXiv:1811.00656, 2019. http://arxiv.org/abs/1811.00656. |
| [17] | LI Y, CHANG M C, LYU S. In ictu oculi:exposing AI created fake videos by detecting eye blinking[C]// 2018 IEEE International Workshop on Information Forensics and Security (WIFS). Hong Kong: IEEE, 2018: 1-7. |
| [18] | AGARWAL S, FARID H, FRIED O, et al. Detecting deep-fake videos from phoneme-viseme mismatches[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020: 2814-2822. |
| [19] | LIU H, LI X, ZHOU W, et al. Spatial-phase shallow learning:rethinking face forgery detection in frequency domain[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 772-781. |
| [20] | CIFTCI U A, DEMIR I, YIN L. FakeCatcher: detection of synthetic portrait videos using biological signals[J/OL]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024: 1. DOI:10.1109/TPAMI.2020.3009287. |
| [21] | HALIASSOS A, VOUGIOUKAS K, PETRIDIS S, et al. Lips don’t lie: a generalisable and robust approach to face forgery detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 5037-5047. |
| [22] | ZHAO H, WEI T, ZHOU W, et al. Multi-attentional deepfake detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 2185-2194. |
| [23] | WANG Z, BAO J, ZHOU W, et al. DIRE for diffusion-generated image detection[C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris: IEEE, 2023: 22388-22398. |
| [24] | ZHANG R, WANG H, LIU H, et al. Generalized face forgery detection with self-supervised face geometry information analysis network[J]. Applied Soft Computing, 2024, 166: 112143. |
| [25] | NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[C]// Interspeech 2017. ISCA, 2017: 2616-2620. |
| [26] | CHUNG J S, NAGRANI A, ZISSERMAN A. VoxCeleb2: deep speaker recognition[C]// Interspeech 2018. ISCA, 2018: 1086-1090. |
| [27] | KHALID H, TARIQ S, KIM M, et al. FakeAVCeleb: a novel audio-video multimodal deepfake dataset[J/OL]. arXiv Preprint, arXiv:2108.05080, 2022. http://arxiv.org/abs/2108.05080. |
| [28] | KOWALSKI M. FaceSwap[EB/OL]. 2020[2025-02-20]. https://github.com/deepfakes/faceswap. |
| [29] | Iperov. DeepFaceLab[EB/OL]. (2020-04-09)[2025-02-20]. https://github.com/iperov/DeepFaceLab. |
| [30] | Rudrabha. Wav2Lip[EB/OL]. (2020-08-18)[2025-02-20]. https://github.com/Rudrabha/Wav2Lip. |
| [31] | YANG W, ZHOU X, CHEN Z, et al. AVoiD-DF: audio-visual joint learning for detecting deepfake[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 2015-2029. |
| [32] | SANDERSON C. The VIDTIMIT database[EB/OL]. 2002[2025-02-20]. https://infoscience.epfl.ch/record/82748?ln=fr&v=%5B%27pdf%27%5D. |
| [33] | JIA Y, ZHANG Y, WEISS R J, et al. Transfer learning from speaker verification to multispeaker text-to-speech synthesis[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 4485-4495. |
| [34] | HOU Y, FU H, CHEN C, et al. PolyGlotFake: a novel multilingual and multimodal deepfake dataset[C]// Pattern Recognition, 27th International Conference. Kolkata: Springer Nature Switzerland, 2025: 180-193. |
| [35] | LI J, TU W, XIAO L. FreeVC: Towards high-quality text-free one-shot voice conversion[C]// ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island: IEEE, 2023: 1-5. |
| [36] | CHEN S, WANG C, WU Y, et al. Neural codec language models are zero-shot text to speech synthesizers[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2025, 33: 705-718. |
| [37] | CHANDRA N A, MURTFELDT R, QIU L, et al. Deepfake-eval-2024:a multi-modal in-the-wild benchmark of deepfakes circulated in 2024[J/OL]. arXiv Preprint, arXiv:2503.02857, 2025. http://arxiv.org/abs/2503.02857. |
| [38] | QI P, BU Y, CAO J, et al. FakeSV: a multimodal benchmark with rich social context for fake news detection on short video platforms[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(12): 14444-14452. |
| [39] | CAI Z, STEFANOV K, DHALL A, et al. Do you really mean that? content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization[C]// 2022 International Conference on Digital Image Computing:Techniques and Applications (DICTA). Sydney: IEEE, 2022: 1-10. |
| [40] | SHAO R, WU T, LIU Z. Detecting and grounding multi-modal media manipulation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver: IEEE, 2023: 6904-6913. |
| [41] | LIAN J, LIU L, WANG Y, et al. A large-scale interpretable multi-modality benchmark for facial image forgery localization[J/OL]. arXiv Preprint, arXiv:2412.19685, 2024. http://arxiv.org/abs/2412.19685. |
| [42] | LEE C H, LIU Z, WU L, et al. MaskGAN:towards diverse and interactive facial image manipulation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 5548-5557. |
| [43] | NVIDIA Labs. FFHQ[EB/OL]. (2019-03-19)[2025-02-20]. https://github.com/NVlabs/ffhq-dataset. |
| [44] | XU Z, ZHANG X, LI R, et al. FakeShield: explainable image forgery detection and localization via multi-modal large language models[J/OL]. arXiv Preprint, arXiv:2410.02761, 2025. http://arxiv.org/abs/2410.02761. |
| [45] | OpenAI. GPT-4o[EB/OL]. (2024-05-13)[2025-02-20]. https://openai.com. |
| [46] | LEWIS J K, TOUBAL I E, CHEN H, et al. Deepfake video detection based on spatial, spectral, and temporal inconsistencies using multimodal deep learning[C]// 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). Washington: IEEE, 2020: 1-9. |
| [47] | OpenDataLab. DeepFake detection challenge (DFDC)[EB/OL]. (2020-06-09)[2025-02-20]. https://opendatalab.org.cn/OpenDataLab/DFDC. |
| [48] | WANG J, WU Z, OUYANG W, et al. M2TR: multi-modal multi-scale transformers for deepfake detection[C]// Proceedings of the 2022 International Conference on Multimedia Retrieval. Newark: ACM, 2022: 615-623. |
| [49] | SALVI D, LIU H, MANDELLI S, et al. A robust approach to multimodal deepfake detection[J]. Journal of Imaging, 2023, 9(6): 122. |
| [50] | ANAS R M, MAHMOOD M K. MultimodalTrace:deepfake detection using audiovisual representation learning[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Vancouver: IEEE, 2023: 993-1000. |
| [51] | MUPPALLA S, JIA S, LYU S. Integrating audio-visual features for multimodal deepfake detection[C]// 2023 IEEE MIT Undergraduate Research Technology Conference (URTC). Cambridge: IEEE, 2023: 1-5. |
| [52] | SHIRLEY C P, JINGLE B J, ABISHA M B, et al. Deepfake detection using multi-modal fusion combined with attention mechanism[C]// 2024 4th International Conference on Sustainable Expert Systems (ICSES). IEEE, 2024: 1194-1199. |
| [53] | GANDHI K, KULKARNI P, SHAH T, et al. A multimodal framework for deepfake detection[J/OL]. Journal of Electrical Systems, 2024. DOI:10.53555/jes.v20i10s.6126. |
| [54] | NIE F, NI J, ZHANG J, et al. FRADE: Forgery-aware audio-distilled multimodal learning for deepfake detection[C]// Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne: ACM, 2024: 6297-6306. |
| [55] | WU Y, ZHAN P, ZHANG Y, et al. Multimodal fusion with co-attention networks for fake news detection[C]// Findings of the Association for Computational Linguistics:ACL-IJCNLP 2021. Online: Association for Computational Linguistics, 2021: 2560-2569. |
| [56] | ZENGIN A Z, GÜNDÜZ Ö. Identifying topical influencers on Twitter based on user behavior and network topology[J]. Knowledge-Based Systems, 2018, 141: 211-221. |
| [57] | CAO Q, SHEN H, CEN K, et al. DeepHawkes: bridging the gap between prediction and understanding of information cascades[C]// Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Singapore: ACM, 2017: 1149-1158. |
| [58] | JABEEN S, KHAN U G, IQBAL R, et al. A deep multimodal system for provenance filtering with universal forgery detection and localization[J]. Multimedia Tools and Applications, 2021, 80(11): 17025-17044. |
| [59] | DONG J, WANG W, TAN T. CASIA image tampering detection evaluation database[C]// 2013 IEEE China Summit and International Conference on Signal and Information Processing. Beijing: IEEE, 2013: 422-426. |
| [60] | ZHANG R, WANG H, DU M, et al. UMMAFormer: a universal multimodal-adaptive transformer framework for temporal forgery localization[C]// Proceedings of the 31st ACM International Conference on Multimedia. Ottawa: ACM, 2023: 8749-8759. |
| [61] | TRIARIDIS K, MEZARIS V. Exploring multi-modal fusion for image manipulation detection and localization[C]// MultiMedia Modeling, 30th International Conference. Amsterdam: Springer Nature Switzerland, 2024: 198-211. |
| [62] | SHUAI C, ZHONG J, WU S, et al. Locate and verify: a two-stream network for improved deepfake detection[C]// Proceedings of the 31st ACM International Conference on Multimedia. Ottawa: ACM, 2023: 7131-7142. |
| [63] | ROSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++:learning to detect manipulated facial images[C]// 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 1-11. |
| [64] | LI Y, LIU X, WANG X, et al. FakeBench: probing explainable fake image detection via large multimodal models[J/OL]. arXiv Preprint, arXiv:2404.13306, 2024. http://arxiv.org/abs/2404.13306. |
| [65] | HUANG Z, HU J, LI X, et al. SIDA: social media image deepfake detection, localization and explanation with large multimodal model[J/OL]. arXiv Preprint, arXiv:2412.04292, 2025. http://arxiv.org/abs/2412.04292. |
| [66] | HE X, ZHOU Y, FAN B, et al. VLForgery face triad: detection, localization and attribution via multimodal large language models[J/OL]. arXiv Preprint, arXiv:2503.06142, 2025. http://arxiv.org/abs/2503.06142. |
| [67] | LIU J, ZHANG F, ZHU J, et al. ForgeryGPT: multimodal large language model for explainable image forgery detection and localization[J/OL]. arXiv Preprint, arXiv:2410.10238, 2025. http://arxiv.org/abs/2410.10238. |
| [68] | YU P, FEI J, GAO H, et al. Unlocking the capabilities of vision-language models for generalizable and explainable deepfake detection[J/OL]. arXiv Preprint, arXiv:2503.14853, 2025. http://arxiv.org/abs/2503.14853. |
| [69] | WEN S, YE J, FENG P, et al. Spot the fake: large multimodal model-based synthetic image detection with artifact explanation[J/OL]. arXiv Preprint, arXiv:2503.14905, 2025. http://arxiv.org/abs/2503.14905. |
| [70] | ZHENG L, CHIANG W L, SHENG Y, et al. Judging LLM-as-a-judge with MT-bench and chatbot arena[C]// Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2023: 46595-46623. |
| [71] | HAQ I U, MALIK K M, MUHAMMAD K. Multimodal neurosymbolic approach for explainable deepfake detection[J]. ACM Transactions on Multimedia Computing Communications and Applications, 2024, 20(11): 1-16. |
| [72] | ARUNA S, MATTHEW G, ROSALIND P, et al. The presidential deepfakes dataset[EB/OL]. (2021-09-01)[2025-02-20]. https://www.media.mit.edu/publications/presidential-deepfakes-dataset. |