| [1] |
张钹, 朱军, 苏航. 迈向第三代人工智能[J]. 中国科学:信息科学, 2020, 50(9):1281-1302.
|
| [2] |
BROOKS R A. Intelligence without representation[J]. Artificial Intelligence, 1991, 47(1-3):139-159.
|
| [3] |
刘华平, 郭迪, 孙富春, 等. 基于形态的具身智能研究:历史回顾与前沿进展[J]. 自动化学报, 2023, 49(6):1131-1154.
|
| [4] |
沈甜雨, 陶子锐, 王亚东, 等. 具身智能研究的关键问题:自主感知、行动与进化[J]. 自动化学报, 2025, 51(1): 43-71.
|
| [5] |
王文晟, 谭宁, 黄凯, 等. 基于大模型的具身智能系统综述[J]. 自动化学报, 2025, 51(1):1-19.
|
| [6] |
YANG Z, LI L, LIN K, et al. The dawn of LMMs: preliminary explorations with GPT-4V(ision)[J]. arXiv Preprint, arXiv:2309.17421, 2024.
|
| [7] |
WANG J, SHI E, HU H, et al. Large language models for robotics: opportunities, challenges, and perspectives[J]. Journal of Automation and Intelligence, 2025, 4(1):52-64.
|
| [8] |
HU Y, LIN F, ZHANG T, et al. Look before you leap: unveiling the power of GPT-4V in robotic vision-language planning[J]. arXiv Preprint, arXiv:2311.17842, 2025.
|
| [9] |
LIU H, LI C, WU Q, et al. Visual instruction tuning[J]. arXiv Preprint, arXiv:2304.08485, 2023.
|
| [10] |
SHAFIULLAH N M M, PAXTON C, PINTO L, et al. CLIP-fields: weakly supervised semantic fields for robotic memory[J]. arXiv Preprint, arXiv:2210.05663, 2023.
|
| [11] |
JAMES S, WADA K, LAIDLOW T, et al. Coarse-to-fine q-attention: efficient learning for visual robotic manipulation via discretisation: proceedings[J]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022:13739-13748.
|
| [12] |
HUANG C, MEES O, ZENG A, et al. Audio visual language maps for robot navigation[J]. arXiv Preprint, arXiv:2303.07522, 2023.
|
| [13] |
QIN M, LI W, ZHOU J, et al. LangSplat: 3D language gaussian splatting: proceedings[J]. arXiv Preprint, arXiv:2312.16084, 2024.
|
| [14] |
SHORINWA O, TUCKER J, SMITH A, et al. Splat-MOVER: multi-stage, open-vocabulary robotic manipulation via editable gaussian splatting[J]. arXiv Preprint, arXiv:2405.04378, 2024.
|
| [15] |
兰沣卜, 赵文博, 朱凯, 等. 基于具身智能的移动操作机器人系统发展研究[J]. 中国工程科学, 2024, 26(1):139-148.
doi: 10.15302/J-SSCAE-2024.01.010
|
| [16] |
HUANG W, WANG C, ZHANG R, et al. Voxposer: composable 3d value maps for robotic manipulation with language models[J]. arXiv Preprint, arXiv:2307.05973, 2023.
|
| [17] |
ZHEN H, QIU X, CHEN P, et al. 3D-VLA: a 3d vision-language-action generative world model[J]. arXiv Preprint, arXiv:2403.09631, 2024.
|
| [18] |
WU J, YIN S, FENG N, et al. Ivideogpt: interactive videogpts are scalable world models[J]. Advances in Neural Information Processing Systems, 2024,37:68082-68119.
|
| [19] |
ZHAO M, JAIN S, SONG S. Roco: dialectic multi-robot collaboration with large language models[J]. arXiv Preprint, arXiv:2307.04738, 2023.
|
| [20] |
WU H, GAO W, XU X. Solder joint recognition using mask R-CNN method[J]. IEEE Transactions on Components, Packaging and Manufacturing Technology, 2020, 10(3):525-530.
|
| [21] |
国金证券. 2025垂直领域具身智能机器人产业化落地现状及潜力应用场景分析报告[R], 2025.
|
| [22] |
白入文, 张蔚敏, 石霖, 等. 基于具身智能的智能制造创新体系与应用模式研究[J]. 数字化转型, 2025, 2(5):4-14.
|