ChatGPT-like large language models for testing and verification of autonomous intelligent systems: a systematic review

Dun Li; Ruiguan Lin; Zisheng Wang; Yan-Fu Li

doi:10.1088/3050-2454/ae524c

Topical Reviews | Updated: 2026-04-14

- Journal of Reliability Science and Engineering, Pages: 1-24 (2026)
- Author affiliation:
  
  Department of Industrial Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Author biographies:
  
  Dun Li received a PhD degree in computer science from Institut Polytechnique de Paris, France, in 2025. He is currently a Postdoctoral Researcher with the Department of Industrial Engineering, Tsinghua University, China. His research interests include large language models, Industrial Internet of Things, digital twin, and system reliability.
  
  Ruiguan Lin received a PhD degree in engineering from Nanjing University of Aeronautics and Astronautics, China, in 2024. He is currently a Postdoctoral Researcher and Assistant Researcher with the Department of Industrial Engineering, Tsinghua University, China. His research interests include intelligent operation and maintenance of high-end equipment, reliability assessment and management for civil aviation systems, and maintenance decision-making for civil aircraft structures.
  
  Zisheng Wang received a BS degree in Mechanical Engineering from Northeastern University, Shenyang, China, in 2018, and a PhD degree in Mechanical Engineering from Huazhong University of Science and Technology, Wuhan, China, in 2023. He is currently an Assistant Research Fellow with the Department of Industrial Engineering, Tsinghua University, China, supported by the Shuimu Tsinghua Scholar Talent Program. His research interests include intelligent monitoring and maintenance for high-end equipment and multimodal large language models.
  
  Yan-Fu Li (Senior Member, IEEE) was a Faculty Member with the Laboratory of Industrial Engineering, CentraleSupélec, University of Paris-Saclay, Gif-sur-Yvette, France, from 2011 to 2016. He is currently a Professor with the Department of Industrial Engineering, Tsinghua University, Beijing, China. He has led or participated in several research projects supported by the European Union, France, and Chinese governmental funding agencies, as well as various industrial partners. He has authored or coauthored more than 100 publications in international journals, conference proceedings, and books. His research interests include reliability, availability, maintainability, and safety (RAMS) assessment and optimization for industrial systems. Dr. Li is an Associate Editor of IEEE Transactions on Reliability.
- Received: 14 August 2025; Revised: 28 January 2026; Accepted: 15 March 2026; Online First: 31 March 2026; Published: June 2026
Dun Li, Ruiguan Lin, Zisheng Wang, et al. ChatGPT-like large language models for testing and verification of autonomous intelligent systems: a systematic review[J]. Journal of Reliability Science and Engineering, 2026, 2: 022001. DOI: 10.1088/3050-2454/ae524c.

Abstract

This paper provides a systematic review of how ChatGPT-like large language models (LLMs) contribute to the testing and verification of autonomous intelligent systems (AIS). Building upon recent advances in generative reasoning, this study integrates evidence from 120 peer-reviewed works to examine four key domains: test scenario generation, vulnerability detection, formal verification, and real-time monitoring. Comparative analysis across fuzz testing, symbolic execution, and reinforcement learning highlights how LLMs improve automation, semantic coverage, and adaptability while revealing limitations in benchmark completeness, interpretability, and resource efficiency. The review introduces structured tables summarizing representative datasets, domain-specific applications, and comparative insights between traditional and LLM-based testing approaches. Key challenges, including benchmarking gaps, explainability deficits, and ethical risks, are analyzed alongside emerging research directions such as hybrid verification frameworks and data quality enhancement. This work aims to bridge conceptual and practical gaps between AI safety engineering and large-model reasoning, offering a reference roadmap for integrating LLMs into future AIS verification pipelines.
