Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Manipulation

Abstract

Introduction: In this study, we address intent-driven task planning for complex multi-action manipulation sequences in heterogeneous multi-robot cells. Given a perception back-end that outputs a structured object-level scene description and a human operator’s natural-language intent, we generate a precedence-consistent object-level robot-action sequence, each action of which is then passed to a lower-level motion-planning module for execution. Methods: The pipeline integrates (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generates candidate action sequences from the operator’s intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms collaboratively dismantle an electric-vehicle (EV) battery for recycling. A variety of components must be grasped and removed in specific sequences, determined either by human instructions or by task-order feasibility decisions made by the autonomous system. Results: On 200 real scenes with 600 operator prompts across five component classes, we used full-sequence correctness and next-object correctness to evaluate and compare five LLM-based planners, including ablation analyses of pipeline components. We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX in human-participant experiments. Full-sequence correctness improves from 0.761 (single LLM) to 0.824 (6-LLM ensemble + verifier + deterministic filter), and next-object correctness improves from 0.866 to 0.894. Discussion: Results in our case study indicate that our ensemble-with-verification approach reliably maps operator intent to safe multi-robot plans while maintaining low user effort.
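The deterministic consistency filter described in the abstract can be illustrated with a minimal sketch: given the set of objects reported by the perception back-end, it discards any candidate plan that references an object not present in the scene. All names and data shapes below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a deterministic consistency filter that rejects
# plans referencing hallucinated objects (names are illustrative only).

def consistency_filter(scene_objects, candidate_plans):
    """Keep only plans in which every action targets a perceived object.

    scene_objects: iterable of object identifiers from the perception back-end
    candidate_plans: list of plans; each plan is a list of (action, object) pairs
    """
    known = set(scene_objects)
    valid = []
    for plan in candidate_plans:
        # A plan passes only if all of its target objects were actually perceived.
        if all(obj in known for _, obj in plan):
            valid.append(plan)
        # Otherwise the plan references a hallucinated object and is discarded.
    return valid


# Example usage with made-up EV-battery component names:
scene = ["bolt_1", "cover_plate", "busbar"]
plans = [
    [("unscrew", "bolt_1"), ("remove", "cover_plate")],  # consistent
    [("remove", "phantom_module")],                      # hallucinated object
]
kept = consistency_filter(scene, plans)
```

A check like this is deterministic by construction, so it can sit after the LLM verifier as a final guard without adding model-dependent variance.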

Publication DOI: https://doi.org/10.3389/frobt.2026.1727433
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics
Funding Information: The author(s) declared that financial support was received for this work and/or its publication. This work was funded by the project “Research and Development of a Highly Automated and Safe Streamlined Process for Increasing Lithium-ion Battery Repurposing and Recycling” (REBELION) under Grant 10079049 and partially supported by the Ministry of National Education, Republic of Turkey.
Additional Information: Copyright © 2026 Erdogan, Contreras, Rastegarpanah, Chiou and Stolkin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Uncontrolled Keywords: human-robot interaction, intent recognition, large language models, multi-robot disassembly, task planning
Publication ISSN: 2296-9144
Last Modified: 02 Apr 2026 07:07
Date Deposited: 01 Apr 2026 13:44
Full Text Link: https://www.arx ... /abs/2510.17576
Related URLs: https://www.fro ... 727433/abstract (Publisher URL)
PURE Output Type: Article
Published Date: 2026-03-27
Published Online Date: 2026-03-27
Accepted Date: 2026-01-19
Authors: Erdogan, Cansu
Contreras, Cesar Alan
Rastegarpanah, Alireza (ORCID Profile 0000-0003-4264-6857)
Chiou, Manolis
Stolkin, Rustam

Version: Published Version

License: Creative Commons Attribution
