Core Issues:
- Limitations of Existing Methods: Current large language model (LLM) agent frameworks and inference-time algorithms often struggle with complex planning problems.
- Main Reasons:
  - Insufficient Validation: Many methods cannot effectively verify whether a generated plan satisfies all constraints, or they validate only coarsely at the task level.
  - Poor Complexity Adaptability: Existing methods typically apply the same inference algorithm to every instance of a task, without adjusting for the complexity of the specific problem.
Proposed Solution: PlanGEN Framework
- Definition: PlanGEN is a model-agnostic, easily scalable multi-agent framework.
- Objective: Enhance LLMs’ planning and reasoning capabilities on complex problems, particularly the generation of effective natural language plans.
- Core Components (Three Agents; a minimal interface sketch follows this list):
  - Constraint Agent: Extracts instance-specific constraints from the problem description (e.g., budget, time limits, participant schedules, required concepts, rules).
  - Verification Agent: Evaluates the quality of generated plans against the constraints extracted by the Constraint Agent, providing detailed feedback and a numerical reward score.
  - Selection Agent: Dynamically selects the most appropriate inference-time algorithm (e.g., Best of N, Tree-of-Thought, REBASE) based on the complexity of the current problem instance and historical performance, using a modified UCB (Upper Confidence Bound) policy.
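The sketch below shows how the first two agent roles might be wired up as plain functions. It is illustrative only: the prompts, function names, signatures, and the `llm` callable are assumptions, not the paper’s implementation.

```python
# Illustrative sketch of the Constraint and Verification agent roles (all details assumed).
from dataclasses import dataclass

@dataclass
class Verification:
    reward: float   # numerical reward score assigned by the Verification Agent
    feedback: str   # natural-language critique, including violated constraints

def constraint_agent(llm, problem: str) -> list[str]:
    """Extract instance-specific constraints (budget, times, rules, ...) as a list."""
    prompt = f"List every explicit or implicit constraint in this problem, one per line:\n{problem}"
    return [line for line in llm(prompt).splitlines() if line.strip()]

def verification_agent(llm, problem: str, constraints: list[str], plan: str) -> Verification:
    """Critique a candidate plan against the extracted constraints and score it."""
    prompt = (
        f"Problem:\n{problem}\n\nConstraints:\n" + "\n".join(constraints)
        + f"\n\nCandidate plan:\n{plan}\n\n"
        "Point out any violated constraints, then end with a line 'Score: <number>'."
    )
    response = llm(prompt)
    score = float(response.rsplit("Score:", 1)[-1].split()[0])  # naive parse of the score line
    return Verification(reward=score, feedback=response)
```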
Specific Implementation of PlanGEN:
The paper combines these three agents with existing inference-time algorithms to propose several framework variants:
- PlanGEN (Best of N): Generates N plans; the Verification Agent scores each against the constraints, and the highest-scoring plan is selected.
- PlanGEN (Tree-of-Thought, ToT): During the ToT search, the Verification Agent scores intermediate steps/thoughts to guide the search direction.
- PlanGEN (REBASE): Similarly, during the REBASE search, the Verification Agent scores nodes to guide pruning and expansion.
- PlanGEN (Mixture of Algorithms): The most comprehensive variant (shown in Figure 1; its control flow is sketched in code after this list). It first generates and verifies an initial plan. If the score is not high enough, the Selection Agent chooses one of the three variants above, based on problem complexity and UCB scores, to produce an updated plan. The process iterates until a plan with a sufficiently high score is produced.
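Under the same assumptions as the sketch above, the Mixture of Algorithms control flow might look roughly as follows. The threshold, iteration cap, and the `select_algorithm` callable (standing in for the UCB-based Selection Agent) are all illustrative, not values from the paper.

```python
# Illustrative control flow for PlanGEN (Mixture of Algorithms); all details are assumed.
# Reuses constraint_agent / verification_agent from the sketch above.
def plangen_mixture(llm, problem: str, select_algorithm, threshold: float = 90.0, max_iters: int = 5):
    constraints = constraint_agent(llm, problem)
    plan = llm(f"Write a step-by-step plan for the following problem:\n{problem}")
    check = verification_agent(llm, problem, constraints, plan)

    for _ in range(max_iters):
        if check.reward >= threshold:      # plan already satisfies the constraints well
            break
        # The Selection Agent picks Best of N, ToT, or REBASE for this instance.
        algorithm = select_algorithm(problem)
        plan = algorithm(llm, problem, constraints, feedback=check.feedback)
        check = verification_agent(llm, problem, constraints, plan)

    return plan, check.reward
```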
Experiments and Results:
- Models: Experiments were conducted primarily with Gemini-1.5-Pro; Gemini-2.0-Flash and GPT-4o were also used to demonstrate model agnosticism.
- Tasks: Tested across multiple benchmarks, including:
  - Natural Language Planning: NATURAL PLAN (calendar scheduling, meeting planning, trip planning)
  - Scientific/Mathematical Reasoning: GPQA, OlympiadBench (text-only portion)
  - Financial Reasoning: DocFinQA
Main Findings:
- PlanGEN significantly outperforms baseline methods (Zero-shot CoT, Vanilla Multi-Agent) and achieves state-of-the-art (SOTA) results across multiple benchmarks.
- Constraint-based iterative verification indeed enhances the performance of inference-time algorithms.
- Adaptive algorithm selection (via the Selection Agent) is particularly effective on complex problems and can further improve performance.
- Different tasks and complexity levels favor different PlanGEN variants (e.g., PlanGEN (Best of N) performs well on NATURAL PLAN, while PlanGEN (Mixture of Algorithms) performs best on GPQA).
Main Contributions:
- Proposed PlanGEN, a novel, model-agnostic, scalable multi-agent framework.
- Achieved SOTA results on multiple complex planning and reasoning benchmarks.
- Introduced a new constraint-based verification method that improves inference-time algorithms.
- Proposed a new adaptive inference-algorithm selection method based on instance complexity.

In summary, the core idea of PlanGEN is that introducing agents dedicated to understanding problem constraints, verifying plan quality, and intelligently selecting a solution strategy based on problem complexity can significantly enhance the ability of large language models to solve complex planning and reasoning problems.
PlanGEN is primarily aimed at formulating “plans” or “solution paths” with the following characteristics:
- Multi-step and Sequential:
  - The problem cannot be solved in one step and must be broken down into a series of ordered steps or subtasks.
  - The solution is a sequence of actions or a chain of reasoning.
  - For example: creating a detailed travel itinerary, deriving a mathematical proof step by step, planning a project execution process.
- Constraint-bound:
  - Solutions must strictly adhere to a set of explicit or implicit constraints.
  - These constraints may include time windows, budget limits, physical laws, logical rules, resource availability, specific formatting requirements, etc.
  - For example: scheduling meetings around everyone’s free time and room availability, explaining phenomena consistently with the laws of physics, planning activities within a budget.
- Requires Complex Reasoning and Strategic Thinking:
  - Formulating the plan is not a mere combination of information; it requires logical deduction, causal analysis, weighing trade-offs, and choosing an optimal strategy.
  - There may be multiple candidate solution paths, and they must be evaluated against each other.
  - For example: solving complex Olympiad problems, analyzing financial documents and answering reasoning questions, selecting the best course of action among several possibilities.
- Requires Verification and Validation:
  - The correctness or quality of the plan depends heavily on whether all constraints are met and whether the logic is sound.
  - Intermediate steps or final results need to be checked carefully.
  - For example: verifying that each step of a mathematical calculation is correct, confirming that a schedule works for everyone, checking that the reasoning is logically consistent.
- Potentially Varying Complexity:
  - Even within the same type of task (such as chess or scheduling), specific instances (different positions, different participants and time requirements) can vary greatly in complexity. PlanGEN’s Selection Agent is designed precisely to handle this variation.
PlanGEN’s strong planning and reasoning performance does not come from inventing a revolutionary underlying AI technique, but from cleverly combining and optimizing existing techniques with several key “tricks” or design ideas. The key points are as follows:
- Explicit and Centralized Constraint Handling:
  - Trick: Instead of letting the LLM satisfy constraints by “feel,” a dedicated Constraint Agent is established whose sole task is to explicitly identify and list all constraints in the problem (time, budget, rules, etc.); an invented example of such a list follows this item.
  - Why It Matters: This is akin to drawing clear “red lines” and “goals” for the problem solver. Subsequent plan generation and verification have an explicit basis, which greatly reduces the chance that the LLM makes mistakes by overlooking or misunderstanding constraints, a step many traditional methods handle poorly or skip entirely.
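For a hypothetical calendar-scheduling instance, the extracted constraint list might look like the following (the specific constraints are invented purely for illustration):

```python
# Invented example of what the Constraint Agent might extract for a scheduling problem.
constraints = [
    "The meeting must be 30 minutes long.",
    "It must take place on Monday between 9:00 and 17:00.",
    "Alice is unavailable 9:00-11:00; Bob is unavailable 13:00-14:00.",
    "Prefer the earliest time slot that works for everyone.",
]
```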
- Rigorous and Guided Verification Feedback:
  - Trick: A dedicated Verification Agent uses the constraint list to rigorously check generated plans or reasoning steps. It does not merely return “correct/incorrect” or a bare score; it gives specific feedback on what was done well and which constraints were violated (an invented example of such output follows this item).
  - Why It Matters: This “strict teacher” role safeguards plan quality. The numerical reward score provides a quantitative basis for algorithm selection and iteration, while the detailed feedback guides subsequent improvements. This check-and-feedback-as-you-go mechanism is far more reliable than generating once and hoping the result is correct.
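Continuing the invented scheduling example above, a verification result for a flawed candidate plan might look like this (format and numbers are assumptions, not the paper’s actual output):

```python
# Invented example of a verification result: targeted feedback plus a numerical reward score.
verification = {
    "feedback": "The plan stays within the Monday 9:00-17:00 window, but the proposed "
                "10:30 slot violates Alice's 9:00-11:00 unavailability.",
    "reward": -20,  # low score signals that another iteration is needed
}
```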
- Adaptive Strategy Selection based on Instance Complexity:
  - Trick: Acknowledge that there is no one-size-fits-all approach: complex and simple problems call for different solution strategies. A Selection Agent therefore assesses the difficulty of the current problem and combines it with historical experience (which algorithm performed well on similar problems) to dynamically pick the most appropriate inference-time algorithm, whether a few simple attempts with Best of N or a deeper search with Tree-of-Thought or REBASE.
  - Why It Matters: This achieves “targeted treatment.” It avoids using a sledgehammer to crack a nut (wasting resources by running complex algorithms on simple problems) and using a pocket knife to fell a tree (getting poor results by running simple algorithms on complex problems), improving overall efficiency and effectiveness. The agent uses a modified UCB rule that combines historical performance, exploration, diversity, and the LLM’s prior judgment; a standard UCB1 sketch follows this item.
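A plain UCB1 scoring rule conveys the flavor of this selection step. Note that this is a simplification: the exploration constant, the bookkeeping, and the function below are illustrative assumptions, and PlanGEN’s modified UCB additionally factors in diversity and the LLM’s own suitability judgment, which are not shown here.

```python
import math

def ucb_select(algorithms: list[str], history: dict[str, list[float]], c: float = 1.4) -> str:
    """Pick the inference algorithm with the highest UCB1 score.

    `history` maps each algorithm name to the reward scores it earned on
    previous instances (illustrative bookkeeping, not the paper's).
    """
    total_trials = sum(len(rewards) for rewards in history.values())
    best_name, best_score = None, float("-inf")
    for name in algorithms:
        rewards = history.get(name, [])
        if not rewards:                  # always try an untested algorithm first
            return name
        mean_reward = sum(rewards) / len(rewards)
        exploration = c * math.sqrt(math.log(total_trials) / len(rewards))
        if mean_reward + exploration > best_score:
            best_name, best_score = name, mean_reward + exploration
    return best_name

# Example: ucb_select(["best_of_n", "tot", "rebase"],
#                     {"best_of_n": [72, 88], "tot": [90], "rebase": []})
# returns "rebase", because untested algorithms are explored first.
```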
- Multi-Agent Collaboration and Specialization:
  - Trick: Decompose the complex planning/reasoning workflow among agents with specialized roles (constraint extraction, plan verification, strategy selection) that collaborate with the underlying LLM and inference-time algorithms.
  - Why It Matters: This follows the principle of letting specialists do specialized work. Each agent focuses on its core task, so no single model has to excel at everything at once. The modular design also makes the framework easier to extend and maintain.
- Iterative Refinement and Feedback Loop:
  - Trick: Especially in the Mixture of Algorithms mode, PlanGEN does not produce the final plan in one shot; it iterates and improves gradually, using feedback from the Verification Agent to guide the next round of plan generation (a hypothetical refinement prompt is sketched after this item).
  - Why It Matters: This mirrors how humans solve complex problems: start with a preliminary idea, check it, find issues, and revise until satisfied. Iteration lets the system learn from its mistakes and converge toward a high-quality solution.
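One way feedback could be folded into the next generation step is a refinement prompt like the following. The helper name and prompt wording are hypothetical; they only illustrate how verifier feedback might condition the next attempt.

```python
# Hypothetical refinement step: the verifier's feedback steers the revised plan.
def refine(llm, problem: str, constraints: list[str], previous_plan: str, feedback: str) -> str:
    prompt = (
        f"Problem:\n{problem}\n\n"
        "Constraints:\n" + "\n".join(constraints) + "\n\n"
        f"Previous plan:\n{previous_plan}\n\n"
        f"Verifier feedback:\n{feedback}\n\n"
        "Revise the plan so that every violated constraint is satisfied."
    )
    return llm(prompt)
```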
In summary, the “tricks” of PlanGEN lie not in a single technological breakthrough but in its systematic design philosophy:
- Making implicit knowledge explicit (constraints)
- Introducing strict external supervision (verification)
- Dynamically adjusting the strategy (selection)
- Dividing labor among collaborating specialists (multi-agent)
- Improving through iterative feedback (iteration)