「01」Introduction
This is a paper from the National Bureau of Economic Research, published in April 2025, titled “Measuring Human Leadership Skills with AI Agents.” The Harvard University team proposed using AI Agents to simulate team collaboration, allowing for low-cost and standardized assessments of individual leadership.

The researchers conducted a pre-registered experiment to test whether AI Agents could effectively measure human leadership skills. The experiment utilized a “hidden information task,” where 249 leaders performed tasks under two conditions:
-
Human Team Condition: Leaders collaborated with three human members;
-
AI Team Condition: Leaders collaborated with three AI Agents powered by large language models.




「02」The Role and Simulation of AI Agents:
The study used large language models (LLMs) to construct AI team members.
These AI Agents were designed to role-play in the hidden information task, allowing their behavior patterns to simulate human team members.
Each AI Agent was pre-loaded with its unique, incomplete task information fragments, which formed the basis for its responses to human leaders’ inquiries.
Interaction Mechanism: Human leaders engaged in real-time, natural text conversations with three AI team members through a text chat interface. The AI Agents generated responses based on the information they were given (prompts: such as role settings, context information, situational information) in response to the leader’s questions.
「03」Experimental Task Design
-
Pre-test: Typing speed (to ensure no significant difference in efficiency when communicating with AI), fluid intelligence (CFIT III test), individual hidden information task performance (measuring personal information integration ability), emotional perception ability (PAGE test), economic decision-making ability (as an indicator of cognitive flexibility).
-
Experimental Task: The “hidden information question” served as the core task for group collaboration experiments.
-
Team Composition: Each experimental group consisted of one human leader and three team members. In the AI experimental condition, all three team members were played by AI Agents.
-
Information Structure: The information required to solve the problem was distributed incompletely and asymmetrically among the AI team members (and the human leader), forcing team members to integrate all dispersed information through communication and questioning.
-
Parallel Test Versions: To ensure fairness and comparability in the experiment, two sets of parallel task versions with consistent content structure and equivalent difficulty were designed for the AI team and the human team.
-
Control and Consistency: By controlling the prompts of the AI model and output parameters (such as temperature), the responses of the AI Agents were kept consistent throughout the experiment.
「04」Data Collection and Measurement
Data Recording: All dialogue texts between human leaders and AI Agents.
Analysis Metrics: Automatically extract behavioral features from the dialogue using NLP, such as:
-
Frequency of leader’s questions
-
Ratio of leader and AI alternating speaking
-
Proportion of pronouns used by the leader: “we” (reflecting collective consciousness) vs. “I” (reflecting individual dominance).
Human Assistance: Some language features were manually coded for supplementation and validation.
「05」Conclusion
The results showed that individual leadership performance in AI teams was highly correlated with their leadership effectiveness in human teams (correlation coefficient of 0.81). Even after controlling for typing speed, fluid intelligence, and other hard skills, the AI tests still effectively predicted the “net leadership contribution” in human teams (r=0.69).
Moreover, the experiment replicated classic leadership phenomena (such as the positive correlation between overconfidence and leadership willingness) and identified common behavioral characteristics of successful leaders (such as frequent questioning and the use of “we” language).

The PsychFlow experimental platform is promoting the exploration of the “AI × Psychology” research paradigm.
Generating experiments/scales through natural language expressions and publishing online experiments/scales.
Supporting the design of AI + psychology experiments/scales (appointment required).
Limited-time activity during the school season with no participant platform fees.
Follow us and send “02” to get the original text.
Click below to read the original text and enter the PsychFlow experimental platform.