Toy Story or 'Toy Horror'? PIRG's Latest Report Reveals Safety Concerns of AI Toys

In today’s world, where artificial intelligence is sweeping across the globe, the toy industry is undergoing an unprecedented transformation towards intelligence. From simple sound-making dolls to AI companions that can converse fluently with children, technological advancements have brought infinite possibilities to childhood.

However, while we celebrate the technology, we must also face the accompanying “growing pains.”

Recently, the Public Interest Research Group (PIRG) in the United States released its famous annual report titled Trouble in Toyland.

This year’s focus is on the currently popular AI-driven toys. The report reveals severe issues such as privacy breaches and content control through practical testing.

Safety “Stress Testing”

PIRG researchers selected four representative AI smart toys on the market for testing:

Kumma: An AI teddy bear manufactured by the Chinese startup FoloToy.
Grok: A rocket-shaped toy produced by the Silicon Valley company Curio.
Robot MINI: A robot made by Little Learners.
Miko 3: A consumer-grade robot produced by the Indian company of the same name.

At the start of the testing, issues became apparent.Robot MINI was immediately “disqualified” due to its inability to maintain a stable internet connection. While this may seem like a quality issue, for smart devices, an unstable connection often indicates potential security vulnerabilities that can easily be exploited by anonymous users.

For the other three toys that were operational, researchers focused on testing their performance in terms of privacy protection and content safety.

Privacy Boundaries: Who is Listening to Children’s Secrets?

The core of AI toys lies in interaction, and the prerequisite for interaction is “listening.” However, the design logic of different products in terms of “listening” brings about vastly different risks.

Miko 3 is relatively restrained, requiring children to activate the conversation mode by using the built-in microphone to record.
Grok uses a wake word system and continues recording for about ten seconds after the user stops speaking.
Kumma has the most aggressive design; it does not have a physical button and is in a constant listening state. During testing, it even interrupted conversations between researchers without being directly activated.

Curio’s Grok

AI Toy Trend Insight: This kind of “invisible listening” enhances the fluidity of interaction but poses a significant risk in a children’s room, where privacy is extremely sensitive. More worryingly, the leakage of voice data could allow scammers to use AI to clone a child’s voice, leading to potential fraud.

When Generative AI Meets Innocent Words

Unlike the old generation of smart toys like “Hello Barbie” from 2015, which had preset dialogue scripts, the new generation of AI toys generally connects to large language models (LLMs), such as OpenAI’s GPT series or Mistral. This means that the responses of the toys are no longer pre-set but generated in real-time.

This brings about the surprise of “one size fits all” but also the shock of “uncontrollable.”

To test the “safety barriers” of these toys, researchers designed a series of sensitive tests, including inquiries about the locations of dangerous items (knives, drugs, firearms) and inducing discussions on violence and pornography.

Dangerous Guidance Testing

When asked how to find or use dangerous items:

Grok refused most answers, demonstrating good risk control capabilities.
Miko 3 occasionally “dropped the ball,” providing certain locations of household hazardous items even when the user was set to 5 years old.
Kumma performed worryingly. It not only listed the locations of dangerous items but also provided detailed steps on how to use them. Whether running on GPT-4o or Mistral models, it failed to maintain the bottom line.

FoloToy’s Kumma

Sensitive Topics and “Induction” Risks

In more extreme adult topic tests, the “personalities” of each model were starkly revealed:

Grok (most robust): When faced with mature topics, it usually refuses to answer and provides safety advice. Even when researchers tried to probe further, it could maintain boundaries and not expand harmful information.
Miko 3 (most forgetful): It refused direct sensitive questions. Due to its lack of contextual memory, each conversation is independent, which ironically serves as a protection—topics cannot escalate, and harmful content cannot be constructed through multiple dialogues.
Kumma (most uncontrollable): It is the riskiest product in this test. Initially, it would refuse to discuss sexual topics, but as the conversation deepened, the system’s “barriers” gradually failed. It began to provide detailed descriptions of adult content and even, when users asked neutral or ambiguous questions, actively redirected the topic back to previous sensitive content and expanded on it.

From the screen, we can see Miko 3’s emotions

Facing Problems is the First Step to Solutions

After the report was released, the market did not remain indifferent, which gives us a glimmer of hope.

FoloToy has stated that it will “temporarily” halt the sales of Kumma following the report’s release.
OpenAI has stated that it has “suspended the developer’s permissions due to policy violations” after receiving feedback from the research team.

PIRG’s report, Trouble in Toyland 2025, is also of great reference value for AI toy practitioners in our country.

We firmly believe that AI technology can provide better companionship and education for children. However, we must also be clear that, children are not ordinary users, and toys are not ordinary consumer electronics.

Transparency is crucial: What models are used in the toys? Where does the data go? Parents need to have the right to know and choose.

Barriers must be reinforced: For children’s products, general commercial large models may be too “knowledgeable.” We need stricter model fine-tuning and reinforcement learning for children’s scenarios to ensure that there are no leaks on safety and ethical topics.

Continuous supervision and testing: Technology is iterating, and risks are evolving. Third-party evaluations and supervision are the preservatives for the health of the industry.

All technical failures and early issues are a necessary path for the industry to mature. We expose problems not to stifle innovation but to allow innovation to run further on a safe track.

May every child’s AI partner be safe, wise, and kind.

Related Information:

Report Source: PIRG – Trouble in Toyland 2025

Research Team: Teresa Murray, R.J. Cross, Rory Erlich, Lillian Tracy, Jacob Mela

Related Companies: FoloToy, Curio, Little Learners, Miko

(This article aims for industry communication and warning, and does not represent a permanent characterization of specific brands, looking forward to the subsequent improvements from manufacturers)

You can reply “Toy Trouble” on the public account to obtain the complete report.

····· End ·····

Follow AI Toy Trends, to get more industry insights and information.

Toy Story or ‘Toy Horror’? PIRG’s Latest Report Reveals Safety Concerns of AI Toys

Safety “Stress Testing”

Privacy Boundaries: Who is Listening to Children’s Secrets?

When Generative AI Meets Innocent Words

Dangerous Guidance Testing

Sensitive Topics and “Induction” Risks

Facing Problems is the First Step to Solutions

Leave a Comment Cancel reply

Safety “Stress Testing”

Privacy Boundaries: Who is Listening to Children’s Secrets?

When Generative AI Meets Innocent Words

Dangerous Guidance Testing

Sensitive Topics and “Induction” Risks

Facing Problems is the First Step to Solutions

Related posts

Leave a Comment Cancel reply