Institutional Insights | The Photon War: AI Robot Visual Data Becomes the Core Battlefield, Tesla and Meta Compete in the Reality Capture Race

Amid the rapid iteration of artificial intelligence and robotics, a silent battle over “visual data” has begun. On September 22, Morgan Stanley released a research report stating that Vision-Language-Action (VLA) models are the core technology that lets AI robots interact autonomously, and that the key to training such models, “reality capture data,” is becoming the focus of competition among global technology and manufacturing giants.

From Tesla’s Optimus robot shifting to pure visual training, to Meta embedding ultra-high-definition cameras in wearable devices, to Brookfield collaborating with AI companies to collect scene data, the consensus in the industry is clear: “Whoever can acquire high-quality real-world scene videos on a large scale will gain an advantage in the AI robot era.”

I. The Essence of the “Photon War”: Visual Data is the “Fuel” for AI Robots

The Morgan Stanley report illustrates the value logic of visual data with the metaphor of a “fat bluefin tuna”: near a remote island, a 600-pound bluefin tuna has zero value if it cannot be caught; only with a boat, fishing gear, and detectors can its million-dollar value be realized. The same is true of visual data: without the ability to collect and process it, the potential value of the world’s visual data cannot be realized. Once companies command yotta-scale computing power (10²⁴ floating-point operations per second), real-world scene data will become the core “fuel” for breakthroughs in AI robot technology.

This understanding is driving companies to deploy cameras in homes, offices, cars, and even wearable devices. In a discussion with Morgan Stanley analysts, Alex Kendrick, co-founder of Figure AI (a startup focused on end-to-end generative AI for autonomous driving), put it plainly: “Whoever can acquire ultra-high-definition videos of home scenes on a large scale… will win.” This view points to the core bottleneck in AI robot development: the scarcity of high-quality, multi-scene visual training data.

II. Tesla Bets on Pure Visual Training, Opening New Paths for Robot Data Collection

As a key player in the AI robot field, Tesla’s actions regarding visual data applications are closely watched. The report reveals that in May 2025, Tesla’s former Optimus project leader released a video on the X platform showing that Optimus can autonomously perform tasks through “human demonstration videos” filmed from a first-person perspective, with the long-term goal of shifting to “third-person perspectives captured by randomly deployed cameras.” This transformation marks a critical leap for Tesla from “human-controlled assistance” to “data-driven autonomous learning.”

More striking still, Business Insider reported in August 2025 that Optimus’s pre-training will be completely “de-humanized”: it will no longer rely on motion capture suits and remote operators in virtual reality (VR), and will instead obtain training data by “recording videos of factory workers performing tasks.” This approach not only reduces training costs but also lets robots learn complex operational logic in real industrial scenarios, which increases their practical value.

Along similar lines, the report also cites the strategy of the privately held company Skild AI, which is building a “robotic foundation model” whose core training data comes from “human action videos on the internet.” This further confirms the broad value of “real scene data” in robot training.

III. Giants Compete: Meta Seizes the Wearable Device Entry Point

On the consumer and real-world scene fronts, technology and asset giants are also stepping up their efforts to collect visual data, forming a diversified competitive landscape.

1. Meta: Wearable Devices Become the “Data Battlefield,” User Faces Carry Key Value

Meta’s plans for next-generation wearable devices directly target visual data collection. The report indicates that Meta plans to embed two ultra-high-definition cameras in its glasses products, focusing on capturing real data on users’ hand movements: whether playing the piano, knitting, pouring coffee, or taking out the trash, these everyday actions will become valuable training material for AI robots. Morgan Stanley predicts that within the next two years, the installed base of such devices may reach 20 million units, nearly double the current number of Tesla cars worldwide.

Every Meta glasses user could train a “humanoid virtual avatar” in “billions of scenes in the digital universe.” The report vividly states: “These glasses may be stylishly designed, but your face has already become the ‘battlefield’ for data competition.” Although Meta’s wearable devices are still in the “proof of concept stage” and unlikely to have a substantial financial impact in the short term, Morgan Stanley’s internet team emphasizes that its “full-stack layout” (self-developed hardware + AI operating system + content ecosystem) has laid the foundation for seizing the next generation of computing platforms, with visual data collection being a core link in this layout.

2. Brookfield: Activating Real Estate Resources to Create the “Largest Pre-training Dataset”

Unlike the device-side approach of technology companies, Brookfield, a global leader in infrastructure solutions, chooses to “exchange assets for data.” The report reveals that Brookfield recently reached a cooperation agreement with Figure AI, planning to open its vast real estate portfolio (over 100,000 residential units, 500 million square feet of commercial office space, and 160 million square feet of logistics warehouse space) for the collection of AI robot training data.

The core value of this cooperation lies in “scene diversity”: the environmental characteristics, object layouts, and human activity patterns of residential, office, and logistics settings can provide AI robots with multi-dimensional training material, helping them learn to move, perceive, and act in a variety of human-centered environments. Data collection has already begun at Brookfield properties, with plans to scale up in the coming months. Both parties also plan to explore long-term commercialization opportunities for “deploying humanoid robots in real estate,” forming a closed loop of “data collection – model training – scene implementation.”

IV. Investment Perspective: Tesla as a Core Target, Focus on Data Ecosystem Chain Opportunities

Morgan Stanley names Tesla as a core stock to watch in the report, rating it “overweight” with a target price of $410. Technological breakthroughs and data accumulation in AI robots are key variables supporting its long-term valuation.

The report also highlights core risks in the industry: first, intensified competition from traditional automakers, Chinese automakers, and tech giants in the AI robot field; second, execution risks related to Tesla’s multiple factory launches and technology iterations; third, lower-than-expected adoption rates of Full Self-Driving (FSD) and average revenue per user (ARPU), leading to decreased market recognition of the value of the “Dojo supercomputer-enabled service business.”

V. Conclusion: Visual Data Reshapes the Competitive Landscape of AI Robots

The Morgan Stanley research report points out that competition in AI robots has shifted from “algorithm iteration” to “data competition.” Visual data is the core resource for VLA model training, and the ability to acquire it will directly determine a company’s position in the industry. Whether it is Tesla focusing on video collection in industrial scenes, Meta seizing the consumer-end wearable device entry point, or Brookfield activating real estate scene resources, the essence is the same: building technological barriers through “scene coverage + data accumulation.”

As embodied AI technology matures, the “photon war” will become increasingly fierce, and those companies that can balance data collection efficiency, user privacy, and commercialization will likely stand out in this competition, reshaping the global AI robot industry landscape.
