Every year before the Embedded Vision Summit, the author tries to reflect on the big picture of embedded AI and computer vision. This year, on the 15th anniversary of the summit, two trends are clearer than ever. First, AI and computer vision applications are moving from the laboratory to the real world, from scientific projects to widespread deployment. Second, multimodal AI—including text, visual, audio, and other sensory inputs—is fundamentally changing the capabilities of these systems.
The first trend—scaling—is well articulated in Medioni’s keynote talk, “AI in the Real World and Large-Scale Computer Vision Innovations.” Medioni is a member of the team responsible for Amazon’s automated checkout technology, so he knows a thing or two about large-scale computer vision. He will also discuss AI innovations that are improving the streaming experience for more than 200 million Amazon Prime Video users.
Following Medioni’s talk, there will be a panel discussion, “Edge AI and Large-Scale Vision: What is Real, What is Next, and What is Missing.” In this panel, Medioni will join distinguished experts from Waymo, Hayden AI, and Meta Reality Labs to discuss how vision and AI projects go from an idea to deployment for thousands or millions of users, and the challenges that must be overcome along the way.
Similarly, Chris Padwick from Blue River Technology (a subsidiary of John Deere) will discuss “Turning Computer Vision Prototypes into Powerful Products.” David Selinger will share his experiences scaling startups in “Deep Dive: Lessons Learned in Building, Operating, and Scaling Edge AI Computer Vision Companies.” Jason Fayling will discuss how to use AI and vision to improve dealership operations in “SKAIVISION: Transforming Car Dealerships with Computer Vision.”
The second trend—multimodal intelligence—is highlighted in another keynote, from Trevor Darrell of the University of California, Berkeley: “The Future of Visual AI: Efficient Multimodal Intelligence.” Darrell will discuss the integration of natural language processing and computer vision through visual language models (VLMs) and share his views on the current state and trajectory of machine intelligence research. Of particular relevance to edge applications, much of his work aims to overcome barriers, such as high memory and compute demands, that limit the practical deployment of state-of-the-art models.
The summit will continue this focus on multimodal intelligence, delving into how multimodal AI is integrated and applied. Mumtaz Vauhkonen from Skyworks Solutions will present “Multimodal Enterprise Applications in the Era of Generative AI,” emphasizing the importance of multimodal inputs in solving AI problems. Vauhkonen will discuss the creation of high-quality datasets, multimodal data fusion techniques, and model pipelines that are crucial for building scalable enterprise applications, while also addressing the challenges of bringing these applications into production.
Frantz Lohier from AWS will introduce the concept of AI agents in his talk “Introduction to AI Agent Design.” Lohier will explore how these autonomous components enhance AI development through improved decision-making and multi-agent collaboration, providing insights for the creation and integration of various types of AI agents. Niyati Prajapati from Google will discuss “Visual LLM in Multi-Agent Collaborative Systems: Architecture and Integration,” focusing on the application of visual LLMs in enhancing the capabilities and autonomy of multi-agent systems. Prajapati will provide case studies on automated quality control and warehouse robots, illustrating the practical applications of these advanced architectures.
Because many product developers are eager to learn the practical aspects of incorporating multimodal AI into products, the author will join Satya Mallick, CEO of OpenCV.org, to present a three-hour training session, “Visual Language Models for Computer Vision Applications: Hands-On Introduction.” This course focuses on practical use cases of VLM technology and is designed for professionals looking to expand their AI-driven computer vision skills, particularly for systems targeting edge deployment.
Reflecting on the progress of embedded AI and computer vision over the past 15 years, it is striking that when the summit began, the idea of computers reliably understanding images was almost science fiction. Today, machines can not only understand images and other sensing modalities but also reason about them, enabling a plethora of new applications. It is hard to imagine what the next fifteen years will bring!
Original link:
https://www.eetimes.com/two-big-trends-in-embedded-ai-and-vision-scaling-and-multimodal-intelligence/