Advancements in AI and Robotics Technology (April 14-20)

1. 【OpenAI】 has released two more advanced reasoning models, o3 and o4-mini. o4-mini emphasizes cost-effectiveness, while o3 achieves state-of-the-art (SOTA) results on multiple benchmarks. Both can reason over visual inputs: images are brought into the chain of thought (CoT), so the model can break a problem into logical steps, analyze the image content, and draw conclusions. Both models can also use all existing ChatGPT tools, including image generation.

2. 【OpenAI】 has also launched the GPT-4.1 series, designed for developers and accessible via API. It comes in three versions, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, which target high performance, low cost, and high-frequency lightweight tasks respectively. All three versions outperform GPT-4o and GPT-4o mini on developer tasks, and all three support a context window of up to 1 million tokens (equivalent to about 750,000 words or 3,000 pages).

GPT-4.1 scored about 55% on SWE-bench Verified. It costs $2 per million input tokens and $8 per million output tokens, approximately 26% cheaper than GPT-4o for typical queries. GPT-4.1 mini costs $0.40 per million input tokens and $1.60 per million output tokens, 83% lower than GPT-4o, and GPT-4.1 nano costs as little as $0.10 per million input tokens and $0.40 per million output tokens.
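
To make the per-token prices concrete, here is a minimal sketch that computes the cost of a single request from the figures quoted above. The dictionary keys are labels chosen for this example, and the token counts in the usage lines are made up for illustration.

```python
# USD per million tokens as quoted in this roundup: (input, output).
# The keys are labels for this example only.
PRICES = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 10,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 10_000):.4f}")
```

Because input and output tokens are priced separately, the cheapest tier for a given workload depends on its input-to-output ratio as well as on its total token count.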

3. 【Google】 has launched a preview of Gemini 2.5 Flash. This reasoning model lets developers set a “thinking budget”, that is, how much reasoning the model may spend on a request, so they can balance quality (output accuracy), cost (compute expense), and latency (response time) for each use case. The underlying mechanism has not been detailed; it may involve activating only part of the network on demand, capping the number of reasoning steps, or trimming the context window.
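
As a rough illustration of what setting a thinking budget could look like, the sketch below builds a JSON request body with a budget field and validates it before sending. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect my understanding of the public Gemini REST API and should be checked against the official documentation; the `build_request` helper and the default upper limit are assumptions made for this example. No network call is made.

```python
import json


def build_request(prompt, thinking_budget, max_budget=24576):
    """Build a request body with a capped thinking budget (in tokens).

    `max_budget` is an assumed upper limit for this sketch; check the
    model's documentation for the real range.
    """
    if not 0 <= thinking_budget <= max_budget:
        raise ValueError(f"thinking_budget must be between 0 and {max_budget}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names; a budget of 0 is meant to disable thinking.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # A small budget for a simple lookup, a larger one for a multi-step problem.
    print(json.dumps(build_request("What is the capital of France?", 0), indent=2))
    print(json.dumps(build_request("Plan a three-city rail itinerary.", 4096), indent=2))
```

A lower budget generally favors latency and cost, while a higher one gives the model more room on multi-step problems.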

4. 【Microsoft】’s Copilot Studio has introduced a “computer use” feature that lets developers build AI agents which operate websites and desktop applications directly to automate tasks. Meanwhile, Copilot Vision in the Edge browser can analyze webpage content in real time, including text and images, and provide summaries, Q&A, and research assistance. Microsoft aims to integrate AI more broadly into everyday office software to improve the user experience.

5. Elon Musk’s 【xAI】 has made significant updates to its AI chatbot Grok, introducing two major features: Memory and Grok Studio. Memory lets Grok remember a user’s conversation history and tailor its responses accordingly. Grok Studio is a split-screen collaboration workspace that supports real-time document editing, code generation and execution, and even browser game development. It is comparable to OpenAI’s ChatGPT Canvas and Anthropic’s Claude Artifacts, and is freely accessible to users.

6. 【Hugging Face】 has acquired the startup 【Pollen Robotics】, adding its open-source humanoid robot Reachy to its portfolio.

Hugging Face, based in the United States, is known for hosting millions of AI models and datasets and has over 7 million users. Pollen Robotics, founded in 2016, develops open-source humanoid robots. Its flagship product, Reachy 2, has bionic arms with 7 degrees of freedom, VR teleoperation, and advanced sensors, and is priced at $70,000.

Through this acquisition, Hugging Face aims to expand from a pure software platform into one that combines AI and hardware by integrating its LeRobot algorithm framework with the Reachy hardware platform. The goal is to make robots the next interface for AI and to give researchers, educators, and enthusiasts access to advanced robotic platforms at a lower cost.

Video source: Yicheng AI’s Side Business Guide

7. Kuaishou’s AIGC platform 【Kling AI】 has released two new generative models. KLING 2.0 Master is a video generation model built on the Diffusion Transformer (DiT) architecture that can generate up to 10 seconds of cinematic-quality video. KOLORS 2.0 is an image generation model that supports over 60 artistic styles (such as 3D anime, oil painting, and cyberpunk). It is reported to surpass Midjourney V7, FLUX 1.1 Pro, and Reve, particularly in semantic understanding and style control.

8. 【Anthropic】 has upgraded Claude, adding a Research feature and deep integration with Google Workspace. Research is an agentic search capability that autonomously runs multi-step, multi-source queries. With access to Gmail, Google Calendar, and Google Docs, Claude can show what AI agents are capable of in everyday work.

9. 【ByteDance】 has launched Seaweed, a 7-billion-parameter video generation model built on the DiT architecture, with performance comparable to larger models such as Sora. It supports text-to-video, image-to-video, and audio-driven synthesis. By default it generates 20-second videos at 720p and 24 fps; in some scenes it can produce coherent segments of 30 seconds or even nearly 2 minutes, exceeding Sora’s 20-second limit.

10. Canadian AI unicorn 【Cohere】 has released Embed 4, a multimodal embedding model designed for enterprise search and RAG applications.

Embed 4 has a 128K-token context window and supports over 100 languages. It has been optimized for specialized industries such as finance, healthcare, and manufacturing, and understands their terminology and data formats (such as financial statements, medical records, and supply chain plans). Using techniques such as Matryoshka Representation Learning and int8 and binary quantization, it cuts storage costs by up to 83% and speeds up inference.
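
The storage effect of these techniques can be sketched with plain arithmetic. The example below is a generic illustration, not Cohere’s implementation: it truncates a vector to its leading dimensions (the idea behind Matryoshka-style embeddings), applies simple int8 and sign-based binary quantization, and compares bytes per vector. The 1,536-dimension float32 baseline and all function names are assumptions made for this example.

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def quantize_int8(vec):
    """Map components in [-1, 1] to integers in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in vec]


def quantize_binary(vec):
    """Keep only the sign of each component, packed 8 per byte."""
    packed = bytearray(math.ceil(len(vec) / 8))
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def storage_bytes(dims, bits_per_dim):
    """Bytes needed to store one vector."""
    return math.ceil(dims * bits_per_dim / 8)


def saving(dims, bits_per_dim, base_dims=1536, base_bits=32):
    """Fraction of storage saved versus an assumed float32 baseline."""
    return 1 - storage_bytes(dims, bits_per_dim) / storage_bytes(base_dims, base_bits)


if __name__ == "__main__":
    for dims, bits in [(1536, 8), (1024, 8), (1536, 1)]:
        print(f"{dims} dims at {bits} bit(s): {saving(dims, bits):.1%} saved")
```

Under these assumed sizes, keeping 1,024 of 1,536 dimensions at int8 comes to roughly 83% less storage than full float32 vectors. The source does not say which configuration its 83% figure refers to, so this is only one combination that lands near it.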

11. South Korean startup 【Tesollo】 showcased the DG-5F robotic hand, which has 20 independently controlled joints and moves in an agile, natural way. It weighs 1.4 kg and has a gripping force of up to 12 kg. Its built-in gripping algorithm adapts to objects of different shapes and materials (such as metal, plastic, and paper) without manual adjustment. It is 20-45% cheaper than comparable products on the market, such as Shadow Robot’s Dexterous Hand.
