The Future of AI Agents According to Microsoft, OpenAI, and Google

*When reprinting this article, please include all reference links

AI AGENT

“Artificial Intelligence Agent typically refers to an automated and/or intelligent system capable of performing tasks or achieving goals within its environment. These agents utilize various AI technologies, including machine learning, natural language processing, and rule engines, to parse complex data, make decisions, and execute actions automatically.”

Software that can automatically complete more complex tasks will be essential for generating more revenue.

According to a report from The Information, AI providers such as Microsoft, OpenAI, and Google are competing to bring conversational AI technology into more scenarios by adding new features that can handle complex tasks with minimal customer guidance.

For example, Microsoft is developing software to automate multiple operations, such as creating, sending, and tracking customer invoices based on order history, or rewriting application code in another language and verifying that it works as expected. Employees have stated that this new software will be powered by OpenAI’s technology and will continue to enhance Microsoft’s existing Copilot tools, such as summarizing meetings or drafting emails. Microsoft plans to announce some new features at next month’s annual Build developer conference.

These features are referred to as Agents, which can work towards a goal with minimal human guidance. OpenAI, Google, and Meta Platforms, the owner of Facebook, are each developing their own versions of Agent. This is part of a broader industry effort to transform the excitement sparked by ChatGPT 18 months ago into recurring revenue for companies selling such technology. While AI chatbots have gained acclaim in the business world for their ability to generate realistic responses or suggest code to programmers, customers indicate that software capable of automatically completing more complex tasks will be essential for companies to generate more revenue.

“I see some features appearing in emails, like, ‘Would you like to have AI rewrite this note?’ but that hasn’t really changed my life,” said Dev Ittycheria, CEO of MongoDB, a major database provider. Employees and customers of MongoDB are waiting for better features, and they will not invest heavily in AI until these features are realized. He stated, “Agent workflows will be the next major breakthrough.”

OpenAI is quietly designing computer-using Agents that can control personal computers and operate multiple applications simultaneously, such as transferring data from documents to spreadsheets. Additionally, OpenAI and Meta are developing a second type of Agent that can handle complex web-based tasks, such as creating itineraries and booking travel accommodations based on those itineraries.

Google’s core AI team, DeepMind, is also developing AI Agents capable of handling complex tasks. They are working on this with the assistance of Anmol Gulati, co-founder of Adept, a startup developing computer-using Agents.

Adept, which has raised over $400 million in funding, plans to launch its computer-using Agent product this summer, according to sources close to the company. Adept’s CEO David Luan stated that the company built its AI system from scratch and trained it by analyzing videos of people working on computers (such as creating Excel spreadsheets). As a result, Luan noted, Adept’s model can perform human-like operations on computers, such as browsing Redfin for real estate listings or logging calls in a customer relationship management system.

As “Agent” becomes a buzzword in the AI field, company executives risk confusing customers by stretching the definition of the term. For example, some companies last week announced what they called multiple Agents, which are essentially different versions of conversational chatbots like ChatGPT. They are trained to handle specific tasks, such as improving customer service interactions, but typically do not involve multi-step operations.

The idea of AI Agents first emerged a year ago when developers launched two open-source Agents based on large language models. Tech enthusiasts used these Agents to create demonstrations that automatically wrote podcast outlines or summarized competitive business information. However, as people discovered that these Agents were far from perfect, initial enthusiasm gradually waned. Developers found that while these Agents could list the tasks needed to achieve broad goals, their execution varied widely and they were prone to falling into repetitive behavior loops.
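The repetitive-loop failure described above can be illustrated with a small sketch. This is not how those open-source Agents worked; it assumes a hypothetical `next_action` callback standing in for the language model, and shows one simple guard (a step budget plus repeated-action detection) that a developer might place around such a loop.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical planner signature: given the actions taken so far,
# return the next action, or None when the goal is considered reached.
Planner = Callable[[List[str]], Optional[str]]


def run_agent(next_action: Planner, max_steps: int = 10) -> Tuple[List[str], str]:
    """Run a toy agent loop and report why it stopped."""
    history: List[str] = []
    seen = set()
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:
            return history, "done"
        if action in seen:
            # The planner proposed something it already did: treat as a loop.
            return history, "loop_detected"
        seen.add(action)
        history.append(action)
    return history, "step_limit"


def looping_planner(history: List[str]) -> Optional[str]:
    # Toy planner that alternates between two actions forever.
    return ["search", "summarize"][len(history) % 2]


if __name__ == "__main__":
    print(run_agent(looping_planner))
```

Treating any repeated action as a loop is deliberately crude; a real system would likely compare richer state than a single action string.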

Some companies, like Microsoft, are not yet launching the most complex Agents. Instead, they are introducing Agents that gradually extend the automation capabilities of their existing software.

Earlier this year, Microsoft formed a new team under the leadership of Scott Guthrie, Executive Vice President of Cloud and AI, with the goal of developing Agent capabilities for the company’s Copilot product line. For example, a forthcoming Agent feature being developed in Microsoft’s Dynamics sales application aims to proactively suggest multi-step operations that users previously needed to guide Copilot to perform manually.

For instance, the planned feature can detect a bulk product order that a business customer has not completed, automatically draft an invoice, and ask the business if they would like to send that invoice to the ordering customer. The Agent will then automatically track customer responses and payments, recording this information in the company’s system.
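To make the multi-step flow concrete, here is a minimal sketch of the detect, draft, confirm, and record sequence. It is an illustration only and not Microsoft's implementation: the `Order` and `Invoice` types, the `approve` callback (standing in for the human confirmation step), and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Order:
    customer: str
    items: Dict[str, Tuple[int, float]]  # product -> (quantity, unit price)
    completed: bool = False


@dataclass
class Invoice:
    customer: str
    total: float
    status: str = "draft"  # "draft" until a human approves sending


def find_incomplete_orders(orders: List[Order]) -> List[Order]:
    """Step 1: detect orders the customer has not completed."""
    return [o for o in orders if not o.completed]


def draft_invoice(order: Order) -> Invoice:
    """Step 2: draft an invoice from the order lines."""
    total = sum(qty * price for qty, price in order.items.values())
    return Invoice(customer=order.customer, total=total)


def run_invoice_agent(orders: List[Order],
                      approve: Callable[[Invoice], bool]) -> List[Invoice]:
    """Steps 3 and 4: ask for approval, then record the outcome."""
    records = []
    for order in find_incomplete_orders(orders):
        invoice = draft_invoice(order)
        if approve(invoice):  # the human stays in the loop before sending
            invoice.status = "sent"
        records.append(invoice)
    return records
```

The point of the `approve` callback is that the Agent proposes the action, but the business still decides whether the invoice goes out.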

Peter Lee, head of Microsoft’s research department, indicated last year that research teams should explore developing more complex Agent programs, such as the computer-using Agents being developed by OpenAI. However, according to an employee, Microsoft researchers are still working out how to prevent Agents from going rogue, deleting files, or performing other harmful actions on user devices.

Software programmers are likely to be among the first professionals to experience advanced Agent technology. Microsoft’s GitHub Copilot, a tool that recommends lines of code in real time as developers write, has already pointed in this direction.

GitHub’s CEO Thomas Dohmke hinted that GitHub Copilot will expand its features in the coming year. He stated that when developers describe the issues they encounter in their codebase, an Agent will review the problem, propose solutions, and automatically write and execute the relevant code.

Dohmke emphasized, “In the short term, we will focus on developing Agents that can handle larger tasks, which will greatly assist developers in completing various work.”

Additionally, startups like Magic and Cognition AI have gained attention for claiming that their programming Agents can perform many human programming tasks, although these claims have yet to be verified.

Two recent technological breakthroughs may help AI providers develop Agents for broader applications, such as scheduling a week’s appointments, creating itineraries and maps, or accurately filling out forms using information from various data sources.

Ion Stoica, a computer science professor at the University of California, Berkeley, and co-founder of AI startups Anyscale and Databricks, stated that developers are becoming more adept at using large language models (LLMs) to generate synthetic data, which is then used to train other models. This is particularly helpful in code generation, where developers can guide models to create and solve problems within certain parameters.

The second advancement mentioned by Stoica is the “grounding” process: this involves building AI models that can automatically verify the validity of other models’ outputs, such as testing whether the code generated by a model correctly solves the current problem.

Stoica added, “Next year, we expect to see significant progress in the problem-solving and reasoning capabilities of models. This will rely on the grounding process: if I can automatically confirm that an output is valid, then I can use the LLM itself to enhance the output, which will be a major breakthrough.”
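The grounding idea Stoica describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor's method: the list of candidate code strings stands in for successive model outputs, and `verify` and `grounded_generate` are hypothetical helper names. The verifier simply runs each candidate against known input/output pairs and accepts the first one that passes.

```python
from typing import Iterable, List, Optional, Tuple

# Each case is (positional arguments, expected result).
Case = Tuple[tuple, object]


def verify(source: str, func_name: str, cases: List[Case]) -> bool:
    """Return True only if the candidate code passes every test case."""
    namespace: dict = {}
    try:
        # exec is acceptable for this trusted toy input only; real systems
        # would run untrusted model output in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all fail.
        return False


def grounded_generate(candidates: Iterable[str], func_name: str,
                      cases: List[Case]) -> Optional[str]:
    """Return the first candidate that the verifier accepts, else None."""
    for source in candidates:
        if verify(source, func_name, cases):
            return source
    return None


# Stand-ins for two successive model attempts at writing add().
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",  # wrong: subtracts
    "def add(a, b):\n    return a + b\n",  # correct
]
CASES = [((1, 2), 3), ((0, 0), 0)]

if __name__ == "__main__":
    print(grounded_generate(CANDIDATES, "add", CASES))
```

Because the check is automatic, a failed candidate can be sent back to the model for another attempt without a human reviewing each output.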

Agent technology has not yet been widely adopted, and AI providers and their customers are still laying the technical groundwork to eliminate the common errors in current chatbot technology that hinder its use in enterprises.

Olivier Pomel, CEO of Datadog, commented: “Even if you can achieve 99% accuracy, that is still not enough for applications that require near-perfect precision.” Datadog is a company that helps businesses monitor the performance of their cloud applications.

Grounding work involves software that can verify the results produced by AI models. This is different from the efforts of OpenAI and its peers to develop ever larger conversational AI models, which aim to be inherently smarter and more precise than their predecessors. OpenAI and its competitors hope that larger models will be better suited to drive Agents, but it will take time to learn how quickly LLMs will improve.
