Dexto is Here! The Secret Weapon for Making AI Agents Speak

Have you ever imagined an AI assistant that does more than answer questions? One that actively helps you write code, edit images, send emails, and even browse the internet for information and make decisions? Not a mechanical exchange of “you say one thing, it replies with another,” but a real “digital colleague” with memory, judgment, and the ability to execute, one that discusses the next steps with you.

Today I will take you deep into Dexto, a tool that describes itself as an “AI intelligent layer,” and show how it turns a collection of cold, hard models, tools, and data into a living AI agent system that can “think,” “act,” and “speak.”

1. From “Chatbot” to “Digital Life”: The Evolution of AI Agents

Let’s start with some background.

In recent years, large language models (LLMs) have become incredibly popular. You may have heard names like ChatGPT, Claude, and Gemini. What makes them powerful? It’s their ability to “understand language” and “generate content.” You can ask them, “Help me write a resignation letter,” and they can produce a beautifully crafted document in seconds.

The problem is that they can only talk; they cannot act.

Ask one to “optimize the Python script I wrote last week and push it to GitHub,” and at best it will tell you how. You still have to type the commands, modify the code, and submit the PR yourself. Your efficiency barely improves; all you have gained is an “AI coach.”

It’s like hiring a super-smart consultant who can talk strategy but won’t even touch the keyboard.

Thus, people began to wonder: can we make AI not just a “talking head” but a true “executor” that can “get things done”?

This is the origin of the AI Agent.

What is an AI Agent?

In simple terms, an AI Agent is an entity that can:

  • Perceive the environment (e.g., read files, browse web pages)
  • Make decisions (e.g., decide whether to research or write code first)
  • Execute actions (e.g., call APIs, modify files, send messages)
  • Learn and remember continuously (e.g., remember what you said or did last time)

It is no longer a passive answering machine but an active task executor.
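Those four capabilities fit into a single loop. The sketch below is a toy illustration of that loop, not Dexto code. `decide` is a hard-coded stand-in for an LLM call, and the `read_file` and `search_web` tools are hypothetical.

```typescript
// A toy perceive → decide → act → remember loop.
// Nothing here is Dexto's API; decide() is a rule-based stand-in for an LLM.

type Action =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "reply"; text: string };

interface AgentState {
  goal: string;
  memory: string[]; // everything the agent has observed so far
}

type ToolFn = (input: string) => string;

// Hypothetical tools the agent can use to act on (and perceive) its environment.
const tools: Record<string, ToolFn> = {
  read_file: (path) => `contents of ${path}`,
  search_web: (query) => `top result for "${query}"`,
};

// Decide: research first, then answer once something has been observed.
function decide(state: AgentState): Action {
  if (state.memory.length === 0) {
    return { kind: "tool", tool: "search_web", input: state.goal };
  }
  return { kind: "reply", text: `Done: ${state.memory[state.memory.length - 1]}` };
}

function runAgent(goal: string, maxSteps = 5): { reply: string; memory: string[] } {
  const state: AgentState = { goal, memory: [] };
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(state); // make a decision
    if (action.kind === "reply") {
      return { reply: action.text, memory: state.memory };
    }
    const tool = tools[action.tool];
    if (!tool) throw new Error(`Unknown tool: ${action.tool}`);
    const observation = tool(action.input); // execute an action and perceive the result
    state.memory.push(observation);          // remember it for the next decision
  }
  return { reply: "Gave up after too many steps", memory: state.memory };
}
```

A real agent replaces `decide` with a model call that sees the goal and the memory. The control flow stays the same: decide, act, observe, remember, and repeat under a step cap.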

Dexto is specifically designed as an “operating system-level” framework for building such AI agents.

2. What Exactly is Dexto? In a Nutshell

Dexto is a universal intelligent layer that allows you to seamlessly connect large models, tools, and data to quickly build AI applications that can “think,” “act,” and “chat” with you.

Sounds a bit abstract? Don’t worry, let me give you an analogy.

Imagine you want to create an “all-in-one AI assistant” that can:

  • Write code
  • Create images
  • Query databases
  • Play music
  • Search for information online
  • Generate podcasts for you

The traditional approach would require you to integrate APIs one by one, write a bunch of glue code, set up a backend service, and then build a frontend interface… by the time you finish, the opportunity has passed.

With Dexto, you only need to:

  1. Tell it: “I want an AI that can draw”
  2. It automatically loads the image generation tool and the corresponding large model (like Gemini Flash)
  3. Then you input: “Draw a futuristic flying car,” and it can directly generate and display the image for you

The entire process requires no code at all; configuration files handle everything.

Isn’t it a bit like “LEGO blocks for the AI world”?

3. Core Highlights: What Makes Dexto So Powerful?

Let’s take a look at what its official slogan says:

“An all-in-one toolkit to build agentic applications that turn natural language into real-world actions.”

In other words, you describe what you want in plain language, and Dexto turns it into real-world actions.

This statement is packed with information, so let’s break down the technical support behind it.

✅ 1. Configuration-Driven Development: YAML Defines Everything, Say Goodbye to Hardcoding

One of Dexto’s most impressive design choices is defining agent behavior in YAML files.

What does this mean? It means you don’t have to write JavaScript/Python to control logic; instead, you use a structured configuration file to tell the agent:

  • Which large model to use?
  • What tools can be accessed?
  • How to manage memory?
  • How to handle user dialogues?

For example, here is a simplified agent configuration:

agent:
  name: coding-agent
model: openai/gpt-4o
tools:
  - filesystem
  - browser
  - terminal
memory:
  type: redis
  ttl: 86400
prompt: |
  You are a professional front-end engineer,
  skilled in using HTML/CSS/JS to build responsive pages.
  When users present requirements, please plan and execute step by step.

See? Not a single line of code! It’s all declarative configuration.

What does this mean? It means you can:

  • Quickly switch models (changing from GPT to Claude requires just one line)
  • Dynamically add or remove tools (want to add a PDF reader? Just add it)
  • Reuse agent templates (team-shared configurations, unified standards)

This is simply the “microservices architecture of the AI era.”
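Once the YAML above is parsed, it is just data, which is why switching models or adding tools takes a one-line change. The sketch below models that config as a TypeScript type and validates it. It makes several assumptions: the field names follow this article's simplified snippet and may not match Dexto's real schema, the `pdf-reader` tool name is hypothetical, and YAML parsing itself is left out.

```typescript
// Shape of the simplified agent config shown above (illustrative, not Dexto's schema).
interface AgentConfig {
  agent: { name: string };
  model: string; // "provider/model", e.g. "openai/gpt-4o"
  tools: string[];
  memory: { type: string; ttl: number }; // ttl in seconds
  prompt: string;
}

// Hypothetical set of tools this runtime knows how to load.
const KNOWN_TOOLS = new Set(["filesystem", "browser", "terminal", "pdf-reader"]);

// Returns a list of problems; an empty list means the config is usable.
function validateConfig(cfg: AgentConfig): string[] {
  const errors: string[] = [];
  if (!cfg.agent.name) errors.push("agent.name is required");
  if (!/^[\w-]+\/.+$/.test(cfg.model)) errors.push(`model "${cfg.model}" is not provider/model`);
  for (const t of cfg.tools) {
    if (!KNOWN_TOOLS.has(t)) errors.push(`unknown tool "${t}"`);
  }
  if (cfg.memory.ttl <= 0) errors.push("memory.ttl must be positive");
  return errors;
}

const base: AgentConfig = {
  agent: { name: "coding-agent" },
  model: "openai/gpt-4o",
  tools: ["filesystem", "browser", "terminal"],
  memory: { type: "redis", ttl: 86400 },
  prompt: "You are a professional front-end engineer...",
};

// Switching models or adding a tool is a data change, not a code change.
const claudeVariant: AgentConfig = {
  ...base,
  model: "anthropic/claude-3-opus",
  tools: [...base.tools, "pdf-reader"],
};
```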

✅ 2. Runtime Engine: Making Agents Truly “Live”

Having configuration alone is not enough; a powerful runtime is needed to execute these instructions.

Dexto provides a complete runtime that includes the following key capabilities:

  • Session Management: supports concurrent multi-user, multi-session use, with an independent context for each session
  • Memory System: automatically saves conversation history, intermediate states, and user preferences
  • Tool Orchestration: calls multiple tools sequentially or in parallel, with retries on errors
  • Multi-Modal Support: handles text, images, files, and voice
  • Human-Machine Collaboration: key operations can require manual approval to prevent the AI from going rogue
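The “Tool Orchestration” capability involves two mechanics: retrying a failing call and running independent calls in parallel. Here is a minimal sketch that assumes tools are plain async functions. `withRetry` and `runParallel` are illustrative helpers, not part of Dexto, and the default `sleep` is a no-op so the example runs without timers.

```typescript
type AsyncTool<I, O> = (input: I) => Promise<O>;

// Retry a tool call up to maxAttempts times, waiting delayMs * attempt between tries.
async function withRetry<I, O>(
  tool: AsyncTool<I, O>,
  input: I,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = async () => {}, // no-op by default
  delayMs = 200,
): Promise<O> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await tool(input);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await sleep(delayMs * attempt); // linear backoff
    }
  }
  throw lastError; // all attempts failed: surface the last error
}

// Run independent tool calls concurrently; each call gets its own retries.
async function runParallel<I, O>(calls: Array<[AsyncTool<I, O>, I]>): Promise<O[]> {
  return Promise.all(calls.map(([tool, input]) => withRetry(tool, input)));
}
```

In practice you would pass a real delay, such as `(ms) => new Promise((r) => setTimeout(r, ms))`, and retry only errors known to be transient.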

For example, in a practical scenario:

You say: “Help me create a Snake game, and open the browser for preview once done.”

What will Dexto do?

  1. Understand your intent → Plan task steps
  2. Call the code generation model → Output HTML+CSS+JS code
  3. Use the filesystem tool → Write the code into a local file
  4. Call the browser tool → Automatically launch the browser to open the page
  5. Reply to you: “Done! The game is open in the browser”

The entire process is fully automated, and each step has log tracking, so if something goes wrong, you can troubleshoot.

This kind of “end-to-end automation” is the true meaning of an “intelligent agent.”
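The five steps above form a fixed pipeline: each step consumes the previous step's output and writes a log entry. The sketch below shows that shape. The step names mirror the Snake example, but their bodies are placeholders rather than real code generation, file writes, or browser control, and `runPipeline` is an illustrative name, not a Dexto function.

```typescript
interface LogEntry {
  step: string;
  ok: boolean;
  ms: number;
  error?: string;
}

type Step = { name: string; run: (input: string) => Promise<string> };

// Execute steps in order, feeding each output into the next, and log every step.
// Stops at the first failure so the log shows exactly where things went wrong.
async function runPipeline(steps: Step[], input: string): Promise<{ output?: string; log: LogEntry[] }> {
  const log: LogEntry[] = [];
  let current = input;
  for (const step of steps) {
    const start = Date.now();
    try {
      current = await step.run(current);
      log.push({ step: step.name, ok: true, ms: Date.now() - start });
    } catch (err) {
      log.push({ step: step.name, ok: false, ms: Date.now() - start, error: String(err) });
      return { log };
    }
  }
  return { output: current, log };
}

// Placeholder steps for "build a Snake game and preview it".
const snakeSteps: Step[] = [
  { name: "plan", run: async (req) => `plan for: ${req}` },
  { name: "generate_code", run: async (plan) => `<html><!-- ${plan} --></html>` },
  { name: "write_file", run: async (_html) => "snake/index.html" },
  { name: "open_browser", run: async (path) => `opened ${path}` },
  { name: "reply", run: async (result) => `Done! ${result}` },
];
```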

✅ 3. Rich Interfaces: Choose CLI, Web, or API

Dexto is not only suitable for technical personnel but also caters to different user habits.

It provides three main interaction modes:

🖥️ Web UI: Visual operation, easy for beginners

Start command:

dexto

This will automatically open a web interface where you can:

  • View historical conversations
  • Switch between different agents
  • Monitor token consumption in real-time
  • Test new prompt effects

Especially suitable for debugging and demonstrations.

💻 CLI Mode: A Programmer’s Favorite, Chat Directly in the Terminal

dexto --mode cli

Enter command-line interactive mode, suitable for embedding in workflows and automation scripts.

For example, if you suddenly want to refactor while coding, just type:

dexto "Refactor this module into three functions and add TypeScript types"

It will take care of the rest.

🔌 API Interface: Easily Integrate into Existing Systems

Dexto also provides RESTful API and TypeScript SDK, making it easy to integrate into your app, website, or customer service system.

For example, if you want to create a “document Q&A bot,” you can:

  1. Upload a PDF
  2. Call the Dexto API to analyze the content
  3. Automatically retrieve and answer when users ask questions

Seamlessly integrate into your product.

✅ 4. Out-of-the-Box “Agent Recipes”: Ready to Use

The most considerate feature is that Dexto comes with a bunch of pre-set “Agent Recipes,” equivalent to ready-made AI employee templates.

Currently, there are over 10 types of agents that can be installed and used directly:

  • Coding Agent: writes, refactors, and debugs code end to end
  • Nano Banana Agent: image generation and editing (based on Gemini Flash)
  • Podcast Agent: generates two-host dialogue podcast audio
  • Sora Video Agent: calls OpenAI Sora to generate videos (requires access)
  • Database Agent: executes SQL queries and analyzes data
  • GitHub Agent: creates PRs, reviews code, manages repositories
  • Talk2PDF Agent: free-form Q&A over an uploaded PDF
  • Music Agent: AI composition and audio processing
  • Triage Agent: multiple agents collaboratively handle customer issues

Installation is also extremely simple:

# View available agents
dexto list-agents

# Install a few common ones
dexto install coding-agent podcast-agent talk2pdf-agent

# Use directly
dexto --agent coding-agent "Write a React login component"

In just a few minutes, you can have several “AI expert teams” at your disposal.

4. Technical Architecture Analysis: How Does Dexto Work?

Talking about features is not enough; let’s dive into the underlying structure and see what Dexto’s “guts” look like.

🧱 Overall Architecture Diagram (Simplified)

+-----------------------+
|    User Interface     | ← CLI / Web / API
+-----------+-----------+
            |
            v
+-----------------------+
|    Dexto Runtime      | ← Core Engine
| - Orchestration       |
| - Session Management  |
| - Memory & Context    |
| - Tool Invocation     |
+-----------+-----------+
            |
      +-----+-----+
      |           |
      v           v
+------------+ +------------------+
|   LLMs     | |     Tools        |
| (GPT,      | | (Filesystem,     |
| Claude,    | |  Browser, APIs)  |
| Gemini...) | +------------------+
+------------+
      |
      v
+------------------+
|  Storage Layer   | ← Redis, SQLite, S3...
+------------------+

The entire system is divided into four layers:

First Layer: User Interface Layer (Input)

Supports various input methods:

  • CLI: Command-line interaction
  • Web UI: Graphical interface
  • HTTP API: Programmatic calls
  • SDK: TypeScript integration

No matter which method, they will ultimately be converted into a unified request format sent to the core engine.

Second Layer: Runtime Engine (The Brain)

This is the core part of Dexto, responsible for coordinating all components.

It mainly includes the following modules:

1. Orchestrator

Function: Receives user input → Analyzes intent → Breaks down tasks → Schedules tools → Aggregates results

For example, say you ask: “Convert this blog post into a podcast and publish it on Xiaoyuzhou,” where Xiaoyuzhou is a Chinese podcast app.

It will automatically break it down into:

  1. Call the read_file tool to read the blog content
  2. Call podcast-agent to generate two-host dialogue audio
  3. Call the upload_to_xiaoyuzhou API to publish the episode
  4. Return the link to you

The entire process requires no human intervention.

2. Session Manager

Each user has an independent session ID that saves:

  • The current conversation context
  • Records of tools used
  • User personalized settings
  • Temporary variables

This way, when you come back next time, the AI remembers what you said before.
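Here is a minimal sketch of per-session state, assuming an in-process `Map` keyed by session ID. The class and field names are illustrative rather than Dexto's own. A production setup would persist this state in one of the storage backends listed under the Memory System.

```typescript
interface SessionState {
  history: Array<{ role: "user" | "assistant"; content: string }>; // conversation context
  toolCalls: string[];                 // records of tools used in this session
  preferences: Record<string, string>; // user personalized settings
  scratch: Map<string, unknown>;       // temporary variables
}

class SessionManager {
  private sessions = new Map<string, SessionState>();

  // Return the existing session, or create an empty one on first use.
  get(sessionId: string): SessionState {
    let s = this.sessions.get(sessionId);
    if (!s) {
      s = { history: [], toolCalls: [], preferences: {}, scratch: new Map() };
      this.sessions.set(sessionId, s);
    }
    return s;
  }

  addMessage(sessionId: string, role: "user" | "assistant", content: string): void {
    this.get(sessionId).history.push({ role, content });
  }

  recordToolCall(sessionId: string, tool: string): void {
    this.get(sessionId).toolCalls.push(tool);
  }

  // Drop a session entirely; returns false if it did not exist.
  end(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }
}
```

Because each session ID maps to its own state object, two users chatting at the same time never see each other's context.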

3. Tool Gateway

All external capabilities are exposed through “tools”.

Each tool is a plugin that follows a unified interface, conceptually along these lines:

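// Simplified, illustrative shape of a tool plugin
type JSONSchema = Record<string, unknown>; // JSON Schema describing the tool's input

interface Tool {
  name: string;                      // identifier the model uses to call the tool
  description: string;               // tells the model when the tool is useful
  parameters: JSONSchema;            // expected input shape
  execute(input: any): Promise<any>; // performs the action and returns its result
}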
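To make the interface concrete, here is a hypothetical `word_count` tool that implements it, plus a tiny registry that dispatches a tool call by name and checks required parameters before executing. The interface is repeated so the block runs on its own. The registry and dispatch logic are an illustrative sketch, not Dexto's internals.

```typescript
type JSONSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

interface Tool {
  name: string;
  description: string;
  parameters: JSONSchema;
  execute(input: any): Promise<any>;
}

// A hypothetical tool: counts the words in a piece of text.
const wordCount: Tool = {
  name: "word_count",
  description: "Count the words in a text",
  parameters: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  async execute({ text }: { text: string }) {
    return text.split(/\s+/).filter(Boolean).length;
  },
};

const registry = new Map<string, Tool>();

function register(tool: Tool): void {
  registry.set(tool.name, tool);
}

// Dispatch a tool call the way a runtime might after the model picks a tool.
async function callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
  const tool = registry.get(name);
  if (!tool) throw new Error(`No such tool: ${name}`);
  for (const key of tool.parameters.required ?? []) {
    if (!(key in args)) throw new Error(`Missing parameter "${key}" for ${name}`);
  }
  return tool.execute(args);
}

register(wordCount);
```

The `description` and `parameters` fields are what the model reads when deciding which tool to call. The registry only needs `name` for lookup.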

Currently, there are over 30 built-in tools, including:

  • File system read/write
  • Browser control
  • Terminal command execution
  • HTTP client
  • Database connection
  • PDF parsing
  • Image processing
  • Audio synthesis

You can also connect external services via the MCP protocol (to be detailed later).

4. Memory System

Supports various storage backends:

  • In-memory (for development and testing)
  • Redis (high-performance caching)
  • PostgreSQL (persistent storage)
  • SQLite (lightweight local storage)
  • S3 (object storage)

You can choose freely based on the scenario.

For example, in a production environment, use a combination of Redis and PostgreSQL for speed and stability; for local development, SQLite is sufficient.
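Backends are interchangeable when they all sit behind one small interface. The sketch below assumes such an interface (`MemoryStore`, a hypothetical name) and implements the in-memory variant, expressing the `ttl` idea from the earlier YAML in seconds. A Redis or PostgreSQL version would implement the same three methods against a real client.

```typescript
interface MemoryStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
  delete(key: string): Promise<void>;
}

// In-memory backend for development and testing. The clock is injectable so
// expiry can be tested without waiting.
class InMemoryStore implements MemoryStore {
  private data = new Map<string, { value: string; expiresAt: number | undefined }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async get(key: string): Promise<string | undefined> {
    const entry = this.data.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== undefined && this.now() >= entry.expiresAt) {
      this.data.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds?: number): Promise<void> {
    const expiresAt = ttlSeconds === undefined ? undefined : this.now() + ttlSeconds * 1000;
    this.data.set(key, { value, expiresAt });
  }

  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

Every method is async even though the in-memory version does not need to be. This keeps the interface compatible with network-backed stores, so calling code never changes when you swap backends.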

Third Layer: Model and Tool Layer (Execution)

This is the true “hands-on layer”.

🤖 Supports 50+ Large Models

Dexto does not bind to any specific vendor; instead, it acts like a “model supermarket,” supporting almost all mainstream LLMs:

  • Cloud service providers: OpenAI, Anthropic, Google, Groq, Azure
  • Open-source models: Llama 3, Mistral, Qwen, DeepSeek
  • On-premise deployment: Ollama, LM Studio, Text Generation WebUI

Switching models requires just one line of configuration:

model: anthropic/claude-3-opus
# vs
model: ollama/llama3:70b

No need to change the prompt template.
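The one-line switch works because the model string carries both the provider and the model name. Here is a sketch of how a runtime could split it, assuming the `provider/model` format used in this article. Splitting on the first slash only keeps tags such as `:70b` intact. `parseModelId`, `resolveModel`, and the provider list are hypothetical.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Split "provider/model" on the first slash only, so the model part may itself
// contain slashes or tags (e.g. "ollama/llama3:70b").
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Hypothetical set of providers this runtime has clients for.
const SUPPORTED = new Set(["openai", "anthropic", "google", "groq", "ollama"]);

function resolveModel(id: string): ModelRef {
  const ref = parseModelId(id);
  if (!SUPPORTED.has(ref.provider)) throw new Error(`Unsupported provider: ${ref.provider}`);
  return ref;
}
```

For example, `resolveModel("ollama/llama3:70b")` returns `{ provider: "ollama", model: "llama3:70b" }`. The provider picks the client, and the rest is passed through untouched.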

🔧 Tools Plug and Play

All tools can be dynamically loaded.

For example, if you want the agent to have the ability to “send WeChat messages,” just register a new tool:

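// Hypothetical: registerTool and wechatApi stand in for the actual
// registration call and WeChat client
registerTool({
  name: 'send_wechat_message',
  description: 'Send WeChat messages to specified contacts',
  parameters: {
    type: 'object',
    properties: {
      contact: { type: 'string', description: 'Contact name or ID' },
      message: { type: 'string', description: 'Message text' }
    },
    required: ['contact', 'message'] // both fields must be supplied
  },
  // Called by the runtime with the arguments chosen by the model
  async execute({ contact, message }) {
    return await wechatApi.send(contact, message);
  }
});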

Once registered, it can be used in YAML.

Fourth Layer: Observability and Safety

As a production-grade framework, Dexto has put significant effort into observability and safety.

📊 Built-in OpenTelemetry Support

All requests will be automatically traced, supporting:

  • Distributed tracing
  • Metrics monitoring
  • Log collection

You can use tools like Jaeger, Prometheus, and Grafana for real-time monitoring:

  • Time taken for each call
  • Token usage
  • Tool invocation chains
  • Error rate statistics

This is very useful for debugging complex agent processes.
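To make “time taken per call” and “token usage” concrete, here is a hand-rolled sketch. It wraps each call in a span that records its duration, attributes, and any error, then aggregates the numbers a dashboard would chart. It deliberately avoids the OpenTelemetry SDK so it runs standalone. `Tracer`, `traced`, and the `tokens` attribute are illustrative names. In a real deployment these spans would come from OpenTelemetry and be exported to a backend such as Jaeger.

```typescript
interface Span {
  name: string;
  parent?: string; // name of the calling span, giving a simple invocation chain
  durationMs: number;
  attributes: Record<string, number | string>;
  error?: string;
}

class Tracer {
  readonly spans: Span[] = [];

  // Run fn inside a span; fn can attach attributes such as token counts.
  async traced<T>(
    name: string,
    fn: (attrs: Record<string, number | string>) => Promise<T>,
    parent?: string,
  ): Promise<T> {
    const span: Span = { name, parent, durationMs: 0, attributes: {} };
    const start = Date.now();
    try {
      return await fn(span.attributes);
    } catch (err) {
      span.error = String(err); // record the failure, then rethrow it
      throw err;
    } finally {
      span.durationMs = Date.now() - start;
      this.spans.push(span);
    }
  }

  // Aggregate the metrics a dashboard would chart.
  summary(): { calls: number; errorRate: number; totalTokens: number } {
    const calls = this.spans.length;
    const errors = this.spans.filter((s) => s.error !== undefined).length;
    const totalTokens = this.spans.reduce((sum, s) => sum + (Number(s.attributes["tokens"]) || 0), 0);
    return { calls, errorRate: calls === 0 ? 0 : errors / calls, totalTokens };
  }
}
```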

🔐 Human-in-the-Loop Mechanism

To prevent AI from “acting on its own” and causing issues, Dexto provides flexible approval policies.

For example, you can set:

  • All operations involving “deleting files” must be manually confirmed
  • Prompt reminders before “modifying the database”
  • Some tools are disabled by default and require explicit authorization

You can also enable --auto-approve mode for rapid iteration:

dexto --auto-approve "Refactor the entire project directory structure"

It’s great for development, and you can turn it off before going live.
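The approval policies above can be expressed as a small lookup table that is consulted before every tool call. This is a sketch under assumptions: the tool names, the three-level `Decision` type, and the rule that an auto-approve switch never re-enables a denied tool are all illustrative, not Dexto's actual policy format.

```typescript
type Decision = "allow" | "confirm" | "deny";

interface ApprovalPolicy {
  rules: Record<string, Decision>; // per-tool overrides
  defaultDecision: Decision;       // applies to tools not listed
  autoApprove: boolean;            // analogous to a development-only --auto-approve switch
}

// Decide what happens before a tool runs.
function checkTool(policy: ApprovalPolicy, tool: string): Decision {
  const decision = policy.rules[tool] ?? policy.defaultDecision;
  // Auto-approve skips confirmations but never re-enables a denied tool.
  if (policy.autoApprove && decision === "confirm") return "allow";
  return decision;
}

// Gate a call: ask a human (via the injected confirm function) when required.
async function guardedCall<T>(
  policy: ApprovalPolicy,
  tool: string,
  confirm: (tool: string) => Promise<boolean>,
  run: () => Promise<T>,
): Promise<T> {
  const decision = checkTool(policy, tool);
  if (decision === "deny") throw new Error(`Tool "${tool}" is disabled by policy`);
  if (decision === "confirm" && !(await confirm(tool))) {
    throw new Error(`User rejected "${tool}"`);
  }
  return run();
}

const policy: ApprovalPolicy = {
  rules: {
    read_file: "allow",
    delete_file: "confirm", // deleting files always needs a human
    db_write: "confirm",    // remind before modifying the database
    shell_exec: "deny",     // disabled until explicitly authorized
  },
  defaultDecision: "confirm",
  autoApprove: false,
};
```

Defaulting unknown tools to `"confirm"` errs on the side of caution: a newly added tool cannot act silently until someone writes a rule for it.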

5. Practical Demonstration: Create an AI Drawing Assistant in Four Steps

Having covered the theory, let’s get into some practical work.

Goal: Use Dexto to build an AI drawing assistant that supports generating images from Chinese descriptions.

Step 1: Install Dexto

npm install -g dexto

Or build from source:

git clone https://github.com/truffle-ai/dexto.git
cd dexto && pnpm install && pnpm install-cli

Step 2: Initialize and Configure

Run Dexto for the first time to start the setup wizard:

dexto

It will guide you through:

  • Selecting the default LLM (recommended GPT-4o or Claude 3)
  • Entering the API key
  • Setting the storage method (recommended Redis)
  • Launching the Web UI

Once setup is complete, your default browser will open http://localhost:3000.

Step 3: Install the Image Agent

Back in the terminal, install the Nano Banana Agent:

dexto install nano-banana-agent

This agent is based on Google’s Gemini 2.5 Flash Image model and excels at image generation.

Step 4: Start Drawing!

Select nano-banana-agent in the Web UI and enter the prompt:

“An orange cat in a spacesuit, watching the Earth rise on Mars, in a cyberpunk style, high definition details”

Seconds later, boom! An imaginative image appears.

You can also continue to ask:

“Change it to an anime style, add some space stations in the background”

It will edit based on the original image rather than regenerate, saving time and computing power.

The entire process is smooth and natural, like conversing with a master artist.

6. Advanced Play: Building a Multi-Agent Collaboration System

While a single agent is already powerful, what if multiple agents work together?

Dexto supports Multi-Agent Systems, enabling “teamwork”.

Scenario Example: Automatically Running a WeChat Official Account

Imagine you want a “fully automated WeChat Official Account operations team” made up of these agents:

  • Researcher: collects trending topics
  • Writer: drafts articles
  • Editor: edits and polishes
  • Designer: designs images
  • Publisher: formats and publishes

How do they collaborate?

  1. Researcher regularly scrapes trending topics from Zhihu and Weibo
  2. Discovers the topic of “AI drawing becoming popular” → Notifies Writer
  3. Writer drafts an article titled “Introduction to Stable Diffusion”
  4. Submits it to Editor for style modification
  5. Editor thinks images are needed → Passes it to Designer
  6. Designer generates cover images + internal illustrations
  7. All materials are compiled for Publisher
  8. Publisher formats → Previews → Awaits your confirmation → Publishes
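The hand-offs above amount to passing a shared work item from role to role, with a human checkpoint before publishing. Here is a minimal sketch that assumes each role is an async function over a shared `Draft` object. The role bodies are placeholders for calls to real agents, and direct function calls stand in for the inter-agent messaging a real system would use.

```typescript
interface Draft {
  topic?: string;
  article?: string;
  images: string[];
  history: string[]; // which roles touched the draft, in order
}

type Role = { name: string; handle: (d: Draft) => Promise<Draft> };

// Placeholder roles; each would normally call its own agent.
const researcher: Role = { name: "Researcher", handle: async (d) => ({ ...d, topic: "AI drawing becoming popular" }) };
const writer: Role = { name: "Writer", handle: async (d) => ({ ...d, article: `Introduction to Stable Diffusion (${d.topic})` }) };
const editor: Role = { name: "Editor", handle: async (d) => ({ ...d, article: `${d.article} [edited]` }) };
const designer: Role = { name: "Designer", handle: async (d) => ({ ...d, images: ["cover.png", "figure-1.png"] }) };

// Pass the draft through each role, then let a human approve before publishing.
async function runTeam(
  roles: Role[],
  approve: (d: Draft) => Promise<boolean>,
): Promise<{ published: boolean; draft: Draft }> {
  let draft: Draft = { images: [], history: [] };
  for (const role of roles) {
    draft = await role.handle(draft);
    draft = { ...draft, history: [...draft.history, role.name] };
  }
  // Publisher step: preview the draft and wait for confirmation.
  const published = await approve(draft);
  return { published, draft };
}
```

Because each role returns a new object instead of mutating the draft, the `history` trail stays trustworthy even if a later role misbehaves.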

The agents in this process can communicate with each other over MCP (Model Context Protocol), which Dexto supports.

7. MCP Protocol: The “Universal Language” Between Agents

MCP (Model Context Protocol) is an open protocol originally introduced by Anthropic, and it is central to how Dexto works.

Its purpose is to provide a standard way for AI applications, models, and tools to understand and collaborate with each other.

To draw an analogy:

  • HTTP is the communication protocol between browsers and web servers
  • SMTP is the communication protocol between mail servers
  • MCP is the communication protocol between AI agents and the tools, data, and other agents they work with

What problem does it solve?

❌ Traditional Pain Points

Previously, each tool had its own set of interface definitions, such as:

  • Tool A returns JSON
  • Tool B requires XML
  • Model C can only process plain text

This led to integration difficulties and inconsistent data formats.

✅ MCP Solution

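MCP standardizes the exchange instead. Every message is a JSON-RPC 2.0 request, response, or notification, and a server describes what it offers in three standard forms:

  • Tools: actions the model can invoke, each with a name, a description, and a JSON Schema for its input
  • Resources: data the application can read, such as files or database records
  • Prompts: reusable prompt templates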

This way, regardless of who initiates the request, the receiver can correctly parse and respond.
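Here is a concrete picture of the wire format. MCP messages are JSON-RPC 2.0 objects, and a client invokes a server's tool with the `tools/call` method, passing the tool's `name` and its `arguments`. The sketch below builds such a request and extracts the text from a result. The `get_weather` tool is made up, and the types cover only the fields used here, not the full specification.

```typescript
// Minimal JSON-RPC 2.0 / MCP types covering only what this example uses.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface ToolCallResult {
  content: Array<{ type: "text"; text: string } | { type: "image"; data: string; mimeType: string }>;
  isError?: boolean;
}

interface JsonRpcResponse<R> {
  jsonrpc: "2.0";
  id: number;
  result?: R;
  error?: { code: number; message: string };
}

let nextId = 1;

// Build an MCP "tools/call" request.
function toolCallRequest(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Extract the text parts of a tools/call response, surfacing both protocol and tool errors.
function resultText(res: JsonRpcResponse<ToolCallResult>): string {
  if (res.error) throw new Error(`JSON-RPC error ${res.error.code}: ${res.error.message}`);
  if (!res.result) throw new Error("Empty response");
  const text = res.result.content
    .filter((c): c is { type: "text"; text: string } => c.type === "text")
    .map((c) => c.text)
    .join("\n");
  if (res.result.isError) throw new Error(`Tool reported an error: ${text}`);
  return text;
}
```

Note the two distinct failure channels. A JSON-RPC `error` means the call itself failed (for example, an unknown tool). `isError: true` means the tool ran but reported a problem in its own output.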

Moreover, Dexto comes with an MCP Server and Playground, making it easy for you to test your own MCP services.

8. Deployment Options: Local, Cloud, or Hybrid

One of Dexto’s biggest advantages is its ability to run anywhere.

🏠 Local Deployment (Privacy First)

Suitable for individual developers and data-sensitive enterprises.

Features:

  • All data stays local
  • Can connect to Ollama to run open-source models
  • Basic functions can be used without internet access

Command:

dexto --host 0.0.0.0 --port 3000

☁️ Cloud Deployment (High Availability)

Suitable for team collaboration or providing external services.

It can be packaged with Docker using a Dockerfile as simple as this:

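FROM node:18
# Install the Dexto CLI globally
RUN npm install -g dexto
WORKDIR /app
# Copy your agent configuration files into the image
COPY . /app
# Web UI port used in the setup described in this article
EXPOSE 3000
CMD ["dexto"]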

Pair it with Nginx, Redis, and PostgreSQL for a complete production environment.

🔄 Hybrid Architecture (Best Balance)

A more advanced approach is the “hybrid mode”:

  • Sensitive operations are executed locally (e.g., reading private documents)
  • Complex computations are done in the cloud (e.g., video generation)
  • Results are merged and returned to the user

This approach achieves both “security” and “performance.”
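The hybrid split comes down to a routing rule evaluated per task. This sketch rests on several assumptions: the `Task` fields, the privacy flag, and the GPU-seconds threshold are hypothetical, and a real router would also weigh latency, cost, and whether a local model is available.

```typescript
type Target = "local" | "cloud";

interface Task {
  name: string;
  touchesPrivateData: boolean; // e.g. reading private documents
  estimatedGpuSeconds: number; // rough compute cost
}

// Privacy wins over performance: private data never leaves the machine.
function route(task: Task, localGpuBudgetSeconds = 30): Target {
  if (task.touchesPrivateData) return "local";
  return task.estimatedGpuSeconds > localGpuBudgetSeconds ? "cloud" : "local";
}

// Run each task where it belongs and merge the results, in order, for the user.
async function runHybrid(
  tasks: Task[],
  runners: Record<Target, (t: Task) => Promise<string>>,
): Promise<string[]> {
  return Promise.all(tasks.map((t) => runners[route(t)](t)));
}
```

Checking the privacy flag first means an expensive task on private data still stays local. That is the trade-off the hybrid mode is meant to make.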

9. Who is Dexto Suitable For?

After all this, you might ask: who is this suitable for?

I have summarized four typical user types:

1. Independent Developers / Tech Enthusiasts

  • Want to quickly validate AI ideas
  • Enjoy tinkering with new technologies
  • Hope to enhance personal productivity

👉 Use Dexto to build a dedicated AI assistant that writes code, draws, and drafts copy for you, significantly boosting your productivity.

2. Product Managers / Designers

  • Don’t code but want to experience AI capabilities
  • Need to quickly prototype
  • Focus on user experience

👉 The Web UI is user-friendly, configuration is simple, and drag-and-drop operations allow zero-code interaction with AI.

3. Enterprise Technical Leaders

  • Want to embed AI capabilities into existing businesses
  • Need controllable, auditable, and scalable solutions
  • Emphasize security and compliance

👉 Dexto supports private deployment, permission control, and operation logging, meeting enterprise-level requirements.

4. AI Entrepreneurs

  • Are developing AI-native applications
  • Need to quickly iterate on MVPs
  • Hope to reduce development costs

👉 Directly build products based on Dexto, saving time on underlying framework development and focusing on business logic.

10. Future Outlook: Where Will Dexto Go?

Although Dexto is still in the Beta stage, its design philosophy and technical roadmap show great potential.

I believe its possible development directions include:

🔮 1. Becoming the “Android Operating System” of the AI Era

Just as Android unified the mobile ecosystem, Dexto is expected to become the “standard platform” for AI agents.

Third-party developers can publish their agent applications on it, forming an “AI app store.”

🧠 2. Introducing Stronger Autonomous Decision-Making Capabilities

Today’s agents are mostly “executors”; future versions may add:

  • Goal Decomposition
  • Self-Reflection
  • Active Inquiry

These would bring them closer to “human-like thinking.”

🌐 3. Building a Decentralized Agent Network

Combining blockchain or P2P technology to achieve:

  • Free trading of services between agents
  • Users owning their AI identities
  • Data sovereignty returning to individuals

Just thinking about it is exciting.

Conclusion: The Era of AI Agents Has Arrived, Are You Ready?

Friends, we are standing at the threshold of a new era.

The past decade was the era of “mobile internet + apps”; the next decade will likely be the era of “AI agents + automation”.

Dexto is one of the first “pioneers” of this era.

It may not be perfect, but it is open enough, flexible enough, and powerful enough to give us a glimpse of the future.

So, my advice is:

Don’t just be a bystander; get your hands dirty!

Even if it’s just installing Dexto to let it help you draw a picture or write a piece of code, you will feel the shock of “AI truly working for you”.

After all, the future will not wait for anyone.

What we need to do is learn to surf before the wave arrives.

📌 Resource Links:

  • GitHub Repository: https://github.com/truffle-ai/dexto[1]
  • Official Documentation: https://docs.dexto.ai[2]
  • Discord Community: https://discord.gg/GFzWFAAZcm[3]
  • Agent Registry: https://docs.dexto.ai/docs/guides/agent-registry[4]

References

[1] https://github.com/truffle-ai/dexto
[2] https://docs.dexto.ai
[3] https://discord.gg/GFzWFAAZcm
[4] https://docs.dexto.ai/docs/guides/agent-registry
