AI Agents v3: Multimodal Understanding and Autonomous Workflows

AI Agents v3 Is Here

We’re thrilled to announce AI Agents v3 — a major leap forward in what autonomous AI agents can perceive, understand, and accomplish. This release introduces three capabilities our customers have been requesting: multimodal input, voice interaction, and autonomous workflow execution.

See and Understand: Vision Capabilities

AI Agents can now process images, screenshots, documents, and video frames alongside text:

Document processing — agents can read invoices, contracts, and forms, extracting structured data without manual templates
Visual inspection — manufacturing and QA teams can deploy agents that identify defects from product images
Screenshot analysis — IT support agents can understand user-submitted screenshots to diagnose issues faster
Chart interpretation — agents can analyze charts and graphs in reports, translating visual data into actionable insights
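
As a rough illustration of what "structured data" from an invoice might look like once it reaches your own code, here is a minimal sketch. This announcement does not specify the agent's output format, so the JSON shape, the field names (`invoice_number`, `total`, `items`), and the `parse_invoice` helper are all assumptions made for the example, not part of the AI Agents API. The sketch parses a payload into typed records and checks that the line items add up to the stated total before anything downstream relies on it.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class LineItem:
    description: str
    amount: Decimal


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    total: Decimal
    items: Tuple[LineItem, ...]


def parse_invoice(payload: str) -> Invoice:
    """Parse a JSON payload (hypothetical shape) into a validated Invoice.

    Amounts are expected as strings such as "19.99" so they can be read
    as exact decimals rather than floats.
    """
    data = json.loads(payload)
    items = tuple(
        LineItem(item["description"], Decimal(item["amount"]))
        for item in data["items"]
    )
    invoice = Invoice(data["invoice_number"], Decimal(data["total"]), items)

    # Sanity check: extracted line items should sum to the extracted total.
    if sum((item.amount for item in items), Decimal("0")) != invoice.total:
        raise ValueError("line items do not add up to the invoice total")
    return invoice
```

A consistency check like this is one way to catch extraction mistakes early, whatever the actual output format turns out to be.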

Talk Naturally: Voice Interaction

Voice-enabled agents bring natural conversation to customer-facing and internal workflows:

Real-time voice processing with sub-200 ms latency for natural conversation flow
Accent and dialect adaptation supporting over 40 language variants
Emotion detection to escalate frustrated callers proactively
Voice authentication for secure identity verification without passwords
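
To make the escalation idea concrete, here is a minimal sketch of one possible rule: escalate when the average frustration score over the last few conversational turns crosses a threshold. The announcement does not describe how emotion detection is exposed, so the per-turn score between 0 and 1, the `EscalationMonitor` class, and the default threshold and window size are hypothetical choices for illustration only.

```python
from collections import deque


class EscalationMonitor:
    """Tracks recent frustration scores and signals when to escalate.

    Scores are assumed to be floats in [0, 1], one per caller turn.
    """

    def __init__(self, threshold: float = 0.7, window: int = 3) -> None:
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores

    def observe(self, frustration: float) -> bool:
        """Record a score; return True if the caller should be escalated."""
        self.scores.append(frustration)
        window_full = len(self.scores) == self.scores.maxlen
        average = sum(self.scores) / len(self.scores)
        # Wait for a full window so a single bad turn does not trigger escalation.
        return window_full and average >= self.threshold
```

Averaging over a window rather than reacting to a single score is a design choice that trades a little responsiveness for fewer false escalations.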

Set It and Forget It: Autonomous Workflows

The new workflow engine allows agents to execute complex, multi-step processes independently:

Conditional branching — agents make decisions based on real-time data and context
Tool integration — agents can call APIs, query databases, send notifications, and update records
Human-in-the-loop checkpoints for high-stakes decisions that require approval
Workflow templates for common patterns like customer onboarding, incident response, and order fulfillment
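
The pattern described above (a conditional branch, tool calls, and a human-in-the-loop checkpoint) can be sketched in plain Python. This is not the workflow engine's actual API, which this announcement does not document. The `Order` type, the `charge_card` and `notify_customer` stand-ins, the `approve` callback, and the approval threshold are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Order:
    order_id: str
    amount: float
    log: List[str] = field(default_factory=list)


def charge_card(order: Order) -> None:
    # Stand-in for a tool call such as a payments API; it only records the action.
    order.log.append(f"charged {order.amount:.2f}")


def notify_customer(order: Order) -> None:
    # Stand-in for a notification tool.
    order.log.append("customer notified")


def run_fulfillment(
    order: Order,
    approve: Callable[[Order], bool],
    approval_threshold: float = 1000.0,
) -> str:
    """Run a simplified order-fulfillment workflow and return its outcome."""
    # Conditional branch: only high-value orders need a human decision.
    if order.amount >= approval_threshold:
        order.log.append("approval requested")
        # Human-in-the-loop checkpoint: the workflow stops here if rejected.
        if not approve(order):
            order.log.append("rejected by reviewer")
            return "rejected"

    charge_card(order)
    notify_customer(order)
    return "fulfilled"
```

In this sketch the reviewer is modeled as a callback, so the same workflow can be exercised with an automatic approver in tests and a real approval step in production.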

Upgrading

All AI Agents v2 configurations are fully compatible with v3. New capabilities can be enabled incrementally: start with vision for document processing, then expand to voice and autonomous workflows when your team is ready.