Baltic Summit 2026
24/09/2026, Pomorski Park Naukowo-Technologiczny Gdynia
Gdynia, Poland
Talk to the Future: Building Voice Agents with Azure AI Foundry and the Realtime API
voice-agent
microsoft-foundry
gpt-realtime
What
A 7-hour hands-on workshop on designing, building, and deploying production-ready voice agents with Azure AI Foundry and the GPT-4o Realtime API. You build the full architecture in your own environment: audio capture, real-time speech processing, natural language understanding, tool integration, and speech output.
Why
Voice is the most natural human interface. The GPT-4o Realtime API takes speech in and produces speech out with a single model, so conversational voice agents no longer require a custom pipeline of separate speech-to-text, language model, and text-to-speech services. Yet most organisations have no clear path from an impressive voice demo to an agent that is reliable, secure, and scalable in their own environment. This workshop bridges that gap. It covers the architecture, the implementation, the failure modes, and the production patterns that separate a compelling demo from a trustworthy enterprise voice agent.
How
The workshop runs for 7 hours, structured as follows:
1. Architecture and environment check — 60 minutes: Voice agent architecture deep dive — audio capture and streaming, GPT-4o Realtime API capabilities, Azure AI Foundry as the deployment platform, latency management, interruption handling, and security considerations. Environment verification before labs begin.
2. Lab 1 — Foundation — 70 minutes: Setting up Azure AI Foundry, configuring the GPT-4o Realtime API endpoint, establishing the WebSocket connection, and building the first working voice interaction — a basic but complete voice agent handling real conversational turns with context retention.
3. Lab 2 — Natural conversation — 80 minutes: Making the voice agent feel genuinely conversational — handling natural interruptions, managing multi-turn context, tuning response latency, and implementing fallback handling for edge cases that expose the gap between demo-quality and production-quality voice behaviour.
4. Lab 3 — Tool integration — 80 minutes: Connecting the voice agent to real enterprise data and actions — integrating Power Automate flows and Azure Functions as tools the voice agent can invoke during live conversations, demonstrated against real business scenarios including live data lookup and action execution triggered entirely through voice.
5. Lab 4 — Model comparison — 45 minutes: Switching the same voice agent between GPT and Claude models — comparing processing speed, response naturalness, reasoning depth, and tool invocation accuracy across identical voice scenarios to give attendees an evidence-based model selection framework for their own voice agent implementations.
6. Production readiness — 30 minutes: Hosting options, cost management at voice scale, monitoring conversation quality in production, security hardening, and honest lessons learned from taking voice agents from prototype to production in real enterprise environments.
7. Q&A and open discussion — 30 minutes: Architecture decisions, Azure AI Foundry configuration, real-world implementation challenges, and guidance on applying voice agent patterns back at work.
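To make Labs 1 to 3 more concrete, here is a small Python sketch of the client-side logic in a Realtime API session. It has two parts. The first builds a session.update event that enables server-side voice activity detection and registers one function tool. The second is a handler that reacts to barge-in and to tool calls.

This is an illustration under stated assumptions, not the workshop's lab code:

- Event and field names follow the preview Realtime API protocol. Newer API versions nest some session fields differently, so check the current reference before reuse.
- The WebSocket transport and audio streaming are omitted.
- lookup_order_status, TOOLS and Playback are hypothetical. They stand in for an Azure Function or Power Automate flow and for local audio playback state.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


# Hypothetical tool. In the workshop this role is played by an Azure Function or a
# Power Automate flow; here it is a local stub so the sketch runs offline.
def lookup_order_status(order_id: str) -> Dict[str, Any]:
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping tool names advertised to the model onto local callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "lookup_order_status": lookup_order_status,
}


@dataclass
class Playback:
    """Hypothetical local playback state: the assistant item being played and how
    many milliseconds of its audio the caller has actually heard."""
    item_id: Optional[str] = None
    played_ms: int = 0
    playing: bool = False

    def stop(self) -> None:
        # A real client would also flush its audio output buffer here.
        self.playing = False


def build_session_update() -> Dict[str, Any]:
    """session.update event enabling server-side VAD and one function tool.
    Field names assume the preview Realtime API protocol."""
    return {
        "type": "session.update",
        "session": {
            "instructions": "You are a concise voice assistant for order enquiries.",
            "voice": "alloy",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad"},
            "tools": [
                {
                    "type": "function",
                    "name": "lookup_order_status",
                    "description": "Look up the shipping status of an order.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }


def handle_server_event(event: Dict[str, Any], playback: Playback) -> List[Dict[str, Any]]:
    """Return the client events to send in reply to one server event."""
    kind = event.get("type")

    # Barge-in: the caller started speaking while the agent was talking. Stop local
    # playback and truncate the assistant item to the audio actually heard.
    if kind == "input_audio_buffer.speech_started":
        if playback.playing and playback.item_id is not None:
            playback.stop()
            return [{
                "type": "conversation.item.truncate",
                "item_id": playback.item_id,
                "content_index": 0,
                "audio_end_ms": playback.played_ms,
            }]
        return []

    # Tool call: run the matching tool, return its output, then request a new response.
    item = event.get("item", {})
    if kind == "response.output_item.done" and item.get("type") == "function_call":
        tool = TOOLS.get(item["name"])
        if tool is None:
            result: Dict[str, Any] = {"error": f"unknown tool {item['name']}"}
        else:
            result = tool(**json.loads(item["arguments"]))
        return [
            {
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": item["call_id"],
                    "output": json.dumps(result),
                },
            },
            {"type": "response.create"},
        ]

    return []
```

Why truncate on barge-in: cutting the assistant item at the point where playback stopped keeps the server's conversation history in line with what the caller actually heard. That matters for multi-turn context after an interruption.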
Attendees bring their own laptop and environment. No lab infrastructure is provided, so all prerequisites must be completed before arrival.
Who
Developers, solution architects, and technical leads building or evaluating voice-enabled AI experiences on the Microsoft AI platform. Skill level: Advanced.
Prerequisites:
- Active Azure subscription with Azure AI Foundry access enabled
- Active Power Platform environment with Copilot Studio and Power Automate enabled
- Basic familiarity with Azure portal and Azure AI services
- Basic understanding of WebSocket communication patterns
- Working knowledge of REST APIs and JSON
- Familiarity with Python or JavaScript (all editing is done in the browser, so no local tooling is required)
Top 3 Key Takeaways
1. A complete architectural understanding of production-grade voice agents on Azure AI Foundry and the GPT-4o Realtime API — including latency management, interruption handling, and tool integration patterns that make voice agents feel genuinely natural rather than robotic.
2. Hands-on experience building a voice agent that handles real conversational scenarios, live data lookup, and action execution through voice — in your own environment against real business scenarios, not a pre-configured demo tenant.
3. An evidence-based model comparison across GPT and Claude on identical voice scenarios — giving you the practical insight to make informed model selection decisions for your own voice agent implementations from day one.