Ushur's new Voice-Guided Experience keeps a live AI voice call and a synchronized mobile visual session running concurrently — no handoffs, no context loss, no re-auth. That's a real-time media routing problem. This brief explains exactly how Cloudflare Realtime solves it, and why it's the right infrastructure layer underneath Ushur's agentic platform.
🎙️ "Hi! I'm your Ushur assistant. I'm going to help you complete your Medicaid redetermination today. First, can you confirm your member ID?"
🎙️ "Great. I see your address on file is 4821 Oak Street. Has your address changed in the last 12 months?"
🎙️ "Perfect. I need you to upload a photo of your most recent pay stub for income verification. You can tap below."
Your coverage has been renewed through December 2026. A confirmation has been sent to your phone.
The Infrastructure Gap
Ushur solved the product and compliance layers of concurrent voice-and-visual engagement. The infrastructure underneath needs to match that ambition.
Health plan members call from hospital Wi-Fi, corporate networks, and mobile carrier NATs, any of which can block or restrict direct UDP. Without a managed relay, the voice call can fail before the visual session even opens.
A 1-second lag between what the AI agent says and what updates on screen breaks the illusion of a synchronized experience. TCP-based transports (Twilio SIP, standard WebSocket) can't stay under that threshold consistently at scale, because a single lost packet stalls everything queued behind it.
Twilio Voice charges ~$0.013/min per participant. A 4-minute redetermination call with a concurrent data channel costs $0.052 or more. At 500K annual calls (IEHP), that's about $26K/year for infra alone, before Ushur's platform cost.
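The arithmetic behind these figures can be checked with a short calculation. This is a minimal sketch using only the numbers quoted above; it assumes a single billed participant leg per call (billing both legs would double every total), and the function and field names are illustrative.

```typescript
// Worked check of the per-minute cost arithmetic, using the figures quoted above.
// Amounts are held as integer micro-dollars so the totals are exact.
const MICROS_PER_USD = 1000000;

interface CostInputs {
  ratePerMinuteMicros: number; // ~$0.013/min is 13000 micro-dollars
  minutesPerCall: number;
  callsPerYear: number;
}

function voiceCost(inputs: CostInputs) {
  const perCallMicros = inputs.ratePerMinuteMicros * inputs.minutesPerCall;
  const perYearMicros = perCallMicros * inputs.callsPerYear;
  return {
    perCallUsd: perCallMicros / MICROS_PER_USD,
    perYearUsd: perYearMicros / MICROS_PER_USD,
    perMonthUsd: perYearMicros / 12 / MICROS_PER_USD,
  };
}

// Assumption: one billed participant leg per call.
const estimate = voiceCost({
  ratePerMinuteMicros: 13000,
  minutesPerCall: 4,
  callsPerYear: 500000,
});

console.log(estimate.perCallUsd); // 0.052
console.log(estimate.perYearUsd); // 26000
```

With these inputs the per-call cost is $0.052 and the total for 500K calls is $26,000, which works out to roughly $2,167 per month when spread evenly across a year.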
Every third-party SFU (Twilio, Daily, Agora) processes audio that may contain PHI — member IDs, diagnoses, claim numbers spoken aloud. HIPAA BAAs don't make media inspection go away; they just shift liability.
The Cloudflare Realtime Answer
Cloudflare Realtime is the serverless WebRTC infrastructure that removes every infrastructure problem in Ushur's Voice-Guided Experience stack — without replacing anything Ushur already built.
A Selective Forwarding Unit that routes WebRTC audio between Ushur's AI agent and the member's phone, and simultaneously routes the data channel messages that sync the mobile visual screen. Both run on the same session, on the same connection, with no handoff. Serverless: Ushur never provisions a media server. Runs in 330+ cities via anycast, so the nearest PoP terminates the session, keeping latency under 100ms for 95% of US health plan members.
Managed TURN relay that ensures the voice call connects even when the member is on hospital Wi-Fi, behind an enterprise proxy, or on a carrier NAT that blocks UDP. Falls back to TURN over TLS:443 — the one port that's open everywhere. $0 when used with the Realtime SFU.
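On the client, the relay fallback described above is expressed as a list of ICE servers handed to the peer connection. The sketch below is an assumption-laden illustration: the hostname `turn.example.com`, the port choices other than 443, and the credential fields are placeholders, not documented Cloudflare values. ICE picks among the resulting candidates itself; listing a `turns:` entry on port 443 simply ensures a relay path exists when UDP and plain TCP are blocked.

```typescript
// Minimal shape of a WebRTC ICE server entry (mirrors the browser's RTCIceServer).
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Hypothetical inputs: the TURN hostname and short-lived credentials would be
// issued by the TURN provider. Nothing here is a documented Cloudflare value.
interface TurnCredentials {
  host: string;
  username: string;
  credential: string;
}

function buildIceServers(turn: TurnCredentials): IceServer[] {
  return [
    // STUN lets the client discover its public address for direct paths.
    { urls: [`stun:${turn.host}:3478`] },
    {
      urls: [
        `turn:${turn.host}:3478?transport=udp`, // relay over UDP
        `turn:${turn.host}:3478?transport=tcp`, // relay over TCP
        `turns:${turn.host}:443?transport=tcp`, // relay over TLS on port 443
      ],
      username: turn.username,
      credential: turn.credential,
    },
  ];
}

const iceServers = buildIceServers({
  host: "turn.example.com",
  username: "placeholder-user",
  credential: "placeholder-secret",
});

// In a browser this would be passed as: new RTCPeerConnection({ iceServers })
console.log(iceServers[1].urls);
```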
Serverless JavaScript/TypeScript that runs at the edge, in the same 330+ cities as the SFU. Workers handle session creation, token generation for WHIP/WHEP, PHI guardrail enforcement, and the synchronization logic that keeps the voice session and visual session in lockstep. Sub-millisecond edge execution adds negligible latency to the sync loop.
One Durable Object per active Voice-Guided session. Holds the real-time state of both the voice call and the visual screen — current step, member inputs, document upload status, language detected. When the AI agent says "I see you uploaded your pay stub," the Durable Object is why that sentence is accurate. Strongly consistent, co-located with the SFU, no external database call needed.
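The per-session state described above can be modeled as a small reducer: one state value, updated by one event at a time. This is a sketch of the state logic only, written as plain TypeScript so it runs anywhere; the step names, event shapes, and field names are assumptions for illustration, and in practice this logic would sit inside the Durable Object that owns the session.

```typescript
type Step = "confirm_member_id" | "confirm_address" | "upload_document" | "complete";

// The fields mirror the state listed above: current step, member inputs,
// document upload status, and detected language.
interface SessionState {
  step: Step;
  memberInputs: Record<string, string>;
  documentUploaded: boolean;
  language: string;
}

// Hypothetical event shapes; a real deployment would define its own.
type SessionEvent =
  | { kind: "input"; field: string; value: string; nextStep: Step }
  | { kind: "document_uploaded"; nextStep: Step }
  | { kind: "language_detected"; language: string };

function initialState(): SessionState {
  return { step: "confirm_member_id", memberInputs: {}, documentUploaded: false, language: "en" };
}

// Pure function: returns a new state and never mutates the old one, so each
// transition can be logged or replayed.
function applyEvent(state: SessionState, event: SessionEvent): SessionState {
  switch (event.kind) {
    case "input":
      return {
        ...state,
        step: event.nextStep,
        memberInputs: { ...state.memberInputs, [event.field]: event.value },
      };
    case "document_uploaded":
      return { ...state, step: event.nextStep, documentUploaded: true };
    case "language_detected":
      return { ...state, language: event.language };
  }
}

let state = initialState();
state = applyEvent(state, { kind: "input", field: "addressChanged", value: "no", nextStep: "upload_document" });
state = applyEvent(state, { kind: "document_uploaded", nextStep: "complete" });
console.log(state.step, state.documentUploaded);
```

Because both the voice agent and the visual screen read from the same state value, the agent can only say "I see you uploaded your pay stub" after `documentUploaded` has actually become true.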
Interactive Walkthrough
Walk through a Voice-Guided Experience session as it would run on Cloudflare Realtime. Each step shows what's happening on the member's screen, what the AI agent is saying, and what Cloudflare is doing underneath.
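Member calls Ushur AI agent. Cloudflare Realtime SFU spins up a session: ICE gathers and negotiates candidates, with TURN supplying a relay candidate where a direct path is blocked, and the connection establishes voice plus a data channel.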
AI agent sends SMS with secure link. Member taps → mobile web session opens. Durable Object binds it to the same voice session. No re-auth.
Member taps address confirmation on screen. Event fires over WebRTC data channel to Ushur's AI agent. Agent acknowledges by voice: "Got it, address confirmed."
Member uploads pay stub photo directly to Cloudflare R2 via a signed URL. Worker validates, Durable Object updates state, agent confirms on voice.
Redetermination complete. Agent confirms coverage verbally. Screen shows summary. SFU session ends. Durable Object writes final audit record. Call: 3m 42s, zero agent involvement.
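The screen events in the walkthrough above (address confirmed, document uploaded, step completed) travel over the data channel as small messages. The sketch below shows one way such a message could be encoded and defensively parsed; the JSON shape, field names, and event type names are assumptions for illustration, not a documented Ushur or Cloudflare format.

```typescript
// Hypothetical wire format for a screen-sync event sent over the data channel.
interface SyncEvent {
  sessionId: string;
  type: "address_confirmed" | "document_uploaded" | "step_completed";
  sentAtMs: number; // sender's clock, milliseconds since the Unix epoch
}

const SYNC_EVENT_TYPES = ["address_confirmed", "document_uploaded", "step_completed"];

function encodeSyncEvent(event: SyncEvent): string {
  return JSON.stringify(event);
}

// Returns null for anything malformed, so a bad message never reaches the agent.
function parseSyncEvent(raw: string): SyncEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;
  const sessionId = fields["sessionId"];
  const type = fields["type"];
  const sentAtMs = fields["sentAtMs"];
  if (typeof sessionId !== "string" || sessionId.length === 0) return null;
  if (typeof type !== "string" || SYNC_EVENT_TYPES.indexOf(type) === -1) return null;
  if (typeof sentAtMs !== "number" || !isFinite(sentAtMs)) return null;
  return { sessionId, type: type as SyncEvent["type"], sentAtMs };
}

// In a browser the encoded string would be sent with dataChannel.send(wire).
const wire = encodeSyncEvent({ sessionId: "sess-123", type: "address_confirmed", sentAtMs: 1700000000000 });
const parsed = parseSyncEvent(wire);
console.log(parsed);
```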
Reference Architecture
Cloudflare Realtime sits at the transport layer — it doesn't replace Ushur's AI agent, Studio, or integrations. It replaces self-managed SFU infrastructure and third-party per-minute voice billing.
Trust & Compliance
Ushur's customers include IEHP, Cigna, Aflac, and CalOptima. They require HITRUST r2, a HIPAA BAA, and SOC 2 from every vendor in their stack. Cloudflare qualifies.
The Realtime SFU forwards encrypted SRTP audio packets — it does not decode, store, or analyze media content. Member IDs and diagnoses spoken on the call never leave the encrypted stream. Cloudflare's network is the transport, not the processor.
Voice is encrypted via DTLS/SRTP. Data channel messages (screen sync events) are encrypted via DTLS. Documents uploaded to R2 are AES-256 at rest. TLS 1.3 covers all control plane traffic. Cloudflare operates HIPAA-eligible services with BAA availability.
Every session state transition — member confirmed address, document uploaded, step completed — is written to a Durable Object with a timestamp and cryptographic session ID. This log satisfies Ushur's auditability requirement and can feed directly into Ushur Insights for regulator-ready reports.
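An audit trail of this kind is essentially an append-only list of timestamped records keyed by session ID. The following is a minimal sketch under stated assumptions: the record fields, the `sequence` counter, and the transition names are hypothetical, and the session ID is passed in because how it is generated is not specified here.

```typescript
interface AuditRecord {
  sessionId: string;
  transition: string; // e.g. "address_confirmed"
  timestamp: string;  // ISO 8601, UTC
  sequence: number;   // 1-based position in this session's log
}

// Append-only: returns a new array and leaves the existing log untouched.
// The clock is injected so the function stays deterministic and testable.
function appendAuditRecord(
  log: ReadonlyArray<AuditRecord>,
  sessionId: string,
  transition: string,
  now: Date,
): AuditRecord[] {
  const record: AuditRecord = {
    sessionId,
    transition,
    timestamp: now.toISOString(),
    sequence: log.length + 1,
  };
  return log.concat([record]);
}

let auditLog: AuditRecord[] = [];
auditLog = appendAuditRecord(auditLog, "sess-123", "address_confirmed", new Date("2026-01-15T17:00:00Z"));
auditLog = appendAuditRecord(auditLog, "sess-123", "document_uploaded", new Date("2026-01-15T17:01:30Z"));
console.log(auditLog);
```

Keeping the sequence number alongside the timestamp makes gaps or reordering in an exported log easy to spot.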
Ushur's Voice-Guided Experience supports 74 languages and auto-detects caller language. Cloudflare's anycast network terminates sessions at the nearest PoP globally — the member's voice never travels across continents. R2 location hints support data residency requirements for Irish Life, GDPR-scoped markets, and California-specific CCPA requirements.
vs. Current Vendors
Ushur already integrates with Twilio and AudioCodes for telephony. This isn't a rip-and-replace — it's a transport layer decision for the SFU and data channel portion of Voice-Guided Experience.