Prepared for Ushur Engineering & Product Team

The infrastructure behind Voice-Guided Experience

Ushur's new Voice-Guided Experience keeps a live AI voice call and a synchronized mobile visual session running concurrently — no handoffs, no context loss, no re-auth. That's a real-time media routing problem. This brief explains exactly how Cloudflare Realtime solves it, and why it's the right infrastructure layer underneath Ushur's agentic platform.

AI Agent · Active Call 0:00

🎙️ "Hi! I'm your Ushur assistant. I'm going to help you complete your Medicaid redetermination today. First, can you confirm your member ID?"

🎙️ "Great. I see your address on file is 4821 Oak Street. Has your address changed in the last 12 months?"

🎙️ "Perfect. I need you to upload a photo of your most recent pay stub for income verification. You can tap below."

Tap to upload pay stub · JPG, PNG or PDF · Max 10MB

Redetermination Complete

Your coverage has been renewed through December 2026. A confirmation has been sent to your phone.

⏱ Time to complete: 3m 42s · 📞 No live agent needed
Powered by
Cloudflare Realtime SFU: WebRTC voice session · anycast 330+ cities · ~48ms
Cloudflare TURN: NAT traversal · TLS:443 fallback · Connected
Data Channel: Screen sync · same session as voice · In sync
Workers Edge Logic: Auth · session mgmt · PHI guardrails · Active
HIPAA-secure · HITRUST r2 · PHI encrypted · No media inspection

Voice + visual sync is a hard real-time problem

Ushur solved the product and compliance layers of concurrent voice-and-visual engagement. The infrastructure underneath needs to match that ambition.

NAT & firewall traversal

Health plan members call from hospital Wi-Fi, corporate networks, and mobile carrier NATs, many of which block or restrict direct UDP. Without a managed relay, the voice call can fail to connect before the visual session even opens.

Current gap

Sub-500ms voice latency requirement

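A 1-second lag between what the AI agent says and what updates on screen breaks the illusion of a synchronized experience. TCP-based transports (Twilio SIP, standard WebSocket) retransmit lost packets in order, so a single loss stalls everything queued behind it; they can't hold a sub-500ms budget consistently at scale.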

Current gap

Vendor per-minute cost at IEHP/Cigna scale

Twilio Voice charges ~$0.013/min per participant. A 4-minute redetermination call with concurrent data channel = $0.052+ per participant. At 500K annual calls (IEHP), that's about $26K/year per participant (roughly $52K/year with two participants on each call), for infra alone, before Ushur's platform cost.

Cost exposure

PHI on third-party media servers

Every third-party SFU (Twilio, Daily, Agora) processes audio that may contain PHI — member IDs, diagnoses, claim numbers spoken aloud. HIPAA BAAs don't make media inspection go away; they just shift liability.

Compliance risk

One platform. Every layer of the voice-visual stack.

Cloudflare Realtime is serverless WebRTC infrastructure that addresses each of the infrastructure problems above in Ushur's Voice-Guided Experience stack, without replacing anything Ushur already built.

Connectivity

TURN Service

Managed TURN relay that lets the voice call connect even when the member is on hospital Wi-Fi, behind an enterprise proxy, or on a carrier NAT that blocks UDP. Falls back to TURN over TLS:443, the port that is almost always open. $0 when used with the Realtime SFU.

Ushur fit: Health plan members call from every conceivable network. TURN over TLS:443 keeps the call connectable on restrictive networks, so sessions don't drop at the moment a member is trying to complete a redetermination or claim.
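A minimal sketch of how a client could list relay options so that TURN over TLS:443 is available as the fallback described above. The host name `turn.example.com`, the credentials, and the helper `buildIceServers` are placeholders for illustration, not Cloudflare's actual endpoints; the object shape follows the standard WebRTC `RTCIceServer` dictionary.

```typescript
// Local mirror of the standard RTCIceServer dictionary, declared here so the
// sketch compiles outside a browser.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// turnHost, username and credential are placeholders (assumptions); real
// values come from whatever the TURN service issues for the session.
function buildIceServers(turnHost: string, username: string, credential: string): IceServer[] {
  return [
    { urls: ["stun:" + turnHost + ":3478"] },
    {
      urls: [
        "turn:" + turnHost + ":3478?transport=udp", // relay over UDP
        "turn:" + turnHost + ":3478?transport=tcp", // relay over TCP
        "turns:" + turnHost + ":443?transport=tcp", // relay over TLS on port 443
      ],
      username: username,
      credential: credential,
    },
  ];
}

const iceServers = buildIceServers("turn.example.com", "demo-user", "demo-secret");
// In a browser this array would be passed as: new RTCPeerConnection({ iceServers })
```

ICE gathers candidates from every listed URL and selects a working pair, so including the `turns:` entry is what gives a member on a locked-down network a path over port 443.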

Edge Compute

Cloudflare Workers

Serverless JavaScript/TypeScript that runs at the edge, in the same 330+ cities as the SFU. Workers handle session creation, token generation for WHIP/WHEP, PHI guardrail enforcement, and the synchronization logic that keeps the voice session and visual session in lockstep. Sub-millisecond edge execution adds negligible latency to the sync loop.

Ushur fit: The auth and session orchestration that Ushur's platform already does maps directly to Workers bindings — no separate middleware, no additional network hops.
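As an illustration of the WHIP side of that flow, the sketch below builds the HTTP request a client would send to publish audio once a Worker has issued a session token. WHIP itself defines the POST of an SDP offer with `Content-Type: application/sdp`; the endpoint URL, the token value, and the `buildWhipRequest` helper are assumptions made for this example.

```typescript
// Plain description of an HTTP request, kept independent of any runtime.
interface HttpRequestSketch {
  url: string;
  method: "POST";
  headers: { [header: string]: string };
  body: string;
}

// Builds a WHIP publish request: an SDP offer POSTed with a bearer token.
// The endpoint and token are hypothetical values supplied by the caller.
function buildWhipRequest(endpoint: string, sessionToken: string, sdpOffer: string): HttpRequestSketch {
  if (sessionToken.length === 0) {
    throw new Error("session token is required");
  }
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/sdp",
      "Authorization": "Bearer " + sessionToken,
    },
    body: sdpOffer,
  };
}

const whipRequest = buildWhipRequest(
  "https://realtime.example.com/whip/session-123", // placeholder endpoint
  "demo-token",                                    // placeholder token
  "v=0\r\n"                                        // truncated SDP, for illustration only
);
```

A WHIP server answers a successful offer with `201 Created`, the SDP answer in the body, and a `Location` header identifying the session resource.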

Stateful Coordination

Durable Objects

One Durable Object per active Voice-Guided session. Holds the real-time state of both the voice call and the visual screen — current step, member inputs, document upload status, language detected. When the AI agent says "I see you uploaded your pay stub," the Durable Object is why that sentence is accurate. Strongly consistent, co-located with the SFU, no external database call needed.

Ushur fit: The "no context loss, no re-authentication" promise of Voice-Guided Experience requires stateful coordination between the voice and visual planes. Durable Objects are that state, serverless and globally consistent.
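A rough sketch of the per-session state such an object could hold, written as a plain class so it runs anywhere; in practice this logic would sit inside a Durable Object. The step names, event names, and `VoiceVisualSession` are hypothetical, chosen to mirror the redetermination flow in this brief.

```typescript
type Step = "member_id" | "address" | "document" | "complete";

interface SessionSnapshot {
  step: Step;
  addressConfirmed: boolean;
  documentUploaded: boolean;
  language: string;
}

type SessionEvent =
  | { type: "member_id_confirmed" }
  | { type: "address_confirmed" }
  | { type: "document_uploaded" };

class VoiceVisualSession {
  private state: SessionSnapshot;

  constructor(language: string) {
    this.state = { step: "member_id", addressConfirmed: false, documentUploaded: false, language: language };
  }

  // Apply one event from either the voice or the visual plane.
  // Events that arrive out of order are rejected rather than silently applied.
  apply(event: SessionEvent): SessionSnapshot {
    const step = this.state.step;
    if (event.type === "member_id_confirmed" && step === "member_id") {
      this.state.step = "address";
    } else if (event.type === "address_confirmed" && step === "address") {
      this.state.addressConfirmed = true;
      this.state.step = "document";
    } else if (event.type === "document_uploaded" && step === "document") {
      this.state.documentUploaded = true;
      this.state.step = "complete";
    } else {
      throw new Error("event " + event.type + " is not valid at step " + step);
    }
    return this.snapshot();
  }

  // Return a copy so callers cannot mutate the session's state directly.
  snapshot(): SessionSnapshot {
    return {
      step: this.state.step,
      addressConfirmed: this.state.addressConfirmed,
      documentUploaded: this.state.documentUploaded,
      language: this.state.language,
    };
  }
}

const demoSession = new VoiceVisualSession("en");
demoSession.apply({ type: "member_id_confirmed" });
demoSession.apply({ type: "address_confirmed" });
const afterUpload = demoSession.apply({ type: "document_uploaded" });
```

Because both planes write to one object, the agent can check `documentUploaded` before saying the upload succeeded.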

Medicaid Redetermination — step by step

Walk through a Voice-Guided Experience session as it would run on Cloudflare Realtime. Each step shows what's happening on the member's screen, what the AI agent is saying, and what Cloudflare is doing underneath.

1
Call initiated

Member calls Ushur AI agent. Cloudflare Realtime SFU spins up a session: ICE gathers candidates (with TURN as the relay fallback) and establishes the voice + data channel connection.

SFU session created · ICE negotiation with TURN relay · Worker generates session token
2
Visual screen opens

AI agent sends SMS with secure link. Member taps → mobile web session opens. Durable Object binds it to the same voice session. No re-auth.

Durable Object binds sessions · Data channel sync active · PHI guardrails enforced
3
Member interacts visually

Member taps address confirmation on screen. Event fires over WebRTC data channel to Ushur's AI agent. Agent acknowledges by voice: "Got it, address confirmed."

Data channel message → agent · Durable Object state updated · Voice response via SFU
4
Document upload

Member uploads pay stub photo directly to Cloudflare R2 via a signed URL. Worker validates, Durable Object updates state, agent confirms on voice.

Signed R2 upload URL · PHI encrypted at rest (AES-256) · Audit trail written
5
Resolution & close

Redetermination complete. Agent confirms coverage verbally. Screen shows summary. SFU session ends. Durable Object writes final audit record. Call: 3m 42s, zero agent involvement.

Session ended cleanly · Audit record written (HITRUST) · Egress cost: ~$0.003
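Step 3 of the walkthrough depends on screen events reaching the agent as well-formed data channel messages. A minimal sketch of validating such a message before it updates session state, assuming JSON payloads; the event names, field names, and `parseVisualEvent` are hypothetical and not part of Ushur's or Cloudflare's API.

```typescript
type VisualEvent =
  | { type: "address_confirmed"; sessionId: string }
  | { type: "document_uploaded"; sessionId: string; fileName: string };

// Parse and validate one data channel message; anything unexpected is rejected.
function parseVisualEvent(raw: string): VisualEvent {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    throw new Error("message is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("message must be a JSON object");
  }
  const msg = data as { [key: string]: unknown };

  const sessionId = msg["sessionId"];
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new Error("missing sessionId");
  }

  const eventType = msg["type"];
  if (eventType === "address_confirmed") {
    return { type: "address_confirmed", sessionId: sessionId };
  }
  if (eventType === "document_uploaded") {
    const fileName = msg["fileName"];
    if (typeof fileName !== "string" || fileName.length === 0) {
      throw new Error("missing fileName");
    }
    return { type: "document_uploaded", sessionId: sessionId, fileName: fileName };
  }
  throw new Error("unknown event type");
}

const tapEvent = parseVisualEvent('{"type":"address_confirmed","sessionId":"session-123"}');
```

Rejecting unknown types at the edge keeps free-form payloads from reaching the agent, which is one way a guardrail of this kind could be applied.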
Ushur AI · On Call 🔒 HIPAA
🎙 "Hi! I'm going to help you with your Medicaid redetermination. Can you confirm your member ID?"
Cloudflare Realtime SFU · Session started
MED-2024-88142
Medi-Cal · CalOptima Health
🎙 "Thanks! I see your address is 4821 Oak Street, Santa Ana. Has this changed?"
Durable Object · Voice + visual bound
🎙 "Got it. I see you tapped 'address confirmed.' Now I need your most recent pay stub."
Data channel → agent acknowledged
Voice & screen in sync
🎙 "Please tap below to upload your pay stub. I'll wait while you do that."
R2 signed upload · AES-256 at rest
pay_stub_april.jpg · Uploaded ✓
🎙 "All done! Your coverage is renewed through December 2026. Have a great day!"
Audit record written · Session closed
Coverage Renewed

Through December 31, 2026

3m 42s · 0 agents · HITRUST compliant
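The upload step in the demo accepts JPG, PNG or PDF up to 10MB. A sketch of the check a Worker might run before issuing a signed upload URL, assuming the limit means 10 MiB and that file type is judged by MIME type; `validateUpload` and its constants are illustrative names.

```typescript
// Accepted MIME types for JPG, PNG and PDF uploads.
const ALLOWED_UPLOAD_TYPES: string[] = ["image/jpeg", "image/png", "application/pdf"];
// "Max 10MB" is read here as 10 MiB (assumption).
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; reason: string };

function validateUpload(contentType: string, sizeBytes: number): UploadCheck {
  if (ALLOWED_UPLOAD_TYPES.indexOf(contentType.toLowerCase()) === -1) {
    return { ok: false, reason: "unsupported file type: " + contentType };
  }
  // The negated comparison also rejects NaN.
  if (!(sizeBytes > 0)) {
    return { ok: false, reason: "file is empty" };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds 10 MiB" };
  }
  return { ok: true };
}

const payStubCheck = validateUpload("image/jpeg", 2500000);
```

A declared MIME type can be spoofed, so a production check would likely also inspect the file's leading bytes after upload.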

How Cloudflare fits into Ushur's stack

Cloudflare Realtime sits at the transport layer — it doesn't replace Ushur's AI agent, Studio, or integrations. It replaces self-managed SFU infrastructure and third-party per-minute voice billing.

Member: Phone call + mobile screen
WHIP / WebRTC
Cloudflare Realtime SFU: Voice · Data channel · TURN
WHEP / WebRTC
Ushur AI Agent: Voice synthesis · NLU · Studio
Workers: Auth · PHI guardrails · Orchestration
Durable Objects: Session state · voice + visual sync
R2 Storage: PHI docs · AES-256 · zero egress
Ushur Platform: Studio · Insights · CCaaS integrations (AudioCodes, Twilio)
EHR / Payer Systems: Salesforce · ServiceNow · Claims platforms
No media server to manage. The SFU is serverless — Ushur never provisions, scales, or patches a media server. Cloudflare's network handles all capacity automatically.
PHI never touches Cloudflare's application layer. The SFU forwards encrypted audio bytes. Workers enforce PHI guardrails at the session edge. Cloudflare never inspects media content.
Durable Objects = the sync engine. One object per session holds the state of both the voice call and visual screen. When a member taps "address confirmed," the DO updates and the AI agent knows in <10ms.
Existing integrations unchanged. Ushur's AudioCodes/Twilio telephony stack, Studio, and CCaaS escalation paths continue to work. Cloudflare Realtime replaces the SFU transport, not Ushur's application logic.

Built for the same regulated world Ushur serves

Ushur's customers are IEHP, Cigna, Aflac, CalOptima. They require HITRUST r2, HIPAA BAA, SOC 2 — and so does every vendor in their stack. Cloudflare qualifies.

No PHI in Cloudflare's application layer

The Realtime SFU forwards encrypted SRTP audio packets — it does not decode, store, or analyze media content. Member IDs and diagnoses spoken on the call never leave the encrypted stream. Cloudflare's network is the transport, not the processor.

AES-256 + TLS 1.3 end to end

Voice is encrypted via DTLS/SRTP. Data channel messages (screen sync events) are encrypted via DTLS. Documents uploaded to R2 are AES-256 at rest. TLS 1.3 covers all control plane traffic. Cloudflare operates HIPAA-eligible services with BAA availability.

Audit trail via Durable Objects

Every session state transition — member confirmed address, document uploaded, step completed — is written to a Durable Object with a timestamp and cryptographic session ID. This log satisfies Ushur's auditability requirement and can feed directly into Ushur Insights for regulator-ready reports.
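One way to picture that log is as an append-only list of entries, each carrying a sequence number, the session ID, and a timestamp. The sketch below is a plain in-memory class under that assumption; `AuditLog`, the entry fields, and the transition names are hypothetical, and the session ID is passed in rather than generated here.

```typescript
interface AuditEntry {
  seq: number;        // 1-based position in the log
  sessionId: string;  // supplied by the caller
  transition: string; // e.g. "address_confirmed"
  at: string;         // ISO 8601 timestamp
}

class AuditLog {
  private entries: AuditEntry[] = [];
  private sessionId: string;
  private now: () => Date;

  // The clock is injectable so the log can be tested deterministically.
  constructor(sessionId: string, now?: () => Date) {
    this.sessionId = sessionId;
    this.now = now ? now : function () { return new Date(); };
  }

  // Append one state transition; existing entries are never modified.
  record(transition: string): AuditEntry {
    if (transition.length === 0) {
      throw new Error("transition name is required");
    }
    const entry: AuditEntry = {
      seq: this.entries.length + 1,
      sessionId: this.sessionId,
      transition: transition,
      at: this.now().toISOString(),
    };
    this.entries.push(entry);
    return { seq: entry.seq, sessionId: entry.sessionId, transition: entry.transition, at: entry.at };
  }

  // Return copies so readers cannot alter the stored records.
  list(): AuditEntry[] {
    return this.entries.map(function (e) {
      return { seq: e.seq, sessionId: e.sessionId, transition: e.transition, at: e.at };
    });
  }
}

const demoLog = new AuditLog("session-123");
demoLog.record("address_confirmed");
demoLog.record("document_uploaded");
demoLog.record("step_completed");
```

Returning copies from `record` and `list` keeps the stored entries append-only from the caller's point of view, which is the property an auditor would look for.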

Data residency & 74-language reach

Ushur's Voice-Guided Experience supports 74 languages and auto-detects caller language. Cloudflare's anycast network terminates sessions at the nearest PoP globally, so the member's media enters the network close to where they are rather than crossing continents to reach a media server. R2 location hints support data residency requirements for Irish Life, GDPR-scoped markets, and California-specific CCPA requirements.

HIPAA-eligible · BAA available
SOC 2 Type II
ISO 27001
GDPR compliant
FedRAMP (in process)
TLS 1.3 · DTLS/SRTP

Cloudflare Realtime vs. Twilio / Daily / Agora

Ushur already integrates with Twilio and AudioCodes for telephony. This isn't a rip-and-replace — it's a transport layer decision for the SFU and data channel portion of Voice-Guided Experience.

Feature | Cloudflare Realtime | Twilio / Daily / Agora
Pricing model | $0.05/GB egress · no per-minute charge | $0.013–0.035/min per participant
TURN cost | $0 when used with SFU | Separate billing or self-managed
Media server ops | Zero (fully serverless) | Managed, but capacity planning required
PHI in application layer | No (SFU forwards encrypted SRTP only) | Varies (depends on recording & transcription config)
Voice + data channel on same session | Native (single WebRTC session) | Possible, requires custom implementation
Global anycast PoPs | 330+ cities | ~20–50 regions
Stateful session sync | Durable Objects (built-in, co-located) | External DB required (Redis, DynamoDB)
Estimated cost · 500K calls/yr · 4 min avg | ~$2,400/yr (egress only) | ~$52,000–$140,000/yr
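The estimated-cost row can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this brief and assumes two billed participants per call, which is what makes the per-minute range come out to ~$52,000–$140,000; the function names are illustrative. For the egress side it works backwards from the ~$2,400/yr figure to the per-call data volume that figure implies.

```typescript
const CALLS_PER_YEAR = 500000;
const MINUTES_PER_CALL = 4;

// Annual cost under per-minute, per-participant billing.
function perMinuteCostPerYear(ratePerMinute: number, participants: number): number {
  return CALLS_PER_YEAR * MINUTES_PER_CALL * ratePerMinute * participants;
}

// Egress volume per call (in GB) implied by an annual egress bill.
function impliedEgressGbPerCall(annualEgressCost: number, ratePerGb: number): number {
  return annualEgressCost / ratePerGb / CALLS_PER_YEAR;
}

// Two participants per call is an assumption, not a figure from the table.
const perMinuteLow = perMinuteCostPerYear(0.013, 2);
const perMinuteHigh = perMinuteCostPerYear(0.035, 2);
const impliedGbPerCall = impliedEgressGbPerCall(2400, 0.05);

console.log(Math.round(perMinuteLow), Math.round(perMinuteHigh), impliedGbPerCall.toFixed(3));
```

At $0.05/GB, ~$2,400/yr corresponds to about 0.096 GB of egress per call, so measured per-call egress is the number a POC would need to confirm. The walkthrough's ~$0.003 per-call figure implies a somewhat lower volume, so both egress numbers are best read as rough estimates.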
For Ushur Engineering & Product

Three ways to move forward

Cloudflare Realtime is available today. The integration path with Ushur's existing stack is well-defined and can be scoped in a single technical call.

01

Technical POC

Run a 2-week POC using Cloudflare's realtime-examples (WHIP/WHEP server) against a test Ushur Voice-Guided session. Benchmark latency vs. current SFU, confirm data channel sync behavior. Cloudflare's SE team joins to help scope.

Start free →
03

Custom Pricing

At IEHP/Cigna scale (500K+ calls/year), Cloudflare offers committed-use pricing that further reduces the per-GB cost. Enterprise agreements include HIPAA BAA, dedicated support, and custom data residency configurations for international Ushur deployments (Irish Life, GDPR markets).

Get pricing →