
Live Transcription Features

Advanced features for real-time transcription sessions.

Session Configuration

Multi-Channel Support

Set NumberOfChannels when starting a session to match the channel layout of the audio:

curl -X POST "https://api.example.com/api/v1/live-transcribe/start" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "Name": "Two-party call",
    "NumberOfChannels": 2,
    "Language": "Auto",
    "Username": "agent_001"
  }'

Channel Configurations:

  • 1 Channel (Mono): Single speaker or mixed audio
  • 2 Channels (Stereo): Separate agent and customer
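
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not a reference client: the JSON Content-Type header, the rejection of channel counts other than 1 or 2, and the helper name build_start_request are assumptions, and the response body is not documented here, so the example builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.example.com/api/v1"  # placeholder host from the curl example


def build_start_request(token, name, channels=1, language="Auto", username=None):
    """Build (but do not send) the POST request that starts a session."""
    # Assumption: only the two documented channel layouts are accepted.
    if channels not in (1, 2):
        raise ValueError("NumberOfChannels must be 1 (mono) or 2 (stereo)")
    body = {"Name": name, "NumberOfChannels": channels, "Language": language}
    if username is not None:
        body["Username"] = username
    return urllib.request.Request(
        f"{API_BASE}/live-transcribe/start",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: the API expects JSON
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_start_request(...)) as resp:
#       reply = json.load(resp)  # response fields are not documented here
```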

Network Information

Track call endpoints with Local and Remote parameters:

{
  "Name": "Call from NYC to LA",
  "Local": "192.168.1.100",
  "Remote": "203.0.113.45",
  "NumberOfChannels": 2,
  "Language": "ENGLISH",
  "Username": "agent_john"
}
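
Checking the endpoint values before sending them may catch typos early. The sketch below assumes Local and Remote are always IP addresses, as in the example above; if the API also accepts hostnames, this check would be too strict. The helper name network_fields is hypothetical.

```python
import ipaddress


def network_fields(local, remote):
    """Return the Local/Remote fields after checking both parse as IP addresses."""
    # ipaddress.ip_address raises ValueError on malformed input.
    return {
        "Local": str(ipaddress.ip_address(local)),
        "Remote": str(ipaddress.ip_address(remote)),
    }
```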

Real-Time Transcription Payloads

Payload Structure

{
  "audio": "SUQzBAAAAAAAI1NTVUUA...",
  "duration": 5.5,
  "sampleRate": 16000,
  "frequency": 440
}
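
A payload like this can be turned into a typed object on the client. The sketch below assumes the audio field is Base64-encoded and that duration is in seconds; neither is stated explicitly above. AudioPayload and parse_payload are hypothetical names.

```python
import base64
from dataclasses import dataclass


@dataclass
class AudioPayload:
    audio: bytes       # decoded audio bytes
    duration: float    # seconds (assumed unit)
    sample_rate: int   # Hz
    frequency: float   # as reported by the API


def parse_payload(raw):
    """Convert a payload dict into an AudioPayload, decoding the audio field."""
    return AudioPayload(
        audio=base64.b64decode(raw["audio"]),  # assumption: Base64-encoded
        duration=float(raw["duration"]),
        sample_rate=int(raw["sampleRate"]),
        frequency=float(raw["frequency"]),
    )
```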

Streaming Characteristics

  • Latency: Typically 500ms - 2 seconds
  • Chunk Size: 100-500ms audio recommended
  • Update Frequency: 500ms to 2 seconds between polls

Language Support

All 100+ supported languages are available for live transcription. Set Language to a specific language name, or to "Auto" for automatic detection:

# Explicit language selection
{
  "Language": "SPANISH"
}

# Automatic detection
{
  "Language": "Auto"
}

Audio Quality Monitoring

Track audio quality using these payload fields:

  • Sample Rate: Audio sampling rate (typically 16000 Hz)
  • Frequency: Detected frequency information
  • Duration: Length of current segment

Session Metadata

Tracking Information

Store session details for audit and analytics:

{
  "Name": "Premium Customer - High Priority",
  "Username": "agent_johndoe",
  "Local": "10.0.0.5",
  "Remote": "203.0.113.100"
}

Session ID Usage

Use the session ID for:

  • Getting payload/transcription data
  • Stopping the session
  • Correlating with call records
  • Post-processing and analysis

Connection Management

WebSocket Lifecycle

  1. Connect: wss://api.example.com/ws/live-transcribe/{sessionId}
  2. Receive Messages: Real-time transcription updates
  3. Disconnect: Session stops automatically

Connection Persistence

  • Automatic reconnection support
  • Heartbeat mechanism for idle connections
  • Graceful timeout handling

Error Handling

Common Error Responses

{
  "error": "Session not found",
  "code": "SESSION_NOT_FOUND"
}

Error Codes

  • INVALID_TOKEN: Authentication failed
  • SESSION_NOT_FOUND: Session ID doesn't exist
  • SESSION_EXPIRED: Session has ended
  • CHANNEL_ERROR: Audio channel issue
  • LANGUAGE_NOT_SUPPORTED: Unsupported language
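
On the client, the error shape above can be mapped to an exception so callers can branch on the code. This is a sketch under assumptions: the class and function names are hypothetical, and treating only CHANNEL_ERROR as retryable is a client-side judgment call rather than documented behavior.

```python
class TranscriptionError(Exception):
    """Error returned by the API, carrying the documented error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


# Assumption: which codes are worth retrying is decided by the client,
# not documented by the API.
RETRYABLE_CODES = {"CHANNEL_ERROR"}


def raise_for_error(response):
    """Raise TranscriptionError if the dict has the documented error shape."""
    if "error" in response and "code" in response:
        raise TranscriptionError(response["code"], response["error"])
    return response


def is_retryable(exc):
    """True if the error code is one the client chooses to retry."""
    return exc.code in RETRYABLE_CODES
```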

Performance Optimization

Polling Strategy

Recommended intervals:

  • Every 500ms: For real-time UI updates
  • Every 1s: For normal monitoring
  • Every 2s: For batch processing
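
A polling loop with a configurable interval might look like the sketch below. The fetch argument is a caller-supplied function (hypothetical) that returns the latest payload for a session, or None when nothing new is available; the sleep argument is injectable so the loop can be tested without waiting.

```python
import time


def poll(fetch, interval_s=1.0, max_polls=None, sleep=time.sleep):
    """Yield non-None results of fetch(), polling every interval_s seconds."""
    count = 0
    while max_polls is None or count < max_polls:
        started = time.monotonic()
        result = fetch()
        if result is not None:
            yield result
        count += 1
        if count == max_polls:
            break  # no need to wait after the final poll
        # Subtract time already spent fetching and handling the result
        # so that polls stay roughly evenly spaced.
        remaining = interval_s - (time.monotonic() - started)
        if remaining > 0:
            sleep(remaining)
```

For a real-time UI, pass interval_s=0.5; for batch processing, interval_s=2.0.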

Chunk Management

Audio Chunk Size: 160 samples = 10ms @ 16kHz
Send Rate: 100-500ms chunks (1,600-8,000 samples @ 16kHz, i.e. 10-50 frames of 10ms)
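
The sample arithmetic can be checked with a short sketch, which also splits a raw buffer into send-sized chunks. It assumes 16-bit (2-byte) PCM samples, which is not stated above; the function names are hypothetical.

```python
def samples_per_chunk(chunk_ms, sample_rate=16000):
    """Number of samples per channel in a chunk of chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


def split_pcm(pcm, chunk_ms=200, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Split raw PCM bytes into chunk_ms-sized pieces; the last may be shorter."""
    # Assumption: 16-bit samples, hence bytes_per_sample=2 by default.
    size = samples_per_chunk(chunk_ms, sample_rate) * channels * bytes_per_sample
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

At 16 kHz, samples_per_chunk(10) gives 160, samples_per_chunk(100) gives 1600, and samples_per_chunk(500) gives 8000.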

Bandwidth Requirements

  • Mono (1 channel): ~256 kbps
  • Stereo (2 channels): ~512 kbps
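
These figures match uncompressed 16-bit PCM at 16 kHz (16,000 samples/s × 16 bits = 256 kbps per channel). That encoding is an assumption, as is the Base64 transport overhead shown in the second function; the sketch only reproduces the arithmetic.

```python
def pcm_bitrate_kbps(sample_rate=16000, bits_per_sample=16, channels=1):
    """Raw bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bits_per_sample * channels / 1000


def base64_bitrate_kbps(raw_kbps):
    """Bitrate after Base64 encoding, which emits 4 bytes for every 3 input bytes."""
    return raw_kbps * 4 / 3
```

If audio is sent Base64-encoded inside JSON, plan for roughly a third more bandwidth than the raw figures.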

Security Features

Token-Based Authentication

  • Bearer token required for all operations
  • Token expiration enforced
  • Per-session authorization

Data Privacy

  • Audio streams encrypted in transit (WSS)
  • Session-isolated data
  • No data retention after session ends

Rate Limiting

  • Per-user session limits
  • Request throttling to prevent abuse
  • Queue management for scalability

Advanced Session Features

Metadata Tracking

Associate additional data with sessions:

// Custom metadata (if supported)
const metadata = {
  campaignId: "summer_2025",
  agentId: "A123",
  priority: "high",
  customerId: "C456"
};

Call Correlation

Link live transcription to call records:

{
  "sessionId": 987654321,
  "callId": "CALL-2025-11-28-001",
  "startTime": "2025-11-28T10:00:00Z",
  "duration": 300
}

Monitoring and Alerting

Session Health Metrics

{
  "sessionId": 987654321,
  "status": "active",
  "uptime": 120,
  "payloadsReceived": 45,
  "averageLatency": 750,
  "errors": 0
}

Alert Conditions

  • Session disconnection
  • Audio quality degradation
  • Language detection failure
  • High latency (>5s)
  • Channel errors
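
Some of these conditions can be evaluated directly from a health metrics object. The sketch assumes averageLatency is reported in milliseconds (consistent with the 750 example and the 5s threshold) and uses hypothetical alert names; audio quality and language detection would need data that the metrics object does not show.

```python
def evaluate_alerts(metrics, max_latency_ms=5000):
    """Return the alert names triggered by a session health metrics dict."""
    alerts = []
    if metrics.get("status") != "active":
        alerts.append("session_disconnected")
    # Assumption: averageLatency is in milliseconds.
    if metrics.get("averageLatency", 0) > max_latency_ms:
        alerts.append("high_latency")
    if metrics.get("errors", 0) > 0:
        alerts.append("errors_reported")
    return alerts
```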

Scalability

Concurrent Sessions

Tier Limits:

  • Starter: 1 concurrent session
  • Professional: 5 concurrent sessions
  • Enterprise: Unlimited
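
A client-side guard can avoid requesting more sessions than the tier allows. This is a sketch: the server presumably enforces the limits itself, and SessionLimiter is a hypothetical helper, not part of the API.

```python
TIER_LIMITS = {"Starter": 1, "Professional": 5, "Enterprise": None}  # None = unlimited


class SessionLimiter:
    """Client-side guard that refuses to open more sessions than the tier allows."""

    def __init__(self, tier):
        self.limit = TIER_LIMITS[tier]
        self.active = set()

    def acquire(self, session_id):
        """Register a session; return False if the tier limit is already reached."""
        if self.limit is not None and len(self.active) >= self.limit:
            return False
        self.active.add(session_id)
        return True

    def release(self, session_id):
        """Forget a session once it has been stopped."""
        self.active.discard(session_id)
```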

Session Management

# Managing multiple sessions
from datetime import datetime

# start_transcription() and get_payload() are placeholders for your own
# API client calls; they are not defined here.
sessions = {}

def create_session(name):
    """Start a new session and record it under the given name."""
    session_id = start_transcription(name)
    sessions[name] = {
        'id': session_id,
        'started_at': datetime.now(),
        'status': 'active',
    }
    return session_id

def get_all_payloads():
    """Fetch the latest payload for every active session."""
    payloads = {}
    for name, session in sessions.items():
        if session['status'] == 'active':
            payloads[name] = get_payload(session['id'])
    return payloads

Integration Patterns

Call Center Integration

Phone System -> API Client -> SpeechLytics -> Results Dashboard

Multi-Channel Broadcasting

Live Call -> [Channel 1: Agent]
          -> [Channel 2: Customer]
          -> SpeechLytics -> Separate Transcription

Real-Time Dashboard

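// Real-time transcription display
// getPayload() and updateUI() are placeholders to implement for your client and UI.
class DashboardUpdater {
  constructor(sessionId) {
    this.sessionId = sessionId;
    this.pollInterval = 1000; // 1 second
    this.updateInterval = null;
  }

  startUpdating() {
    if (this.updateInterval !== null) return; // already running
    this.updateInterval = setInterval(() => {
      this.fetchAndDisplay();
    }, this.pollInterval);
  }

  async fetchAndDisplay() {
    try {
      const payload = await this.getPayload();
      this.updateUI(payload);
    } catch (err) {
      // Keep polling after a failed fetch instead of leaving the rejection unhandled
      console.error(`Polling failed for session ${this.sessionId}:`, err);
    }
  }

  stopUpdating() {
    clearInterval(this.updateInterval);
    this.updateInterval = null;
  }
}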

Best Practices

1. Resource Management

  • Always stop sessions explicitly
  • Clean up connections gracefully
  • Monitor memory usage for long sessions
  • Implement session pooling for scalability
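
Explicit stopping is easier to guarantee when the session is wrapped in a context manager. In this sketch, start and stop are caller-supplied functions (hypothetical) that wrap the start and stop API calls; live_session is not part of any SDK.

```python
from contextlib import contextmanager


@contextmanager
def live_session(start, stop, *args, **kwargs):
    """Start a session and guarantee stop() runs, even if the body raises."""
    session_id = start(*args, **kwargs)
    try:
        yield session_id
    finally:
        stop(session_id)
```

Usage would look like `with live_session(start_fn, stop_fn, "Two-party call") as session_id: ...`, with the stop call issued on normal exit and on exceptions alike.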

2. Audio Handling

  • Validate audio quality before streaming
  • Implement jitter buffer for stability
  • Use compression for bandwidth efficiency
  • Monitor sample rates for consistency

3. Error Recovery

  • Implement exponential backoff
  • Provide fallback mechanisms
  • Log all errors comprehensively
  • Set up alerting for critical issues
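
Exponential backoff can be sketched as a delay generator plus a retry wrapper. The base delay, growth factor, cap, and jitter range are illustrative defaults, not values recommended by the API; sleep and jitter are injectable so the behavior can be tested without waiting.

```python
import random
import time


def backoff_delays(base_s=0.5, factor=2.0, max_s=30.0, attempts=5):
    """Yield capped exponential delays: base, base*factor, ... up to max_s."""
    delay = base_s
    for _ in range(attempts):
        yield min(delay, max_s)
        delay *= factor


def retry(operation, attempts=5, sleep=time.sleep, jitter=random.random):
    """Call operation() until it succeeds or attempts are exhausted."""
    last_error = None
    for delay in backoff_delays(attempts=attempts):
        try:
            return operation()
        except Exception as exc:  # narrow this to transient errors in real code
            last_error = exc
            # Scale the delay to 50-100% of its value so clients do not retry in lockstep.
            sleep(delay * (0.5 + jitter() / 2))
    raise last_error
```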

4. Performance Tuning

  • Adjust polling intervals based on latency
  • Batch operations when possible
  • Use connection pooling
  • Monitor and optimize chunk sizes

Troubleshooting

Session Won't Start

  • Verify authentication token is valid
  • Check account has live transcription enabled
  • Ensure language parameter is valid
  • Verify network connectivity

No Audio Received

  • Confirm audio is being streamed
  • Check sample rate compatibility
  • Verify channel count matches configuration
  • Monitor network latency

Transcription Delays

  • Reduce polling interval
  • Check network bandwidth
  • Verify audio quality
  • Monitor server load
