Live Transcription Features
Advanced features for real-time transcription sessions.
Session Configuration
Multi-Channel Support
Set the number of audio channels when you start a session:
curl -X POST "https://api.example.com/api/v1/live-transcribe/start" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "Name": "Two-party call",
    "NumberOfChannels": 2,
    "Language": "Auto",
    "Username": "agent_001"
  }'
Channel Configurations:
- 1 Channel (Mono): Single speaker or mixed audio
- 2 Channels (Stereo): Separate agent and customer
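As a minimal sketch, the same start request can be assembled in Python with only the standard library. The URL and field names come from the curl example; the helper name `build_start_request` is hypothetical, the `Content-Type` header is an assumption about what the API expects, and the 1-or-2 channel check simply mirrors the two configurations listed above.

```python
import json
import urllib.request

START_URL = "https://api.example.com/api/v1/live-transcribe/start"

def build_start_request(token, name, channels, language="Auto", username=None):
    """Assemble (but do not send) a POST request that starts a session."""
    if channels not in (1, 2):
        raise ValueError("NumberOfChannels must be 1 (mono) or 2 (stereo)")
    body = {"Name": name, "NumberOfChannels": channels, "Language": language}
    if username is not None:
        body["Username"] = username
    return urllib.request.Request(
        START_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not confirmed by this page
        },
        method="POST",
    )

request = build_start_request("YOUR_TOKEN", "Two-party call", 2, username="agent_001")
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.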
Network Information
Track call endpoints with Local and Remote parameters:
{
  "Name": "Call from NYC to LA",
  "Local": "192.168.1.100",
  "Remote": "203.0.113.45",
  "NumberOfChannels": 2,
  "Language": "ENGLISH",
  "Username": "agent_john"
}
Real-Time Transcription Payloads
Payload Structure
{
  "audio": "SUQzBAAAAAAAI1NTVUUA...",
  "duration": 5.5,
  "sampleRate": 16000,
  "frequency": 440
}
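A payload of this shape could be unpacked roughly as follows. This sketch assumes the `audio` field is Base64-encoded and that `duration` is in seconds; neither is stated explicitly here. Because the example string above is truncated, the sketch uses a short stand-in value, and `decode_payload` is a hypothetical helper name.

```python
import base64

def decode_payload(payload):
    """Return the raw audio bytes and the sample count the segment should span."""
    audio_bytes = base64.b64decode(payload["audio"])  # assumes Base64 encoding
    expected_samples = round(payload["duration"] * payload["sampleRate"])
    return audio_bytes, expected_samples

# Stand-in payload: "SUQz" is a short, valid Base64 string used for illustration.
sample_payload = {"audio": "SUQz", "duration": 5.5, "sampleRate": 16000, "frequency": 440}
audio_bytes, expected_samples = decode_payload(sample_payload)
```

Comparing `expected_samples` with the decoded audio length is one way to spot truncated segments, provided the audio encoding is known.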
Streaming Characteristics
- Latency: Typically 500ms - 2 seconds
- Chunk Size: 100-500ms audio recommended
- Update Frequency: 500ms to 2 seconds between polls
Language Support
All 100+ supported languages are available for live transcription. Set the Language field to a language name, or to "Auto" for automatic detection:
# Explicit language selection
{
  "Language": "SPANISH"
}
# Automatic detection
{
  "Language": "Auto"
}
Audio Quality Monitoring
Track audio quality using the metrics included in each payload:
- Sample Rate: Audio sampling rate (typically 16000 Hz)
- Frequency: Detected frequency information
- Duration: Length of current segment
Session Metadata
Tracking Information
Store session details for audit and analytics:
{
  "Name": "Premium Customer - High Priority",
  "Username": "agent_johndoe",
  "Local": "10.0.0.5",
  "Remote": "203.0.113.100"
}
Session ID Usage
Use the session ID for:
- Getting payload/transcription data
- Stopping the session
- Correlating with call records
- Post-processing and analysis
Connection Management
WebSocket Lifecycle
- Connect: wss://api.example.com/ws/live-transcribe/{sessionId}
- Receive Messages: Real-time transcription updates
- Disconnect: Session stops automatically
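The connection URL is the only part of the lifecycle that can be shown without a WebSocket client library, so this sketch is limited to building it. `websocket_url` is a hypothetical helper; percent-encoding the session ID is a defensive choice of this sketch, not a documented requirement.

```python
from urllib.parse import quote

WS_BASE = "wss://api.example.com/ws/live-transcribe"

def websocket_url(session_id):
    """Build the per-session WebSocket URL, percent-encoding the ID."""
    return f"{WS_BASE}/{quote(str(session_id), safe='')}"

url = websocket_url(987654321)
```

The resulting URL can be passed to whichever WebSocket client your application already uses.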
Connection Persistence
- Automatic reconnection support
- Heartbeat mechanism for idle connections
- Graceful timeout handling
Error Handling
Common Error Responses
{
  "error": "Session not found",
  "code": "SESSION_NOT_FOUND"
}
Error Codes
- INVALID_TOKEN: Authentication failed
- SESSION_NOT_FOUND: Session ID doesn't exist
- SESSION_EXPIRED: Session has ended
- CHANNEL_ERROR: Audio channel issue
- LANGUAGE_NOT_SUPPORTED: Unsupported language
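One way to act on these codes is to map each one to a client-side action. The codes are the ones listed above; the split into retryable and fatal codes is an assumption of this sketch, not documented behaviour, and `classify_error` is a hypothetical helper.

```python
import json

# Which codes are worth retrying is an assumption made for illustration.
RETRYABLE = {"CHANNEL_ERROR"}
FATAL = {"INVALID_TOKEN", "SESSION_NOT_FOUND", "SESSION_EXPIRED", "LANGUAGE_NOT_SUPPORTED"}

def classify_error(response_text):
    """Return (code, action), where action is 'retry', 'abort', or 'unknown'."""
    body = json.loads(response_text)
    code = body.get("code")
    if code in RETRYABLE:
        return code, "retry"
    if code in FATAL:
        return code, "abort"
    return code, "unknown"

code, action = classify_error('{"error": "Session not found", "code": "SESSION_NOT_FOUND"}')
```

Returning "unknown" for unlisted codes keeps the client from silently retrying errors it does not understand.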
Performance Optimization
Polling Strategy
Recommended intervals:
- Every 500ms: For real-time UI updates
- Every 1s: For normal monitoring
- Every 2s: For batch processing
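The intervals above could drive a simple polling loop like the one sketched here. The mode names and the `poll` helper are hypothetical, and `fetch_payload` stands for whatever function your client uses to retrieve the latest payload.

```python
import time

# Intervals from the list above, in seconds. The mode names are hypothetical.
POLL_INTERVALS = {"realtime_ui": 0.5, "monitoring": 1.0, "batch": 2.0}

def poll(fetch_payload, mode, max_polls, sleep=time.sleep):
    """Call fetch_payload() up to max_polls times, pausing between calls."""
    interval = POLL_INTERVALS[mode]
    results = []
    for i in range(max_polls):
        results.append(fetch_payload())
        if i < max_polls - 1:  # no need to wait after the final poll
            sleep(interval)
    return results
```

Passing `sleep` as a parameter makes the loop easy to test without real delays.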
Chunk Management
Audio Chunk Size: 160 samples = 10ms @ 16kHz
Send Rate: 100-500ms chunks (1,600-8,000 samples @ 16kHz, i.e. 10-50 frames of 10ms)
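The arithmetic behind these figures can be checked with a short sketch: at 16 kHz, 10 ms is 160 samples, so 100 to 500 ms corresponds to 1,600 to 8,000 samples per channel. The splitting helper assumes raw 16-bit PCM, which is not stated here; both function names are hypothetical.

```python
def samples_for(duration_ms, sample_rate=16000):
    """Number of samples per channel in a chunk of the given duration."""
    return sample_rate * duration_ms // 1000

def split_pcm(pcm, chunk_ms, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Split raw PCM bytes into chunks of chunk_ms; the last chunk may be shorter."""
    step = samples_for(chunk_ms, sample_rate) * bytes_per_sample * channels
    if step <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

one_second = bytes(16000 * 2)        # 1 s of 16-bit mono silence
chunks = split_pcm(one_second, 100)  # 100 ms per chunk
```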
Bandwidth Requirements
- Mono (1 channel): ~256 kbps
- Stereo (2 channels): ~512 kbps
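These figures match uncompressed 16-bit PCM at 16 kHz. That encoding is an inference from the numbers rather than something stated here, so treat the defaults in this sketch as assumptions.

```python
def raw_bitrate_kbps(sample_rate=16000, bits_per_sample=16, channels=1):
    """Uncompressed PCM bitrate in kbps (1 kbps = 1,000 bits per second)."""
    return sample_rate * bits_per_sample * channels / 1000

mono_kbps = raw_bitrate_kbps(channels=1)
stereo_kbps = raw_bitrate_kbps(channels=2)
```

Protocol overhead and any Base64 encoding of the audio would add to these raw figures.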
Security Features
Token-Based Authentication
- Bearer token required for all operations
- Token expiration enforced
- Per-session authorization
Data Privacy
- Audio streams encrypted in transit (WSS)
- Session-isolated data
- No data retention after session ends
Rate Limiting
- Per-user session limits
- Request throttling to prevent abuse
- Queue management for scalability
Advanced Session Features
Metadata Tracking
Associate additional data with sessions:
// Custom metadata (if supported)
const metadata = {
  campaignId: "summer_2025",
  agentId: "A123",
  priority: "high",
  customerId: "C456"
};
Call Correlation
Link live transcription to call records:
{
  "sessionId": 987654321,
  "callId": "CALL-2025-11-28-001",
  "startTime": "2025-11-28T10:00:00Z",
  "duration": 300
}
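When matching a session against call records, it can help to derive the call's end time from `startTime` and `duration`. This sketch assumes `duration` is in seconds, which the record does not state; `call_end_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def call_end_time(record):
    """Return the call's end time, assuming duration is in seconds."""
    # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
    start = datetime.fromisoformat(record["startTime"].replace("Z", "+00:00"))
    return start + timedelta(seconds=record["duration"])

record = {
    "sessionId": 987654321,
    "callId": "CALL-2025-11-28-001",
    "startTime": "2025-11-28T10:00:00Z",
    "duration": 300,
}
end = call_end_time(record)
```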
Monitoring and Alerting
Session Health Metrics
{
  "sessionId": 987654321,
  "status": "active",
  "uptime": 120,
  "payloadsReceived": 45,
  "averageLatency": 750,
  "errors": 0
}
Alert Conditions
- Session disconnection
- Audio quality degradation
- Language detection failure
- High latency (>5s)
- Channel errors
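Some of these conditions can be evaluated directly against a health-metrics record. This sketch assumes `averageLatency` is reported in milliseconds and that any status other than "active" indicates a disconnection; the alert names are hypothetical. Audio quality and language detection checks are omitted because no corresponding fields are shown.

```python
HIGH_LATENCY_MS = 5000  # the ">5s" threshold from the list above

def check_alerts(metrics):
    """Return the names of the alerts triggered by a health-metrics dict."""
    alerts = []
    if metrics.get("status") != "active":
        alerts.append("session_disconnected")
    if metrics.get("averageLatency", 0) > HIGH_LATENCY_MS:
        alerts.append("high_latency")
    if metrics.get("errors", 0) > 0:
        alerts.append("errors_reported")
    return alerts

healthy = {"sessionId": 987654321, "status": "active", "uptime": 120,
           "payloadsReceived": 45, "averageLatency": 750, "errors": 0}
alerts = check_alerts(healthy)
```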
Scalability
Concurrent Sessions
Tier Limits:
- Starter: 1 concurrent session
- Professional: 5 concurrent sessions
- Enterprise: Unlimited
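A client can check these limits before requesting a new session. This is only a local pre-check under the limits listed above; how the service itself reports an exceeded limit is not described here, and `can_start_session` is a hypothetical helper.

```python
# Limits from the tier list above; None stands for "unlimited".
TIER_LIMITS = {"Starter": 1, "Professional": 5, "Enterprise": None}

def can_start_session(tier, active_sessions):
    """True if one more concurrent session fits within the tier's limit."""
    limit = TIER_LIMITS[tier]
    return limit is None or active_sessions < limit
```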
Session Management
# Managing multiple sessions
# start_transcription() and get_payload() are placeholders for your API client calls.
from datetime import datetime
sessions = {}
def create_session(name):
    # Start a new session and record it under its name
    session_id = start_transcription(name)
    sessions[name] = {
        'id': session_id,
        'started_at': datetime.now(),
        'status': 'active'
    }
    return session_id
def get_all_payloads():
    # Fetch the latest payload for every active session
    payloads = {}
    for name, session in sessions.items():
        if session['status'] == 'active':
            payloads[name] = get_payload(session['id'])
    return payloads
Integration Patterns
Call Center Integration
Phone System -> API Client -> SpeechLytics -> Results Dashboard
Multi-Channel Broadcasting
Live Call -> [Channel 1: Agent]
          -> [Channel 2: Customer]
          -> SpeechLytics -> Separate Transcription
Real-Time Dashboard
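// Real-time transcription display
// getPayload() and updateUI() are not defined here; supply them in a subclass.
class DashboardUpdater {
  constructor(sessionId) {
    this.sessionId = sessionId;
    this.pollInterval = 1000; // 1 second
    this.updateInterval = null;
  }
  startUpdating() {
    if (this.updateInterval !== null) return; // already polling
    this.updateInterval = setInterval(() => {
      // Log failures so a rejected fetch doesn't become an unhandled rejection
      this.fetchAndDisplay().catch((err) => console.error(err));
    }, this.pollInterval);
  }
  async fetchAndDisplay() {
    const payload = await this.getPayload();
    this.updateUI(payload);
  }
  stopUpdating() {
    clearInterval(this.updateInterval);
    this.updateInterval = null;
  }
}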
Best Practices
1. Resource Management
- Always stop sessions explicitly
- Clean up connections gracefully
- Monitor memory usage for long sessions
- Implement session pooling for scalability
2. Audio Handling
- Validate audio quality before streaming
- Implement jitter buffer for stability
- Use compression for bandwidth efficiency
- Monitor sample rates for consistency
3. Error Recovery
- Implement exponential backoff
- Provide fallback mechanisms
- Log all errors comprehensively
- Set up alerting for critical issues
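Exponential backoff, the first item above, can be sketched as follows. The base delay, the cap, and the jitter range are illustrative values rather than recommendations from the service; `backoff_delays` and `retry` are hypothetical helper names.

```python
import random
import time

def backoff_delays(retries, base=0.5, cap=30.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]

def retry(operation, retries=5, sleep=time.sleep, jitter=random.random):
    """Run operation(); on failure wait with backoff and jitter, then try again.

    Makes one initial attempt plus up to `retries` further attempts, and
    re-raises the last exception if every attempt fails.
    """
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:  # narrow this to transient errors in real code
            if attempt == retries:
                raise
            # Wait between 50% and 100% of the scheduled delay.
            sleep(delays[attempt] * (0.5 + jitter() / 2))
```

The jitter spreads out reconnection attempts so that many clients recovering from the same outage do not retry in lockstep.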
4. Performance Tuning
- Adjust polling intervals based on latency
- Batch operations when possible
- Use connection pooling
- Monitor and optimize chunk sizes
Troubleshooting
Session Won't Start
- Verify authentication token is valid
- Check account has live transcription enabled
- Ensure language parameter is valid
- Verify network connectivity
No Audio Received
- Confirm audio is being streamed
- Check sample rate compatibility
- Verify channel count matches configuration
- Monitor network latency
Transcription Delays
- Reduce polling interval
- Check network bandwidth
- Verify audio quality
- Monitor server load
Next Steps
- Getting Started - Basic setup
- Audio Intelligence - Extract insights from live sessions
- API Reference - Complete API documentation