Getting Started with Live Transcription
Enable real-time speech-to-text transcription for live calls and audio streams using the SpeechLytics Live Transcription API. This guide covers the complete workflow for starting, monitoring, and stopping live transcription sessions.
Prerequisites
- Valid SpeechLytics account credentials
- Bearer authentication token
- Audio stream source (phone call, VoIP, microphone, etc.)
- WebSocket or HTTP polling support
Architecture Overview
The Live Transcription API uses a session-based model:
- Start Session: Create a new live transcription session
- Stream Audio: Send audio data in real-time
- Get Payload: Retrieve transcription results continuously
- Stop Session: End the transcription session
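The sketch below walks through this lifecycle end to end in Python. It uses the endpoint paths documented in the steps that follow and the `requests` library; treat it as an orientation map rather than a production client.

# Lifecycle sketch (illustrative): authenticate, start, poll, stop.
# Error handling is omitted for brevity; endpoint paths match the steps below.
import time
import requests

API_BASE = "https://api.example.com"

token = requests.post(
    f"{API_BASE}/api/v1/auth/token",
    json={"Username": "your_username", "Password": "your_password"},
).json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Start Session
session = requests.post(
    f"{API_BASE}/api/v1/live-transcribe/start",
    headers=headers,
    json={"Name": "Demo", "NumberOfChannels": 1,
          "Language": "Auto", "Username": "agent_001"},
).json()

# Stream Audio happens over session["url"] (see Step 3); meanwhile,
# Get Payload: poll for results while the session runs
for _ in range(10):
    payload = requests.get(
        f"{API_BASE}/api/v1/live-transcribe/{session['id']}/payload",
        headers=headers,
    ).json()
    time.sleep(1)

# Stop Session
requests.post(f"{API_BASE}/api/v1/live-transcribe/stop",
              headers=headers, json={"Id": session["id"]})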
Step 1: Get Authentication Token
Obtain a Bearer token using your credentials:
curl -X POST "https://api.example.com/api/v1/auth/token" \
  -H "Content-Type: application/json" \
  -d '{
    "Username": "your_username",
    "Password": "your_password"
  }'
Response:
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires": "2025-11-28T23:59:59Z",
  "eventId": "evt_12345"
}
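The token carries an expiry timestamp, so long-running integrations should refresh it before it lapses. A minimal sketch, assuming the response shape above with an ISO 8601 `expires` field:

import requests
from datetime import datetime, timezone

def get_token(api_base, username, password):
    """Fetch a Bearer token and its expiry time."""
    resp = requests.post(f"{api_base}/api/v1/auth/token",
                         json={"Username": username, "Password": password})
    resp.raise_for_status()
    data = resp.json()
    # Parse the ISO 8601 expiry so callers can refresh proactively
    expires = datetime.fromisoformat(data["expires"].replace("Z", "+00:00"))
    return data["token"], expires

def token_is_stale(expires, margin_seconds=60):
    """True when the token is within margin_seconds of expiring."""
    return (expires - datetime.now(timezone.utc)).total_seconds() < margin_seconds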
Step 2: Start a Live Transcription Session
Initiate a new live transcription session:
curl -X POST "https://api.example.com/api/v1/live-transcribe/start" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Name": "Customer Support Call - John Doe",
    "NumberOfChannels": 2,
    "Language": "Auto",
    "Username": "agent_123",
    "Local": "192.168.1.100",
    "Remote": "192.168.1.50"
  }'
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| Name | string | Yes | Descriptive name for the session |
| NumberOfChannels | integer | Yes | 1 for mono, 2 for stereo |
| Language | enum | Yes | Language code or Auto for detection |
| Username | string | Yes | Identifier for the agent/user |
| Local | string | No | IP address of the local call endpoint |
| Remote | string | No | IP address of the remote call endpoint |
Response
{
  "id": 987654321,
  "url": "wss://api.example.com/ws/live-transcribe/987654321"
}
Response Fields
| Field | Description |
|---|---|
| id | Unique session ID for tracking |
| url | WebSocket URL for streaming audio and receiving results |
Step 3: Stream Audio and Receive Results
Option A: WebSocket Connection
Connect to the WebSocket URL provided in the start response:
// JavaScript/Node.js example
const ws = new WebSocket('wss://api.example.com/ws/live-transcribe/987654321');

ws.onopen = () => {
  console.log('Connected to live transcription');
  // Start streaming audio
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Transcription:', result.content);
  console.log('Timestamp:', result.timestamp);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected from live transcription');
};
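If you prefer Python for the streaming side, a comparable listener can be written with the third-party `websockets` package. This is a sketch under the same assumptions as the JavaScript example (JSON messages carrying `content` and `timestamp`), not an official SDK:

# Sketch: receive transcription messages over the session WebSocket.
# Requires: pip install websockets
import asyncio
import json
import websockets

async def listen(ws_url):
    async with websockets.connect(ws_url) as ws:
        print("Connected to live transcription")
        async for message in ws:
            result = json.loads(message)
            # Assumed message shape: {"content": ..., "timestamp": ...}
            print(f"{result.get('timestamp')}: {result.get('content')}")

asyncio.run(listen("wss://api.example.com/ws/live-transcribe/987654321"))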
Option B: HTTP Polling
Alternatively, if WebSockets are unavailable in your environment, poll the payload endpoint for the session's latest payload:
curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"
Response:
{
  "audio": "base64_encoded_audio_data",
  "duration": 5.5,
  "sampleRate": 16000,
  "frequency": 440
}
Step 4: Monitor Transcription Progress
Continuously retrieve the transcription payload:
# Poll every 500ms to 1 second
curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"
Payload Fields
| Field | Description |
|---|---|
| audio | Base64-encoded audio segment |
| duration | Duration of this segment in seconds |
| sampleRate | Audio sample rate in Hz (typically 16000) |
| frequency | Frequency information for the segment, in Hz |
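To inspect or archive a polled segment, decode the base64 audio and wrap it in a WAV header. A minimal sketch, assuming the segment is raw 16-bit PCM at the reported sample rate (adjust if your stream uses μ-law or A-law):

import base64
import wave

def save_segment(payload, path="segment.wav", channels=1):
    """Decode a base64 audio segment and write it as a WAV file.

    Assumes raw 16-bit PCM; adjust for mu-law/A-law streams.
    """
    pcm = base64.b64decode(payload["audio"])
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)                     # 16-bit samples
        wav.setframerate(payload["sampleRate"])
        wav.writeframes(pcm)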
Step 5: Stop the Live Transcription Session
End the transcription session when the call is complete:
curl -X POST "https://api.example.com/api/v1/live-transcribe/stop" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Id": 987654321
  }'
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| Id | integer | The session ID to stop |
Response
{
  "isValid": true
}
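A falsy `isValid` indicates the stop request did not take effect; what exactly that means (unknown ID, already stopped) is not documented here, so treat this check as a hypothetical illustration:

import requests

response = requests.post(
    "https://api.example.com/api/v1/live-transcribe/stop",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"Id": 987654321},
)
# Assumption: a falsy isValid means the session was not found or was already stopped
if not response.json().get("isValid"):
    print("Stop request was not acknowledged; check the session ID")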
Complete Workflow Example
Python (Polling Method)
import requests
import time
from threading import Thread

API_BASE = "https://api.example.com"
USERNAME = "your_username"
PASSWORD = "your_password"

class LiveTranscriber:
    def __init__(self):
        self.session_id = None
        self.token = None
        self.headers = None
        self.is_running = False

    def authenticate(self):
        """Get authentication token"""
        response = requests.post(
            f"{API_BASE}/api/v1/auth/token",
            json={"Username": USERNAME, "Password": PASSWORD}
        )
        response.raise_for_status()
        self.token = response.json()['token']
        self.headers = {"Authorization": f"Bearer {self.token}"}

    def start_session(self, call_name, channels=2, language="Auto"):
        """Start live transcription session"""
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/start",
            headers=self.headers,
            json={
                "Name": call_name,
                "NumberOfChannels": channels,
                "Language": language,
                "Username": "agent_001"
            }
        )
        response.raise_for_status()
        data = response.json()
        self.session_id = data['id']
        self.websocket_url = data['url']
        print(f"Started session: {self.session_id}")
        print(f"WebSocket URL: {self.websocket_url}")

    def get_payload(self):
        """Retrieve current transcription payload"""
        response = requests.get(
            f"{API_BASE}/api/v1/live-transcribe/{self.session_id}/payload",
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def monitor(self):
        """Monitor transcription in a separate thread"""
        self.is_running = True
        while self.is_running:
            try:
                payload = self.get_payload()
                if payload.get('audio'):
                    print(f"Duration: {payload['duration']}s")
                    print(f"Sample Rate: {payload['sampleRate']}")
                time.sleep(1)  # Poll every second
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(1)  # Back off briefly instead of busy-looping

    def stop_session(self):
        """Stop the transcription session"""
        self.is_running = False  # Halt the monitor loop before stopping
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/stop",
            headers=self.headers,
            json={"Id": self.session_id}
        )
        print("Session stopped")
        return response.json()

# Usage
transcriber = LiveTranscriber()
transcriber.authenticate()
transcriber.start_session("Support Call - Customer XYZ")

# Start monitoring in background thread
monitor_thread = Thread(target=transcriber.monitor)
monitor_thread.daemon = True
monitor_thread.start()

# Simulate call duration
time.sleep(30)

# Stop transcription
transcriber.stop_session()
JavaScript (WebSocket Method)
class LiveTranscriber {
  constructor(apiBase, username, password) {
    this.apiBase = apiBase;
    this.username = username;
    this.password = password;
    this.token = null;
    this.sessionId = null;
    this.ws = null;
  }

  async authenticate() {
    const response = await fetch(`${this.apiBase}/api/v1/auth/token`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        Username: this.username,
        Password: this.password
      })
    });
    const data = await response.json();
    this.token = data.token;
  }

  async startSession(callName, channels = 2, language = 'Auto') {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/start`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({
        Name: callName,
        NumberOfChannels: channels,
        Language: language,
        Username: 'agent_001'
      })
    });
    const data = await response.json();
    this.sessionId = data.id;
    this.wsUrl = data.url;
    console.log(`Started session: ${this.sessionId}`);
    this.connectWebSocket();
  }

  connectWebSocket() {
    this.ws = new WebSocket(this.wsUrl);
    this.ws.onopen = () => {
      console.log('Connected to live transcription');
    };
    this.ws.onmessage = (event) => {
      const result = JSON.parse(event.data);
      console.log('Transcription:', result);
      this.onTranscription(result);
    };
    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };
    this.ws.onclose = () => {
      console.log('Disconnected');
    };
  }

  onTranscription(result) {
    // Handle transcription result
    // Override this method to process results
    console.log(`${result.timestamp}: ${result.content}`);
  }

  async stopSession() {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/stop`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({ Id: this.sessionId })
    });
    if (this.ws) this.ws.close();
    return response.json();
  }
}

// Usage (top-level await requires an ES module or an async wrapper)
const transcriber = new LiveTranscriber(
  'https://api.example.com',
  'username',
  'password'
);

await transcriber.authenticate();
await transcriber.startSession('Support Call');

// Stop after 60 seconds
setTimeout(() => transcriber.stopSession(), 60000);
Session Management
Session Limits
- Maximum session duration: 24 hours
- Automatic cleanup after session stop
- Idle timeout: 30 minutes (see the guard sketch below)
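Because idle sessions are reclaimed after 30 minutes and no session may outlive 24 hours, long-lived integrations should track their own timing and restart sessions proactively. A hypothetical guard, assuming you record the start time and last-audio time yourself:

import time

MAX_SESSION_SECONDS = 24 * 60 * 60   # 24-hour session ceiling
IDLE_TIMEOUT_SECONDS = 30 * 60       # 30-minute idle timeout

def session_needs_restart(started_at, last_audio_at, margin=60):
    """Return True when the session is close to either limit.

    started_at / last_audio_at are epoch seconds tracked by the caller.
    """
    now = time.time()
    near_max = now - started_at > MAX_SESSION_SECONDS - margin
    near_idle = now - last_audio_at > IDLE_TIMEOUT_SECONDS - margin
    return near_max or near_idle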
Active Sessions
- One active session per user account (standard tier)
- Multiple concurrent sessions supported (enterprise tier)
- Session state persists until stopped
Audio Streaming
Supported Formats
- PCM 16-bit, 16kHz (default)
- PCM 8-bit, 8kHz
- μ-law (PCMU)
- A-law (PCMA)
Recommended Bitrate
- Mono: 256 kbps (16 kHz × 16-bit)
- Stereo: 512 kbps (16 kHz × 16-bit × 2 channels)
Chunking Strategy
- Send audio in 100-500 ms chunks
- Maintain consistent pacing between chunks
- Handle network latency gracefully (see the pacing sketch after this list)
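The following sketch shows one way to pace raw PCM into roughly 200 ms chunks over the session WebSocket. It assumes 16-bit, 16 kHz mono audio and the third-party `websockets` package; the exact framing the server expects may differ, so treat it as a pacing illustration:

# Sketch: pace raw PCM audio into ~200 ms chunks over the WebSocket.
# Assumes 16-bit (2-byte) samples at 16 kHz mono.
import asyncio
import websockets

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 200
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 6400 bytes

async def stream_pcm(ws_url, pcm_path):
    async with websockets.connect(ws_url) as ws:
        with open(pcm_path, "rb") as f:
            while chunk := f.read(CHUNK_BYTES):
                await ws.send(chunk)                  # binary frame
                await asyncio.sleep(CHUNK_MS / 1000)  # keep real-time pacing

asyncio.run(stream_pcm("wss://api.example.com/ws/live-transcribe/987654321",
                       "call_audio.pcm"))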
Best Practices
1. Session Management
- Always stop sessions explicitly
- Implement timeout handling
- Handle unexpected disconnections
2. Audio Quality
- Use appropriate sample rates
- Monitor audio levels
- Implement echo cancellation
- Use noise suppression
3. Error Handling
- Implement retry logic for failed requests (see the backoff sketch after this list)
- Handle WebSocket disconnections
- Monitor for rate limiting
4. Performance
- Use polling intervals of 500ms to 2 seconds
- Batch audio chunks appropriately
- Implement backoff strategies
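As referenced above, here is a minimal retry helper with exponential backoff for transient failures; the status codes it treats as retryable are generic assumptions, not documented SpeechLytics responses:

import time
import requests

def request_with_backoff(method, url, max_attempts=5, **kwargs):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            response = requests.request(method, url, **kwargs)
            # Retry on rate limiting or server-side errors (assumed codes)
            if response.status_code in (429, 500, 502, 503):
                raise requests.HTTPError(f"Transient status {response.status_code}")
            return response
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

# Example: poll the payload endpoint with retries
# resp = request_with_backoff("GET",
#     "https://api.example.com/api/v1/live-transcribe/987654321/payload",
#     headers={"Authorization": "Bearer YOUR_TOKEN"})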
Troubleshooting
WebSocket Connection Failed
- Verify token validity
- Check firewall/proxy settings
- Ensure your network path allows WebSocket upgrades
- Try polling method as alternative
No Transcription Results
- Verify audio is being streamed
- Check that the session is still active
- Confirm the language setting matches the spoken audio
- Check audio quality
Session Stops Unexpectedly
- Monitor for idle timeout
- Check network stability
- Verify credentials haven't expired
- Review error logs
Next Steps
- Live Speech Features - Advanced capabilities
- Audio Intelligence - Extract insights
- API Reference - Complete documentation