
Getting Started with Live Transcription

Enable real-time speech-to-text transcription for live calls and audio streams using the SpeechLytics Live Transcription API. This guide covers the complete workflow for starting, monitoring, and stopping live transcription sessions.

Prerequisites

  • Valid SpeechLytics account credentials
  • Bearer authentication token
  • Audio stream source (phone call, VoIP, microphone, etc.)
  • WebSocket or HTTP polling support

Architecture Overview

The Live Transcription API uses a session-based model:

  1. Start Session: Create a new live transcription session
  2. Stream Audio: Send audio data in real-time
  3. Get Payload: Retrieve transcription results continuously
  4. Stop Session: End the transcription session
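
In code, this lifecycle maps onto two HTTP calls bracketing a streaming phase. A minimal Python sketch of the overall shape; the headers and session_config values here are placeholders for the auth header and request body built in Steps 1 and 2 below:

import requests

API_BASE = "https://api.example.com"
headers = {"Authorization": "Bearer YOUR_TOKEN"}        # from Step 1
session_config = {"Name": "Demo", "NumberOfChannels": 2,
                  "Language": "Auto", "Username": "agent_001"}

# 1. Start: create the session; the response carries its id and WebSocket URL
start = requests.post(
    f"{API_BASE}/api/v1/live-transcribe/start",
    headers=headers, json=session_config
).json()

# 2-3. Stream audio and read results over start["url"] (WebSocket),
#      or poll f"{API_BASE}/api/v1/live-transcribe/{start['id']}/payload"

# 4. Stop: release the session once the call ends
requests.post(
    f"{API_BASE}/api/v1/live-transcribe/stop",
    headers=headers, json={"Id": start["id"]}
)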

Step 1: Get Authentication Token

Obtain a Bearer token using your credentials:

curl -X POST "https://api.example.com/api/v1/auth/token" \
  -H "Content-Type: application/json" \
  -d '{
    "Username": "your_username",
    "Password": "your_password"
  }'

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires": "2025-11-28T23:59:59Z",
  "eventId": "evt_12345"
}
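
The expires timestamp lets you refresh the token before it lapses. A minimal sketch of one way to track freshness in Python (the 60-second safety margin is an arbitrary choice, not an API requirement):

from datetime import datetime, timezone

import requests

API_BASE = "https://api.example.com"

def get_token(username, password):
    """Request a Bearer token and parse its expiry timestamp."""
    response = requests.post(
        f"{API_BASE}/api/v1/auth/token",
        json={"Username": username, "Password": password},
    )
    response.raise_for_status()
    data = response.json()
    # Normalize the trailing "Z" so fromisoformat works on older Pythons
    expires = datetime.fromisoformat(data["expires"].replace("Z", "+00:00"))
    return data["token"], expires

def token_is_fresh(expires, margin_seconds=60):
    """Treat the token as stale slightly before its actual expiry."""
    remaining = (expires - datetime.now(timezone.utc)).total_seconds()
    return remaining > margin_seconds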

Step 2: Start a Live Transcription Session

Initiate a new live transcription session:

curl -X POST "https://api.example.com/api/v1/live-transcribe/start" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Name": "Customer Support Call - John Doe",
    "NumberOfChannels": 2,
    "Language": "Auto",
    "Username": "agent_123",
    "Local": "192.168.1.100",
    "Remote": "192.168.1.50"
  }'

Request Parameters

Parameter         Type     Required  Description
Name              string   Yes       Descriptive name for the session
NumberOfChannels  integer  Yes       1 for mono, 2 for stereo
Language          enum     Yes       Language code, or Auto for automatic detection
Username          string   Yes       Identifier for the agent/user
Local             string   No        Local IP address
Remote            string   No        Remote IP address

Response

{
  "id": 987654321,
  "url": "wss://api.example.com/ws/live-transcribe/987654321"
}

Response Fields

Field  Description
id     Unique session ID for tracking
url    WebSocket URL for streaming audio and receiving results

Step 3: Stream Audio and Receive Results

Option A: WebSocket Connection

Connect to the WebSocket URL provided in the start response:

// JavaScript/Node.js example
const ws = new WebSocket('wss://api.example.com/ws/live-transcribe/987654321');

ws.onopen = () => {
  console.log('Connected to live transcription');
  // Start streaming audio
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Transcription:', result.content);
  console.log('Timestamp:', result.timestamp);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected from live transcription');
};
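
For Python clients, the third-party websockets library offers an equivalent asynchronous consumer. A sketch, assuming the same message schema (content, timestamp) used in the JavaScript handler above:

import asyncio
import json

import websockets  # pip install websockets

async def listen(ws_url):
    """Connect to the session's WebSocket URL and print each message."""
    async with websockets.connect(ws_url) as ws:
        async for message in ws:
            result = json.loads(message)
            # Field names assumed from the JavaScript example above
            print(f"{result.get('timestamp')}: {result.get('content')}")

asyncio.run(listen("wss://api.example.com/ws/live-transcribe/987654321"))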

Option B: HTTP Polling

Alternatively, poll the payload endpoint for transcription results:

curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "audio": "base64_encoded_audio_data",
  "duration": 5.5,
  "sampleRate": 16000,
  "frequency": 440
}
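
The audio field is Base64-encoded. To inspect or archive a segment, decode it directly; the sketch below assumes the decoded bytes are raw 16-bit mono PCM at the reported sample rate (verify this against your stream configuration):

import base64
import wave

def save_segment(payload, path="segment.wav"):
    """Decode a payload's audio field and write it as a mono WAV file."""
    pcm = base64.b64decode(payload["audio"])
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # assumption: mono segment
        wav.setsampwidth(2)             # assumption: 16-bit PCM
        wav.setframerate(payload["sampleRate"])
        wav.writeframes(pcm)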

Step 4: Monitor Transcription Progress

Continuously retrieve transcription payload:

# Poll every 500ms to 1 second
curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"

Payload Fields

Field       Description
audio       Base64-encoded audio segment
duration    Duration of this segment in seconds
sampleRate  Audio sample rate (typically 16000 Hz)
frequency   Frequency information

Step 5: Stop the Live Transcription Session

End the transcription session when the call is complete:

curl -X POST "https://api.example.com/api/v1/live-transcribe/stop" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Id": 987654321
  }'

Request Parameters

Parameter  Type     Description
Id         integer  The session ID to stop

Response

{
  "isValid": true
}

Complete Workflow Example

Python (Polling Method)

import requests
import time
from threading import Thread

API_BASE = "https://api.example.com"
USERNAME = "your_username"
PASSWORD = "your_password"

class LiveTranscriber:
    def __init__(self):
        self.session_id = None
        self.token = None
        self.headers = None
        self.is_running = False

    def authenticate(self):
        """Get authentication token"""
        response = requests.post(
            f"{API_BASE}/api/v1/auth/token",
            json={"Username": USERNAME, "Password": PASSWORD}
        )
        self.token = response.json()['token']
        self.headers = {"Authorization": f"Bearer {self.token}"}

    def start_session(self, call_name, channels=2, language="Auto"):
        """Start live transcription session"""
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/start",
            headers=self.headers,
            json={
                "Name": call_name,
                "NumberOfChannels": channels,
                "Language": language,
                "Username": "agent_001"
            }
        )
        data = response.json()
        self.session_id = data['id']
        self.websocket_url = data['url']
        print(f"Started session: {self.session_id}")
        print(f"WebSocket URL: {self.websocket_url}")

    def get_payload(self):
        """Retrieve current transcription payload"""
        response = requests.get(
            f"{API_BASE}/api/v1/live-transcribe/{self.session_id}/payload",
            headers=self.headers
        )
        return response.json()

    def monitor(self):
        """Monitor transcription in a separate thread"""
        self.is_running = True
        while self.is_running:
            try:
                payload = self.get_payload()
                if payload.get('audio'):
                    print(f"Duration: {payload['duration']}s")
                    print(f"Sample Rate: {payload['sampleRate']}")
            except Exception as e:
                print(f"Error: {e}")
            time.sleep(1)  # Poll every second, even after an error

    def stop_session(self):
        """Stop the transcription session"""
        self.is_running = False
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/stop",
            headers=self.headers,
            json={"Id": self.session_id}
        )
        print("Session stopped")
        return response.json()

# Usage
transcriber = LiveTranscriber()
transcriber.authenticate()
transcriber.start_session("Support Call - Customer XYZ")

# Start monitoring in a background thread
monitor_thread = Thread(target=transcriber.monitor)
monitor_thread.daemon = True
monitor_thread.start()

# Simulate call duration
time.sleep(30)

# Stop transcription
transcriber.stop_session()

JavaScript (WebSocket Method)

class LiveTranscriber {
  constructor(apiBase, username, password) {
    this.apiBase = apiBase;
    this.username = username;
    this.password = password;
    this.token = null;
    this.sessionId = null;
    this.ws = null;
  }

  async authenticate() {
    const response = await fetch(`${this.apiBase}/api/v1/auth/token`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        Username: this.username,
        Password: this.password
      })
    });
    const data = await response.json();
    this.token = data.token;
  }

  async startSession(callName, channels = 2, language = 'Auto') {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/start`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({
        Name: callName,
        NumberOfChannels: channels,
        Language: language,
        Username: 'agent_001'
      })
    });
    const data = await response.json();
    this.sessionId = data.id;
    this.wsUrl = data.url;
    console.log(`Started session: ${this.sessionId}`);
    this.connectWebSocket();
  }

  connectWebSocket() {
    this.ws = new WebSocket(this.wsUrl);

    this.ws.onopen = () => {
      console.log('Connected to live transcription');
    };

    this.ws.onmessage = (event) => {
      const result = JSON.parse(event.data);
      console.log('Transcription:', result);
      this.onTranscription(result);
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };

    this.ws.onclose = () => {
      console.log('Disconnected');
    };
  }

  onTranscription(result) {
    // Handle transcription result
    // Override this method to process results
    console.log(`${result.timestamp}: ${result.content}`);
  }

  async stopSession() {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/stop`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({ Id: this.sessionId })
    });
    if (this.ws) this.ws.close();
    return response.json();
  }
}

// Usage
const transcriber = new LiveTranscriber(
  'https://api.example.com',
  'username',
  'password'
);

await transcriber.authenticate();
await transcriber.startSession('Support Call');

// Stop after 60 seconds
setTimeout(() => transcriber.stopSession(), 60000);

Session Management

Session Limits

  • Maximum session duration: 24 hours
  • Automatic cleanup after session stop
  • Idle timeout: 30 minutes

Active Sessions

  • One session per user account
  • Multiple concurrent sessions supported (enterprise tier)
  • Session state persists until stopped

Audio Streaming

Supported Formats

  • PCM 16-bit, 16kHz (default)
  • PCM 8-bit, 8kHz
  • μ-law (PCMU)
  • A-law (PCMA)
  • Mono bitrate: 256 kbps (16-bit PCM at 16 kHz)
  • Stereo bitrate: 512 kbps

Chunking Strategy

  • Send 100-500ms audio chunks
  • Maintain consistent spacing
  • Handle network latency gracefully
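
A minimal sketch of this chunking strategy for raw PCM, pacing sends in real time; send_chunk is a placeholder for however you transmit audio (for example, a send on the session WebSocket):

import time

SAMPLE_RATE = 16000   # Hz, the default format above
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono
CHUNK_MS = 250        # within the recommended 100-500 ms window

# Bytes per chunk: 16000 samples/s * 2 bytes * 1 channel * 0.25 s = 8000
chunk_bytes = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * CHUNK_MS // 1000

def stream_pcm(pcm, send_chunk):
    """Split a PCM buffer into fixed-size chunks and send at real-time pace."""
    for offset in range(0, len(pcm), chunk_bytes):
        send_chunk(pcm[offset:offset + chunk_bytes])
        time.sleep(CHUNK_MS / 1000)  # keep spacing consistent with audio time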

Best Practices

1. Session Management

  • Always stop sessions explicitly (see the sketch after this list)
  • Implement timeout handling
  • Handle unexpected disconnections
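
One way to guarantee the explicit stop: wrap the session's working phase in try/finally so the stop request is sent even when the streaming code raises. A sketch using the LiveTranscriber class from the Python example above:

def run_session(transcriber):
    """Ensure the session is stopped even if streaming/monitoring fails."""
    transcriber.authenticate()
    transcriber.start_session("Guarded Call")
    try:
        transcriber.monitor()  # or your own streaming loop
    finally:
        transcriber.stop_session()  # always runs, success or error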

2. Audio Quality

  • Use appropriate sample rates
  • Monitor audio levels
  • Implement echo cancellation
  • Use noise suppression

3. Error Handling

  • Implement retry logic for failed requests
  • Handle WebSocket disconnections
  • Monitor for rate limiting

4. Performance

  • Use polling intervals of 500ms to 2 seconds
  • Batch audio chunks appropriately
  • Implement backoff strategies
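
A sketch of polling with a simple exponential backoff, assuming transient failures should widen the interval and a successful poll should reset it (bounds chosen to match the 500 ms to 2 s guidance above):

import time

def poll_with_backoff(get_payload, min_s=0.5, max_s=2.0):
    """Poll a callable, widening the interval after errors."""
    delay = min_s
    while True:
        try:
            payload = get_payload()
            delay = min_s                    # success: reset to the base rate
            if payload.get("audio"):
                print(f"Segment: {payload['duration']}s")
        except Exception as exc:
            print(f"Poll failed: {exc}")
            delay = min(delay * 2, max_s)    # failure: back off, capped
        time.sleep(delay)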

Troubleshooting

WebSocket Connection Failed

  • Verify token validity
  • Check firewall/proxy settings
  • Ensure WebSocket support in network
  • Try polling method as alternative

No Transcription Results

  • Verify audio is being streamed
  • Check session is still active
  • Confirm language setting
  • Check audio quality

Session Stops Unexpectedly

  • Monitor for idle timeout
  • Check network stability
  • Verify credentials haven't expired
  • Review error logs

Next Steps