
Getting Started with Live Transcription

Enable real-time speech-to-text transcription for live calls and audio streams using the SpeechLytics Live Transcription API. This guide covers the complete workflow for starting, monitoring, and stopping live transcription sessions.

Prerequisites

  • Valid SpeechLytics account credentials
  • Bearer authentication token
  • Audio stream source (phone call, VoIP, microphone, etc.)
  • WebSocket or HTTP polling support

Architecture Overview

The Live Transcription API uses a session-based model:

  1. Start Session: Create a new live transcription session
  2. Stream Audio: Send audio data in real-time
  3. Get Payload: Retrieve transcription results continuously
  4. Stop Session: End the transcription session
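
In code, this lifecycle maps onto two HTTP calls bracketing a streaming phase. A minimal Python sketch of the overall shape; the headers and session_config values here are placeholders for the auth header and request body built in Steps 1 and 2 below:

import requests

API_BASE = "https://api.example.com"
headers = {"Authorization": "Bearer YOUR_TOKEN"}        # from Step 1
session_config = {"Name": "Demo", "NumberOfChannels": 2,
                  "Language": "Auto", "Username": "agent_001"}

# 1. Start: create the session; the response carries its id and WebSocket URL
start = requests.post(
    f"{API_BASE}/api/v1/live-transcribe/start",
    headers=headers, json=session_config
).json()

# 2-3. Stream audio and read results over start["url"] (WebSocket),
#      or poll f"{API_BASE}/api/v1/live-transcribe/{start['id']}/payload"

# 4. Stop: release the session once the call ends
requests.post(
    f"{API_BASE}/api/v1/live-transcribe/stop",
    headers=headers, json={"Id": start["id"]}
)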

Step 1: Get Authentication Token

Obtain a Bearer token using your credentials:

curl -X POST "https://api.example.com/api/v1/auth/token" \
  -H "Content-Type: application/json" \
  -d '{
    "Username": "your_username",
    "Password": "your_password"
  }'

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires": "2025-11-28T23:59:59Z",
  "eventId": "evt_12345"
}
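
The expires timestamp lets you refresh the token before it lapses. A minimal sketch of one way to track freshness in Python (the 60-second safety margin is an arbitrary choice, not an API requirement):

from datetime import datetime, timezone

import requests

API_BASE = "https://api.example.com"

def get_token(username, password):
    """Request a Bearer token and parse its expiry timestamp."""
    response = requests.post(
        f"{API_BASE}/api/v1/auth/token",
        json={"Username": username, "Password": password},
    )
    response.raise_for_status()
    data = response.json()
    # Normalize the trailing "Z" so fromisoformat works on older Pythons
    expires = datetime.fromisoformat(data["expires"].replace("Z", "+00:00"))
    return data["token"], expires

def token_is_fresh(expires, margin_seconds=60):
    """Treat the token as stale slightly before its actual expiry."""
    remaining = (expires - datetime.now(timezone.utc)).total_seconds()
    return remaining > margin_seconds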

Step 2: Start a Live Transcription Session

Initiate a new live transcription session:

curl -X POST "https://api.example.com/api/v1/live-transcribe/start" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Name": "Customer Support Call - John Doe",
    "NumberOfChannels": 2,
    "Language": "Auto",
    "Username": "agent_123",
    "Local": "192.168.1.100",
    "Remote": "192.168.1.50"
  }'

Request Parameters

Parameter         Type     Required  Description
Name              string   Yes       Descriptive name for the session
NumberOfChannels  integer  Yes       1 for mono, 2 for stereo
Language          enum     Yes       Language code, or Auto for automatic detection
Username          string   Yes       Identifier for the agent/user
Local             string   No        Local IP address
Remote            string   No        Remote IP address

Response

{
  "id": 987654321,
  "url": "wss://api.example.com/ws/live-transcribe/987654321"
}

Response Fields

Field  Description
id     Unique session ID for tracking
url    WebSocket URL for streaming audio and receiving results

Step 3: Stream Audio and Receive Results

Option A: WebSocket Connection

Connect to the WebSocket URL provided in the start response:

// JavaScript/Node.js example
const ws = new WebSocket('wss://api.example.com/ws/live-transcribe/987654321');

ws.onopen = () => {
  console.log('Connected to live transcription');
  // Start streaming audio
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Transcription:', result.content);
  console.log('Timestamp:', result.timestamp);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected from live transcription');
};
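
For Python clients, the third-party websockets library offers an equivalent asynchronous consumer. A sketch, assuming the same message schema (content, timestamp) used in the JavaScript handler above:

import asyncio
import json

import websockets  # pip install websockets

async def listen(ws_url):
    """Connect to the session's WebSocket URL and print each message."""
    async with websockets.connect(ws_url) as ws:
        async for message in ws:
            result = json.loads(message)
            # Field names assumed from the JavaScript example above
            print(f"{result.get('timestamp')}: {result.get('content')}")

asyncio.run(listen("wss://api.example.com/ws/live-transcribe/987654321"))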

Option B: HTTP Polling

Alternatively, poll the payload endpoint for transcription results:

curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "audio": "base64_encoded_audio_data",
  "duration": 5.5,
  "sampleRate": 16000,
  "frequency": 440
}
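
The audio field is Base64-encoded. To inspect or archive a segment, decode it directly; the sketch below assumes the decoded bytes are raw 16-bit mono PCM at the reported sample rate (verify this against your stream configuration):

import base64
import wave

def save_segment(payload, path="segment.wav"):
    """Decode a payload's audio field and write it as a mono WAV file."""
    pcm = base64.b64decode(payload["audio"])
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # assumption: mono segment
        wav.setsampwidth(2)             # assumption: 16-bit PCM
        wav.setframerate(payload["sampleRate"])
        wav.writeframes(pcm)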

Step 4: Monitor Transcription Progress

Continuously retrieve transcription payload:

# Poll every 500ms to 1 second
curl -X GET "https://api.example.com/api/v1/live-transcribe/987654321/payload" \
  -H "Authorization: Bearer YOUR_TOKEN"

Payload Fields

Field       Description
audio       Base64-encoded audio segment
duration    Duration of this segment in seconds
sampleRate  Audio sample rate (typically 16000 Hz)
frequency   Frequency information

Step 5: Stop the Live Transcription Session

End the transcription session when the call is complete:

curl -X POST "https://api.example.com/api/v1/live-transcribe/stop" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "Id": 987654321
  }'

Request Parameters

Parameter  Type     Description
Id         integer  The session ID to stop

Response

{
  "isValid": true
}

Complete Workflow Example

Python (Polling Method)

import requests
import time
from threading import Thread

API_BASE = "https://api.example.com"
USERNAME = "your_username"
PASSWORD = "your_password"

class LiveTranscriber:
    def __init__(self):
        self.session_id = None
        self.token = None
        self.headers = None
        self.is_running = False

    def authenticate(self):
        """Get authentication token"""
        response = requests.post(
            f"{API_BASE}/api/v1/auth/token",
            json={"Username": USERNAME, "Password": PASSWORD}
        )
        self.token = response.json()['token']
        self.headers = {"Authorization": f"Bearer {self.token}"}

    def start_session(self, call_name, channels=2, language="Auto"):
        """Start live transcription session"""
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/start",
            headers=self.headers,
            json={
                "Name": call_name,
                "NumberOfChannels": channels,
                "Language": language,
                "Username": "agent_001"
            }
        )
        data = response.json()
        self.session_id = data['id']
        self.websocket_url = data['url']
        print(f"Started session: {self.session_id}")
        print(f"WebSocket URL: {self.websocket_url}")

    def get_payload(self):
        """Retrieve current transcription payload"""
        response = requests.get(
            f"{API_BASE}/api/v1/live-transcribe/{self.session_id}/payload",
            headers=self.headers
        )
        return response.json()

    def monitor(self):
        """Monitor transcription in a separate thread"""
        self.is_running = True
        while self.is_running:
            try:
                payload = self.get_payload()
                if payload.get('audio'):
                    print(f"Duration: {payload['duration']}s")
                    print(f"Sample Rate: {payload['sampleRate']}")
            except Exception as e:
                print(f"Error: {e}")
            time.sleep(1)  # Poll every second, even after an error

    def stop_session(self):
        """Stop the transcription session"""
        self.is_running = False
        response = requests.post(
            f"{API_BASE}/api/v1/live-transcribe/stop",
            headers=self.headers,
            json={"Id": self.session_id}
        )
        print("Session stopped")
        return response.json()

# Usage
transcriber = LiveTranscriber()
transcriber.authenticate()
transcriber.start_session("Support Call - Customer XYZ")

# Start monitoring in a background thread
monitor_thread = Thread(target=transcriber.monitor)
monitor_thread.daemon = True
monitor_thread.start()

# Simulate call duration
time.sleep(30)

# Stop transcription
transcriber.stop_session()

JavaScript (WebSocket Method)

class LiveTranscriber {
  constructor(apiBase, username, password) {
    this.apiBase = apiBase;
    this.username = username;
    this.password = password;
    this.token = null;
    this.sessionId = null;
    this.ws = null;
  }

  async authenticate() {
    const response = await fetch(`${this.apiBase}/api/v1/auth/token`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        Username: this.username,
        Password: this.password
      })
    });
    const data = await response.json();
    this.token = data.token;
  }

  async startSession(callName, channels = 2, language = 'Auto') {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/start`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({
        Name: callName,
        NumberOfChannels: channels,
        Language: language,
        Username: 'agent_001'
      })
    });
    const data = await response.json();
    this.sessionId = data.id;
    this.wsUrl = data.url;
    console.log(`Started session: ${this.sessionId}`);
    this.connectWebSocket();
  }

  connectWebSocket() {
    this.ws = new WebSocket(this.wsUrl);

    this.ws.onopen = () => {
      console.log('Connected to live transcription');
    };

    this.ws.onmessage = (event) => {
      const result = JSON.parse(event.data);
      console.log('Transcription:', result);
      this.onTranscription(result);
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };

    this.ws.onclose = () => {
      console.log('Disconnected');
    };
  }

  onTranscription(result) {
    // Handle transcription result
    // Override this method to process results
    console.log(`${result.timestamp}: ${result.content}`);
  }

  async stopSession() {
    const response = await fetch(`${this.apiBase}/api/v1/live-transcribe/stop`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify({ Id: this.sessionId })
    });
    if (this.ws) this.ws.close();
    return response.json();
  }
}

// Usage
const transcriber = new LiveTranscriber(
  'https://api.example.com',
  'username',
  'password'
);

await transcriber.authenticate();
await transcriber.startSession('Support Call');

// Stop after 60 seconds
setTimeout(() => transcriber.stopSession(), 60000);

Session Management

Session Limits

  • Maximum session duration: 24 hours
  • Automatic cleanup after session stop
  • Idle timeout: 30 minutes

Active Sessions

  • One session per user account
  • Multiple concurrent sessions supported (enterprise tier)
  • Session state persists until stopped

Audio Streaming

Supported Formats

  • PCM 16-bit, 16kHz (default)
  • PCM 8-bit, 8kHz
  • μ-law (PCMU)
  • A-law (PCMA)
  • Mono bitrate: 256 kbps (16-bit PCM at 16 kHz)
  • Stereo bitrate: 512 kbps

Chunking Strategy

  • Send 100-500ms audio chunks
  • Maintain consistent spacing
  • Handle network latency gracefully
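
A minimal sketch of this chunking strategy for raw PCM, pacing sends in real time; send_chunk is a placeholder for however you transmit audio (for example, a send on the session WebSocket):

import time

SAMPLE_RATE = 16000   # Hz, the default format above
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono
CHUNK_MS = 250        # within the recommended 100-500 ms window

# Bytes per chunk: 16000 samples/s * 2 bytes * 1 channel * 0.25 s = 8000
chunk_bytes = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * CHUNK_MS // 1000

def stream_pcm(pcm, send_chunk):
    """Split a PCM buffer into fixed-size chunks and send at real-time pace."""
    for offset in range(0, len(pcm), chunk_bytes):
        send_chunk(pcm[offset:offset + chunk_bytes])
        time.sleep(CHUNK_MS / 1000)  # keep spacing consistent with audio time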

Best Practices

1. Session Management

  • Always stop sessions explicitly (see the sketch after this list)
  • Implement timeout handling
  • Handle unexpected disconnections
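
One way to guarantee the explicit stop: wrap the session's working phase in try/finally so the stop request is sent even when the streaming code raises. A sketch using the LiveTranscriber class from the Python example above:

def run_session(transcriber):
    """Ensure the session is stopped even if streaming/monitoring fails."""
    transcriber.authenticate()
    transcriber.start_session("Guarded Call")
    try:
        transcriber.monitor()  # or your own streaming loop
    finally:
        transcriber.stop_session()  # always runs, success or error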

2. Audio Quality

  • Use appropriate sample rates
  • Monitor audio levels
  • Implement echo cancellation
  • Use noise suppression

3. Error Handling

  • Implement retry logic for failed requests
  • Handle WebSocket disconnections
  • Monitor for rate limiting

4. Performance

  • Use polling intervals of 500ms to 2 seconds
  • Batch audio chunks appropriately
  • Implement backoff strategies
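
A sketch of polling with a simple exponential backoff, assuming transient failures should widen the interval and a successful poll should reset it (bounds chosen to match the 500 ms to 2 s guidance above):

import time

def poll_with_backoff(get_payload, min_s=0.5, max_s=2.0):
    """Poll a callable, widening the interval after errors."""
    delay = min_s
    while True:
        try:
            payload = get_payload()
            delay = min_s                    # success: reset to the base rate
            if payload.get("audio"):
                print(f"Segment: {payload['duration']}s")
        except Exception as exc:
            print(f"Poll failed: {exc}")
            delay = min(delay * 2, max_s)    # failure: back off, capped
        time.sleep(delay)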

Troubleshooting

WebSocket Connection Failed

  • Verify token validity
  • Check firewall/proxy settings
  • Ensure WebSocket support in network
  • Try polling method as alternative

No Transcription Results

  • Verify audio is being streamed
  • Check session is still active
  • Confirm language setting
  • Check audio quality

Session Stops Unexpectedly

  • Monitor for idle timeout
  • Check network stability
  • Verify credentials haven't expired
  • Review error logs

Next Steps