Getting Started with Speech-to-Text
Convert your pre-recorded audio files to text using the SpeechLytics Speech-to-Text API. This guide walks you through the complete workflow, from authentication to retrieving your transcription results.
Prerequisites
- Valid SpeechLytics account credentials
- Audio file in a supported format (WAV, MP3, M4A, etc.)
- Bearer token (obtained from authentication endpoint)
Step 1: Get Authentication Token
First, obtain a Bearer token using your credentials:
```bash
curl -X POST "https://api.example.com/api/v1/auth/token" \
  -H "Content-Type: application/json" \
  -d '{
    "Username": "your_username",
    "Password": "your_password"
  }'
```
Response:
```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires": "2025-11-28T23:59:59Z",
  "eventId": "evt_12345"
}
```
Save this token for use in subsequent API calls. See Authentication Guide for more details.
Step 2: Prepare Your Audio File
Convert your audio file to Base64 format. Here are examples for different platforms:
Linux
```bash
base64 -w 0 your_audio_file.wav > audio_base64.txt
```
macOS
```bash
base64 -i your_audio_file.wav -o audio_base64.txt
```
Note: GNU `base64` (Linux) wraps output at 76 characters by default; `-w 0` disables wrapping. The BSD version on macOS does not wrap and uses `-i`/`-o` for the input and output files.
Windows PowerShell
```powershell
[Convert]::ToBase64String([IO.File]::ReadAllBytes("your_audio_file.wav")) | Out-File -Encoding utf8 audio_base64.txt
```
Python
```python
import base64

with open('your_audio_file.wav', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')
print(audio_base64)
```
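Keep in mind that Base64 inflates the payload by about a third (4 output bytes for every 3 input bytes), which matters for large recordings. A small sketch that encodes a file and sanity-checks the round-trip (`encode_audio` is a hypothetical helper):

```python
import base64

def encode_audio(path):
    """Read an audio file and return its Base64 string."""
    with open(path, "rb") as f:
        raw = f.read()
    encoded = base64.b64encode(raw).decode("utf-8")
    # Sanity check: decoding must reproduce the original bytes.
    assert base64.b64decode(encoded) == raw
    return encoded
```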
Step 3: Upload and Transcribe Your Audio
Send your audio file to the transcription endpoint:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "DataBase64": "SUQzBAAAAAAAI1NUVEUAAAA...",
    "Filename": "call_recording.wav",
    "Language": "Auto",
    "Metadata": "agent_id=123;call_type=inbound",
    "HasPriority": false,
    "CheckFilenameExistence": false
  }'
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| DataBase64 | string | Yes | Audio file encoded in Base64 format |
| Filename | string | Yes | Name of the audio file (used for reference) |
| Language | enum | No | Language of the audio. Options: Auto, ENGLISH, SPANISH, etc. (100+ languages supported). Default: Auto |
| Metadata | string | No | Custom metadata (e.g., call ID, agent info) |
| HasPriority | boolean | No | Set to true for faster processing. Default: false |
| CheckFilenameExistence | boolean | No | Check if filename already exists. Default: false |
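The `Metadata` field in the example is a single `key=value;key=value` string. If you track metadata as a dict, a small helper (hypothetical, not part of the API) keeps the formatting consistent:

```python
def build_metadata(fields):
    """Join a dict into the key=value;key=value form used by Metadata."""
    return ";".join(f"{key}={value}" for key, value in fields.items())

# Building a request body with the helper (DataBase64 truncated for brevity):
payload = {
    "DataBase64": "SUQzBAAAAAAAI1NUVEUAAAA...",
    "Filename": "call_recording.wav",
    "Language": "Auto",
    "Metadata": build_metadata({"agent_id": 123, "call_type": "inbound"}),
}
```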
Response
```json
{
  "id": 123456789,
  "status": "0 - Queued"
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| id | integer | Unique transcription ID for tracking |
| status | enum | Current processing status (0 - Queued, 1 - InProgress, 2 - Processed, 3 - Failed, etc.) |
Step 4: Check Transcription Status
Use the transcription ID to check processing status:
```bash
curl -X GET "https://api.example.com/api/v1/transcripts/123456789/status" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
Response:
```json
{
  "id": 123456789,
  "status": 2,
  "statusDescription": "Processed",
  "score": 95.5,
  "inScope": true,
  "audioType": 1,
  "audioTypeDescription": "Stereo",
  "duration": 180,
  "name": "call_recording.wav",
  "created": "2025-11-28T10:00:00Z",
  "modified": "2025-11-28T10:05:00Z",
  "transcription": {
    "language": "en",
    "leftChannel": [...],
    "rightChannel": [...],
    "bothChannels": [...]
  },
  "keywords": [...],
  "topics": [...],
  "sentiments": [...],
  "callSummary": {...}
}
```
Status Values
| Code | Description | Meaning |
|---|---|---|
| 0 | Queued | Waiting to be processed |
| 1 | InProgress | Currently being processed |
| 2 | Processed | Successfully completed |
| 3 | Failed | Processing failed |
| 4 | QuotaLimit | Account quota exceeded |
| 5 | NotFound | Transcription ID not found |
| 6 | KeywordMatch | Processing completed with keyword matches |
| 7 | Live | Currently processing live transcription |
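When polling, it helps to encode the table above and decide which codes are terminal. A sketch (treating Processed, Failed, QuotaLimit, NotFound, and KeywordMatch as terminal is an assumption; confirm against your account's behavior):

```python
# Status codes from the table above.
STATUS_NAMES = {
    0: "Queued",
    1: "InProgress",
    2: "Processed",
    3: "Failed",
    4: "QuotaLimit",
    5: "NotFound",
    6: "KeywordMatch",
    7: "Live",
}

# Assumption: these codes mean the state will not change and polling can stop.
TERMINAL = {2, 3, 4, 5, 6}

def is_done(status_code):
    """Return True when the transcription will not change state anymore."""
    return status_code in TERMINAL
```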
Step 5: Retrieve Full Transcripts
Get a list of all your transcripts with filters:
```bash
curl -X GET "https://api.example.com/api/v1/transcripts?Page=1&Rows=10&DateFrom=2025-11-01&DateTo=2025-11-30" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| Page | integer | Page number (1-indexed) |
| Rows | integer | Number of rows per page (max: 100) |
| DateFrom | string | Filter by start date (ISO 8601 format) |
| DateTo | string | Filter by end date (ISO 8601 format) |
| FileName | string | Filter by filename |
| Topic | string | Filter by topic |
| Sentiment | string | Filter by sentiment |
| Tag | string | Filter by tag |
| Content | string | Search in transcription content |
| Cluster | string | Filter by cluster |
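The `Page`/`Rows` parameters lend themselves to a generator that walks every page. A sketch with `requests` (the stop condition of an empty page is an assumption, since the list response shape is not shown in this guide; `iter_transcripts` is a hypothetical helper):

```python
import requests

API_BASE = "https://api.example.com"

def iter_transcripts(token, **filters):
    """Yield transcripts page by page until an empty page is returned."""
    headers = {"Authorization": f"Bearer {token}"}
    page = 1
    while True:
        params = {"Page": page, "Rows": 100, **filters}
        resp = requests.get(f"{API_BASE}/api/v1/transcripts",
                            headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        items = resp.json()
        if not items:  # assumed: empty page means no more results
            break
        yield from items
        page += 1
```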
Complete Workflow Example
Python
```python
import requests
import base64
import time

API_BASE = "https://api.example.com"
USERNAME = "your_username"
PASSWORD = "your_password"

# Step 1: Get token
auth_response = requests.post(
    f"{API_BASE}/api/v1/auth/token",
    json={"Username": USERNAME, "Password": PASSWORD}
)
token = auth_response.json()['token']
headers = {"Authorization": f"Bearer {token}"}

# Step 2: Prepare audio
with open('call_recording.wav', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Step 3: Upload audio
transcribe_response = requests.post(
    f"{API_BASE}/api/v1/transcribe",
    headers=headers,
    json={
        "DataBase64": audio_base64,
        "Filename": "call_recording.wav",
        "Language": "Auto",
        "Metadata": "agent_id=123"
    }
)
transcription_id = transcribe_response.json()['id']
print(f"Transcription started with ID: {transcription_id}")

# Step 4: Poll for completion
while True:
    status_response = requests.get(
        f"{API_BASE}/api/v1/transcripts/{transcription_id}/status",
        headers=headers
    )
    status = status_response.json()
    if status['status'] == 2:  # Processed
        print("Transcription completed!")
        print(f"Transcription: {status['transcription']}")
        break
    elif status['status'] == 3:  # Failed
        print("Transcription failed!")
        break
    else:
        print(f"Status: {status['statusDescription']}")
        time.sleep(5)  # Wait 5 seconds before checking again
```
C#
```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var client = new HttpClient();
        var apiBase = "https://api.example.com";

        // Step 1: Get token
        var tokenPayload = JsonSerializer.Serialize(new
        {
            Username = "your_username",
            Password = "your_password"
        });
        var tokenContent = new StringContent(tokenPayload, Encoding.UTF8, "application/json");
        var tokenResponse = await client.PostAsync($"{apiBase}/api/v1/auth/token", tokenContent);
        var tokenData = await tokenResponse.Content.ReadAsStringAsync();
        var tokenJson = JsonDocument.Parse(tokenData);
        var token = tokenJson.RootElement.GetProperty("token").GetString();
        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}");

        // Step 2: Prepare audio
        var audioBytes = File.ReadAllBytes("call_recording.wav");
        var audioBase64 = Convert.ToBase64String(audioBytes);

        // Step 3: Upload audio
        var transcribePayload = JsonSerializer.Serialize(new
        {
            DataBase64 = audioBase64,
            Filename = "call_recording.wav",
            Language = "Auto",
            Metadata = "agent_id=123"
        });
        var transcribeContent = new StringContent(transcribePayload, Encoding.UTF8, "application/json");
        var transcribeResponse = await client.PostAsync($"{apiBase}/api/v1/transcribe", transcribeContent);
        var transcribeData = await transcribeResponse.Content.ReadAsStringAsync();
        var transcribeJson = JsonDocument.Parse(transcribeData);
        var transcriptionId = transcribeJson.RootElement.GetProperty("id").GetInt64();
        Console.WriteLine($"Transcription started with ID: {transcriptionId}");

        // Step 4: Poll for completion
        while (true)
        {
            var statusResponse = await client.GetAsync($"{apiBase}/api/v1/transcripts/{transcriptionId}/status");
            var statusData = await statusResponse.Content.ReadAsStringAsync();
            var statusJson = JsonDocument.Parse(statusData);
            var status = statusJson.RootElement.GetProperty("status").GetInt32();
            if (status == 2)
            {
                Console.WriteLine("Transcription completed!");
                break;
            }
            else if (status == 3)
            {
                Console.WriteLine("Transcription failed!");
                break;
            }
            await Task.Delay(5000);
        }
    }
}
```
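The fixed 5-second sleep in the workflow examples works, but a timeout plus gentle backoff avoids both hammering the API and polling forever. A sketch of such a loop (`wait_for_transcript` is a hypothetical wrapper around the status endpoint):

```python
import time
import requests

def wait_for_transcript(api_base, headers, transcription_id,
                        timeout=600, initial_delay=5, max_delay=60):
    """Poll the status endpoint until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{api_base}/api/v1/transcripts/{transcription_id}/status",
            headers=headers, timeout=30,
        )
        resp.raise_for_status()
        status = resp.json()
        if status["status"] in (2, 3):  # Processed or Failed
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff
    raise TimeoutError(f"Transcription {transcription_id} did not finish in {timeout}s")
```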
Supported Languages
The Speech-to-Text API supports automatic language detection as well as explicit language selection for 100+ languages including:
- English, Spanish, French, German, Italian, Portuguese
- Chinese (Simplified & Traditional), Japanese, Korean
- Russian, Polish, Dutch, Swedish, Danish, Norwegian
- Hindi, Arabic, Hebrew, Turkish, Thai
- And many more...
For a complete list, see the Language enum in the API Reference.
Next Steps
- Learn about Features - Silence detection, quality metrics, channel analysis
- Explore Audio Intelligence - Extract insights from your transcriptions
- Check Insights - Access analytics on your call data
- View API Reference - Complete API documentation
Troubleshooting
Common Issues
Issue: 401 Unauthorized
- Verify your Bearer token is valid and not expired
- Check the Authorization header format
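A common pattern for the 401 case is to re-authenticate once and retry the failed call. A sketch (`fetch_token` stands in for the Step 1 token request; `request_with_reauth` is a hypothetical helper):

```python
import requests

def request_with_reauth(method, url, fetch_token, token, **kwargs):
    """Send a request; on 401, refresh the token once and retry."""
    headers = {"Authorization": f"Bearer {token}", **kwargs.pop("headers", {})}
    resp = requests.request(method, url, headers=headers, **kwargs)
    if resp.status_code == 401:
        token = fetch_token()  # re-run the Step 1 token request
        headers["Authorization"] = f"Bearer {token}"
        resp = requests.request(method, url, headers=headers, **kwargs)
    return resp
```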
Issue: Large file takes too long
- Set `HasPriority: true` for faster processing
- Consider splitting very large files
Issue: Transcript has low quality
- Ensure your audio file is clear and not corrupted
- Try explicit language specification instead of `Auto`
- Check audio levels and background noise
For more help, contact support.