Getting Started with Speech-to-Text

Convert your pre-recorded audio files to text using the SpeechLytics Speech-to-Text API. This guide walks you through the complete workflow, from authentication to retrieving your transcription results.

Prerequisites

  • Valid SpeechLytics account credentials
  • Audio file in a supported format (WAV, MP3, M4A, etc.)
  • Bearer token (obtained from authentication endpoint)

Step 1: Get Authentication Token

First, obtain a Bearer token using your credentials:

curl -X POST "https://api.example.com/api/v1/auth/token" \
  -H "Content-Type: application/json" \
  -d '{
    "Username": "your_username",
    "Password": "your_password"
  }'

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires": "2025-11-28T23:59:59Z",
  "eventId": "evt_12345"
}

Save this token for use in subsequent API calls. See Authentication Guide for more details.
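Because calls with an expired token fail with 401, it can help to check the `expires` timestamp before reusing a saved token. A minimal Python sketch (the `token_is_fresh` helper and its 60-second safety margin are illustrative, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def token_is_fresh(expires_iso: str, margin_seconds: int = 60) -> bool:
    """Return True if the token's expiry is still at least margin_seconds away."""
    # fromisoformat() accepts the trailing 'Z' only on Python 3.11+;
    # replacing it with '+00:00' keeps this working on older versions.
    expires = datetime.fromisoformat(expires_iso.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) + timedelta(seconds=margin_seconds) < expires

# A token that expired in the past is reported stale.
print(token_is_fresh("2020-01-01T00:00:00Z"))  # False
```

When the check returns `False`, request a new token from the authentication endpoint before making further calls.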

Step 2: Prepare Your Audio File

Encode your audio file as Base64. Here are examples for different platforms:

Linux/Mac

base64 -i your_audio_file.wav -o audio_base64.txt

Windows PowerShell

[Convert]::ToBase64String([IO.File]::ReadAllBytes("your_audio_file.wav")) | Out-File -Encoding utf8 audio_base64.txt

Python

import base64

with open('your_audio_file.wav', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')
print(audio_base64)

Step 3: Upload and Transcribe Your Audio

Send your audio file to the transcription endpoint:

curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "DataBase64": "SUQzBAAAAAAAI1NUVEUAAAA...",
    "Filename": "call_recording.wav",
    "Language": "Auto",
    "Metadata": "agent_id=123;call_type=inbound",
    "HasPriority": false,
    "CheckFilenameExistence": false
  }'

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| DataBase64 | string | Yes | Audio file encoded in Base64 format |
| Filename | string | Yes | Name of the audio file (used for reference) |
| Language | enum | Yes | Language of the audio. Options: Auto, ENGLISH, SPANISH, etc. (100+ languages supported). Default: Auto |
| Metadata | string | No | Custom metadata (e.g., call ID, agent info) |
| HasPriority | boolean | No | Set to true for faster processing. Default: false |
| CheckFilenameExistence | boolean | No | Check if the filename already exists. Default: false |
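To illustrate how these parameters combine, here is a hypothetical Python helper that assembles the request body with the documented defaults (`build_transcribe_request` is not part of any SDK; it is just a sketch):

```python
def build_transcribe_request(data_base64: str, filename: str,
                             language: str = "Auto", metadata: str = "",
                             has_priority: bool = False,
                             check_filename_existence: bool = False) -> dict:
    """Assemble the JSON body for POST /api/v1/transcribe."""
    body = {
        "DataBase64": data_base64,
        "Filename": filename,
        "Language": language,
        "HasPriority": has_priority,
        "CheckFilenameExistence": check_filename_existence,
    }
    if metadata:  # Metadata is optional; omit it when empty.
        body["Metadata"] = metadata
    return body

req = build_transcribe_request("SUQz...", "call_recording.wav",
                               metadata="agent_id=123")
```

The resulting dict can be passed directly as the `json=` argument of a `requests.post` call, as in the complete workflow example further down.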

Response

{
  "id": 123456789,
  "status": "0 - Queued"
}

Response Fields

| Field | Type | Description |
|---|---|---|
| id | integer | Unique transcription ID for tracking |
| status | enum | Current processing status (0 - Queued, 1 - InProgress, 2 - Processed, 3 - Failed, etc.) |

Step 4: Check Transcription Status

Use the transcription ID to check processing status:

curl -X GET "https://api.example.com/api/v1/transcripts/123456789/status" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "id": 123456789,
  "status": 2,
  "statusDescription": "Processed",
  "score": 95.5,
  "inScope": true,
  "audioType": 1,
  "audioTypeDescription": "Stereo",
  "duration": 180,
  "name": "call_recording.wav",
  "created": "2025-11-28T10:00:00Z",
  "modified": "2025-11-28T10:05:00Z",
  "transcription": {
    "language": "en",
    "leftChannel": [...],
    "rightChannel": [...],
    "bothChannels": [...]
  },
  "keywords": [...],
  "topics": [...],
  "sentiments": [...],
  "callSummary": {...}
}

Status Values

| Code | Description | Meaning |
|---|---|---|
| 0 | Queued | Waiting to be processed |
| 1 | InProgress | Currently being processed |
| 2 | Processed | Successfully completed |
| 3 | Failed | Processing failed |
| 4 | QuotaLimit | Account quota exceeded |
| 5 | NotFound | Transcription ID not found |
| 6 | KeywordMatch | Processing completed with keyword matches |
| 7 | Live | Currently processing live transcription |
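These codes can be captured in a small lookup for logging, sketched in Python below. Note that the terminal/non-terminal split is an assumption based on the descriptions, not something the API states:

```python
STATUS_NAMES = {
    0: "Queued",
    1: "InProgress",
    2: "Processed",
    3: "Failed",
    4: "QuotaLimit",
    5: "NotFound",
    6: "KeywordMatch",
    7: "Live",
}

# Statuses after which polling can stop (assumed: anything no longer in flight).
TERMINAL_STATUSES = {2, 3, 4, 5, 6}

def describe(code: int) -> str:
    """Map a numeric status code to its name, tolerating unknown codes."""
    return STATUS_NAMES.get(code, f"Unknown ({code})")

print(describe(2))  # Processed
```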

Step 5: Retrieve Full Transcripts

Get a list of all your transcripts with filters:

curl -X GET "https://api.example.com/api/v1/transcripts?Page=1&Rows=10&DateFrom=2025-11-01&DateTo=2025-11-30" \
  -H "Authorization: Bearer YOUR_TOKEN"

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| Page | integer | Page number (1-indexed) |
| Rows | integer | Number of rows per page (max: 100) |
| DateFrom | string | Filter by start date (ISO 8601 format) |
| DateTo | string | Filter by end date (ISO 8601 format) |
| FileName | string | Filter by filename |
| Topic | string | Filter by topic |
| Sentiment | string | Filter by sentiment |
| Tag | string | Filter by tag |
| Content | string | Search in transcription content |
| Cluster | string | Filter by cluster |
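As a sketch, these query parameters can be collected into a dict and passed to `requests` via `params=`. The `build_query` helper below is hypothetical, and `API_BASE`/`headers` are assumed to be set up as in the earlier steps:

```python
def build_query(page: int = 1, rows: int = 10, **filters) -> dict:
    """Build the query-string parameters for GET /api/v1/transcripts."""
    if not 1 <= rows <= 100:  # Rows is capped at 100 per the table above.
        raise ValueError("Rows must be between 1 and 100")
    return {"Page": page, "Rows": rows, **filters}

params = build_query(page=1, rows=10,
                     DateFrom="2025-11-01", DateTo="2025-11-30")
# requests.get(f"{API_BASE}/api/v1/transcripts", headers=headers, params=params)
```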

Complete Workflow Example

Python

import requests
import base64
import time

API_BASE = "https://api.example.com"
USERNAME = "your_username"
PASSWORD = "your_password"

# Step 1: Get token
auth_response = requests.post(
    f"{API_BASE}/api/v1/auth/token",
    json={"Username": USERNAME, "Password": PASSWORD}
)
token = auth_response.json()['token']
headers = {"Authorization": f"Bearer {token}"}

# Step 2: Prepare audio
with open('call_recording.wav', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Step 3: Upload audio
transcribe_response = requests.post(
    f"{API_BASE}/api/v1/transcribe",
    headers=headers,
    json={
        "DataBase64": audio_base64,
        "Filename": "call_recording.wav",
        "Language": "Auto",
        "Metadata": "agent_id=123"
    }
)
transcription_id = transcribe_response.json()['id']
print(f"Transcription started with ID: {transcription_id}")

# Step 4: Poll for completion
while True:
    status_response = requests.get(
        f"{API_BASE}/api/v1/transcripts/{transcription_id}/status",
        headers=headers
    )
    status = status_response.json()

    if status['status'] == 2:  # Processed
        print("Transcription completed!")
        print(f"Transcription: {status['transcription']}")
        break
    elif status['status'] == 3:  # Failed
        print("Transcription failed!")
        break
    else:
        print(f"Status: {status['statusDescription']}")
        time.sleep(5)  # Wait 5 seconds before checking again
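The polling loop above runs forever if the job never reaches a terminal state. A variant with a timeout can be sketched as a generic helper, where the `fetch_status` callable stands in for the GET request in Step 4 (the helper itself is illustrative, not part of the API):

```python
import time

def poll_until_done(fetch_status, interval: float = 5.0,
                    timeout: float = 600.0) -> dict:
    """Call fetch_status() until the job is Processed/Failed or timeout elapses.

    fetch_status should return the JSON from
    GET /api/v1/transcripts/{id}/status as a dict.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] in (2, 3):  # Processed or Failed
            return status
        time.sleep(interval)
    raise TimeoutError("transcription did not finish in time")

# Example with a stubbed status sequence instead of real HTTP calls:
responses = iter([{"status": 0}, {"status": 1}, {"status": 2}])
result = poll_until_done(lambda: next(responses), interval=0.0)
print(result["status"])  # 2
```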

C#

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var client = new HttpClient();
        var apiBase = "https://api.example.com";

        // Step 1: Get token
        var tokenPayload = JsonSerializer.Serialize(new
        {
            Username = "your_username",
            Password = "your_password"
        });
        var tokenContent = new StringContent(tokenPayload, Encoding.UTF8, "application/json");
        var tokenResponse = await client.PostAsync($"{apiBase}/api/v1/auth/token", tokenContent);
        var tokenData = await tokenResponse.Content.ReadAsStringAsync();
        var tokenJson = JsonDocument.Parse(tokenData);
        var token = tokenJson.RootElement.GetProperty("token").GetString();

        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}");

        // Step 2: Prepare audio
        var audioBytes = File.ReadAllBytes("call_recording.wav");
        var audioBase64 = Convert.ToBase64String(audioBytes);

        // Step 3: Upload audio
        var transcribePayload = JsonSerializer.Serialize(new
        {
            DataBase64 = audioBase64,
            Filename = "call_recording.wav",
            Language = "Auto",
            Metadata = "agent_id=123"
        });
        var transcribeContent = new StringContent(transcribePayload, Encoding.UTF8, "application/json");
        var transcribeResponse = await client.PostAsync($"{apiBase}/api/v1/transcribe", transcribeContent);
        var transcribeData = await transcribeResponse.Content.ReadAsStringAsync();
        var transcribeJson = JsonDocument.Parse(transcribeData);
        var transcriptionId = transcribeJson.RootElement.GetProperty("id").GetInt64();

        Console.WriteLine($"Transcription started with ID: {transcriptionId}");

        // Step 4: Poll for completion
        while (true)
        {
            var statusResponse = await client.GetAsync($"{apiBase}/api/v1/transcripts/{transcriptionId}/status");
            var statusData = await statusResponse.Content.ReadAsStringAsync();
            var statusJson = JsonDocument.Parse(statusData);
            var status = statusJson.RootElement.GetProperty("status").GetInt32();

            if (status == 2)
            {
                Console.WriteLine("Transcription completed!");
                break;
            }
            else if (status == 3)
            {
                Console.WriteLine("Transcription failed!");
                break;
            }

            await Task.Delay(5000);
        }
    }
}

Supported Languages

The Speech-to-Text API supports automatic language detection as well as explicit language selection for 100+ languages including:

  • English, Spanish, French, German, Italian, Portuguese
  • Chinese (Simplified & Traditional), Japanese, Korean
  • Russian, Polish, Dutch, Swedish, Danish, Norwegian
  • Hindi, Arabic, Hebrew, Turkish, Thai
  • And many more...

For a complete list, see the Language enum in the API Reference.

Next Steps

  • Learn about Features - Silence detection, quality metrics, channel analysis
  • Explore Audio Intelligence - Extract insights from your transcriptions
  • Check Insights - Access analytics on your call data
  • View API Reference - Complete API documentation

Troubleshooting

Common Issues

Issue: 401 Unauthorized

  • Verify your Bearer token is valid and not expired
  • Check the Authorization header format

Issue: Large file takes too long

  • Use HasPriority: true for faster processing
  • Consider splitting very large files

Issue: Transcript has low quality

  • Ensure your audio file is clear and not corrupted
  • Try explicit language specification instead of Auto
  • Check audio levels and background noise

For more help, contact support.