Speech-to-Text Features

Transcription Quality Metrics

SpeechLytics provides comprehensive quality metrics for every transcription to help you understand the audio characteristics and processing quality.

Quality Score

Each transcript receives a quality score (0-100) based on:

  • Audio clarity and signal-to-noise ratio
  • Language detection confidence
  • Speaker diarization accuracy
  • Overall transcription accuracy

{
  "score": 95.5,
  "inScope": true,
  "audioType": 1,
  "audioTypeDescription": "Stereo"
}
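
As an illustration of how the score might be consumed, the sketch below flags transcripts whose score falls under a threshold. The function name needs_review and the threshold of 80 are assumptions made for this example, not part of the SpeechLytics API.

```python
from typing import Any


def needs_review(quality: dict[str, Any], min_score: float = 80.0) -> bool:
    """Return True when a transcript's quality score falls below min_score."""
    # A missing score is treated as 0 so incomplete responses get flagged.
    return float(quality.get("score", 0.0)) < min_score


# Sample object copied from the response shown above.
sample = {"score": 95.5, "inScope": True, "audioType": 1, "audioTypeDescription": "Stereo"}
print(needs_review(sample))  # False
```

A suitable threshold depends on your audio sources, so treat 80 as a placeholder to tune.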

Audio Analysis

Audio Type Detection

The API automatically detects and reports the audio format:

Type   | Code | Description
-------|------|------------------------------------
Mono   | 0    | Single channel audio
Stereo | 1    | Two-channel audio (left and right)

Channel-Specific Analysis

For stereo audio, SpeechLytics analyzes each channel separately:

{
  "leftChannelSilence": 5,
  "rightChannelSilence": 3,
  "bothChannelSilence": 1,
  "leftChannelMusic": 2,
  "rightChannelMusic": 0,
  "bothChannelMusic": 0,
  "leftChannelNoise": 10,
  "rightChannelNoise": 8,
  "bothChannelNoise": 2,
  "leftChannelLaughter": 1,
  "rightChannelLaughter": 1,
  "bothChannelLaughter": 0
}

Metrics Explanation

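  • Silence Duration (leftChannelSilence, rightChannelSilence, bothChannelSilence): Percentage of time with no speech
  • Music Duration (leftChannelMusic, rightChannelMusic, bothChannelMusic): Percentage of time with background music
  • Noise Duration (leftChannelNoise, rightChannelNoise, bothChannelNoise): Percentage of time with background noise
  • Laughter Duration (leftChannelLaughter, rightChannelLaughter, bothChannelLaughter): Percentage of time with laughter detected
  • Channel Termination (leftTerminatedOk, rightTerminatedOk, bothTerminatedOk): Whether each channel ended properly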

{
  "leftTerminatedOk": true,
  "rightTerminatedOk": true,
  "bothTerminatedOk": true
}
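
To make the per-channel fields easier to work with, the following sketch regroups them by channel. The helper channel_breakdown is hypothetical, and the field names are taken from the channel analysis sample above.

```python
def channel_breakdown(analysis: dict, channel: str) -> dict:
    """Collect the silence/music/noise/laughter values for one channel prefix.

    channel is one of "left", "right" or "both", matching the field prefixes.
    """
    categories = ("Silence", "Music", "Noise", "Laughter")
    # Missing fields default to 0 so partial responses do not raise.
    return {c.lower(): analysis.get(f"{channel}Channel{c}", 0) for c in categories}


analysis = {
    "leftChannelSilence": 5, "rightChannelSilence": 3, "bothChannelSilence": 1,
    "leftChannelMusic": 2, "rightChannelMusic": 0, "bothChannelMusic": 0,
    "leftChannelNoise": 10, "rightChannelNoise": 8, "bothChannelNoise": 2,
    "leftChannelLaughter": 1, "rightChannelLaughter": 1, "bothChannelLaughter": 0,
}

left = channel_breakdown(analysis, "left")
print(left)                      # {'silence': 5, 'music': 2, 'noise': 10, 'laughter': 1}
print(max(left, key=left.get))   # noise
```

Grouping by channel makes it straightforward to spot, for example, which side of a call carries the most background noise.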

Transcription Content

Multi-Channel Transcription

Transcriptions are provided for different channel combinations:

{
  "transcription": {
    "language": "en",
    "leftChannel": [
      {
        "start": 0.5,
        "end": 2.3,
        "duration": 1.8,
        "content": "Hello, this is the agent speaking",
        "language": "en",
        "channel": "left"
      }
    ],
    "rightChannel": [
      {
        "start": 2.5,
        "end": 4.1,
        "duration": 1.6,
        "content": "Hi there, I need help with my account",
        "language": "en",
        "channel": "right"
      }
    ],
    "bothChannels": [...]
  }
}

Transcription Fields

Field    | Description
---------|------------------------------------------
start    | Start time in seconds
end      | End time in seconds
duration | Duration in seconds
content  | The transcribed text
language | Detected language code
channel  | Which channel(s) this segment came from
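
The segment fields above are enough to rebuild a conversation in chronological order. The sketch below merges the leftChannel and rightChannel lists by start time. The helper merge_channels is hypothetical, and because the bothChannels array is elided in the sample, its shape is not assumed here.

```python
def merge_channels(transcription: dict) -> list[dict]:
    """Interleave left and right channel segments in order of start time."""
    segments = list(transcription.get("leftChannel", []))
    segments += transcription.get("rightChannel", [])
    return sorted(segments, key=lambda seg: seg["start"])


# Data copied from the multi-channel sample response.
transcription = {
    "language": "en",
    "leftChannel": [
        {"start": 0.5, "end": 2.3, "duration": 1.8,
         "content": "Hello, this is the agent speaking",
         "language": "en", "channel": "left"},
    ],
    "rightChannel": [
        {"start": 2.5, "end": 4.1, "duration": 1.6,
         "content": "Hi there, I need help with my account",
         "language": "en", "channel": "right"},
    ],
}

for seg in merge_channels(transcription):
    print(f"[{seg['start']:.1f}s] {seg['channel']}: {seg['content']}")
```

If the API's bothChannels array already provides this ordering, prefer it over merging on the client.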

Speaker Frequency Analysis

Understand speaker distribution in your conversations:

{
  "speakerFrequencyPercentage": {
    "left": 48.5,
    "right": 51.5
  }
}

This helps you identify:

  • Agent vs. customer talk time
  • Speaker balance
  • Engagement patterns
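
One way to act on these percentages is to label a call as balanced or dominated by one channel. In the sketch below, the talk_balance helper and the 10-point tolerance are assumptions chosen for illustration.

```python
def talk_balance(freq: dict, tolerance: float = 10.0) -> str:
    """Classify a call by the gap between left and right talk percentages."""
    diff = freq.get("left", 0.0) - freq.get("right", 0.0)
    if abs(diff) <= tolerance:
        return "balanced"
    return "left-dominant" if diff > 0 else "right-dominant"


print(talk_balance({"left": 48.5, "right": 51.5}))  # balanced
print(talk_balance({"left": 80.0, "right": 20.0}))  # left-dominant
```

Which channel carries the agent and which the customer depends on how your recordings are made, so map the labels accordingly.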

Word Frequency Analysis

Get insights into the most frequently used words:

{
  "wordFrequencyLeft": [
    {
      "word": "customer",
      "count": 12
    },
    {
      "word": "help",
      "count": 8
    },
    {
      "word": "account",
      "count": 7
    }
  ],
  "wordFrequencyRight": [...],
  "wordFrequencyBoth": [...]
}

Use Cases

  • Identify key discussion topics
  • Track terminology usage
  • Detect scripting adherence
  • Monitor compliance keywords
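
As a sketch of the scripting-adherence and compliance use cases, the snippet below reports which required terms never appear in a word frequency list. The helper missing_terms and the required word list are hypothetical.

```python
def missing_terms(word_frequency: list[dict], required: set[str]) -> list[str]:
    """Return the required terms that do not appear in the frequency list."""
    seen = {entry["word"].lower() for entry in word_frequency if entry.get("count", 0) > 0}
    # Sorted so the result is stable regardless of set ordering.
    return sorted(term for term in required if term.lower() not in seen)


# Entries copied from the wordFrequencyLeft sample.
word_frequency_left = [
    {"word": "customer", "count": 12},
    {"word": "help", "count": 8},
    {"word": "account", "count": 7},
]

print(missing_terms(word_frequency_left, {"help", "account", "recorded"}))  # ['recorded']
```

This only checks single words, since the sample response lists individual words rather than phrases.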

Call Duration Tracking

Monitor conversation length and timing:

{
  "duration": 180,
  "ringing": 15,
  "filteredPhraseDuration": 165,
  "created": "2025-11-28T10:00:00Z",
  "modified": "2025-11-28T10:05:00Z"
}

Field                  | Description
-----------------------|----------------------------------------------------------
duration               | Total call length in seconds
ringing                | Time spent ringing before connection
filteredPhraseDuration | Actual conversation time (excluding silence/music/noise)
created                | Timestamp when transcription was initiated
modified               | Timestamp when transcription was completed
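
These fields can be combined into derived figures. The sketch below computes the share of the call spent in conversation and the time between created and modified. The helper call_summary and its output keys are assumptions for this example.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def call_summary(call: dict) -> dict:
    """Derive the conversation ratio and processing time for one call."""
    total = call["duration"]
    talk_ratio = call["filteredPhraseDuration"] / total if total else 0.0
    elapsed = parse_utc(call["modified"]) - parse_utc(call["created"])
    return {
        "talk_ratio": round(talk_ratio, 3),
        "processing_seconds": elapsed.total_seconds(),
    }


call = {
    "duration": 180,
    "ringing": 15,
    "filteredPhraseDuration": 165,
    "created": "2025-11-28T10:00:00Z",
    "modified": "2025-11-28T10:05:00Z",
}
print(call_summary(call))  # {'talk_ratio': 0.917, 'processing_seconds': 300.0}
```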

Keyword Matching

Detect specific keywords and phrases in conversations:

{
  "keywords": [
    {
      "name": "urgent",
      "score": 95,
      "isExclusive": false,
      "outOfScope": false,
      "tags": [
        {
          "name": "priority",
          "frequency": 3
        }
      ],
      "channelMatch": "left"
    }
  ]
}

Keyword Fields

Field        | Description
-------------|------------------------------------
name         | The keyword or phrase detected
score        | Confidence score (0-100)
isExclusive  | Whether keyword is restricted
outOfScope   | Whether keyword is out of scope
tags         | Associated tags and frequencies
channelMatch | Which channel detected the keyword
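
A typical next step is to keep only confident, in-scope matches. The sketch below filters the keywords array by score and, optionally, by channel. The helper confident_keywords and the cutoff of 80 are assumptions.

```python
from typing import Optional


def confident_keywords(keywords: list[dict], min_score: float = 80,
                       channel: Optional[str] = None) -> list[str]:
    """Return names of in-scope keywords at or above min_score.

    When channel is given, only matches from that channel are kept.
    """
    return [
        kw["name"]
        for kw in keywords
        if kw.get("score", 0) >= min_score
        and not kw.get("outOfScope", False)
        and (channel is None or kw.get("channelMatch") == channel)
    ]


# Entry copied from the keyword matching sample.
keywords = [
    {"name": "urgent", "score": 95, "isExclusive": False, "outOfScope": False,
     "tags": [{"name": "priority", "frequency": 3}], "channelMatch": "left"},
]

print(confident_keywords(keywords))                   # ['urgent']
print(confident_keywords(keywords, channel="right"))  # []
```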

Priority Processing

For time-sensitive transcriptions, use the HasPriority flag:

curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "urgent_call.wav",
    "Language": "Auto",
    "HasPriority": true
  }'

Benefits:

  • Faster processing
  • Higher queue priority
  • Ideal for critical calls

Note that priority processing may incur additional costs.
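
The request body can also be assembled programmatically. The sketch below builds the JSON for a priority request. It assumes DataBase64 carries the standard Base64 encoding of the raw audio file, which the field name suggests but this page does not state, and build_transcribe_body is a hypothetical helper.

```python
import base64
import json


def build_transcribe_body(audio: bytes, filename: str, language: str = "Auto",
                          has_priority: bool = False) -> str:
    """Serialize a transcription request body as JSON."""
    payload = {
        # Assumption: the API expects standard Base64 of the raw file bytes.
        "DataBase64": base64.b64encode(audio).decode("ascii"),
        "Filename": filename,
        "Language": language,
        "HasPriority": has_priority,
    }
    return json.dumps(payload)


# Placeholder bytes stand in for real audio content.
body = build_transcribe_body(b"RIFF", "urgent_call.wav", has_priority=True)
print(body)
```

Keeping has_priority off by default means priority processing is only requested when a caller opts in.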

Metadata Support

Store custom metadata with each transcription:

curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call.wav",
    "Metadata": "agent_id=5678;campaign=retention;customer_id=9012;priority=high"
  }'

Metadata can be used for:

  • Tracking agent performance
  • Campaign attribution
  • Customer segmentation
  • Call categorization
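
The example above passes metadata as semicolon-separated key=value pairs. The sketch below encodes and decodes that layout. The format is inferred from the example alone, so the API may accept other strings, and both helpers are hypothetical.

```python
def encode_metadata(fields: dict[str, str]) -> str:
    """Join fields as 'key=value' pairs separated by semicolons."""
    for key, value in fields.items():
        # Reject separators inside keys or values to keep the string parseable.
        if any(ch in "=;" for ch in key + value):
            raise ValueError(f"'=' and ';' are not allowed in metadata: {key!r}")
    return ";".join(f"{key}={value}" for key, value in fields.items())


def decode_metadata(raw: str) -> dict[str, str]:
    """Split a 'k=v;k=v' string back into a dictionary."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)


raw = encode_metadata({"agent_id": "5678", "campaign": "retention",
                       "customer_id": "9012", "priority": "high"})
print(raw)  # agent_id=5678;campaign=retention;customer_id=9012;priority=high
print(decode_metadata(raw)["campaign"])  # retention
```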

Filename Existence Check

Prevent duplicate transcriptions:

curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call_2025_11_28_123456.wav",
    "CheckFilenameExistence": true
  }'

When enabled:

  • API checks if filename already exists
  • Returns existing result if found
  • Prevents redundant processing
  • Saves processing time and costs
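
The check relies on filenames being stable across retries, so it helps to derive them deterministically. The sketch below builds a name in the same style as the example from a call date and an identifier. The naming scheme and the helper stable_filename are assumptions, not an API requirement.

```python
from datetime import date


def stable_filename(call_date: date, call_id: str, extension: str = ".wav") -> str:
    """Build the same filename every time for the same call."""
    return f"call_{call_date:%Y_%m_%d}_{call_id}{extension}"


def dedup_request(data_base64: str, filename: str) -> dict:
    """Request fields for a submission with the existence check enabled."""
    return {
        "DataBase64": data_base64,
        "Filename": filename,
        "CheckFilenameExistence": True,
    }


name = stable_filename(date(2025, 11, 28), "123456")
print(name)  # call_2025_11_28_123456.wav
print(dedup_request("...", name)["CheckFilenameExistence"])  # True
```

Names that include random suffixes or upload timestamps would defeat the check, because a resubmitted call would never match an existing filename.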

Supported Audio Formats

  • WAV (.wav)
  • MP3 (.mp3)
  • M4A (.m4a)
  • OGG (.ogg)
  • FLAC (.flac)
  • AAC (.aac)
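
A client can reject unsupported files before uploading. The sketch below checks a filename's extension against the list above. The helper is_supported is hypothetical, and it inspects only the extension, not the file contents.

```python
from pathlib import Path

# Extensions taken from the supported formats list.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flac", ".aac"}


def is_supported(filename: str) -> bool:
    """Return True when the file extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("urgent_call.wav"))  # True
print(is_supported("notes.txt"))        # False
```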

Language Support

SpeechLytics supports automatic language detection and manual language specification for 100+ languages, including:

  • English, Spanish, French, German, Italian, Portuguese
  • Chinese (Simplified & Traditional), Japanese, Korean
  • Russian, Polish, Dutch, Swedish, Danish, Norwegian
  • Hindi, Arabic, Hebrew, Turkish, Thai
  • And many more...

For best results, specify the language if known in advance.

Best Practices

1. Audio Quality

  • Use clear audio with minimal background noise
  • Ensure proper audio levels (not too loud or too quiet)
  • Use stereo when possible for better separation

2. Processing Strategy

  • Use HasPriority: true for urgent calls only
  • Batch non-urgent transcriptions to save costs
  • Implement exponential backoff for status checks
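
Exponential backoff for status checks can be sketched as follows. The delay values, the cap, and the check callable are assumptions, and the actual status call is left to the caller because the status endpoint is not described in this section.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 6) -> Iterator[float]:
    """Yield delays that grow by factor each attempt, limited to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def poll_until_done(check: Callable[[], bool],
                    sleep: Callable[[float], None] = time.sleep,
                    **backoff: float) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for delay in backoff_delays(**backoff):
        if check():
            return True
        sleep(delay)  # wait longer after each unsuccessful check
    return False


print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Passing sleep as a parameter keeps the loop testable without real waiting. Adding random jitter to each delay is a common refinement when many clients poll at once.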

3. Error Handling

  • Handle status code 3 (Failed) gracefully
  • Implement retry logic for transient failures
  • Monitor quota limits (status code 4)
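
The points above can be combined into a small decision helper. In the sketch below, only status codes 3 and 4 come from this page. The action names, the retry limit, and the treatment of every other status as still in progress are assumptions.

```python
STATUS_FAILED = 3          # "Failed", as named in the list above
STATUS_QUOTA_EXCEEDED = 4  # quota limit reached


def next_action(status: int, attempt: int, max_retries: int = 3) -> str:
    """Decide what a client should do after reading a status code."""
    if status == STATUS_QUOTA_EXCEEDED:
        # Retrying immediately will not help once the quota is exhausted.
        return "pause"
    if status == STATUS_FAILED:
        return "retry" if attempt < max_retries else "give_up"
    # Assumption: any other code means there is no error to handle yet.
    return "continue"


print(next_action(3, attempt=1))  # retry
print(next_action(3, attempt=3))  # give_up
print(next_action(4, attempt=1))  # pause
```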

4. Data Management

  • Store transcription IDs for future reference
  • Use metadata for call categorization
  • Archive results after processing

Next Steps