Speech-to-Text Features
Transcription Quality Metrics
SpeechLytics provides comprehensive quality metrics for every transcription to help you understand the audio characteristics and processing quality.
Quality Score
Each transcript receives a quality score (0-100) based on:
- Audio clarity and signal-to-noise ratio
- Language detection confidence
- Speaker diarization accuracy
- Overall transcription accuracy
```json
{
  "score": 95.5,
  "inScope": true,
  "audioType": 1,
  "audioTypeDescription": "Stereo"
}
```
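If you triage calls by quality before deeper analysis, a small helper like the sketch below can bucket results by score. The 70/90 cut-offs and the bucket names are illustrative choices, not values defined by the API.

```python
def triage_by_quality(result: dict) -> str:
    """Bucket a transcription result by its quality score.

    The 70/90 cut-offs are illustrative defaults; tune them for your data.
    """
    if not result.get("inScope", True):
        return "out-of-scope"
    score = result.get("score", 0)
    if score >= 90:
        return "high"
    if score >= 70:
        return "review"
    return "low"

print(triage_by_quality({"score": 95.5, "inScope": True}))  # -> high
```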
Audio Analysis
Audio Type Detection
The API automatically detects and reports the audio format:
| Type | Code | Description |
|---|---|---|
| Mono | 0 | Single channel audio |
| Stereo | 1 | Two-channel audio (left and right) |
Channel-Specific Analysis
For stereo audio, SpeechLytics analyzes each channel separately:
```json
{
  "leftChannelSilence": 5,
  "rightChannelSilence": 3,
  "bothChannelSilence": 1,
  "leftChannelMusic": 2,
  "rightChannelMusic": 0,
  "bothChannelMusic": 0,
  "leftChannelNoise": 10,
  "rightChannelNoise": 8,
  "bothChannelNoise": 2,
  "leftChannelLaughter": 1,
  "rightChannelLaughter": 1,
  "bothChannelLaughter": 0
}
```
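One way to use these per-channel percentages is to flag channels dominated by non-speech audio before trusting their transcript. The sketch below assumes the response fields shown above; the 30% threshold is an arbitrary example.

```python
def flag_noisy_channels(analysis: dict, threshold: float = 30.0) -> list:
    """Return channels whose silence + music + noise share exceeds `threshold`.

    `threshold` (30%) is an illustrative default, not an API value.
    """
    flagged = []
    for channel in ("left", "right"):
        non_speech = sum(
            analysis.get(f"{channel}Channel{kind}", 0)
            for kind in ("Silence", "Music", "Noise")
        )
        if non_speech > threshold:
            flagged.append(channel)
    return flagged

analysis = {
    "leftChannelSilence": 5, "leftChannelMusic": 2, "leftChannelNoise": 10,
    "rightChannelSilence": 3, "rightChannelMusic": 0, "rightChannelNoise": 8,
}
print(flag_noisy_channels(analysis))  # -> [] (both channels are below 30%)
```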
Metrics Explanation
- Silence Duration: percentage of call time with no speech
- Music Duration: percentage of call time with background music
- Noise Duration: percentage of call time with background noise
- Laughter Duration: percentage of call time with laughter detected
- Channel Termination: whether each channel ended cleanly, reported per channel:
```json
{
  "leftTerminatedOk": true,
  "rightTerminatedOk": true,
  "bothTerminatedOk": true
}
```
Transcription Content
Multi-Channel Transcription
Transcriptions are provided for different channel combinations:
```json
{
  "transcription": {
    "language": "en",
    "leftChannel": [
      {
        "start": 0.5,
        "end": 2.3,
        "duration": 1.8,
        "content": "Hello, this is the agent speaking",
        "language": "en",
        "channel": "left"
      }
    ],
    "rightChannel": [
      {
        "start": 2.5,
        "end": 4.1,
        "duration": 1.6,
        "content": "Hi there, I need help with my account",
        "language": "en",
        "channel": "right"
      }
    ],
    "bothChannels": [...]
  }
}
```
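Because each segment carries its own start time and channel, you can interleave the leftChannel and rightChannel arrays to reconstruct the conversation in speaking order. A minimal sketch, assuming segments shaped like the example above:

```python
def merge_channels(transcription: dict) -> list:
    """Interleave left- and right-channel segments by start time."""
    segments = transcription.get("leftChannel", []) + transcription.get("rightChannel", [])
    return sorted(segments, key=lambda seg: seg["start"])

transcription = {
    "leftChannel": [{"start": 0.5, "end": 2.3, "content": "Hello, this is the agent speaking", "channel": "left"}],
    "rightChannel": [{"start": 2.5, "end": 4.1, "content": "Hi there, I need help with my account", "channel": "right"}],
}
for seg in merge_channels(transcription):
    print(f'[{seg["channel"]}] {seg["start"]:.1f}s: {seg["content"]}')
```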
Transcription Fields
| Field | Description |
|---|---|
| start | Start time in seconds |
| end | End time in seconds |
| duration | Duration in seconds |
| content | The transcribed text |
| language | Detected language code |
| channel | Which channel(s) this segment came from |
Speaker Frequency Analysis
Understand speaker distribution in your conversations:
```json
{
  "speakerFrequencyPercentage": {
    "left": 48.5,
    "right": 51.5
  }
}
```
This helps you identify:
- Agent vs. customer talk time
- Speaker balance
- Engagement patterns
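For example, a monitoring job could warn when one side dominates the call. The sketch below reads the speakerFrequencyPercentage object shown above; the 65% threshold is an arbitrary example, not API behaviour.

```python
def talk_time_alert(freq: dict, max_share: float = 65.0):
    """Return a warning string if either channel exceeds `max_share` percent
    of the talk time; 65% is an illustrative threshold."""
    for channel, share in freq.items():
        if share > max_share:
            return f"{channel} channel holds {share:.1f}% of talk time"
    return None

print(talk_time_alert({"left": 48.5, "right": 51.5}))  # -> None (balanced call)
```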
Word Frequency Analysis
Get insights into the most frequently used words:
```json
{
  "wordFrequencyLeft": [
    { "word": "customer", "count": 12 },
    { "word": "help", "count": 8 },
    { "word": "account", "count": 7 }
  ],
  "wordFrequencyRight": [...],
  "wordFrequencyBoth": [...]
}
```
Use Cases
- Identify key discussion topics
- Track terminology usage
- Detect script adherence
- Monitor compliance keywords
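For the compliance use case, a simple check is to verify that required terms appear in the word-frequency lists at all. The required terms below are hypothetical placeholders; the field layout follows the example above.

```python
REQUIRED_TERMS = {"account", "verification"}  # hypothetical compliance terms

def missing_terms(word_frequency: list, required=REQUIRED_TERMS) -> set:
    """Return required terms that never appear in a channel's word-frequency list."""
    spoken = {entry["word"].lower() for entry in word_frequency}
    return set(required) - spoken

word_frequency_left = [
    {"word": "customer", "count": 12},
    {"word": "help", "count": 8},
    {"word": "account", "count": 7},
]
print(missing_terms(word_frequency_left))  # -> {'verification'}
```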
Call Duration Tracking
Monitor conversation length and timing:
```json
{
  "duration": 180,
  "ringing": 15,
  "filteredPhraseDuration": 165,
  "created": "2025-11-28T10:00:00Z",
  "modified": "2025-11-28T10:05:00Z"
}
```
| Field | Description |
|---|---|
| duration | Total call length in seconds |
| ringing | Time spent ringing before connection |
| filteredPhraseDuration | Actual conversation time (excluding silence/music/noise) |
| created | Timestamp when transcription was initiated |
| modified | Timestamp when transcription was completed |
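These fields make it easy to derive a talk ratio: how much of the connected call was actual conversation. A minimal sketch, assuming all values are in seconds as documented above:

```python
def conversation_share(call: dict) -> float:
    """Fraction of connected time (duration minus ringing) that was
    actual conversation (filteredPhraseDuration)."""
    connected = call["duration"] - call.get("ringing", 0)
    if connected <= 0:
        return 0.0
    return call["filteredPhraseDuration"] / connected

print(conversation_share({"duration": 180, "ringing": 15, "filteredPhraseDuration": 165}))  # -> 1.0
```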
Keyword Matching
Detect specific keywords and phrases in conversations:
```json
{
  "keywords": [
    {
      "name": "urgent",
      "score": 95,
      "isExclusive": false,
      "outOfScope": false,
      "tags": [
        {
          "name": "priority",
          "frequency": 3
        }
      ],
      "channelMatch": "left"
    }
  ]
}
```
Keyword Fields
| Field | Description |
|---|---|
| name | The keyword or phrase detected |
| score | Confidence score (0-100) |
| isExclusive | Whether keyword is restricted |
| outOfScope | Whether keyword is out of scope |
| tags | Associated tags and frequencies |
| channelMatch | Which channel detected the keyword |
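When acting on keyword matches, you will usually filter by confidence, scope, and channel. The sketch below does exactly that against the structure above; the score threshold of 80 is an arbitrary example, not an API default.

```python
def strong_keyword_hits(keywords: list, min_score: int = 80, channel=None) -> list:
    """Keep high-confidence, in-scope keyword matches, optionally limited
    to one channel. `min_score` (80) is an illustrative threshold."""
    return [
        kw["name"]
        for kw in keywords
        if kw["score"] >= min_score
        and not kw.get("outOfScope", False)
        and (channel is None or kw.get("channelMatch") == channel)
    ]

keywords = [{"name": "urgent", "score": 95, "outOfScope": False, "channelMatch": "left"}]
print(strong_keyword_hits(keywords, channel="left"))  # -> ['urgent']
```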
Priority Processing
For time-sensitive transcriptions, use the HasPriority flag:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "urgent_call.wav",
    "Language": "Auto",
    "HasPriority": true
  }'
```
Benefits:
- Faster processing
- Higher queue priority
- Ideal for critical calls

Note: priority processing may incur additional costs.
Metadata Support
Store custom metadata with each transcription:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call.wav",
    "Metadata": "agent_id=5678;campaign=retention;customer_id=9012;priority=high"
  }'
```
Metadata can be used for:
- Tracking agent performance
- Campaign attribution
- Customer segmentation
- Call categorization
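Since Metadata is passed as a single semicolon-delimited string of key=value pairs (as in the example above), a small helper to build and parse it keeps the format consistent. This is a client-side convenience sketch, not part of the API.

```python
def encode_metadata(fields: dict) -> str:
    """Serialize key/value pairs into the semicolon-delimited Metadata string."""
    return ";".join(f"{key}={value}" for key, value in fields.items())

def decode_metadata(metadata: str) -> dict:
    """Parse a Metadata string back into a dictionary."""
    return dict(pair.split("=", 1) for pair in metadata.split(";") if pair)

raw = "agent_id=5678;campaign=retention;customer_id=9012;priority=high"
print(decode_metadata(raw)["campaign"])              # -> retention
print(encode_metadata(decode_metadata(raw)) == raw)  # -> True
```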
Filename Existence Check
Prevent duplicate transcriptions:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call_2025_11_28_123456.wav",
    "CheckFilenameExistence": true
  }'
```
When enabled:
- API checks if filename already exists
- Returns existing result if found
- Prevents redundant processing
- Saves processing time and costs
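The check only helps if the same recording always maps to the same filename. One approach (an illustration, not an API requirement) is to derive the filename from a hash of the audio bytes:

```python
import hashlib

def deterministic_filename(audio_bytes: bytes, extension: str = "wav") -> str:
    """Derive a stable filename from the audio content so re-uploads of the
    same recording hit the existing transcription. Naming scheme is illustrative."""
    digest = hashlib.sha256(audio_bytes).hexdigest()[:16]
    return f"call_{digest}.{extension}"

sample = b"RIFF....WAVEfmt "  # placeholder bytes standing in for real audio data
print(deterministic_filename(sample))
```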
Supported Audio Formats
- WAV (.wav)
- MP3 (.mp3)
- M4A (.m4a)
- OGG (.ogg)
- FLAC (.flac)
- AAC (.aac)
Language Support
Automatic language detection or manual specification for 100+ languages:
- English, Spanish, French, German, Italian, Portuguese
- Chinese (Simplified & Traditional), Japanese, Korean
- Russian, Polish, Dutch, Swedish, Danish, Norwegian
- Hindi, Arabic, Hebrew, Turkish, Thai
- And many more...
For best results, specify the language if known in advance.
Best Practices
1. Audio Quality
- Use clear audio with minimal background noise
- Ensure proper audio levels (not too loud or too quiet)
- Use stereo when possible for better separation
2. Processing Strategy
- Use `HasPriority: true` for urgent calls only
- Batch non-urgent transcriptions to save costs
- Implement exponential backoff for status checks (see the polling sketch after this list)
3. Error Handling
- Handle status code 3 (Failed) gracefully
- Implement retry logic for transient failures
- Monitor quota limits (status code 4)
4. Data Management
- Store transcription IDs for future reference
- Use metadata for call categorization
- Archive results after processing
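The polling advice in points 2 and 3 can be combined into one helper. The sketch below is an assumption-heavy illustration: the status route (/transcriptions/{id}), the `status` field name, and the completion check are placeholders, so substitute the actual endpoint and schema from the API Reference. Status codes 3 (Failed) and 4 (quota) are the ones mentioned above.

```python
import time

import requests  # third-party HTTP client: pip install requests

API_BASE = "https://api.example.com/api/v1"  # base URL used in the curl examples above
FAILED, QUOTA_EXCEEDED = 3, 4                # status codes referenced in Best Practices

def wait_for_transcription(transcription_id: str, token: str, max_attempts: int = 8) -> dict:
    """Poll for a finished transcription with exponential backoff.

    Assumes a hypothetical GET /transcriptions/{id} endpoint that returns a
    `status` code and, once done, the transcription payload.
    """
    delay = 2.0
    for _ in range(max_attempts):
        resp = requests.get(
            f"{API_BASE}/transcriptions/{transcription_id}",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        status = body.get("status")
        if status == FAILED:
            raise RuntimeError(f"Transcription {transcription_id} failed")
        if status == QUOTA_EXCEEDED:
            raise RuntimeError("Quota limit reached; pause new submissions")
        if body.get("transcription"):     # heuristic completion check
            return body
        time.sleep(delay)
        delay = min(delay * 2, 60)        # exponential backoff, capped at 60 s
    raise TimeoutError("Transcription did not complete within the polling window")
```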
Next Steps
- Getting Started - Basic setup
- Audio Intelligence - Extract insights
- API Reference - Complete endpoint documentation