Speech-to-Text Features
Transcription Quality Metrics
SpeechLytics provides comprehensive quality metrics for every transcription to help you understand the audio characteristics and processing quality.
Quality Score
Each transcript receives a quality score (0-100) based on:
- Audio clarity and signal-to-noise ratio
- Language detection confidence
- Speaker diarization accuracy
- Overall transcription accuracy
```json
{
  "score": 95.5,
  "inScope": true,
  "audioType": 1,
  "audioTypeDescription": "Stereo"
}
```
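If you triage calls by quality before deeper analysis, a small helper like the sketch below can bucket results by score. The 70/90 cut-offs and the bucket names are illustrative choices, not values defined by the API.

```python
def triage_by_quality(result: dict) -> str:
    """Bucket a transcription result by its quality score.

    The 70/90 cut-offs are illustrative defaults; tune them for your data.
    """
    if not result.get("inScope", True):
        return "out-of-scope"
    score = result.get("score", 0)
    if score >= 90:
        return "high"
    if score >= 70:
        return "review"
    return "low"

print(triage_by_quality({"score": 95.5, "inScope": True}))  # -> high
```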
Audio Analysis
Audio Type Detection
The API automatically detects and reports the audio format:
| Type | Code | Description |
|---|---|---|
| Mono | 0 | Single channel audio |
| Stereo | 1 | Two-channel audio (left and right) |
Channel-Specific Analysis
For stereo audio, SpeechLytics analyzes each channel separately:
```json
{
  "leftChannelSilence": 5,
  "rightChannelSilence": 3,
  "bothChannelSilence": 1,
  "leftChannelMusic": 2,
  "rightChannelMusic": 0,
  "bothChannelMusic": 0,
  "leftChannelNoise": 10,
  "rightChannelNoise": 8,
  "bothChannelNoise": 2,
  "leftChannelLaughter": 1,
  "rightChannelLaughter": 1,
  "bothChannelLaughter": 0
}
```
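One way to use these per-channel percentages is to flag channels dominated by non-speech audio before trusting their transcript. The sketch below assumes the response fields shown above; the 30% threshold is an arbitrary example.

```python
def flag_noisy_channels(analysis: dict, threshold: float = 30.0) -> list:
    """Return channels whose silence + music + noise share exceeds `threshold`.

    `threshold` (30%) is an illustrative default, not an API value.
    """
    flagged = []
    for channel in ("left", "right"):
        non_speech = sum(
            analysis.get(f"{channel}Channel{kind}", 0)
            for kind in ("Silence", "Music", "Noise")
        )
        if non_speech > threshold:
            flagged.append(channel)
    return flagged

analysis = {
    "leftChannelSilence": 5, "leftChannelMusic": 2, "leftChannelNoise": 10,
    "rightChannelSilence": 3, "rightChannelMusic": 0, "rightChannelNoise": 8,
}
print(flag_noisy_channels(analysis))  # -> [] (both channels are below 30%)
```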
Metrics Explanation
- Silence Duration: percentage of call time with no speech
- Music Duration: percentage of call time with background music
- Noise Duration: percentage of call time with background noise
- Laughter Duration: percentage of call time with laughter detected
- Channel Termination: whether each channel ended cleanly, reported per channel:
```json
{
  "leftTerminatedOk": true,
  "rightTerminatedOk": true,
  "bothTerminatedOk": true
}
```
Transcription Content
Multi-Channel Transcription
Transcriptions are provided for different channel combinations:
```json
{
  "transcription": {
    "language": "en",
    "leftChannel": [
      {
        "start": 0.5,
        "end": 2.3,
        "duration": 1.8,
        "content": "Hello, this is the agent speaking",
        "language": "en",
        "channel": "left"
      }
    ],
    "rightChannel": [
      {
        "start": 2.5,
        "end": 4.1,
        "duration": 1.6,
        "content": "Hi there, I need help with my account",
        "language": "en",
        "channel": "right"
      }
    ],
    "bothChannels": [...]
  }
}
```
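Because each segment carries its own start time and channel, you can interleave the leftChannel and rightChannel arrays to reconstruct the conversation in speaking order. A minimal sketch, assuming segments shaped like the example above:

```python
def merge_channels(transcription: dict) -> list:
    """Interleave left- and right-channel segments by start time."""
    segments = transcription.get("leftChannel", []) + transcription.get("rightChannel", [])
    return sorted(segments, key=lambda seg: seg["start"])

transcription = {
    "leftChannel": [{"start": 0.5, "end": 2.3, "content": "Hello, this is the agent speaking", "channel": "left"}],
    "rightChannel": [{"start": 2.5, "end": 4.1, "content": "Hi there, I need help with my account", "channel": "right"}],
}
for seg in merge_channels(transcription):
    print(f'[{seg["channel"]}] {seg["start"]:.1f}s: {seg["content"]}')
```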
Transcription Fields
| Field | Description |
|---|---|
| start | Start time in seconds |
| end | End time in seconds |
| duration | Duration in seconds |
| content | The transcribed text |
| language | Detected language code |
| channel | Which channel(s) this segment came from |
Speaker Frequency Analysis
Understand speaker distribution in your conversations:
```json
{
  "speakerFrequencyPercentage": {
    "left": 48.5,
    "right": 51.5
  }
}
```
This helps you identify:
- Agent vs. customer talk time
- Speaker balance
- Engagement patterns
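For example, a monitoring job could warn when one side dominates the call. The sketch below reads the speakerFrequencyPercentage object shown above; the 65% threshold is an arbitrary example, not API behaviour.

```python
def talk_time_alert(freq: dict, max_share: float = 65.0):
    """Return a warning string if either channel exceeds `max_share` percent
    of the talk time; 65% is an illustrative threshold."""
    for channel, share in freq.items():
        if share > max_share:
            return f"{channel} channel holds {share:.1f}% of talk time"
    return None

print(talk_time_alert({"left": 48.5, "right": 51.5}))  # -> None (balanced call)
```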
Word Frequency Analysis
Get insights into the most frequently used words:
```json
{
  "wordFrequencyLeft": [
    { "word": "customer", "count": 12 },
    { "word": "help", "count": 8 },
    { "word": "account", "count": 7 }
  ],
  "wordFrequencyRight": [...],
  "wordFrequencyBoth": [...]
}
```
Use Cases
- Identify key discussion topics
- Track terminology usage
- Detect script adherence
- Monitor compliance keywords
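For the compliance use case, a simple check is to verify that required terms appear in the word-frequency lists at all. The required terms below are hypothetical placeholders; the field layout follows the example above.

```python
REQUIRED_TERMS = {"account", "verification"}  # hypothetical compliance terms

def missing_terms(word_frequency: list, required=REQUIRED_TERMS) -> set:
    """Return required terms that never appear in a channel's word-frequency list."""
    spoken = {entry["word"].lower() for entry in word_frequency}
    return set(required) - spoken

word_frequency_left = [
    {"word": "customer", "count": 12},
    {"word": "help", "count": 8},
    {"word": "account", "count": 7},
]
print(missing_terms(word_frequency_left))  # -> {'verification'}
```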
Call Duration Tracking
Monitor conversation length and timing:
```json
{
  "duration": 180,
  "ringing": 15,
  "filteredPhraseDuration": 165,
  "created": "2025-11-28T10:00:00Z",
  "modified": "2025-11-28T10:05:00Z"
}
```
| Field | Description |
|---|---|
| duration | Total call length in seconds |
| ringing | Time spent ringing before connection |
| filteredPhraseDuration | Actual conversation time (excluding silence/music/noise) |
| created | Timestamp when transcription was initiated |
| modified | Timestamp when transcription was completed |
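These fields make it easy to derive a talk ratio: how much of the connected call was actual conversation. A minimal sketch, assuming all values are in seconds as documented above:

```python
def conversation_share(call: dict) -> float:
    """Fraction of connected time (duration minus ringing) that was
    actual conversation (filteredPhraseDuration)."""
    connected = call["duration"] - call.get("ringing", 0)
    if connected <= 0:
        return 0.0
    return call["filteredPhraseDuration"] / connected

print(conversation_share({"duration": 180, "ringing": 15, "filteredPhraseDuration": 165}))  # -> 1.0
```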
Keyword Matching
Detect specific keywords and phrases in conversations:
```json
{
  "keywords": [
    {
      "name": "urgent",
      "score": 95,
      "isExclusive": false,
      "outOfScope": false,
      "tags": [
        {
          "name": "priority",
          "frequency": 3
        }
      ],
      "channelMatch": "left"
    }
  ]
}
```
Keyword Fields
| Field | Description |
|---|---|
| name | The keyword or phrase detected |
| score | Confidence score (0-100) |
| isExclusive | Whether keyword is restricted |
| outOfScope | Whether keyword is out of scope |
| tags | Associated tags and frequencies |
| channelMatch | Which channel detected the keyword |
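When acting on keyword matches, you will usually filter by confidence, scope, and channel. The sketch below does exactly that against the structure above; the score threshold of 80 is an arbitrary example, not an API default.

```python
def strong_keyword_hits(keywords: list, min_score: int = 80, channel=None) -> list:
    """Keep high-confidence, in-scope keyword matches, optionally limited
    to one channel. `min_score` (80) is an illustrative threshold."""
    return [
        kw["name"]
        for kw in keywords
        if kw["score"] >= min_score
        and not kw.get("outOfScope", False)
        and (channel is None or kw.get("channelMatch") == channel)
    ]

keywords = [{"name": "urgent", "score": 95, "outOfScope": False, "channelMatch": "left"}]
print(strong_keyword_hits(keywords, channel="left"))  # -> ['urgent']
```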
Priority Processing
For time-sensitive transcriptions, use the HasPriority flag:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "urgent_call.wav",
    "Language": "Auto",
    "HasPriority": true
  }'
```
Benefits:
- Faster processing
- Higher queue priority
- Ideal for critical calls

Note: priority processing may incur additional costs.
Metadata Support
Store custom metadata with each transcription:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call.wav",
    "Metadata": "agent_id=5678;campaign=retention;customer_id=9012;priority=high"
  }'
```
Metadata can be used for:
- Tracking agent performance
- Campaign attribution
- Customer segmentation
- Call categorization
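Since Metadata is passed as a single semicolon-delimited string of key=value pairs (as in the example above), a small helper to build and parse it keeps the format consistent. This is a client-side convenience sketch, not part of the API.

```python
def encode_metadata(fields: dict) -> str:
    """Serialize key/value pairs into the semicolon-delimited Metadata string."""
    return ";".join(f"{key}={value}" for key, value in fields.items())

def decode_metadata(metadata: str) -> dict:
    """Parse a Metadata string back into a dictionary."""
    return dict(pair.split("=", 1) for pair in metadata.split(";") if pair)

raw = "agent_id=5678;campaign=retention;customer_id=9012;priority=high"
print(decode_metadata(raw)["campaign"])              # -> retention
print(encode_metadata(decode_metadata(raw)) == raw)  # -> True
```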
Filename Existence Check
Prevent duplicate transcriptions:
```bash
curl -X POST "https://api.example.com/api/v1/transcribe" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "DataBase64": "...",
    "Filename": "call_2025_11_28_123456.wav",
    "CheckFilenameExistence": true
  }'
```
When enabled:
- API checks if filename already exists
- Returns existing result if found
- Prevents redundant processing
- Saves processing time and costs
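The check only helps if the same recording always maps to the same filename. One approach (an illustration, not an API requirement) is to derive the filename from a hash of the audio bytes:

```python
import hashlib

def deterministic_filename(audio_bytes: bytes, extension: str = "wav") -> str:
    """Derive a stable filename from the audio content so re-uploads of the
    same recording hit the existing transcription. Naming scheme is illustrative."""
    digest = hashlib.sha256(audio_bytes).hexdigest()[:16]
    return f"call_{digest}.{extension}"

sample = b"RIFF....WAVEfmt "  # placeholder bytes standing in for real audio data
print(deterministic_filename(sample))
```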
Supported Audio Formats
- WAV (.wav)
- MP3 (.mp3)
- M4A (.m4a)
- OGG (.ogg)
- FLAC (.flac)
- AAC (.aac)
Language Support
Automatic language detection or manual specification for 100+ languages:
- English, Spanish, French, German, Italian, Portuguese
- Chinese (Simplified & Traditional), Japanese, Korean
- Russian, Polish, Dutch, Swedish, Danish, Norwegian
- Hindi, Arabic, Hebrew, Turkish, Thai
- And many more...
For best results, specify the language if known in advance.
Best Practices
1. Audio Quality
- Use clear audio with minimal background noise
- Ensure proper audio levels (not too loud or too quiet)
- Use stereo when possible for better separation
2. Processing Strategy
- Use `HasPriority: true` for urgent calls only
- Batch non-urgent transcriptions to save costs
- Implement exponential backoff for status checks (see the polling sketch after this list)
3. Error Handling
- Handle status code 3 (Failed) gracefully
- Implement retry logic for transient failures
- Monitor quota limits (status code 4)
4. Data Management
- Store transcription IDs for future reference
- Use metadata for call categorization
- Archive results after processing
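The polling advice in points 2 and 3 can be combined into one helper. The sketch below is an assumption-heavy illustration: the status route (/transcriptions/{id}), the `status` field name, and the completion check are placeholders, so substitute the actual endpoint and schema from the API Reference. Status codes 3 (Failed) and 4 (quota) are the ones mentioned above.

```python
import time

import requests  # third-party HTTP client: pip install requests

API_BASE = "https://api.example.com/api/v1"  # base URL used in the curl examples above
FAILED, QUOTA_EXCEEDED = 3, 4                # status codes referenced in Best Practices

def wait_for_transcription(transcription_id: str, token: str, max_attempts: int = 8) -> dict:
    """Poll for a finished transcription with exponential backoff.

    Assumes a hypothetical GET /transcriptions/{id} endpoint that returns a
    `status` code and, once done, the transcription payload.
    """
    delay = 2.0
    for _ in range(max_attempts):
        resp = requests.get(
            f"{API_BASE}/transcriptions/{transcription_id}",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        status = body.get("status")
        if status == FAILED:
            raise RuntimeError(f"Transcription {transcription_id} failed")
        if status == QUOTA_EXCEEDED:
            raise RuntimeError("Quota limit reached; pause new submissions")
        if body.get("transcription"):     # heuristic completion check
            return body
        time.sleep(delay)
        delay = min(delay * 2, 60)        # exponential backoff, capped at 60 s
    raise TimeoutError("Transcription did not complete within the polling window")
```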
Next Steps
- Getting Started - Basic setup
- Audio Intelligence - Extract insights
- API Reference - Complete endpoint documentation