Named Entity Recognition (NER)
Extract entities like names, locations, organizations, and dates from conversations.
Overview
Named Entity Recognition helps you:
- Extract customer information
- Identify organizations and locations
- Recognize dates and times
- Track entities mentioned in calls
- Enable information enrichment
Supported Entity Types
| Entity Type | Examples | Use Case |
|---|---|---|
| PERSON | "John Smith", "Dr. Johnson" | Customer/agent identification |
| ORGANIZATION | "Acme Corp", "Microsoft" | Company identification |
| LOCATION | "New York", "California" | Geographic references |
| DATE | "January 15", "next Monday" | Event/appointment dates |
| TIME | "3:00 PM", "14:30" | Meeting times, deadlines |
| PHONE_NUMBER | "+1-555-0123" | Contact information |
| EMAIL | "john@example.com" | Email addresses |
| MONEY | "$50", "€200" | Monetary amounts |
| PERCENTAGE | "25%", "0.5%" | Percentages and rates |
| URL | "www.example.com" | Website addresses |
Getting Entity Data
From Transcript Status
curl -X GET "https://api.example.com/api/v1/transcripts/123456789/status" \
  -H "Authorization: Bearer YOUR_TOKEN"
The response includes a keywords array; each item carries the entity name, type, score, and channel:
{
  "keywords": [
    {
      "name": "John Smith",
      "type": "PERSON",
      "score": 95,
      "channelMatch": "left"
    },
    {
      "name": "Acme Corporation",
      "type": "ORGANIZATION",
      "score": 92,
      "channelMatch": "right"
    }
  ]
}
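As a sketch of how this response might be consumed, the snippet below groups keywords by their channelMatch value. It assumes the response shape shown above; the helper name `group_by_channel` and the `"unknown"` fallback bucket are illustrative and not part of the API.

```python
from collections import defaultdict


def group_by_channel(status_response):
    """Group keyword entries by their channelMatch value."""
    grouped = defaultdict(list)
    for keyword in status_response.get("keywords", []):
        # Assumed fallback: entries without channelMatch go into "unknown".
        channel = keyword.get("channelMatch") or "unknown"
        grouped[channel].append((keyword.get("type"), keyword.get("name")))
    return dict(grouped)


# Sample data copied from the response shown above.
sample = {
    "keywords": [
        {"name": "John Smith", "type": "PERSON", "score": 95, "channelMatch": "left"},
        {"name": "Acme Corporation", "type": "ORGANIZATION", "score": 92, "channelMatch": "right"},
    ]
}
by_channel = group_by_channel(sample)
print(by_channel)
```

Grouping by channel can help separate what each side of the call said, provided you know which channel carries which speaker in your recordings.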
Entity Extraction Example
Python Implementation
import requests


class EntityExtractor:
    def __init__(self, api_base, token):
        self.api_base = api_base
        self.headers = {"Authorization": f"Bearer {token}"}

    def extract_entities(self, transcript_id):
        """Extract all entities from a transcript, grouped by type"""
        response = requests.get(
            f"{self.api_base}/api/v1/transcripts/{transcript_id}/status",
            headers=self.headers,
            timeout=30
        )
        # Fail early on HTTP errors instead of parsing an error body
        response.raise_for_status()
        transcript = response.json()
        keywords = transcript.get('keywords', [])
        # Organize entities by type
        entities = {
            'PERSON': [],
            'ORGANIZATION': [],
            'LOCATION': [],
            'DATE': [],
            'TIME': [],
            'PHONE_NUMBER': [],
            'EMAIL': [],
            'MONEY': [],
            'PERCENTAGE': [],
            'URL': []
        }
        for keyword in keywords:
            # Extract entity type from keyword metadata;
            # keywords with a missing or unlisted type are skipped
            entity_type = keyword.get('type', 'OTHER')
            if entity_type in entities:
                entities[entity_type].append({
                    'name': keyword['name'],
                    'score': keyword['score'],
                    'channel': keyword.get('channelMatch')
                })
        return entities

    # Note: each helper below issues its own request via extract_entities()

    def get_persons(self, transcript_id):
        """Get all person entities"""
        entities = self.extract_entities(transcript_id)
        return entities['PERSON']

    def get_organizations(self, transcript_id):
        """Get all organization entities"""
        entities = self.extract_entities(transcript_id)
        return entities['ORGANIZATION']

    def get_contact_info(self, transcript_id):
        """Extract contact information"""
        entities = self.extract_entities(transcript_id)
        return {
            'phone_numbers': entities['PHONE_NUMBER'],
            'emails': entities['EMAIL'],
            'urls': entities['URL']
        }

    def get_financial_info(self, transcript_id):
        """Extract financial information"""
        entities = self.extract_entities(transcript_id)
        return {
            'amounts': entities['MONEY'],
            'percentages': entities['PERCENTAGE']
        }


# Usage
extractor = EntityExtractor("https://api.example.com", "your_token")

# Extract entities from transcript
entities = extractor.extract_entities(123456789)
print("Extracted entities:", entities)

# Get specific entity types
persons = extractor.get_persons(123456789)
print("Persons mentioned:", persons)

contact_info = extractor.get_contact_info(123456789)
print("Contact information:", contact_info)
Entity Confidence Scoring
Each entity has a confidence score:
{
  "name": "John Smith",
  "score": 95,
  "confidence_level": "Very High"
}
Confidence Levels:
- 90-100: Very High
- 75-89: High
- 50-74: Medium
- <50: Low (needs verification)
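The bands above can be expressed as a small lookup. This is a minimal sketch that derives the level from the numeric score on the client side; the function names `confidence_level` and `needs_verification` are illustrative, not part of the API.

```python
def confidence_level(score):
    """Map a 0-100 entity score to the confidence bands listed above."""
    if score >= 90:
        return "Very High"
    if score >= 75:
        return "High"
    if score >= 50:
        return "Medium"
    return "Low"


def needs_verification(entity):
    """Return True for entities in the Low band (score below 50)."""
    return confidence_level(entity["score"]) == "Low"


print(confidence_level(95))
print(needs_verification({"name": "Jon Smyth", "score": 42}))
```

Deriving the level locally keeps downstream code working even when only the numeric score is available on an entity.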
Entity Linking
Associate entities across conversations by first collecting each transcript's entities into a customer profile:
def create_customer_profile(transcript_id, token):
    """Create customer profile from extracted entities"""
    extractor = EntityExtractor("https://api.example.com", token)
    # One request is enough: contact details come from the same entity set
    entities = extractor.extract_entities(transcript_id)
    profile = {
        'names': entities['PERSON'],
        'organizations': entities['ORGANIZATION'],
        'locations': entities['LOCATION'],
        'phone_numbers': entities['PHONE_NUMBER'],
        'emails': entities['EMAIL'],
        'related_dates': entities['DATE'],
        'financial_info': {
            'amounts': entities['MONEY'],
            'percentages': entities['PERCENTAGE']
        }
    }
    return profile
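To link entities across several conversations, per-transcript profiles could be merged and de-duplicated. The sketch below assumes profiles shaped like the one returned above, where each entity is a dict with `name` and `score`; the helper `merge_profiles` and its normalization rule (lowercase, collapsed whitespace) are assumptions chosen for illustration.

```python
def merge_profiles(profiles):
    """Merge per-transcript profiles, de-duplicating entities by normalized name."""
    fields = ("names", "organizations", "locations", "phone_numbers", "emails")
    merged = {}
    for profile in profiles:
        for field in fields:
            bucket = merged.setdefault(field, {})
            for entity in profile.get(field, []):
                # Assumed normalization: case-insensitive, whitespace-collapsed.
                key = " ".join(entity["name"].lower().split())
                current = bucket.get(key)
                # Keep the highest-scoring mention of each distinct entity.
                if current is None or entity["score"] > current["score"]:
                    bucket[key] = entity
    return {field: list(bucket.values()) for field, bucket in merged.items()}


call_1 = {"names": [{"name": "John Smith", "score": 80, "channel": "left"}]}
call_2 = {
    "names": [{"name": "john smith", "score": 95, "channel": "left"}],
    "organizations": [{"name": "Acme Corp", "score": 92, "channel": "right"}],
}
linked = merge_profiles([call_1, call_2])
print(linked["names"])
```

Exact-match normalization like this will not catch spelling variants ("Jon Smith"), so treat it as a starting point rather than a complete linking strategy.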
Use Cases
1. Customer Data Enrichment
- Extract customer names and contact info
- Identify customers mentioned in calls
- Update CRM with call-extracted data
- Track customer interactions
2. Compliance & Audit
- Extract sensitive information (PII)
- Track PCI compliance
- Audit data mentions
- Generate compliance reports
3. Knowledge Management
- Extract product names and versions
- Identify mentioned competitors
- Track topics and dates
- Build knowledge base
4. Analysis & Reporting
- Track mentioned organizations
- Analyze geographic distribution
- Extract financial information
- Generate business intelligence
5. Automated Actions
- Create calendar events from dates
- Add contacts from phone numbers
- Track follow-ups needed
- Trigger workflows based on entities
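The automated-actions use case could be sketched as a mapping from entity type to a follow-up action. The action names (`create_calendar_event`, `add_contact`), the helper `plan_actions`, and the default threshold of 75 are all hypothetical; the input is assumed to be entities grouped by type, each with `name` and `score`.

```python
def plan_actions(entities, min_score=75):
    """Return (action, entity_name) pairs for sufficiently confident entities."""
    # Hypothetical mapping from entity type to a follow-up action name.
    action_for_type = {
        "DATE": "create_calendar_event",
        "PHONE_NUMBER": "add_contact",
        "EMAIL": "add_contact",
    }
    actions = []
    for entity_type, action in action_for_type.items():
        for entity in entities.get(entity_type, []):
            # Skip entities below the threshold rather than acting on them.
            if entity["score"] >= min_score:
                actions.append((action, entity["name"]))
    return actions


grouped = {
    "DATE": [{"name": "next Monday", "score": 90}],
    "PHONE_NUMBER": [{"name": "+1-555-0123", "score": 60}],
    "EMAIL": [{"name": "john@example.com", "score": 88}],
}
planned = plan_actions(grouped)
print(planned)
```

Returning a plan instead of executing actions directly makes it easier to review or log what would be triggered before wiring in real integrations.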
Entity Extraction Patterns
Contact Information Pattern
def extract_call_contact_info(transcript_id, token):
    """Extract contact information to update CRM"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)
    return {
        'customer_name': entities['PERSON'][0] if entities['PERSON'] else None,
        'phone': entities['PHONE_NUMBER'][0] if entities['PHONE_NUMBER'] else None,
        'email': entities['EMAIL'][0] if entities['EMAIL'] else None,
        'company': entities['ORGANIZATION'][0] if entities['ORGANIZATION'] else None,
        'location': entities['LOCATION'][0] if entities['LOCATION'] else None
    }
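The pattern above takes the first entity of each type, which relies on the order of the list. If that order is not meaningful in your data, one alternative is to pick the highest-scoring entity instead. The helper `best_entity` below is a hypothetical sketch that works on entities grouped by type, each with `name` and `score`.

```python
def best_entity(entities, entity_type, min_score=0):
    """Return the highest-scoring entity of a type, or None if none qualifies."""
    candidates = [
        entity for entity in entities.get(entity_type, [])
        if entity["score"] >= min_score
    ]
    # max() with default=None avoids a ValueError on an empty candidate list.
    return max(candidates, key=lambda entity: entity["score"], default=None)


grouped = {
    "PERSON": [
        {"name": "Jon Smyth", "score": 48},
        {"name": "John Smith", "score": 95},
    ],
    "EMAIL": [],
}
top_person = best_entity(grouped, "PERSON")
top_email = best_entity(grouped, "EMAIL")
print(top_person, top_email)
```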
Financial Transaction Pattern
def extract_financial_data(transcript_id, token):
    """Extract financial transaction details"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)
    return {
        'amounts': entities['MONEY'],
        'percentages': entities['PERCENTAGE'],
        'dates': entities['DATE'],
        'persons': entities['PERSON']
    }
Event Scheduling Pattern
def extract_appointment_info(transcript_id, token):
    """Extract appointment scheduling information"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)
    return {
        'date': entities['DATE'][0] if entities['DATE'] else None,
        'time': entities['TIME'][0] if entities['TIME'] else None,
        'person': entities['PERSON'][0] if entities['PERSON'] else None,
        'location': entities['LOCATION'][0] if entities['LOCATION'] else None
    }
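The extracted DATE and TIME values are plain strings, so creating a calendar event requires parsing them. The sketch below handles only the absolute formats shown in the entity table ("January 15" with "3:00 PM" or "14:30") and returns None for anything else, including relative dates such as "next Monday". The helper `parse_appointment` and the explicit `year` argument are assumptions, since the examples carry no year.

```python
from datetime import datetime


def parse_appointment(date_text, time_text, year):
    """Combine DATE and TIME strings into a datetime, or return None."""
    # Month names and AM/PM markers are parsed using the current locale
    # (English in the default C locale).
    for time_format in ("%I:%M %p", "%H:%M"):
        try:
            return datetime.strptime(
                f"{date_text} {year} {time_text}",
                f"%B %d %Y {time_format}",
            )
        except ValueError:
            continue  # Try the next time format
    return None


afternoon = parse_appointment("January 15", "3:00 PM", 2025)
relative = parse_appointment("next Monday", "3:00 PM", 2025)
print(afternoon, relative)
```

Relative expressions need a reference date (for example, the call date) and dedicated handling, which is outside the scope of this sketch.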
PII (Personally Identifiable Information) Handling
Sensitive Entity Types
{
  "sensitive_entities": [
    {
      "type": "PERSON",
      "is_pii": true,
      "encryption_required": true
    },
    {
      "type": "PHONE_NUMBER",
      "is_pii": true,
      "encryption_required": true
    },
    {
      "type": "EMAIL",
      "is_pii": true,
      "encryption_required": true
    }
  ]
}
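When entity data is logged or displayed, values of the sensitive types listed above can be masked on the client side. This is a minimal sketch of display masking only, not encryption; the names `PII_TYPES`, `mask_value`, and `mask_pii`, and the choice to keep the last two characters, are illustrative assumptions.

```python
# Entity types marked as PII in the listing above.
PII_TYPES = {"PERSON", "PHONE_NUMBER", "EMAIL"}


def mask_value(value, visible=2):
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_pii(keywords):
    """Return copies of keyword entries with PII names masked."""
    masked = []
    for keyword in keywords:
        entry = dict(keyword)  # Copy so the caller's data is not modified
        if entry.get("type") in PII_TYPES:
            entry["name"] = mask_value(entry["name"])
        masked.append(entry)
    return masked


keywords = [
    {"name": "+1-555-0123", "type": "PHONE_NUMBER", "score": 91},
    {"name": "Acme Corp", "type": "ORGANIZATION", "score": 92},
]
safe_keywords = mask_pii(keywords)
print(safe_keywords)
```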
Best Practices for PII
- Data Protection
  - Encrypt sensitive entities at rest
  - Use HTTPS for transmission
  - Implement access controls
  - Audit access logs
- Retention
  - Define retention policies
  - Automatically delete data after the retention period
  - Comply with privacy regulations
  - Document data handling
- Compliance
  - GDPR: Right to be forgotten
  - CCPA: Consumer privacy rights
  - HIPAA: Healthcare data protection
  - Industry-specific regulations
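As an illustration of the retention practice above, stored entity records can be split into those still within the retention period and those due for deletion. The record shape (a `stored_at` timezone-aware datetime per record) and the helper `split_expired` are hypothetical; how records are actually stored and deleted depends on your system.

```python
from datetime import datetime, timedelta, timezone


def split_expired(records, retention_days, now=None):
    """Split records into (keep, expired) based on their stored_at timestamp."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    keep = [record for record in records if record["stored_at"] >= cutoff]
    expired = [record for record in records if record["stored_at"] < cutoff]
    return keep, expired


reference = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": 1, "stored_at": datetime(2025, 1, 30, tzinfo=timezone.utc)},
    {"id": 2, "stored_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]
keep, expired = split_expired(records, retention_days=30, now=reference)
print([r["id"] for r in keep], [r["id"] for r in expired])
```

Passing `now` explicitly keeps the function deterministic, which makes retention rules easier to test and audit.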
Entity Statistics
Generate Entity Report
import requests


def generate_entity_report(date_from, date_to, token):
    """Generate report on extracted entities"""
    api_base = "https://api.example.com"
    headers = {"Authorization": f"Bearer {token}"}
    # Get all transcripts in date range (only the first 1000 rows are requested)
    response = requests.get(
        f"{api_base}/api/v1/transcripts",
        headers=headers,
        params={
            'DateFrom': date_from,
            'DateTo': date_to,
            'Rows': 1000
        },
        timeout=30
    )
    response.raise_for_status()
    transcripts = response.json().get('data', [])
    # Aggregate entity statistics
    entity_stats = {
        'PERSON': {},
        'ORGANIZATION': {},
        'LOCATION': {},
        'PHONE_NUMBER': 0,
        'EMAIL': 0,
        'MONEY': [],
        'total_entities': 0
    }
    for transcript in transcripts:
        keywords = transcript.get('keywords', [])
        for keyword in keywords:
            entity_type = keyword.get('type', 'OTHER')
            entity_name = keyword['name']
            if entity_type in ('PERSON', 'ORGANIZATION', 'LOCATION'):
                # Count mentions per distinct name
                counts = entity_stats[entity_type]
                counts[entity_name] = counts.get(entity_name, 0) + 1
            elif entity_type in ('PHONE_NUMBER', 'EMAIL'):
                # Only the number of occurrences is tracked
                entity_stats[entity_type] += 1
            elif entity_type == 'MONEY':
                # Keep each mentioned amount
                entity_stats['MONEY'].append(entity_name)
            # Every keyword counts toward the total, whatever its type
            entity_stats['total_entities'] += 1
    return entity_stats
Visualization
Entity Frequency
Apple Inc: ████████░ 45 mentions
John Smith: ██████░░░ 32 mentions
San Francisco: █████░░░░ 28 mentions
Microsoft: ████░░░░░ 22 mentions
Jane Doe: ██░░░░░░░ 12 mentions
Entity Type Distribution
PERSON: ███████░░ 35%
ORGANIZATION: ██████░░░ 30%
LOCATION: ████░░░░░ 20%
MONEY: ██░░░░░░░ 10%
PHONE_NUMBER: ░░░░░░░░░ 2%
OTHER: ░░░░░░░░░ 3%
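A text chart like the Entity Frequency example can be produced from a mapping of names to mention counts. The sketch below scales each bar relative to the largest count, which is an assumed rule and may differ slightly from the scaling used in the samples above; `render_bars` is a hypothetical helper.

```python
def render_bars(counts, width=9, unit="mentions"):
    """Render a {label: count} mapping as text bars, largest first."""
    if not counts:
        return []
    largest = max(counts.values())
    lines = []
    for label, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        # Assumed scaling: the largest count fills the whole bar.
        filled = round(count / largest * width) if largest > 0 else 0
        bar = "█" * filled + "░" * (width - filled)
        lines.append(f"{label}: {bar} {count} {unit}")
    return lines


chart = render_bars({"John Smith": 32, "Apple Inc": 45, "Jane Doe": 12})
print("\n".join(chart))
```

The per-name counts for PERSON, ORGANIZATION, or LOCATION in an entity report are dictionaries of this shape and could be passed in directly.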
Troubleshooting
Entities Not Extracted
- Insufficient speech content
- Poor audio quality affecting transcription
- Entity type not supported
- Low confidence scores
Incorrect Extraction
- Similar sounding names
- Multiple entity references
- Acronyms and abbreviations
- Specialized terminology
PII Not Masked
- The entity type may not be classified as PII
- PII masking requires explicit configuration
- Check your redaction settings
Next Steps
- Sentiment Analysis - Emotional analysis
- Topic Detection - Topic identification
- Summarization - Call summaries
- Translation - Language translation