
Named Entity Recognition (NER)

Extract entities like names, locations, organizations, and dates from conversations.

Overview

Named Entity Recognition helps you:

  • Extract customer information
  • Identify organizations and locations
  • Recognize dates and times
  • Track entities mentioned in calls
  • Enrich records, such as CRM entries, with call-extracted details

Supported Entity Types

| Entity Type | Examples | Use Case |
| --- | --- | --- |
| PERSON | "John Smith", "Dr. Johnson" | Customer/agent identification |
| ORGANIZATION | "Acme Corp", "Microsoft" | Company identification |
| LOCATION | "New York", "California" | Geographic references |
| DATE | "January 15", "next Monday" | Event/appointment dates |
| TIME | "3:00 PM", "14:30" | Meeting times, deadlines |
| PHONE_NUMBER | "+1-555-0123" | Contact information |
| EMAIL | "john@example.com" | Email addresses |
| MONEY | "$50", "€200" | Monetary amounts |
| PERCENTAGE | "25%", "0.5%" | Percentages and rates |
| URL | "www.example.com" | Website addresses |

Getting Entity Data

From Transcript Status

curl -X GET "https://api.example.com/api/v1/transcripts/123456789/status" \
-H "Authorization: Bearer YOUR_TOKEN"

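The response includes a keywords array with one entry per detected entity. Each entry holds the matched text (name), the entity type, a confidence score, and the channel the entity was detected on (channelMatch):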

{
  "keywords": [
    {
      "name": "John Smith",
      "type": "PERSON",
      "score": 95,
      "channelMatch": "left"
    },
    {
      "name": "Acme Corporation",
      "type": "ORGANIZATION",
      "score": 92,
      "channelMatch": "right"
    }
  ]
}
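If you already have the JSON body, you can group entities by type without the full client shown below. The following is a minimal sketch that assumes the response shape shown above. The helper name group_keywords_by_type and its min_score filter are illustrative and are not part of the API.

```python
import json
from collections import defaultdict

# Sample status response, copied from the example above
SAMPLE_RESPONSE = """
{
  "keywords": [
    {"name": "John Smith", "type": "PERSON", "score": 95, "channelMatch": "left"},
    {"name": "Acme Corporation", "type": "ORGANIZATION", "score": 92, "channelMatch": "right"}
  ]
}
"""

def group_keywords_by_type(status, min_score=0):
    """Group keyword names by entity type, dropping entries scored below min_score."""
    grouped = defaultdict(list)
    for keyword in status.get("keywords", []):
        if keyword.get("score", 0) >= min_score:
            grouped[keyword.get("type", "OTHER")].append(keyword["name"])
    return dict(grouped)

status = json.loads(SAMPLE_RESPONSE)
print(group_keywords_by_type(status, min_score=90))
# {'PERSON': ['John Smith'], 'ORGANIZATION': ['Acme Corporation']}
```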

Entity Extraction Example

Python Implementation

import requests

# Entity types listed in the Supported Entity Types table
ENTITY_TYPES = ('PERSON', 'ORGANIZATION', 'LOCATION', 'DATE', 'TIME',
                'PHONE_NUMBER', 'EMAIL', 'MONEY', 'PERCENTAGE', 'URL')

class EntityExtractor:
    def __init__(self, api_base, token):
        self.api_base = api_base
        self.headers = {"Authorization": f"Bearer {token}"}

    def extract_entities(self, transcript_id):
        """Extract all entities from a transcript, grouped by entity type"""
        response = requests.get(
            f"{self.api_base}/api/v1/transcripts/{transcript_id}/status",
            headers=self.headers,
            timeout=30
        )
        # Fail loudly on auth or lookup errors instead of parsing an error body
        response.raise_for_status()

        transcript = response.json()
        keywords = transcript.get('keywords', [])

        # One list per supported type, so lookups like entities['EMAIL'] never fail
        entities = {entity_type: [] for entity_type in ENTITY_TYPES}

        for keyword in keywords:
            # Keywords with a missing or unsupported type are skipped
            entity_type = keyword.get('type', 'OTHER')
            if entity_type in entities:
                entities[entity_type].append({
                    'name': keyword['name'],
                    'score': keyword['score'],
                    'channel': keyword.get('channelMatch')
                })

        return entities

    # Each helper below re-fetches the transcript. If you need several
    # entity types, call extract_entities() once and reuse the result.
    def get_persons(self, transcript_id):
        """Get all person entities"""
        return self.extract_entities(transcript_id)['PERSON']

    def get_organizations(self, transcript_id):
        """Get all organization entities"""
        return self.extract_entities(transcript_id)['ORGANIZATION']

    def get_contact_info(self, transcript_id):
        """Extract contact information"""
        entities = self.extract_entities(transcript_id)
        return {
            'phone_numbers': entities['PHONE_NUMBER'],
            'emails': entities['EMAIL'],
            'urls': entities['URL']
        }

    def get_financial_info(self, transcript_id):
        """Extract financial information"""
        entities = self.extract_entities(transcript_id)
        return {
            'amounts': entities['MONEY'],
            'percentages': entities['PERCENTAGE']
        }

# Usage
extractor = EntityExtractor("https://api.example.com", "your_token")

# Extract entities from transcript
entities = extractor.extract_entities(123456789)
print("Extracted entities:", entities)

# Get specific entity types
persons = extractor.get_persons(123456789)
print("Persons mentioned:", persons)

contact_info = extractor.get_contact_info(123456789)
print("Contact information:", contact_info)

Entity Confidence Scoring

Each entity carries a confidence score from 0 to 100, which maps to a confidence level:

{
  "name": "John Smith",
  "score": 95,
  "confidence_level": "Very High"
}

Confidence Levels:

  • 90-100: Very High
  • 75-89: High
  • 50-74: Medium
  • <50: Low (needs verification)
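The banding above can also be computed on the client side. The following is a sketch that assumes scores are integers from 0 to 100, as in the examples. The names confidence_level and needs_verification are hypothetical helpers, not API fields.

```python
def confidence_level(score):
    """Map a 0-100 entity score to the confidence bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "Very High"
    if score >= 75:
        return "High"
    if score >= 50:
        return "Medium"
    return "Low"

def needs_verification(entity):
    """Flag entities in the Low band for manual review."""
    return confidence_level(entity["score"]) == "Low"

print(confidence_level(95))                                    # Very High
print(needs_verification({"name": "Jon Smyth", "score": 42}))  # True
```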

Entity Linking

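Combine the entities from a call into a customer profile. Profiles built this way can then be associated across conversations by matching shared names, phone numbers, or email addresses: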

def create_customer_profile(transcript_id, token):
    """Build a customer profile from the entities extracted from one call"""
    extractor = EntityExtractor("https://api.example.com", token)

    # One request: contact fields come from the same entity dict, so there is
    # no need to call get_contact_info(), which would fetch the transcript again
    entities = extractor.extract_entities(transcript_id)

    profile = {
        'names': entities['PERSON'],
        'organizations': entities['ORGANIZATION'],
        'locations': entities['LOCATION'],
        'phone_numbers': entities['PHONE_NUMBER'],
        'emails': entities['EMAIL'],
        'related_dates': entities['DATE'],
        'financial_info': {
            'amounts': entities['MONEY'],
            'percentages': entities['PERCENTAGE']
        }
    }

    return profile
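One way to associate entities across conversations is an inverted index that maps each normalized entity to the transcripts that mention it. The following is a minimal sketch that assumes you have already called extract_entities() for each transcript and stored the results in a dict keyed by transcript ID.

The helpers normalize, build_entity_index, and related_transcripts, and the choice of linking types, are all illustrative. Exact matching after case-folding will also miss misspellings that real entity resolution would catch.

```python
from collections import defaultdict

# Entity types used for linking; this selection is an assumption, tune it for your data
LINK_TYPES = ("PERSON", "ORGANIZATION", "PHONE_NUMBER", "EMAIL")

def normalize(name):
    """Case-fold and collapse whitespace so 'John  Smith' matches 'john smith'."""
    return " ".join(name.split()).casefold()

def build_entity_index(entities_by_transcript, types=LINK_TYPES):
    """Map (entity type, normalized name) to the set of transcript IDs mentioning it."""
    index = defaultdict(set)
    for transcript_id, entities in entities_by_transcript.items():
        for entity_type in types:
            for entity in entities.get(entity_type, []):
                index[(entity_type, normalize(entity["name"]))].add(transcript_id)
    return index

def related_transcripts(index, transcript_id):
    """Return the other transcripts that share at least one indexed entity."""
    related = set()
    for ids in index.values():
        if transcript_id in ids:
            related |= ids
    related.discard(transcript_id)
    return related

# Offline sample in the shape returned by extract_entities()
calls = {
    101: {"PERSON": [{"name": "John Smith", "score": 95}],
          "EMAIL": [{"name": "john@example.com", "score": 90}]},
    102: {"PERSON": [{"name": "john  smith", "score": 88}]},
    103: {"ORGANIZATION": [{"name": "Acme Corp", "score": 91}]},
}
index = build_entity_index(calls)
print(related_transcripts(index, 101))  # {102}
```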

Use Cases

1. Customer Data Enrichment

  • Extract customer names and contact info
  • Identify customers mentioned in calls
  • Update CRM with call-extracted data
  • Track customer interactions

2. Compliance & Audit

  • Extract sensitive information (PII)
  • Track PCI compliance
  • Audit mentions of sensitive data
  • Generate compliance reports

3. Knowledge Management

  • Extract product names and versions
  • Identify mentioned competitors
  • Track topics and dates
  • Build a knowledge base

4. Analysis & Reporting

  • Track mentioned organizations
  • Analyze geographic distribution
  • Extract financial information
  • Generate business intelligence

5. Automated Actions

  • Create calendar events from dates
  • Add contacts from phone numbers
  • Track follow-ups needed
  • Trigger workflows based on entities
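Triggering workflows from entities can be modeled as a dispatch table keyed by entity type. The following is a sketch only. The handlers are hypothetical stand-ins that return strings, where a real integration would call a calendar or CRM system. The default min_score of 75, which matches the "High" band above, is an assumption.

```python
def make_dispatcher(handlers, min_score=75):
    """Return a function that routes entities to handlers keyed by entity type."""
    def dispatch(entities):
        triggered = []
        for entity_type, items in entities.items():
            handler = handlers.get(entity_type)
            if handler is None:
                continue  # no automated action for this type
            for item in items:
                if item.get("score", 0) >= min_score:
                    triggered.append(handler(item))
        return triggered
    return dispatch

# Hypothetical actions; replace with real calendar/CRM integrations
handlers = {
    "DATE": lambda e: f"create calendar event for {e['name']}",
    "PHONE_NUMBER": lambda e: f"add contact {e['name']}",
}
dispatch = make_dispatcher(handlers)
actions = dispatch({
    "DATE": [{"name": "next Monday", "score": 88}],
    "PHONE_NUMBER": [{"name": "+1-555-0123", "score": 60}],  # below threshold
    "PERSON": [{"name": "John Smith", "score": 95}],         # no handler
})
print(actions)  # ['create calendar event for next Monday']
```

The score threshold keeps low-confidence entities from firing irreversible actions. Handlers for lower-risk actions could be given their own, looser dispatcher.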

Entity Extraction Patterns

Contact Information Pattern

def extract_call_contact_info(transcript_id, token):
    """Extract contact information to update CRM"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)

    # Take the first entity of each type, or None when none was found
    return {
        'customer_name': entities['PERSON'][0] if entities['PERSON'] else None,
        'phone': entities['PHONE_NUMBER'][0] if entities['PHONE_NUMBER'] else None,
        'email': entities['EMAIL'][0] if entities['EMAIL'] else None,
        'company': entities['ORGANIZATION'][0] if entities['ORGANIZATION'] else None,
        'location': entities['LOCATION'][0] if entities['LOCATION'] else None
    }
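The pattern above takes the first entity of each type. That may not be the customer, because an agent's name could be mentioned first, and the ordering of keywords in the response is not documented here.

One variation picks the highest-scoring candidate above a threshold instead. This is a sketch; best_entity and its min_score default of 50 (the bottom of the "Medium" band) are illustrative choices.

```python
def best_entity(entities, entity_type, min_score=50):
    """Return the highest-scoring entity of a type, or None if none reaches min_score."""
    candidates = [e for e in entities.get(entity_type, [])
                  if e.get("score", 0) >= min_score]
    return max(candidates, key=lambda e: e.get("score", 0), default=None)

# Sample in the shape returned by extract_entities()
entities = {
    "PERSON": [{"name": "Dr. Johnson", "score": 70},
               {"name": "John Smith", "score": 95}],
    "EMAIL": [{"name": "john.smith@example.com", "score": 40}],
}
print(best_entity(entities, "PERSON")["name"])  # John Smith
print(best_entity(entities, "EMAIL"))           # None (score below 50)
```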

Financial Transaction Pattern

def extract_financial_data(transcript_id, token):
    """Extract financial transaction details"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)

    return {
        'amounts': entities['MONEY'],
        'percentages': entities['PERCENTAGE'],
        'dates': entities['DATE'],
        'persons': entities['PERSON']
    }
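MONEY entities arrive as text such as "$50" or "€200". Before you can total or compare them, they need to be split into a currency and a numeric amount.

The following is a sketch that assumes a leading currency symbol and comma thousands separators. The symbol-to-code mapping is an assumption too: "$" is not always USD. Text it cannot parse, such as "fifty dollars", yields None for the amount.

```python
from decimal import Decimal, InvalidOperation

# Assumed symbol-to-code mapping; extend for the currencies you expect
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

def parse_money(text):
    """Split MONEY entity text like '$50' or '€1,200.50' into (currency, Decimal)."""
    text = text.strip()
    currency = CURRENCY_SYMBOLS.get(text[:1])
    number = text[1:] if currency else text
    try:
        # Decimal avoids float rounding errors in monetary amounts
        return currency, Decimal(number.replace(",", ""))
    except InvalidOperation:
        return currency, None

print(parse_money("$50"))        # ('USD', Decimal('50'))
print(parse_money("€1,200.50"))  # ('EUR', Decimal('1200.50'))
```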

Event Scheduling Pattern

def extract_appointment_info(transcript_id, token):
    """Extract appointment scheduling information"""
    extractor = EntityExtractor("https://api.example.com", token)
    entities = extractor.extract_entities(transcript_id)

    return {
        'date': entities['DATE'][0] if entities['DATE'] else None,
        'time': entities['TIME'][0] if entities['TIME'] else None,
        'person': entities['PERSON'][0] if entities['PERSON'] else None,
        'location': entities['LOCATION'][0] if entities['LOCATION'] else None
    }
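TIME entities also arrive as free text, in both 12-hour ("3:00 PM") and 24-hour ("14:30") forms. The following is a sketch that normalizes them to datetime.time with the standard library. The list of accepted formats is an assumption, and parse_time_entity is a hypothetical helper. Relative DATE values such as "next Monday" need a reference date and are not handled here.

```python
from datetime import datetime

# Formats tried in order; an assumed set covering the examples in the table
TIME_FORMATS = ("%I:%M %p", "%I:%M%p", "%H:%M", "%I %p")

def parse_time_entity(text):
    """Convert TIME entity text like '3:00 PM' or '14:30' to a datetime.time, or None."""
    cleaned = text.strip().replace(".", "")  # '3 p.m.' -> '3 pm'
    for fmt in TIME_FORMATS:
        try:
            # %p matching follows the current locale (AM/PM in the default C locale)
            return datetime.strptime(cleaned, fmt).time()
        except ValueError:
            continue
    return None

print(parse_time_entity("3:00 PM"))  # 15:00:00
print(parse_time_entity("14:30"))    # 14:30:00
```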

PII (Personally Identifiable Information) Handling

Sensitive Entity Types

{
  "sensitive_entities": [
    {
      "type": "PERSON",
      "is_pii": true,
      "encryption_required": true
    },
    {
      "type": "PHONE_NUMBER",
      "is_pii": true,
      "encryption_required": true
    },
    {
      "type": "EMAIL",
      "is_pii": true,
      "encryption_required": true
    }
  ]
}
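If you store transcript text yourself, you can mask the PII types listed above before sharing it. The following is a minimal sketch. It assumes you already have the transcript text and its keywords list, although this page does not show how to fetch the text.

The function mask_pii and the [TYPE] placeholder format are hypothetical. Exact string matching misses misrecognized or spelled-out variants, so this does not replace the platform's own redaction settings.

```python
import re

# Mirrors the sensitive_entities list above; adjust to your own policy
PII_TYPES = {"PERSON", "PHONE_NUMBER", "EMAIL"}

def mask_pii(text, keywords):
    """Replace each PII entity's text with a [TYPE] placeholder (case-insensitive)."""
    # Mask longer names first so 'John Smith' is handled before a shorter 'John'
    pii = sorted(
        (k for k in keywords if k.get("type") in PII_TYPES),
        key=lambda k: len(k["name"]),
        reverse=True,
    )
    for keyword in pii:
        placeholder = f"[{keyword['type']}]"
        # A function replacement stops re.sub from interpreting backslashes
        text = re.sub(re.escape(keyword["name"]), lambda _m: placeholder,
                      text, flags=re.IGNORECASE)
    return text

transcript_text = "Hi, this is John Smith, reach me at john@example.com or +1-555-0123."
keywords = [
    {"name": "John Smith", "type": "PERSON"},
    {"name": "john@example.com", "type": "EMAIL"},
    {"name": "+1-555-0123", "type": "PHONE_NUMBER"},
    {"name": "Acme Corp", "type": "ORGANIZATION"},  # not PII, left as-is
]
masked = mask_pii(transcript_text, keywords)
print(masked)  # Hi, this is [PERSON], reach me at [EMAIL] or [PHONE_NUMBER].
```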

Best Practices for PII

  1. Data Protection

    • Encrypt sensitive entities at rest
    • Use HTTPS for transmission
    • Implement access controls
    • Audit access logs
  2. Retention

    • Define retention policies
    • Automatically delete data once the retention period expires
    • Comply with privacy regulations
    • Document data handling
  3. Compliance

    • GDPR: Right to be forgotten
    • CCPA: Consumer privacy rights
    • HIPAA: Healthcare data protection
    • Industry-specific regulations
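For entity data you keep in your own systems, a retention policy comes down to a cutoff timestamp. The following is a sketch that assumes a hypothetical local store whose records carry a timezone-aware extracted_at timestamp. The API's own retention controls, if any, are not described on this page.

```python
from datetime import datetime, timedelta, timezone

def purge_expired(records, retention_days, now=None):
    """Split stored entity records into (kept, expired) by their extracted_at timestamp."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept, expired = [], []
    for record in records:
        (kept if record["extracted_at"] >= cutoff else expired).append(record)
    return kept, expired

# Hypothetical stored records with timezone-aware timestamps
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"name": "John Smith", "type": "PERSON",
     "extracted_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"name": "+1-555-0123", "type": "PHONE_NUMBER",
     "extracted_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
kept, expired = purge_expired(records, retention_days=90, now=now)
print([r["name"] for r in expired])  # ['+1-555-0123']
```

Returning the expired records, rather than deleting them in place, leaves room to log what was purged, which supports the "Document data handling" practice above.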

Entity Statistics

Generate Entity Report

import requests

def generate_entity_report(date_from, date_to, token):
    """Generate a report on entities extracted from transcripts in a date range"""
    api_base = "https://api.example.com"
    headers = {"Authorization": f"Bearer {token}"}

    # Get transcripts in the date range (first page only, up to 1000 rows)
    response = requests.get(
        f"{api_base}/api/v1/transcripts",
        headers=headers,
        params={
            'DateFrom': date_from,
            'DateTo': date_to,
            'Rows': 1000
        },
        timeout=30
    )
    response.raise_for_status()
    transcripts = response.json().get('data', [])

    # Aggregate entity statistics
    entity_stats = {
        'PERSON': {},        # name -> mention count
        'ORGANIZATION': {},
        'LOCATION': {},
        'PHONE_NUMBER': 0,   # total mentions
        'EMAIL': 0,
        'MONEY': [],         # every amount mentioned
        'total_entities': 0
    }

    for transcript in transcripts:
        # Assumes each list entry includes its keywords; if not, fetch
        # /api/v1/transcripts/{id}/status for each transcript instead
        for keyword in transcript.get('keywords', []):
            entity_type = keyword.get('type', 'OTHER')
            entity_name = keyword['name']

            if entity_type in ('PERSON', 'ORGANIZATION', 'LOCATION'):
                counts = entity_stats[entity_type]
                counts[entity_name] = counts.get(entity_name, 0) + 1
            elif entity_type in ('PHONE_NUMBER', 'EMAIL'):
                entity_stats[entity_type] += 1
            elif entity_type == 'MONEY':
                entity_stats['MONEY'].append(entity_name)

            entity_stats['total_entities'] += 1

    return entity_stats

Visualization

Entity Frequency

Apple Inc:      ████████░ 45 mentions
John Smith:     ██████░░░ 32 mentions
San Francisco:  █████░░░░ 28 mentions
Microsoft:      ████░░░░░ 22 mentions
Jane Doe:       ██░░░░░░░ 12 mentions

Entity Type Distribution

PERSON:        ███████░░ 35%
ORGANIZATION:  ██████░░░ 30%
LOCATION:      ████░░░░░ 20%
MONEY:         ██░░░░░░░ 10%
PHONE_NUMBER:  ░░░░░░░░░ 2%
OTHER:         ░░░░░░░░░ 3%
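Text charts like these can be generated directly from name-to-count dicts, such as the PERSON or ORGANIZATION counts returned by generate_entity_report. The following is a sketch. The render_bars helper, its default width of 9, and its scaling rule (bars relative to the largest count unless scale_max is given) are illustrative choices.

```python
def render_bars(counts, width=9, scale_max=None):
    """Render 'label: ████░░░ N' lines, longest bar first, scaled to scale_max."""
    if not counts:
        return ""
    scale_max = scale_max or max(counts.values())
    label_width = max(len(label) for label in counts) + 1  # room for the colon
    lines = []
    for label, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        filled = min(round(width * value / scale_max), width) if scale_max else 0
        bar = "█" * filled + "░" * (width - filled)
        name = label + ":"
        lines.append(f"{name:<{label_width}} {bar} {value}")
    return "\n".join(lines)

print(render_bars({"Apple Inc": 45, "Microsoft": 22, "Jane Doe": 12}))
```

For a combined frequency chart like the first one above, merge the per-type count dicts before rendering. Names that appear under two types would then need to be disambiguated, for example by including the type in the label.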

Troubleshooting

Entities Not Extracted

  • Insufficient speech content
  • Poor audio quality affecting transcription
  • Entity type not supported
  • Low confidence scores

Incorrect Extraction

  • Similar sounding names
  • Multiple entity references
  • Acronyms and abbreviations
  • Specialized terminology

PII Not Masked

  • Entity type may not be classified as PII
  • Masking may require explicit configuration
  • Check redaction settings

Next Steps