English (United States) Ultra High-Volume labeled speech

SKU: 7f100b7b3609
Availability: InStock

Dataset ID:

USE_UHV001

Dataset Name:

English (United States) Ultra High-Volume labeled speech

Common Use Cases:

ASR, Conversational AI, Speech Analytics, Automatic Captioning, In Car HMI & Entertainment, Virtual Assistant

Language:

English

Country:

United States

Language Code:

eng

Country Code:

USA

Product Type

Audio

Detailed Product Type

Broadcast Speech

Unit

1196 hours

Recording Device

N/A

Recording Condition

Low background noise

Contributors

20472

Utterances

423371

Unique Words

110265

Sample Rate (kHz):

Channels

Data Format

wav

Source

Appen Global

Additional Info:

Customised packaging available
High-quality labelled speech dataset of web-sourced, licensable broadcast audio, curated for representative speaker demographic distributions and filtered through human quality checks.
12.6M total words
Utterance-level labelling includes: speech transcription, accent identification, speaker identification and verification, gender and age-group detection, and domain classification.
Domains include: Agriculture & Plants, Animals & Pets, Art & Culture, Beauty & Fashion, Career, Clothing, Education, Entertainment, Family & Relationships, Finance & Insurance, Food, Health, History, Hospitality, Legal, Leisure, News & Politics, Religion & Spirituality, Retail, Science & Technology, Social Networks, Sports, Telecom, Travel, Weather, Others
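
As a rough sanity check, the headline figures above (1196 hours, 20472 contributors, 423371 utterances, 12.6M total words) can be turned into per-utterance and per-contributor averages. The sketch below is illustrative only: the function name `derive_stats` and the result keys are hypothetical, and because the word count is rounded in the listing, the derived values are approximate.

```python
# Figures copied from the listing above.
HOURS = 1196
CONTRIBUTORS = 20472
UTTERANCES = 423371
WORDS_TOTAL = 12_600_000  # listed as "12.6M", so this is a rounded value


def derive_stats(hours, contributors, utterances, words_total):
    """Return simple averages derived from the listing's totals (hypothetical helper)."""
    total_seconds = hours * 3600
    total_minutes = hours * 60
    return {
        "seconds_per_utterance": total_seconds / utterances,
        "words_per_utterance": words_total / utterances,
        "utterances_per_contributor": utterances / contributors,
        "minutes_per_contributor": total_minutes / contributors,
        "words_per_minute": words_total / total_minutes,
    }


if __name__ == "__main__":
    stats = derive_stats(HOURS, CONTRIBUTORS, UTTERANCES, WORDS_TOTAL)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the averages come out to roughly 10 seconds and 30 words per utterance, and about 21 utterances (around 3.5 minutes of audio) per contributor. These are averages only; the listing does not state how utterance lengths or contributor shares are actually distributed.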

Year of Collection

2022
