Dataset ID:
USE_ASR001
Dataset Name:
English (United States) scripted microphone
Common Use Cases:
ASR, Virtual Assistant, Chatbot
Language:
English
Country:
United States
Language Code:
eng
Country Code:
USA
Product Type:
Audio
Detailed Product Type:
Scripted Speech
Unit:
62 hours
Recording Device:
Microphone
Recording Condition:
Low background noise (studio)
Contributors:
200
Utterances:
80,000
Unique Words:
18,318
Sample Rate (kHz):
48
Channels:
2
Data Format:
Raw PCM or WAV (PCM)
Source:
Appen Global
Additional Info:
- Dataset is fully transcribed and timestamped
- Dataset is formatted according to SALA II/SpeechDat-style conventions
- Dataset is accompanied by a pronunciation lexicon containing all transcribed words
- Each speaker read 400 prompts, including digits, natural numbers, personal and city names, telephone numbers, generic command-and-control items, and phonetically rich sentences and words.
Year of Collection:
2009
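The listed volume figures can be cross-checked against each other: 200 contributors reading 400 prompts each gives the stated 80,000 utterances, and 62 hours spread over 80,000 utterances averages about 2.8 seconds per utterance. The sketch below reproduces that arithmetic and adds a rough raw PCM storage estimate. The function names are hypothetical, and the storage figure rests on two assumptions not stated in this datasheet: 16-bit samples, and 62 hours of recorded duration with both channels stored.

```python
# Figures taken from the datasheet above.
CONTRIBUTORS = 200
PROMPTS_PER_SPEAKER = 400
TOTAL_HOURS = 62
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
# Assumption: 16-bit (2-byte) samples; the datasheet does not give the bit depth.
BYTES_PER_SAMPLE = 2


def total_utterances(contributors: int, prompts_per_speaker: int) -> int:
    """Utterance count when every contributor reads the same number of prompts."""
    return contributors * prompts_per_speaker


def mean_utterance_seconds(total_hours: float, utterances: int) -> float:
    """Average recorded duration per utterance, in seconds."""
    return total_hours * 3600 / utterances


def raw_pcm_bytes(total_hours: int, sample_rate_hz: int,
                  channels: int, bytes_per_sample: int) -> int:
    """Headerless PCM size: seconds x samples/s x channels x bytes/sample."""
    return total_hours * 3600 * sample_rate_hz * channels * bytes_per_sample


utterances = total_utterances(CONTRIBUTORS, PROMPTS_PER_SPEAKER)
print(utterances)  # 80000
print(round(mean_utterance_seconds(TOTAL_HOURS, utterances), 2))  # 2.79
size = raw_pcm_bytes(TOTAL_HOURS, SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE)
print(round(size / 1e9, 1))  # 42.9 (GB, under the 16-bit assumption)
```

WAV delivery adds a small header per file, so actual disk usage would be slightly higher than the raw PCM estimate.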
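When the data is delivered as WAV, a quick header check can confirm that each file matches the listed format (48 kHz, 2 channels). This is a minimal sketch using Python's standard `wave` module; `check_wav_spec` and `make_test_wav` are hypothetical helper names, and the demo writes a synthetic 16-bit file in memory instead of reading real dataset files. Raw PCM files carry no header, so their sample rate, channel count, and sample width must be taken from the documentation rather than read from the file.

```python
import io
import wave
from typing import BinaryIO, List, Tuple, Union

# Expected values from the datasheet above.
EXPECTED_RATE_HZ = 48_000
EXPECTED_CHANNELS = 2


def check_wav_spec(source: Union[str, BinaryIO],
                   expected_rate: int = EXPECTED_RATE_HZ,
                   expected_channels: int = EXPECTED_CHANNELS) -> Tuple[List[str], float]:
    """Return (mismatch messages, duration in seconds) for one WAV file.

    An empty mismatch list means the header matches the expected format.
    """
    problems: List[str] = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"{channels} channel(s), expected {expected_channels}")
        # Duration follows from the frame count and the frame rate.
        duration_s = wav.getnframes() / rate
    return problems, duration_s


def make_test_wav(rate: int, channels: int, seconds: float) -> io.BytesIO:
    """Build a silent in-memory WAV file (16-bit samples, demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit; an assumption for this demo
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


ok, duration = check_wav_spec(make_test_wav(48_000, 2, 0.5))
print(ok, duration)  # [] 0.5
bad, _ = check_wav_spec(make_test_wav(16_000, 1, 1.0))
print(bad)  # two mismatch messages
```

In practice `check_wav_spec` would be called with a file path for each delivered file, collecting any non-empty mismatch lists for review.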