Dataset ID:
CGA_ASR001
Dataset Name:
Arabic (United Arab Emirates (UAE) / Saudi Arabia) scripted microphone
Common Use Cases:
ASR, Virtual Assistant, Chatbot
Language:
Arabic
Country:
United Arab Emirates (UAE) - Saudi Arabia
Language Code:
ara
Country Code:
ARE - SAU
Product Type
Audio
Detailed Product Type
Scripted Speech
Unit
86 hours
Recording Device
Microphone
Recording Condition
Low background noise (home/office)
Contributors
150
Utterances
42,000
Unique Words
19,245
Sample Rate (kHz):
16
Channels
4
Data Format
raw PCM
Source
Appen Global
Additional Info:
- Fully transcribed with acoustic event tagging derived from the SpeechDAT conventions
- Dataset is accompanied by a pronunciation lexicon containing all transcribed words
- All transcriptions fully vowelized
- 280 prompts per speaker, including: 30 person names (first name and family name) from a set of 15; 10 single isolated digits 0-10; 8-digit sequences (randomly generated); 200 phonetically balanced sentences; 30 x 10-word phonetically balanced word strings
Year of Collection
2003
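The format fields above (16 kHz sample rate, 4 channels, raw PCM) imply headerless audio that must be decoded with explicit parameters. The following is a minimal sketch of how such a file might be split into per-channel sample arrays. The listing does not state the sample width, byte order, or channel layout, so signed 16-bit little-endian, channel-interleaved samples are assumed here; the function names are hypothetical and not part of any Appen tooling.

```python
import array
import sys

SAMPLE_RATE_HZ = 16_000  # from the listing: 16 kHz
NUM_CHANNELS = 4         # from the listing: 4 channels
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit samples (not stated in the listing)


def deinterleave_pcm16(raw: bytes, num_channels: int = NUM_CHANNELS):
    """Split headerless PCM bytes into one sample array per channel.

    ASSUMPTION: signed 16-bit little-endian samples, interleaved frame by
    frame (ch0, ch1, ch2, ch3, ch0, ...). Check the dataset documentation
    before relying on this layout.
    """
    frame_bytes = BYTES_PER_SAMPLE * num_channels
    # Drop any trailing partial frame so every channel has equal length.
    usable = len(raw) - (len(raw) % frame_bytes)
    samples = array.array("h")  # "h" = signed short, 2 bytes on common platforms
    samples.frombytes(raw[:usable])
    if sys.byteorder == "big":
        # The data is assumed little-endian; swap on big-endian hosts.
        samples.byteswap()
    # Every num_channels-th sample, starting at offset ch, belongs to channel ch.
    return [samples[ch::num_channels] for ch in range(num_channels)]


def duration_seconds(channel, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Length of one channel in seconds at the given sample rate."""
    return len(channel) / sample_rate_hz
```

In use, the bytes of one recording would be read in binary mode and passed to `deinterleave_pcm16`; if the decoded audio sounds like noise, the assumed sample width or byte order is the first thing to revisit.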