Dataset ID:
EAR_ASR001
Dataset Name:
Arabic (Eastern Algeria) conversational telephony
Common Use Cases:
ASR, Conversational AI, Speech Analytics
Language:
Arabic
Country:
Algeria
Language Code:
ara
Country Code:
DZA
Product Type:
Audio
Detailed Product Type:
Conversational Speech
Unit:
29 hours
Recording Device:
Mobile phone and landline
Recording Condition:
Low background noise (home/office)
Contributors:
496
Utterances:
32,899
Unique Words:
15,314
Sample Rate (kHz):
8
Channels:
2
Data Format:
alaw
Source:
Appen Global
Additional Info:
- Dataset is fully transcribed and timestamped
- Dataset is accompanied by a pronunciation lexicon containing all transcribed words
- For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; for a smaller number of calls, however, only one half of the conversation was collected and transcribed
- 8% landline, 92% mobile
Year of Collection:
2007
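The format fields above (8 kHz sample rate, 2 channels, alaw) suggest how the audio might be read. The following is a minimal sketch, assuming that "alaw" means G.711 A-law companding and that the files are headerless with the two channels interleaved sample by sample. The listing confirms neither the container nor the channel layout, and the function names are illustrative only.

```python
"""Decode raw two-channel 8 kHz A-law audio into 16-bit PCM samples."""

SAMPLE_RATE_HZ = 8000  # from the listing: Sample Rate (kHz) = 8
NUM_CHANNELS = 2       # from the listing: Channels = 2


def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit PCM value."""
    byte ^= 0x55  # undo the alternate-bit inversion applied by the encoder
    sign = byte & 0x80
    exponent = (byte & 0x70) >> 4
    mantissa = byte & 0x0F
    if exponent == 0:
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if sign else -magnitude


def split_channels(raw: bytes) -> tuple[list[int], list[int]]:
    """Decode and de-interleave the two speakers.

    Assumption (not stated in the listing): samples are interleaved
    as ch0, ch1, ch0, ch1, ...
    """
    pcm = [alaw_to_linear(b) for b in raw]
    return pcm[0::NUM_CHANNELS], pcm[1::NUM_CHANNELS]


def duration_seconds(raw: bytes) -> float:
    """A-law stores one byte per sample per channel."""
    return len(raw) / (SAMPLE_RATE_HZ * NUM_CHANNELS)
```

Under these assumptions, passing the bytes of one call recording to `split_channels` would yield one sample list per speaker. For the calls where only one half of the conversation was collected, the layout may differ and should be checked against the delivered files.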