Off-the-shelf (OTS) Datasets

Arabic (Modern Standard Arabic) scripted microphone

Dataset ID: MSA_ASR001
Dataset Name: Arabic (Modern Standard Arabic) scripted microphone
Common Use Cases: ASR, Virtual Assistant, Chatbot
Language: Arabic
Country: Tunisia
Language Code: ara
Country Code: TUN
Product Type: Audio
Detailed Product Type: Scripted Speech
Unit: 12 hours
Recording Device: Microphone
Recording Condition: Mixed (quiet home/office, public, outdoor)
Contributors: 78
Utterances: 4,908
Unique Words: 40,000
Sample Rate (kHz): 16
Channels: 1
Data Format: wav
Source: GlobalPhone
Additional Info:
  • Part of a multilingual corpus; tiered package prices are available when purchasing multiple GlobalPhone languages or the full corpus
  • The dataset is fully transcribed; transcriptions are available in both the original script and Romanized form
  • Each speaker reads phonetically rich sentences selected from national newspaper articles available on the web, covering a wide range of domains with a large vocabulary
  • Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Year of Collection: 1996, 1999, 2000
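
The audio specification listed above (wav format, 16 kHz sample rate, 1 channel) can be checked on a delivered file with a few lines of Python. The following is a minimal sketch, not part of the dataset tooling: the function names `matches_datasheet` and `duration_seconds` are hypothetical, and it assumes the files are uncompressed PCM WAV, which is what the standard-library `wave` module reads.

```python
import wave

# Values taken from the datasheet fields above.
EXPECTED_SAMPLE_RATE_HZ = 16_000  # "Sample Rate (kHz): 16"
EXPECTED_CHANNELS = 1             # "Channels: 1"


def matches_datasheet(path):
    """Return True if the WAV file at `path` is 16 kHz mono.

    Hypothetical helper; assumes an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def duration_seconds(path):
    """Return the duration of the WAV file at `path` in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over all delivered files gives a total that can be compared against the listed volume of 12 hours.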
