Off-the-shelf (OTS) Datasets

Arabic (Levantine) scripted microphone

Dataset ID: ARU_ASR002
Dataset Name: Arabic (Levantine) scripted microphone
Common Use Cases: ASR, Virtual Assistant, Chatbot
Language: Arabic
Country: United Arab Emirates
Language Code: ara
Country Code: UAE
Product Type: Audio
Detailed Product Type: Scripted Speech
Unit: 32 hours
Recording Device: Microphone
Recording Condition: Low background noise (studio)
Contributors: 100
Utterances: Available upon request
Unique Words: Available upon request
Sample Rate (kHz): 48
Channels: 1
Data Format: wav
Source: Appen Global
Additional Info:
  • Studio-recorded scripted audio with corresponding text prompts. Sessions were approximately 20 minutes long, with 210 prompts per session. Speakers are Levantine Arabic speakers from Syria, Lebanon, Palestine, and Jordan. Prompts include yes/no questions and answers, open-ended questions and answers, brand/company names, addresses, points of interest, medicine/disease names, synthetic person names, media names (TV, movies, books, etc.), synthesised digit strings (phone number, credit card, PIN, time, date), and phonetically rich sentences. Transcription can be developed upon request.
Year of Collection: 2023
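
For buyers who want to sanity-check delivered files against the specification above, the following is a minimal sketch, not an Appen-provided tool. It assumes the files are PCM-encoded WAV (the datasheet only says "wav"), and the function name `check_wav` is hypothetical. It checks only the two header values the datasheet states: a 48 kHz sample rate and 1 channel. Bit depth is not listed, so it is not checked.

```python
import wave

# Values taken from the datasheet: Sample Rate (kHz) = 48, Channels = 1.
EXPECTED_SAMPLE_RATE_HZ = 48_000
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of header mismatches; an empty list means the file matches.

    Hypothetical helper. Assumes a PCM WAV file, which is what Python's
    standard-library `wave` module reads.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"channel count is {channels}, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

Calling `check_wav("some_file.wav")` on each delivered file and collecting the non-empty results gives a quick list of files that deviate from the stated format; the file name here is a placeholder.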

Get Started with Off-the-Shelf AI Training Datasets

Appen’s extensive catalog of off-the-shelf (OTS) datasets spans multiple data types and industries, providing comprehensive coverage for various AI applications. These datasets are crafted to the highest standards of quality and accuracy, ensuring reliable training data for AI models.
