Off-the-shelf (OTS) Datasets

Spanish (Latin America) scripted microphone

Dataset ID: ESL_ASR001
Dataset Name: Spanish (Latin America) scripted microphone
Common Use Cases: ASR, Virtual Assistant, Chatbot
Language: Spanish
Country: Costa Rica
Language Code: spa
Country Code: CRI
Product Type: Audio
Detailed Product Type: Scripted Speech
Unit: 17 hours
Recording Device: Microphone
Recording Condition: Mixed (quiet home/office, public, outdoor)
Contributors: 100
Utterances: 6,898
Unique Words: Available on request
Sample Rate (kHz): 16
Channels: 1
Data Format: wav
Source: GlobalPhone
Additional Info:
  • Part of a multilingual corpus; tiered package prices are available with the purchase of multiple GlobalPhone languages or the full corpus
  • Dataset is fully transcribed and the transcription is available both in original script and in Romanized form
  • Each speaker reads phonetically rich sentences selected from national newspaper articles available on the web, chosen to cover a wide range of domains and a large vocabulary
  • Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Year of Collection: 1996, 1997
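
The audio specifications listed above (wav, 16 kHz, 1 channel) can be checked on a delivered file with a few lines of Python. The following is a minimal sketch, not part of the dataset documentation: the function names are hypothetical, and it assumes the files are uncompressed PCM WAV that Python's standard `wave` module can read. It also derives an approximate mean utterance length from the listed totals (17 hours, 6,898 utterances); since the hour count is rounded, the result is only indicative.

```python
import wave

# Values taken from the listing above.
EXPECTED_SAMPLE_RATE_HZ = 16_000  # "Sample Rate (kHz): 16"
EXPECTED_CHANNELS = 1             # "Channels: 1"


def matches_listing(path):
    """Return True if the WAV file at `path` is 16 kHz, single channel.

    Hypothetical helper; assumes a PCM WAV file readable by `wave`.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def mean_utterance_seconds(total_hours=17, utterances=6898):
    """Approximate mean utterance length implied by the listed totals."""
    return total_hours * 3600 / utterances
```

With the listed totals, `mean_utterance_seconds()` comes out at a little under 9 seconds per utterance, which is consistent with read newspaper sentences.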

Get Started with Off-the-Shelf AI Training Datasets

Appen’s extensive catalog of off-the-shelf (OTS) datasets spans multiple data types and industries, providing comprehensive coverage for various AI applications. These datasets are crafted to the highest standards of quality and accuracy, ensuring reliable training data for AI models.
