Off-the-shelf (OTS) Datasets

Mandarin Chinese (China) scripted microphone

Dataset ID: MAC_ASR002
Dataset Name: Mandarin Chinese (China) scripted microphone
Common Use Cases: ASR, Virtual Assistant, Chatbot
Language: Mandarin Chinese
Country: China
Language Code: cmn
Country Code: CHN
Product Type: Audio
Detailed Product Type: Scripted Speech
Unit: 26 hours
Recording Device: Microphone
Recording Condition: Mixed (quiet home/office, public, outdoor)
Contributors: 132
Utterances: 10,225
Unique Words: Available on request
Sample Rate (kHz): 16
Channels: 1
Data Format: wav
Source: GlobalPhone
Additional Info:
  • Part of a multilingual corpus; tiered package prices are available when purchasing multiple GlobalPhone languages or the full corpus
  • The dataset is fully transcribed; transcriptions are available in both the original script and Romanized form
  • Each speaker reads a number of phonetically rich sentences drawn from national newspaper articles available on the web, covering a wide range of domains and a large vocabulary
  • Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Year of Collection: 1996
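
The audio properties listed above (16 kHz sample rate, 1 channel, wav format) can be checked on delivered files before they enter a training pipeline. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav` and the constants are hypothetical, chosen for illustration; the sketch assumes the files are uncompressed PCM WAV, which this listing does not state explicitly.

```python
import wave

# Values taken from the listing above; the constant names are illustrative.
EXPECTED_RATE_HZ = 16_000   # "Sample Rate (kHz): 16"
EXPECTED_CHANNELS = 1       # "Channels: 1"


def check_wav(source):
    """Check one WAV file against the listed sample rate and channel count.

    `source` may be a file path or a binary file-like object.
    Returns (ok, duration_seconds, problems), where `problems` is a list
    of human-readable mismatch descriptions (empty when ok is True).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")

    # Duration follows directly from frame count and sample rate.
    duration = frames / rate if rate else 0.0
    return (not problems, duration, problems)
```

Summing the returned durations over all files gives a figure to compare against the listed 26 hours, and counting the files gives one to compare against the 10,225 utterances, assuming one utterance per file (also not confirmed by this listing).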
