Off-the-shelf (OTS) Datasets

Chinese (multinational foreigner) scripted smartphone

Dataset ID:
FOREIGNER_ASR001_CN
Dataset Name:
Chinese (multinational foreigner) scripted smartphone
Common Use Cases:
ASR, Conversational AI, Speech Analytics
Language:
Mandarin Chinese
Country:
China
Language Code:
cmn
Country Code:
CHN
Product Type:
Audio
Detailed Product Type:
Scripted Speech
Unit:
200 hours
Recording Device:
Mobile phone
Recording Condition:
Low background noise
Contributors:
309
Utterances:
Unique Words:
Sample Rate (kHz):
16
Channels:
1
Data Format:
wav
Source:
Appen China
Additional Info:
  • Dataset contains audio with corresponding text prompts.
  • This database contains 200 hours of Mandarin Chinese spoken by foreign speakers from the following countries and regions: Argentina, Egypt, Australia, Russia, the Philippines, Kazakhstan, Korea, Kyrgyzstan, Canada, Kuala Lumpur, Kenya, Laos, Malaysia, Mauritius, the United States, Mongolia, South Africa, Japan, Tajikistan, Thailand, Turkey, Hong Kong, Singapore, India, Indonesia, and Vietnam.
  • There is no data from South Korea or Brazil, and no data recorded by minors.
  • Each session lasts about an hour; individual sentences range from 3 to 10 seconds.
  • Each recording is of an individual reading aloud, captured on a mobile phone in a home or office environment.
  • Sensitive data and personal information have been scrubbed.
Year of Collection:
2020
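
The audio specifications listed above (16 kHz sample rate, 1 channel, wav format, sentences of roughly 3 to 10 seconds) can be spot-checked on a delivered file. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav`, the constants, and the file path are hypothetical, since the dataset's directory layout and file naming are not described here. Bit depth is not stated in the listing, so it is not checked.

```python
import wave

# Values taken from the listing above; the constant names are illustrative.
EXPECTED_RATE_HZ = 16_000      # Sample Rate (kHz): 16
EXPECTED_CHANNELS = 1          # Channels: 1
MIN_SENTENCE_S = 3.0           # Stated sentence duration range: 3 to 10 seconds
MAX_SENTENCE_S = 10.0


def check_wav(path):
    """Compare one WAV file's header against the listed specifications."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    duration_s = frames / rate if rate else 0.0
    return {
        "sample_rate_ok": rate == EXPECTED_RATE_HZ,
        "channels_ok": channels == EXPECTED_CHANNELS,
        "duration_s": duration_s,
        # The 3 to 10 second range is described as typical, so treat a
        # failure here as a flag for review rather than an error.
        "duration_in_range": MIN_SENTENCE_S <= duration_s <= MAX_SENTENCE_S,
    }
```

Calling `check_wav("path/to/utterance.wav")` (a placeholder path) returns a small report dictionary; running it across a sample of files gives a quick sanity check before training.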
