Off-the-shelf (OTS) Datasets

Dongbei dialect (China) Conversational Speech

Dataset ID:
DONGBEI_ASR001_CN
Dataset Name:
Dongbei dialect (China) Conversational Speech
Common Use Cases:
ASR, Conversational AI, Speech Analytics
Language:
Dongbei dialect
Country:
China
Language Code:
cmn
Country Code:
CHN
Product Type:
Audio
Detailed Product Type:
Conversational Speech
Unit:
84.6 hours
Recording Device:
Recording pen/microphone
Recording Condition:
Low background noise
Contributors:
268
Utterances:
Unique Words:
Sample Rate (kHz):
16
Channels:
1
Data Format:
wav
Source:
Appen China
Additional Info:
  • Audio only; transcription in development for Q1 2025
  • Audio recordings cover 19 districts: Shenyang Heping District, Shenhe District, Huanggu District, Dadong District, Tiexi District, Lvyuan District, Chaoyang District, Kuancheng District, Erdao District, Nanguan District, Daoli District, Nangang District, Daowai District, Pingfang District, Songbei District, Xiangfang District, Hulan District, Acheng District and Shuangcheng District
  • Northeast suburban accents are not included, and no minors were recorded.
  • Each recording session contains 20 to 30 minutes of free dialogue among 2 to 5 people.
  • Sensitive data and personal information have been scrubbed.
Year of Collection:
2020
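
The audio specifications listed above (16 kHz sample rate, 1 channel, wav format) can be checked against a delivered file with a short script. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files. The function names `check_wav` and `duration_seconds` and the file path in the usage note are hypothetical; the listing does not describe how the delivered files are named or organized.

```python
import wave

# Values taken from the listing: Sample Rate (kHz) = 16, Channels = 1.
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of mismatches between a WAV file and the listed specs.

    An empty list means the file matches on sample rate and channel count.
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        if rate != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {rate} Hz")
        if channels != EXPECTED_CHANNELS:
            problems.append(f"channel count is {channels}")
    return problems


def duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, `check_wav("session_001.wav")` (an assumed file name) would return an empty list for a file that matches the listed sample rate and channel count, and `duration_seconds` could be summed across files to compare against the stated 84.6 hours.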
