Off-the-Shelf AI Training Datasets

Chinese news text summaries corpus

Common Use Cases: LLM training
Dataset ID: DMXWB_corpus_CN
Type: Text
Unit: 20,000 summaries
Language: Chinese
Country: China

Code Q&A Dataset

Common Use Cases: LLM training
Dataset ID: DM_CNRD
Type: Text
Unit: 12 million pairs
Language: English
Country: N/A

Croatian (Croatia) conversational telephony

Common Use Cases: ASR, Conversational AI, Speech Analytics
Dataset ID: CRO_ASR001
Type: Audio
Unit: 39 hours
Language: Croatian
Country: Croatia

Dari (Afghanistan) broadcast

Common Use Cases: ASR, Automatic Captioning, Keyword Spotting
Dataset ID: DAR_BRC001
Type: Audio
Unit: 49 hours
Language: Dari
Country: Afghanistan

Dari (Afghanistan) conversational telephony

Common Use Cases: ASR, Conversational AI, Speech Analytics
Dataset ID: DAR_ASR001
Type: Audio
Unit: 40 hours
Language: Dari
Country: Afghanistan

Dongbei dialect (China) Conversational Speech

Common Use Cases: ASR, Conversational AI, Speech Analytics
Dataset ID: DONGBEI_ASR002_CN
Type: Audio
Unit: 75.2 hours
Language: Dongbei dialect
Country: China
