Off-the-Shelf AI Training Datasets

Chinese instruction set sentence corpus

Common Use Cases: LLM training

Dataset ID: ZLJ_corpus_CN

Type: Text

Unit: 200,000 sentences

Language: Chinese

Country: China

Chinese multidisciplinary test questions corpus

Common Use Cases: LLM training

Dataset ID: MTQ_CN

Type: Text

Unit: 319,970 sentences

Language: Chinese

Country: China

Chinese news text summaries corpus

Common Use Cases: LLM training

Dataset ID: DMXWB_corpus_CN

Type: Text

Unit: 20,000 summaries

Language: Chinese

Country: China

Code Q&A Dataset

Common Use Cases: LLM training

Dataset ID: DM_CNRD

Type: Text

Unit: 12 million pairs

Language: English

Country: N/A

Croatian (Croatia) scripted microphone

Common Use Cases: ASR, Virtual Assistant, Chatbot

Dataset ID: CRO_ASR002

Type: Audio

Unit: 11 hours

Language: Croatian

Country: Croatia

Croatian (Croatia) scripted smartphone

Common Use Cases: ASR, Virtual Assistant, Chatbot

Dataset ID: CRO_ASR003_CN

Type: Audio

Unit: 263 hours

Language: Croatian

Country: Croatia

Get Started with Off-the-Shelf AI Training Datasets