Off-the-Shelf AI Training Datasets

Cantonese (China) Traditional Part of Speech Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: yue_HKG_POS

Type: Text

Unit: 10,000 words

Language: Cantonese

Country: China

Cantonese (China) Traditional Pronunciation Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: yue_HKG_PHON

Type: Text

Unit: 40,000 words

Language: Cantonese

Country: China

Catalan (Spain) Pronunciation Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: cat_ESP_PHON

Type: Text

Unit: 10,000 words

Language: Catalan

Country: Spain

Cebuano (Philippines) Pronunciation Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: ceb_PHL_PHON

Type: Text

Unit: 21,000 words

Language: Cebuano

Country: Philippines

Chinese and English related texts

Common Use Cases: LLM training

Dataset ID: GLWB_CN

Type: Text

Unit: 400,000

Language: English/Chinese

Country: N/A

Chinese command and control prompt response corpus

Common Use Cases: LLM training, Command and Control, TV Player, Device Control

Dataset ID: DSDH_corpus_CN

Type: Text

Unit: 20,000 sentences

Language: Chinese

Country: China

Get Started with Off-the-Shelf AI Training Datasets