Off-the-Shelf AI Training Datasets

Arabic (MSA) Pronunciation Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: arb_MSA_PHON

Type: Text

Unit: 40,000 words

Language: Arabic (Standard)

Country: N/A

Business-to-business printed text document OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_B2B

Type: Image

Unit: 5,838 documents

Language: N/A

Country: N/A

Chinese and English related texts

Common Use Cases: LLM training

Dataset ID: GLWB_CN

Type: Text

Unit: 400,000

Language: English/Chinese

Country: N/A

Code Q&A Dataset

Common Use Cases: LLM training

Dataset ID: DM_CNRD

Type: Text

Unit: 12 million pairs

Language: English

Country: N/A

Handwritten text document OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_Handwritten

Type: Image

Unit: 663 images

Language: N/A

Country: N/A

Ukrainian (Ukraine) Pronunciation Dictionary

Common Use Cases: ASR, TTS, Language Modelling

Dataset ID: ukr_UKR_PHON

Type: Text

Unit: 6,000 words

Language: Ukrainian

Country: Ukraine

Get Started with Off-the-Shelf AI Training Datasets

Get in touch