Off-the-Shelf AI Training Datasets

Arabic (UAE) printed text annotated OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_ARU002_CN

Type: Image

Unit: 20,000 images

Language: Arabic

Country: United Arab Emirates

Baby crying audio

Common Use Cases: Baby Monitor, Security & Other Consumer Applications

Dataset ID: CRY_ASR001_CN

Type: Audio

Unit: 70 hours

Language: N/A

Country: China

Business-to-business printed text document OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_B2B

Type: Image

Unit: 5,838 documents

Language: N/A

Country: N/A

Chinese command and control prompt response corpus

Common Use Cases: LLM Training, Command and Control, TV Player, Device Control

Dataset ID: DSDH_corpus_CN

Type: Text

Unit: 20,000 sentences

Language: Chinese

Country: China

Dari (Afghanistan) broadcast

Common Use Cases: ASR, Automatic Captioning, Keyword Spotting

Dataset ID: DAR_BRC001

Type: Audio

Unit: 49 hours

Language: Dari

Country: Afghanistan

English (United States) receipts (in development)

Common Use Cases: Image Recognition, Object Recognition, OCR, Text Detection

Dataset ID: IMG_OCR_USE_RECEIPTS

Type: Image

Unit: 4,500 images

Language: English

Country: United States
