Off-the-Shelf AI Training Datasets

Arabic (UAE) printed text annotated OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_ARU002_CN

Type: Image

Unit: 20,000 images

Language: Arabic

Country: United Arab Emirates

Arabic NER news text

Common Use Cases: NER, Content Classification, Search Engines

Dataset ID: ARB_NER001

Type: Text

Unit: 20,774 sentences

Language: Arabic (Standard)

Country: N/A

Business-to-business printed text document OCR

Common Use Cases: Document Processing, Document Search, Text Detection

Dataset ID: IMG_OCR_B2B

Type: Image

Unit: 5,838 documents

Language: N/A

Country: N/A

Chinese command and control prompt response corpus

Common Use Cases: LLM Training, Command and Control, TV Player, Device Control

Dataset ID: DSDH_corpus_CN

Type: Text

Unit: 20,000 sentences

Language: Chinese

Country: China

Dutch (Netherlands & Belgium) scripted in-car

Common Use Cases: ASR, Virtual Assistant, In-Car HMI & Entertainment

Dataset ID: Dutch and Flemish SpeechDat-Car

Type: Audio

Unit: 27 hours

Language: Dutch

Country: Netherlands, Belgium

English (United States) adversarial prompts for LLM red teaming (in development)

Common Use Cases: LLM Training, LLM Red Teaming

Dataset ID: eng_USA_LLM002

Type: Text

Unit: 500 prompts

Language: English

Country: United States
