Off-the-Shelf AI Training Datasets

Arabic NER news text

Common Use Cases: NER, Content Classification, Search Engines

Dataset ID: ARB_NER001

Type: Text

Unit: 20,774 sentences

Language: Arabic (Standard)

Country: N/A

Chinese command and control prompt response corpus

Common Use Cases: LLM training, Command and Control, TV Player, Device Control

Dataset ID: DSDH_corpus_CN

Type: Text

Unit: 20,000 sentences

Language: Chinese

Country: China

Dari (Afghanistan) broadcast

Common Use Cases: ASR, Automatic Captioning, Keyword Spotting

Dataset ID: DAR_BRC001

Type: Audio

Unit: 49 hours

Language: Dari

Country: Afghanistan

Dutch (Netherlands & Belgium) scripted in-car

Common Use Cases: ASR, Virtual Assistant, In-Car HMI & Entertainment

Dataset ID: Dutch and Flemish SpeechDat-Car

Type: Audio

Unit: 27 hours

Language: Dutch

Country: Netherlands, Belgium

English (United States) product labels (in development)

Common Use Cases: Image recognition, Object recognition, Retail

Dataset ID: IMG_OCR_USE_ProductLabels

Type: Image

Unit: 60,000 images

Language: English

Country: United States

English (United States) Ultra High-Volume labeled speech

Common Use Cases: ASR, Conversational AI, Speech Analytics, Automatic Captioning, In-Car HMI & Entertainment, Virtual Assistant

Dataset ID: USE_UHV001

Type: Audio

Unit: 1,196 hours

Language: English

Country: United States

Get Started with Off-the-Shelf AI Training Datasets
