English (United States) Harmful and harmless prompts and responses **in development**

Common Use Cases: LLM training, LLM Red teaming, Chatbot
Dataset ID: eng_USA_LLM001
Type: Text
Unit: 300 prompts
Language: English
Country: United States

English (United States) Ultra High-Volume labeled speech

Common Use Cases: ASR, Conversational AI, Speech Analytics, Automatic Captioning, In Car HMI & Entertainment, Virtual Assistant
Dataset ID: USE_UHV001
Type: Audio
Unit: 1196 hours
Language: English
Country: United States

English Inverse text normalisation

Common Use Cases: ASR, Language Modelling, Closed Captioning
Dataset ID: ENG_ITN001
Type: Text
Unit: 4454 test cases
Language: English
Country: N/A

English NER news text

Common Use Cases: NER, Content Classification, Search Engines
Dataset ID: ENG_NER001
Type: Text
Unit: 22,768 sentences
Language: English
Country: N/A

Farsi/Persian NER news text

Common Use Cases: NER, Content Classification, Search Engines
Dataset ID: FAR_NER001
Type: Text
Unit: 19,584 sentences
Language: Iranian Persian
Country: Iran

Finnish (Finland) printed text OCR

Common Use Cases: Document Processing, Document Search, Text detection
Dataset ID: IMG_OCR_FIN_CN
Type: Image
Unit: 7293 images
Language: Finnish
Country: Finland

Get Started with Off-the-Shelf AI Training Datasets

Appen’s extensive catalog of off-the-shelf (OTS) datasets spans multiple data types and industries, providing comprehensive coverage for various AI applications. These datasets are crafted to the highest standards of quality and accuracy, ensuring reliable training data for AI models.
