Croatian (Croatia) scripted microphone

Dataset ID: CRO_ASR002
Dataset Name: Croatian (Croatia) scripted microphone
Common Use Cases: ASR, Virtual Assistant, Chatbot
Language: Croatian
Country: Croatia
Language Code: hrv
Country Code: HRV
Product Type: Audio
Detailed Product Type: Scripted Speech
Unit: 11 hours
Recording Device: Microphone
Recording Condition: Mixed (quiet home/office, public, outdoor)
Contributors: 94
Utterances: 4,499
Unique Words: 23,929
Sample Rate (kHz): 16
Channels: 1
Data Format: wav
Source: GlobalPhone
Additional Info:
  • Part of a multilingual corpus; tiered package prices are available with the purchase of multiple GlobalPhone languages or the full corpus
  • The dataset is fully transcribed; transcriptions are available in both the original script and Romanized form
  • Each speaker reads phonetically rich sentences selected from national newspaper articles published on the web, covering a wide range of domains and a large vocabulary
  • Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Year of Collection: 1996, 1997
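
The listed audio format (16 kHz, 1 channel, wav) can be checked against a delivered file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the constants are illustrative assumptions, not part of the dataset or of any Appen or GlobalPhone tooling, and the listing does not specify bit depth or file layout, so neither is checked here.

```python
import wave

# Values taken from the listing above (Sample Rate: 16 kHz, Channels: 1).
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of mismatches between a WAV file and the listed format.

    Hypothetical helper for illustration; an empty list means the file
    matches the listed sample rate and channel count.
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"channel count is {channels}, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

Calling `check_wav` on each file in a delivery and collecting the non-empty results would flag any audio that needs resampling or downmixing before it is fed to an ASR pipeline.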
