Dataset ID:
GLOBALPHONE
Dataset Name:
GlobalPhone Multilingual Text & Speech Database
Common Use Cases:
ASR, Language Identification, Multilingual Speech Synthesis, Virtual Assistant, Chatbot
Language:
N/A
Country:
Global coverage
Language Code:
N/A
Country Code:
N/A
Product Type
Audio
Detailed Product Type
Scripted Speech
Unit
450 hours
Recording Device
Microphone
Recording Condition
Mixed (quiet home/office, public, outdoor)
Contributors
1942
Utterances
169,755
Unique Words
Available on request
Sample Rate (kHz):
16
Channels
1
Data Format
wav
Source
GlobalPhone
Additional Info:
- GlobalPhone is a multilingual corpus. Languages can be sold separately or in multi-language packages, and tiered package pricing is available.
- GlobalPhone provides multilingual speech and text data in 20 languages: Arabic, Bulgarian, Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech, French, German, Hausa, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Tamil, Thai, Turkish, and Vietnamese.
- The dataset is fully transcribed, and the transcription is available both in the original script and in Romanized form.
- In each language, about 100 native speakers read sentences from news articles. The articles cover national and international political news, as well as economic news, from 1995-2011. The speech is provided in 16-bit, 16 kHz mono quality. It was recorded with a close-speaking microphone, and the same recording equipment was used for all languages.
- Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
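As an illustration of the audio format stated above (16 kHz, mono, 16-bit wav), the following is a minimal sketch of how a delivered file could be checked against those values using Python's standard `wave` module. The function name `check_wav_format` and the constants are hypothetical names chosen for this example and are not part of the dataset or any vendor tooling. It also assumes the files are plain PCM wav files that the `wave` module can open.

```python
import wave

# Expected values taken from the datasheet fields above (assumed to apply to every file).
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the datasheet values."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples")
    return problems
```

An empty list means the file header matches the stated sample rate, channel count, and bit depth. Any entries describe the values that differ.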
Year of Collection
1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011