Dataset ID:
MTQ_CN
Dataset Name:
Chinese multidisciplinary test questions corpus
Common Use Cases:
LLM training
Language:
Chinese
Country:
China
Language Code:
cmn
Country Code:
CHN
Product Type
Text
Detailed Product Type
LLM training
Unit
319970 sentences
Recording Device
N/A
Recording Condition
N/A
Contributors
N/A
Utterances
N/A
Unique Words
N/A
Sample Rate (kHz):
N/A
Channels
1
Data Format
json
Source
Appen China
Additional Info:
- Corpus of middle and high school prompt-response pairs, organized into 8 subject sections. Each pair carries the metadata fields Subject, Grade, Knowledge Area, Question Type, Question, Answer, and Difficulty. The subject sections are:
- Geography - 30k sentences (DLT001_CN);
- Chemistry - 40k sentences (HXT001_CN);
- History - 40k sentences (LST001_CN);
- Biology - 40k sentences (SWT001_CN);
- Math - 30k sentences (SXT001_CN);
- Physics - 40k sentences (WLT001_CN);
- Chinese language - 10k sentences (YWT001_CN);
- Politics - 40k sentences (ZZT001_CN)
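A minimal sketch of how records with the metadata fields listed above might be loaded and tallied by subject. The key names are copied from the field list and are an assumption about the actual JSON keys; the top-level layout (a JSON array of objects) and the two sample records are invented for illustration and are not taken from the dataset.

```python
import json
from collections import Counter

# Assumed key names, copied from the metadata list above.
# The real JSON keys in the delivered files may differ.
FIELDS = [
    "Subject", "Grade", "Knowledge Area", "Question Type",
    "Question", "Answer", "Difficulty",
]

def load_records(text):
    """Parse a JSON array and keep only records that have every expected field."""
    records = json.loads(text)
    return [r for r in records if all(f in r for f in FIELDS)]

def count_by_subject(records):
    """Count how many records fall under each Subject value."""
    return Counter(r["Subject"] for r in records)

# Invented sample data: one complete record, one missing "Answer".
sample = json.dumps([
    {
        "Subject": "Math", "Grade": "7", "Knowledge Area": "Arithmetic",
        "Question Type": "Short answer", "Question": "1 + 1 = ?",
        "Answer": "2", "Difficulty": "Easy",
    },
    {
        "Subject": "Math", "Grade": "7", "Knowledge Area": "Arithmetic",
        "Question Type": "Short answer", "Question": "2 + 2 = ?",
        "Difficulty": "Easy",
    },
], ensure_ascii=False)

records = load_records(sample)
subject_counts = count_by_subject(records)
```

Filtering out incomplete records before counting keeps the per-subject totals comparable with one another; whether incomplete records actually occur in the corpus is not stated in the listing.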
Year of Collection
2024