Chinese multidisciplinary test questions corpus

Name: Chinese multidisciplinary test questions corpus
SKU: 1700002963a4
Availability: InStock


Dataset ID:

MTQ_CN

Dataset Name:

Chinese multidisciplinary test questions corpus

Common Use Cases:

LLM training

Language:

Chinese

Country:

China

Language Code:

cmn

Country Code:

CHN

Product Type

Text

Detailed Product Type

LLM training

Unit

319,970 sentences

Recording Device

N/A

Recording Condition

N/A

Contributors

N/A

Utterances

N/A

Unique Words

N/A

Sample Rate (kHz):

N/A

Channels

Data Format

json

Source

Appen China

Additional Info:

Corpus containing 8 sections of middle and high school prompt-response pairs, each with the metadata fields Subject, Grade, Knowledge Area, Question Type, Question, Answer, and Difficulty. The question categories are:
Geography - 30k sentences (DLT001_CN);
Chemistry - 40k sentences (HXT001_CN);
History - 40k sentences (LST001_CN);
Biology - 40k sentences (SWT001_CN);
Math - 30k sentences (SXT001_CN);
Physics - 40k sentences (WLT001_CN);
Chinese language - 10k sentences (YWT001_CN);
Politics - 40k sentences (ZZT001_CN)
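
As a rough illustration of how records in this format might be checked after delivery, the Python sketch below parses a JSON list, flags records that lack any of the seven metadata fields named above, and tallies the complete records per subject. The exact key spelling, the top-level list layout, and the helper names missing_fields and count_by_subject are assumptions made for this example, and the sample records are placeholders rather than corpus content.

```python
import json
from collections import Counter

# Field names come from the description above; the exact JSON key
# spelling and the top-level list layout are assumptions.
EXPECTED_FIELDS = (
    "Subject", "Grade", "Knowledge Area", "Question Type",
    "Question", "Answer", "Difficulty",
)


def missing_fields(record):
    """Return the expected fields that are absent from one record."""
    return [field for field in EXPECTED_FIELDS if field not in record]


def count_by_subject(records):
    """Tally complete records per Subject, skipping incomplete ones."""
    counts = Counter()
    for record in records:
        if not missing_fields(record):
            counts[record["Subject"]] += 1
    return counts


# Placeholder records for illustration only, not taken from the corpus.
SAMPLE_JSON = """
[
  {"Subject": "Math", "Grade": "placeholder",
   "Knowledge Area": "placeholder", "Question Type": "placeholder",
   "Question": "placeholder", "Answer": "placeholder",
   "Difficulty": "placeholder"},
  {"Subject": "Physics", "Question": "placeholder"}
]
"""

records = json.loads(SAMPLE_JSON)
subject_counts = count_by_subject(records)
```

With real data, SAMPLE_JSON would be replaced by the contents of a delivered file, and the per-subject tallies could be compared against the approximate counts listed above.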

Year of Collection

2024

