Dataset ID:
DM_CNRD
Dataset Name:
Code Q&A Dataset
Common Use Cases:
LLM training
Language:
English
Country:
N/A
Language Code:
eng
Country Code:
N/A
Product Type
Text
Detailed Product Type
LLM training
Unit
12 million pairs
Recording Device
N/A
Recording Condition
N/A
Contributors
N/A
Utterances
Available upon request
Unique Words
Available upon request
Sample Rate (kHz):
N/A
Channels
N/A
Data Format
json
Source
Appen China
Additional Info:
- This is a text dataset of English-language coding questions and answers, collected by web crawling and then cleaned and filtered. Programming languages include JavaScript, Python, Java, C#, PHP, C++, SQL, R, C, and Swift. Topics span computing, scientific research and technology, wholesale and retail, finance, entertainment, and other industries.
Year of Collection
2024