poemsforaphrodite committed on
Commit 343cdcf
1 Parent(s): 48f58fb

Update synthetic_data_prompt.md

Files changed (1)
  1. synthetic_data_prompt.md +34 -0
synthetic_data_prompt.md CHANGED
@@ -0,0 +1,34 @@
+ You are tasked with generating synthetic data on the topic of machine learning. Your goal is to create a diverse set of prompts, contexts, and responses that vary across the following aspects: accuracy, hallucination, groundedness, relevance, recall, precision, consistency, and bias detection.
+
+ Generate the data in the following JSON format:
+ ```json
+ {
+ "prompt": "Question or instruction about a machine learning concept",
+ "context": "Background information or source material related to the prompt",
+ "response": "An AI-generated response to the prompt, which may vary in accuracy and other aspects"
+ }
+ ```
+
+ For each entry, vary the following aspects:
+ 1. Accuracy: Range from completely accurate to partially or entirely inaccurate.
+ 2. Hallucination: Include some responses with made-up information not present in the context.
+ 3. Groundedness: Vary how well the response is grounded in the provided context.
+ 4. Relevance: Create some responses that are highly relevant and others that are off-topic.
+ 5. Recall: Vary how much of the relevant information from the context is included in the response.
+ 6. Precision: Alter the specificity of the responses, from very precise to overly general.
+ 7. Consistency: Include some responses that contradict the context or themselves.
+ 8. Bias Detection: Incorporate some prompts and responses that may contain various biases.
+
+ Generate diverse prompts covering different areas of machine learning, such as algorithms, models, evaluation metrics, data preprocessing, and applications. Ensure that the contexts provide relevant background information, potentially including references to textbooks or research papers.
+
+ Create <NUM_PROMPTS> unique entries, each with its own combination of the aspects listed above. Aim for a balanced distribution of variations across all generated entries.
+
+ To maintain diversity:
+ - Use a variety of machine learning topics and concepts
+ - Vary the length and complexity of prompts, contexts, and responses
+ - Include both theoretical and practical machine learning questions
+ - Incorporate different types of inaccuracies and biases
+
+ Output your generated data as a JSON array, with each entry following the specified format. Enclose the entire output within <synthetic_data> tags.
+
+ Begin generating the synthetic data now.
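The prompt leaves two things to the caller: substituting the `<NUM_PROMPTS>` placeholder, and reading back a JSON array wrapped in `<synthetic_data>` tags. The following is a minimal Python sketch of both steps, assuming the model's raw reply is available as a plain string. The function names `fill_template` and `parse_synthetic_data` are hypothetical, and two behaviours are assumptions not stated in the prompt: entries must have exactly the keys `prompt`, `context`, and `response` with string values, and a Markdown code fence around the array (which a model might copy from the format example) is stripped before parsing.

```python
import json
import re

# Keys required by the entry format described in synthetic_data_prompt.md.
REQUIRED_KEYS = {"prompt", "context", "response"}


def fill_template(template: str, num_prompts: int) -> str:
    """Hypothetical helper: replace the <NUM_PROMPTS> placeholder with a count."""
    if num_prompts < 1:
        raise ValueError("num_prompts must be positive")
    return template.replace("<NUM_PROMPTS>", str(num_prompts))


def parse_synthetic_data(model_output: str) -> list[dict]:
    """Hypothetical helper: extract and validate the JSON array inside <synthetic_data> tags."""
    match = re.search(r"<synthetic_data>(.*?)</synthetic_data>", model_output, re.DOTALL)
    if match is None:
        raise ValueError("no <synthetic_data> block found")
    body = match.group(1).strip()
    # Assumption: the model may wrap the array in a json code fence; strip it if present.
    body = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", body)
    entries = json.loads(body)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    for i, entry in enumerate(entries):
        # Assumption: an entry must contain exactly the three specified keys.
        if not isinstance(entry, dict) or set(entry) != REQUIRED_KEYS:
            raise ValueError(f"entry {i} must have exactly the keys {sorted(REQUIRED_KEYS)}")
        for key in REQUIRED_KEYS:
            if not isinstance(entry[key], str):
                raise ValueError(f"entry {i}: {key!r} must be a string")
    return entries


if __name__ == "__main__":
    print(fill_template("Create <NUM_PROMPTS> unique entries.", 3))  # prints "Create 3 unique entries."
    # Illustrative sample reply, not real model output.
    reply = (
        "<synthetic_data>\n"
        '[{"prompt": "What is overfitting?", '
        '"context": "Overfitting occurs when a model fits noise in the training data.", '
        '"response": "Overfitting means the model memorizes training data and generalizes poorly."}]\n'
        "</synthetic_data>"
    )
    entries = parse_synthetic_data(reply)
    print(len(entries))  # prints 1
```

Rejecting malformed entries early keeps a single bad generation from silently entering the dataset; relax the exact-key check if extra fields such as aspect labels are added to the format later.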