Qwen2.5-1.5B-Instruct-QAGenerator

Qwen2.5-1.5B-Instruct-QAGenerator is a small Qwen model designed to generate questions and answers from provided text, transforming free text into a format suitable for training specialized language models.

Intended Use

This model is intended for:

  • Generating question-answer pairs from non-fiction descriptive text.
  • Creating training data for specialized language models.
  • Assisting with educational content creation and reading-comprehension exercises.

Input Format

For best results, ensure the input text is clean English without newlines or unusual formatting. The input must be wrapped in the following format:

You are an AI designed to generate questions and answers for the provided text in JSON format [{"question": ___, "answer": ___}, ...].
You may only refer to the information given in the text. Only write your questions and answers. Nothing else.
<text>Insert your text here.</text>
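
The wrapping step can be sketched in Python as below. This is a minimal illustration, not part of the model's own tooling: the names `PROMPT_HEADER` and `build_prompt` are hypothetical, and collapsing all whitespace into single spaces is an assumed way of meeting the "no newlines" recommendation.

```python
import re

# Instruction wording copied from the template above.
PROMPT_HEADER = (
    "You are an AI designed to generate questions and answers for the provided text "
    'in JSON format [{"question": ___, "answer": ___}, ...].\n'
    "You may only refer to the information given in the text. "
    "Only write your questions and answers. Nothing else.\n"
)


def build_prompt(text: str) -> str:
    """Collapse whitespace runs (including newlines) and wrap the text in the template."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if not cleaned:
        raise ValueError("input text is empty")
    return f"{PROMPT_HEADER}<text>{cleaned}</text>"
```

The returned string is what would be sent to the model as the prompt. How it is passed to the model (for example, through a chat template) is not specified here and depends on the inference setup.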

Example

Input

The following text, from the United Nations page "Big Data for Sustainable Development", is wrapped in the above prompt format.

You are an AI designed to generate questions and answers for the provided text in JSON format [{"question": ___, "answer": ___}, ...].
You may only refer to the information given in the text. Only write your questions and answers. Nothing else.
<text>The volume of data in the world is increasing exponentially. In 2020, 64.2 zettabytes of data were created, that is a 314 percent increase from 2015. An increased demand for information due to the COVID-19 pandemics also contribute to higher-than-expected growth. A large share of this output is “data exhaust,” or passively collected data deriving from everyday interactions with digital products or services, including mobile phones, credit cards, and social media. This deluge of digital data is known as big data. Data is growing because it is increasingly being gathered by inexpensive and numerous information‐sensing, mobile devices and because the world’s capacity for storing information has roughly doubled every 40 months since the 1980s.</text>

Output

Pretty-printed version:

[
  {
    "question": "How has the volume of data grown over the years?",
    "answer": "The volume of data has grown exponentially, with 64.2 zettabytes created in 2020, a 314 percent increase from 2015."
  },
  {
    "question": "What are some sources of data output besides big data?",
    "answer": "Data output includes 'data exhaust', which is passively collected data from everyday interactions with digital products or services."
  },
  {
    "question": "Why is the data growth increasing and why is it called big data?",
    "answer": "The data growth is increasing because inexpensive and numerous information-sensing mobile devices are gathering data, and the world's capacity for storing information is doubling roughly every 40 months."
  },
  {
    "question": "When did the rate of data growth begin to increase, and how often has it doubled since?",
    "answer": "Data growth started to increase after the 1980s, and the rate has doubled approximately every 40 months."
  }
]
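
Because the output is a JSON array, it can be loaded with a standard JSON parser before being used as training data. The sketch below assumes the model may sometimes emit stray text around the array or incomplete entries, so it parses from the first `[` to the last `]` and drops malformed items. The function name `parse_qa_output` is hypothetical.

```python
import json


def parse_qa_output(raw: str) -> list[dict[str, str]]:
    """Parse model output into a list of {"question", "answer"} dicts.

    Parses the span from the first '[' to the last ']' and skips entries
    that are not objects or that lack a non-empty question or answer string.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model output")
    items = json.loads(raw[start:end + 1])  # raises ValueError on invalid JSON
    pairs = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question, answer = item.get("question"), item.get("answer")
        if (
            isinstance(question, str)
            and isinstance(answer, str)
            and question.strip()
            and answer.strip()
        ):
            pairs.append({"question": question.strip(), "answer": answer.strip()})
    return pairs
```

This only checks structure. It does not verify that an answer is supported by the input text, so the generated pairs still need review for hallucinations.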

Limitations

  • Intended mainly for non-fiction descriptive text.
  • Not suitable for short, informal texts like chat logs or tweets.
  • May produce hallucinations despite efforts to minimize them.
  • Questions and answers have a general focus that may or may not align with specific user needs.

Ethical Considerations

Users should be aware of potential biases in the training data and use the generated content responsibly to avoid misinformation.

Model size: 1.54B params
Tensor type: BF16
Format: Safetensors

Model tree for agentlans/Qwen2.5-1.5B-Instruct-QAGenerator

Base model: Qwen/Qwen2.5-1.5B
Finetuned: this model

Dataset used to train agentlans/Qwen2.5-1.5B-Instruct-QAGenerator