README / R1-Responses to go from Unstructured to Structured

bhaviktheslider

Create R1-Responses to go from Unstructured to Structured

10eeafd verified 3 months ago

	---
	license: apache-2.0
	---

	# MasterControl AIML Team 🚀

	## Overview
	The MasterControl AIML team supports the Hugging Face initiative of re-creating DeepSeek R1 training, recognizing it as one of the most impactful open-source projects today.

	We aim to contribute to reasoning datasets, specifically those where:
	- A real-world problem involves generating complex structured output
	- It is accompanied by step-by-step reasoning and unstructured input

	## Challenges in Integrating Generative AI into Systems of Record (SoR)
	Integrating Generative AI into Systems of Record (SoR) for health, life sciences, and manufacturing quality is challenging for two reasons:
	- These systems rely on strictly structured data formats (e.g., JSON, XML, templates).
	- LLM outputs are free-form by default and are not guaranteed to conform to a regular expression or context-free grammar.
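
To make the mismatch concrete, here is a minimal sketch of a strict parser rejecting a free-form reply. The record fields (`batch_id`, `status`) and both sample strings are hypothetical and not taken from any real system.

```python
import json


def parse_strict(raw):
    """Return the parsed JSON object, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Hypothetical model replies: the first wraps the record in chatty prose,
# the second contains only the JSON object.
free_form_reply = 'Sure! Here is the record: {"batch_id": "B-17", "status": "released"}'
strict_reply = '{"batch_id": "B-17", "status": "released"}'

print(parse_strict(free_form_reply))  # the strict parser rejects this reply
print(parse_strict(strict_reply))     # this reply parses into a dict
```

A system of record behaves like `parse_strict`: one extra sentence around the payload is enough to fail the import.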

	## Techniques for Structured Output Generation
	To enforce structured output generation, we explore:
	- Strict schema prompting
	- Post-processing and output validation
	- Reformulating text generation into transitions between finite-state machine states
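
The finite-state machine idea in the last bullet can be sketched with a toy decoder. This is an illustration under stated assumptions, not the team's implementation: the target format (two uppercase letters, a dash, three digits), the character-level vocabulary, and the `toy_scores` stand-in for model logits are all hypothetical.

```python
import string

# Hypothetical target format: two uppercase letters, a dash, three digits.
# Each state maps to (characters allowed next, state reached after emitting one).
FSM = {
    0: (set(string.ascii_uppercase), 1),
    1: (set(string.ascii_uppercase), 2),
    2: ({"-"}, 3),
    3: (set(string.digits), 4),
    4: (set(string.digits), 5),
    5: (set(string.digits), 6),
}
FINAL_STATE = 6


def constrained_decode(score_fn):
    """Greedy decoding in which each step may only emit a character the FSM allows."""
    state, output = 0, ""
    while state != FINAL_STATE:
        allowed, next_state = FSM[state]
        # Only allowed characters compete; the highest-scoring one is emitted.
        best = max(sorted(allowed), key=lambda ch: score_fn(output, ch))
        output += best
        state = next_state
    return output


def toy_scores(prefix, ch):
    """Stand-in for model logits: the 'model' wants to write a non-conforming string."""
    preferred = "qc_042"  # lowercase letters and an underscore violate the format
    return 1.0 if ch.lower() == preferred[len(prefix)].lower() else 0.0


print(constrained_decode(toy_scores))
```

Even though the toy scorer prefers `qc_042`, every emitted string matches the format because disallowed characters are never candidates. Real systems apply the same masking over a tokenizer vocabulary rather than single characters.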

	## DeepSeek R1 Approach
	A key challenge is fitting hybrid structured and unstructured historical manufacturing production records to master templates.
	We aim to leverage the DeepSeek R1 approach, which:
	- Trains a base language model with pure reinforcement learning
	- Learns to reason without human supervision

	## Model Used
	- We used DeepSeek's distilled 7B model to create reasoning responses that go from unstructured input to structured output.

	## Purpose of Reasoning Responses
	- The reasoning responses are constructed so that, given unstructured text and a schema of rules, the model must convert the text into output that conforms to the schema. These responses can be used for any unstructured-to-structured conversion task.
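
As a rough sketch of the task described above, the snippet below pairs unstructured text with a schema to form a prompt, then checks a candidate answer against that schema. The schema layout (field name mapped to a JSON type name), the field names, and the prompt wording are assumptions made for illustration; the dataset's actual format may differ.

```python
import json

# Hypothetical schema: field name -> expected JSON type name.
SCHEMA = {"batch_id": "string", "quantity": "number", "released": "boolean"}
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def build_prompt(text, schema):
    """Combine the unstructured text and the schema into a single prompt."""
    return (
        "Convert the text into a JSON object that follows the schema.\n"
        f"Schema: {json.dumps(schema, sort_keys=True)}\n"
        f"Text: {text}\n"
        "Reason step by step, then give the final JSON."
    )


def conforms(candidate, schema):
    """Return True if the candidate is JSON with exactly the schema's fields and types."""
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(schema):
        return False
    for field, type_name in schema.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        if type_name == "number" and isinstance(value, bool):
            return False
        if not isinstance(value, JSON_TYPES[type_name]):
            return False
    return True


prompt = build_prompt("Batch B-17, forty units, released on Friday.", SCHEMA)
good = '{"batch_id": "B-17", "quantity": 40, "released": true}'
bad = '{"batch_id": "B-17", "quantity": "forty", "released": true}'
print(conforms(good, SCHEMA), conforms(bad, SCHEMA))
```

A check like `conforms` is one way to filter generated responses so that only examples whose final answer satisfies the schema are kept.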

	## Next Step: Reasoning Data
	Our next step is to curate and contribute reasoning datasets that facilitate structured output generation.

	## HF-Link
	https://huggingface.co/datasets/MasterControlAIML/R1-Reasoning-Unstructured-To-Structured
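
A minimal way to pull the dataset, assuming the third-party `datasets` library is installed and the repository is publicly readable. The helper name `load_reasoning_dataset` is hypothetical, and no split or column names are assumed here; inspect the returned object to see what is available.

```python
REPO_ID = "MasterControlAIML/R1-Reasoning-Unstructured-To-Structured"


def load_reasoning_dataset(repo_id=REPO_ID):
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so this module can be loaded without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    dataset = load_reasoning_dataset()
    print(dataset)  # shows the available splits and columns
```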

	---
