Create R1-Responses to go from Unstructured to Structured

by bhaviktheslider - opened Jan 31

base: refs/heads/main

←

from: refs/pr/7

R1-Responses to go from Unstructured to Structured +43 -0

	@@ -0,0 +1,43 @@

+---
+license: apache-2.0
+---
+# MasterControl AIML Team 🚀
+## Overview
+The **MasterControl AIML team** supports the **Hugging Face** initiative of re-creating **DeepSeek R1 training**, recognizing it as one of the most **impactful open-source projects** today.
+We aim to contribute to **reasoning datasets**, specifically those where:
+- A **real-world problem** involves **generating complex structured output**
+- It is accompanied by **step-by-step reasoning and unstructured input**
+## Challenges in Integrating Generative AI into Systems of Record (SoR)
+Integrating **Generative AI** into **Systems of Record (SoR)** for **health, life sciences, and manufacturing quality** is challenging because:
+- These systems rely on **strictly structured data formats** (e.g., **JSON, XML, templates**).
+- **LLM outputs** are **unstructured** by default and are not guaranteed to conform to a given **regular expression** or **context-free grammar**.
+## Techniques for Structured Output Generation
+To enforce structured output generation, we explore:
+- **Strict schema prompting**
+- **Post-processing and output validation**
+- **Reformulating text generation** into transitions between **finite-state machine states**
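
As a rough illustration of the **post-processing and output validation** technique listed above, here is a minimal Python sketch using only the standard library. The schema, its field names (`batch_id`, `step`, `passed`), and the helper names `extract_json` and `validate` are hypothetical and chosen for illustration; they are not taken from the dataset or from any MasterControl system.

```python
import json

# Hypothetical schema (an assumption for this sketch):
# required field name -> expected Python type.
SCHEMA = {"batch_id": str, "step": int, "passed": bool}


def extract_json(text):
    """Return the first JSON object embedded in free-form text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


def validate(record, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    if record is None:
        return ["no JSON object found"]
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # Exact type match, so that True is not accepted where an int is expected.
        elif type(record[field]) is not expected:
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


good = 'Here is the record: {"batch_id": "B-17", "step": 3, "passed": true} Done.'
bad = 'Result: {"batch_id": "B-17", "step": "3"}'

print(validate(extract_json(good)))
print(validate(extract_json(bad)))
```

A non-empty error list could be fed back to the model as a retry prompt or used to reject the output before it reaches the system of record. A production setup would more likely validate against a full JSON Schema or XML schema than against this simplified type map.
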
+## DeepSeek R1 Approach
+A key challenge is **fitting hybrid structured and unstructured historical manufacturing production records** to **master templates**.
+We aim to leverage the **DeepSeek R1 model**, which uses:
+- **Pure reinforcement learning** to train a **base language model**
+- Learning to **reason without human supervision**
+## Model Used
+- We used **DeepSeek's distilled 7B model** to create reasoning responses that go from unstructured to structured.
+## Purpose of Reasoning Responses
+- The reasoning responses are built so that, given **unstructured text and a schema of rules**, the model must **convert the text into structured output that follows the schema**. These responses can be used for any unstructured-to-structured conversion task.
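
To make the task shape above concrete, the following Python sketch assembles a prompt from an unstructured text and a schema. The function name `build_prompt`, the prompt wording, and the example schema and text are all assumptions made for illustration; the dataset's actual prompts and column names may differ.

```python
import json


def build_prompt(text, schema):
    """Combine unstructured text and a schema into one instruction prompt.

    The wording below is a hypothetical template, not the one used to
    generate the dataset.
    """
    schema_block = json.dumps(schema, indent=2, sort_keys=True)
    return (
        "Convert the text below into JSON that follows the schema.\n"
        "Reason step by step, then give the final JSON.\n\n"
        f"Schema:\n{schema_block}\n\n"
        f"Text:\n{text}\n"
    )


# Hypothetical example inputs.
example_schema = {"batch_id": "string", "step": "integer", "passed": "boolean"}
example_text = "Batch B-17 completed step 3 and passed inspection."

prompt = build_prompt(example_text, example_schema)
print(prompt)
```

The model's reply to such a prompt would contain its reasoning followed by the structured output, and that pair is what a reasoning dataset of this kind records.
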
+## Next Step: Reasoning Data
+Our **first step** is curating and contributing to **reasoning datasets** that facilitate structured output generation.
+## HF-Link
+**https://huggingface.co/datasets/MasterControlAIML/R1-Reasoning-Unstructured-To-Structured**
+---