Create R1-Responses to go from Unstructured to Structured

#7
R1-Responses to go from Unstructured to Structured ADDED
@@ -0,0 +1,43 @@
+ ---
+ license: apache-2.0
+ ---
+
+ # MasterControl AIML Team 🚀
+
+ ## Overview
+ The **MasterControl AIML team** supports the **Hugging Face** initiative of re-creating **DeepSeek R1 training**, recognizing it as one of the most **impactful open-source projects** today.
+
+ We aim to contribute to **reasoning datasets**, specifically those where:
+ - A **real-world problem** involves **generating complex structured output** from **unstructured input**
+ - The output is accompanied by **step-by-step reasoning**
+
+ ## Challenges in Integrating Generative AI into Systems of Record (SoR)
+ Integrating **Generative AI** into **Systems of Record (SoR)** for **health, life sciences, and manufacturing quality** is challenging for two reasons:
+ - These systems rely on **strictly structured data formats** (e.g., **JSON, XML, templates**).
+ - **LLM outputs** are **unstructured** by default and are not guaranteed to conform to the **regular expressions** or **context-free grammars** that define those formats.
+
+ ## Techniques for Structured Output Generation
+ To enforce structured output generation, we explore:
+ - **Strict schema prompting**
+ - **Post-processing and output validation**
+ - **Reformulating text generation** into transitions between **finite-state machine states**
+
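The three techniques listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SCHEMA` fields, the `AB-1234` identifier format, and the helper names are hypothetical and do not come from the dataset or from any MasterControl system. The validator is a hand-written type check rather than a schema library, and the finite-state machine works on single characters, with a ranked list of candidates standing in for a model's next-token proposals.

```python
import json

# Hypothetical schema for illustration only: field name -> required Python type.
SCHEMA = {"batch_id": str, "step": int, "passed": bool}


def build_prompt(text: str) -> str:
    """Strict schema prompting: spell out the exact fields and types expected."""
    fields = ", ".join(f'"{name}": <{tp.__name__}>' for name, tp in SCHEMA.items())
    return (
        "Extract the record below as a single JSON object with exactly these "
        f"fields: {{{fields}}}. Return JSON only.\n\nRecord:\n{text}"
    )


def validate(raw: str) -> dict:
    """Post-processing and validation: parse the reply and check it against SCHEMA."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        raise ValueError("fields do not match the schema")
    for name, tp in SCHEMA.items():
        # Exact type match: bool is a subclass of int, so True must not pass as an int.
        if type(data[name]) is not tp:
            raise ValueError(f"field {name!r} must be of type {tp.__name__}")
    return data


# Toy finite-state machine for identifiers shaped like "AB-1234" (hypothetical format).
# State i holds the set of characters allowed at position i.
UPPER = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DIGIT = set("0123456789")
FSM = [UPPER, UPPER, {"-"}, DIGIT, DIGIT, DIGIT, DIGIT]


def constrained_decode(ranked_candidates) -> str:
    """Walk the FSM, keeping the best-ranked candidate each state allows.

    ranked_candidates[i] is a best-first sequence of characters standing in for
    the model's proposals at step i. Raises StopIteration if none is allowed.
    """
    out = []
    for allowed, candidates in zip(FSM, ranked_candidates):
        out.append(next(c for c in candidates if c in allowed))
    return "".join(out)


if __name__ == "__main__":
    print(build_prompt("Batch QC-7001, step 3, passed inspection."))
    print(validate('{"batch_id": "QC-7001", "step": 3, "passed": true}'))
    print(constrained_decode(["xQ", "C", " -", "a7", "0", "o0", "!1"]))
```

Prompting and validation act before and after generation, so a failed `validate` call can only trigger a retry or a rejection. The state-machine approach instead filters candidates during generation, which is why every string it emits matches the target format by construction.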
+ ## DeepSeek R1 Approach
+ A key challenge is **fitting hybrid structured and unstructured historical manufacturing production records** to **master templates**.
+ We aim to leverage the **DeepSeek R1 model**, which uses:
+ - **Pure reinforcement learning** to train a **base language model**
+ - Learning to **reason without human supervision**
+
+ ## Model Used
+ - We used **DeepSeek's distilled 7B model** to create reasoning responses that go from unstructured to structured.
+
+ ## Purpose of Reasoning Responses
+ - The reasoning responses are designed so that, when the model is presented with **unstructured text and a schema of rules**, it learns to **convert the text into structured output that follows the schema**. These responses can be used for any unstructured-to-structured conversion task.
+
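As a rough sketch of how such a response might be consumed, the snippet below separates the reasoning from the structured result. The layout is an assumption: it presumes the reasoning is wrapped in `<think>...</think>` tags (the convention R1-style models use in their outputs) and is followed by a JSON object. The actual dataset may be organized differently, and the sample text and field names are invented for illustration.

```python
import json
import re


def split_response(response: str):
    """Split a response into (reasoning, structured_output).

    Assumes a hypothetical layout: <think>...</think> followed by a JSON object.
    """
    match = re.search(r"<think>(.*?)</think>(.*)", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <think>...</think> block found")
    reasoning = match.group(1).strip()
    structured = json.loads(match.group(2).strip())  # raises ValueError on bad JSON
    return reasoning, structured


# Invented example response; not taken from the dataset.
example = (
    "<think>The note names batch QC-7001 and says step 3 passed.</think>\n"
    '{"batch_id": "QC-7001", "step": 3, "passed": true}'
)

if __name__ == "__main__":
    reasoning, structured = split_response(example)
    print(reasoning)
    print(structured)
```

Keeping the two parts separate lets the structured output be checked against the schema on its own, while the reasoning trace is kept for training or review.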
+ ## Next Step: Reasoning Data
+ Our **first step** is curating and contributing to **reasoning datasets** that facilitate structured output generation.
+
+ ## HF-Link
+ **https://huggingface.co/datasets/MasterControlAIML/R1-Reasoning-Unstructured-To-Structured**
+
+ ---