---
license: apache-2.0
---
# MasterControl AIML Team
## Overview
The **MasterControl AIML team** supports the **Hugging Face** initiative to re-create **DeepSeek R1 training**, recognizing it as one of the most **impactful open-source projects** today.
We aim to contribute to **reasoning datasets**, specifically those where:
- A **real-world problem** involves **generating complex structured output**
- The problem is accompanied by **step-by-step reasoning and unstructured input**
## Challenges in Integrating Generative AI into Systems of Record (SoR)
Integrating **Generative AI** into **Systems of Record (SoR)** for **health, life sciences, and manufacturing quality** is challenging for two reasons:
- These systems rely on **strictly structured data formats** (e.g., **JSON, XML, templates**).
- **LLM outputs** are **unstructured** by default and are not guaranteed to conform to a **regular expression** or **context-free grammar**.
## Techniques for Structured Output Generation
To enforce structured output generation, we explore:
- **Strict schema prompting**
- **Post-processing and output validation**
- **Reformulating text generation** as transitions between **finite-state machine states**
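
The third technique above can be pictured with a small sketch. This is a toy illustration, not the team's implementation: the state names, the token vocabulary, and the `toy_scores` function (a random stand-in for a language model) are all assumptions made for the example. At each step, only tokens that have a transition out of the current state are allowed to compete, so the output always matches the target format.

```python
import random

# Hypothetical FSM for a tiny record format: {"status":"pass"} or {"status":"fail"}.
# Each state maps an allowed token to the state it leads to.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"status"': "colon"},
    "colon": {":": "value"},
    "value": {'"pass"': "close", '"fail"': "close"},
    "close": {"}": "done"},
}

# Toy vocabulary, including tokens the FSM never allows.
VOCAB = ["{", "}", ":", '"status"', '"pass"', '"fail"', "hello", "maybe"]


def toy_scores(prefix, rng):
    """Stand-in for a language model: one random score per vocabulary token."""
    return {token: rng.random() for token in VOCAB}


def constrained_generate(seed=0):
    """Generate text by walking the FSM, picking the best-scoring allowed token."""
    rng = random.Random(seed)
    state, output = "start", []
    while state != "done":
        allowed = TRANSITIONS[state]
        scores = toy_scores(output, rng)
        # Mask step: tokens without a transition from this state are never considered.
        token = max(allowed, key=lambda t: scores[t])
        output.append(token)
        state = allowed[token]
    return "".join(output)


if __name__ == "__main__":
    print(constrained_generate(seed=1))
```

Because disallowed tokens such as `hello` are masked out before selection, every generated string parses as JSON regardless of the scores the stand-in model assigns.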
## DeepSeek R1 Approach
A key challenge is **fitting hybrid structured and unstructured historical manufacturing production records** to **master templates**.
We aim to leverage the approach behind the **DeepSeek R1 model**, which:
- Uses **pure reinforcement learning** to train a **base language model**
- Learns to **reason without human supervision**
## Model Used
- We used **DeepSeek's distilled 7B model** to create reasoning responses that go from unstructured input to structured output.
## Purpose of Reasoning Responses
- The reasoning responses are constructed so that, when the model is presented with **unstructured text and a schema of rules, it must convert the text into output that conforms to the schema**. These responses can be used for any unstructured-to-structured conversion task.
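
As a rough illustration of this task shape, the sketch below pairs a schema of rules with a piece of unstructured text in a prompt, then checks a model's raw reply against the same schema. The field names in `SCHEMA`, the prompt wording, and both helper functions are hypothetical and chosen only for the example; they do not describe the dataset's actual format.

```python
import json

# Hypothetical schema of rules: field name -> required Python type after JSON parsing.
SCHEMA = {"batch_id": str, "temperature_c": float, "passed": bool}


def build_prompt(unstructured_text, schema):
    """Combine the schema rules and the raw text into one instruction prompt."""
    rules = "\n".join(f"- {name}: {typ.__name__}" for name, typ in schema.items())
    return (
        "Convert the text into a JSON object with exactly these fields:\n"
        f"{rules}\n\nText:\n{unstructured_text}\n\nJSON:"
    )


def validate(raw_output, schema):
    """Return (record, errors); record is None whenever any rule is violated."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not an object"]
    errors = []
    for name, typ in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif type(record[name]) is not typ:  # exact match, so 21 (int) is not a float
            errors.append(f"wrong type for {name}")
    for name in sorted(set(record) - set(schema)):
        errors.append(f"unexpected field: {name}")
    return (record, []) if not errors else (None, errors)


if __name__ == "__main__":
    print(build_prompt("Batch B-17 ran at 21.5 C and passed inspection.", SCHEMA))
    print(validate('{"batch_id": "B-17", "temperature_c": 21.5, "passed": true}', SCHEMA))
```

A reply that fails validation can be rejected or sent back for another attempt, which is the post-processing step a System of Record would need before accepting model output.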
## Next Step: Reasoning Data
Our **first step** is curating and contributing to **reasoning datasets** that facilitate structured output generation.
## HF-Link
**https://huggingface.co/datasets/MasterControlAIML/R1-Reasoning-Unstructured-To-Structured**
--- |