---
language:
- de
tags:
- german
- causal-lm
- text-generation
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/bueble-lm-2b-GGUF
This is a quantized version of [flair/bueble-lm-2b](https://huggingface.co/flair/bueble-lm-2b), created using llama.cpp.
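
One way to run a GGUF file from this repo locally is via the `llama-cpp-python` bindings (`pip install llama-cpp-python`). A minimal sketch; the quant filename below is a placeholder, so pick an actual file from this repo's file list:

```python
from llama_cpp import Llama

# "bueble-lm-2b.Q4_K_M.gguf" is a hypothetical filename; use a real one from the repo.
llm = Llama(model_path="bueble-lm-2b.Q4_K_M.gguf", n_ctx=2048)

# Base model: plain text completion, no chat template.
out = llm("Berlin ist eine Stadt, die", max_tokens=64)
print(out["choices"][0]["text"])
```
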
# Original Model Card

# BübleLM

<div align="center" style="margin-bottom: 2rem; margin-top: 2rem">
<img src="https://pieter.ai/resources/buble-logo.png" alt="BübleLM Logo" style="max-height: 450px; width: auto;"/>
<h1 style="margin-top: 1rem;">BübleLM</h1>
<p><em>A small German LM</em></p>
</div>

BübleLM is a German language model based on Gemma-2-2B, adapted using [trans-tokenization](https://pieter.ai/trans-tokenization/) with a custom German SentencePiece tokenizer. The model demonstrates how language-specific tokenization can significantly improve performance while preserving the base model's capabilities.

## Model Details

- **Architecture**: Based on the Gemma-2-2B decoder-only architecture
- **Parameters**: 2 billion
- **Tokenizer**: Custom German SentencePiece tokenizer (20k vocabulary; see the sketch after this list)
  - Fertility rate: 1.78 tokens per word
  - Optimized for German morphological structures
  - Trained on the same corpus as the model
- **Context Length**: 8192 tokens
- **Training Hardware**: Single node with 4x NVIDIA A100-SXM4-80GB GPUs
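
The fertility figure above (tokens per word) can be reproduced in spirit with a few lines. A minimal sketch, assuming `transformers` and `sentencepiece` are installed; the sample sentence is an arbitrary stand-in, so the printed value will only approximate the corpus-level 1.78:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("flair/bueble-lm-2b")

# Fertility = subword tokens per whitespace-separated word, here on one sample sentence.
text = "Die Bundesregierung hat heute neue Maßnahmen zur Digitalisierung beschlossen."
tokens = tokenizer.tokenize(text)
print(len(tokens) / len(text.split()))
```
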

## Training Data

Trained on 3.5B tokens from the Occiglot-FineWeb project, including:
- Contemporary web content (OSCAR 2015-2023)
- Legislative documents (EurLex, ParlamInt)
- News data (Tagesschau)
- Wiki sources

Data sampling weights (a toy illustration follows this list):
- Wikipedia: 4x
- News/Parliamentary: 2x
- Other sources: 1x
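
One way to read these weights is as relative upsampling factors when mixing sources. A toy sketch of that reading, ignoring the differing raw sizes of the corpora (which the real mixture would account for); this is not the actual training pipeline:

```python
import random

# Relative upsampling factors from the list above.
weights = {"wikipedia": 4, "news_parliamentary": 2, "other": 1}

# Under this reading, each source's sampling probability is its weight over the total.
total = sum(weights.values())
print({src: round(w / total, 2) for src, w in weights.items()})
# {'wikipedia': 0.57, 'news_parliamentary': 0.29, 'other': 0.14}

# Drawing a source per document with these weights reproduces the mix.
source = random.choices(list(weights), weights=list(weights.values()))[0]
```
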

## Performance

Key improvements over the Gemma-2-2B baseline:
- HellaSwag-DE: +71% (47.9% vs 28.0%)
- ARC-DE: +41% (32.3% vs 22.9%)
- Average zero-shot: +40% (35.8% vs 25.5%)

→ BübleLM-2B consistently outperforms both the base Gemma-2-2B and other German models like LLäMmlein-1B across most tasks.

<table class="model-comparison">
  <thead>
    <tr>
      <th align="left">Model</th>
      <th align="center" colspan="2">ARC-DE</th>
      <th align="center" colspan="2">HellaSwag-DE</th>
      <th align="center">TruthfulQA-DE</th>
      <th align="center">Average</th>
    </tr>
    <tr>
      <th></th>
      <th align="center">0-shot</th>
      <th align="center">3-shot</th>
      <th align="center">0-shot</th>
      <th align="center">3-shot</th>
      <th align="center">0-shot</th>
      <th align="center">0-shot</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://huggingface.co/google/gemma-2-2b" target="_blank">Gemma-2-2B</a></td>
      <td align="center">22.9</td>
      <td align="center">23.1</td>
      <td align="center">28.0</td>
      <td align="center">27.6</td>
      <td align="center">25.5</td>
      <td align="center">25.5</td>
    </tr>
    <tr>
      <td><a href="https://huggingface.co/LSX-UniWue/LLaMmlein_120M" target="_blank">LLäMmlein-120M</a></td>
      <td align="center">24.7 ↑+8%</td>
      <td align="center">-</td>
      <td align="center">32.0 ↑+14%</td>
      <td align="center">-</td>
      <td align="center">25.0 ↓-2%</td>
      <td align="center">27.2 ↑+7%</td>
    </tr>
    <tr>
      <td><a href="https://huggingface.co/LSX-UniWue/LLaMmlein_1B" target="_blank">LLäMmlein-1B</a></td>
      <td align="center">30.0 ↑+31%</td>
      <td align="center">-</td>
      <td align="center"><strong>48.5</strong> ↑+73%</td>
      <td align="center">-</td>
      <td align="center">23.4 ↓-8%</td>
      <td align="center">34.0 ↑+33%</td>
    </tr>
    <tr>
      <td><a href="https://huggingface.co/VAGOsolutions/SauerkrautLM-Gemma-2b" target="_blank">Sauerkraut-Gemma-2B</a></td>
      <td align="center">28.0 ↑+22%</td>
      <td align="center">34.6 ↑+50%</td>
      <td align="center">37.2 ↑+33%</td>
      <td align="center">44.1 ↑+60%</td>
      <td align="center"><strong>32.9</strong> ↑+29%</td>
      <td align="center">32.7 ↑+28%</td>
    </tr>
    <tr>
      <td><strong>BübleLM (Ours)</strong></td>
      <td align="center"><strong>32.3</strong> ↑+41%</td>
      <td align="center"><strong>35.2</strong> ↑+52%</td>
      <td align="center">47.9 ↑+71%</td>
      <td align="center"><strong>46.6</strong> ↑+69%</td>
      <td align="center">27.2 ↑+7%</td>
      <td align="center"><strong>35.8</strong> ↑+40%</td>
    </tr>
  </tbody>
</table>

*Performance evaluated on German versions of ARC (knowledge-based QA), HellaSwag (commonsense reasoning), and TruthfulQA (truthfulness). Values show accuracy in percent, with arrows indicating relative improvement over the Gemma-2-2B baseline. Best results shown in bold.*
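
Note that the arrows are relative, not absolute, changes: for the 0-shot average, (35.8 − 25.5) / 25.5 ≈ +40%. The same arithmetic in code:

```python
# Relative improvement over the Gemma-2-2B baseline, as marked by the arrows.
bueble, baseline = 35.8, 25.5
print(f"{(bueble - baseline) / baseline:+.0%}")  # +40%
```
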

## Safety & Ethics

### Toxicity
- Perplexity: 52.97 on the German TextDetox dataset
- Toxic content appears more out-of-distribution compared to the baseline

### Gender Bias
- Evaluated using perplexity differences between traditional and gender-inclusive forms
- Slight preference for gender-inclusive language (not statistically significant)
- Example: "Lehrer" vs. "Lehrer*innen" (∆PPL = -9.61); a measurement sketch follows this list
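
A minimal sketch of how such a ∆PPL comparison can be run with `transformers`; the sentence pair below is a hypothetical illustration, not the card's evaluation data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("flair/bueble-lm-2b")
model = AutoModelForCausalLM.from_pretrained("flair/bueble-lm-2b")
model.eval()

def perplexity(text: str) -> float:
    # Passing input_ids as labels yields the causal-LM loss; PPL = exp(loss).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

traditional = perplexity("Die Lehrer diskutieren im Lehrerzimmer.")
inclusive = perplexity("Die Lehrer*innen diskutieren im Lehrerzimmer.")
print(inclusive - traditional)  # negative ∆PPL favors the inclusive form
```
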

## Usage

**Note**: This is a base language model, not an instruction-tuned model. It is not optimized for chat or instruction following. For best results, use standard text completion rather than chat templates.

Also make sure you have the sentencepiece tokenizer installed:

```bash
pip install sentencepiece
```

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="flair/bueble-lm-2b")
pipe("Ich bin")
```

Or with the full model API:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("flair/bueble-lm-2b")
model = AutoModelForCausalLM.from_pretrained(
    "flair/bueble-lm-2b",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Basic text completion
text = "Berlin ist eine Stadt, die"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

For instruction-tuning experiments or chat applications, we recommend fine-tuning the model first with appropriate German instruction datasets.
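
As a starting point, here is a minimal supervised fine-tuning sketch using the plain `transformers` Trainer; the dataset file and its `instruction`/`response` schema are placeholders, and the hyperparameters are illustrative only:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("flair/bueble-lm-2b")
model = AutoModelForCausalLM.from_pretrained("flair/bueble-lm-2b",
                                             torch_dtype=torch.bfloat16)

# Placeholder file: any German instruction data with these two fields works.
dataset = load_dataset("json", data_files="german_instructions.jsonl", split="train")

def format_and_tokenize(example):
    # Hypothetical prompt format; pick one and keep it consistent at inference.
    text = (f"### Anweisung:\n{example['instruction']}\n\n"
            f"### Antwort:\n{example['response']}")
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(format_and_tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bueble-lm-2b-sft",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           bf16=True),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM label shifting.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
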

## Limitations

- Limited vocabulary size (20k tokens) compared to multilingual models (250k for Gemma)
- Performance may vary on specialized domains not well-represented in training data
- Higher fertility rate (1.78) due to smaller vocabulary size
- Inherits base limitations from the Gemma architecture

## Citation

```bibtex
@article{delobelle2024buble,
  title={BübleLM: A small German LM},
  author={Delobelle, Pieter and Akbik, Alan and others},
  year={2024}
}
```