In a Training Loop 🔄

Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

upvoted a collection 5 days ago

liked a model 6 days ago

google/gemma-4-12B-it

commented on a paper 6 days ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

commented 2 papers 6 days ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

Paper • 2606.03773 • Published 7 days ago • 17 •

commented 2 papers 11 days ago

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

Paper • 2605.30214 • Published 13 days ago • 1 •

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

Paper • 2605.30348 • Published 13 days ago • 1 •

New activity in openeurollm/Dolci-Instruct-SFT-translated 20 days ago

More information about translation process

#2 opened 20 days ago by

commented 3 papers 27 days ago

BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs

Paper • 2604.02045 • Published Apr 2 • 38 •

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

Paper • 2605.12438 • Published 29 days ago • 7 •

commented a paper about 1 month ago

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

Paper • 1511.09249 • Published Nov 30, 2015 • 1 •

commented a paper about 2 months ago

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

Paper • 2604.20447 • Published Apr 22 • 2 •

commented 2 papers 3 months ago

Effective Distillation to Hybrid xLSTM Architectures

Paper • 2603.15590 • Published Mar 16 • 34 •

TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

Paper • 2603.08182 • Published Mar 9 • 2 •

New activity in stefan-it/Groundsource 3 months ago

Help Adding New Metadata!

#1 opened 3 months ago by

commented 4 papers 4 months ago

Avey-B

Paper • 2602.15814 • Published Feb 17 • 3 •

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

Paper • 2602.11149 • Published Feb 11 • 18 •

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

Paper • 2601.22146 • Published Jan 29 • 12 •

commented 2 papers 6 months ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 18 •

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

Paper • 2511.21613 • Published Nov 26, 2025 • 2 •

New activity in pdelobelle/baguettotron-nl 7 months ago

Fine-Tuning example

#1 opened 7 months ago by