
Stefan Schweter PRO

stefan-it

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models

stefan-it's activity

commented on a paper 7 days ago

Evaluating the Quality of Benchmark Datasets for Low-Resource Languages: A Case Study on Turkish

Paper • 2504.09714 • Published 8 days ago

commented on a paper 8 days ago

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Paper • 2504.08716 • Published 11 days ago • 9

upvoted a paper 8 days ago

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Paper • 2504.08716 • Published 11 days ago • 9

upvoted a paper 12 days ago

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published 12 days ago • 72

upvoted a paper 13 days ago

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation

Paper • 2504.06225 • Published 14 days ago • 1

reacted to jsulz's post with 🔥 14 days ago

Post

3640

Huge week for xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure.

Using Xet on Hugging Face is the fastest way to download and iterate on open source models and we've proved it with Llama 4 giving a boost of ~25% across all models.

We expect builders on the Hub to see even more improvements, helping power innovation across the community.

With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings to the community who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.

Thanks to the meta-llama team for launching on Xet!

upvoted a paper 15 days ago

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Paper • 2504.03624 • Published 18 days ago • 13

upvoted 3 papers 18 days ago

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Paper • 2504.00178 • Published 21 days ago • 1

Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents

Paper • 2504.00414 • Published 21 days ago • 1

Overcoming Vocabulary Constraints with Pixel-level Fallback

Paper • 2504.02122 • Published 19 days ago • 2

liked a dataset 20 days ago

huggingface-legal/takedown-notices

Viewer • Updated 4 days ago • 31 • 1.3k • 23

posted an update 23 days ago

Post

2236

Woohoo 🥳 I have finished my 2025 GPU workstation build and I am very excited to train new awesome open source models on it.

I built my last GPU workstation 5 years ago featuring an AMD Ryzen 5900X, 64GB of G.SKILL Trident Z RGB on an ASRock X570 Taichi cooled by an Alphacool Eisbär 420. GPU was a Zotac RTX 3090 AMP Extreme. Unfortunately, I was never satisfied with the case, a Fractal Define 7: it is definitely too small, airflow is not optimal (I had to keep the front door open all the time), and it arrived with a partly damaged side panel.

For my new build, I've used the following components: an outstanding new AMD Ryzen 9950X3D with 64GB of Corsair Dominator Titanium (what a name). As a huge Noctua fan - warm greetings to my Austrian neighbors - I am using the brand new Noctua NH-D15 G2 on an ASRock X870E Taichi in an amazing Lian Li LANCOOL III chassis. One joke that only NVIDIA Blackwell users will understand: you definitely need a tempered glass panel to check if your GPU cables/connectors start melting 😂 And the best is yet to come: I returned my previously bought Zotac RTX 5090 Solid to the eBay seller (because of... missing ROPs, only NVIDIA Blackwell users will again understand) and bought a Zotac 5090 AMP Extreme INFINITY (yes, the long name indicates that this is the flagship model from Zotac) from a more trustworthy source (NBB in Germany).

I am so happy to start training and fine-tuning new open source models - stay tuned!!!


upvoted a collection 26 days ago

E3C-Projected

Collection

This collection contains the datasets of English layer one of e3c projected into Greek, Italian, Polish, Slovak, and Slovenian • 11 items • Updated Jan 8 • 1

reacted to wassemgtk's post with ❤️ 27 days ago

Post

2082

For fun, a new project: SuperTokenizer! A BPE tokenizer trained on C4 to beat GPT-4. Byte-level, A100-powered, and open-source. Messing around with tokens!
https://github.com/wassemgtk/SuperTokenizer