Kalle Hilsenbek

Bachstelze

AI & ML interests

Combining BERT with instructions for explainable AI: gitlab.com/Bachstelze/instructionbert

Recent Activity

updated a dataset 1 minute ago
Bachstelze/BabyLM-10M-2025-shuffled
updated a model 2 days ago
Bachstelze/smolSynformer
updated a model 2 days ago
Bachstelze/smolSynformerPeft

Organizations

None yet

Bachstelze's activity

commented on Announcing AI Energy Score Ratings 2 months ago

Thanks for your efforts on energy efficiency. You've piqued my curiosity!
Why do smolLM-135m and smolLM-1.7B have nearly the same score despite a tenfold difference in model size? Is this mostly caused by the identical context size?
Could you please enable encoder-decoder models? In theory they should be more efficient, because the input only has to be encoded once and can then be reused in every decoding step.
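The "encode once, reuse every step" argument in the comment can be illustrated with a toy cost model. This is a hypothetical sketch, not a real benchmark: it counts attention operations for a single layer, and the function names (`encoder_decoder_cost`, `naive_recompute_cost`) are made up for illustration.

```python
# Toy cost model (illustrative only): rough attention-operation counts
# for generating `steps` new tokens from an input of length `n_input`.
# One "operation" = one query token attending to one key token.

def encoder_decoder_cost(n_input: int, steps: int) -> int:
    """Encode the input once, then reuse the cached encoder states."""
    encode_once = n_input * n_input  # full self-attention over the input
    decode = sum(
        n_input + t  # cross-attention to cached input + self-attention
        for t in range(steps)
    )
    return encode_once + decode

def naive_recompute_cost(n_input: int, steps: int) -> int:
    """Hypothetical baseline that re-processes the whole input each step."""
    return sum((n_input + t) ** 2 for t in range(steps))

print(encoder_decoder_cost(100, 50))   # 16225
print(naive_recompute_cost(100, 50))   # 785425
```

Note that production decoder-only models cache key/value states, so the naive baseline overstates their real cost; the sketch only illustrates the reuse idea behind the comment, not a measured comparison between architectures.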

upvoted an article 3 months ago

Is Attention Interpretable in Transformer-Based Large Language Models? Let’s Unpack the Hype

New activity in answerdotai/ModernBERT-base 3 months ago

ModernBART wen?

#38 opened 3 months ago by Fizzarolli