Blackroot
/

Differential-Microllama-Base

Model card Files Files and versions Community

Blackroot commited on 20 days ago

Commit

aa6c58d

·

verified ·

1 Parent(s): ceae081

Update README.md

Files changed (1) hide show

README.md +2 -2

README.md CHANGED Viewed

@@ -1,8 +1,8 @@
 Test network using differential attention instead of classical attention. Other than some alterations to the attention, this is otherwise the same configuration as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 # Scripts:
-`inference.py` to run the model with some test prompts
-`test_train.py` runs with the exact configurations used to train this model and is the reproduction script. Data is assumed to be in JSONL format with `"text":"example text", "text":"..."`
 # Notes:
 Compared to the control model of Smollm2, this is bordering on incoherent. Potentially this model size is too small to correctly leverage differential attention. It's clearly picked up on some ideas in language, but is generally worse than the control model using GQA in terms of human output.

 Test network using differential attention instead of classical attention. Other than some alterations to the attention, this is otherwise the same configuration as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 # Scripts:
+- `inference.py` to run the model with some test prompts
+- `test_train.py` runs with the exact configurations used to train this model and is the reproduction script. Data is assumed to be in JSONL format with `"text":"example text", "text":"..."`
 # Notes:
 Compared to the control model of Smollm2, this is bordering on incoherent. Potentially this model size is too small to correctly leverage differential attention. It's clearly picked up on some ideas in language, but is generally worse than the control model using GQA in terms of human output.