leonardlin's activity
Tonight I wrote up a WandB report (the panel editor is super broken in Firefox 😔) that sums up some of the more interesting bits from the results: https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Comparison--Vmlldzo4MzU3NTAx
I also have an accompanying model and dataset (and codebase) for those curious to poke around:
* augmxnt/Qwen2-7B-Instruct-deccp
* augmxnt/deccp
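If you just want to pull them down locally, something like the following should work (this is just standard transformers/datasets usage, nothing model-specific assumed):

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# The accompanying dataset
ds = load_dataset("augmxnt/deccp")

# The accompanying model
tokenizer = AutoTokenizer.from_pretrained("augmxnt/Qwen2-7B-Instruct-deccp")
model = AutoModelForCausalLM.from_pretrained("augmxnt/Qwen2-7B-Instruct-deccp")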
I'll just add that I'm now sure it's spam: that space is attached to another one of my models as well (and obviously isn't running either of them). Also, the user's other space links straight out to something shady: https://huggingface.co/spaces/elseodelasgalletas/detector-de-ia (I can't report it as I'm rate limited)
I mean, it's obviously not running my model (it's a brand new JA/EN ablation), so I'm not sure why it'd be attached...
Also, I tested the new https://huggingface.co/DataPilot/ArrowPro-7B-KUJIRA model and it appears to be the real deal - very impressive performance, trained by a 15-year-old (!) @Holy-fox. Note that using the sampler settings detailed below improved its score as well, since it otherwise suffered from the same looping errors.
I'll be aiming to beat that with the Llama 3 8B, and to beat Command R Plus with the 70B, in the coming days.
I'll just add a note on the sampler parameters I found improved performance for virtually every model I tested: temperature 0.2, min_p 0.1, frequency_penalty 0.5 (a frequency/repetition penalty is required to minimize the looping errors that otherwise creep into most of these models).
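For concreteness, here's roughly how those settings get passed when testing against an OpenAI-compatible endpoint (a minimal sketch, assuming a local vLLM-style server; the base_url, model name, and the extra_body route for min_p are assumptions, since min_p is not a standard OpenAI parameter):

# Sketch: applying the sampler settings above via an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="DataPilot/ArrowPro-7B-KUJIRA",   # whichever model is being tested
    messages=[{"role": "user", "content": "..."}],
    temperature=0.2,
    frequency_penalty=0.5,                  # keeps looping/repetition in check
    extra_body={"min_p": 0.1},              # server-specific extra sampling param
)
print(resp.choices[0].message.content)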
shisa-ai/shisa-v1-llama3-70b beats out gpt-3.5-turbo-0125's JA performance, which is worth noting, and it is tuned *exclusively* with the old shisa-v1 dataset, augmxnt/ultra-orca-boros-en-ja-v1 (so its chart position will be very short lived).
I've set up a fork of Lightblue's Shaberi testing framework, which uses LLM-as-a-Judge style benchmarks as something probably more representative of real-world LLM strength in Japanese. Here's how the new base model ablations are looking:
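For anyone unfamiliar with the approach, the LLM-as-a-Judge idea boils down to something like this minimal sketch (this is not Shaberi's actual code - the judge model, prompt, and scoring scale are placeholders):

# Sketch: ask a strong judge model to score a candidate answer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def judge(question: str, answer: str) -> int:
    prompt = (
        "You are grading a Japanese-language assistant.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate the answer's correctness and fluency from 1 (worst) to 5 (best). "
        "Reply with only the number."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())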
Here's also a simple script for checking what the output looks like:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("augmxnt/shisa-7b-v1")

# A short multi-turn conversation to run through the chat template
messages = [
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

# Print the raw Jinja chat template stored with the tokenizer
print()
print('Chat Template:')
print(tokenizer.chat_template)
print()
print('---')
print()

# Print the conversation rendered through the template (as a string, not token ids)
print(tokenizer.apply_chat_template(messages, tokenize=False))
BTW, I was trying to get a model tree for https://huggingface.co/mlabonne/AlphaMonarch-7B and it was getting caught in a recursion loop. I first added caching on the ModelCard, assuming that would sort things out, but it didn't, so I hacked in some logic to prevent revisits (and also added some weak handling for missing models, since those were causing loops too - AIDC-ai-business/Marcoroni-7B-v3, for example, has disappeared).
Anyway, my updated code still has broken chart rendering (the cyclic graph is what was causing the looping issues), but at least it will produce a list of the model lineage, which was good enough for my purposes... In case anyone wants to move this forward, or needs a reference if they run into the same looping issues: https://colab.research.google.com/drive/1-7w_pPWPCCQQpQ7LrvlKIdhyHsoCHH4E?usp=sharing
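For reference, the cycle and missing-model handling boils down to something like this (a minimal sketch, not the notebook's actual code; it assumes the base_model metadata field is populated on the model cards, and the lineage() helper is just an illustrative name):

# Sketch: walk a model's ancestry via base_model metadata, with a visited set
# to break cycles and tolerance for repos that have been deleted from the Hub.
from huggingface_hub import ModelCard

def lineage(model_id, visited=None):
    """Return the list of ancestor model ids reachable via base_model metadata."""
    if visited is None:
        visited = set()
    if model_id in visited:            # cycle guard: don't revisit nodes
        return []
    visited.add(model_id)
    try:
        card = ModelCard.load(model_id)
    except Exception:                  # repo deleted/renamed (e.g. Marcoroni-7B-v3)
        return [f"{model_id} (missing)"]
    parents = card.data.to_dict().get("base_model") or []
    if isinstance(parents, str):
        parents = [parents]
    out = [model_id]
    for parent in parents:
        out += lineage(parent, visited)
    return out

print(lineage("mlabonne/AlphaMonarch-7B"))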