I just shared a blog post on https://nateraw.com explaining the motivation + process behind training nateraw/musicgen-songstarter-v0.2, including training details, WandB logs, hparams, and notes on previous experiments.
It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
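Under the hood, the melody-conditioning step looks roughly like this. A minimal sketch, assuming audiocraft's `MusicGen` API and its `generate_with_chroma` method; the text prompt and input file name are placeholders:

```python
# Sketch only: assumes the audiocraft library loads this checkpoint via
# MusicGen.get_pretrained; "my_autotuned_voice.wav" is a placeholder.
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("nateraw/musicgen-songstarter-v0.2")
model.set_generation_params(duration=8)  # seconds of audio to generate

# Load the auto-tuned vocal take to condition on.
melody, sr = torchaudio.load("my_autotuned_voice.wav")

# A chromagram is extracted from the melody and used, together with the
# text prompt, to condition generation.
wav = model.generate_with_chroma(
    descriptions=["acoustic, guitar, melody, trap, d minor, 90 bpm"],
    melody_wavs=melody[None],  # add a batch dimension
    melody_sample_rate=sr,
)
```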
Misc models:
- 🦖 T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
- 👀 CT-RATE: a 3D dataset paired with text reports ibrahimhamamci/CT-RATE
- 🐙 Octopus v2: a Gemma-based model trained for Android API calls - extremely fast, better than Llama+RAG, great results NexaAIDev/Octopus-v2
🌏 Models and datasets around the world:
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
SpeechBrain 1.0: a toolkit with hundreds of recipes and pretrained models for audio tasks such as speech recognition, diarization, and enhancement. New major release!
HF repos: speechbrain
Website: https://speechbrain.github.io/
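For a quick taste of the inference side, here's a hedged sketch assuming the `EncoderDecoderASR` interface and a LibriSpeech checkpoint; `example.wav` is a placeholder:

```python
# Minimal ASR example with SpeechBrain 1.0 (sketch; the checkpoint name and
# audio file are assumptions, not from the post).
from speechbrain.inference.ASR import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))
```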
The community has struggled to do a good preference-tune of Gemma, so the amazing @lewtun and @philschmid built an open-source recipe and trained a model to help people get started.
Some interesting details:
- Fine-tuned on DEITA and DPOed with the Argilla DPO dataset
- Very strong MT-Bench results (7.81), better than Zephyr Beta (Mistral-based) and Gemma Instruct
- Can run locally with tools such as llama.cpp on a Mac (see the sketch after this list)
- Not-so-good AGIEval results compared to Mistral-based tunes
- All training code is open-sourced
- Trained for 105 minutes on 8x H100
- No system message
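If you'd rather try it from Python than llama.cpp, here's a hedged sketch using transformers; the repo id is my assumption for the released checkpoint, and note there's no system turn, matching the "no system message" detail above:

```python
# Sketch only: the model id and the pipeline's chat-format support are
# assumptions; adjust for your transformers version.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-gemma-v0.1",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# No system message: the conversation starts directly with the user turn.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```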
Big kudos to the team! Super exciting to see a good fine-tune for Gemma.
The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other users' queries in the same batch.
So if the same batch contains:
- User A: benign query
- User B: malicious query
The response for A might be altered! 😱
How is this possible? One approach is to fill the experts' token buffers with adversarial data, forcing the gating to route benign tokens to non-ideal experts or to drop them entirely (when the buffers have a finite capacity). The toy sketch below illustrates the dropping case.
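Here is a toy, self-contained simulation (my illustration, not the paper's code) of why finite expert buffers make routing depend on who else is in the batch:

```python
# Toy top-1 MoE router with a hard per-expert buffer. When attacker tokens
# that target the victim's preferred expert come first in the batch, they
# fill its buffer and the victim's tokens get dropped (-1).
import numpy as np

num_experts, capacity = 4, 2

def route(batch):
    """Greedy top-1 routing with a per-expert capacity; -1 means dropped."""
    load = [0] * num_experts
    out = []
    for logits in batch:
        e = int(np.argmax(logits))       # expert the gate prefers
        if load[e] < capacity:
            load[e] += 1
            out.append(e)
        else:
            out.append(-1)               # buffer full: token dropped
    return out

# The victim's two tokens both prefer expert 0.
victim = [np.array([3.0, 0, 0, 0]), np.array([2.5, 0, 0, 0])]
# Benign co-batched tokens prefer expert 1, so the victim routes normally.
benign = [np.array([0, 3.0, 0, 0]), np.array([0, 2.5, 0, 0])]
# Adversarial tokens crafted to also claim expert 0, exhausting its buffer.
attack = [np.array([9.0, 0, 0, 0]), np.array([8.0, 0, 0, 0])]

print(route(benign + victim))  # [1, 1, 0, 0]   -> victim served by expert 0
print(route(attack + victim))  # [0, 0, -1, -1] -> victim tokens dropped
```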
This assumes that the adversary can use the model as a black box, can observe the logit outputs, and can ensure that their data is always grouped into the same batch as the victim's.
How to mitigate this?
- Randomize the batch order (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from the gate weights instead of taking the top-k (not great IMO, as that requires more memory for inference; see the sketch below)
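For that last point, the idea in a nutshell (my illustration, not from the paper): sample the expert from the gate's softmax instead of deterministically taking the argmax, so the attacker can no longer reliably predict which buffer to overflow:

```python
# Stochastic routing sketch: sampling from the gate distribution makes the
# chosen expert unpredictable, at the cost of keeping more experts hot.
import numpy as np

rng = np.random.default_rng(0)

def sample_expert(gate_logits):
    probs = np.exp(gate_logits - gate_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = np.array([3.0, 1.0, 0.5, 0.0])
print([sample_expert(logits) for _ in range(8)])  # mostly expert 0, not always
```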