# Model Card for DiVA Llama 3

This is an ablation of our Distilled Voice Assistant (DiVA) model, which can handle speech and text as inputs. This ablation is trained using only the token-alignment loss, as described in the ablations here: https://huggingface.co/papers/2410.02678

Weights and Biases Run: https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/4t0mvbcd?nw=nwuserheld

## Citation

This is the token-alignment only model from https://huggingface.co/papers/2410.02678

**BibTeX:**

```
@misc{DiVA,
  title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
  author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
  year={2024},
  eprint={2410.02678},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.02678},
}
```

## Table of Contents

- [Model Card for DiVA Llama 3](#model-card-for-diva-llama-3)
  - [Citation](#citation)
  - [Table of Contents](#table-of-contents)
  - [Training Details](#training-details)
    - [Training Data](#training-data)
    - [Training Procedure](#training-procedure)
    - [Environmental Impact](#environmental-impact)
    - [Hardware](#hardware)
    - [Software](#software)
  - [Model Card Authors [optional]](#model-card-authors-optional)
  - [Model Card Contact](#model-card-contact)

## Training Details

### Training Data

This model was trained on the [CommonVoice](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1) corpus.

### Training Procedure

This model was trained for 7k gradient steps with a batch size of 512 recordings. The learning rate warms up linearly over the first 70 steps and then decays linearly from 5e-5 to zero.

### Environmental Impact

- **Hardware Type:** V4-32 TPU
- **Hours used:** 8 hours
- **Cloud Provider:** Google Cloud
- **Compute Region:** US Central C

### Hardware

This model was trained on a V4 TPU on Google Cloud.

### Software

This model was trained with [Levanter](https://github.com/stanford-crfm/levanter).

## Model Card Authors [optional]

Will Held

## Model Card Contact

held@stanford.edu
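
## Appendix: Learning Rate Schedule Sketch

The schedule described under Training Procedure (70 warmup steps, peak of 5e-5, linear decay to zero over 7k gradient steps, batch size 512) can be sketched in plain Python. This is an illustration only, not the Levanter configuration used for the run. It assumes the warmup starts from zero, that the 70 warmup steps count toward the 7k total, and that the rate reaches zero exactly at the final step. The names `learning_rate` and `recordings_seen` are hypothetical helpers introduced here.

```python
PEAK_LR = 5e-5        # peak learning rate stated in the card
WARMUP_STEPS = 70     # linear warmup length stated in the card
TOTAL_STEPS = 7000    # "7k gradient steps" stated in the card
BATCH_SIZE = 512      # recordings per gradient step stated in the card


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero.

    Assumption: warmup starts at 0 and is included in TOTAL_STEPS.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Warmup phase: scale linearly from 0 up to the peak.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: scale linearly from the peak down to 0 at TOTAL_STEPS.
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def recordings_seen(steps: int = TOTAL_STEPS) -> int:
    """Number of recordings processed after `steps` gradient steps."""
    return steps * BATCH_SIZE


if __name__ == "__main__":
    for s in (0, 35, 70, 3535, 7000):
        print(f"step {s:>5}: lr = {learning_rate(s):.3e}")
    print("recordings processed:", recordings_seen())
```

Under these assumptions the run processes 7000 × 512 = 3,584,000 recordings in total. The card does not say whether these are all distinct or include repeated passes over the corpus.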