library_name: transformers
license: apache-2.0
datasets:
- billion-word-benchmark/lm1b
Quick Start Guide
To use this pre-trained model with the HuggingFace APIs, use the following snippet:
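from transformers import AutoModelForMaskedLM, AutoTokenizer
# See the `UDLM` collection page on the hub for the list of available models.
# The tokenizer is the standard bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model_name = 'kuleshov-group/udlm-lm1b'
# If the checkpoint ships custom modeling code, loading may also need
# trust_remote_code=True (an assumption; check the model repository).
model = AutoModelForMaskedLM.from_pretrained(model_name)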
Model Details
UDLM stands for Uniform Diffusion Language Models. This model was trained with the refined continuous-time ELBO for uniform-noise discrete diffusion introduced in Simple Guidance Mechanisms for Discrete Diffusion Models (arXiv:2412.10193).
Architecture
The model has a context size of 128 tokens and 139M parameters.
The model architecture is based on the Diffusion Transformer (DiT) architecture and consists of:
- 12 multi-head attention blocks (with 12 attention heads),
- hidden dimension of 768,
- adaLN (adaptive layer normalization) for conditioning on the time-step (i.e., during diffusion training / generation).
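As a rough sanity check on the figures in this section, the following sketch estimates the parameter count from the listed hyperparameters. Only the context size, block count, head count, and hidden dimension come from this model card. The vocabulary size (30,522, that of bert-base-uncased), the 4x MLP expansion, the untied input and output embeddings, and the conditioning width `cond_dim` are assumptions made for illustration. Biases, layer norms, and the time-step embedder are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTConfigSketch:
    """Hypothetical config; only the first four fields come from the model card."""
    context_size: int = 128
    n_blocks: int = 12
    n_heads: int = 12
    hidden_dim: int = 768
    vocab_size: int = 30522       # assumption: bert-base-uncased vocabulary
    mlp_ratio: int = 4            # assumption: standard 4x feed-forward expansion
    cond_dim: int = 128           # assumption: width of the time-step conditioning
    tie_embeddings: bool = False  # assumption: separate input/output embeddings


def estimate_parameters(cfg):
    """Back-of-the-envelope weight count (no biases, norms, or time embedder)."""
    d = cfg.hidden_dim
    attention = 4 * d * d              # Q, K, V and output projections
    mlp = 2 * cfg.mlp_ratio * d * d    # up- and down-projection
    ada_ln = cfg.cond_dim * 6 * d      # shift/scale/gate for two sub-layers
    blocks = cfg.n_blocks * (attention + mlp + ada_ln)
    embeddings = cfg.vocab_size * d * (1 if cfg.tie_embeddings else 2)
    return blocks + embeddings


cfg = DiTConfigSketch()
head_dim = cfg.hidden_dim // cfg.n_heads
total = estimate_parameters(cfg)
print(f"head_dim={head_dim}, estimated parameters ~ {total / 1e6:.1f}M")
```

Under these assumptions the estimate comes out close to the stated 139M. Changing the assumed values (for example, tying the embeddings) shifts it by tens of millions, so treat it as a plausibility check rather than a description of the released checkpoint.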
Training Details
The model was trained using the bert-base-uncased tokenizer.
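A minimal sketch of preparing inputs that match this setup, assuming sequences are padded or truncated to the 128-token context size. The helper name `encode_for_udlm` is hypothetical, and whether the original training pipeline padded, truncated, or packed sequences is not specified here.

```python
CONTEXT_SIZE = 128  # context size stated in this model card


def encode_for_udlm(texts, tokenizer=None):
    """Tokenize `texts` into fixed-length batches (hypothetical helper)."""
    if tokenizer is None:
        # Imported lazily so that defining the helper does not require
        # transformers or network access.
        from transformers import AutoTokenizer
        tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    return tokenizer(
        texts,
        padding='max_length',     # pad short inputs up to CONTEXT_SIZE
        truncation=True,          # cut long inputs down to CONTEXT_SIZE
        max_length=CONTEXT_SIZE,
        return_tensors='pt',      # PyTorch tensors (requires torch)
    )
```

With the default tokenizer, `encode_for_udlm(['a short sentence'])` should return a batch whose `input_ids` and `attention_mask` tensors have shape `(1, 128)`.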
We trained for 1M gradient update steps using a batch size of 512.
We used a linear warm-up over the first 2,500 steps, after which the learning rate was held constant at 3e-4.
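The schedule described above can be sketched as a plain function of the step index. This assumes the warm-up starts from a learning rate of zero, which the text does not state. The function and constant names are illustrative.

```python
WARMUP_STEPS = 2500      # linear warm-up length from the model card
PEAK_LR = 3e-4           # constant learning rate after warm-up
TOTAL_STEPS = 1_000_000  # gradient update steps
BATCH_SIZE = 512


def learning_rate(step):
    """Linear warm-up to PEAK_LR, then constant (assumes warm-up starts at 0)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


# Total number of training sequences processed over the whole run.
sequences_seen = TOTAL_STEPS * BATCH_SIZE

for step in (0, 1250, 2500, TOTAL_STEPS):
    print(step, learning_rate(step))
```

In a PyTorch training loop, the same multiplier `min(1.0, step / WARMUP_STEPS)` could be passed to `torch.optim.lr_scheduler.LambdaLR` with the optimizer's base learning rate set to 3e-4.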
For more details, please refer to our work: Simple Guidance Mechanisms for Discrete Diffusion Models.
Citation
Please cite our work using the BibTeX entry below:
BibTeX:
@article{schiff2024discreteguidance,
title={Simple Guidance Mechanisms for Discrete Diffusion Models},
author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
journal={arXiv preprint arXiv:2412.10193},
year={2024}
}