IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the OSCAR-2301 dataset.
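
The pretraining corpus is the Italian portion of OSCAR-2301. As a minimal sketch of how that data can be inspected (the dataset is gated on the Hugging Face Hub, so you must accept its terms and be logged in; the `language` argument follows the OSCAR-2301 dataset card and is an assumption here, not something specified by this model card):

```python
from datasets import load_dataset

# Stream the Italian subset of OSCAR-2301 (gated dataset: accept the terms on the
# Hub and log in with `huggingface-cli login` first).
dataset = load_dataset(
    "oscar-corpus/OSCAR-2301",
    language="it",
    split="train",
    streaming=True,
)

# Peek at the beginning of one document
print(next(iter(dataset))["text"][:200])
```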

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning. Because the repository contains custom model code, pass `trust_remote_code=True` when loading it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the repository contains custom model code
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
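
Once loaded, the model can be used for plain text generation. The snippet below is a minimal sketch: the Italian prompt and the sampling settings (`max_new_tokens`, `top_p`, `temperature`) are illustrative choices, not values recommended by the author, and this is a base language model rather than an instruction-tuned one.

```python
# Generate a short continuation of an Italian prompt (illustrative settings).
inputs = tokenizer("C'era una volta", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```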

Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter | Value |
| --- | --- |
| n_parameters | 125M |
| n_layers | 12 |
| n_heads | 12 |
| d_model | 768 |
| vocab size | 50432 |
| sequence length | 2048 |
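
These values can be cross-checked against the configuration shipped with the repository. A minimal sketch, assuming the custom config class exposes MPT-style field names such as `n_layers`, `n_heads`, and `d_model` (an assumption, not confirmed by this card):

```python
from transformers import AutoConfig

# Load the custom configuration (trust_remote_code=True is required, as above).
config = AutoConfig.from_pretrained("efederici/ipt-125m", trust_remote_code=True)

# Print the full config; fields such as n_layers, n_heads, d_model, and the vocab
# size should match the table above (field names are an assumption about the custom class).
print(config)
```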