Spanish GPT-2 as backbone

A conversational model for Spanish, fine-tuned on the OpenSubtitles dataset. The backbone is a GPT-2 model that, according to the Flax/JAX Community by Hugging Face, was trained from scratch on the Spanish portion of the OSCAR dataset.

Model description and fine-tuning

The backbone is based on OpenAI's GPT-2, introduced in the paper "Language Models are Unsupervised Multitask Learners" by Alec Radford et al. A transfer learning approach with a large Spanish dataset was then used to adapt this text generation model to conversational tasks. Special tokens play a key role in the fine-tuning process: they mark padding, the start and end of each string, and the beginning of the bot's turn. They are registered on the tokenizer as follows:

tokenizer.add_special_tokens({"pad_token": "<pad>",
                              "bos_token": "<startofstring>",
                              "eos_token": "<endofstring>"})
tokenizer.add_tokens(["<bot>:"])
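To make the role of these tokens concrete, the sketch below assembles a single training string from a user turn and a bot reply. The layout mirrors the inference prompt shown in the How to use section; that the same layout, closed with `<endofstring>`, was used during fine-tuning is an assumption, and `format_example` is a hypothetical helper, not part of the released code.

```python
# Token strings as registered on the tokenizer above.
BOS = "<startofstring>"
EOS = "<endofstring>"
BOT = "<bot>:"


def format_example(user_text: str, bot_text: str) -> str:
    """Join one dialogue pair into a single training string.

    Assumed layout (not confirmed by the model card):
    <startofstring> user turn <bot>: bot reply <endofstring>
    """
    return f"{BOS} {user_text.strip()} {BOT} {bot_text.strip()} {EOS}"


print(format_example("Hola, ¿cómo estás?", "Muy bien, gracias."))
```

Under this assumed layout, the model would learn to continue a prompt that ends in `<bot>:` with a reply and then emit `<endofstring>`.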

How to use

You can load this model directly with the AutoModelForCausalLM class for causal language modeling:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("erikycd/chatbot_hadita")
model = AutoModelForCausalLM.from_pretrained("erikycd/chatbot_hadita")
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)

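def infer(inp):
    # Build the prompt: start token, user text, then the bot marker
    inp = "<startofstring> " + inp + " <bot>: "
    # Tokenize and move the tensors to the same device as the model
    inp = tokenizer(inp, return_tensors = "pt")
    X = inp["input_ids"].to(device)
    attn = inp["attention_mask"].to(device)
    # Generate a continuation, reusing the end-of-string id as the pad id
    output = model.generate(X, attention_mask = attn, pad_token_id = tokenizer.eos_token_id)
    # Decode the first generated sequence, dropping special tokens
    output = tokenizer.decode(output[0], skip_special_tokens = True)
    return output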

exit_commands = ('bye', 'quit')
while True:
    text = input('\nUser: ')
    # Stop before running inference when the user asks to exit
    if text in exit_commands:
        break
    output = infer(text)
    print('Bot: ', output)
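Because `<bot>:` is registered with `add_tokens` rather than as a special token, `skip_special_tokens = True` is not expected to remove it, so the string returned by `infer` will likely still contain the user's text followed by `<bot>:`. The sketch below shows one possible post-processing step under that assumption; `extract_reply` is a hypothetical helper, not part of the model's code.

```python
BOT_MARKER = "<bot>:"


def extract_reply(decoded: str, marker: str = BOT_MARKER) -> str:
    """Return only the text after the first bot marker.

    If the marker is absent, the whole string is returned, stripped.
    """
    _, found, reply = decoded.partition(marker)
    return reply.strip() if found else decoded.strip()


print(extract_reply("Hola, ¿cómo estás? <bot>: Muy bien, gracias."))
```

In the chat loop, this could be applied as `print('Bot: ', extract_reply(output))` so that only the bot's turn is shown.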
    