Model Card for mixtral-8x7B-redbioma-qg-v0.2

Model Details

Model Description

This model was trained to generate closed-domain question-answer pairs from scientific texts about Costa Rican biodiversity. These texts come from scientific papers describing a wide range of species and their characteristics. The texts are given to the model as input to obtain high-quality QA pairs.

All the resources and data used for this project come from the redbioma project, through the Computer Science School of the Instituto Tecnológico de Costa Rica (ITCR).

  • Developed by: Alejandro Díaz Pereira
  • Model type: Question-answer generation.
  • Language(s) (NLP): Spanish
  • License: Apache 2.0
  • Finetuned from model: Mixtral-8x7B-Instruct-v0.1

Uses

This model was trained for closed-domain question-answer generation in Spanish; it is not intended for use in other tasks.
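A minimal loading sketch with the transformers library is shown below; the repository id, dtype, and device settings are assumptions (the card does not specify how the weights are published) and should be adjusted to the actual repository:

import torch
from transformers import pipeline

# Hypothetical repository id; replace with the actual Hugging Face repo of this model.
generator = pipeline(
    "text-generation",
    model="alejandrodp/mixtral-8x7B-redbioma-qg-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)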

Bias, Risks, and Limitations

The model was trained on a small dataset of QA pairs (26,035 examples), most of which are extractive, so the model is biased toward extracting answers directly from the context. It works well with input texts of roughly 400 to 4,000 characters. However, due to the low diversity of the texts, the model often generates only one or two QA pairs; the results can be improved with some prompt engineering.
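One simple way to keep inputs within that 400 to 4,000 character range is to split longer documents into paragraph-based chunks before generation. The helper below is only an illustration (it is not part of the released code) and assumes paragraphs are separated by blank lines:

# Naive chunker: groups paragraphs until a chunk approaches max_chars.
# A single paragraph longer than max_chars is kept as one oversized chunk.
def chunk_text(text: str, min_chars: int = 400, max_chars: int = 4000) -> list[str]:
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars and len(current) >= min_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks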

The prompt that produced the best results is the following:

Genere un grupo de preguntas distintas y sus respuestas exactas correspondientes basadas en la entrada proporcionada sin repetir preguntas. Las preguntas deben explorar diferentes facetas de la información presentada, las respuestas deben ser precisas, detalladas y comprensibles para un público no especialista. Enfóquese en la claridad y profundidad para mejorar la comprensión.

(In English: Generate a group of distinct questions and their corresponding exact answers based on the provided input, without repeating questions. The questions should explore different facets of the information presented; the answers should be precise, detailed, and understandable for a non-specialist audience. Focus on clarity and depth to improve comprehension.)

Terms such as grupo, exactas, and sin repetir preguntas improved the way the model generates questions, making the results more diverse, more detailed, and focused on using only the knowledge from the context.
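In practice, the instruction and the source passage are sent as a single user turn so that the Mixtral-Instruct chat template is applied. The sketch below continues the hypothetical generator pipeline from the Uses section and assumes a recent transformers version whose text-generation pipeline accepts chat-formatted input:

instruction = (
    "Genere un grupo de preguntas distintas y sus respuestas exactas correspondientes "
    "basadas en la entrada proporcionada sin repetir preguntas. Las preguntas deben "
    "explorar diferentes facetas de la información presentada, las respuestas deben ser "
    "precisas, detalladas y comprensibles para un público no especialista. Enfóquese en "
    "la claridad y profundidad para mejorar la comprensión."
)
# Replace with a real passage of roughly 400 to 4,000 characters.
context = "Texto científico sobre una especie de la biodiversidad costarricense."

# Instruction and context form one user message; the pipeline applies the chat template.
messages = [{"role": "user", "content": f"{instruction}\n\n{context}"}]
result = generator(messages, max_new_tokens=512, do_sample=False, return_full_text=False)
print(result[0]["generated_text"])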

The model showed only one case of endless generation with this prompt, using the following input text. Further training or prompt engineering could resolve this issue.

Vuela recto y rápidamente hasta la percha, y luego mantiene el cuerpo inmóvil mientras inclina la cabeza en ángulos raros para revisar la vegetación a su alrededor. Atrapa insectos, sus larvas y pupas, lagartijas pequeñas o frutos del follaje mediante salidas súbitas y agitadas. Un solo individuo puede acompañar a bandadas mixtas de hormigueritos, furnáridos y tangaras.

(In English: It flies straight and quickly to its perch, then holds its body still while tilting its head at odd angles to scan the surrounding vegetation. It catches insects, their larvae and pupae, small lizards, or fruit from the foliage in sudden, agitated sallies. A single individual may join mixed flocks of antwrens, furnariids, and tanagers.)

Training Details

Training Data

This model was trained on a dataset of 26,035 question-answer pairs built from redbioma scientific texts on Costa Rican biodiversity.

Training Procedure

The model was trained with a self-supervised objective. The training parameters were as follows:

from transformers import TrainingArguments

# Requires a GPU, the deepspeed package, and the deepspeed_config.json from the model repo.
training_args = TrainingArguments(
    output_dir="./output",  # placeholder; not specified in this card
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    eval_accumulation_steps=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    fp16=True,
    evaluation_strategy="steps",
    adam_beta1=0.9,
    adam_beta2=0.95,
    num_train_epochs=1,
    learning_rate=5e-5,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=50,
    eval_steps=100,
    save_strategy="no",
    report_to="tensorboard",
    gradient_checkpointing_kwargs={"use_reentrant": False},
    deepspeed="./deepspeed_config.json",
    optim="adamw_bnb_8bit",
)

All of these parameters were chosen to keep memory consumption and training time reasonable. They were taken from https://blog.vessl.ai/finetuning-mixtral-8x7B. The DeepSpeed configuration can be found in the model repo.

Training Hyperparameters

  • Training regime: fp16 mixed precision.

Technical Specifications

Compute Infrastructure

Jarvis.ai was used for the training and inference phases.

Hardware

The model was trained with the following hardware specifications:

  • RAM: 256GB (total)
  • GPU: 2 x RTX6000 Ada (48GB)

Software

The model was trained using PyTorch.

Model Card Contact

Alejandro D铆az Pereira (Main developer)

Email: [email protected]

Github: alejandrodp
