
Geneva 12B GammaCorpus v2-5m

A Mistral NeMo model fine-tuned on the GammaCorpus dataset

Overview

Geneva 12B GammaCorpus v2-5m is a fine-tune of Mistral's Mistral-Nemo-Instruct-2407 model. Geneva is designed to outperform other models of a similar size while also showcasing the GammaCorpus v2-5m dataset.

Model Details

  • Base Model: mistralai/Mistral-Nemo-Instruct-2407
  • Parameters: 12B
  • Layers: 40
  • Dim: 5,120
  • Head dim: 128
  • Hidden dim: 14,336
  • Activation Function: SwiGLU
  • Number of heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary size: 2**17 ~= 128k
  • Rotary embeddings (theta = 1M)
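These values can be cross-checked against the model's published configuration. Below is a minimal sketch using Transformers' AutoConfig; the attribute names follow the standard Mistral config in Transformers, which this model is assumed to inherit from its base:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("rubenroy/Geneva-12B-GCv2-5m")

# Print the architecture fields listed above.
print(config.num_hidden_layers)    # layers: 40
print(config.hidden_size)          # dim: 5120
print(config.head_dim)             # head dim: 128
print(config.intermediate_size)    # hidden dim: 14336
print(config.num_attention_heads)  # heads: 32
print(config.num_key_value_heads)  # kv-heads: 8 (GQA)
print(config.vocab_size)           # 131072 = 2**17
print(config.rope_theta)           # 1000000.0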

Training Details

Geneva-12B-GCv2-5m was fine-tuned for 60 epochs on a single A100 GPU in roughly 70 minutes, using the Unsloth framework.
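The exact training script and hyperparameters are not published. The following is a rough sketch of how a fine-tune like this is typically set up with Unsloth; the LoRA settings, trainer arguments, and dataset repository id here are illustrative assumptions, not the actual values used.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model through Unsloth; 4-bit loading keeps it within one A100.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-Nemo-Instruct-2407",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and alpha are placeholder values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset id; see the GammaCorpus section below for the collection.
# The conversations would need to be rendered into a single text field first.
dataset = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,  # the card reports 60 epochs
        learning_rate=2e-4,
        bf16=True,
        output_dir="geneva-12b-gcv2-5m",
    ),
)
trainer.train()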

Usage

Requirements

Geneva requires a recent development build of the Transformers library; install it directly from GitHub:

pip install git+https://github.com/huggingface/transformers.git

Quickstart

To generate text with the Hugging Face transformers pipeline, you can do something like this:

from transformers import pipeline

prompt = "How tall is the Eiffel tower?"

# Standard chat format: a system message sets the persona, followed by the user prompt.
messages = [
    {"role": "system", "content": "You are a helpful assistant named Geneva, built on the Mistral NeMo model developed by Mistral AI, and fine-tuned by Ruben Roy."},
    {"role": "user", "content": prompt},
]

# Build a text-generation pipeline; the model's chat template is applied automatically.
infer = pipeline("text-generation", model="rubenroy/Geneva-12B-GCv2-5m", max_new_tokens=128)

# Returns the conversation with the assistant's reply appended.
print(infer(messages))
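For finer control over sampling and device placement, the model can also be loaded directly. This is a minimal sketch using the standard Transformers chat-template API; the generation settings shown are illustrative, not recommended values from the model author.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubenroy/Geneva-12B-GCv2-5m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant named Geneva."},
    {"role": "user", "content": "How tall is the Eiffel tower?"},
]

# Apply the chat template and move the input ids to the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))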

About GammaCorpus

This model, like all Geneva models, was trained on GammaCorpus, a Hugging Face dataset of structured, filtered multi-turn conversations. GammaCorpus comes in four versions, each available in several sizes:

GammaCorpus v1

  • 10k UNFILTERED
  • 50k UNFILTERED
  • 70k UNFILTERED

Here is a link to the GCv1 dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-v1-67935e4e52a04215f15a7a60

GammaCorpus v2

  • 10k
  • 50k
  • 100k
  • 500k
  • 1m
  • 5m (the GammaCorpus v2 size this Geneva model was trained on)

Here is a link to the GCv2 dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-v2-67935e895e1259c404a579df
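To inspect the training data yourself, the 5m split can be loaded with the datasets library. The repository id below is an assumption inferred from the collection naming; check the collection linked above for the canonical id.

from datasets import load_dataset

# Hypothetical repo id; verify against the GCv2 collection on the Hub.
dataset = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train")

# Each row is a structured multi-turn conversation.
print(dataset[0])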

GammaCorpus CoT

  • Math 170k

Here is a link to the GC-CoT dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-cot-6795bbc950b62b1ced41d14f

GammaCorpus QA

  • Fact 450k

Here is a link to the GC-QA dataset collection:
https://huggingface.co/collections/rubenroy/gammacorpus-qa-679857017bb3855234c1d8c7

The full GammaCorpus dataset collection can be found on Ruben Roy's Hugging Face profile.

Known Limitations

  • Bias: We have tried to mitigate as much bias as possible, but please be aware that the model may still generate biased answers.

License

The model is released under the Apache 2.0 License. Please refer to the license for usage rights and restrictions.
