
Model Card

This model was trained to choose between RAG and CoT techniques for domain-specific chat applications. Depending on the user's question, the model decides which approach is better suited to generate the response.

  • Some questions are domain-specific and can be answered by performing a simple RAG lookup.
  • Others are complex and require a step-by-step (chain-of-thought) approach.

We performed simple prompt tuning on a low-parameter base model, producing a small model capable of few-shot classification from a very small dataset of roughly ~100 samples.
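The card does not include the tuning script itself. As a minimal sketch of what prompt tuning a small causal LM with the PEFT library can look like (the num_virtual_tokens value and the init text below are assumptions, not taken from this card):

from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Assumed configuration: only the soft prompt embeddings are trained,
# while the 560M base weights stay frozen.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Categorize this question as RAG or COT",  # assumption
    num_virtual_tokens=8,  # assumption
    tokenizer_name_or_path="bigscience/bloom-560m",
)

model = get_peft_model(base, peft_config)
model.print_trainable_parameters()  # only the virtual tokens are trainable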

Base Model Sources

Prompt-tuned version of bigscience/bloom-560m, loaded with a bitsandbytes (bnb) 4-bit quantization configuration.
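The exact bitsandbytes settings are not listed on the card. A minimal sketch of a 4-bit load using transformers' BitsAndBytesConfig, where the quant type and compute dtype are assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Only "4 bits" is stated on the card; quant type and compute dtype are assumed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    quantization_config=bnb_config,
    device_map="auto",
)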

Uses

This model performs a single task: deciding whether an incoming question should be answered with Retrieval-Augmented Generation (RAG) or Chain of Thought (CoT).
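In a chat pipeline, the predicted label would typically gate which generation path runs. A minimal routing sketch, where classify wraps this model and answer_with_rag / answer_with_cot are hypothetical downstream generators:

def route(query, classify, answer_with_rag, answer_with_cot):
  # `classify` returns the string "RAG" or "COT" for the question.
  label = classify(query)
  if label == "RAG":
    return answer_with_rag(query)  # retrieve documents, then generate
  return answer_with_cot(query)    # prompt for step-by-step reasoning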

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "{model_id_goes_here}"

# Load the classifier and its tokenizer; the model is moved to the GPU
# to match the CUDA calls below.
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

def make_inference(query, model):
  # The prompt text is kept verbatim so that it matches the template
  # the model was prompt-tuned on.
  prompt = """\
### Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Categorize this question into one of this two categories:

RAG
COT

Input:
{Question}

### Response:
"""

  batch = tokenizer(prompt.format(Question=query), return_tensors='pt').to("cuda")

  # The expected output is a short category label, so a few new tokens suffice.
  with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=10)

  return output_tokens

query = "{your_question_goes_here}"
output_tokens = make_inference(query, model)
response = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
print(response)
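Because the model generates free text, the category still has to be parsed out of the decoded string. A minimal sketch, assuming the decoded output echoes the prompt and appends the label after the "### Response:" marker:

def extract_category(decoded):
  # Keep only the first word generated after "### Response:",
  # which should be "RAG" or "COT".
  tail = decoded.split("### Response:")[-1].strip()
  return tail.split()[0] if tail else ""

category = extract_category(response)
print(category)  # expected: "RAG" or "COT"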

Training Details

Training Data

The dataset used is a synthetic dataset containing (question, technique) pairs.
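These pairs might look like the following; the rows below are hypothetical illustrations of the format, not actual samples from the dataset:

# Hypothetical rows showing the (question, technique) pair format;
# not taken from the real dataset.
samples = [
  {"question": "What is the warranty period for product X?", "technique": "RAG"},
  {"question": "If a customer buys 3 units at a 15% discount, what is the total?", "technique": "COT"},
]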

Training Prompt

prompt_template = """\
### Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Categorize this question into one of this two categories:

RAG
COT

Input:
{Question}

### Response:
{Category}

### End
"""

Training Hyperparameters

  • evaluation_strategy="steps"
  • eval_steps=1
  • logging_strategy="steps"
  • per_device_train_batch_size=6
  • gradient_accumulation_steps=4
  • warmup_steps=50
  • max_steps=100
  • learning_rate=1e-3
  • fp16=True
  • logging_steps=1
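Wired into transformers' TrainingArguments, these values look roughly as follows; output_dir is an assumption, everything else comes from the list above:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",  # assumption; not stated on the card
    evaluation_strategy="steps",
    eval_steps=1,
    logging_strategy="steps",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    warmup_steps=50,
    max_steps=100,
    learning_rate=1e-3,
    fp16=True,
    logging_steps=1,
)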

Evaluation

Metrics

  • Accuracy
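Accuracy here is the fraction of questions whose predicted technique matches the gold label. A minimal sketch over hypothetical prediction/label lists:

def accuracy(predictions, labels):
  # Fraction of exact matches between predicted and gold labels ("RAG"/"COT").
  correct = sum(p == g for p, g in zip(predictions, labels))
  return correct / len(labels)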

Results

Train:

Accuracy: 0.9659090909090909

Validation:

Accuracy: 0.9090909090909091

Test:

Accuracy: 1.0

Model Card Contact

LinkedIn: www.linkedin.com/in/jrodriguez130
