Finetuner helps you create experiments to improve embeddings on search tasks. It accompanies you through the last mile of performance tuning for neural search applications.

LLM generation models trained by the Finetuner team at Jina AI.

This repo contains the full weights (16-bit) for Falcon-40b fine-tuned on the Code Alpaca dataset.

Reproduction

This version of the weights was trained with the following hyperparameters:

  • Epochs: 2
  • Batch size: 128
  • Micro batch size: 4
  • Learning rate: 3e-4
  • Lora r: 8
  • Lora target modules: query_key_value
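
The batch size and micro batch size above together imply a gradient accumulation factor. The following is a minimal sketch of that arithmetic, assuming the training script follows the common convention of accumulating micro batches until the full batch size is reached. The helper name `gradient_accumulation_steps` and the `world_size` parameter are hypothetical and not taken from the repository.

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Number of micro batches to accumulate before one optimizer step.

    Assumption: each of `world_size` processes handles `micro_batch_size`
    samples per forward/backward pass, and gradients are accumulated until
    `batch_size` samples have been seen in total.
    """
    samples_per_pass = micro_batch_size * world_size
    if batch_size % samples_per_pass != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size * world_size")
    return batch_size // samples_per_pass


# Values from the hyperparameter list, on a single process
steps = gradient_accumulation_steps(batch_size=128, micro_batch_size=4)
print(steps)  # 32
```

Under that assumption, a single process performs 32 forward/backward passes of 4 samples each per optimizer step.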

You can reproduce the training with this repository:

https://github.com/jina-ai/jerboa

Make sure you install the requirements, then finetune using the following command:

python finetune.py \
--base-model tiiuae/falcon-40b --lora-target-modules query_key_value \
--data-path sahil2801/CodeAlpaca-20k --output-dir ./lora-alpaca-code \
--batch-size 128 --micro-batch-size 4 --eval-limit 45 \
--eval-file code_eval.jsonl --wandb-project jerboa --wandb-log-model \
--wandb-watch gradients --num-epochs 2

Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


TOKENIZER_SOURCE = 'tiiuae/falcon-40b'
BASE_MODEL = 'jinaai/falcon-40b-code-alpaca'
DEVICE = "cuda"

PROMPT = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write a for loop in python

### Input:

### Response:
"""
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map='auto',
)

model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    TOKENIZER_SOURCE,
    trust_remote_code=True,
    padding_side='left',
)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].to(DEVICE)
input_attention_mask = inputs["attention_mask"].to(DEVICE)

# Generate up to 32 new tokens without tracking gradients
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        attention_mask=input_attention_mask,
        return_dict_in_generate=True,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )
# Take the single generated sequence; the decoded text contains the prompt followed by the completion
generation_output = generation_output.sequences[0]
output = tokenizer.decode(generation_output, skip_special_tokens=True)

print(output)
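
The prompt above is hard-coded for one instruction, and the decoded output includes the prompt itself. The following is a minimal sketch of two helpers that build the same prompt layout for other instructions and keep only the text after the response marker. The names `build_prompt` and `extract_response` are hypothetical and not part of this repo or of `transformers`. The spacing used for a non-empty input is an assumption, since the example above only shows an empty one.

```python
HEADER = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request."
)
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the prompt layout used in the inference example.

    With an empty `input_text` this reproduces the PROMPT string above,
    with only the instruction swapped in.
    """
    # Assumption: a non-empty input is followed by a blank line
    input_block = f"{input_text}\n\n" if input_text else "\n"
    return (
        f"\n{HEADER}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_block}"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, stripped of whitespace."""
    return decoded.rsplit(RESPONSE_MARKER, 1)[-1].strip()


prompt = build_prompt("Write a for loop in python")
print(prompt)
```

To use it, pass `build_prompt(...)` to the tokenizer in place of `PROMPT` and call `extract_response(output)` on the decoded text.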

Contact

Join our Discord community and chat with other community members about ideas.
