---
license: cc
datasets:
- Krooz/Campus_Recruitment_Text
language:
- en
library_name: peft
pipeline_tag: text-classification
tags:
- Education
---

## Recruitment Guide Mistral 7B-Instruct

Mistral 7B Instruct fine-tuned on the [Campus Recruitment Text](https://huggingface.co/datasets/Krooz/Campus_Recruitment_Text) dataset with LoRA and 4-bit quantization. See the GitHub [repository](https://github.com/Kirushikesh/Campus_Recruitment_Prediction_LLM) for training details. The model takes a student's university record as input and predicts the placement status (Placed/NotPlaced) as output. Try out the application on Hugging Face [Spaces](https://huggingface.co/spaces/Krooz/Campus_Recruitment_Demo).

## Usage

This repo contains the LoRA parameters of the fine-tuned Mistral 7B model. To perform inference, load and use the model as follows:

```
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig


def format_instruction(report):
    # Build the prompt in the Instruction / Report / Label layout
    return f"""### Instruction:
Classify the student into Placed/NotPlaced based on his/her college report details. The report includes marks scored by the student in various courses and extra curricular activities taken by them.

### Report:
{report}

### Label:
"""


# input is a report card of a university graduate
report = """John is a college-level student who has demonstrated academic excellence throughout his schooling journey. He has a cumulative GPA of 6.7 in his university, indicating his strong academic abilities. Additionally, he scored 63 on an aptitude test, showcasing his analytical and problem-solving skills. John has also engaged in one project, demonstrating his creativity and practical skills.

In terms of extracurricular activities, John is actively involved in a range of areas. He has participated in one project, which showcases his ability to work collaboratively and achieve results independently. However, he has zero internships and zero workshops/certifications, which could have been an area for improvement.

In terms of soft skills, John has a rating of 3.8, which suggests that he has strong social and communication capabilities. He has no placement training, meaning he would benefit from gaining hands-on experience in a professional environment.

Overall, John has good academic and extracurricular achievements but would benefit from gaining more practical work experience and soft skills training. He has the potential to be an excellent candidate in the future, and it would be beneficial to him to further develop these areas."""

prompt = format_instruction(report)

# load base LLM model, LoRA params and tokenizer
model_id = "Krooz/placement-classification-mistral-7b-instruct-v1"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    # load the base model weights in 4-bit via bitsandbytes
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

# inference (greedy decoding)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=100,
        do_sample=False,
    )

# decode output tokens and strip the prompt from the response
outputs = outputs.detach().cpu().numpy()
outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
output = outputs[0][len(prompt):]
print(output)
```
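
The generated continuation is free text, so it usually helps to normalize it into one of the two labels before using it downstream. The helper below is a minimal sketch, not part of the original repository: `parse_label` is a hypothetical name, and it assumes the label appears on the first non-empty line of the generated text, possibly with different casing or spacing (for example `Not Placed`).

```python
import re
from typing import Optional


def parse_label(generated: str) -> Optional[str]:
    """Map generated text to 'Placed' or 'NotPlaced', or None if unrecognized.

    Assumption: the label is on the first non-empty line of the output.
    """
    for line in generated.splitlines():
        text = line.strip()
        if text:
            break
    else:
        # no non-empty line was found
        return None

    # keep letters only, so "Not Placed" and "not-placed" both match
    normalized = re.sub(r"[^a-z]", "", text.lower())
    # check the longer label first, since "placed" is a suffix of "notplaced"
    if normalized.startswith("notplaced"):
        return "NotPlaced"
    if normalized.startswith("placed"):
        return "Placed"
    return None
```

With the usage snippet above, `parse_label(output)` would return the predicted label, or `None` when the model emits something outside the two expected classes.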

References:

* https://medium.com/@jeremyarancio/fine-tune-an-llm-on-your-personal-data-create-a-the-lord-of-the-rings-storyteller-6826dd614fa9
* https://blog.neuralwork.ai/an-llm-fine-tuning-cookbook-with-mistral-7b/
* https://blog.gopenai.com/fine-tuning-mistral-7b-instruct-model-in-colab-a-beginners-guide-0f7bebccf11c