---
base_model: unsloth/qwen2.5-coder-1.5b-instruct-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- sft
- fast-apply
- instant-apply
---

> **Remember to use `temperature = 0` for optimal results during inference.**

# FastApply-1.5B-v1.0

[GitHub: kortix-ai/fast-apply](https://github.com/kortix-ai/fast-apply)

[Dataset: Kortix/FastApply-dataset-v1.0](https://huggingface.co/datasets/Kortix/FastApply-dataset-v1.0)

[Try it now on 👉 Google Colab](https://colab.research.google.com/drive/1BNCab4oK-xBqwFQD4kCcjKc7BPKivkm1?usp=sharing)

## Model Details

### Basic Information

- **Developed by:** Kortix
- **License:** apache-2.0
- **Finetuned from model:** [unsloth/Qwen2.5-Coder-1.5B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2.5-Coder-1.5B-Instruct-bnb-4bit)

### Model Description

FastApply-1.5B-v1.0 is a 1.5B-parameter model designed for instant code application: given an original file and an update snippet, it produces the full edited file. It powers [SoftGen AI](https://softgen.ai/) and is part of the Fast Apply pipeline for data generation and fine-tuning of Qwen2.5 Coder models.

When deployed on fast inference providers such as Fireworks, the model reaches a speed of approximately 340 tokens/second while maintaining high edit accuracy.

## Intended Use

FastApply-1.5B-v1.0 is intended for AI-powered code editors and tools that require fast, accurate code modifications. It is particularly well suited for:

- Instant code application tasks
- Full-file edits
- Integration with AI-powered code editors such as Aider and PearAI
- Local tooling that reduces the cost of frontier-model output

## Inference template

FastApply-1.5B-v1.0 is based on the Qwen2.5 Coder architecture and is fine-tuned for code-editing tasks. It uses a specific prompt structure for inference:

```
<|im_start|>system
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated.<|im_end|>
<|im_start|>user
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>{original_code}</code>

<update>{update_snippet}</update>

Provide the complete updated code.<|im_end|>
<|im_start|>assistant
```

The model's output is structured as:

```
<updated-code>[Complete updated file]</updated-code>
```
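If you would rather not assemble the `<|im_start|>` markers by hand, the same prompt can usually be produced with `tokenizer.apply_chat_template`. The sketch below is illustrative only, assuming this fine-tune inherits Qwen2.5's standard ChatML chat template (worth verifying against the template above before relying on it); `original_code` and `update_snippet` are example values:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kortix/FastApply-1.5B-v1.0")

# Example inputs: the original file and the update snippet to merge into it
original_code = "def greet(name):\n    print('Hello')\n"
update_snippet = "def greet(name):\n    print(f'Hello, {name}!')\n"

messages = [
    {"role": "system",
     "content": "You are a coding assistant that helps merge code updates, "
                "ensuring every modification is fully integrated."},
    {"role": "user",
     "content": "Merge all changes from the <update> snippet into the <code> below.\n"
                "- Preserve the code's structure, order, comments, and indentation exactly.\n"
                "- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.\n"
                "- Do not include any additional text, explanations, placeholders, ellipses, or code fences.\n\n"
                f"<code>{original_code}</code>\n\n"
                f"<update>{update_snippet}</update>\n\n"
                "Provide the complete updated code."},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```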
## Additional Information

For more details on the Fast Apply pipeline, the data-generation process, and deployment instructions, please refer to the [GitHub repository](https://github.com/kortix-ai/fast-apply).

## How to Use

To use the model, load it with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Kortix/FastApply-1.5B-v1.0", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Kortix/FastApply-1.5B-v1.0")

# Example inputs: the original file and the update snippet to merge into it
original_code = "def greet(name):\n    print('Hello')\n"
update_snippet = "def greet(name):\n    print(f'Hello, {name}!')\n"

# Prepare your input following the prompt structure mentioned above
input_text = """<|im_start|>system
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated.<|im_end|>
<|im_start|>user
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>{original_code}</code>

<update>{update_snippet}</update>

Provide the complete updated code.<|im_end|>
<|im_start|>assistant
"""

input_text = input_text.format(
    original_code=original_code,
    update_snippet=update_snippet,
).strip()

# Generate the response (greedy decoding, equivalent to temperature = 0)
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_length=8192, do_sample=False)
response = tokenizer.decode(output[0][len(input_ids[0]):])
print(response)

# Extract the updated code from the response
updated_code = response.split("<updated-code>")[1].split("</updated-code>")[0]
```
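Since the card recommends `temperature = 0` and the model targets high-throughput serving, it can also be run through a dedicated inference engine. Below is a minimal offline-inference sketch using vLLM, assuming vLLM is installed and supports this checkpoint (not verified here); `input_text` stands in for a fully formatted prompt built as in the examples above:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Kortix/FastApply-1.5B-v1.0")

# temperature=0 gives deterministic, greedy decoding,
# per the recommendation at the top of this card
params = SamplingParams(temperature=0, max_tokens=8192)

input_text = "..."  # the fully formatted prompt, built as shown above

outputs = llm.generate([input_text], params)
print(outputs[0].outputs[0].text)
# Extract the updated code between <updated-code> and </updated-code>,
# exactly as in the Transformers example above.
```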