# Uploaded model

- Developed by: Aratan
- License: apache-2.0
- Finetuned from model: llama-3.1-8b-bnb-4bit
## Inference
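The snippet below assumes the model and tokenizer are already loaded with Unsloth. A minimal loading sketch, assuming a hypothetical Hub repository id (`Aratan/lora_model`) and the standard Unsloth Alpaca prompt template:

```python
from unsloth import FastLanguageModel

# Hypothetical repository id -- replace with this model's actual Hub path.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Aratan/lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Standard Unsloth Alpaca template, assumed to match the one used at training time.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```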
```python
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Continue the Fibonacci sequence.",  # instruction
            "1, 1, 2, 3, 5, 8",                  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
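To watch tokens appear as they are generated instead of decoding everything at the end, a `TextStreamer` from the standard `transformers` API can be attached; a minimal sketch:

```python
from transformers import TextStreamer

# Prints generated tokens to stdout as they arrive; skip_prompt hides the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 64, use_cache = True)
```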
## If you use Ollama
```
FROM aratan_lora_model.Q4_K_M.gguf

TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}
Instruction:
{{ .Prompt }}{{ end }}
Response:
{{ .Response }}<|eot_id|>"""

# System prompt (Spanish): "Answer only the question, do not invent, be concrete."
SYSTEM """Responde solo a la pregunta, no inventes, se concreto."""

PARAMETER stop "<|eom_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|finetune_right_pad_id|>"
PARAMETER stop "<|python_tag|>"
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token_"
```
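Once the Modelfile is saved, register the model with `ollama create aratan-lora -f Modelfile` and query it from Python. A minimal sketch, assuming the hypothetical model name `aratan-lora` and the official `ollama` client (`pip install ollama`):

```python
import ollama

# "aratan-lora" is an assumption matching the `ollama create` command above.
response = ollama.generate(
    model = "aratan-lora",
    prompt = "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8",
)
print(response["response"])
```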