Palmyra-local-1.7B-Instruct

Introduction
Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7 billion parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

  • Stronger domain reasoning in code and math, powered by targeted expert tuning and curated domain datasets.
  • Improved instruction-following, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation, especially JSON (see the sketch after this list).
  • Robust prompt handling, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
  • Extended context support, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
  • Multilingual capabilities, supporting over 29 languages including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.
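
As a quick illustration of the structured-output point above, the sketch below asks the model for JSON and validates the reply with json.loads. It is a minimal example under stated assumptions: the prompt wording and generation settings are illustrative rather than an official recipe, the repository is gated so you must supply your own Hugging Face token, and the parse assumes the model returns bare JSON rather than fenced markdown.

import json
from transformers import pipeline

# Illustrative JSON-output check; the settings here are assumptions, not an official recipe.
pipe = pipeline(
    "text-generation",
    model="Writer/Palmyra-local-1_7B",
    token="xxx",  # replace with your Hugging Face access token
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "Return a JSON object with keys 'name' and 'population' for France. Reply with JSON only.",
    }
]
reply = pipe(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
print(json.loads(reply))  # raises json.JSONDecodeError if the reply is not valid JSON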

This repository includes the instruction-tuned Palmyra-local 1.7B model, with the following architecture details:

  • Type: Causal Language Model
  • Training Stages: Pretraining + Instruction Tuning
  • Architecture: Transformer with RoPE positional encoding
  • Total Parameters: 1.7B
  • Number of Layers: 28
  • Attention: Grouped-Query Attention (GQA)
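
These figures can be checked directly against the published config. The minimal sketch below assumes the checkpoint exposes the standard transformers config attributes (num_hidden_layers, num_attention_heads, num_key_value_heads), which hold for most causal LMs on the Hub.

from transformers import AutoConfig

# Read architecture details from the gated repo's config.
config = AutoConfig.from_pretrained(
    "Writer/Palmyra-local-1_7B",
    token="xxx",  # replace with your Hugging Face access token
    trust_remote_code=True,
)

print(config.num_hidden_layers)                      # expected: 28
print(config.num_attention_heads)                    # number of query heads
print(getattr(config, "num_key_value_heads", None))  # fewer KV heads than query heads under GQA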

Training Details

  • Architecture: Palmyra
  • Training Method: From scratch
  • Attention Mechanism: GQA
  • Training Data: ~1T tokens (packed dataset)

Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.8 | 57.33 | N/A | N/A | N/A |

Notes:

  • HumanEval and MBPP: scores for GPT-4 mini, Llama-3.2-1B-Instruct, and Llama-3.2-3B-Instruct were not available in those models' published sources.
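
If you want to sanity-check numbers like these on your own hardware, EleutherAI's lm-evaluation-harness is a common tool. The sketch below is an assumption-laden example, not Writer's evaluation setup: the exact task variants, prompts, and few-shot settings behind the table are not stated, so scores may not match it.

import lm_eval  # pip install lm-eval

# Illustrative harness run; task names and settings are assumptions, not Writer's protocol.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Writer/Palmyra-local-1_7B,dtype=float16",
    tasks=["gsm8k", "mmlu"],
    batch_size=8,
)
print(results["results"])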

Usage

Install dependencies

requirements.txt:

transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0

Then install with:

pip install -r requirements.txt

Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision for lower memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Check if apply_chat_template is available, fallback if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,  # sampling must be enabled for temperature/top_p to take effect
    "temperature": 0.7,
    "top_p": 0.9,
}

# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)

# Decode output
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)

print(output_text)
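
For interactive local use you may prefer tokens to stream to the terminal as they are generated rather than arrive in one block. The sketch below reuses the model, tokenizer, input_ids, and gen_conf from the snippet above and adds transformers' built-in TextStreamer; it is an optional variation on the example, not part of it.

from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(input_ids, streamer=streamer, **gen_conf)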

Citation and Related Information

To cite this model:

@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A Powerful LLM Designed for On-Device Use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = mar
}

Contact: [email protected]
