# Model Card

GPT2-medium fully fine-tuned on English conversations that include latency-masking messages: short filler lines shown to the user while the full agent response is still being generated. The main purpose of this model is to generate such generic latency masks.

The generation format is as follows:

```
agent: Phishing scams trick users into giving away sensitive information. Always double-check the ... verified links before clicking.
user: Should I ever trust links in emails?
mask: Good to question this.
agent: It's best to visit websites directly rather than clicking on links in emails. Even seemingly legitimate emails can be deceptive. What other aspects of online security concern you?
user: Is shopping online safe these days?
mask: Important consideration.
agent: Online shopping can be safe if you shop from reputable sites. Ensure the URL begins with "https://" indicating a secure connection, and make use of services that offer buyer protection, such as secure payment methods.
user: Would using a VPN enhance my security?
mask: Interesting angle.
```
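To prompt the model, the conversation history is flattened into this `role: content` line format, and the prompt ends with a bare `mask:` line for the model to complete. A minimal sketch (the `build_prompt` helper below is hypothetical, not part of this repository):

```python
# Hypothetical helper: flatten a turn list into the prompt format above.
def build_prompt(turns):
    """turns: list of (role, text) pairs with role in {"user", "mask", "agent"}."""
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append("mask:")  # leave the final mask line open for the model to complete
    return "\n".join(lines)

prompt = build_prompt([
    ("agent", "Hello, how can I help you?"),
    ("user", "Is shopping online safe these days?"),
])
```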

## Example usage (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
import time

# Load the model and tokenizer
model_name = 'promethistai/lat_mask_gpt2-medium-splits'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the custom stopping criterion
class StopOnNewline(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        last_token = input_ids[0, -1].item()  # id of the most recently generated token
        last_token_str = self.tokenizer.decode(last_token)  # decode the token to a string
        # GPT-2 also has multi-newline tokens (e.g. "\n\n"), so check for
        # containment rather than strict equality with "\n".
        return '\n' in last_token_str

# Instantiate the stopping criterion
stop_criterion = StopOnNewline(tokenizer)

# Input text
input_text = """user: what the hell.
mask: I see your frustration.
agent: I apologize for any confusion. If you have any questions or topics you'd like to explore, feel free to share them.
user: yeah i would like to speak about the ceo.
mask: Let's dive into that topic.
agent: PromethistAI's CEO is Petr Škoda. Feel free to ask more questions!
user: Who am I?
mask:"""

# Tokenize input
input_ids = tokenizer.encode(input_text, return_tensors='pt').to("cuda")

start_time = time.time()

# Generate output with stopping criterion
output = model.generate(input_ids, 
                        max_new_tokens=100,
                        num_return_sequences=1, 
                        pad_token_id=tokenizer.eos_token_id,
                        do_sample=True,
                        temperature=0.4,
                        top_p=0.95,
                        stopping_criteria=StoppingCriteriaList([stop_criterion])  # Pass custom stopping criterion
                        )

end_time = time.time()

# Print the input text followed by the newly generated tokens only
BLUE = '\033[94m'
generated = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(input_text + BLUE + generated + '\033[0m')

print(f"Time taken: {end_time - start_time:.2f} seconds") # vllm is much faster than transformers inference

## Example usage (vLLM)

The chat template follows the conversation format shown above.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="promethistai/lat_mask_gpt2-medium-splits")  # ~0.07 s per generation (cf. timing note above)
sampling_params = SamplingParams(temperature=0.7, max_tokens=18, top_p=0.9)  # optionally add stop=["\n"] to end at one line

conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "mask", "content": "I hear you."},
    {"role": "agent", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "I want to know more about calling from the UK to the US."},
]

chat_template_jinja = ""
with open("model_selection/chat_template_simple.jinja", "r") as f:
    chat_template_jinja = f.read()

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   chat_template=chat_template_jinja,
                   use_tqdm=False,
                   )

print("Prompt:")
print("***" + outputs[0].prompt + "***")