# Model Card

GPT2-medium fully fine-tuned on English conversations that include latency-masking messages: short filler lines shown to the user while the full agent response is still being generated. The main purpose of this model is to generate such generic latency masks.

The generation format is as follows:

```
agent: Phishing scams trick users into giving away sensitive information. Always double-check the ... verified links before clicking.
user: Should I ever trust links in emails?
mask: Good to question this.
agent: It's best to visit websites directly rather than clicking on links in emails. Even seemingly legitimate emails can be deceptive. What other aspects of online security concern you?
user: Is shopping online safe these days?
mask: Important consideration.
agent: Online shopping can be safe if you shop from reputable sites. Ensure the URL begins with "https://" indicating a secure connection, and make use of services that offer buyer protection, such as secure payment methods.
user: Would using a VPN enhance my security?
mask: Interesting angle.
```
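To prompt the model, the conversation history is flattened into this `role: content` line format, and the prompt ends with a bare `mask:` line for the model to complete. A minimal sketch (the `build_prompt` helper below is hypothetical, not part of this repository):

```python
# Hypothetical helper: flatten a turn list into the prompt format above.
def build_prompt(turns):
    """turns: list of (role, text) pairs with role in {"user", "mask", "agent"}."""
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append("mask:")  # leave the final mask line open for the model to complete
    return "\n".join(lines)

prompt = build_prompt([
    ("agent", "Hello, how can I help you?"),
    ("user", "Is shopping online safe these days?"),
])
```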

## Example usage (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
import time

# Load the model and tokenizer
model_name = 'promethistai/lat_mask_gpt2-medium-splits'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the custom stopping criterion
class StopOnNewline(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        last_token = input_ids[0, -1].item()  # id of the most recently generated token
        last_token_str = self.tokenizer.decode(last_token)  # decode the token to a string
        # GPT-2 also has multi-newline tokens (e.g. "\n\n"), so check for
        # containment rather than strict equality with "\n".
        return '\n' in last_token_str

# Instantiate the stopping criterion
stop_criterion = StopOnNewline(tokenizer)

# Input text
input_text = """user: what the hell.
mask: I see your frustration.
agent: I apologize for any confusion. If you have any questions or topics you'd like to explore, feel free to share them.
user: yeah i would like to speak about the ceo.
mask: Let's dive into that topic.
agent: PromethistAI's CEO is Petr Škoda. Feel free to ask more questions!
user: Who am I?
mask:"""

# Tokenize input
input_ids = tokenizer.encode(input_text, return_tensors='pt').to("cuda")

start_time = time.time()

# Generate output with stopping criterion
output = model.generate(input_ids, 
                        max_new_tokens=100,
                        num_return_sequences=1, 
                        pad_token_id=tokenizer.eos_token_id,
                        do_sample=True,
                        temperature=0.4,
                        top_p=0.95,
                        stopping_criteria=StoppingCriteriaList([stop_criterion])  # Pass custom stopping criterion
                        )

end_time = time.time()

# Print the input text followed by the newly generated tokens only
BLUE = '\033[94m'
generated = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(input_text + BLUE + generated + '\033[0m')

print(f"Time taken: {end_time - start_time:.2f} seconds") # vllm is much faster than transformers inference

## Example usage (vLLM)

The chat template follows the conversation format shown above.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="promethistai/lat_mask_gpt2-medium-splits")  # ~0.07 s per generation (cf. timing note above)
sampling_params = SamplingParams(temperature=0.7, max_tokens=18, top_p=0.9)  # optionally add stop=["\n"] to end at one line

conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "mask", "content": "I hear you."},
    {"role": "agent", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "I want to know more about calling from the UK to the US."},
]

chat_template_jinja = ""
with open("model_selection/chat_template_simple.jinja", "r") as f:
    chat_template_jinja = f.read()

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   chat_template=chat_template_jinja,
                   use_tqdm=False,
                   )

print("Prompt:")
print("***" + outputs[0].prompt + "***")