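---
library_name: transformers
base_model:
- internlm/internlm3-8b-instruct
---

### Requirements

```bash
pip install -U transformers optimum auto-gptq
```

#### Transformers inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use bfloat16 where the GPU supports it, otherwise fall back to float16
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
device = "auto"  # let accelerate place the model on the available device(s)
model_name = "jakiAJK/internlm3-8b-instruct_GPTQ-int4"

# InternLM3 uses custom tokenizer and model code, so trust_remote_code is passed to both
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    trust_remote_code=True,
    torch_dtype=dtype,
)
model.eval()

chat = [
    {"role": "user", "content": "List any 5 country capitals."},
]

# Render the conversation with the chat template and append the assistant turn prompt
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# Send inputs to wherever device_map placed the model instead of hard-coding 'cuda'
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**input_tokens, max_new_tokens=100)

# Decode only the newly generated tokens, dropping the echoed prompt
new_tokens = output[:, input_tokens["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```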