---
language:
- en
- ko
license: llama2
library_name: transformers
tags:
- tech
- translation
- enko
- ko
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- nayohan/026_tech_translation
pipeline_tag: text-generation
---
# **Introduction**
This model is trained to translate single sentences from English to Korean in the technology and science domain.
### **Loading the Model**
Use the following Python code to load the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nayohan/llama3-8b-translation-en-ko-1sent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in half precision and let transformers place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
```
### **Generating Text**
To generate a translation, use the following Python code. Other languages, the reverse direction (Korean to English), and other styles are not supported at this time.
```python
source = "en"
target = "ko"
style = "written"
# System prompt; only en -> ko in written style is supported.
SYSTEM_PROMPT = f"Acts as a translator. Translate {source} sentences into {target} sentences in {style} style."
s = "The aerospace industry is a flower in the field of technology and science."
conversation = [{'role': 'system', 'content': SYSTEM_PROMPT},
                {'role': 'user', 'content': s}]
# Move the prompt to the model's device instead of hard-coding "cuda".
inputs = tokenizer.apply_chat_template(conversation, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)
# early_stopping only applies to beam search, so it is omitted here.
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```
```
# Result
# INPUT: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nActs as a translator. Translate en sentences into ko sentences in written style.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nThe aerospace industry is a flower in the field of technology and science.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
# OUTPUT: 항공 우주 산업은 기술과 과학 분야의 꽃이라고 할 수 있다.
## [Warning!] When given multiple sentences, the model tends to merge them into a single output sentence.
# INPUT: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nActs as a translator. Translate ko sentences into en sentences in written style.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n
Technical and basic sciences are very important in terms of research. It has a significant impact on the industrial development of a country. Government policies control the research budget.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
# OUTPUT: 연구 측면에서 기술 및 기초 과학은 국가의 산업 발전에 큰 영향을 미치며 정부 정책은 연구 예산을 통제한다.
```
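Because the model is trained on single sentences, one possible workaround for the warning above is to split the input into sentences and translate them one at a time. The following is a minimal sketch, not part of the released code: `split_sentences` and `translate_by_sentence` are hypothetical helper names, the regex splitter is naive (it will also split after abbreviations such as "e.g."), and `translate_one` stands for any function that wraps the generation code shown earlier.

```python
import re
from typing import Callable, List

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# This is an assumption for illustration, not a full sentence tokenizer.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def translate_by_sentence(text: str, translate_one: Callable[[str], str]) -> List[str]:
    """Translate each sentence separately using the supplied callable.

    translate_one is expected to take one English sentence and return
    its translation, e.g. a wrapper around model.generate().
    """
    return [translate_one(sentence) for sentence in split_sentences(text)]
```

In use, `translate_one` would build the conversation for one sentence, call `model.generate`, and return the decoded output; joining the returned list with spaces gives the multi-sentence translation.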
### **Citation**
```bibtex
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
Our training code can be found here: [TBD]