https://github.com/BM-K/Sentence-Embedding-is-all-you-need # Korean-Sentence-Embedding ๐Ÿญ Korean sentence embedding repository. You can download the pre-trained models and inference right away, also it provides environments where individuals can train models. ## Quick tour ```python import torch from transformers import AutoModel, AutoTokenizer def cal_score(a, b): if len(a.shape) == 1: a = a.unsqueeze(0) if len(b.shape) == 1: b = b.unsqueeze(0) a_norm = a / a.norm(dim=1)[:, None] b_norm = b / b.norm(dim=1)[:, None] return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100 model = AutoModel.from_pretrained('BM-K/KoSimCSE-roberta-multitask') AutoTokenizer.from_pretrained('BM-K/KoSimCSE-roberta-multitask') sentences = ['์น˜ํƒ€๊ฐ€ ๋“คํŒ์„ ๊ฐ€๋กœ ์งˆ๋Ÿฌ ๋จน์ด๋ฅผ ์ซ“๋Š”๋‹ค.', '์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค.', '์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค.'] inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt") embeddings, _ = model(**inputs, return_dict=False) score01 = cal_score(embeddings[0][0], embeddings[1][0]) score02 = cal_score(embeddings[0][0], embeddings[2][0]) ```