CrabInHoney committed on
Commit
1abaa49
·
verified ·
1 Parent(s): b30be72

Upload README.md

Files changed (1)
  1. README.md +72 -3
README.md CHANGED
@@ -1,3 +1,72 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-classification
+ tags:
+ - url
+ - urls
+ - classification
+ ---
+ This is a very small BERT model, intended for later fine-tuning on URL analysis tasks.
+
+ It is an updated version of the earlier base model for URL analysis.
+
+ Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-base-v2
+
+ Model size: 3.69M params
+
+ Tensor type: F32
+
+ Test example:
+
+ from transformers import BertTokenizerFast, BertForMaskedLM, pipeline
+ import torch
+
+ # Use the GPU when one is available, otherwise fall back to the CPU.
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ print(f"Device in use: {device}")
+
+ model_name = "CrabInHoney/urlbert-tiny-base-v3"
+
+ tokenizer = BertTokenizerFast.from_pretrained(model_name)
+ model = BertForMaskedLM.from_pretrained(model_name)
+ model.to(device)
+
+ # Build a fill-mask pipeline on the same device as the model.
+ fill_mask = pipeline(
+     "fill-mask",
+     model=model,
+     tokenizer=tokenizer,
+     device=0 if torch.cuda.is_available() else -1
+ )
+
+ sentences = [
+     "http://example.[MASK]/"
+ ]
+
+ # Print the candidate tokens for the masked position with their scores.
+ for sentence in sentences:
+     print(f"\nInput sentence: {sentence}")
+     results = fill_mask(sentence)
+     for result in results:
+         token_str = result['token_str']
+         score = result['score']
+         print(f"Predicted token: {token_str}, probability: {score:.4f}")
+
+
+ Output:
+
+ Input sentence: http://example.[MASK]/
+
+ Predicted token: com, probability: 0.7018
+
+ Predicted token: org, probability: 0.1191
+
+ Predicted token: nl, probability: 0.0406
+
+ Predicted token: net, probability: 0.0294
+
+ Predicted token: ca, probability: 0.0190
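
The README added in this commit lists the model size as 3.69M params and the tensor type as F32. As a rough illustration of what those two figures imply, the sketch below estimates the memory needed for the raw weights alone. The constant `NUM_PARAMS` uses the rounded figure from the card, and the helper name `weight_memory_bytes` is hypothetical, introduced only for this example. Activations, tokenizer files, and framework overhead are not counted.

```python
# Rough weight-memory estimate from the figures reported in the model card.
# Assumption: NUM_PARAMS is the rounded "3.69M params" value, so the result
# is approximate. The helper name is hypothetical, not part of any library.

NUM_PARAMS = 3_690_000   # "3.69M params" as stated in the card
BYTES_PER_F32 = 4        # a 32-bit float occupies 4 bytes


def weight_memory_bytes(num_params: int, bytes_per_param: int) -> int:
    """Return the number of bytes needed to store the raw weights only."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    total = weight_memory_bytes(NUM_PARAMS, BYTES_PER_F32)
    # Report in decimal megabytes (1 MB = 1,000,000 bytes).
    print(f"Approximate weight memory: {total / 1e6:.2f} MB")
```

Under these assumptions the weights come to about 14.76 MB, which is consistent with the card's description of the model as very small.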