pszemraj committed on
Commit
c40464d
1 Parent(s): 13ebdae

Update README.md

Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -4,7 +4,6 @@ datasets:
  - BEE-spoke-data/bees-internal
  language:
  - en
- library_name: transformers
  ---
 
  # BeeTokenizer
@@ -58,3 +57,20 @@ print(f"Tokens:\n\t{output.input_ids}")
  # Offsets
  offsets = output['offset_mapping']
  print(f"Offsets: {offsets}")
+ ```
+
+ This should result in the following (_Nov 2023 version_):
+
+ <pre>&gt;&gt;&gt; print(f&quot;Test string: {test_string}&quot;)
+ Test string: When dealing with Varroa destructor mites, it&apos;s crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination.
+ &gt;&gt;&gt;
+ &gt;&gt;&gt; # Tokens
+ &gt;&gt;&gt; tokens = tokenizer.convert_ids_to_tokens(output[&apos;input_ids&apos;])
+ &gt;&gt;&gt; print(f&quot;Tokens: {tokens}&quot;)
+ Tokens: [&apos;▁When&apos;, &apos;▁dealing&apos;, &apos;▁with&apos;, &apos;▁Varroa&apos;, &apos;▁destructor&apos;, &apos;▁mites,&apos;, &quot;▁it&apos;s&quot;, &apos;▁cru&apos;, &apos;cial&apos;, &apos;▁to&apos;, &apos;▁administer&apos;, &apos;▁the&apos;, &apos;▁right&apos;, &apos;▁acar&apos;, &apos;icides&apos;, &apos;▁during&apos;, &apos;▁the&apos;, &apos;▁late&apos;, &apos;▁autumn&apos;, &apos;▁months,&apos;, &apos;▁but&apos;, &apos;▁only&apos;, &apos;▁after&apos;, &apos;▁ensuring&apos;, &apos;▁that&apos;, &apos;▁the&apos;, &apos;▁worker&apos;, &apos;▁bee&apos;, &apos;▁population&apos;, &apos;▁is&apos;, &apos;▁free&apos;, &apos;▁from&apos;, &apos;▁pesticide&apos;, &apos;▁contamination&apos;, &apos;.&apos;]
+ &gt;&gt;&gt;
+ &gt;&gt;&gt; # Offsets
+ &gt;&gt;&gt; offsets = output[&apos;offset_mapping&apos;]
+ &gt;&gt;&gt; print(f&quot;Offsets: {offsets}&quot;)
+ Offsets: [(0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42), (42, 47), (47, 51), (51, 55), (55, 58), (58, 69), (69, 73), (73, 79), (79, 84), (84, 90), (90, 97), (97, 101), (101, 106), (106, 113), (113, 121), (121, 125), (125, 130), (130, 136), (136, 145), (145, 150), (150, 154), (154, 161), (161, 165), (165, 176), (176, 179), (179, 184), (184, 189), (189, 199), (199, 213), (213, 214)]
+ </pre>
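
The offsets in the added output can be read as character spans into the test string. The following is a minimal sketch of that relationship, assuming each pair is a half-open `(start, end)` range over the original text, which is consistent with the values shown (the spans are contiguous and each one includes the token's leading space). The string and offsets are copied from the README output; `pieces` is an illustrative name, and no tokenizer is loaded here.

```python
# Test string and offsets copied from the README output (Nov 2023 version).
test_string = (
    "When dealing with Varroa destructor mites, it's crucial to administer "
    "the right acaricides during the late autumn months, but only after "
    "ensuring that the worker bee population is free from pesticide "
    "contamination."
)
offsets = [
    (0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42), (42, 47),
    (47, 51), (51, 55), (55, 58), (58, 69), (69, 73), (73, 79), (79, 84),
    (84, 90), (90, 97), (97, 101), (101, 106), (106, 113), (113, 121),
    (121, 125), (125, 130), (130, 136), (136, 145), (145, 150), (150, 154),
    (154, 161), (161, 165), (165, 176), (176, 179), (179, 184), (184, 189),
    (189, 199), (199, 213), (213, 214),
]

# Assumption: each (start, end) pair is a half-open character span,
# so slicing the original text recovers the surface form of each token.
pieces = [test_string[start:end] for start, end in offsets]

print(pieces[:4])   # ['When', ' dealing', ' with', ' Varroa']
print(pieces[7:9])  # [' cru', 'cial'] (the two sub-word pieces of "crucial")

# The spans are contiguous, so joining the slices rebuilds the input.
print("".join(pieces) == test_string)  # True
```

Each slice matches the corresponding entry in the `Tokens` list once the `▁` marker is read as a space, which is a quick way to sanity-check an offset mapping against the text it was computed from.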