Push model using huggingface_hub.

Files changed (13)

1_Pooling/config.json +10 -0
README.md +275 -0
config.json +29 -0
config_sentence_transformers.json +10 -0
config_setfit.json +4 -0
model.safetensors +3 -0
model_head.pkl +3 -0
modules.json +14 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +66 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
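
With only `pooling_mode_mean_tokens` enabled, the sentence embedding is the average of the token embeddings, with padding positions excluded through the attention mask. The following is a minimal pure-Python sketch of that idea, not the sentence-transformers implementation; `mean_pool` and the toy 2-dimensional vectors are illustrative (the real width is 768).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask value is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    kept = max(kept, 1)
    return [total / kept for total in summed]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```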

README.md ADDED

	@@ -0,0 +1,275 @@

+---
+base_model: mini1013/master_domain
+library_name: setfit
+metrics:
+- metric
+pipeline_tag: text-classification
+tags:
+- setfit
+- sentence-transformers
+- text-classification
+- generated_from_setfit_trainer
+widget:
+- text: 참존 톤업핏 블랙라벨 라이트 KF94 새부리형 마스크 20매 바닐라 베이지(10매/중형)_오픈 화이트(10매/대형) 주식회사 참존
+- text: 털보네액상공장 전자담배액상 사이트 전담액상 입호흡 폐호흡 재료 무니코틴 피나콜라다 100ml 입호흡_베이스( 무니코틴 )_1. 피나콜라다
+    (주)커넥티드코리아
+- text: voopoo 부푸 브이메이트 민트블루 견고한 내구성 전자담배 판매 TOP1 입호흡 1. 부푸 브이메이트 E (New 핑크마블) 마윈존
+- text: '세운 석션카테타 (Suction Catheter) - LATEX #5 10FR (Sterile, no valve, 1hole) 단위:1개  (주)엠디오씨'
+- text: 붐바 일회용 전자담배 편의점 전담 4ML 애플아이스 애플아이스 원더베이프(서초)
+inference: true
+model-index:
+- name: SetFit with mini1013/master_domain
+  results:
+  - task:
+      type: text-classification
+      name: Text Classification
+    dataset:
+      name: Unknown
+      type: unknown
+      split: test
+    metrics:
+    - type: metric
+      value: 0.9428033187702914
+      name: Metric
+---
+# SetFit with mini1013/master_domain
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [mini1013/master_domain](https://huggingface.co/mini1013/master_domain) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+The model has been trained using an efficient few-shot learning technique that involves:
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
+## Model Details
+### Model Description
+- **Model Type:** SetFit
+- **Sentence Transformer body:** [mini1013/master_domain](https://huggingface.co/mini1013/master_domain)
+- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 17 classes
+<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+### Model Labels
+| Label | Examples                                                                                                                                                                                                                                       |
+|:------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 4.0   | <ul><li>'샤오미 전자 뜸 가정용 무연 뜸뜨기 허리 온뜸 마사지 힐링 소형 한의원 전신 3. 3세대 뜸 상자 30정 패스트커머스'</li><li>'기황 백살 힐링 무연뜸 24개입 1갑 간편뜸 무연 쑥뜸  (주)글로벌에스엠'</li><li>'청훈 무연 왕쑥봉 30개입 강 황토코팅 쑥봉  주식회사 우주헬스케어'</li></ul>                                                      |
+| 9.0   | <ul><li>'동방 스프링침 100쌈 한방침 멸균침 한의원침 일회용침 0.25x40 (20pcs) 새한메디칼'</li><li>'국내생산 백살 압봉 은색 1호 100매입 2개 2.백살압봉 23mm 30매입 x 2개 주식회사 케이솔루션컴퍼니'</li><li>'고려수지침학회 서암 출혈 침관 사혈기 MinSellAmount AKmall'</li></ul>                                           |
+| 14.0  | <ul><li>'올지 입벌림방지밴드 입막음 테이프 고치는법 수면 구강호흡 기구 스트랩 무호흡_MC 1+1 리뷰 이벤트 참여(네오프렌L+메쉬M) 멸치쇼핑'</li><li>'닥터아망 이지 브레스 아로마 밤 영유아 베이 코막힘 비염 코뚫는 스틱 1개 (3+1 이벤트) 이지브레스 아로마밤 4개 올릿'</li><li>'수면 입벌림방지 입호흡방지 얼굴 밴드 마스크 코 용품 코건강 기  소보로샵'</li></ul>             |
+| 12.0  | <ul><li>'젤로맥스 팟 개선팟 1.0옴 1팩(3개) 전자담배액상 전담 공팟 코일 카트리지  달콤베이프'</li><li>'긱베이프 제우스 서브옴 탱크 코일 0.2옴 5개 레전드 2 코일  용가리전자담배'</li><li>'전자담배액상 사이트 전담액상 돌핀액상 무니코틴 툰드라알로에  K-액상'</li></ul>                                                                 |
+| 5.0   | <ul><li>'마스클립 마스크 안쪽까지 보호하는 마스크스트랩 화이트그레이 주식회사 아이리스'</li><li>'힘내세요 마스크 필터 50매 / 국내제작 SMMS원단 교체형 소프런필터 50매(패치형) 주식회사 소프런'</li><li>'마스클립 예쁜 마스크 스트랩 명품 마스크 끈 걸이 고리 목걸이 줄 액세서리 특허출원 차콜블랙 팝코즈(POPCOZ)'</li></ul>                                   |
+| 8.0   | <ul><li>'st3 국산 목초수액시트 30포 발패치 발파스ㄴ나이스팩 발파스 발패치 수액패치 발마사지 피로한발  빙고라이프'</li><li>'휴족시간 쿨링시트 6매  코트하우스'</li><li>'라이온코리아 휴족시간 쿨링시트 6매 3개입  건후 주식회사'</li></ul>                                                                                      |
+| 16.0  | <ul><li>'왁싱워머기 제모용 고급 업소용 왁싱워머 셀프 왁스 로즈 왁스 1개(450g) 헬로구쯔'</li><li>'뉴셀 파라핀 왁스 피치 6개 세트 ZP508P6 (6종 택 1) 손 손목 발 발목 어머니 아버지 1_ZP508L6 (라벤더6개입) 제스파'</li><li>'왁싱워머기 제모용 고급 업소용 왁싱워머 셀프 왁스 쟈스민 핸드 왁스 1개 (450g) 헬로구쯔'</li></ul>                      |
+| 6.0   | <ul><li>'찐마스크 내추럴키스 KFAD 새부리형 화이트 대형50매  열정청년'</li><li>'브이스타KF94 100매 블랙/화이트/컬러/대형/중형/어린이용 새부리형 일회용마스크 01.시크블랙_중형 100매 GntClean'</li><li>'고르고 바른 덴탈마스크 일회용 마스크 대형 컬러 마스크 50+50매 대형 코랄 50매+스트랩_대형 진베이지 50매+스트랩 주식회사 씨투클로버'</li></ul>            |
+| 13.0  | <ul><li>'가정주부 스트레스해소 어깨 발안마 지압기 5개 셀프안마 마사지용품 색상선택_색상임의배송 쇼킴'</li><li>'발지압기 순수편백 굴곡주판 다용도기 발목지압 굴곡형마사지기  기븐에이블'</li><li>'손가락마사지 롤링마사지기 손가락 롤러 3중 지압 손가락 마사지 롤러 제이투씨엘'</li></ul>                                                                |
+| 3.0   | <ul><li>'이지디텍트 대장검사지 1개 분변잠혈검사 키트  주식회사 월드비젼팜'</li><li>'메디위 셀프 이지디텍트 대장검사지/간편2분/초기 대장암 자가진단/용종 검사  주식회사 소연'</li><li>'메디퓨처 이지디텍트 대장진단키트 셀프대장검사 분변 잠혈 검사 대장자가검사지 이지디텍트 1개 고메디칼'</li></ul>                                                        |
+| 1.0   | <ul><li>'정전기방지팔찌 베아르 블랙 청주컴퍼니'</li><li>'게르마늄 팔찌 커플 남자 부모님 선물 실버 남성 21CM 콤비(실버로즈골드)_여성용_15 CM 제이디에스켐'</li><li>'페이버 R300B 스포츠팔찌 3줄타입 핸드메이드 야구 용품 남자 여자 R300B-08_M 버들버들RYU'</li></ul>                                                             |
+| 10.0  | <ul><li>'원데이마스크 일회용마스크 위생 투명 식당 조리용 원데이마스크(30매입) 해피콤마'</li><li>'원데이마스크 30매 일회용 위생 투명 식당 원데이마스크(30매입) 만월잡화점'</li><li>'원데이마스크 투명 위생 마스크 30개 원데이마스크(30매입) 스카이플라'</li></ul>                                                                        |
+| 2.0   | <ul><li>'비타 스틱 대용량 니코틴없는 약국 비타민 담배 금연 보조제 파이프 금연초 스타터키트 멘솔  천시원'</li><li>'금단호흡기 금연도움 호흡기 숨편기  제이와이플래닛'</li><li>'금연 스틱 니코틴 없는 금연초 금연 파이프 레몬향  스토어헤이'</li></ul>                                                                                  |
+| 0.0   | <ul><li>'스포츠 화이텐 목걸이 야구 건강 V타입 남자 운동 화이텐 V타입목걸이 메탈릭 레드 원리빙'</li><li>'귀파개 막대 귀청소 깃털 도구 이어 클리닝 브러시 4피스귀파개세트(보관함포함) 엔케이몰'</li><li>'엘그 (erg) 시냅스 목걸이 SPACY 스페이시 스포츠 목걸이 (건 메타 × 그랑 블루) 사이즈 (41cm)  Shouzan'</li></ul>                              |
+| 15.0  | <ul><li>'나잘후레쉬 500ml 전용 코세척 분말(4.5g) 100포 x 2박스 / 코세척기 미포함  나잘후레쉬공식스토어'</li><li>'비원 분말 코 비, 도움 원 대용량 알칼리성 천일염 소금 분말 다용도 분말 비원 350g 주식회사 슈나'</li><li>'나잘후레쉬 식염분말 2.7g x 60포 x 3박스  멸치쇼핑'</li></ul>                                              |
+| 11.0  | <ul><li>'헬베이프 젤로 블랙로즈 입호흡 csv 액상 전자담배 전담 기계 기기 블랙 로즈(new) 바다웹'</li><li>'전자담배 무화기 폐호흡 교체 아크릴 천둥 패널 커버 Cthulhu AIO 박스 모드 액세서리 01 WHITE 특별한하루직구'</li><li>'오리지널 스팀 갈망 미니 로봇 튜브 모드 단일 18650 배터리 스레드 35V/ 42V 전자담배 기화기 Vape 01 Black 투코물류'</li></ul> |
+| 7.0   | <ul><li>'3M 귀마개 1100 소음 방지 폼타입 귀마개 25쌍 스타일인센글로벌로지스'</li><li>'실리콘귀덮개 염색용귀보호커버 파마용귀마개  리테일파크'</li><li>'예스이어 소음방지 이어플러그 수면귀마개 층간 차단 수영 NS4000 FI3000 TI5000 [제품2]NS4000_블루 예스이어본사'</li></ul>                                                      |
+## Evaluation
+### Metrics
+| Label   | Metric |
+|:--------|:-------|
+| **all** | 0.9428 |
+## Uses
+### Direct Use for Inference
+First install the SetFit library:
+```bash
+pip install setfit
+```
+Then you can load this model and run inference.
+```python
+from setfit import SetFitModel
+# Download from the 🤗 Hub
+model = SetFitModel.from_pretrained("mini1013/master_cate_lh0")
+# Run inference
+preds = model("붐바 일회용 전자담배 편의점 전담 4ML 애플아이스 애플아이스 원더베이프(서초)")
+```
+<!--
+### Downstream Use
+*List how someone could finetune this model on their own dataset.*
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Set Metrics
+| Training set | Min | Median  | Max |
+|:-------------|:----|:--------|:----|
+| Word count   | 3   | 10.5920 | 31  |
+| Label | Training Sample Count |
+|:------|:----------------------|
+| 0.0   | 50                    |
+| 1.0   | 50                    |
+| 2.0   | 25                    |
+| 3.0   | 50                    |
+| 4.0   | 50                    |
+| 5.0   | 50                    |
+| 6.0   | 50                    |
+| 7.0   | 50                    |
+| 8.0   | 50                    |
+| 9.0   | 50                    |
+| 10.0  | 28                    |
+| 11.0  | 50                    |
+| 12.0  | 24                    |
+| 13.0  | 50                    |
+| 14.0  | 50                    |
+| 15.0  | 50                    |
+| 16.0  | 50                    |
+### Training Hyperparameters
+- batch_size: (512, 512)
+- num_epochs: (20, 20)
+- max_steps: -1
+- sampling_strategy: oversampling
+- num_iterations: 40
+- body_learning_rate: (2e-05, 2e-05)
+- head_learning_rate: 2e-05
+- loss: CosineSimilarityLoss
+- distance_metric: cosine_distance
+- margin: 0.25
+- end_to_end: False
+- use_amp: False
+- warmup_proportion: 0.1
+- seed: 42
+- eval_max_steps: -1
+- load_best_model_at_end: False
+### Training Results
+| Epoch   | Step | Training Loss | Validation Loss |
+|:-------:|:----:|:-------------:|:---------------:|
+| 0.0082  | 1    | 0.4317        | -               |
+| 0.4098  | 50   | 0.3501        | -               |
+| 0.8197  | 100  | 0.207         | -               |
+| 1.2295  | 150  | 0.1065        | -               |
+| 1.6393  | 200  | 0.0426        | -               |
+| 2.0492  | 250  | 0.0299        | -               |
+| 2.4590  | 300  | 0.0323        | -               |
+| 2.8689  | 350  | 0.033         | -               |
+| 3.2787  | 400  | 0.0211        | -               |
+| 3.6885  | 450  | 0.0383        | -               |
+| 4.0984  | 500  | 0.0239        | -               |
+| 4.5082  | 550  | 0.0137        | -               |
+| 4.9180  | 600  | 0.0099        | -               |
+| 5.3279  | 650  | 0.0057        | -               |
+| 5.7377  | 700  | 0.0041        | -               |
+| 6.1475  | 750  | 0.0045        | -               |
+| 6.5574  | 800  | 0.0002        | -               |
+| 6.9672  | 850  | 0.0059        | -               |
+| 7.3770  | 900  | 0.0059        | -               |
+| 7.7869  | 950  | 0.0001        | -               |
+| 8.1967  | 1000 | 0.004         | -               |
+| 8.6066  | 1050 | 0.0039        | -               |
+| 9.0164  | 1100 | 0.0003        | -               |
+| 9.4262  | 1150 | 0.0002        | -               |
+| 9.8361  | 1200 | 0.0001        | -               |
+| 10.2459 | 1250 | 0.0001        | -               |
+| 10.6557 | 1300 | 0.0001        | -               |
+| 11.0656 | 1350 | 0.0001        | -               |
+| 11.4754 | 1400 | 0.0001        | -               |
+| 11.8852 | 1450 | 0.0001        | -               |
+| 12.2951 | 1500 | 0.0001        | -               |
+| 12.7049 | 1550 | 0.0001        | -               |
+| 13.1148 | 1600 | 0.0001        | -               |
+| 13.5246 | 1650 | 0.0001        | -               |
+| 13.9344 | 1700 | 0.0001        | -               |
+| 14.3443 | 1750 | 0.0001        | -               |
+| 14.7541 | 1800 | 0.0           | -               |
+| 15.1639 | 1850 | 0.0001        | -               |
+| 15.5738 | 1900 | 0.0001        | -               |
+| 15.9836 | 1950 | 0.0001        | -               |
+| 16.3934 | 2000 | 0.0           | -               |
+| 16.8033 | 2050 | 0.0001        | -               |
+| 17.2131 | 2100 | 0.0           | -               |
+| 17.6230 | 2150 | 0.0001        | -               |
+| 18.0328 | 2200 | 0.0           | -               |
+| 18.4426 | 2250 | 0.0           | -               |
+| 18.8525 | 2300 | 0.0           | -               |
+| 19.2623 | 2350 | 0.0001        | -               |
+| 19.6721 | 2400 | 0.0           | -               |
+### Framework Versions
+- Python: 3.10.12
+- SetFit: 1.1.0.dev0
+- Sentence Transformers: 3.1.1
+- Transformers: 4.46.1
+- PyTorch: 2.4.0+cu121
+- Datasets: 2.20.0
+- Tokenizers: 0.20.0
+## Citation
+### BibTeX
+```bibtex
+@article{https://doi.org/10.48550/arxiv.2209.11055,
+    doi = {10.48550/ARXIV.2209.11055},
+    url = {https://arxiv.org/abs/2209.11055},
+    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+    title = {Efficient Few-Shot Learning Without Prompts},
+    publisher = {arXiv},
+    year = {2022},
+    copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
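
The epoch fractions in the Training Results table can be reproduced from the sample counts and hyperparameters in the README. The sketch below assumes that the contrastive sampler draws one positive and one negative pair per sample per iteration (so `2 * num_iterations` pairs per sample); that assumption is not stated in the README, but it is consistent with the logged values.

```python
import math

# Per-label training sample counts (labels 0.0 through 16.0) from the README table.
sample_counts = [50, 50, 25, 50, 50, 50, 50, 50, 50, 50, 28, 50, 24, 50, 50, 50, 50]
num_iterations = 40  # from the hyperparameter list
batch_size = 512     # embedding-phase batch size
num_epochs = 20

total_samples = sum(sample_counts)
# Assumption: one positive and one negative pair per sample per iteration.
total_pairs = total_samples * num_iterations * 2
steps_per_epoch = math.ceil(total_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_samples, total_pairs, steps_per_epoch, total_steps)
# Epoch fraction at step 50, to compare with the second row of the results table.
print(round(50 / steps_per_epoch, 4))
```

Under that assumption, 777 samples yield 62,160 pairs and 122 steps per epoch, so step 50 falls at epoch 0.4098 and step 2400 at epoch 19.6721, matching the table.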

config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "_name_or_path": "mini1013/master_item_lh",
+  "architectures": [
+    "RobertaModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "classifier_dropout": null,
+  "eos_token_id": 2,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "roberta",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "position_embedding_type": "absolute",
+  "tokenizer_class": "BertTokenizer",
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.1",
+  "type_vocab_size": 1,
+  "use_cache": true,
+  "vocab_size": 32000
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.46.1",
+    "pytorch": "2.4.0+cu121"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}

config_setfit.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "labels": null,
+  "normalize_embeddings": false
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:de2dcfc34d4da6eb93258663d1d01b11ac4d24a5a032fbaa4ab9d18b56e53299
+size 442494816
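
The file size in this LFS pointer can be sanity-checked against the dimensions in `config.json`. The sketch below counts parameters for a standard RoBERTa/BERT-style encoder in float32; it assumes the checkpoint stores the pooler layer and no other extra tensors, which the diff does not confirm.

```python
# Values copied from config.json.
vocab_size, hidden, intermediate = 32000, 768, 3072
max_positions, type_vocab, layers = 514, 1, 12


def linear(n_in, n_out):
    """Parameter count of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
attention = 4 * linear(hidden, hidden) + layer_norm  # Q, K, V, output projection
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
pooler = linear(hidden, hidden)  # assumption: pooler weights are stored

total_params = embeddings + layers * (attention + feed_forward) + pooler
payload_bytes = total_params * 4  # torch_dtype is float32, 4 bytes per value
file_size = 442_494_816           # from the LFS pointer
overhead_bytes = file_size - payload_bytes

print(total_params, payload_bytes, overhead_bytes)
```

This gives 110,618,112 parameters and 442,472,448 bytes of tensor data, leaving 22,368 bytes that are presumably the safetensors header and metadata.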

model_head.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f82c092584416eb67b783163a4186ecdcbe5f9fb9171159caee8adb7bb0311f
+size 105535

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
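
`modules.json` lists the pipeline stages in order: each entry names a module class and the subfolder holding its configuration, with an empty `path` meaning the repository root. The sketch below only illustrates how that file can be read; it is not the sentence-transformers loader, and `model_root` is a hypothetical local directory.

```python
import json
from pathlib import PurePosixPath

# Contents of modules.json as shown in the diff.
modules_json = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""

model_root = PurePosixPath("model_root")  # hypothetical checkout location

pipeline = []
for module in sorted(json.loads(modules_json), key=lambda m: m["idx"]):
    class_name = module["type"].rsplit(".", 1)[-1]
    # An empty path resolves to the root itself.
    folder = model_root / module["path"]
    pipeline.append((class_name, str(folder)))

for class_name, folder in pipeline:
    print(f"{class_name}: {folder}")
```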

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,66 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "[CLS]",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": false,
+  "eos_token": "[SEP]",
+  "mask_token": "[MASK]",
+  "max_length": 512,
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
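
The model is a RoBERTa architecture paired with a `BertTokenizer`, so it is worth confirming that the special-token ids in `config.json` point at the tokens the tokenizer assigns to those roles. A small consistency check over the values shown in the diffs (the dictionaries are hand-copied here, not loaded from the repository):

```python
# id -> token, from "added_tokens_decoder" in tokenizer_config.json.
added_tokens = {0: "[CLS]", 1: "[PAD]", 2: "[SEP]", 3: "[UNK]", 4: "[MASK]"}

# role -> token, from tokenizer_config.json.
tokenizer_roles = {"bos_token": "[CLS]", "pad_token": "[PAD]", "eos_token": "[SEP]"}

# role -> id, from config.json.
model_ids = {"bos_token_id": 0, "pad_token_id": 1, "eos_token_id": 2}

mismatches = []
for role, token in tokenizer_roles.items():
    token_id = model_ids[f"{role}_id"]
    # The id the model config uses must decode to the tokenizer's token for that role.
    if added_tokens.get(token_id) != token:
        mismatches.append((role, token_id, token))

print("consistent" if not mismatches else mismatches)
```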

vocab.txt ADDED

The diff for this file is too large to render.