Commit 05a7597 (verified), committed by codesage
Parent: 69fff7d

Update README.md

Files changed (1): README.md (+10 -2)
README.md CHANGED
@@ -58,9 +58,11 @@ For this V2 model, we enhanced semantic search performance by improving the qual
 ### Training Data
 This pretrained checkpoint is the same as the one used by our V1 model ([codesage/codesage-small](https://huggingface.co/codesage/codesage-small)), which is trained on [The Stack](https://huggingface.co/datasets/bigcode/the-stack-dedup) data. The contrastive learning data are extracted from [The Stack V2](https://huggingface.co/datasets/bigcode/the-stack-v2). As with our V1 model, we support the following nine languages: c, c-sharp, go, java, javascript, typescript, php, python, ruby.
 
-### How to use
-This checkpoint consists of an encoder (130M model), which can be used to extract 1024-dimensional code embeddings. It can be easily loaded using the AutoModel functionality and employs the [Starcoder Tokenizer](https://arxiv.org/pdf/2305.06161.pdf).
+### How to Use
+This checkpoint consists of an encoder (356M model), which can be used to extract 1024-dimensional code embeddings.
 
+1. Accessing CodeSage via HuggingFace: it can be easily loaded using the AutoModel functionality and employs the [Starcoder Tokenizer](https://arxiv.org/pdf/2305.06161.pdf).
+
 ```
 from transformers import AutoModel, AutoTokenizer
 
@@ -77,6 +79,12 @@ inputs = tokenizer.encode("def print_hello_world():\tprint('Hello World!')", ret
 embedding = model(inputs)[0]
 ```
 
+2. Accessing CodeSage via SentenceTransformer
+```
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer("codesage/codesage-base-v2", trust_remote_code=True)
+```
+
 ### BibTeX entry and citation info
 ```
 @inproceedings{
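
Because the hunks above show only the beginning and end of the Transformers snippet, the loading step is not visible in the diff. The sketch below stitches the two access paths from the updated README into one runnable example; the `from_pretrained(..., trust_remote_code=True)` calls on the AutoModel path and the cosine-similarity comparison are illustrative assumptions, not lines from the committed file.

```
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

checkpoint = "codesage/codesage-base-v2"

# 1. Transformers path: load the encoder and extract embeddings for a code snippet.
#    trust_remote_code=True is assumed here, mirroring the SentenceTransformer line in the diff.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True)
inputs = tokenizer.encode("def print_hello_world():\tprint('Hello World!')", return_tensors="pt")
embedding = model(inputs)[0]  # hidden dimension is 1024
print(embedding.shape)

# 2. SentenceTransformer path: load the same checkpoint and compare two snippets.
st_model = SentenceTransformer(checkpoint, trust_remote_code=True)
embeddings = st_model.encode([
    "def print_hello_world():\tprint('Hello World!')",
    "def add(a, b):\treturn a + b",
])
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity between the two snippets
```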