Commit 6e7e914 by prithivida (1 parent: 92abd3f)
Update README.md
README.md
CHANGED
@@ -144,7 +144,7 @@ for query, query_embedding in zip(queries, query_embeddings):
 # FAQS
 
 #### How can I reduce overall inference cost ?
-- You can
+- You can host these models without heavy torch dependency using the ONNX flavours of these models via [FlashRetrieve](https://github.com/PrithivirajDamodaran/FlashRetrieve) library.
 
 #### How do I reduce vector storage cost ?
 [Use Binary and Scalar Quantisation](https://huggingface.co/blog/embedding-quantization)
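
The storage FAQ in this hunk points to binary and scalar quantisation. As a rough illustration of what those two techniques do to a float32 embedding matrix, here is a minimal sketch in plain NumPy. It is not the API of the linked blog post or of any particular library. The helper names `binary_quantize` and `scalar_quantize`, the random stand-in data, and the 384-dimension size are all assumptions made for the example.

```python
import numpy as np


def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep only the sign of each dimension (1 bit)
    and pack 8 dimensions into one uint8 byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def scalar_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map each dimension's observed [min, max] range
    linearly onto the int8 range [-128, 127]."""
    mins = embeddings.min(axis=0)
    maxs = embeddings.max(axis=0)
    scale = (maxs - mins) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant dimensions
    quantized = np.round((embeddings - mins) / scale) - 128
    return np.clip(quantized, -128, 127).astype(np.int8)


# Random stand-in for real model output; 1000 vectors of 384 dims is an assumption.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 384)).astype(np.float32)

binary = binary_quantize(embeddings)
int8 = scalar_quantize(embeddings)

# float32 uses 4 bytes per dimension, int8 uses 1, binary uses 1/8.
print(embeddings.nbytes, int8.nbytes, binary.nbytes)
```

In this sketch the int8 form takes a quarter of the float32 storage and the packed binary form takes one thirty-second. Retrieval quality after quantisation depends on the model and data, so it should be measured rather than assumed.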