nemanjaPetrovic/legal-jerteh-125-sbert

Sentence Similarity

sentence-transformers

text-embeddings-inference

Inference Endpoints


nemanjaPetrovic committed on Jun 1, 2024

Commit 70bdb77 · verified · 1 Parent(s): 5c03633

Update README.md

Files changed (1)

README.md +78 -2


@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
 language:
 - sr
 library_name: sentence-transformers
@@ -8,4 +8,80 @@ tags:
 - Legal
 - SBERT
 - Jerteh
----

 ---
+license: mit
 language:
 - sr
 library_name: sentence-transformers
 - Legal
 - SBERT
 - Jerteh
+---
+## Semantic Search of Legal Data Using SBERT
+This repository contains a proof-of-concept model for semantic search of legal data, based on Sentence-BERT (SBERT) and fine-tuned using triplets. The model is designed to provide efficient and accurate semantic search capabilities for legal documents.
+### Model Overview
+-   **Base Model**: Jerteh-125
+-   **Fine-tuning Technique**: Triplet loss
+-   **Purpose**: To enable semantic search within legal data
+### Installation
+To use the model, you need Python 3.6 or higher. Then install the dependencies (scikit-learn is needed only for the quick similarity test in the Usage section):
+`pip install transformers`
+`pip install sentence-transformers`
+`pip install scikit-learn`
+### Usage
+Here's how you can use the model for semantic search:
+1.  **Load the Model**
+    `from sentence_transformers import SentenceTransformer`
+    `model = SentenceTransformer('nemanjaPetrovic/legal-jerteh-125-sbert')`
+2.  **Encode Sentences**
+`sentences = ["Sankcije se propisuju u granicama zakonom utvrđenog minimuma i maksimuma.", "Vrste krivičnih sankcija određuju se samo krivičnim zakonom."]`
+`sentence_embeddings = model.encode(sentences)`
+3.  **Perform Semantic Search**
+To perform a semantic search, encode both your query and the documents you want to search through, then use cosine similarity to find the most relevant documents. **You should use a vector database for this**, but for a quick test you can try the code below:
+`from sklearn.metrics.pairwise import cosine_similarity`
+`import numpy as np`
+`query = "Objasni mi pojam sankcija."`
+`query_embedding = model.encode([query])`
+`cosine_similarities = cosine_similarity(query_embedding, sentence_embeddings)`
+`most_similar_idx = np.argmax(cosine_similarities)`
+`most_similar_document = sentences[most_similar_idx]`
+`print(f"The most similar document to the query is: {most_similar_document}")`
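A single `argmax` returns only one hit. The following is a minimal sketch of ranking the top k documents by cosine similarity using NumPy alone. The helper name `top_k_cosine` and the toy 3-dimensional vectors are illustrative assumptions that stand in for the arrays returned by `model.encode`; it is meant for quick experiments, not as a replacement for a vector database.

```python
import numpy as np

def top_k_cosine(query_vec, doc_matrix, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    k = min(k, len(scores))
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors (assumption): placeholders for real sentence embeddings.
toy_docs = np.array([[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]])
toy_query = np.array([0.9, 0.1, 0.0])
ranked = top_k_cosine(toy_query, toy_docs, k=2)
print(ranked)
```

With real data, `toy_query` would be replaced by `model.encode(query)` and `toy_docs` by `sentence_embeddings`; the returned indices can then be used to look up entries in `sentences`.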
+### Fine-tuning Details
+The model was fine-tuned using triplet loss, a common technique for training embedding models to understand semantic similarity. The fine-tuning dataset consisted of triplets (anchor, positive, negative) to teach the model to distinguish between similar and dissimilar legal documents.
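To make the triplet objective concrete, here is a small sketch of the standard formulation, loss = max(0, d(anchor, positive) - d(anchor, negative) + margin). The README does not state which distance function or margin was used for this model, so the Euclidean distance and the margin of 1.0 below are assumptions chosen only for illustration, and the 2-dimensional vectors are made up.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss; margin=1.0 is an assumed, illustrative value."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Made-up 2-D embeddings: the positive lies close to the anchor, the negative far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Well-separated triplet: the negative is already farther than the positive by more than the margin.
easy = triplet_loss(anchor, positive, negative)
# Swapping the roles produces a violated triplet with a positive loss.
hard = triplet_loss(anchor, negative, positive)
print(easy, hard)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort concentrates on triplets that still violate that condition.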
+### License
+This project is licensed under the [MIT License](https://en.wikipedia.org/wiki/MIT_License).
+### Acknowledgments
+I would like to acknowledge Mihailo Skoric, the author of the Jerteh-125 model, and the creators of Sentence-BERT for their foundational work, which made this project possible.
+### Contact
+For any questions or issues, please contact [email protected].