---
base_model: Alibaba-NLP/gte-base-en-v1.5
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: Tech Start-up Revolutionizes Water Purification SAN FRANCISCO - AquaTech, a Silicon Valley start-up, unveiled its groundbreaking water purification system today. Using advanced nanotechnology, the device can purify contaminated water in seconds, potentially bringing safe drinking water to millions. "This could be a game-changer for global health," said WHO representative Dr. Amina Osei. Field trials are set to begin next month.
- text: Whistleblower Exposes Massive Fraud in Medicare Billing WASHINGTON - A former employee of MedTech Solutions, a major medical equipment supplier, has come forward with explosive allegations of systematic fraud in Medicare billing practices. The whistleblower, whose identity remains protected, claims the company routinely inflated prices and billed for unnecessary equipment, defrauding the government of an estimated $1.2 billion over five years. Documents obtained by this newspaper appear to corroborate these claims, showing discrepancies between actual costs and billed amounts for common medical devices such as wheelchairs and oxygen tanks. "This isn't just about money," said Senator Lisa Kline, chair of the Senate Health Committee. "This kind of fraud directly impacts patient care and drives up healthcare costs for everyone." The Department of Justice has announced a full investigation into MedTech Solutions and its parent company, HealthCorp International. Industry experts suggest this could be just the tip of the iceberg, with similar practices potentially widespread across the medical supply sector. MedTech Solutions has denied all allegations and vowed to cooperate fully with investigators.
- text: Nursing Home Chain Under Fire for Neglect and Fraud CHICAGO - A damning report released today by state health inspectors reveals a pattern of severe neglect and fraudulent practices across Sunset Years, one of the nation's largest nursing home chains. Investigators found widespread understaffing, with some facilities staffed at dangerously low levels while still billing Medicare and Medicaid for full care. In several instances, residents were found to be malnourished or suffering from untreated bedsores, despite records indicating proper care. "It's heartbreaking," said Maria Rodriguez, whose mother was a resident at one of the chain's Chicago facilities. "We trusted them with our loved ones, and they betrayed that trust for profit." Sunset Years CEO Robert Thompson issued a statement claiming the issues were isolated incidents and not reflective of the company's overall standards. However, multiple state attorneys general have announced plans to pursue legal action against the chain
- text: Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a five-year high today as severe drought continues to ravage Brazil's coffee-growing regions. Experts warn consumers may see significant price increases in coming months.
- text: 'BREAKING: Hospital CEO Arrested in Kickback Scheme Federal agents arrested Mercy General Hospital CEO John Smith today on charges of accepting kickbacks for preferential treatment of patients. Prosecutors allege Smith pocketed over $2 million, compromising patient care. Smith''s lawyer denies all accusations.'
inference: true
model-index:
- name: SetFit with Alibaba-NLP/gte-base-en-v1.5
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.8181818181818182
      name: Accuracy
---

# SetFit with Alibaba-NLP/gte-base-en-v1.5

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5)
- **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
- **Maximum Sequence Length:** 8192 tokens
- **Number of Classes:** 2 classes

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 1     |          |
| 0     |          |

## Evaluation

### Metrics
| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.8182   |
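The accuracy above was measured on a held-out test split. As a minimal sketch of how a comparable check could be run locally (the `test_texts` and `test_labels` values below are placeholders, not data shipped with this model):

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("twright8/news_cats")

# Placeholder evaluation data; substitute a real labelled test split.
test_texts = ["<news article of class 1>", "<news article of class 0>"]
test_labels = [1, 0]

preds = model.predict(test_texts)

# Accuracy is the fraction of predictions that match the reference labels.
accuracy = sum(int(pred) == label for pred, label in zip(preds, test_labels)) / len(test_labels)
print(f"accuracy = {accuracy:.4f}")
```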
## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("twright8/news_cats")
# Run inference
preds = model("Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a five-year high today as severe drought continues to ravage Brazil's coffee-growing regions. Experts warn consumers may see significant price increases in coming months.")
```

## Training Details

### Training Set Metrics
| Training set | Min | Median   | Max |
|:-------------|:----|:---------|:----|
| Word count   | 55  | 153.8462 | 290 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 13                    |
| 1     | 13                    |

### Training Hyperparameters
- batch_size: (8, 1)
- num_epochs: (3, 17)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (9.629116538858926e-05, 2.651259436793277e-05)
- head_learning_rate: 0.02145586669240117
- loss: CoSENTLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: True
- warmup_proportion: 0.1
- max_length: 512
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True

### Training Results
| Epoch      | Step   | Training Loss | Validation Loss |
|:----------:|:------:|:-------------:|:---------------:|
| 0.0217     | 1      | 1.8133        | -               |
| **0.4348** | **20** | **0.0054**    | **1.6363**      |
| 0.8696     | 40     | 0.0           | 4.9011          |
| 1.3043     | 60     | 0.0           | 7.0885          |
| 1.7391     | 80     | 0.0           | 6.2756          |
| 2.1739     | 100    | 0.0           | 6.2417          |
| 2.6087     | 120    | 0.0           | 6.4769          |

* The bold row denotes the saved checkpoint.
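The hyperparameters listed above mirror SetFit's `TrainingArguments`. The sketch below shows roughly how a comparable run could be set up; it is not the exact script used to train this model, the tiny `train_dataset` is a placeholder for the real 13-examples-per-class training set, and only a subset of the listed values is spelled out.

```python
from datasets import Dataset
from sentence_transformers.losses import CoSENTLoss
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder few-shot data; the real training set is not bundled with this card.
train_dataset = Dataset.from_dict({
    "text": ["<news article of class 1>", "<news article of class 0>"],
    "label": [1, 0],
})

# Start from the gte-base-en-v1.5 body with a differentiable SetFitHead for 2 classes.
# Note: loading gte-base-en-v1.5 may require trusting its remote code in your environment.
model = SetFitModel.from_pretrained(
    "Alibaba-NLP/gte-base-en-v1.5",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

args = TrainingArguments(
    batch_size=(8, 1),  # (embedding fine-tuning, classifier head)
    num_epochs=(3, 17),
    body_learning_rate=(9.629116538858926e-05, 2.651259436793277e-05),
    head_learning_rate=0.02145586669240117,
    loss=CoSENTLoss,
    sampling_strategy="oversampling",
    end_to_end=True,
    use_amp=True,  # requires a CUDA device
    warmup_proportion=0.1,
    max_length=512,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```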
### Framework Versions
- Python: 3.10.13
- SetFit: 1.0.3
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.3.0+cu121
- Datasets: 2.20.0
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```