---
base_model: Alibaba-NLP/gte-base-en-v1.5
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: Tech Start-up Revolutionizes Water Purification SAN FRANCISCO - AquaTech,
a Silicon Valley start-up, unveiled its groundbreaking water purification system
today. Using advanced nanotechnology, the device can purify contaminated water
in seconds, potentially bringing safe drinking water to millions. "This could
be a game-changer for global health," said WHO representative Dr. Amina Osei.
Field trials are set to begin next month.
- text: Whistleblower Exposes Massive Fraud in Medicare Billing WASHINGTON - A former
employee of MedTech Solutions, a major medical equipment supplier, has come forward
with explosive allegations of systematic fraud in Medicare billing practices.
The whistleblower, whose identity remains protected, claims the company routinely
inflated prices and billed for unnecessary equipment, defrauding the government
of an estimated $1.2 billion over five years. Documents obtained by this newspaper
appear to corroborate these claims, showing discrepancies between actual costs
and billed amounts for common medical devices such as wheelchairs and oxygen tanks.
"This isn't just about money," said Senator Lisa Kline, chair of the Senate Health
Committee. "This kind of fraud directly impacts patient care and drives up healthcare
costs for everyone." The Department of Justice has announced a full investigation
into MedTech Solutions and its parent company, HealthCorp International. Industry
experts suggest this could be just the tip of the iceberg, with similar practices
potentially widespread across the medical supply sector. MedTech Solutions has
denied all allegations and vowed to cooperate fully with investigators.
- text: Nursing Home Chain Under Fire for Neglect and Fraud CHICAGO - A damning report
released today by state health inspectors reveals a pattern of severe neglect
and fraudulent practices across Sunset Years, one of the nation's largest nursing
home chains. Investigators found widespread understaffing, with some facilities
staffed at dangerously low levels while still billing Medicare and Medicaid for
full care. In several instances, residents were found to be malnourished or suffering
from untreated bedsores, despite records indicating proper care. "It's heartbreaking,"
said Maria Rodriguez, whose mother was a resident at one of the chain's Chicago
facilities. "We trusted them with our loved ones, and they betrayed that trust
for profit." Sunset Years CEO Robert Thompson issued a statement claiming the
issues were isolated incidents and not reflective of the company's overall standards.
However, multiple state attorneys general have announced plans to pursue legal
action against the chain
- text: Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a five-year
high today as severe drought continues to ravage Brazil's coffee-growing regions.
Experts warn consumers may see significant price increases in coming months.
- text: 'BREAKING: Hospital CEO Arrested in Kickback Scheme Federal agents arrested
Mercy General Hospital CEO John Smith today on charges of accepting kickbacks
for preferential treatment of patients. Prosecutors allege Smith pocketed over
$2 million, compromising patient care. Smith''s lawyer denies all accusations.'
inference: true
model-index:
- name: SetFit with Alibaba-NLP/gte-base-en-v1.5
results:
- task:
type: text-classification
name: Text Classification
dataset:
name: Unknown
type: unknown
split: test
metrics:
- type: accuracy
value: 0.8181818181818182
name: Accuracy
---
# SetFit with Alibaba-NLP/gte-base-en-v1.5
This is a [SetFit](https://github.com/huggingface/setfit) model for text classification. It uses [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) as the Sentence Transformer embedding model and a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance for classification.
The model has been trained using an efficient few-shot learning technique that involves:
1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
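To make the two phases concrete, here is a minimal, hypothetical training sketch using the SetFit `Trainer`. The toy dataset, label ids, and stand-in base model are placeholders, not the data or exact setup this model was trained with; the actual run used the hyperparameters listed under Training Details below.
```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical few-shot dataset: a handful of labelled texts per class.
train_ds = Dataset.from_dict({
    "text": [
        "Hospital official charged with accepting bribes for supply contracts.",
        "Coffee futures hit a five-year high amid drought in Brazil.",
    ],
    "label": [1, 0],
})

# Stand-in base model for this sketch; the card's actual body is
# Alibaba-NLP/gte-base-en-v1.5, which may additionally require
# trust_remote_code when loaded in your environment.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,      # attach a SetFitHead, as described above
    head_params={"out_features": 2},   # two classes
)

# Trainer.train() runs both phases: contrastive fine-tuning of the body,
# then training of the classification head on the resulting embeddings.
args = TrainingArguments(batch_size=8, num_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```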
## Model Details
### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5)
- **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
- **Maximum Sequence Length:** 8192 tokens
- **Number of Classes:** 2 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
### Model Labels
| Label | Examples |
|:------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | <ul><li>'Lucknow: Deputy CM Brajesh Pathak recommends dismissal of 17 govt doctors for absenteeism LUCKNOW: State govt has recommended the dismissal of 17 medical officers after they were found absent from duty for several months. In addition to this, disciplinary action has been ordered against three medical officers.The order was issued by deputy CM Brajesh Pathak who also holds the charge of health and medical education departments, said a govt spokesman on Thursday. In his order, Pathak stated: "No doctor or health worker who is negligent in medical services will be forgiven." tnn \'Committed to high-level health services\'Strict action will be taken against them. The state is committed to providing high-level health services to the people and no laxity on the count will be tolerated," Pathak stated. Three doctors who will face disciplinary action are Dr Mukul Mishra, orthopedic specialist of District Hospital, Jhansi; Dr Madhavi Singh, ophthalmologist posted at Community Health Centre, Fatehpur, Barabanki and Dr Pramod Kumar Sharma under Chief Medical Officer, Bareilly.'</li><li>"Kerala model therapy: Govt gives 56 absentee doctors 'show-cause pill' Thiruvananthapuram: The state health and family welfare department has issued show-cause notice to 56 doctors who have been on unauthorised absence in various medical colleges and pharmacy colleges in Kerala. In the notice issued by Rajan Khobragade, additional chief secretary, health and family welfare department, the doctors have been directed to report for duty before the ACS at the secretariat within 15 days."</li><li>'42% of Nigerian Doctors, Nurse Demand Bribes Before Attending to Patients - NBS Reports The National Bureau of Statistics (NBS) recently published a report titled "NBS Corruption in Nigeria: Patterns and Trend" for 2023, revealing concerning statistics about corruption in the healthcare sector. According to the report, two-thirds of Nigerian doctors, nurses, and midwives demand bribes from patients before providing treatment. Additionally, 42 percent of these health workers accept bribes to expedite procedures, while 15 percent take bribes to ensure the completion of medical procedures. It, however, added that 11 per cent were paid bribes as a "sign of appreciation," which still reflects the purpose of gratification for the healthcare service they received. "As for doctors, nurses and midwives, 11 per cent of bribes were paid as a sign of appreciation, possibly reflecting gratitude for the care received," it stated. The report comes as Nigerians have continued to raise concerns over poor quality health services in the country. With these concerns, a shortage of health workers continues to plague the health system even as practitioners travel abroad to seek better welfare with the "japa syndrome." The NBS report, in collaboration with the United Nations Office on Drugs and Crimes (UNODC), also revealed how Nigerian public officials received nothing less than N721 billion as bribes in 2023'</li></ul> |
| 0 | <ul><li>'Malta\'s former prime minister charged with corruption over hospital scandal Malta\'s former prime minister Joseph Muscat has been charged with corruption in a hospital privatisation scandal that was once investigated by the murdered investigative journalist Daphne Caruana Galizia. Muscat has been charged with accepting bribes, corruption in public office and money laundering, according to documents seen by AFP. He has described the allegations as "fantasies and lies" and said he was the victim of a political vendetta. Chris Fearne, Malta\'s deputy prime minister, who is tipped to become Malta\'s next European commissioner, and the country\'s former finance minister Edward Scicluna, who is now the governor of Malta\'s central bank, were charged with fraud, misappropriation and fraudulent gain.'</li><li>"US Supreme Court gives pharma companies a chance to thwart terrorism-funding lawsuit 21 pharmaceutical and medical equipment companies, including AstraZeneca, Pfizer, GE Healthcare USA, Johnson & Johnson, and F. Hoffmann-La Roche, are accused of illegally helping to fund terrorism in Iraq by providing corrupt payments to the Hezbollah-sponsored militia group Jaysh al-Mahdi to obtain medical supply contracts from Iraq's health ministry. The lawsuit seeks unspecified damages under the Anti-Terrorism Act."</li><li>'Health Ministry Official Arrested in Procurement Scandal JAKARTA - Indonesian authorities have arrested a high-ranking Health Ministry official on suspicion of corruption in medical equipment procurement. Agus Sutiyo, 52, Director of Medical Supplies, is accused of accepting bribes totaling $1.2 million from suppliers in exchange for awarding inflated contracts. The Corruption Eradication Commission (KPK) alleges that Sutiyo manipulated tender processes, favoring companies that offered kickbacks. The scheme reportedly cost the government an estimated $10 million in overpayments. KPK spokesperson Febri Diansyah stated, "This case undermines public trust and diverts crucial resources from healthcare services." Sutiyo faces up to 20 years in prison if convicted.'</li></ul> |
## Evaluation
### Metrics
| Label | Accuracy |
|:--------|:---------|
| **all** | 0.8182 |
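Accuracy on your own labelled examples can be sanity-checked with a short script. The texts and gold labels below are placeholders; substitute a real held-out split and consult the Model Labels table above for what each class id means.
```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("twright8/news_cats")

# Placeholder held-out examples with placeholder gold labels (0 or 1).
texts = [
    "Regulator fines clinic chain for billing for services never provided.",
    "Stock markets rally as inflation cools faster than expected.",
]
gold = [1, 0]

preds = model.predict(texts)
correct = sum(int(p) == y for p, y in zip(preds, gold))
print(f"Accuracy: {correct / len(gold):.4f}")
```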
## Uses
### Direct Use for Inference
First install the SetFit library:
```bash
pip install setfit
```
Then you can load this model and run inference.
```python
from setfit import SetFitModel
# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("twright8/news_cats")
# Run inference
preds = model("Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a five-year high today as severe drought continues to ravage Brazil's coffee-growing regions. Experts warn consumers may see significant price increases in coming months.")
```
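The model also accepts batched inputs, and class probabilities can be inspected as well. The snippet below continues from the `model` loaded above and assumes `predict_proba` exposes the SetFitHead's per-class probabilities.
```python
# Batch inference: pass a list of texts instead of a single string.
texts = [
    "Hospital CEO arrested in kickback scheme involving patient referrals.",
    "Coffee futures hit a five-year high amid severe drought in Brazil.",
]
preds = model.predict(texts)        # predicted class ids (0 or 1), one per text
probs = model.predict_proba(texts)  # per-class probabilities, one row per text
print(preds, probs, sep="\n")
```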
<!--
### Downstream Use
*List how someone could finetune this model on their own dataset.*
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:---------|:----|
| Word count | 55 | 153.8462 | 290 |
| Label | Training Sample Count |
|:------|:----------------------|
| 0 | 13 |
| 1 | 13 |
### Training Hyperparameters
- batch_size: (8, 1)
- num_epochs: (3, 17)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (9.629116538858926e-05, 2.651259436793277e-05)
- head_learning_rate: 0.02145586669240117
- loss: CoSENTLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: True
- warmup_proportion: 0.1
- max_length: 512
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
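For reference, the list above maps onto a `setfit.TrainingArguments` configuration roughly like the following sketch. The values are copied from this card, but the exact invocation of the original run is an assumption.
```python
from sentence_transformers.losses import CoSENTLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(8, 1),            # (embedding phase, classifier phase)
    num_epochs=(3, 17),
    max_steps=-1,
    sampling_strategy="oversampling",
    body_learning_rate=(9.629116538858926e-05, 2.651259436793277e-05),
    head_learning_rate=0.02145586669240117,
    loss=CoSENTLoss,
    margin=0.25,                  # distance_metric keeps its cosine-distance default
    end_to_end=True,
    use_amp=True,
    warmup_proportion=0.1,
    max_length=512,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=True,
)
```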
### Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:----------:|:------:|:-------------:|:---------------:|
| 0.0217 | 1 | 1.8133 | - |
| **0.4348** | **20** | **0.0054** | **1.6363** |
| 0.8696 | 40 | 0.0 | 4.9011 |
| 1.3043 | 60 | 0.0 | 7.0885 |
| 1.7391 | 80 | 0.0 | 6.2756 |
| 2.1739 | 100 | 0.0 | 6.2417 |
| 2.6087 | 120 | 0.0 | 6.4769 |
* The bold row denotes the saved checkpoint.
### Framework Versions
- Python: 3.10.13
- SetFit: 1.0.3
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.3.0+cu121
- Datasets: 2.20.0
- Tokenizers: 0.15.2
## Citation
### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->