	---
	license: mit
	base_model: pyannote/segmentation-3.0
	tags:
	- speaker-diarization
	- speaker-segmentation
	- generated_from_trainer
	datasets:
	- diarizers-community/callhome
	model-index:
	- name: speaker-segmentation-fine-tuned-callhome-eng
	results: []
	---


	# speaker-segmentation-fine-tuned-callhome-eng

	This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on the English (`eng`) subset of the diarizers-community/callhome dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4602
	- Der (diarization error rate): 0.1828
	- False Alarm: 0.0584
	- Missed Detection: 0.0717
	- Confusion: 0.0528
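
The three error components above relate to the diarization error rate through the standard decomposition DER = false alarm + missed detection + confusion. The short check below is an illustrative sketch, not part of the training code. It sums the reported components and compares the result with the reported Der, allowing for the fact that every value in this card is rounded to four decimals.

```python
# Evaluation-set rates reported in this card.
false_alarm = 0.0584
missed_detection = 0.0717
confusion = 0.0528
reported_der = 0.1828

# Standard decomposition: DER = false alarm + missed detection + confusion.
der_from_components = false_alarm + missed_detection + confusion

# Each rate is rounded to 4 decimals, so a gap of about 1e-4 is expected.
rounding_gap = abs(der_from_components - reported_der)
print(f"sum of components: {der_from_components:.4f}, reported: {reported_der:.4f}")
```
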

	## Model description

	This segmentation model has been trained on English data (Callhome) using [diarizers](https://github.com/huggingface/diarizers/tree/main).
	It can be loaded with two lines of code:

	```python
	from diarizers import SegmentationModel

	segmentation_model = SegmentationModel().from_pretrained('diarizers-community/speaker-segmentation-fine-tuned-callhome-eng')
	```

	To use it within a pyannote speaker diarization pipeline, load the [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline and convert the model to a pyannote-compatible format:

	```python
	from pyannote.audio import Pipeline
	import torch

	device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

	# load the pre-trained pyannote pipeline
	pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
	pipeline.to(device)

	# replace the segmentation model with your fine-tuned one
	# (`segmentation_model` is the SegmentationModel loaded in the previous snippet)
	model = segmentation_model.to_pyannote_model()
	pipeline._segmentation.model = model.to(device)
	```

	You can now use the pipeline on audio examples:

	```python
	from datasets import load_dataset

	# load a dataset example (English subset, matching this model)
	dataset = load_dataset("diarizers-community/callhome", "eng", split="data")
	sample = dataset[0]["audio"]

	# pre-process inputs: add a leading channel dimension to the waveform and rename the sample-rate key
	sample["waveform"] = torch.from_numpy(sample.pop("array")[None, :]).to(device, dtype=model.dtype)
	sample["sample_rate"] = sample.pop("sampling_rate")

	# perform inference
	diarization = pipeline(sample)

	# dump the diarization output to disk using RTTM format
	with open("audio.rttm", "w") as rttm:
	    diarization.write_rttm(rttm)
	```
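
Once the RTTM file is written, it can be inspected without any diarization library. The sketch below assumes the usual RTTM layout of ten whitespace-separated fields per `SPEAKER` record, with the segment duration in the fifth field and the speaker label in the eighth. The helper name `speaking_time_per_speaker` and the example lines are hypothetical and only illustrate the format; they are not output from this model.

```python
from collections import defaultdict


def speaking_time_per_speaker(rttm_lines):
    """Sum segment durations (in seconds) per speaker label from RTTM lines."""
    totals = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: segment duration
        speaker = fields[7]          # eighth field: speaker label
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content, for illustration only (not real model output).
example_rttm = [
    "SPEAKER audio 1 0.000 2.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER audio 1 2.500 1.250 <NA> <NA> SPEAKER_01 <NA> <NA>",
    "SPEAKER audio 1 4.000 1.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
]
totals = speaking_time_per_speaker(example_rttm)
print(totals)
```

To apply the same helper to the file written above, pass an open file handle instead of the list, for example `speaking_time_per_speaker(open("audio.rttm"))`.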



	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 5.0
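
To give a feel for what `lr_scheduler_type: cosine` means with these settings, the sketch below re-implements a half-cosine decay in plain Python. It is an approximation under stated assumptions, not the Trainer's own code: the total of 1810 steps is taken from the training results table in this card, and a warmup of zero steps is assumed because no warmup is listed here.

```python
import math

BASE_LR = 0.001     # learning_rate from the list above
TOTAL_STEPS = 1810  # final step count in the training results table
WARMUP_STEPS = 0    # assumption: the card does not list any warmup


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at `step`: linear warmup followed by half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the learning rate at the start, at each selected epoch boundary, and at the end.
for step in (0, 362, 905, 1448, 1810):
    print(f"step {step:4d}: lr = {cosine_lr(step):.6f}")
```

Under these assumptions the learning rate starts at 0.001, passes through half that value at the midpoint of training, and reaches zero at the final step.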

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Der    | False Alarm | Missed Detection | Confusion |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:-----------:|:----------------:|:---------:|
	| 0.4123        | 1.0   | 362  | 0.4801          | 0.1930 | 0.0627      | 0.0741           | 0.0563    |
	| 0.3906        | 2.0   | 724  | 0.4558          | 0.1836 | 0.0589      | 0.0727           | 0.0519    |
	| 0.3753        | 3.0   | 1086 | 0.4643          | 0.1830 | 0.0557      | 0.0746           | 0.0527    |
	| 0.3632        | 4.0   | 1448 | 0.4566          | 0.1821 | 0.0564      | 0.0728           | 0.0529    |
	| 0.3475        | 5.0   | 1810 | 0.4602          | 0.1828 | 0.0584      | 0.0717           | 0.0528    |
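
The headline metrics in this card come from the final epoch, which is not the row with the lowest validation loss or the lowest Der. The snippet below is a small illustrative sketch that copies the relevant columns from the table and picks the best epoch under each criterion.

```python
# (epoch, validation_loss, der) copied from the table above.
results = [
    (1, 0.4801, 0.1930),
    (2, 0.4558, 0.1836),
    (3, 0.4643, 0.1830),
    (4, 0.4566, 0.1821),
    (5, 0.4602, 0.1828),
]

# Pick the row that minimizes each criterion.
best_by_loss = min(results, key=lambda row: row[1])
best_by_der = min(results, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]:.4f})")
print(f"lowest Der: epoch {best_by_der[0]} ({best_by_der[2]:.4f})")
```

The gap between the final epoch and the best Der is small (0.1828 versus 0.1821), so the choice of checkpoint matters little here.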


	### Framework versions

	- Transformers 4.40.0
	- PyTorch 2.2.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.19.1