anzorq/w2v-bert-2.0-kbd-v2

@@ -1,199 +1,126 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: mit
+language:
+- kbd
+datasets:
+- anzorq/kbd_speech
+- anzorq/sixuxar_yijiri_mak7
+metrics:
+- wer
+pipeline_tag: automatic-speech-recognition
 ---
+# Circassian (Kabardian) ASR Model
+This model performs Automatic Speech Recognition (ASR) for Kabardian (`kbd`). It was fine-tuned from the `facebook/w2v-bert-2.0` model.
+The training data combines the `anzorq/kbd_speech` dataset (filtered on `country=russia`) with the `anzorq/sixuxar_yijiri_mak7` dataset.
 ## Model Details
+- **Base Model**: facebook/w2v-bert-2.0
+- **Language**: Kabardian
+- **Task**: Automatic Speech Recognition (ASR)
+- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
+- **Training Steps**: 5000
+## Training
+The model was fine-tuned using the following training arguments:
+```python
+TrainingArguments(
+   output_dir='output',
+   group_by_length=True,
+   per_device_train_batch_size=8,
+   gradient_accumulation_steps=2,
+   evaluation_strategy="steps",
+   num_train_epochs=10,
+   gradient_checkpointing=True,
+   fp16=True,
+   save_steps=1000,
+   eval_steps=500,
+   logging_steps=300,
+   learning_rate=5e-5,
+   warmup_steps=500,
+   save_total_limit=2,
+   push_to_hub=True,
+   report_to="wandb"
+)
+```
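As a quick sanity check on these settings, the effective batch size per optimizer step is the per-device batch size multiplied by the gradient accumulation steps and the number of devices. The card does not state how many devices were used, so the sketch below assumes a single GPU; `num_devices` is a hypothetical name introduced only for illustration.

```python
# Values copied from the TrainingArguments above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

# Assumption: one GPU. The model card does not state the device count.
num_devices = 1

# Number of samples that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(f"Effective batch size: {effective_batch_size}")
```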
+## Performance
+Metrics logged during fine-tuning. The published checkpoint is the one from step 4000 (row in bold), which has the lowest WER:
+| Step | Training Loss | Validation Loss | WER      |
+|------|---------------|-----------------|----------|
+| 500  | 2.761100      | 0.572304        | 0.830552 |
+| 1000 | 0.325700      | 0.352516        | 0.678261 |
+| 1500 | 0.247000      | 0.271146        | 0.377438 |
+| 2000 | 0.179300      | 0.235156        | 0.319859 |
+| 2500 | 0.176100      | 0.229383        | 0.293537 |
+| 3000 | 0.171600      | 0.208033        | 0.310458 |
+| 3500 | 0.133200      | 0.199517        | 0.289542 |
+| **4000** | **0.117900** | **0.208304** | **0.258989** |
+| 4500 | 0.145400      | 0.184942        | 0.285311 |
+| 5000 | 0.129600      | 0.195167        | 0.372033 |
+| 5500 | 0.122600      | 0.203584        | 0.386369 |
+| 6000 | 0.196800      | 0.270521        | 0.687662 |
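The checkpoint selected for this model (step 4000) is also the row with the lowest WER. A minimal check using the values from the table above; the name `eval_log` is hypothetical, and the card does not say whether the checkpoint was actually picked this way.

```python
# (step, validation_loss, wer) rows copied from the performance table.
eval_log = [
    (500, 0.572304, 0.830552),
    (1000, 0.352516, 0.678261),
    (1500, 0.271146, 0.377438),
    (2000, 0.235156, 0.319859),
    (2500, 0.229383, 0.293537),
    (3000, 0.208033, 0.310458),
    (3500, 0.199517, 0.289542),
    (4000, 0.208304, 0.258989),
    (4500, 0.184942, 0.285311),
    (5000, 0.195167, 0.372033),
    (5500, 0.203584, 0.386369),
    (6000, 0.270521, 0.687662),
]

# Pick the evaluation step with the lowest word error rate.
best_step, best_val_loss, best_wer = min(eval_log, key=lambda row: row[2])
print(f"Best step by WER: {best_step} (WER={best_wer})")
```

Note that the lowest validation loss occurs at a different step (4500), so selecting by loss instead of WER would give a different checkpoint.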
+## Note
+Before training, multi-character graphemes in the transcripts were replaced with shorter symbols to reduce the tokenizer vocabulary: digraphs became single characters, the trigraph `кхъ` became `qҳ`, and `я` was expanded to `йа`. The replacements are as follows:
+```
+гъ -> ɣ
+дж -> j
+дз -> ӡ
+жь -> ʐ
+кӏ -> қ
+къ -> q
+кхъ -> qҳ
+лъ -> ɬ
+лӏ -> ԯ
+пӏ -> ԥ
+тӏ -> ҭ
+фӏ -> ჶ
+хь -> h
+хъ -> ҳ
+цӏ -> ҵ
+щӏ -> ɕ
+я  -> йа
+```
+After transcription, apply the reverse replacements to restore the original spelling. Restore longer symbols such as `qҳ` before single characters, so that they are not split by the `q` and `ҳ` rules.
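The author's preprocessing code is not included in this card, so the following is only a sketch of how the forward and reverse mappings could be applied, assuming plain ordered string replacement. The names `REPLACEMENTS`, `encode`, and `decode` are hypothetical, and the sample words are illustrative inputs only.

```python
# Ordered (original, symbol) pairs copied from the replacement list above.
REPLACEMENTS = [
    ("гъ", "ɣ"), ("дж", "j"), ("дз", "ӡ"), ("жь", "ʐ"),
    ("кӏ", "қ"), ("къ", "q"), ("кхъ", "qҳ"), ("лъ", "ɬ"),
    ("лӏ", "ԯ"), ("пӏ", "ԥ"), ("тӏ", "ҭ"), ("фӏ", "ჶ"),
    ("хь", "h"), ("хъ", "ҳ"), ("цӏ", "ҵ"), ("щӏ", "ɕ"),
    ("я", "йа"),
]


def encode(text):
    """Apply the training-time replacements in the listed order."""
    for original, symbol in REPLACEMENTS:
        text = text.replace(original, symbol)
    return text


def decode(text):
    """Undo the replacements, restoring two-character symbols first.

    Sorting by symbol length ensures 'qҳ' is turned back into the trigraph
    before the single-character rules for 'q' and 'ҳ' can split it.
    """
    by_length = sorted(REPLACEMENTS, key=lambda pair: len(pair[1]), reverse=True)
    for original, symbol in by_length:
        text = text.replace(symbol, original)
    return text


sample = "кхъуей"
encoded = encode(sample)
print(encoded)          # qҳуей
print(decode(encoded))  # кхъуей
```

One limitation of this scheme: `къ` followed by `хъ` would also encode to `qҳ`, so that sequence cannot be distinguished from `кхъ` when decoding.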
+## Inference
+```python
+import torchaudio
+from transformers import pipeline
+pipe = pipeline(model="anzorq/w2v-bert-2.0-kbd-v2", device=0)
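+# Maps each training-time symbol back to the original Kabardian spelling.
+# 'qҳ' comes first: dicts preserve insertion order, so the trigraph is restored
+# before the single-character rules for 'q' and 'ҳ' can split it.
+reversed_replacements = {
+    'qҳ': 'кхъ',
+    'ɣ': 'гъ', 'j': 'дж', 'ӡ': 'дз', 'ʐ': 'жь',
+    'қ': 'кӏ', 'q': 'къ', 'ɬ': 'лъ',
+    'ԯ': 'лӏ', 'ԥ': 'пӏ', 'ҭ': 'тӏ', 'ჶ': 'фӏ',
+    'h': 'хь', 'ҳ': 'хъ', 'ҵ': 'цӏ', 'ɕ': 'щӏ',
+    'йа': 'я'
+}
+def reverse_replace_symbols(text):
+    # Apply the replacements in dictionary order.
+    for symbol, original in reversed_replacements.items():
+        text = text.replace(symbol, original)
+    return text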
+def transcribe_speech(audio_path):
+    # Resample the audio to 16 kHz and save it to a temporary file for the pipeline.
+    waveform, sample_rate = torchaudio.load(audio_path)
+    waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(waveform)
+    torchaudio.save("temp.wav", waveform, 16000)
+    # Transcribe in 10-second chunks, then restore the original characters.
+    transcription = pipe("temp.wav", chunk_length_s=10)['text']
+    transcription = reverse_replace_symbols(transcription)
+    return transcription
+audio_path = "audio.wav"
+transcription = transcribe_speech(audio_path)
+print(f"Transcription: {transcription}")
+```
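To compare a transcription against a reference using the WER metric reported in the performance table, a minimal pure-Python sketch is shown below. It uses the standard definition (word-level edit distance divided by the number of reference words) and is not necessarily the implementation used to produce the numbers in this card; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings: one substitution and one deletion over four words.
wer = word_error_rate("a b c d", "a x c")
print(f"WER: {wer}")
```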