finetuned_Helsinki-NLP-en-mr
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-mr on a custom English-Marathi parallel dataset (see Training and evaluation data below).
Model description
The base model is Helsinki-NLP/opus-mt-en-mr. We fine-tuned it on a Marathi transcript dataset so that the translations are as accurate as possible.
Intended uses & limitations
More information needed
Training and evaluation data
We used data collected from various sources such as Kaggle, Hugging Face, and GitHub. The dataset pairs English sentences with their Marathi translations (in Devanagari script), which were further converted to a romanized (Latin) script to meet the constraints of the hackathon.
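A minimal sketch of this preprocessing step is shown below. The file name, the column names (`english`, `marathi`), the cleaning rules, and the use of the indic-transliteration package with the ITRANS scheme are all assumptions for illustration; the card does not specify the exact romanization scheme that was used.

```python
# Preprocessing sketch: clean the parallel data and romanize the Marathi side.
# File name, column names, and the ITRANS romanization scheme are assumptions.
import re

import pandas as pd
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate

df = pd.read_csv("en_mr_parallel.csv")  # hypothetical parallel English-Marathi file

def clean(text: str) -> str:
    """Drop special characters (keeping basic punctuation) and collapse whitespace."""
    text = re.sub(r"[^\w\s.,?!]", " ", str(text))
    return re.sub(r"\s+", " ", text).strip()

df["english"] = df["english"].map(clean)
df["marathi"] = df["marathi"].map(clean)

# Romanize the Devanagari Marathi column; ITRANS is one ASCII-safe scheme.
df["marathi_romanized"] = df["marathi"].map(
    lambda s: transliterate(s, sanscript.DEVANAGARI, sanscript.ITRANS)
)

df.to_csv("en_mr_romanized.csv", index=False)
```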
Training procedure
1. Data Collection & Preprocessing: gathered parallel English-Marathi datasets, cleaned and normalized the text (removed noise, punctuation, and special characters), and tokenized the sentences.
2. Model Selection: used the Transformer-based model Helsinki-NLP/opus-mt-en-mr (pretrained for low-resource languages) and fine-tuned it on the collected dataset with Hugging Face's transformers library (see the sketch after this list).
3. Training Configuration: loss function: cross-entropy with label smoothing; optimizer: AdamW with learning-rate scheduling; batch size: 64; evaluation metric: BLEU score for translation quality.
4. Training & Fine-tuning: trained on TPUs/GPUs (Google Colab), used mixed-precision training for faster convergence, and augmented the data with back-translation for better generalization.
5. Evaluation & Inference: validated on unseen test data and performed human evaluation to refine translation quality.
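The sketch below shows one way to wire these steps together with transformers and datasets, matching the hyperparameters listed further down. The CSV file and column names are carried over from the preprocessing sketch above; the label-smoothing factor (0.1), the maximum sequence length, and the evaluation interval are assumed values not stated in the card.

```python
# Fine-tuning sketch with Hugging Face transformers/datasets.
# File/column names, label_smoothing_factor, max_length, and eval_steps are assumptions.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "Helsinki-NLP/opus-mt-en-mr"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

raw = load_dataset("csv", data_files="en_mr_romanized.csv")["train"]
raw = raw.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # Tokenize source (English) and target (romanized Marathi) in one pass.
    return tokenizer(
        batch["english"],
        text_target=batch["marathi_romanized"],
        max_length=128,
        truncation=True,
    )

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return {"bleu": bleu.compute(
        predictions=decoded_preds,
        references=[[ref] for ref in decoded_labels],
    )["score"]}

args = Seq2SeqTrainingArguments(
    output_dir="finetuned_Helsinki-NLP-en-mr",
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    max_steps=10_000,
    warmup_steps=50,
    lr_scheduler_type="linear",
    label_smoothing_factor=0.1,   # cross-entropy with label smoothing
    fp16=True,                    # mixed-precision training (GPU)
    predict_with_generate=True,   # BLEU is computed on generated text
    eval_strategy="steps",
    eval_steps=1000,
    seed=42,
    report_to="none",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
)

trainer.train()
```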
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 10000
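For reference, these hyperparameters map onto PyTorch/transformers primitives as in the short sketch below; this is the manual-loop view of what the Seq2SeqTrainer in the sketch above constructs internally from the same settings.

```python
# Optimizer and linear warmup schedule matching the hyperparameters listed above.
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-mr")

optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.999), eps=1e-8
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=50, num_training_steps=10_000
)
# Inside the training loop:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```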
Training results
The following loss was obtained for the English-to-Marathi translation model:
- Helsinki-NLP/opus-mt-en-mr (fine-tuned): 0.5174
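A quick inference sketch follows, assuming the fine-tuned checkpoint is published on the Hub as SPriyanshi/finetuned_Helsinki-NLP-en-mr (see the model tree below) and that it produces romanized Marathi as described in the training data section.

```python
# Inference sketch: load the fine-tuned checkpoint from the Hub and translate.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "SPriyanshi/finetuned_Helsinki-NLP-en-mr"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

inputs = tokenizer("How are you today?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```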
Framework versions
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.3.1
- Tokenizers 0.21.0
Model tree for SPriyanshi/finetuned_Helsinki-NLP-en-mr
Base model
Helsinki-NLP/opus-mt-en-mr