---
license: mit
---
# Bengali to English Word Aligner
Fine-tuned model for **Bengali-to-English word alignment**, built on `bert-base-multilingual-cased`.
## Quick Start
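Load the tokenizer and model to use them in your project:
```python
from transformers import AutoModel, AutoTokenizer

# Download (or load from the local cache) the fine-tuned aligner checkpoint
tokenizer = AutoTokenizer.from_pretrained("musfiqdehan/bengali-english-word-aligner")
model = AutoModel.from_pretrained("musfiqdehan/bengali-english-word-aligner")
```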
## Bengali-English Word Alignment
[Open in Colab](https://colab.research.google.com/drive/1x5wUXS7vdWNeROkJS_B_lUwKTJZGaB7v?usp=sharing)
[Open in Kaggle](https://www.kaggle.com/musfiqdehan/bengali-english-alignment-demo)
Install Dependencies
```
!pip install -U data-preprocessors
!pip install -U bangla-postagger
```
Import Necessary Libraries
```python
from pprint import pprint
from data_preprocessors import text_preprocessor as tp
from bangla_postagger import (en_postaggers as ep,
                              bn_en_mapper as bem,
                              translators as trans)
```
Testing Word Mapping and Alignment
```python
src = "আমি ভাত খাই না, রুটি খাই।"
tgt = "I do not eat rice, I eat bread."
# Give one space before and after punctuation
# for easy tokenization
src = tp.space_punc(src)
tgt = tp.space_punc(tgt)
print("Word Mapping:")
mapping = bem.get_word_mapping(
    source=src,
    target=tgt,
    model_path="musfiqdehan/bengali-english-word-aligner",
)
pprint(mapping)
```
Output
```
Word Mapping:
['bn:(আমি) -> en:(I)',
'bn:(ভাত) -> en:(rice)',
'bn:(খাই) -> en:(do)',
'bn:(খাই) -> en:(eat)',
'bn:(না) -> en:(not)',
'bn:(,) -> en:(,)',
'bn:(রুটি) -> en:(bread)',
'bn:(খাই) -> en:(eat)',
'bn:(।) -> en:(.)']
```
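
The mapping comes back as a list of formatted strings, so downstream code usually needs structured pairs. The following is a minimal sketch of one way to parse them, assuming every entry follows the `bn:(...) -> en:(...)` pattern shown in the output above. `parse_mapping` and `group_by_source` are hypothetical helpers written for this example and are not part of `bangla-postagger`.

```python
import re
from collections import defaultdict
from pprint import pprint

# Assumed entry format, taken from the sample output: 'bn:(SRC) -> en:(TGT)'
PAIR_RE = re.compile(r"^bn:\((.*)\) -> en:\((.*)\)$")


def parse_mapping(mapping):
    """Convert formatted mapping strings into (bengali, english) tuples."""
    pairs = []
    for item in mapping:
        match = PAIR_RE.match(item)
        if match is None:
            raise ValueError(f"Unexpected mapping entry: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def group_by_source(pairs):
    """Collect the distinct English words aligned to each Bengali word."""
    grouped = defaultdict(list)
    for bn_word, en_word in pairs:
        if en_word not in grouped[bn_word]:
            grouped[bn_word].append(en_word)
    return dict(grouped)


# A few entries copied from the sample output
sample = [
    "bn:(আমি) -> en:(I)",
    "bn:(খাই) -> en:(do)",
    "bn:(খাই) -> en:(eat)",
    "bn:(,) -> en:(,)",
]
pprint(group_by_source(parse_mapping(sample)))
```

Because the strings carry no token positions, grouping by surface form merges repeated words: both occurrences of `খাই` in the example sentence end up under one key.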