---
license: afl-3.0
datasets:
- WillHeld/hinglish_top
language:
- en
- hi
metrics:
- accuracy
library_name: transformers
pipeline_tag: fill-mask
---

### SRDBerta

SRDBerta is a BERT model trained for masked language modeling on Hinglish data.

Hinglish is a hybrid language spoken in India that combines elements of Hindi and English. It is commonly used in informal conversation and in media such as Bollywood films.

### Dataset
The Hinglish-TOP [dataset](https://huggingface.co/datasets/WillHeld/hinglish_top) has the following columns:
- en_query
- cs_query
- en_parse 
- cs_parse 
- domain 
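For masked language modeling only the raw text is needed. A minimal sketch of collecting the code-switched `cs_query` column, using made-up rows in the shape listed above as a stand-in for the real data (which would normally be loaded with `datasets.load_dataset("WillHeld/hinglish_top")`):

```python
# Hypothetical rows mirroring the Hinglish-TOP columns; the parse
# strings and values here are illustrative, not real dataset entries.
rows = [
    {"en_query": "how are you",
     "cs_query": "aap kaise ho",
     "en_parse": "...",
     "cs_parse": "...",
     "domain": "other"},
]

def mlm_corpus(rows):
    """Collect the code-switched (Hinglish) queries for MLM training."""
    return [row["cs_query"] for row in rows]

print(mlm_corpus(rows))
```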

### Training
|Epoch|Train Loss|
|:------:|:----------:|
|4   |   0.251 |

The model was trained for only 4 epochs due to GPU limitations; training for more epochs (for example, 10) would likely improve results.
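The card does not spell out the masking recipe, but standard BERT-style MLM training masks 15% of tokens, replacing 80% of those with the mask token, 10% with a random token, and leaving 10% unchanged. A minimal pure-Python sketch of that scheme (not the actual training code):

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]",
                mask_prob=0.15, seed=0):
    """Apply BERT-style masking to a token list.

    Returns (masked, labels) where labels holds the original token at
    masked positions and None at positions the model need not predict.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must predict this
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)         # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random token
            else:
                masked.append(tok)                # 10%: keep original
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "aap kaise ho aaj".split()
masked, labels = mask_tokens(tokens, vocab=tokens)
```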

### Inference 
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("SRDdev/SRDBerta")
model = AutoModelForMaskedLM.from_pretrained("SRDdev/SRDBerta")

# Pass the loaded model and tokenizer objects to the pipeline
fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```
```python
# Insert the tokenizer's mask token into a Hinglish prompt
mask = fill.tokenizer.mask_token
fill(f'Aap {mask} ho?')
```
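The fill-mask pipeline returns a list of candidate dicts with `score`, `token_str`, and `sequence` keys. A small sketch of picking the top completion, using a mocked result (the scores and tokens below are illustrative, not real model output) in place of a live model call:

```python
# Mocked output in the shape the fill-mask pipeline returns.
predictions = [
    {"score": 0.41, "token_str": "kaise", "sequence": "Aap kaise ho?"},
    {"score": 0.22, "token_str": "kahan", "sequence": "Aap kahan ho?"},
    {"score": 0.05, "token_str": "kya",   "sequence": "Aap kya ho?"},
]

best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])  # highest-scoring completion
```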

### Citation
Author: @[SRDdev](https://huggingface.co/SRDdev)
```
Name : Shreyas Dixit
Framework : PyTorch
Year: Jan 2023
Pipeline : fill-mask
Github : https://github.com/SRDdev
LinkedIn : https://www.linkedin.com/in/srddev/ 
```