	---
	pipeline_tag: fill-mask
	widget:
	- text: "đậu xanh rau <mask>"
	---
	# <a name="introduction"></a> ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing (EMNLP 2023 - Main)
	Disclaimer: The paper contains actual comments on social networks that might be construed as abusive, offensive, or obscene.

	ViSoBERT is the state-of-the-art language model for Vietnamese social media tasks:

	- ViSoBERT is the first monolingual masked language model (MLM), based on the XLM-R architecture, pre-trained from scratch specifically on Vietnamese social media text.
	- ViSoBERT outperforms previous monolingual, multilingual, and multilingual social media approaches, achieving new state-of-the-art performance on four downstream Vietnamese social media tasks.
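	Because ViSoBERT is a masked language model, it can be queried through the `transformers` fill-mask pipeline. The sketch below is a minimal illustration, not part of the original release: the helper names `build_masked_text`, `format_predictions`, and `predict_masked_token` are hypothetical, and the default mask token `<mask>` is assumed from the widget example in this model card (the pipeline's own `tokenizer.mask_token` is used when the model is loaded).

```python
def build_masked_text(prefix, mask_token="<mask>"):
    """Append the mask token to a prefix, e.g. 'đậu xanh rau' -> 'đậu xanh rau <mask>'."""
    return f"{prefix.strip()} {mask_token}"


def format_predictions(results):
    """Turn fill-mask pipeline output (a list of dicts) into (token, score) pairs."""
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in results]


def predict_masked_token(prefix, top_k=5, model_name="uitnlp/visobert"):
    """Return the top_k candidate tokens for the masked position after `prefix`."""
    # Imported lazily so the two helpers above work without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    # Use the tokenizer's own mask token rather than assuming its spelling.
    text = build_masked_text(prefix, fill_mask.tokenizer.mask_token)
    return format_predictions(fill_mask(text, top_k=top_k))
```

	Calling `predict_masked_token("đậu xanh rau")` downloads the model on first use and returns a list of `(token, score)` pairs ordered by score.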

	The general architecture and experimental results of ViSoBERT can be found in our [paper](https://openreview.net/forum?id=gqkg54QNDY):

	@misc{nguyen2023visobert,
	  title={ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing},
	  author={Quoc-Nam Nguyen and Thang Chau Phan and Duc-Vu Nguyen and Kiet Van Nguyen},
	  year={2023},
	  eprint={2310.11166},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL}
	}


	Please CITE our paper when ViSoBERT is used to help produce published results or is incorporated into other software.

	## Installation

	Install `transformers` and `sentencepiece` with pip: `pip install transformers sentencepiece`. The usage example also imports `torch`, so PyTorch must be installed as well.
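	A quick way to confirm that the environment is ready is to check whether the required packages are importable. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the package list assumes the usage example in this README (which also imports `torch`).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages used by the example in this README (torch is assumed from its imports).
missing = missing_packages(["transformers", "sentencepiece", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```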

	## Example usage
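```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the pre-trained encoder and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained('uitnlp/visobert')
tokenizer = AutoTokenizer.from_pretrained('uitnlp/visobert')

# Tokenize a sample sentence into PyTorch tensors.
encoding = tokenizer('dau xanh rau ma', return_tensors='pt')

# Run a forward pass without tracking gradients (inference only).
with torch.no_grad():
    output = model(**encoding)
```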
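	The model output holds one vector per token in `output.last_hidden_state`. One common way to reduce these to a single vector per sentence is mask-aware mean pooling, sketched below. This pooling choice is an assumption made for illustration and is not described in this README; `mean_pool` and `embed` are hypothetical helper names.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1), cast to the hidden-state dtype.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)         # guard against division by zero
    return summed / counts


def embed(texts, model_name="uitnlp/visobert"):
    """Encode a list of sentences into one mean-pooled vector each."""
    # Imported lazily so mean_pool can be used without downloading a model.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    # Pad to the longest sentence in the batch so the inputs form one tensor.
    encoding = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoding)
    return mean_pool(output.last_hidden_state, encoding["attention_mask"])
```

	Calling `embed(["dau xanh rau ma"])` returns a tensor with one row per input sentence and one column per hidden dimension of the model.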