	---
	language: "en"
	tags:
	- twitter
	- masked-token-prediction
	- election2020
	- politics
	license: "gpl-3.0"
	---

	# Pre-trained BERT on Twitter US Political Election 2020

	Pre-trained weights for [Knowledge Enhanced Masked Language Model for Stance Detection](https://www.aclweb.org/anthology/2021.naacl-main.376), NAACL 2021.

	The model is initialized from the BERT-base (uncased) weights, i.e. `bert-base-uncased`.

	# Training Data

	This model is pre-trained on over 5 million English tweets about the 2020 US Presidential Election.

	# Training Objective

	This model is initialized with BERT-base and trained with the standard masked language modeling (MLM) objective.
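
For readers unfamiliar with MLM, the following is a minimal sketch of how training inputs are typically corrupted. It assumes the recipe from the original BERT paper (15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). This card does not state the exact rates used for this model, and the `mask_tokens` helper and the toy vocabulary are hypothetical names for illustration only. Real implementations work on token IDs and skip special tokens such as `[CLS]` and `[SEP]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Corrupt a token list in the style of BERT MLM (assumed 80/10/10 split).

    Returns (corrupted, labels). labels[i] is the original token at positions
    the model must predict, and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        # Most tokens are left alone and carry no prediction target.
        if rng.random() >= mask_prob:
            corrupted.append(token)
            labels.append(None)
            continue
        # Selected position: the model is trained to recover the original token.
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))   # 10%: replace with a random token
        else:
            corrupted.append(token)               # 10%: keep the original token
    return corrupted, labels


# Toy example with a hypothetical vocabulary.
tokens = "the candidates will debate again next week".split()
vocab = ["vote", "election", "president", "debate", "campaign"]
corrupted, labels = mask_tokens(tokens, vocab, seed=0)
print(corrupted)
print(labels)
```

The loss is computed only at positions where `labels` is not `None`, so the model learns to reconstruct the selected tokens from their context.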

	# Usage

	This pre-trained language model can be fine-tuned on any downstream task (e.g., classification).

	Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more detail.

	```python
	import torch
	from transformers import BertForMaskedLM, BertTokenizer, pipeline

	# Choose GPU if available
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# Select model path here
	pretrained_LM_path = "kornosk/bert-political-election2020-twitter-mlm"

	# Load tokenizer and model
	tokenizer = BertTokenizer.from_pretrained(pretrained_LM_path)
	model = BertForMaskedLM.from_pretrained(pretrained_LM_path)

	# Fill mask
	example = "Trump is the [MASK] of USA"
	fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
	# If the line above does not work with your transformers version,
	# pass the model name as a string instead:
	# fill_mask = pipeline("fill-mask", model=pretrained_LM_path, tokenizer=tokenizer)

	outputs = fill_mask(example)
	print(outputs)

	# Inspect the raw MLM logits and the last-layer token embeddings
	model.to(device)
	inputs = tokenizer(example, return_tensors="pt").to(device)
	with torch.no_grad():
	    outputs = model(**inputs, output_hidden_states=True)
	print(outputs.logits.shape)             # (batch, sequence_length, vocab_size)
	print(outputs.hidden_states[-1].shape)  # (batch, sequence_length, hidden_size)

	# Alternatively, fine-tune this model on your downstream task!
	# Please consider citing our paper if you find this useful :)
	```
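
As a rough sketch of the fine-tuning route, the same checkpoint can be loaded with a classification head using the `transformers` library. The three stance labels and the `build_stance_classifier` helper below are hypothetical and only for illustration. The label set and training procedure actually used by the authors are described in the official repository, not here. Note that the classification head is newly initialized, so its predictions are meaningless until the model has been fine-tuned on labeled data.

```python
# Hypothetical label set for illustration; replace with your task's labels.
LABELS = ["against", "neutral", "support"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}


def build_stance_classifier(model_name="kornosk/bert-political-election2020-twitter-mlm"):
    """Load the pre-trained encoder with a freshly initialized classification head."""
    # Imported lazily so the label mappings above stay usable without transformers.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `tokenizer, model = build_stance_classifier()` downloads the weights and returns objects that can be passed to a standard training loop. Expect a warning that the MLM head weights are unused and the classifier weights are newly initialized, which is normal for this setup.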

	# Reference

	- [Knowledge Enhanced Masked Language Model for Stance Detection](https://www.aclweb.org/anthology/2021.naacl-main.376), NAACL 2021.

	# Citation

	```bibtex
	@inproceedings{kawintiranon2021knowledge,
	title={Knowledge Enhanced Masked Language Model for Stance Detection},
	author={Kawintiranon, Kornraphop and Singh, Lisa},
	booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
	year={2021},
	publisher={Association for Computational Linguistics},
	url={https://www.aclweb.org/anthology/2021.naacl-main.376}
	}
	```