zhayunduo/roberta-base-stocktwits-finetuned


	---
	license: apache-2.0
	pipeline_tag: text-classification
	language:
	- en
	metrics:
	- accuracy
	library_name: transformers
	tags:
	- finance
	---

	## Sentiment inference model for stock-related comments

	#### A project by NUS ISS students Frank Cao, Gerong Zhang, Jiaqi Yao, Sikai Ni, Yunduo Zhang

	<br />

	### Description

	This model is roberta-base fine-tuned on 3,200,000 comments from StockTwits, each carrying a user-supplied 'Bullish' or 'Bearish' tag.

	In the Inference API, try something an individual investor might say on an investment forum, for example 'red' or 'green'.

	[code on github](https://github.com/Gitrexx/PLPPM_Sentiment_Analysis_via_Stocktwits/tree/main/SentimentEngine)

	<br />

	### Training information
	- batch size 32
	- learning rate 2e-5

	| Epoch | Train loss | Validation loss | Validation accuracy |
	| ----- | ---------- | --------------- | ------------------- |
	| 1 | 0.3495 | 0.2956 | 0.8679 |
	| 2 | 0.2717 | 0.2235 | 0.9021 |
	| 3 | 0.2360 | 0.1875 | 0.9210 |
	| 4 | 0.2106 | 0.1603 | 0.9343 |

	<br />
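
The table above is easier to compare as epoch-over-epoch changes. The following is a small illustrative sketch, not part of the original training code: it copies the reported numbers into a list and computes the gain in validation accuracy between consecutive epochs. The names `history` and `accuracy_gains` are hypothetical.

```python
# Metrics copied from the training table:
# (epoch, train loss, validation loss, validation accuracy)
history = [
    (1, 0.3495, 0.2956, 0.8679),
    (2, 0.2717, 0.2235, 0.9021),
    (3, 0.2360, 0.1875, 0.9210),
    (4, 0.2106, 0.1603, 0.9343),
]


def accuracy_gains(rows):
    """Return (epoch, gain) pairs: validation-accuracy change versus the previous epoch."""
    return [
        (curr[0], round(curr[3] - prev[3], 4))
        for prev, curr in zip(rows, rows[1:])
    ]


for epoch, gain in accuracy_gains(history):
    print(f"epoch {epoch}: validation accuracy {gain:+.4f}")
```

With these numbers the gains shrink each epoch (0.0342, 0.0189, 0.0133), while validation loss is still falling at epoch 4.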

	### How to use
	```python
	import re

	import emoji
	import pandas as pd
	from transformers import RobertaForSequenceClassification, RobertaTokenizer
	from transformers import pipeline

	# the model was trained on text preprocessed as below
	def process_text(texts):
	    # remove URLs
	    texts = re.sub(r'https?://\S+', "", texts)
	    texts = re.sub(r'www.\S+', "", texts)
	    # replace the HTML entity for an apostrophe (assumed to be &#39;) with '
	    texts = texts.replace('&#39;', "'")
	    # mark hashtags and cashtags (e.g. $AAPL -> cashtag_AAPL)
	    texts = re.sub(r'(\#)(\S+)', r'hashtag_\2', texts)
	    texts = re.sub(r'(\$)([A-Za-z]+)', r'cashtag_\2', texts)
	    # mark usernames (e.g. @user -> mention_user)
	    texts = re.sub(r'(\@)(\S+)', r'mention_\2', texts)
	    # convert emoji to their text names
	    texts = emoji.demojize(texts, delimiters=("", " "))

	    return texts.strip()

	tokenizer_loaded = RobertaTokenizer.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
	model_loaded = RobertaForSequenceClassification.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')

	nlp = pipeline("text-classification", model=model_loaded, tokenizer=tokenizer_loaded)

	sentences = pd.Series(['just buy','just sell it',
	'entity rocket to the sky!',
	'go down','even though it is going up, I still think it will not keep this trend in the near future'])
	# if the input contains URLs or @, # or $ symbols, apply the preprocessing for more accurate results:
	# sentences = list(sentences.apply(process_text))
	sentences = list(sentences)
	results = nlp(sentences)
	print(results)  # 2 labels: label 0 is bearish, label 1 is bullish

	```
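
The pipeline returns a list of dictionaries with `label` and `score` keys. The sketch below turns that output into readable sentiment names. It assumes the labels are named `LABEL_0` and `LABEL_1` (the transformers default when a model config defines no custom label names), so check the actual output before relying on it. The helper `to_sentiment` and the example input are hypothetical.

```python
# Assumption: labels arrive as 'LABEL_0' / 'LABEL_1'. The mapping follows the
# usage snippet's comment: label 0 is bearish, label 1 is bullish.
ID_TO_SENTIMENT = {0: "Bearish", 1: "Bullish"}


def to_sentiment(results):
    """Convert pipeline-style dicts into (sentiment, score) tuples."""
    readable = []
    for item in results:
        label_id = int(item["label"].rsplit("_", 1)[-1])  # 'LABEL_1' -> 1
        readable.append((ID_TO_SENTIMENT[label_id], item["score"]))
    return readable


# Hand-written example input, not real model output.
example = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
]
print(to_sentiment(example))
```

In the usage snippet, `to_sentiment(results)` would be called right after `results = nlp(sentences)`.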