serdarakyol/interpress-turkish-news-classification

Text Classification


serdarakyol committed on Mar 3, 2021

Commit 9d02852 · 1 Parent(s): 2017989

Update README.md

Files changed (1)

README.md +0 -36

@@ -20,42 +20,7 @@ tokenizer = AutoTokenizer.from_pretrained("serdarakyol/interpress-turkish-news-c
 model = AutoModelForSequenceClassification.from_pretrained("serdarakyol/interpress-turkish-news-classification")
 ```
-## NOTE: Please remember, for predict on BERT model, you don't actually need to preprocessing but the dataset was real world data. That why I needed to do some preprocessing. If you have normal news from any news web page, you can just copy the news and past. Then delete the first comment on ***prediction*** function. That's it.
-```sh
-# PREPROCESSING
-import re
-my_punc = r"#$%&()*+-/:;<=>@[\]^_{|}~"
-def clean_url(content):
-    reg_url=r'[\S]+\.(net|com|org|info|edu|gov|uk|de|ca|jp|fr|au|us|ru|ch|it|nel|se|no|es|mil)[\S]*\s?'
-    pattern_url = re.compile(reg_url)
-    content = pattern_url.sub('',content)
-    return content
-def clean_email(content):
-    reg_email='\S*@\S*\s?'
-    pattern_email = re.compile(reg_email)
-    content = pattern_email.sub('',content)
-    return content
-def clean_punctuation(content):
-    content = content.translate(content.maketrans("", "", my_punc))
-    return content
-def clean_data(text):
-    text = clean_url(text)
-    text = clean_email(text)
-    text = clean_punctuation(text)
-    filtered_sentence = []
-    for word in text.split(" "):
-        if len(word) > 2:
-            filtered_sentence.append(word)
-    text = ' '.join(filtered_sentence)
-    return text
-```
 ```sh
 import torch
 import numpy as np
@@ -71,7 +36,6 @@ else:
 ```
 ```sh
 def prediction(news):
-    news=clean_data(news)
     news=[news]
     indices=tokenizer.batch_encode_plus(
     news,

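
For readers who still need to clean noisy, real-world text before calling `prediction`, the preprocessing removed by this commit can be sketched roughly as follows. This is a minimal sketch based on the deleted README lines, not a recommended pipeline: the regular expressions and the punctuation set are copied from the removed block, while the constant names (`MY_PUNC`, `URL_RE`, `EMAIL_RE`) and the sample sentence are illustrative assumptions. The email pattern is written as a raw string so that `\S` and `\s` are not treated as string escapes.

```python
import re

# Punctuation set copied from the removed README block.
MY_PUNC = r"#$%&()*+-/:;<=>@[\]^_{|}~"

# URL pattern copied from the removed block (TLD list kept as in the source).
URL_RE = re.compile(
    r'[\S]+\.(net|com|org|info|edu|gov|uk|de|ca|jp|fr|au|us|ru|ch|it|nel|se|no|es|mil)[\S]*\s?'
)

# Email pattern from the removed block, written here as a raw string.
EMAIL_RE = re.compile(r'\S*@\S*\s?')


def clean_data(text):
    """Strip URLs, emails, and selected punctuation, then drop words of 1-2 characters."""
    text = URL_RE.sub('', text)
    text = EMAIL_RE.sub('', text)
    text = text.translate(str.maketrans("", "", MY_PUNC))
    return ' '.join(word for word in text.split(" ") if len(word) > 2)


# Hypothetical sample input, not taken from the dataset.
sample = "Dolar bu hafta yukseldi: www.example.com editor@mail.tr (kaynak)"
print(clean_data(sample))  # Dolar hafta yukseldi kaynak
```

With this commit applied, `prediction` no longer calls `clean_data`, so a caller who wants this cleaning would have to apply it to the text before passing it in.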