
# StackOBERTflow-comments-small

StackOBERTflow is a RoBERTa model trained on StackOverflow comments. A byte-level BPE tokenizer with dropout was used, built with the `tokenizers` package.
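
For reference, such a tokenizer could be trained along these lines (a minimal sketch, not the exact training script; the corpus path, vocabulary size, and dropout value are assumptions):

```python
from tokenizers import ByteLevelBPETokenizer

# Byte-level BPE with BPE dropout (value is an assumption).
tokenizer = ByteLevelBPETokenizer(dropout=0.1)

tokenizer.train(
    files=["stackoverflow_comments.txt"],  # hypothetical corpus file
    vocab_size=52_000,                     # assumption
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Writes vocab.json and merges.txt; the directory must already exist.
tokenizer.save_model("StackOBERTflow-comments-small-v1")
```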

The model is small: it has only 6 layers, and the maximum sequence length was restricted to 256 tokens. It was trained for 6 epochs on several GB of comments from the StackOverflow corpus.
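
A RoBERTa model with this shape could be instantiated as follows (a sketch under stated assumptions; the vocabulary size is assumed, and RoBERTa reserves two extra position embeddings beyond the sequence length):

```python
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=52_000,                 # assumption; must match the tokenizer
    num_hidden_layers=6,               # "small": 6 transformer layers
    max_position_embeddings=256 + 2,   # 256-token context plus RoBERTa's offset
)
model = RobertaForMaskedLM(config)
```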

## Quick start: masked language modeling prediction

```python
from transformers import pipeline
from pprint import pprint

COMMENT = "You really should not do it this way, I would use <mask> instead."

fill_mask = pipeline(
    "fill-mask",
    model="giganticode/StackOBERTflow-comments-small-v1",
    tokenizer="giganticode/StackOBERTflow-comments-small-v1"
)

pprint(fill_mask(COMMENT))
# [{'score': 0.019997311756014824,
#   'sequence': '<s> You really should not do it this way, I would use jQuery instead.</s>',
#   'token': 1738},
#  {'score': 0.01693696901202202,
#   'sequence': '<s> You really should not do it this way, I would use arrays instead.</s>',
#   'token': 2844},
#  {'score': 0.013411642983555794,
#   'sequence': '<s> You really should not do it this way, I would use CSS instead.</s>',
#   'token': 2254},
#  {'score': 0.013224546797573566,
#   'sequence': '<s> You really should not do it this way, I would use it instead.</s>',
#   'token': 300},
#  {'score': 0.011984303593635559,
#   'sequence': '<s> You really should not do it this way, I would use classes instead.</s>',
#   'token': 1779}]
```
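
Beyond the pipeline, the checkpoint can also be loaded directly to inspect the logits at the masked position (a minimal sketch using the standard `transformers` API):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "giganticode/StackOBERTflow-comments-small-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

inputs = tokenizer(
    "You really should not do it this way, I would use <mask> instead.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits

# Find the <mask> position and print the top-5 candidate tokens.
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_idx].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top5))
```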