metadata
license: apache-2.0
base_model: distilgpt2
tags:
  - generated_from_trainer
model-index:
  - name: distilgpt2-finetuned-stories
    results: []
language:
  - en
metrics:
  - perplexity
pipeline_tag: text-generation

distilgpt2-finetuned-stories

This model is a fine-tuned version of distilgpt2 on the demelin/understanding_fables dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3089
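The metadata lists perplexity as the metric, but only the loss is reported. Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language models, though the card does not state it), the corresponding perplexity can be derived as a rough sketch:

```python
import math

# Evaluation loss reported above; assumed to be mean per-token
# cross-entropy measured in nats.
eval_loss = 3.3089

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 27.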

Autoregressive and Prefix Language Modelling

Language modelling, and text generation in particular, works by predicting the next token from the tokens that precede it.

Autoregressive modelling is built on this principle: it predicts the next token (here, a word) from the tokens preceding it. In the simplest case we take P(w_i | w_{i-1}), where w_i is the next word, w_{i-1} is the token preceding it, and P is the probability of generating w_i given w_{i-1}. More generally, the model conditions on all preceding tokens, P(w_i | w_1, ..., w_{i-1}).
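To make the single-token case concrete, here is a minimal sketch of a bigram model that estimates P(w_i | w_{i-1}) from counts and generates text greedily. The corpus and the helper names `next_word_probs` and `generate` are invented for illustration; this is not how distilgpt2 works internally, since it conditions on the whole preceding sequence through a neural network.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only (not taken from the training data).
corpus = ("the fox saw the crow . the crow saw the cheese . "
          "the fox took the cheese .").split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(w_i | w_{i-1} = prev) from the bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def generate(start, max_words=5):
    """Greedy autoregressive generation: append the most likely next word."""
    words = [start]
    for _ in range(max_words):
        probs = next_word_probs(words[-1])
        if not probs:  # no known continuation for this word
            break
        words.append(max(probs, key=probs.get))
    return words

print(next_word_probs("the"))
print(" ".join(generate("the")))
```

Each generated word is fed back in as the context for the next prediction, which is the defining loop of autoregressive generation.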

In prefix language modelling, by contrast, the input is passed to the model as context for generating the next tokens, and the model computes the conditional probability of the next word given that context: P(w | x), where w is the next token, x is the context, and P is the probability of w given x.
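One common way this distinction shows up in Transformer models is in the attention mask: a causal model lets each position attend only to earlier positions, while a prefix language model lets the context x attend to itself in both directions and keeps the generated part causal. The sketch below illustrates that idea with plain lists; the function names are hypothetical, and the card does not say that this model was trained with a prefix mask (distilgpt2 itself uses the causal pattern).

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def prefix_lm_mask(seq_len, prefix_len):
    """Every position sees the whole prefix; positions after it stay causal."""
    return [[j <= i or j < prefix_len for j in range(seq_len)]
            for i in range(seq_len)]

def show(mask):
    """Render a mask as rows of 1 (may attend) and 0 (blocked)."""
    return "\n".join(" ".join("1" if cell else "0" for cell in row) for row in mask)

print("causal:")
print(show(causal_mask(4)))
print("prefix LM (prefix_len=2):")
print(show(prefix_lm_mask(4, prefix_len=2)))
```

With `prefix_len=0` the prefix mask reduces to the causal mask, which shows that the autoregressive case is the special case with an empty context.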

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
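As a rough illustration of what the linear scheduler does with these settings, the sketch below decays the learning rate from 2e-05 to zero over 60 optimizer steps, the final step count shown in the training results of this card. The zero-warmup setting is an assumption (no warmup is listed), and `linear_lr` is a hypothetical helper, not a Transformers function.

```python
BASE_LR = 2e-05    # learning_rate from the list above
TOTAL_STEPS = 60   # last step reported in the training results of this card
WARMUP_STEPS = 0   # assumption: the card lists no warmup

def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start and at the end of each of the three epochs.
for step in (0, 20, 40, 60):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate falls by a third of its starting value in each epoch.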

Training results

Training Loss   Epoch   Step   Validation Loss
No log          1.0     20     3.4065
No log          2.0     40     3.3288
No log          3.0     60     3.3089

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0