license: apache-2.0
language:
  - en
metrics:
  - rouge
  - bleu
library_name: transformers

Model Card for keyphrase-abstraction-t5-small


Model Details

Model Description

  • Developed by: விபின்
  • Model type: T5-small
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: T5-small

Uses

This model generates both extractive and abstractive keyphrases for a given piece of text. Prepend the task prefix "find keyphrase: " to the input to get the intended output.

Bias, Risks, and Limitations

The model's output depends entirely on its input. If it is given harmful text, the keyphrases it returns will reflect that content.

How to Get Started with the Model

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_dir = "rv2307/keyphrase-abstraction-t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
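# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def generate(text):
    # The model expects the task prefix in front of the input text.
    text = "find keyphrase: " + text
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=100,
            use_cache=True,
        )

    # Decode the single generated sequence into a plain string.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)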

content = "Use of BICs by businesses has been recommended by the Task Force on Nature-related Financial Disclosures[2] and the first provider of BICs for sale is Botanic Gardens Conservation International (BGCI). The credits are generated by BGCI's international member organisations by rebuilding the populations of tree species at high risk of extinction under the IUCN Red List methodology.[3]"
outputs = generate(content)
print(outputs)
"""
[
  "BICs for businesses",
  "Task Force on Naturerelated Financial Disclosures",
  "Botanic Gardens Conservation International (BGCI)",
  "Rebuilding tree species at high risk",
  "IUCN Red List methodology",
  "Credits generated by BGCI",
  "International member organisations"
]
"""
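
The `generate` function above returns one decoded string, not a Python list. The sketch below shows one possible way to turn that string into a list of keyphrases. It assumes the model emits a JSON-style list like the sample output; `parse_keyphrases` is a hypothetical helper and not part of the model or of `transformers`.

```python
import ast
import json
from typing import List


def parse_keyphrases(raw: str) -> List[str]:
    """Convert the decoded model output into a list of keyphrase strings.

    Assumption: the output looks like a JSON or Python list of strings.
    """
    raw = raw.strip()
    # Try strict JSON first, then a Python list literal.
    for loader in (json.loads, ast.literal_eval):
        try:
            parsed = loader(raw)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed if str(item).strip()]
    # Fallback for truncated or free-form output: split on commas and
    # strip leftover quotes and brackets from each piece.
    junk = " \"'[]\n"
    return [part.strip(junk) for part in raw.split(",") if part.strip(junk)]
```

The comma fallback is a deliberate simplification: it lets output cut off by `max_length` still yield something usable, but it will split any keyphrase that itself contains a comma.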

Training Details

Training Data

The training data consists mostly of open-source keyphrase datasets that are already available on the Hugging Face Hub.

Training Procedure

This model was fine-tuned for 6 epochs on about 40k examples collected from the internet.
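
The preprocessing used for fine-tuning is not described in this card. As a rough illustration only, the sketch below shows how a source text and its keyphrases could be turned into an input/target pair that matches the task prefix and the list-style output shown earlier. The function name, the field names `input_text` and `target_text`, and the JSON target format are all assumptions.

```python
import json
from typing import Dict, List

TASK_PREFIX = "find keyphrase: "  # prefix named in the Uses section


def build_example(text: str, keyphrases: List[str]) -> Dict[str, str]:
    """Build one hypothetical seq2seq training pair.

    Assumption: the target is the keyphrase list serialized as JSON,
    which mirrors the sample output but is not confirmed by the author.
    """
    return {
        "input_text": TASK_PREFIX + text.strip(),
        "target_text": json.dumps(keyphrases, ensure_ascii=False),
    }


example = build_example(
    "The credits are generated by BGCI's member organisations.",
    ["Credits generated by BGCI", "International member organisations"],
)
print(example["input_text"])
print(example["target_text"])
```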

Results

Epoch	Training Loss	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum	Gen Len
1	0.105800	0.087497	43.840900	19.029900	40.303200	40.320300	16.306200
2	0.097600	0.081029	46.335000	21.246800	42.377400	42.387500	16.404900
3	0.091800	0.077546	47.721200	22.467200	43.622400	43.632000	16.308200
4	0.087600	0.075441	48.633700	23.351300	44.493800	44.504300	16.359000
5	0.088200	0.074088	48.977500	23.747000	44.804900	44.813200	16.300500
6	0.084900	0.073381	49.347300	24.029500	45.097100	45.108300	16.332600
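
To make the Rouge1 column easier to interpret, here is a simplified sketch of a unigram-overlap F1 score, which is the idea behind ROUGE-1. It is not the evaluation code used for the table: it assumes lowercase whitespace tokenization with no stemming, and it returns a value between 0 and 1, whereas the table reports scores on a 0 to 100 scale.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("IUCN Red List", "IUCN Red List methodology")
print(round(score * 100, 2))  # scaled to match the table's 0-100 range
```

Here the prediction covers three of the four reference tokens, so precision is 1.0, recall is 0.75, and the F1 is 6/7.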