BART (large-sized model), fine-tuned on Amazon Reviews (English Language)

The base BART model (facebook/bart-large-cnn) had been trained on the CNN/DailyMail dataset; it was then re-trained on English-language Amazon purchase reviews. The goal was to build a pipeline that summarizes user reviews on Amazon.com.

Model description

According to Hugging Face, BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
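To make the corrupt-then-reconstruct idea concrete, here is a toy sketch of a noising function that replaces random spans of tokens with a single mask token. It is only an illustration: the function name `corrupt`, the `<mask>` string, and the parameters `mask_ratio`, `max_span`, and `seed` are assumptions made for this example, not the procedure actually used to pre-train BART.

```python
import random

MASK = "<mask>"  # hypothetical mask string, used only for this illustration


def corrupt(tokens, mask_ratio=0.3, max_span=3, seed=0):
    """Toy noising: replace random spans of tokens with a single mask token."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_ratio:
            # Hide a span of 1..max_span tokens behind one mask token.
            span = rng.randint(1, max_span)
            out.append(MASK)
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


original = "I really like this book about quantum computing".split()
noisy = corrupt(original)

# A denoising seq2seq model is trained to map the corrupted input
# back to the original text.
pair = {"input": " ".join(noisy), "target": " ".join(original)}
print(pair)
```

The encoder would read `pair["input"]` and the decoder would be trained to generate `pair["target"]`; the same input/target shape is what fine-tuning for summarization reuses, with a review as input and its summary as target.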

Datasets

Link: Amazon Reviews Corpus

Intended uses & limitations

This model is intended for summarizing English-language user reviews on websites.

How to use

Here is how to use this model with the pipeline API:

from transformers import pipeline
summarizer = pipeline("summarization", model="mabrouk/amazon-review-summarizer-bart")
review = """ I really like this book. It takes a step-by-step approach to introduce the reader to the IBM Q Experience, to the basics underlying quantum computing, and to the reality of the noise involved in the current machines. This introduction is technical and shows the user how to use the IBM system either directly through the GUI on their website or by running Python code on one's own machine. The text provides examples of small exercises to try and stimulates ideas of new things to try. The IBM Q Exp Qiskit software modules are identified and introduced - Terra, Aer, Ignis, and Aqua, as well as the backends that one can choose to do the computing. The book ends with two great chapters on quantum algorithms.
"""
print(summarizer(review, min_length=60))
# Output:
# [{'summary_text': 'This book is a great resource, and a great read, to learn about quantum and start writing your first programs, or to brush up on your programming skills. I loved that there is a quiz at the end of every chapter so you can check and see how...'}]
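When summarizing several reviews at once, a small wrapper can drop empty inputs and return plain strings instead of dictionaries. The sketch below is one possible way to do this; `summarize_reviews` is a hypothetical helper name, and it assumes the pipeline returns one dictionary per input with a `summary_text` key, as the summarization pipeline in `transformers` does. Adjust the key if your output differs.

```python
def summarize_reviews(summarizer, reviews, min_length=60):
    """Summarize each non-empty review and return the summaries as strings.

    `summarizer` is any callable that behaves like a transformers
    summarization pipeline: it takes a list of texts and returns a list
    of dicts, each with a "summary_text" key (assumed here).
    """
    # Skip None, empty, and whitespace-only reviews.
    cleaned = [r.strip() for r in reviews if r and r.strip()]
    if not cleaned:
        return []
    outputs = summarizer(cleaned, min_length=min_length)
    return [out["summary_text"] for out in outputs]


# Usage with the real pipeline (downloads the model on first run):
# from transformers import pipeline
# summarizer = pipeline("summarization", model="mabrouk/amazon-review-summarizer-bart")
# for summary in summarize_reviews(summarizer, [review]):
#     print(summary)
```

Passing the summarizer in as an argument keeps the helper easy to test with a stand-in function and avoids reloading the model on every call.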

Reference

Pre-trained model: facebook/bart-large-cnn
Re-trained dataset: Amazon Reviews Corpus