Text Summarization Model with Seq2Seq and LSTM
This model is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries from input articles. The model was trained on a dataset with sequences of length up to 800 tokens.
Dataset
CNN-DailyMail News Text Summarization dataset from Kaggle.
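A minimal loading sketch, assuming the Kaggle CSV layout (`train.csv`, `validation.csv`) and the `article`/`highlights` column names; adjust paths and names to match your copy of the dataset:

```python
import pandas as pd

# Assumed file names and column names for the Kaggle CNN-DailyMail CSVs.
train_df = pd.read_csv("cnn_dailymail/train.csv")
val_df = pd.read_csv("cnn_dailymail/validation.csv")

articles = train_df["article"].astype(str)      # model inputs
summaries = train_df["highlights"].astype(str)  # reference summaries (targets)
```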
Model Architecture
Encoder
- Input Layer: Takes input sequences of length `max_len_article` (800 tokens).
- Embedding Layer: Converts input sequences into dense vectors of size 100.
- Bidirectional LSTM Layer: Processes the embedded input, capturing dependencies in both forward and backward directions. Outputs hidden and cell states from both directions.
- State Concatenation: Combines forward and backward hidden and cell states to form the final encoder states.
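A sketch of how this encoder can be built in Keras. The sequence length, embedding size, and LSTM width come from the model summary below; the input vocabulary size (476,199) is inferred from the embedding parameter count and is therefore an assumption:

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate

max_len_article = 800   # input_1 shape is (None, 800)
embedding_dim = 100
latent_dim = 100        # per direction; concatenated states have size 200
x_vocab_size = 476199   # assumed: 47,619,900 embedding params / 100 dims

# Embed the article tokens, then run a Bidirectional LSTM that also
# returns the forward and backward hidden/cell states.
encoder_inputs = Input(shape=(max_len_article,))
enc_emb = Embedding(x_vocab_size, embedding_dim)(encoder_inputs)
encoder_outputs, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(latent_dim, return_state=True)
)(enc_emb)

# Concatenate forward and backward states to form the decoder's initial state.
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])
```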
Decoder
- Input Layer: Takes target sequences of variable length.
- Embedding Layer: Converts target sequences into dense vectors of size 100.
- LSTM Layer: Processes the embedded target sequences using an LSTM with the initial states set to the encoder states.
- Dense Layer: Applies a Dense layer with softmax activation to generate the probabilities for each word in the vocabulary.
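Continuing the sketch above (it reuses `state_h`, `state_c`, `embedding_dim`, and `latent_dim` from the encoder block), the decoder and the combined training model might look like this; the target vocabulary size is taken from the Dense output shape in the model summary:

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

y_vocab_size = 155158   # from the Dense output shape (None, None, 155158)

# Embed the target tokens and run a 200-unit LSTM so its state size
# matches the concatenated encoder states.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(y_vocab_size, embedding_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(
    2 * latent_dim, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])

# Project every decoder step onto the target vocabulary.
decoder_outputs = Dense(y_vocab_size, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```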
Model Summary
Layer (type) | Output Shape | Param # | Connected to |
---|---|---|---|
input_1 (InputLayer) | [(None, 800)] | 0 | - |
embedding (Embedding) | (None, 800, 100) | 47,619,900 | input_1[0][0] |
bidirectional (Bidirectional) | [(None, 200), (None, 100), (None, 100), (None, 100), (None, 100)] | 160,800 | embedding[0][0] |
input_2 (InputLayer) | [(None, None)] | 0 | - |
embedding_1 (Embedding) | (None, None, 100) | 15,515,800 | input_2[0][0] |
concatenate (Concatenate) | (None, 200) | 0 | bidirectional[0][1], bidirectional[0][3] |
concatenate_1 (Concatenate) | (None, 200) | 0 | bidirectional[0][2], bidirectional[0][4] |
lstm (LSTM) | [(None, None, 200), (None, 200), (None, 200)] | 240,800 | embedding_1[0][0], concatenate[0][0], concatenate_1[0][0] |
dense (Dense) | (None, None, 155158) | 31,186,758 | lstm[0][0] |
Total params: 94,724,060
Trainable params: 94,724,058
Non-trainable params: 0
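As a sanity check, the layer parameter counts follow directly from these dimensions: the Dense layer has 200 × 155,158 weights plus 155,158 biases, giving 31,186,758 parameters, and the bidirectional encoder LSTM has 2 × 4 × (100 × 100 + 100 × 100 + 100) = 160,800.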
Training
The model was trained on a dataset with sequences of length up to 800 tokens using the following configuration:
- Optimizer: Adam
- Loss Function: Categorical Crossentropy
- Metrics: Accuracy
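A hedged sketch of the corresponding compile and fit calls. Only the optimizer, loss, metric, and epoch count are from this card; the tensor names, batch size, and one-hot target encoding are assumptions:

```python
# Optimizer, loss, and metric as listed above.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    [x_train, y_train_in],   # encoder inputs and teacher-forced decoder inputs (assumed names)
    y_train_out,             # decoder targets shifted by one step, one-hot encoded (assumed)
    validation_data=([x_val, y_val_in], y_val_out),
    epochs=5,
    batch_size=64,           # assumption; not stated in the card
)
```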
Training Loss and Validation Loss
Epoch | Training Loss | Validation Loss | Time per Epoch (s) |
---|---|---|---|
1 | 3.9044 | 0.4543 | 3087 |
2 | 0.3429 | 0.0976 | 3091 |
3 | 0.1054 | 0.0427 | 3096 |
4 | 0.0490 | 0.0231 | 3099 |
5 | 0.0203 | 0.0148 | 3098 |
Test Loss
Test Loss |
---|
0.014802712015807629 |
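A test loss like this comes from evaluating the trained model on the held-out split; the tensor names below are assumptions:

```python
# Evaluate on the test split; variable names are illustrative.
test_loss, test_acc = model.evaluate([x_test, y_test_in], y_test_out, verbose=1)
print(f"Test loss: {test_loss:.4f}")
```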
Usage (I will update this section soon)
To use this model, you can load it with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Replace 'your-model-name' with the published checkpoint id.
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

article = "Your input text here."
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```