sumedh/lstm-seq2seq


sumedh committed on Jun 25, 2022

Commit 2dd7365 · 1 Parent(s): db44e0a

Update README.md

Files changed (1)

README.md +24 -5

README.md CHANGED

@@ -1,22 +1,34 @@
 ---
 library_name: keras
 tags:
 - translation
 ---
-## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
@@ -26,6 +38,13 @@ The following hyperparameters were used during training:
 |----|-------------|-----|---|--------|-------|--------|------------------|
 |RMSprop|0.0010000000474974513|0.0|0.8999999761581421|0.0|1e-07|False|float32|
  ## Model Plot
 <details>

 ---
 library_name: keras
+license: apache-2.0
 tags:
+- seq2seq
 - translation
+language:
+- en
+- fr
 ---
+## Keras implementation of a character-level recurrent sequence-to-sequence model
+This repo contains the model and the notebook for [the Keras example on character-level recurrent sequence-to-sequence models](https://keras.io/examples/nlp/lstm_seq2seq/).
+Full credit to: [fchollet](https://twitter.com/fchollet)
+Model reproduced by: [Sumedh](https://huggingface.co/sumedh)
 ## Intended uses & limitations
+This model implements a basic character-level recurrent sequence-to-sequence network for translating short English sentences into short French sentences, character-by-character. Note that it is fairly unusual to do character-level machine translation, as word-level models are more common in this domain. It works best on text of length <= 15 characters.
 ## Training and evaluation data
+English to French translation data from
+https://www.manythings.org/anki/
 ## Training procedure
+- We start with input sequences from a domain (e.g. English sentences) and corresponding target sequences from another domain (e.g. French sentences).
+- An encoder LSTM turns input sequences into two state vectors (we keep the last LSTM state and discard the outputs).
+- A decoder LSTM is trained to turn the target sequences into the same sequence but offset by one timestep in the future, a training process called "teacher forcing" in this context. It uses as initial state the state vectors from the encoder. Effectively, the decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence.
+- In inference mode, when we want to decode unknown input sequences, we:
+    - Encode the input sequence into state vectors.
+    - Start with a target sequence of size 1 (just the start-of-sequence character).
+    - Feed the state vectors and 1-char target sequence to the decoder to produce predictions for the next character.
+    - Sample the next character using these predictions (we simply use argmax).
+    - Append the sampled character to the target sequence.
+    - Repeat until we generate the end-of-sequence character or we hit the character limit.
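The sketch below is not part of the committed README. It illustrates the teacher-forcing offset and the greedy (argmax) decoding loop described in the list above, in plain Python with no Keras dependency. The names `teacher_forcing_pair`, `greedy_decode`, `toy_encode`, and `toy_decoder_step` are hypothetical, the toy "model" is just a lookup table standing in for the encoder and decoder LSTMs, and the tab and newline markers are assumed start-of-sequence and end-of-sequence characters. Instead of re-feeding the whole growing target sequence, the loop feeds only the last sampled character and carries the decoder state forward, which is one common way to realise the same procedure.

```python
from typing import Callable, Dict, Tuple

SOS = "\t"  # assumed start-of-sequence character
EOS = "\n"  # assumed end-of-sequence character


def teacher_forcing_pair(target_text: str) -> Tuple[str, str]:
    """Return (decoder_input, decoder_target), offset by one timestep."""
    wrapped = SOS + target_text + EOS
    # At step t the decoder sees wrapped[t] and must predict wrapped[t + 1].
    return wrapped[:-1], wrapped[1:]


def greedy_decode(
    encode: Callable[[str], object],
    decoder_step: Callable[[str, object], Tuple[Dict[str, float], object]],
    input_text: str,
    max_len: int,
) -> str:
    """Decode one sequence by taking the argmax character at every step."""
    states = encode(input_text)  # encode the input into state vectors
    current_char = SOS           # target sequence starts as just SOS
    decoded = []
    while len(decoded) < max_len:  # stop at the character limit
        probs, states = decoder_step(current_char, states)
        next_char = max(probs, key=probs.get)  # argmax over characters
        if next_char == EOS:  # stop at end-of-sequence
            break
        decoded.append(next_char)
        current_char = next_char
    return "".join(decoded)


# Hypothetical stand-ins for the trained encoder and decoder.
TOY_TABLE = {"Hi.": "Salut."}


def toy_encode(text: str) -> Tuple[str, int]:
    # State = (memorised reply, position within that reply).
    return (TOY_TABLE.get(text, ""), 0)


def toy_decoder_step(prev_char: str, state: Tuple[str, int]):
    reply, pos = state
    next_char = reply[pos] if pos < len(reply) else EOS
    # All probability mass on one character keeps the toy deterministic.
    return {next_char: 1.0}, (reply, pos + 1)


print(teacher_forcing_pair("Salut."))  # ('\tSalut.', 'Salut.\n')
print(greedy_decode(toy_encode, toy_decoder_step, "Hi.", max_len=20))  # Salut.
```

With a real model, `encode` and `decoder_step` would wrap the trained encoder and decoder, and `probs` would be the softmax output over the target character set.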
 ### Training hyperparameters
 |----|-------------|-----|---|--------|-------|--------|------------------|
 |RMSprop|0.0010000000474974513|0.0|0.8999999761581421|0.0|1e-07|False|float32|
+```python
+batch_size = 64  # Batch size for training.
+epochs = 100  # Number of epochs to train for.
+latent_dim = 256  # Latent dimensionality of the encoding space.
+num_samples = 10000  # Number of samples to train on.
+```
  ## Model Plot
 <details>