ngia committed on
Commit ca09c81 · 1 Parent(s): d91ea77

deploy on hugging face spaces for inference

  1. README.md +8 -73
README.md CHANGED
@@ -1,73 +1,8 @@
- # Training Transformers from Scratch for Language Translation (English to French)
-
- ## Overview
- This project focuses on training a Transformer model from scratch to perform English-to-French translation. It follows a structured approach, from data collection to model deployment using Gradio.
-
- ![Transformer Architecture](https://dassignies.law/wp-content/uploads/2024/04/DASSIGNIES-avocat-intelligence-artificielle-cybersecurite-strategie-protection-actifs-immateriels-formations-expertises-blog-transformer-architecture.webp)
-
- ## Project Steps
-
- 1. **Data Collection**
- - Gather parallel English-French text data for training.
-
- 2. **Dataset Creation and Upload to Hugging Face**
- - Preprocess and structure the dataset.
- - Upload the dataset to the Hugging Face Hub for easy access.
-
- 3. **Training Tokenizers**
- - Train separate tokenizers for English and French.
- - Save and store trained tokenizers.
-
- 4. **Creating a Tokenized Dataset**
- - Tokenize the dataset using the trained tokenizers.
- - Publish the tokenized dataset on Hugging Face.
-
- 5. **Building the Transformer Model from Scratch**
- - Implement custom Transformer components, including:
- - Encoder
- - Decoder
- - Embedding Layer
- - Positional Encoding
-
- 6. **Model Training and Evaluation**
- - Train the model using the prepared dataset.
- - Use Weights & Biases (Wandb) for real-time metric visualization.
-
- 7. **Inference**
- - Test the trained model with sample English inputs.
- - Generate translated French text.
-
- 8. **Web Interface with Gradio**
- - Develop an interactive UI using Gradio for easy model inference.
-
- ## Installation
-
- To use the application, install the required dependencies using either `uv` or `pip`:
-
- Using `uv`:
- ```bash
- uv pip install -r requirements.txt
- ```
-
- Using `pip`:
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Running the Application
-
- To launch the application, run:
- ```bash
- python app.py
- ```
-
- This will start a Gradio interface where users can input English text and receive French translations.
-
- ## Repository Structure
- - **data_collector.py** - Script for data collection.
- - **tokenize_dataset.py** - Prepares and tokenizes dataset.
- - **model.py** - Contains the Transformer model implementation.
- - **train.py** - Training script.
- - **inference.py** - Inference script for model predictions.
- - **app.py** - Web interface with Gradio.
- - **requirements.txt** - List of dependencies.
+ title: Test Space
+ emoji: 📈
+ colorFrom: red
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.21.0
+ app_file: app.py
+ pinned: false
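
Hugging Face Spaces reads these settings from a YAML block at the top of README.md, delimited by `---` lines. The diff lists only the eight key/value lines, so this view cannot confirm whether the committed file includes the delimiters. The following is a minimal sketch, using only the standard library, of how one might check a README for a well-formed block. The helper name `parse_space_metadata` and the `ADDED_LINES` constant are hypothetical and not part of this repository, and the parser handles flat `key: value` pairs only, not full YAML.

```python
def parse_space_metadata(readme_text: str) -> dict[str, str]:
    """Return the key/value pairs from a README's leading `---` block.

    Hypothetical helper for illustration. Raises ValueError when the README
    does not open with a `---` line or the block is never closed.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md does not start with a '---' line")

    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        if not line.strip() or line.lstrip().startswith("#"):
            # Skip blank lines and YAML comments.
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()

    raise ValueError("metadata block is missing its closing '---' line")


# The eight lines added by this commit, copied from the diff.
ADDED_LINES = """title: Test Space
emoji: 📈
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.21.0
app_file: app.py
pinned: false"""

# Wrapping the lines in `---` delimiters makes the block parseable.
wrapped = f"---\n{ADDED_LINES}\n---\n"
meta = parse_space_metadata(wrapped)
print(meta["sdk"], meta["app_file"])
```

Calling `parse_space_metadata(ADDED_LINES)` on the undelimited text raises `ValueError`, which is the case worth checking before pushing a Space's README.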