deploy on hugging face spaces for inference
README.md
CHANGED
@@ -1,73 +1,8 @@
-
-
-
-
-
-
-
-
-
-1. **Data Collection**
-   - Gather parallel English-French text data for training.
-
-2. **Dataset Creation and Upload to Hugging Face**
-   - Preprocess and structure the dataset.
-   - Upload the dataset to the Hugging Face Hub for easy access.
-
-3. **Training Tokenizers**
-   - Train separate tokenizers for English and French.
-   - Save and store trained tokenizers.
-
-4. **Creating a Tokenized Dataset**
-   - Tokenize the dataset using the trained tokenizers.
-   - Publish the tokenized dataset on Hugging Face.
-
-5. **Building the Transformer Model from Scratch**
-   - Implement custom Transformer components, including:
-     - Encoder
-     - Decoder
-     - Embedding Layer
-     - Positional Encoding
-
-6. **Model Training and Evaluation**
-   - Train the model using the prepared dataset.
-   - Use Weights & Biases (Wandb) for real-time metric visualization.
-
-7. **Inference**
-   - Test the trained model with sample English inputs.
-   - Generate translated French text.
-
-8. **Web Interface with Gradio**
-   - Develop an interactive UI using Gradio for easy model inference.
-
-## Installation
-
-To use the application, install the required dependencies using either `uv` or `pip`:
-
-Using `uv`:
-```bash
-uv pip install -r requirements.txt
-```
-
-Using `pip`:
-```bash
-pip install -r requirements.txt
-```
-
-## Running the Application
-
-To launch the application, run:
-```bash
-python app.py
-```
-
-This will start a Gradio interface where users can input English text and receive French translations.
-
-## Repository Structure
-- **data_collector.py** - Script for data collection.
-- **tokenize_dataset.py** - Prepares and tokenizes dataset.
-- **model.py** - Contains the Transformer model implementation.
-- **train.py** - Training script.
-- **inference.py** - Inference script for model predictions.
-- **app.py** - Web interface with Gradio.
-- **requirements.txt** - List of dependencies.
+title: Test Space
+emoji: 📈
+colorFrom: red
+colorTo: green
+sdk: gradio
+sdk_version: 5.21.0
+app_file: app.py
+pinned: false