Sadjad Alikhani committed on
Commit • be1a7b3
1 Parent(s): e8c0c9d
Update README.md
README.md CHANGED
@@ -35,14 +35,14 @@ Once you have Conda or Mamba installed, follow these steps to create a new envir

 #### **Step 1: Create a new environment**

-You can create a new environment called `lwm_env` (or any other name) with Python 3.

 ```bash
 # If you're using Conda:
-conda create -n lwm_env python=3.

 # If you're using Mamba:
-mamba create -n lwm_env python=3.
 ```

 #### **Step 2: Activate the environment**
@@ -56,82 +56,76 @@ conda activate lwm_env

 ---

-

-

-```
-
-
-
-import importlib.util
-import torch

-#
-

-
-clone_dir = "./LWM"

-
-
-
-result = subprocess.run(["git", "clone", repo_url, clone_dir], capture_output=True, text=True)

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-#
-
-
-
-import_functions_from_file("input_preprocess", os.path.join(clone_dir, "input_preprocess.py"))
-print("All required functions imported successfully.")
 ```

 ---

-###

-

-
-
-
-pip install -r requirements.txt

-
-
-
 ```

-This will install **PyTorch**, **Torchvision**, and other required dependencies from the `requirements.txt` file in the cloned repository.
-
 ---

-###

 Before proceeding with tokenization and data processing, the **DeepMIMO** dataset—or any dataset generated using the operational settings outlined below—must first be loaded. The table below provides a list of available datasets and their respective links for further details:

@@ -155,100 +149,62 @@ The operational settings below were used in generating the datasets for both the
 - **Antennas at UEs**: 1
 - **Subcarriers**: 32
 - **Paths**: 20
-
-#### **Load Data Code**:
-Select and load specific datasets by adjusting the `dataset_idxs`. In the example below, we select the first two datasets.
-
 ```python
-# Step
-
-
-
-
-
-
-
-
 ```

 ---

-###
-
-After loading the data, tokenize the selected **DeepMIMO** datasets. This step prepares the data for the model to process.
-
-#### **Tokenization Code**:
-
 ```python
-
-
-
-
-
-print("Dataset tokenized successfully.")
 ```

 ---

-###

-

-```python
-# Step 7: Load the LWM model (with flexibility for the device)
 device = 'cuda' if torch.cuda.is_available() else 'cpu'
 print(f"Loading the LWM model on {device}...")
-model =
-```
-
----
-
-### 8. **LWM Inference**
-
-Once the dataset is tokenized and the model is loaded, generate either **raw channels** or the **inferred LWM embeddings** by choosing the input type.
-
-```python
-# Step 8: Generate the dataset for inference
-input_type = ['cls_emb', 'channel_emb', 'raw'][1] # Modify input type as needed
-dataset = dataset_gen(preprocessed_chs, input_type, model)
 ```

-You can choose between:
-- `cls_emb`: LWM CLS token embeddings
-- `channel_emb`: LWM channel embeddings
-- `raw`: Raw wireless channel data
-
 ---

-###
-
-9. **Post-processing for Downstream Task**
-
-#### **Use the Dataset in Downstream Tasks**
-
-Finally, use the generated dataset for your downstream tasks, such as classification, prediction, or analysis.
-
 ```python
-
-
-
-
-
-
-
-## 📋 **Requirements**
-
-- **Python 3.x**
-- **PyTorch**
-- **Git**

 ---
-
-### Summary of Steps:
-
-1. **Install Conda/Mamba**: Install a package manager for environment management.
-2. **Create Environment**: Use Conda or Mamba to create a new environment.
-3. **Clone the Repository**: Download the project files from Hugging Face.
-4. **Install Packages**: Install PyTorch and other dependencies.
-5. **Load and Tokenize Data**: Load the DeepMIMO dataset and prepare it for the model.
-6. **Load Model and Perform Inference**: Use the LWM model for generating embeddings or raw channels.

@@ -35,14 +35,14 @@

 #### **Step 1: Create a new environment**

+You can create a new environment called `lwm_env` (or any other name) with Python 3.12 or any required version:

 ```bash
 # If you're using Conda:
+conda create -n lwm_env python=3.12

 # If you're using Mamba:
+mamba create -n lwm_env python=3.12
 ```

 #### **Step 2: Activate the environment**

@@ -56,82 +56,76 @@

 ---

+#### **Step 3: Install Required Packages**

+Install the necessary packages inside your new environment.

+```bash
+# If you're using Conda:
+conda install pytorch torchvision torchaudio -c pytorch
+pip install -r requirements.txt

+# If you're using Mamba:
+mamba install pytorch torchvision torchaudio -c pytorch
+pip install -r requirements.txt
+```
+---

+### 2. **Required Functions to Clone Datasets**

+```python
+import subprocess
+import os

+# Function to clone a specific dataset scenario folder
+def clone_dataset_scenario(scenario_name, repo_url, model_repo_dir="./LWM", scenarios_dir="scenarios"):
+    # Create the scenarios directory if it doesn't exist
+    scenarios_path = os.path.join(model_repo_dir, scenarios_dir)
+    if not os.path.exists(scenarios_path):
+        os.makedirs(scenarios_path)
+
+    scenario_path = os.path.join(scenarios_path, scenario_name)
+
+    # Initialize sparse checkout for the dataset repository
+    if not os.path.exists(os.path.join(scenarios_path, ".git")):
+        print(f"Initializing sparse checkout in {scenarios_path}...")
+        subprocess.run(["git", "clone", "--sparse", repo_url, "."], cwd=scenarios_path, check=True)
+        subprocess.run(["git", "sparse-checkout", "init", "--cone"], cwd=scenarios_path, check=True)
+        subprocess.run(["git", "lfs", "install"], cwd=scenarios_path, check=True) # Install Git LFS if needed
+
+    # Add the requested scenario folder to sparse checkout
+    print(f"Adding {scenario_name} to sparse checkout...")
+    subprocess.run(["git", "sparse-checkout", "add", scenario_name], cwd=scenarios_path, check=True)
+
+    # Pull large files if needed (using Git LFS)
+    subprocess.run(["git", "lfs", "pull"], cwd=scenarios_path, check=True)
+
+    print(f"Successfully cloned {scenario_name} into {scenarios_path}.")
+
+# Function to clone multiple dataset scenarios
+def clone_dataset_scenarios(selected_scenario_names, dataset_repo_url, model_repo_dir):
+    for scenario_name in selected_scenario_names:
+        clone_dataset_scenario(scenario_name, dataset_repo_url, model_repo_dir)
 ```

 ---

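The sparse-checkout flow in `clone_dataset_scenario` may be easier to follow as a plain list of Git commands. The sketch below only builds that list and runs nothing. `sparse_checkout_commands` and its `initialized` flag are hypothetical names introduced here for illustration and are not part of the LWM repository; the Git subcommands are the ones the function above passes to `subprocess.run`.

```python
from typing import List


def sparse_checkout_commands(scenario_name: str, repo_url: str,
                             initialized: bool) -> List[List[str]]:
    """Return the Git commands for fetching one scenario folder.

    Hypothetical helper: it only builds argument lists, it does not run Git.
    """
    commands: List[List[str]] = []
    if not initialized:
        # First use: sparse clone into the current directory, enable cone
        # mode, and set up Git LFS.
        commands.append(["git", "clone", "--sparse", repo_url, "."])
        commands.append(["git", "sparse-checkout", "init", "--cone"])
        commands.append(["git", "lfs", "install"])
    # Every call: add the requested folder, then download its LFS files.
    commands.append(["git", "sparse-checkout", "add", scenario_name])
    commands.append(["git", "lfs", "pull"])
    return commands


if __name__ == "__main__":
    url = "https://huggingface.co/datasets/sadjadalikhani/lwm"
    for cmd in sparse_checkout_commands("city_12_fortworth", url, initialized=False):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything touches the network.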
+### 3. **Clone the Model**

+```python

+# Step 1: Clone the model repository (if not already cloned)
+model_repo_url = "https://huggingface.co/sadjadalikhani/lwm"
+model_repo_dir = "./LWM"

+if not os.path.exists(model_repo_dir):
+    print(f"Cloning model repository from {model_repo_url}...")
+    subprocess.run(["git", "clone", model_repo_url, model_repo_dir], check=True)
 ```

 ---

+### 4. **Clone the Desired Datasets**

 Before proceeding with tokenization and data processing, the **DeepMIMO** dataset—or any dataset generated using the operational settings outlined below—must first be loaded. The table below provides a list of available datasets and their respective links for further details:


@@ -155,100 +149,62 @@
 - **Antennas at UEs**: 1
 - **Subcarriers**: 32
 - **Paths**: 20
+
 ```python
+# Step 2: Clone specific dataset scenario folder(s) inside the "scenarios" folder
+dataset_repo_url = "https://huggingface.co/datasets/sadjadalikhani/lwm" # Base URL for dataset repo
+scenario_names = np.array(["city_18_denver",
+                           "city_15_indianapolis",
+                           "city_19_oklahoma",
+                           "city_12_fortworth",
+                           "city_11_santaclara",
+                           "city_7_sandiego"]
+                         )
+scenario_idxs = np.array([3])
+selected_scenario_names = scenario_names[scenario_idxs]
+
+# Clone the requested scenario folders (this will clone every time)
+clone_dataset_scenarios(selected_scenario_names, dataset_repo_url, model_repo_dir)
 ```

 ---

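The snippet above selects scenarios with NumPy index arrays, so it relies on NumPy being imported as `np`; no such import appears in the lines shown in this diff. As a rough illustration of the same selection without that dependency, the sketch below uses a plain list. `pick_scenarios` is a hypothetical helper, not part of the LWM code, and unlike NumPy indexing it rejects negative indices.

```python
from typing import List, Sequence

# Scenario names copied from the README snippet above.
SCENARIO_NAMES = [
    "city_18_denver",
    "city_15_indianapolis",
    "city_19_oklahoma",
    "city_12_fortworth",
    "city_11_santaclara",
    "city_7_sandiego",
]


def pick_scenarios(names: Sequence[str], idxs: Sequence[int]) -> List[str]:
    """Return names[i] for each i in idxs, rejecting out-of-range indices."""
    for i in idxs:
        if not 0 <= i < len(names):
            raise IndexError(
                f"scenario index {i} is outside 0..{len(names) - 1}"
            )
    return [names[i] for i in idxs]


if __name__ == "__main__":
    # Index 3 is the fourth entry in the list.
    print(pick_scenarios(SCENARIO_NAMES, [3]))  # ['city_12_fortworth']
```

Validating indices up front gives a clearer error than a failed clone of a folder that does not exist.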
+### 5. **Change the working directory to LWM folder**
 ```python
+if os.path.exists(model_repo_dir):
+    os.chdir(model_repo_dir)
+    print(f"Changed working directory to {os.getcwd()}")
+else:
+    print(f"Directory {model_repo_dir} does not exist. Please check if the repository is cloned properly.")
 ```

 ---

+### 6. **Tokenize and Load the Model**
+```python
+from input_preprocess import tokenizer
+from lwm_model import lwm
+import torch

+preprocessed_chs = tokenizer(selected_scenario_names=selected_scenario_names,
+                             manual_data=None,
+                             gen_raw=True)

 device = 'cuda' if torch.cuda.is_available() else 'cpu'
 print(f"Loading the LWM model on {device}...")
+model = lwm.from_pretrained(device=device)
 ```

 ---

+### 7. **Perform Inference**
 ```python
+from inference import lwm_inference, create_raw_dataset
+input_types = ['cls_emb', 'channel_emb', 'raw']
+selected_input_type = input_types[0]
+if selected_input_type in ['cls_emb', 'channel_emb']:
+    dataset = lwm_inference(preprocessed_chs, selected_input_type, model, device)
+else:
+    dataset = create_raw_dataset(preprocessed_chs, device)

 ---
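In the inference snippet, any value other than `cls_emb` or `channel_emb` falls through to the raw-dataset branch, so a misspelled input type would silently produce raw channels. The sketch below shows one possible stricter dispatch. `route_input_type` is a hypothetical helper that only returns the name of the function the README would call; it does not import or run any LWM code.

```python
EMBEDDING_TYPES = ("cls_emb", "channel_emb")
RAW_TYPE = "raw"


def route_input_type(input_type: str) -> str:
    """Name the README function that handles this input type.

    Hypothetical helper: unknown values raise instead of defaulting to raw.
    """
    if input_type in EMBEDDING_TYPES:
        return "lwm_inference"
    if input_type == RAW_TYPE:
        return "create_raw_dataset"
    valid = ", ".join(EMBEDDING_TYPES + (RAW_TYPE,))
    raise ValueError(f"unknown input type {input_type!r}; expected one of: {valid}")


if __name__ == "__main__":
    for name in ("cls_emb", "channel_emb", "raw"):
        print(name, "->", route_input_type(name))
```

Raising on unknown values trades a little convenience for an early, explicit failure.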