wi-lab/lwm

@@ -35,14 +35,14 @@ Once you have Conda or Mamba installed, follow these steps to create a new envir
 #### **Step 1: Create a new environment**
-You can create a new environment called `lwm_env` (or any other name) with Python 3.9 or any required version:
 ```bash
 # If you're using Conda:
-conda create -n lwm_env python=3.9
 # If you're using Mamba:
-mamba create -n lwm_env python=3.9
 ```
 #### **Step 2: Activate the environment**
@@ -56,82 +56,76 @@ conda activate lwm_env
 ---
-### 3. **Clone the Repository**
-After setting up the environment, clone the Hugging Face repository to your local machine using the following Python code:
-```python
-import subprocess
-import os
-import sys
-import importlib.util
-import torch
-# Hugging Face public repository URL
-repo_url = "https://huggingface.co/sadjadalikhani/LWM"
-# Directory where the repo will be cloned
-clone_dir = "./LWM"
-# Step 1: Clone the repository if it hasn't been cloned already
-if not os.path.exists(clone_dir):
-    print(f"Cloning repository from {repo_url} into {clone_dir}...")
-    result = subprocess.run(["git", "clone", repo_url, clone_dir], capture_output=True, text=True)
-    if result.returncode != 0:
-        print(f"Error cloning repository: {result.stderr}")
-        sys.exit(1)
-    print(f"Repository cloned successfully into {clone_dir}")
-else:
-    print(f"Repository already cloned into {clone_dir}")
-# Step 2: Add the cloned directory to Python path
-sys.path.append(clone_dir)
-# Step 3: Import necessary functions
-def import_functions_from_file(module_name, file_path):
-    try:
-        spec = importlib.util.spec_from_file_location(module_name, file_path)
-        module = importlib.util.module_from_spec(spec)
-        spec.loader.exec_module(module)
-        for function_name in dir(module):
-            if callable(getattr(module, function_name)) and not function_name.startswith("__"):
-                globals()[function_name] = getattr(module, function_name)
-        return module
-    except FileNotFoundError:
-        print(f"Error: {file_path} not found!")
-        sys.exit(1)
-# Step 4: Import functions from the repository
-import_functions_from_file("lwm_model", os.path.join(clone_dir, "lwm_model.py"))
-import_functions_from_file("inference", os.path.join(clone_dir, "inference.py"))
-import_functions_from_file("load_data", os.path.join(clone_dir, "load_data.py"))
-import_functions_from_file("input_preprocess", os.path.join(clone_dir, "input_preprocess.py"))
-print("All required functions imported successfully.")
 ```
 ---
-### 4. **Install Required Packages**
-Install the necessary packages inside your new environment.
-```bash
-# If you're using Conda:
-conda install pytorch torchvision torchaudio -c pytorch
-pip install -r requirements.txt
-# If you're using Mamba:
-mamba install pytorch torchvision torchaudio -c pytorch
-pip install -r requirements.txt
 ```
-This will install **PyTorch**, **Torchvision**, and other required dependencies from the `requirements.txt` file in the cloned repository.
 ---
-### 5. **Load the DeepMIMO Dataset**
 Before proceeding with tokenization and data processing, the **DeepMIMO** dataset—or any dataset generated using the operational settings outlined below—must first be loaded. The table below provides a list of available datasets and their respective links for further details:
@@ -155,100 +149,62 @@ The operational settings below were used in generating the datasets for both the
 - **Antennas at UEs**: 1
 - **Subcarriers**: 32
 - **Paths**: 20
-#### **Load Data Code**:
-Select and load specific datasets by adjusting the `dataset_idxs`. In the example below, we select the first two datasets.
 ```python
-# Step 5: Load the DeepMIMO dataset
-print("Loading the DeepMIMO dataset...")
-# Load the DeepMIMO dataset
-deepmimo_data = load_DeepMIMO_data()
-# Select datasets to load
-dataset_idxs = torch.arange(2)  # Adjust the number of datasets as needed
-print("DeepMIMO dataset loaded successfully.")
 ```
 ---
-### 6. **Tokenize the DeepMIMO Dataset**
-After loading the data, tokenize the selected **DeepMIMO** datasets. This step prepares the data for the model to process.
-#### **Tokenization Code**:
 ```python
-# Step 6: Tokenize the dataset
-print("Tokenizing the DeepMIMO dataset...")
-# Tokenize the loaded datasets
-preprocessed_chs = tokenizer(deepmimo_data, dataset_idxs, gen_raw=True)
-print("Dataset tokenized successfully.")
 ```
 ---
-### 7. **Load the LWM Model**
-Once the dataset is tokenized, load the pre-trained **LWM** model using the following code:
-```python
-# Step 7: Load the LWM model (with flexibility for the device)
 device = 'cuda' if torch.cuda.is_available() else 'cpu'
 print(f"Loading the LWM model on {device}...")
-model = LWM.from_pretrained(device=device)
-```
----
-### 8. **LWM Inference**
-Once the dataset is tokenized and the model is loaded, generate either **raw channels** or the **inferred LWM embeddings** by choosing the input type.
-```python
-# Step 8: Generate the dataset for inference
-input_type = ['cls_emb', 'channel_emb', 'raw'][1]  # Modify input type as needed
-dataset = dataset_gen(preprocessed_chs, input_type, model)
 ```
-You can choose between:
-- `cls_emb`: LWM CLS token embeddings
-- `channel_emb`: LWM channel embeddings
-- `raw`: Raw wireless channel data
 ---
-###
- 9. **Post-processing for Downstream Task**
-#### **Use the Dataset in Downstream Tasks**
-Finally, use the generated dataset for your downstream tasks, such as classification, prediction, or analysis.
 ```python
-# Step 9: Print results
-print(f"Dataset generated with shape: {dataset.shape}")
-print("Inference completed successfully.")
-```
----
-## 📋 **Requirements**
-- **Python 3.x**
-- **PyTorch**
-- **Git**
 ---
-### Summary of Steps:
-1. **Install Conda/Mamba**: Install a package manager for environment management.
-2. **Create Environment**: Use Conda or Mamba to create a new environment.
-3. **Clone the Repository**: Download the project files from Hugging Face.
-4. **Install Packages**: Install PyTorch and other dependencies.
-5. **Load and Tokenize Data**: Load the DeepMIMO dataset and prepare it for the model.
-6. **Load Model and Perform Inference**: Use the LWM model for generating embeddings or raw channels.

 #### **Step 1: Create a new environment**
+You can create a new environment called `lwm_env` (or any other name) with Python 3.12 or any required version:
 ```bash
 # If you're using Conda:
+conda create -n lwm_env python=3.12
 # If you're using Mamba:
+mamba create -n lwm_env python=3.12
 ```
 #### **Step 2: Activate the environment**
 ---
+#### **Step 3: Install Required Packages**
+Install the necessary packages inside your new environment.
+```bash
+# If you're using Conda:
+conda install pytorch torchvision torchaudio -c pytorch
+pip install -r requirements.txt
+# If you're using Mamba:
+mamba install pytorch torchvision torchaudio -c pytorch
+pip install -r requirements.txt
+```
+---
+### 2. **Required Functions to Clone Datasets**
+```python
+import subprocess
+import os
+# Function to clone a specific dataset scenario folder
+def clone_dataset_scenario(scenario_name, repo_url, model_repo_dir="./LWM", scenarios_dir="scenarios"):
+    # Create the scenarios directory if it doesn't exist
+    scenarios_path = os.path.join(model_repo_dir, scenarios_dir)
+    if not os.path.exists(scenarios_path):
+        os.makedirs(scenarios_path)
+    scenario_path = os.path.join(scenarios_path, scenario_name)
+    # Initialize sparse checkout for the dataset repository
+    if not os.path.exists(os.path.join(scenarios_path, ".git")):
+        print(f"Initializing sparse checkout in {scenarios_path}...")
+        subprocess.run(["git", "clone", "--sparse", repo_url, "."], cwd=scenarios_path, check=True)
+        subprocess.run(["git", "sparse-checkout", "init", "--cone"], cwd=scenarios_path, check=True)
+        subprocess.run(["git", "lfs", "install"], cwd=scenarios_path, check=True)  # Set up Git LFS hooks (Git LFS itself must already be installed)
+    # Add the requested scenario folder to sparse checkout
+    print(f"Adding {scenario_name} to sparse checkout...")
+    subprocess.run(["git", "sparse-checkout", "add", scenario_name], cwd=scenarios_path, check=True)
+    # Pull large files if needed (using Git LFS)
+    subprocess.run(["git", "lfs", "pull"], cwd=scenarios_path, check=True)
+    print(f"Successfully cloned {scenario_name} into {scenarios_path}.")
+# Function to clone multiple dataset scenarios
+def clone_dataset_scenarios(selected_scenario_names, dataset_repo_url, model_repo_dir):
+    for scenario_name in selected_scenario_names:
+        clone_dataset_scenario(scenario_name, dataset_repo_url, model_repo_dir)
 ```
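The cloning functions above shell out to `git` (including `git sparse-checkout`, which was introduced in Git 2.25) and to `git lfs`. A minimal sketch of a pre-flight check is shown below; `missing_tools` is a hypothetical helper that is not part of the repository, and it only checks that the executables are on `PATH`, not their versions.

```python
import shutil

def missing_tools(tools=("git", "git-lfs")):
    """Return the names in `tools` that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print(f"Missing required tools: {', '.join(missing)}")
else:
    print("git and git-lfs were found on PATH.")
```

Running this before `clone_dataset_scenario` gives a readable message instead of a `FileNotFoundError` or `CalledProcessError` from `subprocess.run`.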
 ---
+### 3. **Clone the Model**
+```python
+# Step 1: Clone the model repository (if not already cloned)
+model_repo_url = "https://huggingface.co/sadjadalikhani/lwm"
+model_repo_dir = "./LWM"
+if not os.path.exists(model_repo_dir):
+    print(f"Cloning model repository from {model_repo_url}...")
+    subprocess.run(["git", "clone", model_repo_url, model_repo_dir], check=True)
 ```
 ---
+### 4. **Clone the Desired Datasets**
 Before proceeding with tokenization and data processing, the **DeepMIMO** dataset—or any dataset generated using the operational settings outlined below—must first be loaded. The table below provides a list of available datasets and their respective links for further details:
 - **Antennas at UEs**: 1
 - **Subcarriers**: 32
 - **Paths**: 20
 ```python
+import numpy as np
+# Step 2: Clone specific dataset scenario folder(s) inside the "scenarios" folder
+dataset_repo_url = "https://huggingface.co/datasets/sadjadalikhani/lwm"  # Base URL for dataset repo
+scenario_names = np.array(["city_18_denver",
+                           "city_15_indianapolis",
+                           "city_19_oklahoma",
+                           "city_12_fortworth",
+                           "city_11_santaclara",
+                           "city_7_sandiego"]
+                          )
+scenario_idxs = np.array([3])
+selected_scenario_names = scenario_names[scenario_idxs]
+# Fetch the requested scenario folders (the sparse-checkout update and LFS pull run on every call)
+clone_dataset_scenarios(selected_scenario_names, dataset_repo_url, model_repo_dir)
 ```
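The index-based selection above can also be sketched with plain lists. `select_scenarios` is a hypothetical helper, not part of the repository; unlike NumPy indexing, it rejects negative and out-of-range indices. Whether the repository's `tokenizer` accepts a plain list instead of a NumPy array is an assumption that is not confirmed here.

```python
def select_scenarios(scenario_names, scenario_idxs):
    """Return the scenario names at the given indices (hypothetical helper)."""
    selected = []
    for idx in scenario_idxs:
        # Reject indices outside the list instead of silently wrapping around
        if not 0 <= idx < len(scenario_names):
            raise IndexError(
                f"Scenario index {idx} is out of range (0 to {len(scenario_names) - 1})"
            )
        selected.append(scenario_names[idx])
    return selected

scenario_names = [
    "city_18_denver",
    "city_15_indianapolis",
    "city_19_oklahoma",
    "city_12_fortworth",
    "city_11_santaclara",
    "city_7_sandiego",
]

print(select_scenarios(scenario_names, [3]))     # ['city_12_fortworth']
print(select_scenarios(scenario_names, [0, 5]))  # ['city_18_denver', 'city_7_sandiego']
```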
 ---
+### 5. **Change the Working Directory to the LWM Folder**
 ```python
+if os.path.exists(model_repo_dir):
+    os.chdir(model_repo_dir)
+    print(f"Changed working directory to {os.getcwd()}")
+else:
+    print(f"Directory {model_repo_dir} does not exist. Please check if the repository is cloned properly.")
 ```
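`os.chdir` changes the working directory for the rest of the session. If that is undesirable, for example in a notebook that also touches other paths, one option is a small context manager that restores the previous directory afterwards. The sketch below is an illustration only; `working_directory` is a hypothetical name and is not part of the repository.

```python
import os
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily switch to `path`, restoring the previous directory on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller's directory is always restored
        os.chdir(previous)

# Example: "." stands in for the cloned model directory (e.g. "./LWM")
with working_directory("."):
    print(f"Working inside {os.getcwd()}")
print(f"Back in {os.getcwd()}")
```

Imports and calls that rely on paths relative to the repository would need to run inside the `with` block.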
 ---
+### 6. **Tokenize and Load the Model**
+```python
+from input_preprocess import tokenizer
+from lwm_model import lwm
+import torch
+preprocessed_chs = tokenizer(selected_scenario_names=selected_scenario_names,
+                             manual_data=None,
+                             gen_raw=True)
 device = 'cuda' if torch.cuda.is_available() else 'cpu'
 print(f"Loading the LWM model on {device}...")
+model = lwm.from_pretrained(device=device)
 ```
 ---
+### 7. **Perform Inference**
 ```python
+from inference import lwm_inference, create_raw_dataset
+input_types = ['cls_emb', 'channel_emb', 'raw']
+selected_input_type = input_types[0]
+if selected_input_type in ['cls_emb', 'channel_emb']:
+    dataset = lwm_inference(preprocessed_chs, selected_input_type, model, device)
+else:
+    dataset = create_raw_dataset(preprocessed_chs, device)
+```
 ---