
Instructions for downloading data from the sft-data repository.

First, log in so that you can access the Hugging Face data:

from huggingface_hub import login
login()
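
On machines without an interactive prompt, the same login can be done with a token read from the environment. The sketch below is a minimal example, assuming the token is stored in an environment variable named `HF_TOKEN`; the helper name `login_from_env` is hypothetical and not part of this repository.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in with a token from the environment; return False if none is set."""
    token = os.environ.get(var_name)
    if not token:
        # Nothing to log in with; the caller can fall back to interactive login().
        return False
    # Imported lazily so this helper can be defined without the package installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

If the helper returns `False`, calling `login()` interactively as shown above remains the fallback.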

Then you can download a single zip file containing all of the SFT data folders:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="LEVI-Project/sft-data", filename="sft-data.zip")
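
`hf_hub_download` returns the local path of the downloaded file, so the archive still has to be unpacked. The following is a minimal sketch of one way to do that with the standard library; the helper names and the `data` target folder are assumptions for illustration, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract zip_path into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_and_extract(dest_dir: str = "data") -> list[str]:
    """Download sft-data.zip and unpack it into dest_dir (hypothetical folder name)."""
    # Imported lazily so extract_archive can be used without the package installed.
    from huggingface_hub import hf_hub_download

    local_zip = hf_hub_download(repo_id="LEVI-Project/sft-data", filename="sft-data.zip")
    return extract_archive(local_zip, dest_dir)
```

Calling `download_and_extract("data")` would fetch the archive and unpack it under `data/`.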

The sft-data.zip file above has the following structure:

sft-data
│   README.md This README file.
└───alf Folder for ALFWORLD.
│   │   alfworld.json The JSON file for ALFWORLD.
│   └───alf_data_folder Folder for the ALFWORLD environment.
│       │   alf_image_id_0  Folder 0 for ALFWORLD image data.
│       │   alf_image_id_1  Folder 1 for ALFWORLD image data.
│       │   alf_image_id_2  Folder 2 for ALFWORLD image data.
│       │   alf_image_id_3  Folder 3 for ALFWORLD image data.
│       │   alf_image_id_4  Folder 4 for ALFWORLD image data.
└───blackjack Folder for the blackjack environment in `gym_cards`.
    │   blackjack_data_folder Folder for blackjack image data.
    │   blackjack.json The JSON file for blackjack.
└───ezpoints Folder for the ezpoints environment in `gym_cards`.
    │   ezpoints_data_folder Folder for ezpoints image data.
    │   ezpoints.json The JSON file for ezpoints.
└───points24 Folder for the points24 environment in `gym_cards`.
    │   points24_data_folder Folder for points24 image data.
    │   points24.json The JSON file for points24.
└───numberline Folder for the numberline environment in `gym_cards`.
    │   numberline_data_folder Folder for numberline image data.
    │   numberline.json The JSON file for numberline.
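
After extraction, a quick check against this layout can catch an incomplete download. The sketch below is a minimal example, assuming the paths are exactly as listed in the tree above; the function names are hypothetical.

```python
from pathlib import Path

# The four gym_cards environments named in the tree.
GYM_CARDS_ENVS = ["blackjack", "ezpoints", "points24", "numberline"]


def expected_entries() -> list[str]:
    """Paths, relative to the extracted sft-data folder, listed in the tree."""
    entries = ["README.md", "alf/alfworld.json", "alf/alf_data_folder"]
    entries += [f"alf/alf_data_folder/alf_image_id_{i}" for i in range(5)]
    for env in GYM_CARDS_ENVS:
        entries += [f"{env}/{env}.json", f"{env}/{env}_data_folder"]
    return entries


def missing_entries(root: str) -> list[str]:
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected_entries() if not (base / rel).exists()]
```

An empty list from `missing_entries("sft-data")` means every file and folder in the tree was found.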

Alternatively, you can download the files for any one of the five environments. For example, the following code downloads the blackjack data:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="LEVI-Project/llava-data", filename="blackjack.zip") # zip folder for image data folder
hf_hub_download(repo_id="LEVI-Project/llava-data", filename="blackjack.json") # JSON file 

For ALFWORLD, note that the zip file for the image data folder is named alf_data_folder.zip.
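
The per-environment naming can be wrapped in a small helper. This is a sketch under several assumptions: that ezpoints, points24, and numberline follow the same `<env>.zip` / `<env>.json` pattern as blackjack, that the ALFWORLD JSON file is named `alfworld.json` as in the tree, and that `"alf"` is used as the key for ALFWORLD. The function names are hypothetical, and the default `repo_id` is copied from the blackjack snippet above.

```python
GYM_CARDS_ENVS = {"blackjack", "ezpoints", "points24", "numberline"}


def files_for_env(env: str) -> tuple[str, str]:
    """Return (image zip filename, JSON filename) for one environment."""
    if env == "alf":
        # ALFWORLD is the exception: its image archive is alf_data_folder.zip.
        return "alf_data_folder.zip", "alfworld.json"
    if env not in GYM_CARDS_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    # Assumed pattern, taken from the blackjack example.
    return f"{env}.zip", f"{env}.json"


def download_env(env: str, repo_id: str = "LEVI-Project/llava-data") -> tuple[str, str]:
    """Download both files for env and return their local paths."""
    # Imported lazily so files_for_env can be used without the package installed.
    from huggingface_hub import hf_hub_download

    zip_name, json_name = files_for_env(env)
    return (
        hf_hub_download(repo_id=repo_id, filename=zip_name),
        hf_hub_download(repo_id=repo_id, filename=json_name),
    )
```

For example, `download_env("blackjack")` would request `blackjack.zip` and `blackjack.json`, matching the two calls shown earlier.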