llm-arch / src /data_synthesis

Commit History

Updated naming convention for dataset databases to be clearer.
8a677b0

alfraser committed on

Fixed bugs in the dataload process when referencing the new json folder and then looking up the available databases.
7a2c982

alfraser committed on

Updated the dataloader to ignore case on product features. Note: I have deliberately not harmonised product feature capitalisation or similar features, as this "messiness" is representative of real organisational data.
b5d446f

alfraser committed on

Fixed a bug in generate data where it didn't reference the json folder in the new file structure.
0ff95ad

alfraser committed on

Added the scripts which were used to build the dataset to the repo, and tweaked them to use common code.
a19a983

alfraser committed on