llm-arch / src / data_synthesis

Commit History

Added one type hint
63018b5

alfraser committed on

Reviewed for comments and type hints
564477a

alfraser committed on

Checked type hints
2cb7b84

alfraser committed on

Tidied up generate_data.py
a1317da

alfraser committed on

Switched from random.choices to random.sample wherever a random distinct set is needed, because choices samples with replacement and can return the same item twice. Discovered during pricing testing.
b897a48

alfraser committed on
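
The difference behind this change can be illustrated with a minimal sketch. The product list, the seed, and the helper name pick_distinct are hypothetical and are not taken from the repository; only random.choices and random.sample are real standard-library calls.

```python
import random

# Hypothetical data, for illustration only
products = ["kettle", "toaster", "blender", "mixer", "fryer"]


def pick_distinct(items, k, rng):
    """Return k distinct items (sampling WITHOUT replacement)."""
    return rng.sample(items, k)


rng = random.Random(0)  # seeded only so the sketch is repeatable

# random.choices samples WITH replacement, so the same item may appear twice
with_replacement = rng.choices(products, k=5)

# random.sample samples WITHOUT replacement, so every item is distinct
without_replacement = pick_distinct(products, 5, rng)

print("choices:", with_replacement)
print("sample: ", without_replacement)
```

Note that random.sample raises ValueError if k is larger than the number of available items, whereas random.choices accepts any k.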

Tweaked the test generator and updated the tests
ca7e5c7

alfraser committed on

Added the test question generator and increased the size of the question bank to 500
59b2aff

alfraser committed on

Fixed a less than/greater than bug where I was dropping the wrong reviews to achieve a target average review. Updated the SQL data set too.
2b08e8f

alfraser committed on

Added the script to shape the data for testing and the associated SQLite database containing the test data
7e353fe

alfraser committed on

Tidied up comments
acb7b9c

alfraser committed on

Updated naming convention for dataset databases to be clearer.
8a677b0

alfraser committed on

Fixed bugs in the data load process when referencing the new json folder and then looking up the available databases.
7a2c982

alfraser committed on

Updated the dataloader to ignore case on product features. Note: I have deliberately not harmonised product feature capitalisation or similar features, as this "messiness" is representative of real organisational data.
b5d446f

alfraser committed on
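
One way to ignore case at lookup time while leaving the stored values untouched is sketched below. This is an assumption about the approach, not the repository's actual dataloader code; the function name find_feature and the sample feature names are hypothetical.

```python
def find_feature(features, name):
    """Return the stored feature that matches name ignoring case, else None.

    The stored spelling is returned unchanged, so the original "messy"
    capitalisation in the data is preserved.
    """
    wanted = name.casefold()
    for feature in features:
        if feature.casefold() == wanted:
            return feature
    return None


# Hypothetical feature names with inconsistent capitalisation
stored = ["Energy Saving", "energy saving mode", "Timer"]

print(find_feature(stored, "TIMER"))
print(find_feature(stored, "energy saving"))
print(find_feature(stored, "missing"))
```

Comparing with str.casefold rather than str.lower is a small design choice: it handles a few non-ASCII cases more aggressively, and behaves the same for plain ASCII names.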

Fixed a bug in generate data where it didn't reference the json folder in the new file structure
0ff95ad

alfraser committed on

Added the scripts which were used to build the dataset to the repo, and tweaked them to use common code
a19a983

alfraser committed on