Yew Chong committed
Commit 46f672e · 1 Parent(s): 85c2d7f
simple readme and requirements
Changed files:
- Books/03_book_to_user_converter.ipynb +2 -2
- Books/2.6_cluster clustering with recs.ipynb +2 -2
- Data/Books/Recommended Storage/title_users_test.npy +3 -0
- Data/Books/Recommended Storage/title_users_train.npy +3 -0
- Data/Books/Recommended Storage/title_users_val.npy +3 -0
- README.md +57 -0
- requirements.txt +16 -0
Books/03_book_to_user_converter.ipynb
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9e1d6769b7de2a32c36297f8c4923b97bc7813cc943c2c89cb71f384c5d75c70
+size 180730
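The hunk above changes a Git LFS pointer, not the notebook itself: the pointer is three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of parsing such a pointer and checking a fetched file against it might look like the following. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS, and the check assumes the oid algorithm is SHA-256, as it is in this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form `<algorithm>:<hex digest>`.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file has the size and SHA-256 digest the pointer records."""
    data = path.read_bytes()
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer text copied from the new side of the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9e1d6769b7de2a32c36297f8c4923b97bc7813cc943c2c89cb71f384c5d75c70
size 180730
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # prints: sha256 180730
```

After fetching the real notebook, `matches_pointer(Path("Books/03_book_to_user_converter.ipynb"), pointer)` would confirm that the local file is the object this commit refers to.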
Books/2.6_cluster clustering with recs.ipynb
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:ff34991e0eb780b779f14a9aa1123d1e0ca10395af3cf7f32d68c432503998cf
+size 346664
Data/Books/Recommended Storage/title_users_test.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8f666d360156f45c945c9ea6b33c254a26b6dc343acc4e8f1f617ff10c6ed2b
+size 258885
Data/Books/Recommended Storage/title_users_train.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0be88e7f87a168fa61e7a54c953266c21c82fb6f4a3c27222dd109dc852ce55e
+size 678312
Data/Books/Recommended Storage/title_users_val.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:91c6ee5f805ee57e9ca2a5691864b7c46bd9b1f230ad3e62993174ebdb538817
+size 260383
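The three `.npy` files added here are committed as LFS pointers, so a checkout without the LFS objects contains small text files rather than NumPy arrays, and loading them would fail. A minimal sketch of telling the two apart by their first bytes, using only the standard library, is shown below. The `classify` helper is hypothetical; the `.npy` magic string `\x93NUMPY` comes from the NumPy file format, and the directory path is taken from this commit.

```python
from pathlib import Path

LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"
NPY_MAGIC = b"\x93NUMPY"  # first six bytes of every .npy file


def classify(path: Path) -> str:
    """Return 'lfs-pointer', 'npy', or 'unknown' based on the file's first bytes."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_PREFIX))
    if head.startswith(LFS_PREFIX):
        return "lfs-pointer"
    if head.startswith(NPY_MAGIC):
        return "npy"
    return "unknown"


if __name__ == "__main__":
    # Directory and file names as they appear in this commit.
    storage = Path("Data/Books/Recommended Storage")
    for split in ("train", "val", "test"):
        path = storage / f"title_users_{split}.npy"
        status = classify(path) if path.exists() else "missing"
        print(f"{path.name}: {status}")
```

A result of `lfs-pointer` suggests the LFS objects still need to be fetched before the notebooks can read the arrays.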
README.md
CHANGED
@@ -7,3 +7,60 @@ sdk: static
 pinned: false
 ---
 
+# Books Recommendation Project (BT5153)
+
+Hello, and welcome to our books recommendation project for BT5153!
+
+# Project Directory
+## Source Code
+Code is stored under `./Books` as `.ipynb` notebooks, named in the order they should be run.
+
+## Data
+Data used for the project is stored in `./Data`.
+
+Raw data, retrieved from the Goodreads dataset [here](https://mengtingwan.github.io/data/goodreads.html), can be found under `./raw-data`.
+
+For our submission, we have created a representative subset of the dataset for the zip file; it can be found in `./Data-sub`.
+
+# To run our project on Windows:
+
+## Create a virtual environment (optional)
+Run these commands:
+1. `python -m venv venv`
+2. `venv\Scripts\activate`
+3. `python -m pip install -r requirements.txt`
+
+## Locate Python notebooks
+All Python notebooks can be found in the subdirectory `./Books/`.
+
+## Data preprocessing
+Run all cells in the file `1_data_split.ipynb`.
+
+## Generating recommendations
+Run all cells in the notebooks whose names start with the following prefixes, in order:
+* `2.1`
+* `2.2`
+* `2.3`
+* `2.4`
+* `2.5`
+* `2.6`
+
+Then, run the following file to generate recommendations for users:
+`03_book_to_user_converter.ipynb`
+
+## Ensemble model
+Run this file: `4_ensemble.ipynb`
+
+----------------------------------------------------------------
+# Project Description
+The overwhelming number of book choices online often leads to decision paralysis and wasted time. To address this, we propose a recommendation system powered by Natural Language Processing (NLP).
+
+
+For the full project description, see the report file in the submission.
+
+## Members:
+* Ang Kai En (A0221945E)
+* Meritxell Camp Garcia (A0280366B)
+* Sidharth Pahuja (A0218880X)
+* Sim Jun You (A0200198L)
+* Sim Yew Chong (A0189487A)
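The README says the notebooks are named according to the order in which they should be run, with prefixes such as `1_`, `2.1` to `2.6`, `03_`, and `4_`. A plain alphabetical sort would put `03_` first, so a small helper that sorts by the numeric prefix can be useful. The sketch below assumes every notebook name starts with a number (optionally with one decimal part) followed by an underscore; `run_order` is a hypothetical helper, and the file names are the ones mentioned in this commit and its README.

```python
import re

# Leading number such as "1", "03", or "2.6", followed by an underscore.
PREFIX = re.compile(r"^(\d+(?:\.\d+)?)_")


def run_order(names):
    """Sort notebook file names by their leading numeric prefix."""
    def key(name):
        match = PREFIX.match(name)
        if match is None:
            raise ValueError(f"no numeric prefix in {name!r}")
        major, _, minor = match.group(1).partition(".")
        # Compare as integers so that "03" sorts as 3 and "2.10" after "2.9".
        return (int(major), int(minor or 0))
    return sorted(names, key=key)


notebooks = [
    "4_ensemble.ipynb",
    "03_book_to_user_converter.ipynb",
    "2.6_cluster clustering with recs.ipynb",
    "1_data_split.ipynb",
]

for name in run_order(notebooks):
    print(name)
```

Comparing the prefix as a pair of integers rather than as a float avoids treating `2.10` as equal to `2.1` if more steps are added later.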
requirements.txt
ADDED
@@ -0,0 +1,16 @@
+pandas
+numpy
+matplotlib
+torch
+transformers
+tqdm
+scikit-learn
+gensim
+contractions
+langdetect
+nltk
+tiktoken
+langchain==0.1.11
+faiss-cpu
+ipython
+ipykernel
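Only `langchain` is pinned in this file; every other package resolves to whatever version is current at install time. A minimal sketch of checking an environment against such a list, using the standard library's `importlib.metadata`, is given below. The helpers `parse_requirement` and `check` are hypothetical, and the parser handles only bare names and `==` pins, which is all this `requirements.txt` uses.

```python
from importlib import metadata


def parse_requirement(line: str):
    """Split a `name` or `name==version` line into (name, pinned_version_or_None)."""
    name, sep, version = line.strip().partition("==")
    return name, (version if sep else None)


def check(lines):
    """Yield (name, pinned, installed) for each non-empty requirement line."""
    for line in lines:
        if not line.strip():
            continue
        name, pinned = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        yield name, pinned, installed


# A few entries copied from the requirements.txt above.
REQUIREMENTS = ["pandas", "langchain==0.1.11", "faiss-cpu"]

for name, pinned, installed in check(REQUIREMENTS):
    if installed is None:
        status = "missing"
    elif pinned and installed != pinned:
        status = f"installed {installed}, pinned {pinned}"
    else:
        status = f"ok ({installed})"
    print(f"{name}: {status}")
```

Running this inside the virtual environment after `python -m pip install -r requirements.txt` gives a quick view of which packages are present and whether the `langchain` pin was honoured.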