Update README: (1) add table of content; (2) change Topic to Subject; (3) Add emoji to sections; (4) Add roadmap section.
README.md
CHANGED
@@ -1,20 +1,33 @@
|
|
1 |
-
# Personalized-
|
2 |
-
This repo aims to provide a better daily digest for newly published
|
3 |
|
4 |
-
##
|
5 |
|
6 |
-
|
7 |
|
8 |
-
|
9 |
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
|
14 |
|
15 |
### Some examples:
|
16 |
|
17 |
-
-
|
18 |
- Categories: Artificial Intelligence, Computation and Language
|
19 |
- Interest:
|
20 |
- Large language model pretraining and finetuning
|
@@ -25,14 +38,14 @@ This repository provides a way to have this daily digest sorted by relevance via
|
|
25 |

|
26 |
|
27 |
|
28 |
-
-
|
29 |
- Interest: "making lots of money"
|
30 |
|
31 |

|
32 |
|
33 |
-
## Usage
|
34 |
|
35 |
-
### Running as a github action using SendGrid.
|
36 |
|
37 |
The recommended way to get started using this repository is to:
|
38 |
|
@@ -67,7 +80,7 @@ An alternative way to get started using this repository is to:
|
|
67 |
- `TO_EMAIL` (only if you don't have it set in `config.yaml`)
|
68 |
6. Manually trigger the action or wait until the scheduled action takes place.
|
69 |
|
70 |
-
|
71 |
|
72 |
If you do not wish to create a SendGrid account or use your email authentication, the action will also emit an artifact containing the HTML output. Simply do not create the SendGrid or SMTP secrets.
|
73 |
|
@@ -99,7 +112,16 @@ Install the requirements in `src/requirements.txt` as well as `gradio`. Set the
|
|
99 |
|
100 |
Run `python src/app.py` and go to the local URL. From there you will be able to preview the papers from today, as well as the generated digests.
|
101 |
|
102 |
-
##
|
103 |
|
104 |
You may (and are encouraged to) modify the code in this repository to suit your personal needs. If you think your modifications would be in any way useful to others, please submit a pull request.
|
105 |
|
|
|
1 |
+
# Personalized-arXiv-digest
|
2 |
+
This repo aims to provide a better daily digest for newly published arXiv papers based on your own research interests and descriptions.
|
3 |
|
4 |
+
## Contents
|
5 |
|
6 |
+
- [What this repo does](#what-this-repo-does)
|
7 |
+
  * [Examples](#some-examples)
|
8 |
+
- [Usage](#usage)
|
9 |
+
  * [Running as a GitHub action using SendGrid (Recommended)](#running-as-a-github-action-using-sendgrid-recommended)
|
10 |
+
  * [Running as a GitHub action with SMTP credentials](#running-as-a-github-action-with-smtp-credentials)
|
11 |
+
  * [Running as a GitHub action without emails](#running-as-a-github-action-without-emails)
|
12 |
+
  * [Running from the command line](#running-from-the-command-line)
|
13 |
+
  * [Running with a user interface](#running-with-a-user-interface)
|
14 |
+
- [Roadmap](#roadmap)
|
15 |
+
- [Extending and Contributing](#extending-and-contributing)
|
16 |
|
17 |
+
## What this repo does
|
18 |
|
19 |
+
Staying up to date on [arXiv](https://arxiv.org) papers can take a considerable amount of time, with hundreds of new papers to filter through each day. There is an [official daily digest service](https://info.arxiv.org/help/subscribe.html); however, large categories like [cs.AI](https://arxiv.org/list/cs.AI/recent) still have 50-100 papers a day. Determining whether these papers are relevant and important to you means reading through each title and abstract, which is time-consuming.
|
20 |
+
|
21 |
+
This repository offers a way to curate a daily digest, sorted by relevance, using large language models. The models are conditioned on your personal research interests, which you describe in natural language.
|
22 |
+
|
23 |
+
* You modify the configuration file `config.yaml` with an arXiv Subject, a set of Categories, and a natural language statement about the type of papers you are interested in.
|
24 |
+
* The code pulls all the abstracts for papers in those categories and ranks how relevant they are to your interest on a scale of 1-10 using `gpt-3.5-turbo`.
|
25 |
+
* The code then emits an HTML digest listing all the relevant papers, and optionally emails it to you using [SendGrid](https://sendgrid.com). You will need to have a SendGrid account with an API key for this functionality to work.
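
The last two steps above (rank on a 1-10 scale, then emit an HTML digest of the relevant papers) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `ScoredPaper`, `build_digest`, the `threshold` cutoff, and the placeholder URLs are all hypothetical, and the scores are assumed to have already been returned by the language model.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ScoredPaper:
    """Hypothetical container for one paper plus its model-assigned score."""
    title: str
    url: str
    score: int  # relevance on the 1-10 scale


def build_digest(papers, threshold=7):
    """Keep papers scoring at or above `threshold` (an assumed cutoff)
    and render them as an HTML list, most relevant first."""
    relevant = sorted(
        (p for p in papers if p.score >= threshold),
        key=lambda p: p.score,
        reverse=True,
    )
    items = "\n".join(
        f'<li><a href="{escape(p.url)}">{escape(p.title)}</a> '
        f"(relevance {p.score}/10)</li>"
        for p in relevant
    )
    return (
        "<html><body><h1>Daily arXiv digest</h1>\n"
        f"<ul>\n{items}\n</ul>\n</body></html>"
    )


# Placeholder data for illustration only; these are not real papers.
papers = [
    ScoredPaper("An unrelated paper", "https://example.org/paper-a", 2),
    ScoredPaper("A highly relevant paper", "https://example.org/paper-b", 9),
    ScoredPaper("A somewhat relevant paper", "https://example.org/paper-c", 7),
]
print(build_digest(papers))
```

Escaping titles and URLs before interpolating them keeps stray `<` or `&` characters in paper titles from breaking the generated HTML.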
|
26 |
|
27 |
|
28 |
### Some examples:
|
29 |
|
30 |
+
- Subject: Computer Science
|
31 |
- Categories: Artificial Intelligence, Computation and Language
|
32 |
- Interest:
|
33 |
- Large language model pretraining and finetuning
|
|
|
38 |

|
39 |
|
40 |
|
41 |
+
- Subject: Quantitative Finance
|
42 |
- Interest: "making lots of money"
|
43 |
|
44 |

|
45 |
|
46 |
+
## Usage
|
47 |
|
48 |
+
### Running as a GitHub action using SendGrid (Recommended)
|
49 |
|
50 |
The recommended way to get started using this repository is to:
|
51 |
|
|
|
80 |
- `TO_EMAIL` (only if you don't have it set in `config.yaml`)
|
81 |
6. Manually trigger the action or wait until the scheduled action takes place.
|
82 |
|
83 |
+
### Running as a GitHub action without emails
|
84 |
|
85 |
If you do not wish to create a SendGrid account or use your email authentication, the action will also emit an artifact containing the HTML output. Simply do not create the SendGrid or SMTP secrets.
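
One way to picture this fallback is a small decision function over the environment that the action runs in. The sketch below is only an assumption about how such logic could look: `choose_delivery` is hypothetical, and apart from `TO_EMAIL` the secret names (`SENDGRID_API_KEY`, `SMTP_SERVER`, `SMTP_USERNAME`, `SMTP_PASSWORD`) are placeholders rather than names confirmed by this README.

```python
def choose_delivery(env):
    """Pick a delivery method from the secrets that are present.

    `env` is any mapping of variable names to values, e.g. `os.environ`.
    All secret names below are hypothetical placeholders.
    """
    if env.get("SENDGRID_API_KEY"):
        return "sendgrid"
    if all(env.get(k) for k in ("SMTP_SERVER", "SMTP_USERNAME", "SMTP_PASSWORD")):
        return "smtp"
    # No email secrets at all: fall back to the HTML artifact only.
    return "artifact"


print(choose_delivery({}))
print(choose_delivery({"SENDGRID_API_KEY": "placeholder-value"}))
```

Passing the environment in as a plain mapping, rather than reading `os.environ` inside the function, makes the decision easy to check locally without touching real secrets.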
|
86 |
|
|
|
112 |
|
113 |
Run `python src/app.py` and go to the local URL. From there you will be able to preview the papers from today, as well as the generated digests.
|
114 |
|
115 |
+
## Roadmap
|
116 |
+
|
117 |
+
- [x] Support personalized paper recommendation using LLM.
|
118 |
+
- [x] Send emails for daily digest.
|
119 |
+
- [ ] Implement a ranking factor to prioritize content from specific authors.
|
120 |
+
- [ ] Support open-source models, e.g., LLaMA, Vicuna, MPT, etc.
|
121 |
+
- [ ] Fine-tune an open-source model to better support paper ranking and stay updated with the latest research concepts.
|
122 |
+
|
123 |
+
|
124 |
+
## Extending and Contributing
|
125 |
|
126 |
You may (and are encouraged to) modify the code in this repository to suit your personal needs. If you think your modifications would be in any way useful to others, please submit a pull request.
|
127 |
|