Spaces: linhkid91/ArxivDigest-extra

Linh Nguyen committed on Mar 31, 2024

Commit 94aa2bb (parent: 0eadab7)

edit prompts

Files changed (3)

config.yaml CHANGED

@@ -3,13 +3,13 @@ topic: "Computer Science"
 # An empty list here will include all categories in a topic
 # Use the natural language names of the topics, found here: https://arxiv.org
 # Including more categories will result in more calls to the large language model
-categories: ["Artificial Intelligence", "Computation and Language"]
+categories: ["Artificial Intelligence", "Computation and Language", "Machine Learning"]
 # Relevance score threshold. abstracts that receive a score less than this from the large language model
 # will have their papers filtered out.
 #
 # Must be within 1-10
-threshold: 7
+threshold: 6
 # A natural language statement that the large language model will use to judge which papers are relevant
 #
@@ -23,5 +23,6 @@ threshold: 7
 interest: |
   1. Large language model pretraining and finetunings
   2. Multimodal machine learning
-  3. Do not care about specific application, for example, information extraction, summarization, etc.
-  4. Not interested in paper focus on specific languages, e.g., Arabic, Chinese, etc.
+  3. RAGs
+  4. Optimization of LLM and GenAI
+  5. Do not care about specific application, for example, information extraction, summarization, etc.
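The `threshold` comments state that papers whose abstracts score below the threshold are filtered out and that the value must lie within 1-10. The following is a minimal sketch of that rule, not code from the repository. It assumes each scored paper is a dict carrying a `"Relevancy score"` key, which is the key used in the prompt's example output. `filter_papers` and the sample papers are hypothetical.

```python
from typing import Any


def filter_papers(papers: list[dict[str, Any]], threshold: int) -> list[dict[str, Any]]:
    """Keep papers whose relevancy score is at least `threshold`.

    Hypothetical helper mirroring the config comment: papers scoring
    below the threshold are filtered out.
    """
    # config.yaml documents the valid range as 1-10.
    if not 1 <= threshold <= 10:
        raise ValueError(f"threshold must be within 1-10, got {threshold}")
    # The prompt's example quotes the score as a string, so cast before comparing.
    return [p for p in papers if int(p["Relevancy score"]) >= threshold]


papers = [
    {"title": "Paper A", "Relevancy score": "7"},
    {"title": "Paper B", "Relevancy score": 6},
    {"title": "Paper C", "Relevancy score": "5"},
]

# Lowering the threshold from 7 to 6, as this commit does, lets Paper B through.
print([p["title"] for p in filter_papers(papers, 7)])  # ['Paper A']
print([p["title"] for p in filter_papers(papers, 6)])  # ['Paper A', 'Paper B']
```

A lower threshold combined with the added "Machine Learning" category means more papers reach the digest. The config's own comment also warns that each additional category leads to more calls to the language model.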

src/relevancy.py CHANGED

@@ -46,8 +46,10 @@ def post_process_chat_gpt_response(paper_data, response, threshold_score=8):
         score_items = [
             json.loads(re.sub(pattern, "", line))
             for line in json_items if "relevancy score" in line.lower()]
-    except Exception:
+    except Exception as e:
         pprint.pprint([re.sub(pattern, "", line) for line in json_items if "relevancy score" in line.lower()])
+        print(e)
+        #raise e
         raise RuntimeError("failed")
     pprint.pprint(score_items)
     scores = []
@@ -136,7 +138,7 @@ def generate_relevance_score(
     return ans_data, hallucination
 def run_all_day_paper(
-    query={"interest":"", "subjects":["Computation and Language", "Artificial Intelligence"]},
+    query={"interest":"Computer Science", "subjects":["Machine Learning", "Computation and Language", "Artificial Intelligence"]},
     date=None,
     data_dir="../data",
     model_name="gpt-3.5-turbo-16k",
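In `post_process_chat_gpt_response`, each line of the model output that mentions "relevancy score" is cleaned with `re.sub(pattern, "", line)` and then parsed with `json.loads`. After this commit, the handler binds the exception as `e` and prints it before raising `RuntimeError("failed")`. The diff does not show how `json_items` is built or what `pattern` contains.

The sketch below makes two assumptions:

* the response is split on newlines;
* the pattern strips a leading list number such as `1. ` (the old prompt's example line was numbered).

`parse_score_lines` is hypothetical. It is an alternative to printing the error rather than a copy of the repository's logic. It uses `raise ... from e`, which keeps the original `json.JSONDecodeError` attached as `__cause__`. It also catches only decoding errors, not every `Exception`.

```python
import json
import re

# Assumption: strip a leading enumeration like "1. " before the JSON object.
# The real `pattern` in src/relevancy.py is not shown in this diff.
PATTERN = r"^\s*\d+\.\s*"


def parse_score_lines(response: str) -> list[dict]:
    """Parse one JSON object per line, keeping only lines with a relevancy score."""
    json_items = response.split("\n")  # assumption: one item per line
    candidates = [
        re.sub(PATTERN, "", line)
        for line in json_items
        if "relevancy score" in line.lower()
    ]
    try:
        return [json.loads(item) for item in candidates]
    except json.JSONDecodeError as e:
        # Chaining keeps the decoder's message and position in the traceback.
        raise RuntimeError(f"failed to parse model output: {candidates}") from e


response = (
    '1. {"Relevancy score": "8", "Reasons for match": "Covers RAG."}\n'
    '2. {"Relevancy score": "3", "Reasons for match": "Off-topic."}'
)
items = parse_score_lines(response)
print([int(item["Relevancy score"]) for item in items])  # [8, 3]
```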

src/relevancy_prompt.txt CHANGED

@@ -1,7 +1,8 @@
 You have been asked to read a list of a few arxiv papers, each with title, authors and abstract.
-Based on my specific research interests, elevancy score out of 10 for each paper, based on my specific research interest, with a higher score indicating greater relevance. A relevance score more than 7 will need person's attention for details.
-Additionally, please generate 1-2 sentence summary for each paper explaining why it's relevant to my research interests.
+Based on my specific research interests, relevancy score out of 10 for each paper, based on my specific research interest, with a higher score indicating greater relevance. A relevance score more than 7 will need person's attention for details.
+Additionally, please generate summary, for each paper explaining why it's relevant to my research interests.
 Please keep the paper order the same as in the input list, with one json format per line. Example is:
-1. {"Relevancy score": "an integer score out of 10", "Reasons for match": "1-2 sentence short reasonings"}
-My research interests are:
+{"Relevancy score": "an integer score out of 10", "Reasons for match": "1-2 sentence short reasonings", "Goal":"Goal of the paper/What kind of pain points the paper is trying to solve?", "Data": "Short summary of the data source used in the paper", "Methodology": "Summary of methodologies authors described in the paper", "Experiments & Results": "Summary of results", "Git": "Link to the Github code repo (if available)", "Discussion & Next steps": "Further discussion and next steps of the research"}
+My research interests are: NLP, RAGs, LLM, Optmization in Machine learning, Data science, Generative AI, Optimization in LLM, Finance modelling ...
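The revised prompt asks for eight keys per paper instead of two. A model may still omit some of them, so a downstream check for missing keys can help catch incomplete replies. The helper below is a hypothetical sketch and is not part of the repository. The key list is copied verbatim from the prompt's new example line.

```python
import json

# Keys copied from the example line in the revised relevancy_prompt.txt.
EXPECTED_KEYS = (
    "Relevancy score",
    "Reasons for match",
    "Goal",
    "Data",
    "Methodology",
    "Experiments & Results",
    "Git",
    "Discussion & Next steps",
)


def missing_keys(line: str) -> list[str]:
    """Return the expected keys absent from one JSON line of model output."""
    item = json.loads(line)
    return [key for key in EXPECTED_KEYS if key not in item]


# A reply in the old two-key format lacks the six new fields.
old_style = '{"Relevancy score": "8", "Reasons for match": "Studies LLM finetuning."}'
print(missing_keys(old_style))
# ['Goal', 'Data', 'Methodology', 'Experiments & Results', 'Git', 'Discussion & Next steps']
```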