ArthurFischel committed on
Commit
32b0d27
1 Parent(s): cb9c6a8

Update README.md

  1. README.md +28 -9
README.md CHANGED
@@ -12,23 +12,42 @@ pinned: true
 
  ![image info](./mlion.png)
 
- # 7/27/23 RogueGPT
- Rogue GPT is an attempt not only to instruction-tune LLM-powered agents (treating LLMs as reasoning engines) for tasks in the MiniHack environment, but also to explore reinforcement learning and continuous learning for embodied agents inside environments using only LLMs, so that lessons learned can be abstracted to other modalities.
 
- I want to use small LLMs and a focused data set so I can get a good idea of how the moving parts perform and what data is necessary besides large general knowledge. I'm working under the assumption that a carefully curated data set, specifically tailored towards continuous learning in an embodied agent, can have desirable results even in models with fewer than a billion parameters.
 
- My rough strategy is to use the TinyStories data set, a trajectory-based data set using only the human Monk trajectories, and select categories from the NetHack Wiki. I plan to do some ablations to see which data sets are critical. Once I have the basic instruction tuning up, so the agent can follow basic small instructions, I will attempt to implement some combination of ideas from papers that I've been interested in.
 
- My justifications for the data sets:
 
- TinyStories: I want to give the model a basic understanding of the English language so that it can hopefully understand what's happening in the wiki or in any of the game messages that NetHack produces.
 
- Trajectory data set: this carefully formatted data set will be used to structure how the agent behaves and how I parse out the states, actions, and other information I'm interested in.
 
- Subset of the NetHack Wiki: I will be making a subset data set containing the categories that I think would be most useful to an agent that needs information about things inside the game.
 
- I am carefully formatting my trajectory data set for two reasons. One, I want to make parsing trivial. Two, I am assuming that regularity in the format of the states presented to the LLM will allow it to generate the output I want more easily. This is just an assumption.
 
  # 7/25/23
  https://astralcodexten.substack.com/p/were-not-platonists-weve-just-learned
  intelligence explosion
 
  ![image info](./mlion.png)
 
+ # 7/27/23 RogueGPT
 
+ ## Rogue GPT
 
+ Rogue GPT is an attempt not only to instruction-tune LLM-powered agents (treating LLMs as reasoning engines) for tasks in the MiniHack environment, but also to explore reinforcement learning and continuous learning for embodied agents inside environments using only LLMs, so that lessons learned can be abstracted to other modalities.
 
+ ## Justifications for the Datasets
 
+ ### TinyStories Dataset
 
+ I want to give the model a basic understanding of the English language so that it can hopefully comprehend what's happening in the wiki or in any of the game messages that NetHack produces. [^1^]
 
+ ### Trajectory Dataset
 
+ This carefully formatted dataset will be used to structure how the agent behaves and how I parse out the states, actions, and other information I'm interested in. [^2^]
 
+ ### Subset of the NetHack Wiki
+
+ I will be creating a subset dataset containing the categories I think would be most useful to an agent that needs information about things inside the game. [^3^]
+
+ ## Papers I'm Interested In
+
+ - Work in Progress Paper 1 [^4^]
+ - Work in Progress Paper 2 [^5^]
+ - Work in Progress Paper 3 [^6^]
+
+ ## References
+
+ [^1^]: [Link to Paper 1]
+ [^2^]: [Link to Paper 2]
+ [^3^]: [Link to Paper 3]
+ [^4^]: [Link to Paper 4]
+ [^5^]: [Link to Paper 5]
+ [^6^]: [Link to Paper 6]
+
  # 7/25/23
  https://astralcodexten.substack.com/p/were-not-platonists-weve-just-learned
  intelligence explosion