---
title: Blog
fullWidth: true
emoji: ⚔️
colorFrom: red
colorTo: green
sdk: streamlit
sdk_version: 1.21.0
app_file: app.py
pinned: true
---
![image info](./mlion.png)
# 7/27/23 - Rogue GPT
## Rogue GPT
Rogue GPT is an attempt not only to instruction-tune LLM-powered agents (treating LLMs as reasoning engines) for tasks in the MiniHack environment, but also to explore reinforcement learning and continuous learning for embodied agents inside environments, using only LLMs so that lessons learned can be abstracted to other modalities.
## Justifications for the Datasets
### Tiny Stories Dataset
I want to give the model a basic understanding of the English language so that it can hopefully comprehend what's happening in the wiki or any of the game messages that NetHack produces. [^1^]
### Trajectory Dataset
This carefully formatted dataset will be used to structure how the agent behaves and how I parse out the states, actions, and other information I'm interested in. [^2^]
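To give a concrete (and entirely hypothetical) sense of the formatting, a single trajectory record could look something like the sketch below; the field names and values are placeholders, not the final schema.

```python
# Hypothetical trajectory record; field names are placeholders, not the final schema.
# The point is that states, actions, and game messages stay easy to parse back out.
trajectory_record = {
    "episode_id": 17,
    "steps": [
        {
            "state": "You see a staircase leading down. A kobold blocks the corridor.",
            "action": "move_east",
            "message": "You hit the kobold!",
        },
        {
            "state": "The kobold is dead. The corridor is clear.",
            "action": "move_east",
            "message": "",
        },
    ],
    "outcome": "in_progress",  # e.g. "victory", "death", "timeout"
}
```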
### Subset of the NetHack Wiki
I will be creating a subset that contains the categories I think would be most useful to an agent that needs information about things inside the game. [^3^]
## Papers I'm Interested In
- Work in Progress Paper 1 [^4^]
- Work in Progress Paper 2 [^5^]
- Work in Progress Paper 3 [^6^]
## References
[^1^]: [Link to Paper 1]
[^2^]: [Link to Paper 2]
[^3^]: [Link to Paper 3]
[^4^]: [Link to Paper 4]
[^5^]: [Link to Paper 5]
[^6^]: [Link to Paper 6]
# 7/25/23
https://astralcodexten.substack.com/p/were-not-platonists-weve-just-learned
intelligence explosion
# 7/23/23 - Towards A Unified Agent with Foundation Models
https://arxiv.org/abs/2307.09668
Generate a synthetic dataset for the state you want, search over the action space until you find a trajectory that reaches a cosine-similarity threshold relative to the desired state, then add all of those frames and states to the buffer and incorporate them into training.
You can bootstrap the process with priors and still search for the desired state.
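A rough sketch of that loop (not a working implementation of anything from the paper): assume a Gym-style `env`, a text-embedding function `embed`, and a `goal_embedding` for the desired state; all three names are my own placeholders.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_for_goal(env, embed, goal_embedding, threshold=0.9, max_rollouts=1000):
    """Roll out action sequences until some visited state embeds close enough
    to the goal, then return the whole trajectory for the replay buffer."""
    for _ in range(max_rollouts):
        obs = env.reset()
        trajectory = []
        done = False
        while not done:
            # Random search here; a prior policy could propose actions instead.
            action = env.action_space.sample()
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action, next_obs))
            if cosine_sim(embed(next_obs), goal_embedding) >= threshold:
                return trajectory  # close enough to the desired state
            obs = next_obs
    return None  # budget exhausted without reaching the threshold
```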
## reward
Reward any trajectory in proportion to how semantically similar its states are to states from a run that reached the victory condition.
Linear, or some other reward-curve function.
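A minimal sketch of that reward assignment, assuming both the current trajectory and a winning run have already been turned into lists of embedding vectors (the argument names are my assumptions, not anything from the paper):

```python
import numpy as np

def semantic_rewards(trajectory_embeddings, victory_embeddings, scale=1.0):
    """Give each state a reward proportional to its best cosine similarity
    against any state from a run that hit the victory condition."""
    rewards = []
    for s in trajectory_embeddings:
        sims = [
            float(np.dot(s, v) / (np.linalg.norm(s) * np.linalg.norm(v)))
            for v in victory_embeddings
        ]
        # Linear curve for now; any monotone function of max(sims) would also work.
        rewards.append(scale * max(sims))
    return rewards
```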
## Sample curve
Sample more heavily from sections of a trajectory where the states change more between steps.
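One way to read my own shorthand here: weight each transition by how much the state embedding changes, then sample proportionally. Sketch below; `state_embeddings` is assumed to be a (steps x dims) array of per-step embeddings.

```python
import numpy as np

def change_weighted_index(state_embeddings, rng=None):
    """Sample a transition index with probability proportional to how much
    the state embedding changed at that step, so high-change sections of a
    trajectory get sampled more often."""
    rng = rng or np.random.default_rng()
    deltas = np.linalg.norm(np.diff(np.asarray(state_embeddings), axis=0), axis=1)
    probs = deltas / deltas.sum()
    return int(rng.choice(len(deltas), p=probs))
```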
## notes
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
# 7/21/23
I am going to naively, without evidence, state that you can represent any function in text with a large language model.
# measured steps
Probably should have figured out sooner that small, measured steps applied consistently lead to results. Predicting the outcome while getting there can be interesting but is ultimately just an image in your head.
# Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
https://arxiv.org/pdf/2307.05695.pdf
https://github.com/guitaricet/peft_pretraining