title: 🎹🥁🎸DeepResearchEvaluator
emoji: 🎹🥁🎸
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
# 🎵 🎶 🎸 🎹 🎺 🎷 🥁 🎻
A Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.
Key Topics and Related Papers:
Long-Horizon Task Planning in Robotics:
"MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model"
Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song
This paper introduces a method that decomposes complex tasks at multiple levels to enhance planning capabilities using open-source large language models.
"ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning"
Authors: Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, Lei Ma
The study presents a framework that improves LLM-based planning through an iterative self-refinement process, enhancing feasibility and correctness in task plans.
Skill-Based Reinforcement Learning:
"Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks"
Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
This research focuses on building multi-task agents in open-world environments by learning basic skills and planning over them to accomplish long-horizon tasks efficiently.
"SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks"
Authors: Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu
The paper proposes a framework that integrates a differentiable decision tree within the high-level policy to generate skill embeddings, enhancing explainability in decision-making for complex tasks.
Neuro-Symbolic Approaches:
"Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation"
Authors: Jie-Jing Shao, Hao-Ran Hao, Xiao-Wen Yang, Yu-Feng Li
This work introduces a framework that combines data-driven learning and symbolic-based reasoning to enable long-horizon planning through abductive imitation learning.
"CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning"
Authors: [Authors not specified]
The study presents a method that utilizes large language models to translate constraints into formal specifications, facilitating long-horizon task and motion planning.
Evaluation Frameworks for AI Models:
"ASI: Accuracy-Stability Index for Evaluating Deep Learning Models"
Authors: Wei Dai, Daniel Berleant
The paper introduces the Accuracy-Stability Index (ASI), a quantitative measure that incorporates both accuracy and stability for assessing deep learning models.
"Benchmarks for Deep Off-Policy Evaluation"
Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
This research provides a collection of policies that, in conjunction with existing offline datasets, can be used for benchmarking off-policy evaluation in deep learning.
These topics and papers contribute to the development of AI systems capable of understanding research literature and applying the acquired knowledge to complex, long-horizon tasks, thereby advancing the field of artificial intelligence.
---
Features:
Core Configuration & Setup
Configures Streamlit page with title "🚲BikeAI🏆 Claude/GPT Research"
API Setup & Clients
Initializes OpenAI, Anthropic, and HuggingFace API clients with environment variables
Session State Management
Manages conversation history, transcripts, file editing states, and model selections
get_high_info_terms()
Extracts meaningful keywords from text while filtering common stop words
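This README names `get_high_info_terms()` and its purpose but does not show its code. A minimal sketch of stop-word-filtered keyword extraction follows. The stop-word set, the frequency ranking, and the `top_n` parameter are assumptions for illustration and may differ from what `app.py` does.

```python
import re
from collections import Counter

# Assumed stop-word list; the app's actual list is not shown in this README.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "with", "that", "this", "it", "as", "by", "from",
}

def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return up to top_n frequent non-stop-word terms, most frequent first."""
    # Tokenize into lowercase words, keeping hyphenated terms together.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    candidates = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    counts = Counter(candidates)
    # Record first appearance so that ties are broken deterministically.
    first_seen = {}
    for i, w in enumerate(candidates):
        first_seen.setdefault(w, i)
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    return ranked[:top_n]
```

Ranking by frequency is only one plausible reading of "meaningful"; a curated domain vocabulary or TF-IDF weighting would fit the same description.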
clean_text_for_filename()
Sanitizes text to create valid filenames by removing special characters
generate_filename()
Creates intelligent filenames based on content and timestamps
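The two filename helpers could work together roughly as sketched below. The naming scheme (`MMDD_HHMM` stamp plus a cleaned content prefix), the `max_len`, `ext`, and `now` parameters, and the choice of underscore as separator are all hypothetical; the README does not specify them.

```python
import re
from datetime import datetime
from typing import Optional

def clean_text_for_filename(text: str, max_len: int = 50) -> str:
    """Replace every run of non-alphanumeric characters with one underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return cleaned[:max_len].rstrip("_")

def generate_filename(content: str, ext: str = "md",
                      now: Optional[datetime] = None) -> str:
    """Hypothetical scheme: <MMDD_HHMM>_<cleaned content prefix>.<ext>."""
    stamp = (now or datetime.now()).strftime("%m%d_%H%M")
    return f"{stamp}_{clean_text_for_filename(content)}.{ext}"
```

Passing `now` explicitly keeps the function deterministic for testing; in the app it would normally be left at its default.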
create_file()
Saves prompt and response content to files with smart naming
get_download_link()
Generates base64-encoded download links for files
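A base64 download link is usually an HTML anchor whose `href` is a `data:` URI. The sketch below assumes the function receives raw bytes, a filename, and a MIME type; the real signature in `app.py` is not shown here and may instead take a file path.

```python
import base64
import html

def get_download_link(data: bytes, filename: str,
                      mime: str = "application/octet-stream") -> str:
    """Build an <a> tag that embeds the file content as a base64 data URI."""
    b64 = base64.b64encode(data).decode("ascii")
    # Escape the filename so quotes or angle brackets cannot break the markup.
    safe_name = html.escape(filename, quote=True)
    return (
        f'<a href="data:{mime};base64,{b64}" '
        f'download="{safe_name}">Download {safe_name}</a>'
    )
```

In a Streamlit app, such a string can be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is embedded in the page, this approach suits small files better than large ones.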
clean_for_speech()
Prepares text for speech synthesis by removing special characters
speech_synthesis_html()
Creates HTML for browser-based speech synthesis
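Browser-based speech synthesis typically relies on the Web Speech API (`SpeechSynthesisUtterance` and `window.speechSynthesis.speak`). The following sketch shows one way the two helpers could fit together. Which characters are stripped and how the HTML is structured are assumptions, not the app's confirmed behavior.

```python
import json
import re

def clean_for_speech(text: str) -> str:
    """Strip URLs and Markdown punctuation, then normalize whitespace."""
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[#*_`>|\[\]()]", " ", text)  # drop Markdown punctuation
    return re.sub(r"\s+", " ", text).strip()

def speech_synthesis_html(text: str) -> str:
    """Return a <script> snippet that speaks the cleaned text in the browser."""
    # json.dumps produces a valid JavaScript string literal; escaping "</"
    # prevents the text from closing the script element early.
    payload = json.dumps(clean_for_speech(text)).replace("</", "<\\/")
    return (
        "<script>\n"
        f"const utterance = new SpeechSynthesisUtterance({payload});\n"
        "window.speechSynthesis.speak(utterance);\n"
        "</script>"
    )
```

In Streamlit, the returned snippet could be embedded with `streamlit.components.v1.html(...)`; whether the app does so is not stated in this README.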
edge_tts_generate_audio()
Generates MP3 audio files using Edge TTS
speak_with_edge_tts()
Wrapper for Edge TTS audio generation
play_and_download_audio()
Creates audio player interface with download option
process_image()
Analyzes images using GPT-4V
process_audio()
Transcribes audio using Whisper
process_video()
Extracts frames from video files
process_video_with_gpt()
Analyzes video frames using GPT-4V
parse_arxiv_refs()
Parses research paper references into structured format
perform_ai_lookup()
Searches and processes arXiv papers with audio summaries
create_zip_of_files()
Bundles multiple files into a zip with smart naming
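Bundling can be done with the standard-library `zipfile` module. The sketch below assumes a hypothetical "smart" name built from a timestamp, the first file's stem, and the file count; the `out_dir` and `now` parameters are also illustrative, since the app's actual naming rule is not described here.

```python
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

def create_zip_of_files(paths: Iterable[str], out_dir: str = ".",
                        now: Optional[datetime] = None) -> str:
    """Zip the existing files in paths and return the archive's path."""
    files = [Path(p) for p in paths if Path(p).is_file()]  # skip missing paths
    if not files:
        raise ValueError("no existing files to bundle")
    stamp = (now or datetime.now()).strftime("%m%d_%H%M")
    # Hypothetical naming rule: <stamp>_<first file stem>_<count>files.zip
    zip_path = Path(out_dir) / f"{stamp}_{files[0].stem}_{len(files)}files.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store only the base name so the archive has a flat layout.
            zf.write(f, arcname=f.name)
    return str(zip_path)
```

Storing base names keeps the archive flat, but two inputs with the same name from different directories would collide; a real implementation would need to decide how to handle that case.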
load_files_for_sidebar()
Organizes files by timestamp for sidebar display
extract_keywords_from_md()
Pulls keywords from markdown files for organization
display_file_manager_sidebar()
Creates interactive sidebar for file management
main()
Orchestrates overall application flow and UI components