LeetTools committed · verified
Commit 7b647a2 · Parent(s): 6eabcb6

Upload 2 files

Files changed (2):

1. README.md (+88, -126)
2. requirements.txt (+2, -1)
README.md CHANGED

@@ -1,9 +1,3 @@
----
-title: ask.py
-app_file: ask.py
-sdk: gradio
-sdk_version: 5.3.0
----
 # ask.py
 
 [![License](https://img.shields.io/github/license/pengfeng/ask.py)](LICENSE)
@@ -11,18 +5,34 @@ sdk_version: 5.3.0
 A single Python program to implement the search-extract-summarize flow, similar to AI search
 engines such as Perplexity.
 
 > [UPDATE]
 >
 > - 2024-10-22: add GradIO integration
 > - 2024-10-21: use DuckDB for the vector search and use an API for embedding
 > - 2024-10-20: allow specifying a list of input URLs
 > - 2024-10-18: output-language and output-length parameters for the LLM
 > - 2024-10-18: date-restrict and target-site parameters for search
 
-> [!NOTE]
-> Our main goal is to illustrate the basic concepts of AI search engines with the raw constructs.
-> Performance or scalability is not in the scope of this program.
-
 ## The search-extract-summarize flow
 
 Given a query, the program will
@@ -31,7 +41,9 @@ Given a query, the program will
 - crawl and scrape the pages for their text content
 - chunk the text content into chunks and save them into a vectordb
 - perform a vector search with the query and find the top 10 matched chunks
-- use the top 10 chunks as the context to ask an LLM to generate the answer
 - output the answer with the references
 
 Of course this flow is a very simplified version of the real AI search engines, but it is a good
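The chunk → embed → vector-search steps in the flow above can be sketched in a few lines of plain Python. This is a toy illustration (fixed-size character chunks, letter-frequency "embeddings"), not ask.py's actual implementation, which uses a real chunker, an embedding API, and DuckDB:

```python
# Toy sketch of the chunk -> embed -> vector-search steps.
# Illustrative stand-ins only; ask.py embeds via an API and stores in DuckDB.

def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size character chunks (real chunkers are smarter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    """Toy embedding: a letter-frequency vector standing in for an embedding API."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query: str, chunks: list[str], k: int = 10) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

With a real embedding model the same retrieval loop applies; only `embed` and the storage layer change.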
@@ -50,30 +62,33 @@ For example, we can:
 ## Quick start
 
 ```bash
-
-pip install -r requirements.txt
 
 # modify .env file to set the API keys or export them as environment variables as below
 
 # right now we use Google search API
-export SEARCH_API_KEY="your-google-search-api-key"
-export SEARCH_PROJECT_KEY="your-google-cx-key"
 
 # right now we use OpenAI API
-export LLM_API_KEY="your-openai-api-key"
 
-# run the program
-python ask.py -q "What is an LLM agent?"
 
 # we can specify more parameters to control the behavior such as date_restrict and target_site
-python ask.py --help
 Usage: ask.py [OPTIONS]
 
-  Search web for the query and summarize the results
 
 Options:
-  --web-ui                     Launch the web interface
   -q, --query TEXT             Query to search
   -d, --date-restrict INTEGER  Restrict search results to a specific date
                                range, default is no restriction
   -s, --target-site TEXT       Restrict search results to a specific site,
@@ -83,7 +98,12 @@ Options:
   --url-list-file TEXT         Instead of doing web search, scrape the
                                target URL list and answer the query based
                                on the content
-  -m, --model-name TEXT        Model name to use for inference
   -l, --log-level [DEBUG|INFO|WARNING|ERROR]
                                Set the logging level [default: INFO]
   --help                       Show this message and exit.
@@ -96,117 +116,59 @@ Options:
 - [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
 - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
 - [DuckDB](https://github.com/duckdb/duckdb)
-- [GradIO](https://grad.io)
-
-## Screenshot for the GradIO integration
-
-![image](https://github.com/user-attachments/assets/0483e6a2-75d7-4fbd-813f-bfa13839c836)
-
-## Sample output
-
-### General Search
-
-```
-% python ask.py -q "Why do we need agentic RAG even if we have ChatGPT?"
-
-Found 10 links for query: Why do we need agentic RAG even if we have ChatGPT?
-✅ Scraping the URLs ...
-Scraped 10 URLs ...
-Chunking the text ...
-Saving to vector DB ...
-Querying the vector DB ...
-✅ Running inference with context ...
-
-# Answer
-
-Agentic RAG (Retrieval-Augmented Generation) is needed alongside ChatGPT for several reasons:
-
-1. **Precision and Contextual Relevance**: While ChatGPT offers generative responses, it may not
-reliably provide precise answers, especially when specific, accurate information is critical[5].
-Agentic RAG enhances this by integrating retrieval mechanisms that improve response context and
-accuracy, allowing users to access the most relevant and recent data without the need for costly
-model fine-tuning[2].
-
-2. **Customizability**: RAG allows businesses to create tailored chatbots that can securely
-reference company-specific data[2]. In contrast, ChatGPT’s broader capabilities may not be
-directly suited for specialized, domain-specific questions without comprehensive customization[3].
-
-3. **Complex Query Handling**: RAG can be optimized for complex queries and can be adjusted to
-work better with specific types of inputs, such as comparing and contrasting information, a task
-where ChatGPT may struggle under certain circumstances[9]. This level of customization can lead to
-better performance in niche applications where precise retrieval of information is crucial.
-
-4. **Asynchronous Processing Capabilities**: Future agentic systems aim to integrate asynchronous
-handling of actions, allowing for parallel processing and reducing wait times for retrieval and
-computation, which is a limitation in the current form of ChatGPT[7]. This advancement would enhance
-overall efficiency and responsiveness in conversations.
-
-5. **Incorporating Retrieved Information Effectively**: Using RAG can significantly improve how
-retrieved information is utilized within a conversation. By effectively managing the context and
-relevance of retrieved documents, RAG helps in framing prompts that can guide ChatGPT towards
-delivering more accurate responses[10].
-
-In summary, while ChatGPT excels in generating conversational responses, agentic RAG brings
-precision, customization, and efficiency that can significantly enhance the overall conversational
-AI experience.
-
-# References
-
-[1] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
-[2] https://www.linkedin.com/posts/brianjuliusdc_dax-powerbi-chatgpt-activity-7235953280177041408-wQqq
-[3] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
-[4] https://community.openai.com/t/prompt-engineering-for-rag/621495
-[5] https://www.ben-evans.com/benedictevans/2024/6/8/building-ai-products
-[6] https://community.openai.com/t/prompt-engineering-for-rag/621495
-[7] https://www.linkedin.com/posts/kurtcagle_agentic-rag-personalizing-and-optimizing-activity-7198097129993613312-z7Sm
-[8] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
-[9] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
-[10] https://community.openai.com/t/prompt-engineering-for-rag/621495
 ```
 
-### Only use the latest information from a specific site
 
-This following query will only use the information from openai.com that are updated in the previous
-day. The behavior is similar to the "site:openai.com" and "date-restrict" search parameters in Google
-search.
 
 ```
-% python ask.py -q "OpenAI Swarm Framework" -d 1 -s openai.com
-Found 10 links for query: OpenAI Swarm Framework
-✅ Scraping the URLs ...
-Scraped 10 URLs ...
-✅ Chunking the text ...
-Saving to vector DB ...
-Querying the vector DB to get context ...
-Running inference with context ...
-
-# Answer
-
-OpenAI Swarm Framework is an experimental platform designed for building, orchestrating, and
-deploying multi-agent systems, enabling multiple AI agents to collaborate on complex tasks. It contrasts
-with traditional single-agent models by facilitating agent interaction and coordination, thus enhancing
-efficiency[5][9]. The framework provides developers with a way to orchestrate these agent systems in
-a lightweight manner, leveraging Node.js for scalable applications[1][4].
-
-One implementation of this framework is Swarm.js, which serves as a Node.js SDK, allowing users to
-create and manage agents that perform tasks and hand off conversations. Swarm.js is positioned as
-an educational tool, making it accessible for both beginners and experts, although it may still contain
-bugs and is currently lightweight[1][3][7]. This new approach emphasizes multi-agent collaboration and is
-well-suited for back-end development, requiring some programming expertise for effective implementation[9].
-
-Overall, OpenAI Swarm facilitates a shift in how AI systems can collaborate, differing from existing
-OpenAI tools by focusing on backend orchestration rather than user-interactive front-end applications[9].
-
-# References
-
-[1] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
-[2] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
-[3] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
-[4] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
-[5] https://community.openai.com/t/swarm-some-initial-insights/976602
-[6] https://community.openai.com/t/swarm-some-initial-insights/976602
-[7] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
-[8] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
-[9] https://community.openai.com/t/swarm-some-initial-insights/976602
-[10] https://community.openai.com/t/swarm-some-initial-insights/976602
-```
 # ask.py
 
 [![License](https://img.shields.io/github/license/pengfeng/ask.py)](LICENSE)
 
 A single Python program to implement the search-extract-summarize flow, similar to AI search
 engines such as Perplexity.
 
+- You can run it on the command line or with a GradIO UI.
+- You can control the output behavior, e.g., extract structured data or change the output language.
+- You can control the search behavior, e.g., restrict to a specific site or date, or just scrape
+  a specified list of URLs.
+- You can run it in a cron job or bash script to automate complex search/data extraction tasks.
+
+We have a running UI example [in HuggingFace Spaces](https://huggingface.co/spaces/leettools/AskPy).
+
+![image](https://github.com/user-attachments/assets/0483e6a2-75d7-4fbd-813f-bfa13839c836)
+
+> [!NOTE]
+>
+> - Our main goal is to illustrate the basic concepts of AI search engines with the raw constructs.
+>   Performance or scalability is not in the scope of this program.
+> - We are planning to open source a real search-enabled AI toolset with a real DB setup, real document
+>   pipeline, and real query engine soon. Star and watch this repo for updates!
+
 > [UPDATE]
 >
+> - 2024-11-10: add Chonkie as the default chunker
+> - 2024-10-28: add an extract function as a new output mode
+> - 2024-10-25: add a hybrid search demo using DuckDB full-text search
 > - 2024-10-22: add GradIO integration
 > - 2024-10-21: use DuckDB for the vector search and use an API for embedding
 > - 2024-10-20: allow specifying a list of input URLs
 > - 2024-10-18: output-language and output-length parameters for the LLM
 > - 2024-10-18: date-restrict and target-site parameters for search
 
 ## The search-extract-summarize flow
 
 Given a query, the program will
 
 - crawl and scrape the pages for their text content
 - chunk the text content into chunks and save them into a vectordb
 - perform a vector search with the query and find the top 10 matched chunks
+- [Optional] search using full-text search and combine the results with the vector search
+- [Optional] use a reranker to re-rank the top chunks
+- use the top chunks as the context to ask an LLM to generate the answer
 - output the answer with the references
 
 Of course this flow is a very simplified version of the real AI search engines, but it is a good
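One common way to combine the vector-search and full-text rankings in the optional hybrid-search step is reciprocal-rank fusion (RRF). The sketch below is a generic illustration of that merging idea, not necessarily the exact strategy ask.py uses:

```python
# Generic reciprocal-rank fusion (RRF) sketch for merging ranked result lists,
# e.g. one from vector search and one from DuckDB full-text search.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids; items ranked high in any
    input list float to the top of the fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A chunk that appears near the top of both lists (e.g. matched semantically and by keywords) outranks a chunk that appears in only one.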
 
 ## Quick start
 
 ```bash
+# recommended: use Python 3.10 or later, and venv or conda to create a virtual environment
+% pip install -r requirements.txt
 
 # modify .env file to set the API keys or export them as environment variables as below
 
 # right now we use Google search API
+% export SEARCH_API_KEY="your-google-search-api-key"
+% export SEARCH_PROJECT_KEY="your-google-cx-key"
 
 # right now we use OpenAI API
+% export LLM_API_KEY="your-openai-api-key"
 
+# By default, the program will start a web UI. See the GradIO Deployment section for more info.
+# Run the program on the command line with the -c option
+% python ask.py -c -q "What is an LLM agent?"
 
 # we can specify more parameters to control the behavior such as date_restrict and target_site
+% python ask.py --help
 Usage: ask.py [OPTIONS]
 
+  Search web for the query and summarize the results.
 
 Options:
   -q, --query TEXT             Query to search
+  -o, --output-mode [answer|extract]
+                               Output mode for the answer, default is a
+                               simple answer
   -d, --date-restrict INTEGER  Restrict search results to a specific date
                                range, default is no restriction
   -s, --target-site TEXT       Restrict search results to a specific site,
 
   --url-list-file TEXT         Instead of doing web search, scrape the
                                target URL list and answer the query based
                                on the content
+  --extract-schema-file TEXT   Pydantic schema for the extract mode
+  --inference-model-name TEXT  Model name to use for inference
+  --hybrid-search              Use hybrid search mode with both vector
+                               search and full-text search
+  -c, --run-cli                Run as a command line tool instead of
+                               launching the Gradio UI
   -l, --log-level [DEBUG|INFO|WARNING|ERROR]
                                Set the logging level [default: INFO]
   --help                       Show this message and exit.
 
 - [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
 - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
 - [DuckDB](https://github.com/duckdb/duckdb)
+- [GradIO](https://github.com/gradio-app/gradio)
+- [Chonkie](https://github.com/bhavnicksm/chonkie)
+
+## GradIO Deployment
+
+> [!NOTE]
+> The original GradIO app-sharing guide is [here](https://www.gradio.app/guides/sharing-your-app).
+
+### Quick test and sharing
+
+By default, the program will start a web UI and share it through GradIO.
+
+```bash
+% python ask.py
+* Running on local URL: http://127.0.0.1:7860
+* Running on public URL: https://77c277af0330326587.gradio.live
+
+# you can also set SHARE_GRADIO_UI=False to only run locally
+% export SHARE_GRADIO_UI=False
+% python ask.py
+* Running on local URL: http://127.0.0.1:7860
 ```
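The `SHARE_GRADIO_UI` toggle shown above can be interpreted with a small helper like the following. This is a hypothetical sketch, not ask.py's exact code; Gradio itself exposes sharing through `launch(share=...)`:

```python
# Hypothetical sketch of reading the SHARE_GRADIO_UI environment toggle.
# Not ask.py's actual implementation.
import os

def should_share() -> bool:
    """Share by default; treat 'False', '0', or 'no' (any case) as off."""
    return os.environ.get("SHARE_GRADIO_UI", "True").strip().lower() not in ("false", "0", "no")

# usage (assuming a Gradio Blocks/Interface object named `demo`):
# demo.launch(share=should_share())
```

When sharing is on, Gradio creates the temporary public `*.gradio.live` URL in addition to the local one.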
 
+### To share a more permanent link using HuggingFace Spaces
+
+- First, [create a free HuggingFace account](https://huggingface.co/welcome).
+- Then, in your [settings/tokens page](https://huggingface.co/settings/tokens), create a new token with Write permissions.
+- In your terminal, run the following commands in your app directory to deploy the program to
+  HuggingFace Spaces:
+
+```bash
+% pip install gradio
+% gradio deploy
+Creating new Spaces Repo in '/home/you/ask.py'. Collecting metadata, press Enter to accept default value.
+Enter Spaces app title [ask.py]: ask.py
+Enter Gradio app file [ask.py]:
+Enter Spaces hardware (cpu-basic, cpu-upgrade, t4-small, t4-medium, l4x1, l4x4, zero-a10g, a10g-small, a10g-large, a10g-largex2, a10g-largex4, a100-large, v5e-1x1, v5e-2x2, v5e-2x4) [cpu-basic]:
+Any Spaces secrets (y/n) [n]: y
+Enter secret name (leave blank to end): SEARCH_API_KEY
+Enter secret value for SEARCH_API_KEY: YOUR_SEARCH_API_KEY
+Enter secret name (leave blank to end): SEARCH_PROJECT_KEY
+Enter secret value for SEARCH_PROJECT_KEY: YOUR_SEARCH_PROJECT_KEY
+Enter secret name (leave blank to end): LLM_API_KEY
+Enter secret value for LLM_API_KEY: YOUR_LLM_API_KEY
+Enter secret name (leave blank to end):
+Create Github Action to automatically update Space on 'git push'? [n]: n
+Space available at https://huggingface.co/spaces/your_user_name/ask.py
 ```
+
+Now you can use the HuggingFace Space app to run your queries.
+
+## Use Cases
+
+- [Search like Perplexity](demos/search_and_answer.md)
+- [Only use the latest information from a specific site](demos/search_on_site_and_date.md)
+- [Extract information from web search results](demos/search_and_extract.md)
requirements.txt CHANGED

@@ -6,4 +6,5 @@ bs4==0.0.2
 lxml==4.8.0
 python-dotenv==1.0.1
 duckdb==1.1.2
-gradio==5.3.0
+gradio==5.3.0
+chonkie==0.1.2