tmlinhdinh committed on
Commit
69408b5
1 Parent(s): b8e2ff5
Files changed (2)
  1. report/report.ipynb +0 -0
  2. report/report.md +0 -34
report/report.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
report/report.md CHANGED
@@ -19,11 +19,7 @@ I would also like to test out **Semantic Chunking** because it focuses on dividi
# Task 2: Building a Quick End-to-End Prototype

### 1. Build a prototype and deploy to a Hugging Face Space, and create a short (< 2 min) loom video demonstrating some initial testing inputs and outputs.
- <<<<<<< HEAD
Loom link: https://www.loom.com/share/2ee6825aed60424da6aadc414bbc800a?sid=1994ecda-c52e-4e71-afe9-166e862296e4
- =======
- Link:
- >>>>>>> e107018 (add report)

### 2. How did you choose your stack, and why did you select each tool the way you did?
I built a Retrieval-Augmented Generation (RAG) system with:
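
The component list itself falls outside this diff hunk. As a rough sketch of how a stack like the one described in this report (a Qdrant vector store with OpenAI's `text-embedding-3-small`) typically wires together, assuming LangChain and a `gpt-4o-mini` chat model rather than the prototype's exact code:

```python
# Minimal sketch of a RAG pipeline in the spirit of the prototype.
# Assumptions: LangChain, an in-memory Qdrant collection, and gpt-4o-mini as the
# chat model; the corpus below is a placeholder, not the report's actual data.
from langchain_community.vectorstores import Qdrant
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

docs = [Document(page_content="...AI policy source text...")]  # placeholder corpus

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Qdrant.from_documents(
    docs, embeddings, location=":memory:", collection_name="ai_policy"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def answer(question: str) -> str:
    """Retrieve relevant chunks and answer strictly from that context."""
    context = "\n\n".join(d.page_content for d in retriever.invoke(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content
```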
@@ -44,10 +40,6 @@ Each tool was selected to balance performance, scalability, and efficiency for t
![](./ragas_prototype.png)

### 2. What conclusions can you draw about performance and effectiveness of your pipeline with this information?
- The pipeline shows strong retrieval performance but needs improvement in generating accurate answers:
-
- <<<<<<< HEAD
- ### What conclusions can you draw about performance and effectiveness of your pipeline with this information?

The pipeline demonstrates reasonably strong retrieval capabilities but reveals areas for improvement in the accuracy and relevance of generated answers:
 
@@ -58,57 +50,31 @@ The pipeline demonstrates reasonably strong retrieval capabilities but reveals a

### Summary:
- The pipeline is doing well at retrieving relevant context, but the generation of accurate and faithful answers needs refinement. This suggests a potential need for either a more advanced QA model or improvements in how retrieved chunks are passed to the answer generation process.
- =======
- ### Strengths:
- - **Context Recall (92.2%)** and **Precision (91.9%)** are high, meaning the system effectively retrieves relevant information.
- - **Faithfulness (75.9%)** indicates that the generated answers are mostly grounded in retrieved data.
-
- ### Weaknesses:
- - **Answer Correctness (52.6%)** and **Relevancy (67.6%)** need improvement, as the system struggles to generate consistently correct and relevant responses.
-
- **Summary**: Retrieval is excellent, but the QA generation needs refinement for more accurate answers.
- >>>>>>> e107018 (add report)
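
For context on where scores like the recall, precision, and faithfulness numbers above come from, here is a minimal sketch of a RAGAS evaluation run. It assumes the 0.1-style `ragas.evaluate` API and an evaluation set already assembled into a Hugging Face `Dataset`; the rows shown are placeholders, not the report's actual test set.

```python
# Hypothetical sketch of scoring the pipeline with RAGAS (0.1-style API assumed);
# the real questions, answers, and contexts come from the report's notebook.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

eval_data = Dataset.from_dict({
    "question": ["What does the 2023 executive order cover?"],
    "answer": ["<generated answer from the RAG pipeline>"],
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],
    "ground_truth": ["<reference answer>"],
})

results = evaluate(
    eval_data,
    metrics=[context_recall, context_precision, faithfulness,
             answer_relevancy, answer_correctness],
)
print(results)  # metric name -> score
```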
 
# Task 4: Fine-Tuning Open-Source Embeddings

### 1. Swap out your existing embedding model for the new fine-tuned version. Provide a link to your fine-tuned embedding model on the Hugging Face Hub.
- <<<<<<< HEAD
Fine-tuning model link: https://huggingface.co/ldldld/snowflake-arctic-embed-m-finetuned
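
As an illustration of the swap itself, a minimal sketch that points the pipeline at the fine-tuned model from the Hub instead of the OpenAI embeddings. The `langchain-huggingface` wrapper and the collection name are assumptions, and `docs` stands in for the already-chunked corpus.

```python
# Sketch of swapping in the fine-tuned embedding model from the Hub.
# Assumes the langchain-huggingface integration (sentence-transformers backend);
# `docs` is the already-chunked corpus from the prototype.
from langchain_community.vectorstores import Qdrant
from langchain_huggingface import HuggingFaceEmbeddings

finetuned_embeddings = HuggingFaceEmbeddings(
    model_name="ldldld/snowflake-arctic-embed-m-finetuned"
)

# Rebuild the Qdrant collection so every chunk is re-embedded with the new model.
vectorstore = Qdrant.from_documents(
    docs, finetuned_embeddings, location=":memory:",
    collection_name="ai_policy_finetuned",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```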
 
### 2. How did you choose the embedding model for this application?
I selected `Snowflake/snowflake-arctic-embed-m` as the model for fine-tuning. To make this choice, I referred to the `mteb/leaderboard` and filtered for models with fewer than 250M parameters. Then I went through the top-ranking models, filtering out models from personal accounts and models that require running some suspicious executable. That ultimately left me with `Snowflake/snowflake-arctic-embed-m`, which is actually the one we used in class.
- =======
-
- ### 2. How did you choose the embedding model for this application?
- >>>>>>> e107018 (add report)
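
For completeness, a rough sketch of how an embedding model like this is typically fine-tuned with `sentence-transformers` on (question, relevant chunk) pairs. The training pairs and hyperparameters below are placeholders, not the ones used for the linked model.

```python
# Hypothetical fine-tuning sketch using sentence-transformers with
# MultipleNegativesRankingLoss; training data and hyperparameters are placeholders.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Each example pairs a (possibly synthetic) question with the chunk that answers it.
train_examples = [
    InputExample(texts=["What does the executive order require?", "<relevant chunk text>"]),
    InputExample(texts=["How are federal agencies affected?", "<relevant chunk text>"]),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_loader, loss)],
    epochs=3,
    warmup_steps=10,
)
model.save("snowflake-arctic-embed-m-finetuned")
```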
 
# Task 5: Assessing Performance

### 1. Test the fine-tuned embedding model using the RAGAS framework to quantify any improvements. Provide results in a table.
- <<<<<<< HEAD
![](./ragas_finetune.png)

It seems that the off-the-shelf embedding model from OpenAI, `text-embedding-3-small`, is still better for our RAG, which honestly isn't too surprising.
- =======
- >>>>>>> e107018 (add report)

### 2. Test the two chunking strategies using the RAGAS framework to quantify any improvements. Provide results in a table.
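
No results table is included for this question in this version of the report. For reference, here is a minimal sketch of how the two strategies (an assumed fixed-size baseline versus the semantic chunking mentioned at the top of the report) might be set up for such a comparison; the splitter parameters are assumptions.

```python
# Sketch of preparing two chunking strategies for a side-by-side RAGAS comparison.
# Baseline splitter parameters are assumptions; SemanticChunker (from
# langchain_experimental) splits on embedding-similarity breakpoints.
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

baseline_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
semantic_splitter = SemanticChunker(OpenAIEmbeddings(model="text-embedding-3-small"))

baseline_chunks = baseline_splitter.split_documents(docs)  # docs: the loaded corpus
semantic_chunks = semantic_splitter.split_documents(docs)

# Each chunk set would then be indexed into its own Qdrant collection and scored
# against the same RAGAS evaluation set.
```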
 
### 3. The AI Solutions Engineer asks you “Which one is the best to test with internal stakeholders next week, and why?”
- <<<<<<< HEAD
The original prototype is ideal for testing with internal stakeholders next week: it offers strong performance and is straightforward to implement. The only drawback is that it is not open source. If that is a critical requirement, we can confirm with stakeholders and then explore the fine-tuning path further. Based on the initial results, we could likely fine-tune open-source models to reach performance similar to that of OpenAI's `text-embedding-3-small`.
- =======
- >>>>>>> e107018 (add report)

# Task 6: Managing Your Boss and User Expectations

### 1. What is the story that you will give to the CEO to tell the whole company at the launch next month?
- <<<<<<< HEAD
We're excited to introduce our **AI Industry Insights chatbot**, designed to provide real-time, nuanced guidance on the rapidly evolving impact of AI, especially in the context of politics and ethical enterprise applications. As we move through an election cycle and navigate uncertainty around AI regulation, the chatbot empowers users to stay informed and make confident decisions. The tool leverages cutting-edge technology to offer insightful, up-to-date information on how AI is shaping industries and government policy. It's a reliable companion for anyone looking to understand the future of AI in business and governance.

### 2. There appears to be important information not included in our build, for instance, the 270-day update on the 2023 executive order on Safe, Secure, and Trustworthy AI. How might you incorporate relevant white-house briefing information into future versions?
I'd add the relevant new White House briefing documents to the Qdrant vector store. Then, depending on whether we use an open-source embedding model, I'd re-fine-tune the embedding model and re-evaluate with RAGAS.
- =======
-
- ### 2. There appears to be important information not included in our build, for instance, the 270-day update on the 2023 executive order on Safe, Secure, and Trustworthy AI. How might you incorporate relevant white-house briefing information into future versions?
- >>>>>>> e107018 (add report)
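
To make the update path described in Task 6 concrete, here is a minimal sketch of folding a new briefing document into the existing Qdrant collection. The loader, URL, and splitter settings are assumptions for illustration; after re-indexing (and re-fine-tuning, if we are on the open-source path), the RAGAS evaluation would be re-run.

```python
# Hypothetical sketch of adding a new White House briefing to the existing
# vector store; the loader, URL, and splitter settings are illustrative only.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

briefing_docs = WebBaseLoader(
    "https://www.whitehouse.gov/briefing-room/..."  # placeholder URL
).load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
new_chunks = splitter.split_documents(briefing_docs)

# `vectorstore` is the existing Qdrant collection behind the deployed pipeline.
vectorstore.add_documents(new_chunks)
```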
 