cjber committed
Commit d5b7cf9 · Parent: d98a335

feat: add eval comparison

planning_ai/eval/compare_summaries.py ADDED
@@ -0,0 +1,63 @@
+ import polars as pl
+ from langchain_core.output_parsers import StrOutputParser
+ from langchain_core.prompts import ChatPromptTemplate
+ from pydantic import BaseModel, Field
+
+ from planning_ai.common.utils import Paths
+ from planning_ai.llms.llm import GPT4o
+
+ # Prompt templates: one for judging summary pairs, one for writing new summaries.
+ with open("./planning_ai/eval/eval.txt", "r") as f:
+     compare_template = f.read()
+
+ with open("./planning_ai/eval/summary.txt", "r") as f:
+     summary_template = f.read()
+
+
+ class SummaryEvaluator(BaseModel):
+     score: int = Field(..., description="The number of the best summary.")
+
+
+ # Structured output forces the judge to return a machine-readable integer score.
+ SLLM = GPT4o.with_structured_output(SummaryEvaluator, strict=True)
+
+ compare_prompt = ChatPromptTemplate([("system", compare_template)])
+ compare_chain = compare_prompt | SLLM
+
+ summary_prompt = ChatPromptTemplate([("system", summary_template)])
+ summary_chain = summary_prompt | GPT4o | StrOutputParser()
+
+
+ # Keep only responses without attachments, so "text" holds the full source document.
+ original = pl.read_parquet(Paths.STAGING / "gcpt3.parquet").filter(
+     pl.col("attachments_id").is_null()
+ )
+ summaries1 = original[["text", "representations_summary"]].unique().head(20)
+
+ # Produce a second, independent summary of each document with the summary chain.
+ summaries2 = summaries1[["text"]]
+ summaries2 = summaries2.with_columns(
+     pl.col("text")
+     .map_elements(
+         lambda x: summary_chain.invoke({"content": x}), return_dtype=pl.String
+     )
+     .alias("summary")
+ )
+
+ # Score each pair of summaries against its source document (see eval.txt).
+ summaries = summaries1.join(summaries2, on="text")
+ summaries = summaries.with_columns(
+     pl.struct(["text", "representations_summary", "summary"])
+     .map_elements(
+         lambda x: compare_chain.invoke(
+             {
+                 "document": x["text"],
+                 "summary_1": x["representations_summary"],
+                 "summary_2": x["summary"],
+             }
+         ).score,
+         return_dtype=pl.Int8,
+     )
+     .alias("score")
+ )
+ print(summaries["score"].value_counts())
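One schema note: eval.txt (added below) defines a 0-3 scale, but `score` is typed as a plain `int`, so any integer would pass validation. A minimal sketch of a tighter model, assuming a `typing.Literal` is acceptable to the structured-output backend (a suggestion, not part of this commit):

from typing import Literal

from pydantic import BaseModel, Field


class SummaryEvaluator(BaseModel):
    # Literal narrows the JSON schema to the four verdicts defined in
    # eval.txt, so the judge cannot return an out-of-range score.
    score: Literal[0, 1, 2, 3] = Field(..., description="The number of the best summary.")

Pairwise judges can also favour one position; scoring each pair a second time with `summary_1` and `summary_2` swapped, and keeping only consistent verdicts, is a cheap guard worth considering.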
planning_ai/eval/eval.txt ADDED
@@ -0,0 +1,32 @@
+ **Task:**
+ You are grading **two** text summaries of a source document to determine which summary more comprehensively captures the key points within the source document.
+
+ ### **Evaluation Criteria:**
+ A good summary should:
+ 1. **Be accurate** – It should not include information that is not present in the source document.
+ 2. **Be comprehensive** – It should reflect all key points in the source document without omitting important details.
+ 3. **Be well-grounded** – It should be based entirely on the source document without adding interpretations, opinions, or external information.
+
+ ### **Scoring System:**
+ - **Score 0:** Neither summary sufficiently captures the key points.
+ - **Score 1:** The first summary better reflects the content of the source document.
+ - **Score 2:** The second summary better reflects the content of the source document.
+ - **Score 3:** Both summaries are sufficiently accurate and comprehensive.
+
+ ### **Evaluation Process:**
+ 1. **Compare each summary to the source document.** Identify whether each summary includes all key points, omits critical details, or introduces extraneous information.
+ 2. **Assess which summary better aligns with the source document.** Determine whether one summary is significantly more accurate and comprehensive.
+
+ ---
+
+ **Source Document:**
+ {document}
+
+ **Summary 1:**
+ {summary_1}
+
+ **Summary 2:**
+ {summary_2}
+
+ **Final Score (0-3):**
+
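The three placeholders above are filled by `compare_chain.invoke` in `compare_summaries.py`. A quick smoke test, with invented inputs rather than rows from the dataset:

# Hypothetical inputs; not drawn from the gcpt3 parquet.
result = compare_chain.invoke(
    {
        "document": "The application proposes 40 homes and a new cycle path on Mill Road.",
        "summary_1": "Forty homes and a cycle path on Mill Road are proposed.",
        "summary_2": "The application concerns a retail development.",
    }
)
print(result.score)  # expected verdict: 1 (only summary_1 reflects the document)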
planning_ai/eval/summary.txt ADDED
@@ -0,0 +1,5 @@
+ Please analyze the response to the planning application provided below. Provide a concise summary of the response, highlighting the main points and any significant details.
+
+ Response:
+
+ {content}
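This template has a single `{content}` placeholder and is consumed by `summary_chain` in the script above. A one-off invocation, again with a hypothetical response text:

# Hypothetical planning response; any free-text representation works here.
text = "I object to this application because it will increase traffic on Mill Road."
print(summary_chain.invoke({"content": text}))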