Spaces: DrishtiSharma/sql-rag

DrishtiSharma committed on Jan 14

Commit 9f3c9dc (verified) · 1 Parent(s): e4ab33c

Update app.py

Files changed (1)

app.py +31 -46

@@ -136,20 +136,17 @@ def ask_gpt4o_for_visualization(query, df, llm, retries=2):
     numeric_columns = df.select_dtypes(include='number').columns.tolist()
     categorical_columns = df.select_dtypes(exclude='number').columns.tolist()
-    # Enhanced Prompt with Diverse, Query-Based Examples
     prompt = f"""
     Analyze the following query and suggest the most suitable visualization(s) using the dataset.
     **Query:** "{query}"
     **Dataset Overview:**
     - **Numeric Columns (for Y-axis):** {', '.join(numeric_columns) if numeric_columns else 'None'}
     - **Categorical Columns (for X-axis or grouping):** {', '.join(categorical_columns) if categorical_columns else 'None'}
     Suggest visualizations in this exact JSON format:
     [
       {{
-        "chart_type": "bar/box/line/scatter/pie/heatmap",
         "x_axis": "categorical_or_time_column",
         "y_axis": "numeric_column",
         "group_by": "optional_column_for_grouping",
@@ -157,9 +154,7 @@ def ask_gpt4o_for_visualization(query, df, llm, retries=2):
         "description": "Why this chart is suitable"
       }}
     ]
     **Query-Based Examples:**
     - **Query:** "What is the salary distribution across different job titles?"
       **Suggested Visualization:**
       {{
@@ -170,84 +165,74 @@ def ask_gpt4o_for_visualization(query, df, llm, retries=2):
         "title": "Salary Distribution by Job Title and Experience",
         "description": "A box plot to show how salaries vary across different job titles and experience levels."
       }}
-    - **Query:** "Show the average salary by company size and industry."
       **Suggested Visualizations:**
       [
         {{
           "chart_type": "bar",
           "x_axis": "company_size",
           "y_axis": "salary_in_usd",
-          "group_by": "industry",
-          "title": "Average Salary by Company Size and Industry",
-          "description": "A grouped bar chart comparing average salaries across company sizes and industries."
         }},
         {{
           "chart_type": "heatmap",
-          "x_axis": "industry",
-          "y_axis": "company_size",
-          "group_by": null,
-          "title": "Salary Heatmap by Industry and Company Size",
-          "description": "A heatmap showing salary concentration across industries and company sizes."
         }}
       ]
-    - **Query:** "How has the company's revenue changed over the years?"
       **Suggested Visualization:**
       {{
         "chart_type": "line",
-        "x_axis": "year",
-        "y_axis": "revenue",
-        "group_by": null,
-        "title": "Yearly Revenue Growth",
-        "description": "A line chart showing revenue growth over time."
       }}
-    - **Query:** "What is the market share of each product category?"
       **Suggested Visualization:**
       {{
         "chart_type": "pie",
-        "x_axis": "product_category",
         "y_axis": null,
         "group_by": null,
-        "title": "Market Share by Product Category",
-        "description": "A pie chart to show the market share distribution across different product categories."
       }}
-    - **Query:** "Is there a correlation between years of experience and salary?"
       **Suggested Visualization:**
       {{
         "chart_type": "scatter",
-        "x_axis": "years_of_experience",
         "y_axis": "salary_in_usd",
-        "group_by": "job_title",
-        "title": "Experience vs Salary by Job Title",
-        "description": "A scatter plot to analyze the relationship between experience and salary across different job titles."
       }}
-    - **Query:** "Which departments have the highest concentration of employees across regions?"
       **Suggested Visualization:**
       {{
         "chart_type": "heatmap",
-        "x_axis": "department",
-        "y_axis": "region",
         "group_by": null,
-        "title": "Employee Distribution by Department and Region",
-        "description": "A heatmap to visualize employee density across departments and regions."
       }}
     Only suggest visualizations that logically match the query and dataset.
     """
     for attempt in range(retries + 1):
         try:
-            # Generate response from the model
             response = llm.generate(prompt)
-            # Load JSON response
             suggestions = json.loads(response)
-            # Validate response structure using the helper function
             if isinstance(suggestions, list):
                 valid_suggestions = [s for s in suggestions if is_valid_suggestion(s)]
                 if valid_suggestions:

     numeric_columns = df.select_dtypes(include='number').columns.tolist()
     categorical_columns = df.select_dtypes(exclude='number').columns.tolist()
+    # Prompt with Dataset-Specific, Query-Based Examples
     prompt = f"""
     Analyze the following query and suggest the most suitable visualization(s) using the dataset.
     **Query:** "{query}"
     **Dataset Overview:**
     - **Numeric Columns (for Y-axis):** {', '.join(numeric_columns) if numeric_columns else 'None'}
     - **Categorical Columns (for X-axis or grouping):** {', '.join(categorical_columns) if categorical_columns else 'None'}
     Suggest visualizations in this exact JSON format:
     [
       {{
+        "chart_type": "bar/box/line/scatter/pie/heatmap",
         "x_axis": "categorical_or_time_column",
         "y_axis": "numeric_column",
         "group_by": "optional_column_for_grouping",
         "description": "Why this chart is suitable"
       }}
     ]
     **Query-Based Examples:**
     - **Query:** "What is the salary distribution across different job titles?"
       **Suggested Visualization:**
       {{
         "title": "Salary Distribution by Job Title and Experience",
         "description": "A box plot to show how salaries vary across different job titles and experience levels."
       }}
+    - **Query:** "Show the average salary by company size and employment type."
       **Suggested Visualizations:**
       [
         {{
           "chart_type": "bar",
           "x_axis": "company_size",
           "y_axis": "salary_in_usd",
+          "group_by": "employment_type",
+          "title": "Average Salary by Company Size and Employment Type",
+          "description": "A grouped bar chart comparing average salaries across company sizes and employment types."
         }},
         {{
           "chart_type": "heatmap",
+          "x_axis": "company_size",
+          "y_axis": "salary_in_usd",
+          "group_by": "employment_type",
+          "title": "Salary Heatmap by Company Size and Employment Type",
+          "description": "A heatmap showing salary concentration across company sizes and employment types."
         }}
       ]
+    - **Query:** "How has the average salary changed over the years?"
       **Suggested Visualization:**
       {{
         "chart_type": "line",
+        "x_axis": "work_year",
+        "y_axis": "salary_in_usd",
+        "group_by": "experience_level",
+        "title": "Average Salary Trend Over Years",
+        "description": "A line chart showing how the average salary has changed across different experience levels over the years."
       }}
+    - **Query:** "What is the employee distribution by company location?"
       **Suggested Visualization:**
       {{
         "chart_type": "pie",
+        "x_axis": "company_location",
         "y_axis": null,
         "group_by": null,
+        "title": "Employee Distribution by Company Location",
+        "description": "A pie chart showing the distribution of employees across company locations."
       }}
+    - **Query:** "Is there a relationship between remote work ratio and salary?"
       **Suggested Visualization:**
       {{
         "chart_type": "scatter",
+        "x_axis": "remote_ratio",
         "y_axis": "salary_in_usd",
+        "group_by": "experience_level",
+        "title": "Remote Work Ratio vs Salary",
+        "description": "A scatter plot to analyze the relationship between remote work ratio and salary."
       }}
+    - **Query:** "Which job titles have the highest salaries across regions?"
       **Suggested Visualization:**
       {{
         "chart_type": "heatmap",
+        "x_axis": "job_title",
+        "y_axis": "employee_residence",
         "group_by": null,
+        "title": "Salary Heatmap by Job Title and Region",
+        "description": "A heatmap showing the concentration of high-paying job titles across regions."
       }}
     Only suggest visualizations that logically match the query and dataset.
     """
     for attempt in range(retries + 1):
         try:
             response = llm.generate(prompt)
             suggestions = json.loads(response)
             if isinstance(suggestions, list):
                 valid_suggestions = [s for s in suggestions if is_valid_suggestion(s)]
                 if valid_suggestions:
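
Note on the parse-and-validate step: the retry loop above calls `json.loads` on the model output and filters the result through `is_valid_suggestion`, whose body is not part of this diff. The following is a minimal sketch of what that step could look like, not the code in app.py. The names `is_valid_suggestion_sketch`, `parse_suggestions`, `REQUIRED_KEYS`, and `ALLOWED_CHART_TYPES` are hypothetical, and the required key set (including `title`) is an assumption drawn from the examples in the prompt. It also accepts a single JSON object, since the prompt's examples show both a bare object and a list.

```python
import json

# Assumed schema, inferred from the prompt examples; the real validator may differ.
REQUIRED_KEYS = {"chart_type", "x_axis", "y_axis", "group_by", "title", "description"}
ALLOWED_CHART_TYPES = {"bar", "box", "line", "scatter", "pie", "heatmap"}


def is_valid_suggestion_sketch(suggestion):
    """Hypothetical stand-in for is_valid_suggestion (not shown in this diff)."""
    if not isinstance(suggestion, dict):
        return False
    # Every expected key must be present; values such as y_axis may be None.
    if not REQUIRED_KEYS.issubset(suggestion):
        return False
    chart_type = suggestion["chart_type"]
    return isinstance(chart_type, str) and chart_type in ALLOWED_CHART_TYPES


def parse_suggestions(raw_text):
    """Parse model output and keep only well-formed suggestions."""
    try:
        parsed = json.loads(raw_text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return []
    # Accept a single object as well as a list of objects.
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []
    return [s for s in parsed if is_valid_suggestion_sketch(s)]
```

With a validator of this shape, a misspelled key in the model's reply (for example `chdart_type` instead of `chart_type`) causes that suggestion to be dropped rather than passed on to the plotting code, which is why the key names in the prompt's format block and in its examples need to agree.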