bijayjr committed on
Commit 8c60672 · 1 Parent(s): b2c3642

Documentation added

Files changed (8)
  1. README.md +163 -0
  2. api.py +30 -36
  3. app.py +56 -51
  4. research/trials.ipynb +0 -0
  5. setup.py +0 -10
  6. src/comparison.py +26 -17
  7. src/summarization.py +17 -9
  8. src/utils.py +57 -18
README.md CHANGED
@@ -6,3 +6,166 @@ colorTo: yellow
 sdk: docker
 pinned: false
 ---
+
+# **Insightful News AI** 🤖
+
+## Overview
+Insightful News AI is a FastAPI-based application that fetches news articles for a given company, analyzes their sentiment, extracts key topics, and provides Hindi audio summaries. The application uses NLP models for text processing and an AI-powered TTS engine for speech synthesis.
+
+## **Project Setup**
+### Prerequisites
+- Python 3.10 (Virtual Environment: `tts`)
+- FastAPI
+- Streamlit
+- Required Python packages (listed in `requirements.txt`)
+
+## 🚀 Installation Steps
+
+### **1️⃣ Clone the Repository**
+```bash
+git clone https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application.git
+cd News-Summarization-and-Text-to-Speech-Application
+```
+
+### **2️⃣ Create and Activate a Virtual Environment**
+```bash
+python -m venv tts
+source tts/bin/activate   # On macOS/Linux
+tts\Scripts\activate      # On Windows
+```
+
+### **3️⃣ Install Dependencies**
+```bash
+pip install -r requirements.txt
+```
+
+### **4️⃣ Set Up Environment Variables**
+Create a `.env` file and add your API keys:
+```
+GROQ_API_KEY=your_api_key_here
+```
+
+### **5️⃣ Run the Backend (FastAPI Server)**
+```bash
+uvicorn api:app --host 127.0.0.1 --port 8000 --reload
+```
+
+### **6️⃣ Run the Frontend (Streamlit UI)**
+```bash
+streamlit run app.py
+```
+
+---
+
+## **🧠 Model Details**
+
+This project uses **three AI models**:
+
+1. **Summarization Model** (Mistral)
+   - Extracts key insights from news articles.
+   - Uses the `mistral-saba-24b` model for concise summarization.
+
+2. **Sentiment Analysis Model** (DistilBERT)
+   - Categorizes sentiment as **Positive, Negative, or Neutral**.
+   - Analyzes news sentiment using pre-trained NLP models.
+
+3. **Text-to-Speech (TTS) Model** (Google Text-to-Speech)
+   - Converts AI-generated summaries into **Hindi speech**.
+   - Uses the gTTS engine for speech synthesis.
+
+---
+
+## **🛠 API Development**
+
+### **Endpoints**
+
+| Method | Endpoint | Description |
+|--------|---------------------------|-------------|
+| `GET` | `/` | Welcome message. |
+| `GET` | `/news-analysis/?company=XYZ` | Extracts news titles and summaries, and analyzes sentiment. |
+| `GET` | `/comparative-analyst/?company=XYZ` | Performs a comparative sentiment analysis. |
+| `GET` | `/generate-audio/?company=XYZ` | Generates a Hindi audio summary. |
+
+---
+
+## **📡 API Usage**
+
+### **1️⃣ Fetch News Sentiment Summary**
+**Request (Postman, cURL, Python)**
+```bash
+curl -X GET "http://127.0.0.1:8000/news-analysis/?company=Google"
+```
+**Response (JSON)**
+```json
+{
+    "Company": "Google",
+    "Articles": [
+        {
+            "Title": "Google removes 331 malicious apps from Play Store",
+            "Summary": "Vapor Operation infected 331 apps with 60M+ downloads, engaging in ad fraud and phishing...",
+            "Sentiment": "Negative",
+            "Topics": ["Vapor Operation", "Android 13", "Security"]
+        }
+    ]
+}
+```
+
+### **2️⃣ Comparative Sentiment Analysis**
+```bash
+curl -X GET "http://127.0.0.1:8000/comparative-analyst/?company=Google"
+```
+**Response Example**
+```json
+{
+    "Sentiment Analysis": {
+        "Sentiment Distribution": {"Positive": 5, "Negative": 4, "Neutral": 1},
+        "Final Sentiment Summary": "Overall, Google news has a positive sentiment..."
+    }
+}
+```
+
+### **3️⃣ Generate Hindi Audio Summary**
+```bash
+curl -X GET "http://127.0.0.1:8000/generate-audio/?company=Google"
+```
+**Response**
+- Returns an `mp3` file with the Hindi audio summary.
+
+---
+
+## **🔗 Third-Party APIs Used**
+
+| API | Purpose |
+|-----------|---------|
+| **Groq API** | Used for text summarization (Mistral). |
+
+---
+
+## **⚠ Assumptions & Limitations**
+
+### **✅ Assumptions**
+1. **Company Name Input**: The company name entered exists in publicly available news.
+2. **Sentiment Accuracy**: The sentiment analysis model is trained on general news data but may not capture sarcasm or nuanced sentiment.
+3. **Keyword Extraction**: Uses KeyBERT for topic extraction, assuming that key topics can be identified from short summaries.
+
+### **🚨 Limitations**
+1. **News Data Availability**: If fewer than 10 articles are found, comparative analysis is not performed.
+2. **TTS Language Support**: Currently, speech generation is limited to **Hindi** only.
+3. **Rate Limits**: The Groq API requires an API key and enforces rate limits.
+
+---
+
+## **📌 Future Enhancements**
+✅ **Expand TTS support to more languages**
+✅ **Improve sentiment classification using fine-tuned LLMs**
+✅ **Enable real-time news updates using WebSockets**
+
+---
+
+### **🔗 Contributors**
+👤 **Bijay Chandra Das**
+📧 **[email protected]**
+
+📌 **GitHub Repo**: [GitHub](https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application)
+
+---
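The curl commands in the README translate directly into Python. The sketch below only assembles the request URLs from the documented routes and `company` query parameter (base URL as stated above); it is an editorial illustration, not part of the repository, and leaves the actual `requests.get` call to the reader since it needs the server running.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"

def build_url(endpoint: str, company: str) -> str:
    """Assemble a GET URL for one of the documented company-scoped routes."""
    return f"{BASE_URL}/{endpoint}/?{urlencode({'company': company})}"

# The three company-scoped routes from the endpoint table:
for route in ("news-analysis", "comparative-analyst", "generate-audio"):
    print(build_url(route, "Google"))
```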
api.py CHANGED
@@ -4,76 +4,70 @@ import uvicorn
 from src.utils import extract_news, analyze_sentiment, extract_keywords_keybert, generate_hindi_speech
 from src.comparison import comparison_analysis
 from src.summarization import summarize_overall_sentiment
-
+from typing import Dict
 
 app = FastAPI()
 
+
 @app.get("/")
-def home():
+def home() -> Dict[str, str]:
+    """Home route for API"""
     return {"message": "Welcome to the News Analysis API!"}
 
 
 @app.get("/news-analysis/")
-def get_news_analysis(company: str):
+def get_news_analysis(company: str) -> JSONResponse:
     """Extracts news, analyzes sentiment, and provides a JSON response."""
     articles = extract_news(company)[:10]  # Extract first 10 articles
+
     if not articles:
         raise HTTPException(status_code=404, detail="No articles found for the given company.")
 
-    news_data = {"Company": company, "Articles": []}
-
-    for article in articles:
-        sentiment = analyze_sentiment(article["summary"])  # Analyze sentiment
-        topics = extract_keywords_keybert(article["summary"])  # Extract key topics
-
-        news_data["Articles"].append({
-            "Title": article["title"],
-            "Summary": article["summary"],
-            "Sentiment": sentiment,
-            "Topics": topics
-        })
+    news_data = {
+        "Company": company,
+        "Articles": [
+            {
+                "Title": article.get("title", "No Title"),
+                "Summary": article.get("summary", "No Summary"),
+                "Sentiment": analyze_sentiment(article.get("summary", "")),  # Sentiment analysis
+                "Topics": extract_keywords_keybert(article.get("summary", ""))  # Extract topics
+            }
+            for article in articles
+        ]
+    }
 
     return JSONResponse(content=news_data)
 
 
-
 @app.get("/comparative-analyst/")
-def get_comparative_analysis(company: str):
-
-    # ✅ Extract 10 articles
+def get_comparative_analysis(company: str) -> JSONResponse:
+    """Performs comparative sentiment analysis for a given company."""
    articles = extract_news(company)[:10]
 
     if len(articles) < 10:
         raise HTTPException(status_code=400, detail="Not enough articles for a full comparison.")
 
-    # Run comprehensive comparative analysis
-    comparison_data = comparison_analysis(articles)
+    comparison_data = comparison_analysis(articles)  # Perform comparative analysis
 
     return JSONResponse(content=comparison_data)
 
 
-
-# Generate audio summary
 @app.get("/generate-audio/")
-def generate_audio(company: str):
+def generate_audio(company: str) -> StreamingResponse:
     """Generates a Hindi audio summary using LLM response."""
-
-    # ✅ Extract 10 news articles
     articles = extract_news(company)[:10]
+
     if not articles:
         raise HTTPException(status_code=404, detail="No articles found for the given company.")
 
-    # Generate LLM-based sentiment summary
-    summary_text = summarize_overall_sentiment(articles)
-
-    # ✅ Convert summary to Hindi speech
-    audio_buffer = generate_hindi_speech(summary_text)
-
-    # ✅ Return only the Hindi audio as a file response
-    return StreamingResponse(audio_buffer, media_type="audio/mpeg", headers={
-        "Content-Disposition": "attachment; filename=hindi_summary.mp3"
-    })
+    summary_text = summarize_overall_sentiment(articles)  # Generate summary text
+    audio_buffer = generate_hindi_speech(summary_text)  # Convert summary to speech
+
+    return StreamingResponse(
+        audio_buffer,
+        media_type="audio/mpeg",
+        headers={"Content-Disposition": "attachment; filename=hindi_summary.mp3"}
+    )
 
 
 if __name__ == "__main__":
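The refactored `news-analysis` payload above is a plain dict comprehension over article dicts, so it can be exercised without FastAPI or the scraper. In this sketch the sample article and the stand-in `analyze_sentiment`/`extract_keywords_keybert` are placeholders for illustration, not the real DistilBERT/KeyBERT models.

```python
def analyze_sentiment(text):
    # Stand-in for the DistilBERT pipeline in src/utils.py (hypothetical rule)
    return "NEGATIVE" if "fraud" in text.lower() else "POSITIVE"

def extract_keywords_keybert(text):
    # Stand-in for KeyBERT: first three words, title-cased
    return [w.title() for w in text.split()[:3]]

articles = [{"title": "Example", "summary": "Ad fraud case reported"}]

news_data = {
    "Company": "Google",
    "Articles": [
        {
            "Title": a.get("title", "No Title"),
            "Summary": a.get("summary", "No Summary"),
            "Sentiment": analyze_sentiment(a.get("summary", "")),
            "Topics": extract_keywords_keybert(a.get("summary", "")),
        }
        for a in articles
    ],
}
print(news_data["Articles"][0]["Sentiment"])  # NEGATIVE for this sample
```

Using `.get` with defaults, as the commit does, keeps one malformed article from raising `KeyError` for the whole response.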
app.py CHANGED
@@ -1,10 +1,11 @@
 import streamlit as st
 import requests
 
+# Define API URL
 FASTAPI_URL = "http://127.0.0.1:8000"
 
-# Use normal width layout
-st.set_page_config(page_title="Insightful News AI", page_icon = "🤖", layout="centered")
+# Configure Streamlit app layout
+st.set_page_config(page_title="Insightful News AI", page_icon="🤖", layout="centered")
 st.title("Sentiment-Driven News Summarization with AI-Powered Speech")
 
 # Sidebar Controls
@@ -14,109 +15,113 @@ get_news = st.sidebar.button("Get News Summary")
 compare_news = st.sidebar.button("Comparative Analysis")
 generate_audio = st.sidebar.button("Generate Audio")
 
-# Get News Summary
-if get_news:
-    st.write("## Comany: ", company)
 
-    response = requests.get(f"{FASTAPI_URL}/news-analysis/", params={"company": company})
+def fetch_data(endpoint, params=None):
+    """
+    Fetch data from the given API endpoint.
 
-    if response.status_code == 200:
-        news_data = response.json()
+    Args:
+        endpoint (str): API endpoint to fetch data from.
+        params (dict, optional): Query parameters for the request.
+
+    Returns:
+        dict: JSON response if successful, else None.
+    """
+    response = requests.get(f"{FASTAPI_URL}/{endpoint}", params=params)
+    return response.json() if response.status_code == 200 else None
 
-        for i, article in enumerate(news_data["Articles"], start=1):
-            st.write(f"### {i}. {article['Title']}")
-            st.write(f"**Summary:** {article['Summary']}")
-            st.write(f"**Sentiment:** {article['Sentiment']}")
-            st.write(f"**Topics:** {', '.join(article['Topics'])}")
-            st.markdown("---")  # Adds a separator between articles
 
+# Fetch News Summary
+if get_news:
+    st.write("## Company: ", company)
+    news_data = fetch_data("news-analysis", {"company": company})
+
+    if news_data:
+        for i, article in enumerate(news_data.get("Articles", []), start=1):
+            st.write(f"### {i}. {article.get('Title', 'No Title')}")
+            st.write(f"**Summary:** {article.get('Summary', 'No Summary')}")
+            st.write(f"**Sentiment:** {article.get('Sentiment', 'Unknown')}")
+            st.write(f"**Topics:** {', '.join(article.get('Topics', []))}")
+            st.markdown("---")  # Separator between articles
     else:
         st.error("Error fetching news. Please try again.")
 
 
 # Comparative Analysis
 if compare_news:
     st.write("## Comparative Analysis")
+    comparison_data = fetch_data("comparative-analyst", {"company": company})
 
-    response = requests.get(f"{FASTAPI_URL}/comparative-analyst/", params={"company": company})
-
-    if response.status_code == 200:
-        comparison_data = response.json()
-
-        # ✅ Extract overall sentiment analysis
+    if comparison_data:
+        # Extract overall sentiment analysis
         sentiment_data = comparison_data.get("Sentiment Analysis", {})
         sentiment_distribution = sentiment_data.get("Sentiment Distribution", {})
         final_sentiment = sentiment_data.get("Final Sentiment Summary", "No sentiment summary available.")
 
         # Extract sentiment counts safely
         positive_count = sentiment_distribution.get("Positive", 0)
         negative_count = sentiment_distribution.get("Negative", 0)
         neutral_count = sentiment_distribution.get("Neutral", 0)
 
-        # Display sentiment distribution with metrics
+        # Display sentiment distribution
         st.write("### Sentiment Distribution")
         col1, col2, col3 = st.columns(3)
         col1.metric(label="Positive", value=positive_count)
         col2.metric(label="Negative", value=negative_count)
         col3.metric(label="Neutral", value=neutral_count)
 
-        # Display final sentiment
+        # Display final sentiment summary
         st.write("### Final Sentiment")
         if positive_count > negative_count:
             st.success(f"**{final_sentiment}**")  # Green for Positive
         elif positive_count < negative_count:
             st.error(f"**{final_sentiment}**")  # Red for Negative
         else:
             st.warning(f"**{final_sentiment}**")  # Yellow for Neutral
 
         # Display topic overlap
         st.write("### Topic Overlap")
         topic_overlap = comparison_data.get("Topic Overlap", {})
 
         # Display common topics
         common_topics = topic_overlap.get("Common Topics", [])
-        if common_topics:
-            st.write(f"**Common Topics (Appearing in ≥3 articles):** {', '.join(common_topics)}")
-        else:
-            st.write("No significant common topics found.")
+        st.write(f"**Common Topics (Appearing in ≥3 articles):** {', '.join(common_topics) if common_topics else 'None'}")
 
         # Display unique topics per article
         unique_topics_per_article = topic_overlap.get("Unique Topics Per Article", [])
-        if unique_topics_per_article:
-            for topic_data in unique_topics_per_article:
-                article_number = topic_data.get("Article", "Unknown")
-                unique_topics = topic_data.get("Unique Topics", [])
-                st.write(f"**Unique Topics in Article {article_number}:** {', '.join(unique_topics) if unique_topics else 'None'}")
+        for topic_data in unique_topics_per_article:
+            article_number = topic_data.get("Article", "Unknown")
+            unique_topics = topic_data.get("Unique Topics", [])
+            st.write(f"**Unique Topics in Article {article_number}:** {', '.join(unique_topics) if unique_topics else 'None'}")
 
         # Display Final LLM-Based Sentiment Analysis
         st.write("## Overall Sentiment Summary")
         final_llm_summary = comparison_data.get("Final Sentiment Analysis", "No summary available.")
         st.info(f"**{final_llm_summary}**")
 
     else:
-        st.error(f"🚨 Error fetching comparative analysis: {response.status_code}")
-
+        st.error("Error fetching comparative analysis. Please try again.")
 
 
 # Generate Hindi Speech Audio
 if generate_audio:
     st.write("### Hindi Audio Summary")
 
-    # ✅ Fetch the audio file from API
     audio_url = f"{FASTAPI_URL}/generate-audio/?company={company}"
     response = requests.get(audio_url)
 
     if response.status_code == 200:
-        # ✅ Save the audio file in memory
         audio_data = response.content
 
-        # Play the audio directly in UI
+        # Play the audio
        st.audio(audio_data, format="audio/mp3")
 
-        # Download button for Hindi summary
-        st.download_button(label="Download Hindi Audio",
-                           data=audio_data,
-                           file_name="hindi_summary.mp3",
-                           mime="audio/mpeg")
+        # Provide a download button
+        st.download_button(
+            label="Download Hindi Audio",
+            data=audio_data,
+            file_name="hindi_summary.mp3",
+            mime="audio/mpeg"
+        )
     else:
         st.error("Failed to generate audio. Please try again.")
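The success/error/warning branching in the Final Sentiment block reduces to a small pure function. This sketch mirrors the diff's logic (green for a positive majority, red for a negative majority, yellow otherwise) without Streamlit; the function name is an editorial invention for illustration.

```python
def final_sentiment_style(positive: int, negative: int) -> str:
    """Pick the Streamlit display style used for the final sentiment."""
    if positive > negative:
        return "success"   # Green for Positive
    if positive < negative:
        return "error"     # Red for Negative
    return "warning"       # Yellow for Neutral

print(final_sentiment_style(5, 4))  # success
```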
research/trials.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
setup.py DELETED
@@ -1,10 +0,0 @@
1
- from setuptools import find_packages, setup
2
-
3
- setup(
4
- name="News-Summarization-and-Text-to-Speech-Application",
5
- version="0.0.1",
6
- author="bijay",
7
- author_email="[email protected]",
8
- packages=find_packages(),
9
- install_requires=["SpeechRecognition","pipwin","pyaudio","gTTS","google-generativeai","python-dotenv","streamlit"]
10
- )
 
 
 
 
 
 
 
 
 
 
 
src/comparison.py CHANGED
@@ -4,43 +4,52 @@ from keybert import KeyBERT
 from src.utils import extract_keywords_keybert, analyze_sentiment
 from src.summarization import summarize_overall_sentiment
 
-# ✅ Load necessary models
-sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
-kw_model = KeyBERT("distilbert-base-nli-mean-tokens")
-
 def comparison_analysis(articles):
-    """Compares articles based on sentiment, topics, and provides a final sentiment summary."""
+    """
+    Compares articles based on sentiment and topics, providing a final sentiment summary.
+
+    Args:
+        articles (list[dict]): A list of articles, each containing a "summary" key.
+
+    Returns:
+        dict: A dictionary containing sentiment analysis, topic overlap, and final sentiment summary.
+    """
 
     if len(articles) < 10:
         return {"error": "Not enough articles for a full comparison."}
 
-    # Extract keywords from all 10 articles
+    # Extract keywords from all articles
     article_keywords = [extract_keywords_keybert(article["summary"]) for article in articles]
 
     # Count occurrences of each keyword
     all_keywords = [kw for sublist in article_keywords for kw in sublist]
     keyword_counts = Counter(all_keywords)
 
-    # Identify Common & Unique Topics
+    # Identify common and unique topics
     common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]  # Common if in ≥3 articles
     unique_topics_per_article = [
-        {"Article": i+1, "Unique Topics": list(set(article_keywords[i]) - set(common_topics))}
+        {"Article": i + 1, "Unique Topics": list(set(article_keywords[i]) - set(common_topics))}
         for i in range(len(articles))
     ]
 
-    # Sentiment Distribution
+    # Perform sentiment analysis
     sentiments = [analyze_sentiment(article["summary"]) for article in articles]
     sentiment_counts = Counter(sentiments)
-    formatted_counts = {sent.capitalize(): count for sent, count in sentiment_counts.items()}  # Proper Case
+
+    # Format sentiment counts for readability
+    formatted_counts = {sent.capitalize(): count for sent, count in sentiment_counts.items()}
 
-    # Determine Overall Sentiment
+    # Determine overall sentiment
     overall_sentiment = max(sentiment_counts, key=sentiment_counts.get, default="Neutral").capitalize()
-    sentiment_summary = f"Overall sentiment is {overall_sentiment} ({formatted_counts.get('Negative', 0)} Negative, {formatted_counts.get('Positive', 0)} Positive)."
+    sentiment_summary = (
+        f"Overall sentiment is {overall_sentiment} "
+        f"({formatted_counts.get('Negative', 0)} Negative, {formatted_counts.get('Positive', 0)} Positive)."
+    )
 
-    # LLM-Based Sentiment Summary
+    # Generate LLM-based sentiment summary
     overall_summary = summarize_overall_sentiment(articles)
 
     # Return the final comparative analysis
     return {
         "Sentiment Analysis": {
             "Sentiment Distribution": formatted_counts,
@@ -50,5 +59,5 @@ def comparison_analysis(articles):
             "Common Topics": common_topics,
             "Unique Topics Per Article": unique_topics_per_article
         },
         "Final Sentiment Analysis": overall_summary  # LLM-generated summary
     }
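The common-vs-unique topic split in `comparison_analysis` can be checked in isolation with plain `collections.Counter`. The keyword lists below are made up for illustration; the ≥3-articles threshold is the one in the diff.

```python
from collections import Counter

article_keywords = [
    ["Security", "Play Store"],
    ["Security", "Ad Fraud"],
    ["Security", "Android"],
]

all_keywords = [kw for sublist in article_keywords for kw in sublist]
keyword_counts = Counter(all_keywords)

# Common if the keyword appears in >= 3 articles, as in the diff
common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]
unique_topics_per_article = [
    {"Article": i + 1, "Unique Topics": sorted(set(article_keywords[i]) - set(common_topics))}
    for i in range(len(article_keywords))
]

print(common_topics)                 # ['Security']
print(unique_topics_per_article[0])  # {'Article': 1, 'Unique Topics': ['Play Store']}
```

Note that the real function leaves the set difference unsorted; `sorted` is used here only to make the output deterministic.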
src/summarization.py CHANGED
@@ -4,20 +4,28 @@ import groq
 working_dir = os.path.dirname(os.path.abspath(__file__))
 GROQ_API_KEY = os.environ["GROQ_API_KEY"]
 
 # Check if API Key is available
 if not GROQ_API_KEY:
-    raise ValueError("🚨 Error: GROQ_API_KEY is missing! Set it as an environment variable.")
+    raise ValueError("Error: GROQ_API_KEY is missing! Set it as an environment variable.")
 
 # Initialize Groq Client
 client = groq.Groq(api_key=GROQ_API_KEY)
 
 def summarize_overall_sentiment(articles):
-    """Uses Groq API (LLaMA-3, Mixtral) to summarize sentiment analysis."""
-
-    # ✅ Concatenate all article summaries
+    """
+    Summarizes sentiment analysis using the Groq API (LLaMA-3, Mixtral).
+
+    Args:
+        articles (list[dict]): A list of articles, each containing a "summary" key.
+
+    Returns:
+        str: A concise sentiment summary based on the news articles.
+    """
+
+    # Concatenate all article summaries
     concatenated_text = " ".join(article["summary"] for article in articles)
 
-    # Define the prompt
+    # Define the prompt for sentiment summarization
     prompt = f"""
     You are an AI model designed for news sentiment summarization.
     Analyze the following news articles and determine the overall sentiment
@@ -28,7 +36,7 @@ def summarize_overall_sentiment(articles):
     Provide a concise summary without additional formatting or headers in two paragraphs.
     """
 
     # Use a valid Groq model (Mixtral or LLaMA-3)
     response = client.chat.completions.create(
         model="mistral-saba-24b",
         messages=[
@@ -38,5 +46,5 @@ def summarize_overall_sentiment(articles):
         max_tokens=250
     )
 
-    # Return a cleaned response (no extra characters)
+    # Return a cleaned response
     return response.choices[0].message.content.strip()
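One caveat worth noting: `os.environ["GROQ_API_KEY"]` already raises `KeyError` when the variable is absent, so the `if not GROQ_API_KEY` guard in the diff only ever catches an empty value. A hedged sketch of a lookup that lets the module's custom `ValueError` fire in both cases (the helper name is hypothetical, not the module's actual code):

```python
def load_groq_key(env: dict) -> str:
    """Return the Groq API key from an environ-like mapping, or raise one clear error."""
    key = env.get("GROQ_API_KEY", "")
    if not key:
        raise ValueError("Error: GROQ_API_KEY is missing! Set it as an environment variable.")
    return key

# os.environ behaves like a dict, so the same helper works for real lookups:
try:
    load_groq_key({})
except ValueError as e:
    print(e)
```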
src/utils.py CHANGED
@@ -1,7 +1,6 @@
1
  import requests
2
  import io
3
  from bs4 import BeautifulSoup
4
- from collections import Counter
5
  from gtts import gTTS
6
  from deep_translator import GoogleTranslator
7
  from transformers import pipeline
@@ -9,18 +8,29 @@ from keybert import KeyBERT
9
 
10
  # News Extraction
11
  def extract_news(topic):
 
 
 
 
 
 
 
 
 
12
  url = f"https://economictimes.indiatimes.com/topic/{topic}"
13
  headers = {"User-Agent": "Mozilla/5.0"}
14
 
15
- response = requests.get(url, headers=headers)
16
- if response.status_code != 200:
17
- print(f"Failed to retrieve page. Status Code: {response.status_code}")
 
 
18
  return []
19
 
20
  soup = BeautifulSoup(response.text, "html.parser")
21
 
22
  articles = []
23
- article_blocks = soup.find_all("div", class_="clr flt topicstry story_list") # Find all articles
24
 
25
  for article in article_blocks:
26
  title_tag = article.find("a", class_="wrapLines l2")
@@ -34,33 +44,62 @@ def extract_news(topic):
34
  return articles
35
 
36
 
37
- # Sentiment Analysis
38
- sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
 
 
39
 
40
- # Function to analyze sentiment
41
  def analyze_sentiment(text):
42
- result = sentiment_pipeline(text)[0] # Run text through the model
43
- sentiment = result["label"] # Extract sentiment label
44
- return sentiment # Returns 'POSITIVE' or 'NEGATIVE'
 
 
 
 
 
 
 
 
45
 
46
 
47
- # Keyword Extraction
48
- kw_model = KeyBERT('distilbert-base-nli-mean-tokens')
49
 
50
  def extract_keywords_keybert(text):
51
- return [kw[0].title() for kw in kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)]
 
52
 
 
 
53
 
 
 
 
 
 
54
 
55
- # Summarized Text to Hindi Speech
 
56
  def generate_hindi_speech(text):
 
 
 
 
 
 
 
 
 
 
 
57
 
58
- hindi_text = GoogleTranslator(source="auto", target="hi").translate(text) # Translate to Hindi
59
  tts = gTTS(hindi_text, lang="hi")
60
 
61
- # Store the speech in memory instead of a file
62
  audio_buffer = io.BytesIO()
63
  tts.write_to_fp(audio_buffer)
64
- audio_buffer.seek(0) # Move to start for playback
65
 
66
  return audio_buffer
 
1
  import requests
2
  import io
3
  from bs4 import BeautifulSoup
 
4
  from gtts import gTTS
5
  from deep_translator import GoogleTranslator
6
  from transformers import pipeline
 
8
 
9
  # News Extraction
10
  def extract_news(topic):
11
+ """
12
+ Extracts news articles related to the given topic from the Economic Times website.
13
+
14
+ Args:
15
+ topic (str): The topic for which news articles need to be extracted.
16
+
17
+ Returns:
18
+ list[dict]: A list of dictionaries containing news titles and summaries.
19
+ """
20
  url = f"https://economictimes.indiatimes.com/topic/{topic}"
21
  headers = {"User-Agent": "Mozilla/5.0"}
22
 
23
+ try:
24
+ response = requests.get(url, headers=headers)
25
+ response.raise_for_status() # Raise an HTTPError for bad responses (4xx and 5xx)
26
+ except requests.RequestException as e:
27
+ print(f"Error fetching news: {e}")
28
  return []
29
 
30
  soup = BeautifulSoup(response.text, "html.parser")
31
 
32
  articles = []
33
+ article_blocks = soup.find_all("div", class_="clr flt topicstry story_list")
34
 
35
  for article in article_blocks:
36
  title_tag = article.find("a", class_="wrapLines l2")
 
44
  return articles
45
 
46
 
47
+ # Sentiment Analysis Pipeline
48
+ sentiment_pipeline = pipeline(
49
+ "sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english"
50
+ )
51
 
 
52
  def analyze_sentiment(text):
53
+ """
54
+ Analyzes the sentiment of the given text using a pre-trained model.
55
+
56
+ Args:
57
+ text (str): The input text to analyze.
58
+
59
+ Returns:
60
+ str: Sentiment label ('POSITIVE' or 'NEGATIVE').
61
+ """
62
+ result = sentiment_pipeline(text)[0] # Process text through the model
63
+ return result["label"] # Extract and return sentiment label
64
 
65
 
66
+ # Keyword Extraction using KeyBERT
67
+ kw_model = KeyBERT("distilbert-base-nli-mean-tokens")
68
 
69
  def extract_keywords_keybert(text):
70
+ """
71
+ Extracts keywords from the given text using KeyBERT.
72
 
73
+ Args:
74
+ text (str): The input text for keyword extraction.
75
 
76
+ Returns:
77
+ list[str]: A list of extracted keywords (title-cased).
78
+ """
79
+ keywords = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)
80
+ return [kw[0].title() for kw in keywords]
81
 
82
+
83
+ # Hindi Speech Generation
84
  def generate_hindi_speech(text):
85
+ """
86
+ Converts the given text into Hindi speech.
87
+
88
+ Args:
89
+ text (str): The input text to be translated and converted to speech.
90
+
91
+ Returns:
92
+ io.BytesIO: A buffer containing the generated speech audio.
93
+ """
94
+ # Translate text to Hindi
95
+ hindi_text = GoogleTranslator(source="auto", target="hi").translate(text)
96
 
97
+ # Convert translated text to speech
98
  tts = gTTS(hindi_text, lang="hi")
99
 
100
+ # Store the generated speech in memory
101
  audio_buffer = io.BytesIO()
102
  tts.write_to_fp(audio_buffer)
103
+ audio_buffer.seek(0) # Reset buffer position for playback
104
 
105
  return audio_buffer
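The in-memory buffer pattern at the end of `generate_hindi_speech` is worth isolating: any writer with a gTTS-style `write_to_fp` method can target an `io.BytesIO` and be replayed after `seek(0)`, with no temporary file on disk. The fake writer below stands in for gTTS so the sketch runs offline; it is an illustration of the pattern, not the library itself.

```python
import io

class FakeTTS:
    """Stand-in for gTTS: writes bytes to a file-like object."""
    def __init__(self, payload: bytes):
        self.payload = payload

    def write_to_fp(self, fp):
        fp.write(self.payload)

tts = FakeTTS(b"fake-mp3-bytes")
audio_buffer = io.BytesIO()
tts.write_to_fp(audio_buffer)
audio_buffer.seek(0)  # Reset to the start so the consumer reads from byte 0

print(audio_buffer.read())  # b'fake-mp3-bytes'
```

Without the `seek(0)`, a consumer such as FastAPI's `StreamingResponse` would start reading at the end of the buffer and stream zero bytes.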