bijayjr committed on
Commit 8c60672 · 1 Parent(s): b2c3642

Documentation added

Files changed (8)
  1. README.md +163 -0
  2. api.py +30 -36
  3. app.py +56 -51
  4. research/trials.ipynb +0 -0
  5. setup.py +0 -10
  6. src/comparison.py +26 -17
  7. src/summarization.py +17 -9
  8. src/utils.py +57 -18
README.md CHANGED
@@ -6,3 +6,166 @@ colorTo: yellow
 sdk: docker
 pinned: false
 ---
+
+# **Insightful News AI** 🤖
+
+## Overview
+Insightful News AI is a FastAPI-based application that fetches news articles for a given company, analyzes their sentiment, extracts key topics, and provides Hindi audio summaries. The application uses NLP models for text processing and an AI-powered TTS engine for speech synthesis.
+
+## **Project Setup**
+### Prerequisites
+- Python 3.10 (Virtual Environment: `tts`)
+- FastAPI
+- Streamlit
+- Required Python packages (listed in `requirements.txt`)
+
+## 🚀 Installation Steps
+
+### **1️⃣ Clone the Repository**
+```bash
+git clone https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application.git
+cd News-Summarization-and-Text-to-Speech-Application
+```
+
+### **2️⃣ Create and Activate a Virtual Environment**
+```bash
+python -m venv tts
+source tts/bin/activate   # On macOS/Linux
+tts\Scripts\activate      # On Windows
+```
+
+### **3️⃣ Install Dependencies**
+```bash
+pip install -r requirements.txt
+```
+
+### **4️⃣ Set Up Environment Variables**
+Create a `.env` file and add your API keys:
+```
+GROQ_API_KEY=your_api_key_here
+```
+
+### **5️⃣ Run the Backend (FastAPI Server)**
+```bash
+uvicorn api:app --host 127.0.0.1 --port 8000 --reload
+```
+
+### **6️⃣ Run the Frontend (Streamlit UI)**
+```bash
+streamlit run app.py
+```
+
+---
+
+## **🧠 Model Details**
+
+This project uses **three AI models**:
+
+1. **Summarization Model** (Mistral)
+   - Extracts key insights from news articles.
+   - Uses the `mistral-saba-24b` model for concise summarization.
+
+2. **Sentiment Analysis Model** (DistilBERT)
+   - Categorizes sentiment as **Positive, Negative, or Neutral**.
+   - Analyzes news sentiment using pre-trained NLP models.
+
+3. **Text-to-Speech (TTS) Model** (Google Text-to-Speech)
+   - Converts AI-generated summaries into **Hindi speech**.
+   - Uses the gTTS engine for speech synthesis.
+
+---
+
+## **🛠 API Development**
+
+### **Endpoints**
+
+| Method | Endpoint | Description |
+|--------|---------------------------|-------------|
+| `GET` | `/` | Welcome message. |
+| `GET` | `/news-analysis/?company=XYZ` | Extracts news titles and summaries, and analyzes sentiment. |
+| `GET` | `/comparative-analyst/?company=XYZ` | Performs a comparative sentiment analysis. |
+| `GET` | `/generate-audio/?company=XYZ` | Generates a Hindi audio summary. |
+
+---
+
+## **📡 API Usage**
+
+### **1️⃣ Fetch News Sentiment Summary**
+**Request (Postman, cURL, Python)**
+```bash
+curl -X GET "http://127.0.0.1:8000/news-analysis/?company=Google"
+```
+**Response (JSON)**
+```json
+{
+    "Company": "Google",
+    "Articles": [
+        {
+            "Title": "Google removes 331 malicious apps from Play Store",
+            "Summary": "Vapor Operation infected 331 apps with 60M+ downloads, engaging in ad fraud and phishing...",
+            "Sentiment": "Negative",
+            "Topics": ["Vapor Operation", "Android 13", "Security"]
+        }
+    ]
+}
+```
+
+### **2️⃣ Comparative Sentiment Analysis**
+```bash
+curl -X GET "http://127.0.0.1:8000/comparative-analyst/?company=Google"
+```
+**Response Example**
+```json
+{
+    "Sentiment Analysis": {
+        "Sentiment Distribution": {"Positive": 5, "Negative": 4, "Neutral": 1},
+        "Final Sentiment Summary": "Overall, Google news has a positive sentiment..."
+    }
+}
+```
+
+### **3️⃣ Generate Hindi Audio Summary**
+```bash
+curl -X GET "http://127.0.0.1:8000/generate-audio/?company=Google"
+```
+**Response**
+- Returns an `mp3` file with the Hindi audio summary.
+
+---
+
+## **🔗 Third-Party APIs Used**
+
+| API | Purpose |
+|-----------|---------|
+| **Groq API** | Used for text summarization (Mistral). |
+
+---
+
+## **⚠ Assumptions & Limitations**
+
+### **✅ Assumptions**
+1. **Company Name Input**: The company name entered exists in publicly available news.
+2. **Sentiment Accuracy**: The sentiment analysis model is trained on general news data but may not capture sarcasm or nuanced sentiment.
+3. **Keyword Extraction**: Uses KeyBERT for topic extraction, assuming that key topics can be identified from short summaries.
+
+### **🚨 Limitations**
+1. **News Data Availability**: If fewer than 10 articles are found, comparative analysis is not performed.
+2. **TTS Language Support**: Currently, speech generation is limited to **Hindi** only.
+3. **Rate Limits**: The Groq API requires an API key and enforces rate limits.
+
+---
+
+## **📌 Future Enhancements**
+✅ **Expand TTS support to more languages**
+✅ **Improve sentiment classification using fine-tuned LLMs**
+✅ **Enable real-time news updates using WebSockets**
+
+---
+
+### **🔗 Contributors**
+👤 **Bijay Chandra Das**
+📧 **[email protected]**
+
+📌 **GitHub Repo**: [GitHub](https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application)
+
+---
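The curl commands in the README translate directly into Python. The sketch below only assembles the request URLs from the documented routes and `company` query parameter (base URL as stated above); it is an editorial illustration, not part of the repository, and leaves the actual `requests.get` call to the reader since it needs the server running.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"

def build_url(endpoint: str, company: str) -> str:
    """Assemble a GET URL for one of the documented company-scoped routes."""
    return f"{BASE_URL}/{endpoint}/?{urlencode({'company': company})}"

# The three company-scoped routes from the endpoint table:
for route in ("news-analysis", "comparative-analyst", "generate-audio"):
    print(build_url(route, "Google"))
```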
api.py CHANGED
@@ -4,76 +4,70 @@ import uvicorn
 from src.utils import extract_news, analyze_sentiment, extract_keywords_keybert, generate_hindi_speech
 from src.comparison import comparison_analysis
 from src.summarization import summarize_overall_sentiment
-
+from typing import Dict
 
 app = FastAPI()
 
+
 @app.get("/")
-def home():
+def home() -> Dict[str, str]:
+    """Home route for API"""
     return {"message": "Welcome to the News Analysis API!"}
 
 
 @app.get("/news-analysis/")
-def get_news_analysis(company: str):
+def get_news_analysis(company: str) -> JSONResponse:
     """Extracts news, analyzes sentiment, and provides a JSON response."""
     articles = extract_news(company)[:10]  # Extract first 10 articles
+
     if not articles:
         raise HTTPException(status_code=404, detail="No articles found for the given company.")
 
-    news_data = {"Company": company, "Articles": []}
-
-    for article in articles:
-        sentiment = analyze_sentiment(article["summary"])  # Analyze sentiment
-        topics = extract_keywords_keybert(article["summary"])  # Extract key topics
-
-        news_data["Articles"].append({
-            "Title": article["title"],
-            "Summary": article["summary"],
-            "Sentiment": sentiment,
-            "Topics": topics
-        })
+    news_data = {
+        "Company": company,
+        "Articles": [
+            {
+                "Title": article.get("title", "No Title"),
+                "Summary": article.get("summary", "No Summary"),
+                "Sentiment": analyze_sentiment(article.get("summary", "")),  # Sentiment analysis
+                "Topics": extract_keywords_keybert(article.get("summary", ""))  # Extract topics
+            }
+            for article in articles
+        ]
+    }
 
     return JSONResponse(content=news_data)
 
 
-
 @app.get("/comparative-analyst/")
-def get_comparative_analysis(company: str):
-
-    # ✅ Extract 10 articles
+def get_comparative_analysis(company: str) -> JSONResponse:
+    """Performs comparative sentiment analysis for a given company."""
    articles = extract_news(company)[:10]
 
     if len(articles) < 10:
         raise HTTPException(status_code=400, detail="Not enough articles for a full comparison.")
 
-    # Run comprehensive comparative analysis
-    comparison_data = comparison_analysis(articles)
+    comparison_data = comparison_analysis(articles)  # Perform comparative analysis
 
     return JSONResponse(content=comparison_data)
 
 
-
-# Generate audio summary
 @app.get("/generate-audio/")
-def generate_audio(company: str):
+def generate_audio(company: str) -> StreamingResponse:
     """Generates a Hindi audio summary using LLM response."""
-
-    # ✅ Extract 10 news articles
     articles = extract_news(company)[:10]
+
     if not articles:
         raise HTTPException(status_code=404, detail="No articles found for the given company.")
 
-    # Generate LLM-based sentiment summary
-    summary_text = summarize_overall_sentiment(articles)
-
-    # ✅ Convert summary to Hindi speech
-    audio_buffer = generate_hindi_speech(summary_text)
-
-    # ✅ Return only the Hindi audio as a file response
-    return StreamingResponse(audio_buffer, media_type="audio/mpeg", headers={
-        "Content-Disposition": "attachment; filename=hindi_summary.mp3"
-    })
+    summary_text = summarize_overall_sentiment(articles)  # Generate summary text
+    audio_buffer = generate_hindi_speech(summary_text)  # Convert summary to speech
+
+    return StreamingResponse(
+        audio_buffer,
+        media_type="audio/mpeg",
+        headers={"Content-Disposition": "attachment; filename=hindi_summary.mp3"}
+    )
 
 
 if __name__ == "__main__":
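The refactored `news-analysis` payload above is a plain dict comprehension over article dicts, so it can be exercised without FastAPI or the scraper. In this sketch the sample article and the stand-in `analyze_sentiment`/`extract_keywords_keybert` are placeholders for illustration, not the real DistilBERT/KeyBERT models.

```python
def analyze_sentiment(text):
    # Stand-in for the DistilBERT pipeline in src/utils.py (hypothetical rule)
    return "NEGATIVE" if "fraud" in text.lower() else "POSITIVE"

def extract_keywords_keybert(text):
    # Stand-in for KeyBERT: first three words, title-cased
    return [w.title() for w in text.split()[:3]]

articles = [{"title": "Example", "summary": "Ad fraud case reported"}]

news_data = {
    "Company": "Google",
    "Articles": [
        {
            "Title": a.get("title", "No Title"),
            "Summary": a.get("summary", "No Summary"),
            "Sentiment": analyze_sentiment(a.get("summary", "")),
            "Topics": extract_keywords_keybert(a.get("summary", "")),
        }
        for a in articles
    ],
}
print(news_data["Articles"][0]["Sentiment"])  # NEGATIVE for this sample
```

Using `.get` with defaults, as the commit does, keeps one malformed article from raising `KeyError` for the whole response.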
app.py CHANGED
@@ -1,10 +1,11 @@
 import streamlit as st
 import requests
 
+# Define API URL
 FASTAPI_URL = "http://127.0.0.1:8000"
 
-# Use normal width layout
-st.set_page_config(page_title="Insightful News AI", page_icon = "🤖", layout="centered")
+# Configure Streamlit app layout
+st.set_page_config(page_title="Insightful News AI", page_icon="🤖", layout="centered")
 st.title("Sentiment-Driven News Summarization with AI-Powered Speech")
 
 # Sidebar Controls
@@ -14,109 +15,113 @@ get_news = st.sidebar.button("Get News Summary")
 compare_news = st.sidebar.button("Comparative Analysis")
 generate_audio = st.sidebar.button("Generate Audio")
 
-# Get News Summary
-if get_news:
-    st.write("## Comany: ", company)
 
-    response = requests.get(f"{FASTAPI_URL}/news-analysis/", params={"company": company})
+def fetch_data(endpoint, params=None):
+    """
+    Fetch data from the given API endpoint.
 
-    if response.status_code == 200:
-        news_data = response.json()
+    Args:
+        endpoint (str): API endpoint to fetch data from.
+        params (dict, optional): Query parameters for the request.
+
+    Returns:
+        dict: JSON response if successful, else None.
+    """
+    response = requests.get(f"{FASTAPI_URL}/{endpoint}", params=params)
+    return response.json() if response.status_code == 200 else None
 
-        for i, article in enumerate(news_data["Articles"], start=1):
-            st.write(f"### {i}. {article['Title']}")
-            st.write(f"**Summary:** {article['Summary']}")
-            st.write(f"**Sentiment:** {article['Sentiment']}")
-            st.write(f"**Topics:** {', '.join(article['Topics'])}")
-            st.markdown("---")  # Adds a separator between articles
 
+# Fetch News Summary
+if get_news:
+    st.write("## Company: ", company)
+    news_data = fetch_data("news-analysis", {"company": company})
+
+    if news_data:
+        for i, article in enumerate(news_data.get("Articles", []), start=1):
+            st.write(f"### {i}. {article.get('Title', 'No Title')}")
+            st.write(f"**Summary:** {article.get('Summary', 'No Summary')}")
+            st.write(f"**Sentiment:** {article.get('Sentiment', 'Unknown')}")
+            st.write(f"**Topics:** {', '.join(article.get('Topics', []))}")
+            st.markdown("---")  # Separator between articles
     else:
         st.error("Error fetching news. Please try again.")
 
 
 # Comparative Analysis
 if compare_news:
     st.write("## Comparative Analysis")
+    comparison_data = fetch_data("comparative-analyst", {"company": company})
 
-    response = requests.get(f"{FASTAPI_URL}/comparative-analyst/", params={"company": company})
-
-    if response.status_code == 200:
-        comparison_data = response.json()
-
-        # ✅ Extract overall sentiment analysis
+    if comparison_data:
+        # Extract overall sentiment analysis
         sentiment_data = comparison_data.get("Sentiment Analysis", {})
         sentiment_distribution = sentiment_data.get("Sentiment Distribution", {})
         final_sentiment = sentiment_data.get("Final Sentiment Summary", "No sentiment summary available.")
 
         # Extract sentiment counts safely
         positive_count = sentiment_distribution.get("Positive", 0)
         negative_count = sentiment_distribution.get("Negative", 0)
         neutral_count = sentiment_distribution.get("Neutral", 0)
 
-        # Display sentiment distribution with metrics
+        # Display sentiment distribution
         st.write("### Sentiment Distribution")
         col1, col2, col3 = st.columns(3)
         col1.metric(label="Positive", value=positive_count)
         col2.metric(label="Negative", value=negative_count)
         col3.metric(label="Neutral", value=neutral_count)
 
-        # Display final sentiment
+        # Display final sentiment summary
         st.write("### Final Sentiment")
         if positive_count > negative_count:
             st.success(f"**{final_sentiment}**")  # Green for Positive
         elif positive_count < negative_count:
             st.error(f"**{final_sentiment}**")  # Red for Negative
         else:
             st.warning(f"**{final_sentiment}**")  # Yellow for Neutral
 
         # Display topic overlap
         st.write("### Topic Overlap")
         topic_overlap = comparison_data.get("Topic Overlap", {})
 
         # Display common topics
         common_topics = topic_overlap.get("Common Topics", [])
-        if common_topics:
-            st.write(f"**Common Topics (Appearing in ≥3 articles):** {', '.join(common_topics)}")
-        else:
-            st.write("No significant common topics found.")
+        st.write(f"**Common Topics (Appearing in ≥3 articles):** {', '.join(common_topics) if common_topics else 'None'}")
 
         # Display unique topics per article
         unique_topics_per_article = topic_overlap.get("Unique Topics Per Article", [])
-        if unique_topics_per_article:
-            for topic_data in unique_topics_per_article:
-                article_number = topic_data.get("Article", "Unknown")
-                unique_topics = topic_data.get("Unique Topics", [])
-                st.write(f"**Unique Topics in Article {article_number}:** {', '.join(unique_topics) if unique_topics else 'None'}")
+        for topic_data in unique_topics_per_article:
+            article_number = topic_data.get("Article", "Unknown")
+            unique_topics = topic_data.get("Unique Topics", [])
+            st.write(f"**Unique Topics in Article {article_number}:** {', '.join(unique_topics) if unique_topics else 'None'}")
 
         # Display Final LLM-Based Sentiment Analysis
         st.write("## Overall Sentiment Summary")
         final_llm_summary = comparison_data.get("Final Sentiment Analysis", "No summary available.")
         st.info(f"**{final_llm_summary}**")
 
     else:
-        st.error(f"🚨 Error fetching comparative analysis: {response.status_code}")
-
+        st.error("Error fetching comparative analysis. Please try again.")
 
 
 # Generate Hindi Speech Audio
 if generate_audio:
     st.write("### Hindi Audio Summary")
 
-    # ✅ Fetch the audio file from API
     audio_url = f"{FASTAPI_URL}/generate-audio/?company={company}"
     response = requests.get(audio_url)
 
     if response.status_code == 200:
-        # ✅ Save the audio file in memory
         audio_data = response.content
 
-        # Play the audio directly in UI
+        # Play the audio
        st.audio(audio_data, format="audio/mp3")
 
-        # Download button for Hindi summary
-        st.download_button(label="Download Hindi Audio",
-                           data=audio_data,
-                           file_name="hindi_summary.mp3",
-                           mime="audio/mpeg")
+        # Provide a download button
+        st.download_button(
+            label="Download Hindi Audio",
+            data=audio_data,
+            file_name="hindi_summary.mp3",
+            mime="audio/mpeg"
+        )
     else:
         st.error("Failed to generate audio. Please try again.")
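The success/error/warning branching in the Final Sentiment block reduces to a small pure function. This sketch mirrors the diff's logic (green for a positive majority, red for a negative majority, yellow otherwise) without Streamlit; the function name is an editorial invention for illustration.

```python
def final_sentiment_style(positive: int, negative: int) -> str:
    """Pick the Streamlit display style used for the final sentiment."""
    if positive > negative:
        return "success"   # Green for Positive
    if positive < negative:
        return "error"     # Red for Negative
    return "warning"       # Yellow for Neutral

print(final_sentiment_style(5, 4))  # success
```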
research/trials.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
setup.py DELETED
@@ -1,10 +0,0 @@
1
- from setuptools import find_packages, setup
2
-
3
- setup(
4
- name="News-Summarization-and-Text-to-Speech-Application",
5
- version="0.0.1",
6
- author="bijay",
7
- author_email="[email protected]",
8
- packages=find_packages(),
9
- install_requires=["SpeechRecognition","pipwin","pyaudio","gTTS","google-generativeai","python-dotenv","streamlit"]
10
- )
 
 
 
 
 
 
 
 
 
 
 
src/comparison.py CHANGED
@@ -4,43 +4,52 @@ from keybert import KeyBERT
 from src.utils import extract_keywords_keybert, analyze_sentiment
 from src.summarization import summarize_overall_sentiment
 
-# ✅ Load necessary models
-sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
-kw_model = KeyBERT("distilbert-base-nli-mean-tokens")
-
 def comparison_analysis(articles):
-    """Compares articles based on sentiment, topics, and provides a final sentiment summary."""
+    """
+    Compares articles based on sentiment and topics, providing a final sentiment summary.
+
+    Args:
+        articles (list[dict]): A list of articles, each containing a "summary" key.
+
+    Returns:
+        dict: A dictionary containing sentiment analysis, topic overlap, and final sentiment summary.
+    """
 
     if len(articles) < 10:
         return {"error": "Not enough articles for a full comparison."}
 
-    # Extract keywords from all 10 articles
+    # Extract keywords from all articles
     article_keywords = [extract_keywords_keybert(article["summary"]) for article in articles]
 
     # Count occurrences of each keyword
     all_keywords = [kw for sublist in article_keywords for kw in sublist]
     keyword_counts = Counter(all_keywords)
 
-    # Identify Common & Unique Topics
+    # Identify common and unique topics
     common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]  # Common if in ≥3 articles
     unique_topics_per_article = [
-        {"Article": i+1, "Unique Topics": list(set(article_keywords[i]) - set(common_topics))}
+        {"Article": i + 1, "Unique Topics": list(set(article_keywords[i]) - set(common_topics))}
         for i in range(len(articles))
     ]
 
-    # Sentiment Distribution
+    # Perform sentiment analysis
     sentiments = [analyze_sentiment(article["summary"]) for article in articles]
     sentiment_counts = Counter(sentiments)
-    formatted_counts = {sent.capitalize(): count for sent, count in sentiment_counts.items()}  # Proper Case
+
+    # Format sentiment counts for readability
+    formatted_counts = {sent.capitalize(): count for sent, count in sentiment_counts.items()}
 
-    # Determine Overall Sentiment
+    # Determine overall sentiment
     overall_sentiment = max(sentiment_counts, key=sentiment_counts.get, default="Neutral").capitalize()
-    sentiment_summary = f"Overall sentiment is {overall_sentiment} ({formatted_counts.get('Negative', 0)} Negative, {formatted_counts.get('Positive', 0)} Positive)."
+    sentiment_summary = (
+        f"Overall sentiment is {overall_sentiment} "
+        f"({formatted_counts.get('Negative', 0)} Negative, {formatted_counts.get('Positive', 0)} Positive)."
+    )
 
-    # LLM-Based Sentiment Summary
+    # Generate LLM-based sentiment summary
     overall_summary = summarize_overall_sentiment(articles)
 
     # Return the final comparative analysis
     return {
         "Sentiment Analysis": {
             "Sentiment Distribution": formatted_counts,
@@ -50,5 +59,5 @@ def comparison_analysis(articles):
             "Common Topics": common_topics,
             "Unique Topics Per Article": unique_topics_per_article
         },
         "Final Sentiment Analysis": overall_summary  # LLM-generated summary
     }
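The common-vs-unique topic split in `comparison_analysis` can be checked in isolation with plain `collections.Counter`. The keyword lists below are made up for illustration; the ≥3-articles threshold is the one in the diff.

```python
from collections import Counter

article_keywords = [
    ["Security", "Play Store"],
    ["Security", "Ad Fraud"],
    ["Security", "Android"],
]

all_keywords = [kw for sublist in article_keywords for kw in sublist]
keyword_counts = Counter(all_keywords)

# Common if the keyword appears in >= 3 articles, as in the diff
common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]
unique_topics_per_article = [
    {"Article": i + 1, "Unique Topics": sorted(set(article_keywords[i]) - set(common_topics))}
    for i in range(len(article_keywords))
]

print(common_topics)                 # ['Security']
print(unique_topics_per_article[0])  # {'Article': 1, 'Unique Topics': ['Play Store']}
```

Note that the real function leaves the set difference unsorted; `sorted` is used here only to make the output deterministic.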
src/summarization.py CHANGED
@@ -4,20 +4,28 @@ import groq
 working_dir = os.path.dirname(os.path.abspath(__file__))
 GROQ_API_KEY = os.environ["GROQ_API_KEY"]
 
 # Check if API Key is available
 if not GROQ_API_KEY:
-    raise ValueError("🚨 Error: GROQ_API_KEY is missing! Set it as an environment variable.")
+    raise ValueError("Error: GROQ_API_KEY is missing! Set it as an environment variable.")
 
 # Initialize Groq Client
 client = groq.Groq(api_key=GROQ_API_KEY)
 
 def summarize_overall_sentiment(articles):
-    """Uses Groq API (LLaMA-3, Mixtral) to summarize sentiment analysis."""
-
-    # ✅ Concatenate all article summaries
+    """
+    Summarizes sentiment analysis using the Groq API (LLaMA-3, Mixtral).
+
+    Args:
+        articles (list[dict]): A list of articles, each containing a "summary" key.
+
+    Returns:
+        str: A concise sentiment summary based on the news articles.
+    """
+
+    # Concatenate all article summaries
     concatenated_text = " ".join(article["summary"] for article in articles)
 
-    # Define the prompt
+    # Define the prompt for sentiment summarization
     prompt = f"""
     You are an AI model designed for news sentiment summarization.
     Analyze the following news articles and determine the overall sentiment
@@ -28,7 +36,7 @@ def summarize_overall_sentiment(articles):
     Provide a concise summary without additional formatting or headers in two paragraphs.
     """
 
     # Use a valid Groq model (Mixtral or LLaMA-3)
     response = client.chat.completions.create(
         model="mistral-saba-24b",
         messages=[
@@ -38,5 +46,5 @@ def summarize_overall_sentiment(articles):
         max_tokens=250
     )
 
-    # Return a cleaned response (no extra characters)
+    # Return a cleaned response
     return response.choices[0].message.content.strip()
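One caveat worth noting: `os.environ["GROQ_API_KEY"]` already raises `KeyError` when the variable is absent, so the `if not GROQ_API_KEY` guard in the diff only ever catches an empty value. A hedged sketch of a lookup that lets the module's custom `ValueError` fire in both cases (the helper name is hypothetical, not the module's actual code):

```python
def load_groq_key(env: dict) -> str:
    """Return the Groq API key from an environ-like mapping, or raise one clear error."""
    key = env.get("GROQ_API_KEY", "")
    if not key:
        raise ValueError("Error: GROQ_API_KEY is missing! Set it as an environment variable.")
    return key

# os.environ behaves like a dict, so the same helper works for real lookups:
try:
    load_groq_key({})
except ValueError as e:
    print(e)
```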
src/utils.py CHANGED
@@ -1,7 +1,6 @@
1
  import requests
2
  import io
3
  from bs4 import BeautifulSoup
4
- from collections import Counter
5
  from gtts import gTTS
6
  from deep_translator import GoogleTranslator
7
  from transformers import pipeline
@@ -9,18 +8,29 @@ from keybert import KeyBERT
9
 
10
  # News Extraction
11
  def extract_news(topic):
 
 
 
 
 
 
 
 
 
12
  url = f"https://economictimes.indiatimes.com/topic/{topic}"
13
  headers = {"User-Agent": "Mozilla/5.0"}
14
 
15
- response = requests.get(url, headers=headers)
16
- if response.status_code != 200:
17
- print(f"Failed to retrieve page. Status Code: {response.status_code}")
 
 
18
  return []
19
 
20
  soup = BeautifulSoup(response.text, "html.parser")
21
 
22
  articles = []
23
- article_blocks = soup.find_all("div", class_="clr flt topicstry story_list") # Find all articles
24
 
25
  for article in article_blocks:
26
  title_tag = article.find("a", class_="wrapLines l2")
@@ -34,33 +44,62 @@ def extract_news(topic):
34
  return articles
35
 
36
 
37
- # Sentiment Analysis
38
- sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
 
 
39
 
40
- # Function to analyze sentiment
41
  def analyze_sentiment(text):
42
- result = sentiment_pipeline(text)[0] # Run text through the model
43
- sentiment = result["label"] # Extract sentiment label
44
- return sentiment # Returns 'POSITIVE' or 'NEGATIVE'
 
 
 
 
 
 
 
 
45
 
46
 
47
- # Keyword Extraction
48
- kw_model = KeyBERT('distilbert-base-nli-mean-tokens')
49
 
50
  def extract_keywords_keybert(text):
51
- return [kw[0].title() for kw in kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)]
 
52
 
 
 
53
 
 
 
 
 
 
54
 
55
- # Summarized Text to Hindi Speech
 
56
  def generate_hindi_speech(text):
 
 
 
 
 
 
 
 
 
 
 
57
 
58
- hindi_text = GoogleTranslator(source="auto", target="hi").translate(text) # Translate to Hindi
59
  tts = gTTS(hindi_text, lang="hi")
60
 
61
- # Store the speech in memory instead of a file
62
  audio_buffer = io.BytesIO()
63
  tts.write_to_fp(audio_buffer)
64
- audio_buffer.seek(0) # Move to start for playback
65
 
66
  return audio_buffer
 
1
  import requests
2
  import io
3
  from bs4 import BeautifulSoup
 
4
  from gtts import gTTS
5
  from deep_translator import GoogleTranslator
6
  from transformers import pipeline
 
8
 
9
  # News Extraction
10
  def extract_news(topic):
11
+ """
12
+ Extracts news articles related to the given topic from the Economic Times website.
13
+
14
+ Args:
15
+ topic (str): The topic for which news articles need to be extracted.
16
+
17
+ Returns:
18
+ list[dict]: A list of dictionaries containing news titles and summaries.
19
+ """
20
  url = f"https://economictimes.indiatimes.com/topic/{topic}"
21
  headers = {"User-Agent": "Mozilla/5.0"}
22
 
23
+ try:
24
+ response = requests.get(url, headers=headers)
25
+ response.raise_for_status() # Raise an HTTPError for bad responses (4xx and 5xx)
26
+ except requests.RequestException as e:
27
+ print(f"Error fetching news: {e}")
28
  return []
29
 
30
  soup = BeautifulSoup(response.text, "html.parser")
31
 
32
  articles = []
33
+ article_blocks = soup.find_all("div", class_="clr flt topicstry story_list")
34
 
35
  for article in article_blocks:
36
  title_tag = article.find("a", class_="wrapLines l2")
 
44
  return articles
45
 
46
 
47
+ # Sentiment Analysis Pipeline
48
+ sentiment_pipeline = pipeline(
49
+ "sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english"
50
+ )
51
 
 
52
  def analyze_sentiment(text):
53
+ """
54
+ Analyzes the sentiment of the given text using a pre-trained model.
55
+
56
+ Args:
57
+ text (str): The input text to analyze.
58
+
59
+ Returns:
60
+ str: Sentiment label ('POSITIVE' or 'NEGATIVE').
61
+ """
62
+ result = sentiment_pipeline(text)[0] # Process text through the model
63
+ return result["label"] # Extract and return sentiment label
64
 
65
 
66
+ # Keyword Extraction using KeyBERT
67
+ kw_model = KeyBERT("distilbert-base-nli-mean-tokens")
68
 
69
  def extract_keywords_keybert(text):
70
+ """
71
+ Extracts keywords from the given text using KeyBERT.
72
 
73
+ Args:
74
+ text (str): The input text for keyword extraction.
75
 
76
+ Returns:
77
+ list[str]: A list of extracted keywords (title-cased).
78
+ """
79
+ keywords = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)
80
+ return [kw[0].title() for kw in keywords]
81
 
82
+
83
+ # Hindi Speech Generation
84
  def generate_hindi_speech(text):
85
+ """
86
+ Converts the given text into Hindi speech.
87
+
88
+ Args:
89
+ text (str): The input text to be translated and converted to speech.
90
+
91
+ Returns:
92
+ io.BytesIO: A buffer containing the generated speech audio.
93
+ """
94
+ # Translate text to Hindi
95
+ hindi_text = GoogleTranslator(source="auto", target="hi").translate(text)
96
 
97
+ # Convert translated text to speech
98
  tts = gTTS(hindi_text, lang="hi")
99
 
100
+ # Store the generated speech in memory
101
  audio_buffer = io.BytesIO()
102
  tts.write_to_fp(audio_buffer)
103
+ audio_buffer.seek(0) # Reset buffer position for playback
104
 
105
  return audio_buffer
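The in-memory buffer pattern at the end of `generate_hindi_speech` is worth isolating: any writer with a gTTS-style `write_to_fp` method can target an `io.BytesIO` and be replayed after `seek(0)`, with no temporary file on disk. The fake writer below stands in for gTTS so the sketch runs offline; it is an illustration of the pattern, not the library itself.

```python
import io

class FakeTTS:
    """Stand-in for gTTS: writes bytes to a file-like object."""
    def __init__(self, payload: bytes):
        self.payload = payload

    def write_to_fp(self, fp):
        fp.write(self.payload)

tts = FakeTTS(b"fake-mp3-bytes")
audio_buffer = io.BytesIO()
tts.write_to_fp(audio_buffer)
audio_buffer.seek(0)  # Reset to the start so the consumer reads from byte 0

print(audio_buffer.read())  # b'fake-mp3-bytes'
```

Without the `seek(0)`, a consumer such as FastAPI's `StreamingResponse` would start reading at the end of the buffer and stream zero bytes.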