Documentation added
- README.md +163 -0
- api.py +30 -36
- app.py +56 -51
- research/trials.ipynb +0 -0
- setup.py +0 -10
- src/comparison.py +26 -17
- src/summarization.py +17 -9
- src/utils.py +57 -18
README.md
CHANGED
@@ -6,3 +6,166 @@ colorTo: yellow
sdk: docker
pinned: false
---
# **Insightful News AI** 🤖

## Overview
Insightful News AI is a FastAPI-based application that fetches news articles for a given company, analyzes their sentiment, extracts key topics, and provides Hindi audio summaries. The application uses NLP models for text processing and an AI-powered TTS engine for speech synthesis.

## **Project Setup**

### Prerequisites
- Python 3.10 (virtual environment: `tts`)
- FastAPI
- Streamlit
- Required Python packages (listed in `requirements.txt`)

## 🚀 Installation Steps

### **1️⃣ Clone the Repository**
```bash
git clone https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application.git
cd News-Summarization-and-Text-to-Speech-Application
```

### **2️⃣ Create and Activate a Virtual Environment**
```bash
python -m venv tts
source tts/bin/activate   # On macOS/Linux
tts\Scripts\activate      # On Windows
```

### **3️⃣ Install Dependencies**
```bash
pip install -r requirements.txt
```

### **4️⃣ Set Up Environment Variables**
Create a `.env` file and add your API keys:
```
GROQ_API_KEY=your_api_key_here
```
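On the Python side, the key can then be loaded from this `.env` file before the Groq client is created. A minimal sketch using `python-dotenv` (using the loader here is an assumption; `src/summarization.py` itself reads the key from the process environment):
```python
# Minimal sketch (assumption): load GROQ_API_KEY from .env with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # Copies variables from .env into the process environment
GROQ_API_KEY = os.environ.get("GROQ_API_KEY")

if not GROQ_API_KEY:
    raise ValueError("GROQ_API_KEY is missing! Add it to your .env file.")
```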
### **5️⃣ Run the Backend (FastAPI Server)**
```bash
uvicorn api:app --host 127.0.0.1 --port 8000 --reload
```

### **6️⃣ Run the Frontend (Streamlit UI)**
```bash
streamlit run app.py
```

---

## **🧠 Model Details**

This project uses **three AI models**:

1. **Summarization Model** (Mistral)
   - Extracts key insights from news articles.
   - Uses the `mistral-saba-24b` model for concise summarization.

2. **Sentiment Analysis Model** (DistilBERT)
   - Categorizes sentiment as **Positive, Negative, or Neutral**.
   - Analyzes news sentiment using a pre-trained NLP model.

3. **Text-to-Speech (TTS) Model** (Google Text-to-Speech)
   - Converts AI-generated summaries into **Hindi speech**.
   - Uses gTTS for speech synthesis.

---

## **🛠 API Development**

### **Endpoints**

| Method | Endpoint | Description |
|--------|---------------------------|-------------|
| `GET` | `/` | Welcome message. |
| `GET` | `/news-analysis/?company=XYZ` | Extracts news titles and summaries, and analyzes their sentiment. |
| `GET` | `/comparative-analyst/?company=XYZ` | Performs a comparative sentiment analysis. |
| `GET` | `/generate-audio/?company=XYZ` | Generates a Hindi audio summary. |

---

## **📡 API Usage**

### **1️⃣ Fetch News Sentiment Summary**
**Request (Postman, cURL, Python)**
```bash
curl -X GET "http://127.0.0.1:8000/news-analysis/?company=Google"
```
**Response (JSON)**
```json
{
  "Company": "Google",
  "Articles": [
    {
      "Title": "Google removes 331 malicious apps from Play Store",
      "Summary": "Vapor Operation infected 331 apps with 60M+ downloads, engaging in ad fraud and phishing...",
      "Sentiment": "Negative",
      "Topics": ["Vapor Operation", "Android 13", "Security"]
    }
  ]
}
```
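The same request can be issued from Python, as the request line above suggests. A minimal sketch using `requests` (field names follow the response example above):
```python
# Minimal sketch: the same request from Python with requests.
import requests

resp = requests.get(
    "http://127.0.0.1:8000/news-analysis/",
    params={"company": "Google"},
)
resp.raise_for_status()
for article in resp.json()["Articles"]:
    print(f"{article['Title']} -> {article['Sentiment']}")
```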
### **2️⃣ Comparative Sentiment Analysis**
```bash
curl -X GET "http://127.0.0.1:8000/comparative-analyst/?company=Google"
```
**Response Example**
```json
{
  "Sentiment Analysis": {
    "Sentiment Distribution": {"Positive": 5, "Negative": 4, "Neutral": 1},
    "Final Sentiment Summary": "Overall, Google news has a positive sentiment..."
  }
}
```

### **3️⃣ Generate Hindi Audio Summary**
```bash
curl -X GET "http://127.0.0.1:8000/generate-audio/?company=Google"
```
**Response**
- Returns an `mp3` file with the Hindi audio summary.
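From Python, the returned bytes can be written straight to disk. A minimal sketch (the output file name is arbitrary):
```python
# Minimal sketch: download the Hindi audio summary and save it as MP3.
import requests

resp = requests.get(
    "http://127.0.0.1:8000/generate-audio/",
    params={"company": "Google"},
)
resp.raise_for_status()
with open("hindi_summary.mp3", "wb") as f:
    f.write(resp.content)  # Raw MP3 bytes from the StreamingResponse
```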
---

## **🔗 Third-Party APIs Used**

| API | Purpose |
|-----------|---------|
| **Groq API** | Used for text summarization (Mistral). |

---

## **⚠ Assumptions & Limitations**

### **✅ Assumptions**
1. **Company Name Input**: The company name entered exists in publicly available news.
2. **Sentiment Accuracy**: The sentiment analysis model is trained on general-purpose data and may not capture sarcasm or nuanced sentiment.
3. **Keyword Extraction**: Uses KeyBERT for topic extraction, assuming that key topics can be identified from short summaries.

### **🚨 Limitations**
1. **News Data Availability**: If fewer than 10 articles are found, comparative analysis is not performed and the API returns an error.
2. **TTS Language Support**: Currently, speech generation is limited to **Hindi** only.
3. **Rate Limits**: The Groq API requires an API key and is subject to rate limits.

---

## **📌 Future Enhancements**
✅ **Expand TTS support to more languages**
✅ **Improve sentiment classification using fine-tuned LLMs**
✅ **Enable real-time news updates using WebSockets**

---

### **🔗 Contributors**
👤 **Bijay Chandra Das**
📧 **[email protected]**

📌 **GitHub Repo**: [GitHub](https://github.com/bijaycd/News-Summarization-and-Text-to-Speech-Application)

---
api.py
CHANGED
@@ -4,76 +4,70 @@ import uvicorn
```python
from src.utils import extract_news, analyze_sentiment, extract_keywords_keybert, generate_hindi_speech
from src.comparison import comparison_analysis
from src.summarization import summarize_overall_sentiment
from typing import Dict

app = FastAPI()


@app.get("/")
def home() -> Dict[str, str]:
    """Home route for the API."""
    return {"message": "Welcome to the News Analysis API!"}


@app.get("/news-analysis/")
def get_news_analysis(company: str) -> JSONResponse:
    """Extracts news, analyzes sentiment, and provides a JSON response."""
    articles = extract_news(company)[:10]  # Extract the first 10 articles

    if not articles:
        raise HTTPException(status_code=404, detail="No articles found for the given company.")

    news_data = {
        "Company": company,
        "Articles": [
            {
                "Title": article.get("title", "No Title"),
                "Summary": article.get("summary", "No Summary"),
                "Sentiment": analyze_sentiment(article.get("summary", "")),    # Sentiment analysis
                "Topics": extract_keywords_keybert(article.get("summary", ""))  # Topic extraction
            }
            for article in articles
        ]
    }

    return JSONResponse(content=news_data)


@app.get("/comparative-analyst/")
def get_comparative_analysis(company: str) -> JSONResponse:
    """Performs comparative sentiment analysis for a given company."""
    articles = extract_news(company)[:10]

    if len(articles) < 10:
        raise HTTPException(status_code=400, detail="Not enough articles for a full comparison.")

    comparison_data = comparison_analysis(articles)  # Perform comparative analysis

    return JSONResponse(content=comparison_data)


@app.get("/generate-audio/")
def generate_audio(company: str) -> StreamingResponse:
    """Generates a Hindi audio summary using the LLM response."""
    articles = extract_news(company)[:10]

    if not articles:
        raise HTTPException(status_code=404, detail="No articles found for the given company.")

    summary_text = summarize_overall_sentiment(articles)  # Generate summary text
    audio_buffer = generate_hindi_speech(summary_text)    # Convert summary to speech

    return StreamingResponse(
        audio_buffer,
        media_type="audio/mpeg",
        headers={"Content-Disposition": "attachment; filename=hindi_summary.mp3"}
    )


if __name__ == "__main__":
```
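These routes can be exercised without starting uvicorn by using FastAPI's `TestClient`. A minimal sketch, assuming `api.py` defines `app` as above and its model dependencies can load in the test environment:
```python
# Minimal sketch: smoke-test the API with FastAPI's TestClient.
# Assumes api.py defines `app` as shown and its imports resolve.
from fastapi.testclient import TestClient

from api import app

client = TestClient(app)

resp = client.get("/")
assert resp.status_code == 200
print(resp.json())  # {"message": "Welcome to the News Analysis API!"}
```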
app.py
CHANGED
@@ -1,10 +1,11 @@
```python
import streamlit as st
import requests

# Define the API URL
FASTAPI_URL = "http://127.0.0.1:8000"

# Configure the Streamlit app layout
st.set_page_config(page_title="Insightful News AI", page_icon="🤖", layout="centered")
st.title("Sentiment-Driven News Summarization with AI-Powered Speech")

# Sidebar Controls
```

@@ -14,109 +15,113 @@ get_news = st.sidebar.button("Get News Summary")
```python
compare_news = st.sidebar.button("Comparative Analysis")
generate_audio = st.sidebar.button("Generate Audio")


def fetch_data(endpoint, params=None):
    """
    Fetch data from the given API endpoint.

    Args:
        endpoint (str): API endpoint to fetch data from.
        params (dict, optional): Query parameters for the request.

    Returns:
        dict: JSON response if successful, else None.
    """
    response = requests.get(f"{FASTAPI_URL}/{endpoint}", params=params)
    return response.json() if response.status_code == 200 else None


# Fetch News Summary
if get_news:
    st.write("## Company: ", company)
    news_data = fetch_data("news-analysis", {"company": company})

    if news_data:
        for i, article in enumerate(news_data.get("Articles", []), start=1):
            st.write(f"### {i}. {article.get('Title', 'No Title')}")
            st.write(f"**Summary:** {article.get('Summary', 'No Summary')}")
            st.write(f"**Sentiment:** {article.get('Sentiment', 'Unknown')}")
            st.write(f"**Topics:** {', '.join(article.get('Topics', []))}")
            st.markdown("---")  # Separator between articles
    else:
        st.error("Error fetching news. Please try again.")


# Comparative Analysis
if compare_news:
    st.write("## Comparative Analysis")
    comparison_data = fetch_data("comparative-analyst", {"company": company})

    if comparison_data:
        # Extract the overall sentiment analysis
        sentiment_data = comparison_data.get("Sentiment Analysis", {})
        sentiment_distribution = sentiment_data.get("Sentiment Distribution", {})
        final_sentiment = sentiment_data.get("Final Sentiment Summary", "No sentiment summary available.")

        # Extract sentiment counts safely
        positive_count = sentiment_distribution.get("Positive", 0)
        negative_count = sentiment_distribution.get("Negative", 0)
        neutral_count = sentiment_distribution.get("Neutral", 0)

        # Display the sentiment distribution
        st.write("### Sentiment Distribution")
        col1, col2, col3 = st.columns(3)
        col1.metric(label="Positive", value=positive_count)
        col2.metric(label="Negative", value=negative_count)
        col3.metric(label="Neutral", value=neutral_count)

        # Display the final sentiment summary
        st.write("### Final Sentiment")
        if positive_count > negative_count:
            st.success(f"**{final_sentiment}**")  # Green for Positive
        elif positive_count < negative_count:
            st.error(f"**{final_sentiment}**")    # Red for Negative
        else:
            st.warning(f"**{final_sentiment}**")  # Yellow for Neutral

        # Display topic overlap
        st.write("### Topic Overlap")
        topic_overlap = comparison_data.get("Topic Overlap", {})

        # Display common topics
        common_topics = topic_overlap.get("Common Topics", [])
        st.write(f"**Common Topics (Appearing in ≥3 articles):** {', '.join(common_topics) if common_topics else 'None'}")

        # Display unique topics per article
        unique_topics_per_article = topic_overlap.get("Unique Topics Per Article", [])
        for topic_data in unique_topics_per_article:
            article_number = topic_data.get("Article", "Unknown")
            unique_topics = topic_data.get("Unique Topics", [])
            st.write(f"**Unique Topics in Article {article_number}:** {', '.join(unique_topics) if unique_topics else 'None'}")

        # Display the final LLM-based sentiment analysis
        st.write("## Overall Sentiment Summary")
        final_llm_summary = comparison_data.get("Final Sentiment Analysis", "No summary available.")
        st.info(f"**{final_llm_summary}**")

    else:
        st.error("Error fetching comparative analysis. Please try again.")


# Generate Hindi Speech Audio
if generate_audio:
    st.write("### Hindi Audio Summary")

    audio_url = f"{FASTAPI_URL}/generate-audio/?company={company}"
    response = requests.get(audio_url)

    if response.status_code == 200:
        audio_data = response.content

        # Play the audio
        st.audio(audio_data, format="audio/mp3")

        # Provide a download button
        st.download_button(
            label="Download Hindi Audio",
            data=audio_data,
            file_name="hindi_summary.mp3",
            mime="audio/mpeg"
        )
    else:
        st.error("Failed to generate audio. Please try again.")
```
research/trials.ipynb
CHANGED
The diff for this file is too large to render.
setup.py
DELETED
@@ -1,10 +0,0 @@
```python
from setuptools import find_packages, setup

setup(
    name="News-Summarization-and-Text-to-Speech-Application",
    version="0.0.1",
    author="bijay",
    author_email="[email protected]",
    packages=find_packages(),
    install_requires=["SpeechRecognition", "pipwin", "pyaudio", "gTTS", "google-generativeai", "python-dotenv", "streamlit"]
)
```
src/comparison.py
CHANGED
@@ -4,43 +4,52 @@ from keybert import KeyBERT
```python
from src.utils import extract_keywords_keybert, analyze_sentiment
from src.summarization import summarize_overall_sentiment


def comparison_analysis(articles):
    """
    Compares articles based on sentiment and topics, providing a final sentiment summary.

    Args:
        articles (list[dict]): A list of articles, each containing a "summary" key.

    Returns:
        dict: A dictionary containing sentiment analysis, topic overlap, and final sentiment summary.
    """
    if len(articles) < 10:
        return {"error": "Not enough articles for a full comparison."}

    # Extract keywords from all articles
    article_keywords = [extract_keywords_keybert(article["summary"]) for article in articles]

    # Count occurrences of each keyword
    all_keywords = [kw for sublist in article_keywords for kw in sublist]
    keyword_counts = Counter(all_keywords)

    # Identify common and unique topics
    common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]  # Common if in ≥3 articles
    unique_topics_per_article = [
        {"Article": i + 1, "Unique Topics": list(set(article_keywords[i]) - set(common_topics))}
        for i in range(len(articles))
    ]

    # Perform sentiment analysis
    sentiments = [analyze_sentiment(article["summary"]) for article in articles]
    sentiment_counts = Counter(sentiments)

    # Format sentiment counts for readability
    formatted_counts = {sent.capitalize(): count for sent, count in sentiment_counts.items()}

    # Determine the overall sentiment
    overall_sentiment = max(sentiment_counts, key=sentiment_counts.get, default="Neutral").capitalize()
    sentiment_summary = (
        f"Overall sentiment is {overall_sentiment} "
        f"({formatted_counts.get('Negative', 0)} Negative, {formatted_counts.get('Positive', 0)} Positive)."
    )

    # Generate the LLM-based sentiment summary
    overall_summary = summarize_overall_sentiment(articles)

    # Return the final comparative analysis
    return {
        "Sentiment Analysis": {
            "Sentiment Distribution": formatted_counts,
```

@@ -50,5 +59,5 @@ def comparison_analysis(articles):
```python
            "Common Topics": common_topics,
            "Unique Topics Per Article": unique_topics_per_article
        },
        "Final Sentiment Analysis": overall_summary  # LLM-generated summary
    }
```
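The topic-overlap step above is plain `Counter` logic and can be tried in isolation. A toy sketch with hand-written keyword lists (no models involved; the keywords are illustrative):
```python
# Toy sketch: the common/unique topic split from comparison_analysis,
# applied to hand-written keyword lists instead of KeyBERT output.
from collections import Counter

article_keywords = [
    ["Security", "Play Store"],
    ["Security", "Ad Fraud"],
    ["Security", "Android"],
]
keyword_counts = Counter(kw for kws in article_keywords for kw in kws)
common_topics = [kw for kw, count in keyword_counts.items() if count >= 3]
unique_topics_per_article = [
    {"Article": i + 1, "Unique Topics": list(set(kws) - set(common_topics))}
    for i, kws in enumerate(article_keywords)
]
print(common_topics)              # ['Security']
print(unique_topics_per_article)  # Per-article leftovers
```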
src/summarization.py
CHANGED
@@ -4,20 +4,28 @@ import groq
```python
working_dir = os.path.dirname(os.path.abspath(__file__))
GROQ_API_KEY = os.environ.get("GROQ_API_KEY")

# Check that the API key is available
if not GROQ_API_KEY:
    raise ValueError("Error: GROQ_API_KEY is missing! Set it as an environment variable.")

# Initialize the Groq client
client = groq.Groq(api_key=GROQ_API_KEY)


def summarize_overall_sentiment(articles):
    """
    Summarizes the overall sentiment of news articles using the Groq API.

    Args:
        articles (list[dict]): A list of articles, each containing a "summary" key.

    Returns:
        str: A concise sentiment summary based on the news articles.
    """
    # Concatenate all article summaries
    concatenated_text = " ".join(article["summary"] for article in articles)

    # Define the prompt for sentiment summarization
    prompt = f"""
    You are an AI model designed for news sentiment summarization.
    Analyze the following news articles and determine the overall sentiment
```

@@ -28,7 +36,7 @@ def summarize_overall_sentiment(articles):
```python
    Provide a concise summary without additional formatting or headers in two paragraphs.
    """

    # Call the Groq chat completions API with the Mistral Saba model
    response = client.chat.completions.create(
        model="mistral-saba-24b",
        messages=[
```

@@ -38,5 +46,5 @@ def summarize_overall_sentiment(articles):
```python
        max_tokens=250
    )

    # Return the cleaned response text
    return response.choices[0].message.content.strip()
```
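Calling the function requires a valid `GROQ_API_KEY` and network access. A minimal usage sketch (the article summaries are made up; assumes the project root is on the import path):
```python
# Minimal sketch: call summarize_overall_sentiment with toy articles.
# Assumes GROQ_API_KEY is set in the environment.
from src.summarization import summarize_overall_sentiment

articles = [
    {"summary": "Google removes 331 malicious apps from the Play Store."},
    {"summary": "Google announces new AI features for Search."},
]
print(summarize_overall_sentiment(articles))
```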
src/utils.py
CHANGED
@@ -1,7 +1,6 @@
```python
import requests
import io
from bs4 import BeautifulSoup
from gtts import gTTS
from deep_translator import GoogleTranslator
from transformers import pipeline
```

@@ -9,18 +8,29 @@ from keybert import KeyBERT
```python
# News Extraction
def extract_news(topic):
    """
    Extracts news articles related to the given topic from the Economic Times website.

    Args:
        topic (str): The topic for which news articles need to be extracted.

    Returns:
        list[dict]: A list of dictionaries containing news titles and summaries.
    """
    url = f"https://economictimes.indiatimes.com/topic/{topic}"
    headers = {"User-Agent": "Mozilla/5.0"}

    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx and 5xx)
    except requests.RequestException as e:
        print(f"Error fetching news: {e}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")

    articles = []
    article_blocks = soup.find_all("div", class_="clr flt topicstry story_list")

    for article in article_blocks:
        title_tag = article.find("a", class_="wrapLines l2")
```

@@ -34,44 +44,62 @@ def extract_news(topic):
```python
    return articles


# Sentiment Analysis Pipeline
sentiment_pipeline = pipeline(
    "sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english"
)


def analyze_sentiment(text):
    """
    Analyzes the sentiment of the given text using a pre-trained model.

    Args:
        text (str): The input text to analyze.

    Returns:
        str: Sentiment label ('POSITIVE' or 'NEGATIVE').
    """
    result = sentiment_pipeline(text)[0]  # Process the text through the model
    return result["label"]  # Extract and return the sentiment label


# Keyword Extraction using KeyBERT
kw_model = KeyBERT("distilbert-base-nli-mean-tokens")


def extract_keywords_keybert(text):
    """
    Extracts keywords from the given text using KeyBERT.

    Args:
        text (str): The input text for keyword extraction.

    Returns:
        list[str]: A list of extracted keywords (title-cased).
    """
    keywords = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)
    return [kw[0].title() for kw in keywords]


# Hindi Speech Generation
def generate_hindi_speech(text):
    """
    Converts the given text into Hindi speech.

    Args:
        text (str): The input text to be translated and converted to speech.

    Returns:
        io.BytesIO: A buffer containing the generated speech audio.
    """
    # Translate the text to Hindi
    hindi_text = GoogleTranslator(source="auto", target="hi").translate(text)

    # Convert the translated text to speech
    tts = gTTS(hindi_text, lang="hi")

    # Store the generated speech in memory
    audio_buffer = io.BytesIO()
    tts.write_to_fp(audio_buffer)
    audio_buffer.seek(0)  # Reset the buffer position for playback

    return audio_buffer
```
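A minimal usage sketch for the speech helper, writing the in-memory buffer to a file (the input sentence is made up; assumes the models above can load):
```python
# Minimal sketch: save the buffer returned by generate_hindi_speech.
from src.utils import generate_hindi_speech

buffer = generate_hindi_speech("The overall sentiment for Google is positive.")
with open("sample_hindi_summary.mp3", "wb") as f:
    f.write(buffer.read())  # Buffer was rewound inside the function
```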