Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ license: mpl-2.0
|
|
9 |
short_description: চা খাবা?
|
10 |
---
|
11 |
<div align="center">
|
12 |
-
<img src="https://
|
13 |
|
14 |
# শব্দনিক | Shôbdhonic
|
15 |
|
@@ -20,7 +20,9 @@ short_description: চা খাবা?
|
|
20 |
[](https://shobdhonic.com)
|
21 |
[](https://discord.gg/shobdhonic)
|
22 |
[](https://twitter.com/Shobdhonic)
|
23 |
-
|
24 |
</div>
|
25 |
|
26 |
---
|
@@ -29,17 +31,40 @@ short_description: চা খাবা?
|
|
29 |
A **next-gen Bangla NLP platform** built for:
|
30 |
- 🔥 **Gen-Z Creators**: Meme generators, slang translators, TikTok/Reels integrations
|
31 |
- 🏢 **Enterprises**: Sentiment analysis, fraud detection, document processing
|
32 |
-
- 🇧🇩 **Cultural Preservation**: Digitize literature, dialects, and oral histories
|
33 |
|
34 |
---
|
35 |
|
36 |
## ✨ **Key Features**
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
| **
|
41 |
-
| **
|
42 |
-
| **
|
43 |
|
44 |
---
|
45 |
|
@@ -50,6 +75,8 @@ A **next-gen Bangla NLP platform** built for:
|
|
50 |
| Primary | `#6A5ACD` |  |
|
51 |
| Secondary | `#FF69B4` |  |
|
52 |
| Accent | `#00FFE0` |  |
|
53 |
|
54 |
### **Mascot**
|
55 |
**বর্গী বট (Borgi Bot)** – Our street-smart AI mascot for Gen-Z campaigns:
|
@@ -61,44 +88,208 @@ A **next-gen Bangla NLP platform** built for:
|
|
61 |
### **Prerequisites**
|
62 |
- Python 3.10+ / Node.js 18+
|
63 |
- Hugging Face API Key (Register [here](https://huggingface.co/Shobdhonic))
|
64 |
|
65 |
### **Installation**
|
|
|
66 |
```bash
|
67 |
# Clone repo
|
68 |
git clone https://github.com/Shobdhonic/core-engine.git
|
69 |
cd core-engine
|
70 |
|
71 |
# Install dependencies (Python)
|
72 |
pip install -r requirements.txt
|
73 |
|
74 |
# Or for Node.js
|
75 |
npm install
|
76 |
```
|
77 |
|
78 |
### **Generate Your First Meme**
|
79 |
```python
|
80 |
from shobdhonic import MemeMaster
|
81 |
|
82 |
-
|
83 |
text="একটা চা আর হয়না? ☕",
|
84 |
-
template="cha_kaku"
|
85 |
)
|
86 |
-
|
87 |
```
|
88 |
|
89 |
-
### **
|
90 |
```python
|
91 |
from shobdhonic import VoiceForge
|
|
|
92 |
|
93 |
-
voice
|
94 |
target_voice="bappa_sir", # Popular Bangla YouTuber
|
95 |
-
text="ভাই, লাইক আর সাবস্ক্রাইব মনে হয়না!"
|
96 |
)
|
97 |
voice.play()
|
98 |
```
|
99 |
|
100 |
---
|
101 |
|
102 |
## 📊 **Enterprise Solutions**
|
103 |
<div align="center">
|
104 |
<a href="https://shobdhonic.com/enterprise">
|
@@ -106,28 +297,310 @@ voice.play()
|
|
106 |
</a>
|
107 |
</div>
|
108 |
|
109 |
-
|
110 |
-
-
|
111 |
-
-
|
112 |
|
113 |
---
|
114 |
|
115 |
## 🤝 **Contribute to Bangla AI**
|
116 |
-
|
117 |
-
|
118 |
-
|
119 |
|
120 |
---
|
121 |
|
122 |
## 📜 **License & Ethics**
|
123 |
```text
|
124 |
MIT License | © 2024 Shôbdhonic
|
|
|
125 |
*Bangla Data Ethics Pledge:*
|
126 |
- No misuse of dialects/regional languages
|
127 |
- Cite sources like Ittefaq/Prothom Alo
|
128 |
-
- Free access for non-profits/NGOs
|
129 |
```
|
130 |
|
131 |
---
|
132 |
|
133 |
## 🌐 **Connect**
|
@@ -136,6 +609,8 @@ MIT License | © 2024 Shôbdhonic
|
|
136 |
[](https://huggingface.co/Shobdhonic)
|
137 |
[](https://youtube.com/Shobdhonic)
|
138 |
[](https://linkedin.com/company/Shobdhonic)
|
139 |
|
140 |
</div>
|
141 |
|
|
|
9 |
short_description: চা খাবা?
|
10 |
---
|
11 |
<div align="center">
|
12 |
+
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67497128927b345d1345e9de/69fZeWPoXB20L7do9nZDY.png" width="300" alt="Shôbdhonic Logo">
|
13 |
|
14 |
# শব্দনিক | Shôbdhonic
|
15 |
|
|
|
20 |
[Website](https://shobdhonic.com)
[Discord](https://discord.gg/shobdhonic)
[Twitter](https://twitter.com/Shobdhonic)
[Telegram](https://t.me/Shobdhonic)
[GitHub](https://github.com/Shobdhonic)
[Hugging Face](https://huggingface.co/Shobdhonic)
|
26 |
</div>
|
27 |
|
28 |
---
|
|
|
31 |
A **next-gen Bangla NLP platform** built for:
|
32 |
- 🔥 **Gen-Z Creators**: Meme generators, slang translators, TikTok/Reels integrations
|
33 |
- 🏢 **Enterprises**: Sentiment analysis, fraud detection, document processing
|
34 |
+
- 🇧🇩 **Cultural Preservation**: Digitize literature, dialects, and oral histories
|
35 |
+
- 🧠 **Research**: Advanced Bangla language models, transformer architectures, and fine-tuning pipelines
|
36 |
+
- 🌐 **Web3**: Blockchain integration for digital Bangla content authentication
|
37 |
|
38 |
---
|
39 |
|
40 |
## ✨ **Key Features**
|
41 |
+
|
42 |
+
| **Category** | **Tools** |
|-----------------------|------------------------------------------------------------------------------------|
| **Gen-Z Playground** | `MemeGPT` • `Slang Translator` • `AI Rap Generator` • `Voice Filters` • `TikTok Content API` |
| **Enterprise NLP** | `Legal Doc Analyzer` • `News Sentiment API` • `Plagiarism Checker` • `Customer Service Bot` • `Bangla Data OCR` |
| **Voice Lab** | `Celebrity Voice Cloning` • `Regional Accent TTS` • `Audio Transcription` • `Dialect Analysis` • `Emotion Detection` |
| **Real-Time AI** | `Trend Predictor` • `Social Media Pulse` • `Ittefaq News Scanner` • `Market Sentiment Analysis` • `Election Opinion Tracker` |
| **Academia** | `Literature Analysis` • `Academic Paper Assistant` • `Educational Content Generator` • `Bangla Research Corpus` |
| **Security Suite** | `Bangla Fraud Detection` • `Phishing Text Analysis` • `Disinformation Tracker` • `Financial Alert System` |
|
50 |
+
|
51 |
+
---
|
52 |
+
|
53 |
+
## 🎯 **Core Technologies**
|
54 |
+
|
55 |
+
### **Model Architectures**
|
56 |
+
- **ShobdhoBERT**: Transformer-based model trained on 5TB of Bangla text corpus
|
57 |
+
- **ShobdhoGPT-3.5**: GPT-based generative model fine-tuned on diverse Bangla content
|
58 |
+
- **DialectDiffusion**: Voice synthesis specialized for regional Bangla dialects
|
59 |
+
- **BanglaLLM-7B**: Large Language Model optimized for Bangla instruction following
|
60 |
+
- **Multimodal-Bangla**: Vision-language model for Bangla image-text understanding
|
61 |
+
|
62 |
+
### **Data Processing Pipeline**
|
63 |
+
- Proprietary text normalization for Bangla script variations
|
64 |
+
- Context-aware slang detection and interpretation
|
65 |
+
- Real-time news corpus analysis with automated categorization
|
66 |
+
- Specialized tokenization for Bangla script with compound word handling
|
67 |
+
- Advanced sentiment analysis for cultural nuances
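
The README does not describe how the proprietary normalization works, so the following is only a minimal sketch of one step such a pipeline commonly includes: Unicode NFC normalization, which makes visually identical Bangla strings compare equal. The function name `normalize_bangla` and the whitespace handling are assumptions for illustration, not part of the Shôbdhonic codebase.

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def normalize_bangla(text: str) -> str:
    """Return text in Unicode NFC form with runs of whitespace collapsed."""
    composed = unicodedata.normalize("NFC", text)
    return _WHITESPACE.sub(" ", composed).strip()


# The vowel sign "ো" can be stored as one code point (U+09CB) or as two
# (U+09C7 followed by U+09BE); both render identically on screen.
one_code_point = "ক\u09cb"
two_code_points = "ক\u09c7\u09be"

print(one_code_point == two_code_points)  # False
print(normalize_bangla(one_code_point) == normalize_bangla(two_code_points))  # True
```

Normalizing before tokenization or dictionary lookup avoids treating the two spellings as different words. Zero-width joiners are deliberately left alone here because they can change how Bangla conjuncts render.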
|
68 |
|
69 |
---
|
70 |
|
|
|
75 |
| Primary | `#6A5ACD` |  |
|
76 |
| Secondary | `#FF69B4` |  |
|
77 |
| Accent | `#00FFE0` |  |
|
78 |
+
| Dark Mode | `#1A1A2E` |  |
|
79 |
+
| Light Mode | `#F5F5F7` |  |
|
80 |
|
81 |
### **Mascot**
|
82 |
**বর্গী বট (Borgi Bot)** – Our street-smart AI mascot for Gen-Z campaigns:
|
|
|
88 |
### **Prerequisites**
|
89 |
- Python 3.10+ / Node.js 18+
|
90 |
- Hugging Face API Key (Register [here](https://huggingface.co/Shobdhonic))
|
91 |
+
- Docker (optional, for containerized deployment)
|
92 |
+
- GPU acceleration (recommended for model training/inference)
|
93 |
|
94 |
### **Installation**
|
95 |
+
|
96 |
```bash
# Clone repo
git clone https://github.com/Shobdhonic/core-engine.git
cd core-engine

# Create virtual environment
python -m venv shobdhonic-env
source shobdhonic-env/bin/activate  # On Windows: shobdhonic-env\Scripts\activate

# Install dependencies (Python)
pip install -r requirements.txt

# Or for Node.js
npm install

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
```
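
Since the setup stores API keys in `.env`, one option is to read the key from the environment instead of hard-coding it in scripts. This is a minimal sketch: the variable name `SHOBDHONIC_API_KEY` is an assumption (check `.env.example` for the real name), and a plain `.env` file is not loaded into the process automatically, so the variable has to be exported by your shell, by Docker's `--env-file`, or by a loader of your choice.

```python
import os


def load_api_key(var_name: str = "SHOBDHONIC_API_KEY") -> str:
    """Read the API key from the environment, failing early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or load it from .env")
    return key


# Usage (hypothetical variable name):
# api_key = load_api_key()
# then pass api_key=api_key wherever the examples use "your_api_key_here"
```

Failing at startup with a clear message is usually easier to debug than an authentication error from the first API call.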
|
115 |
+
|
116 |
+
### **Docker Setup**
|
117 |
+
```bash
# Build the Docker image
docker build -t shobdhonic:latest .

# Run the container (the quotes keep paths with spaces intact)
docker run -p 8000:8000 -v "$(pwd)":/app --env-file .env shobdhonic:latest
```
|
124 |
|
125 |
### **Generate Your First Meme**
|
126 |
```python
from shobdhonic import MemeMaster

# Initialize with your API key
meme_api = MemeMaster(api_key="your_api_key_here")

# Create a meme with custom text and template
meme = meme_api.create(
    text="একটা চা আর হয়না? ☕",  # roughly: "How about one more tea? ☕"
    template="cha_kaku",
    style="viral",  # Options: viral, minimal, dramatic, retro
    font="bangla_classic",
    format="jpg",  # Options: jpg, png, gif, mp4
)

# Save the meme
meme.download("output/cha_kaku_meme.jpg")

# Share directly to social media
meme.share(platform="facebook")  # Options: facebook, twitter, instagram, whatsapp
```
|
147 |
|
148 |
+
### **Advanced Voice Cloning**
|
149 |
```python
import numpy as np
from shobdhonic import VoiceForge

# Initialize voice engine
voice_api = VoiceForge(api_key="your_api_key_here")

# Clone a voice with emotion parameters
voice = voice_api.clone(
    target_voice="bappa_sir",  # Popular Bangla YouTuber
    text="ভাই, লাইক আর সাবস্ক্রাইব মনে হয়না!",  # a like-and-subscribe call-out
    emotion="excited",  # Options: neutral, sad, excited, angry, persuasive
    dialect="dhaka",  # Options: dhaka, chittagong, sylhet, rajshahi, khulna, barishal
    speed=1.2,  # Playback speed multiplier (0.5 - 2.0)
    pitch_shift=0.3,  # Adjust pitch (-1.0 to 1.0)
)

# Play the generated audio
voice.play()

# Save to file
voice.save("output/bappa_youtube_promo.mp3")

# Get waveform data for further processing
waveform = voice.get_waveform()
spectrum = np.fft.fft(waveform)  # complex frequency-domain representation
```
|
176 |
+
|
177 |
+
### **News Sentiment Analysis**
|
178 |
+
```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
from shobdhonic import NewsAnalyzer

# Initialize news analyzer
news_api = NewsAnalyzer(api_key="your_api_key_here")

# Analyze recent articles
results = news_api.analyze(
    source="prothom_alo",  # Options: prothom_alo, ittefaq, bangla_tribune, bbc_bangla
    category="politics",  # Options: politics, business, sports, entertainment, tech
    date_range="last_7_days",  # Options: today, last_24h, last_7_days, last_30_days, custom
    sample_size=100,  # Number of articles to analyze
)

# Get sentiment breakdown
sentiment_df = pd.DataFrame(results.sentiment_data)

# Plot results
plt.figure(figsize=(10, 6))
plt.bar(sentiment_df['sentiment'], sentiment_df['percentage'])
plt.title('Political News Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Percentage (%)')

# savefig does not create missing directories, so make sure output/ exists
Path('output').mkdir(exist_ok=True)
plt.savefig('output/sentiment_analysis.png')
```
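
The plotting code reads `sentiment` and `percentage` columns from `results.sentiment_data`, but the README does not show what that data looks like. The sketch below illustrates one plausible shape by deriving such rows from per-article labels; the function `sentiment_breakdown` and the label names are assumptions for illustration, not the library's actual output format.

```python
from collections import Counter


def sentiment_breakdown(labels):
    """Turn per-article sentiment labels into rows of {sentiment, percentage}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        {"sentiment": label, "percentage": round(100 * n / total, 1)}
        for label, n in counts.most_common()
    ]


# Ten hypothetical articles: 5 positive, 3 negative, 2 neutral
labels = ["positive"] * 5 + ["negative"] * 3 + ["neutral"] * 2
for row in sentiment_breakdown(labels):
    print(row)
```

A list of dictionaries like this can be passed straight to `pd.DataFrame(...)` and plotted with the same `plt.bar` call.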
|
205 |
+
|
206 |
+
### **Enterprise Document Processing**
|
207 |
+
```python
from shobdhonic import DocumentProcessor
from shobdhonic.security import SensitiveDataDetector

# Initialize document processor
doc_api = DocumentProcessor(api_key="your_api_key_here")

# Process legal document
processed_doc = doc_api.process(
    file_path="contracts/agreement.pdf",
    tasks=[
        "summarize",  # Create executive summary
        "extract_entities",  # Find people, organizations, dates
        "identify_clauses",  # Detect important legal clauses
        "risk_assessment",  # Flag potentially problematic terms
    ],
    output_format="json",
)

# Check for sensitive information
sensitive_detector = SensitiveDataDetector()
security_scan = sensitive_detector.scan(processed_doc.raw_text)

if security_scan.has_sensitive_data:
    print(f"WARNING: Found {len(security_scan.findings)} instances of sensitive data")
    for finding in security_scan.findings:
        print(f"- {finding.type}: {finding.severity} risk level")

# Export processed results
processed_doc.export(
    output_path="output/processed_contract.json",
    include_metadata=True,
    redact_sensitive=True,
)
```
|
242 |
|
243 |
---
|
244 |
|
245 |
+
## 🔋 **Core Modules**
|
246 |
+
|
247 |
+
### **Text Processing**
|
248 |
+
- `shobdhonic.tokenizer`: Advanced Bangla tokenization
|
249 |
+
- `shobdhonic.transformer`: Pre-trained transformer models
|
250 |
+
- `shobdhonic.nlp`: Natural language processing utilities
|
251 |
+
- `shobdhonic.generator`: Text generation capabilities
|
252 |
+
- `shobdhonic.translator`: Cross-language translation services
|
253 |
+
|
254 |
+
### **Audio & Speech**
|
255 |
+
- `shobdhonic.voice`: Text-to-speech and speech-to-text
|
256 |
+
- `shobdhonic.audio`: Audio processing utilities
|
257 |
+
- `shobdhonic.dialect`: Regional dialect processing
|
258 |
+
|
259 |
+
### **Media & Content**
|
260 |
+
- `shobdhonic.meme`: Meme generation engine
|
261 |
+
- `shobdhonic.social`: Social media integration
|
262 |
+
- `shobdhonic.content`: Content creation assistants
|
263 |
+
- `shobdhonic.video`: Video generation and editing
|
264 |
+
|
265 |
+
### **Analysis & Intelligence**
|
266 |
+
- `shobdhonic.sentiment`: Sentiment analysis tools
|
267 |
+
- `shobdhonic.analytics`: Usage statistics and reporting
|
268 |
+
- `shobdhonic.trends`: Trend detection and prediction
|
269 |
+
|
270 |
+
### **Security & Enterprise**
|
271 |
+
- `shobdhonic.security`: Security and compliance tools
|
272 |
+
- `shobdhonic.enterprise`: Enterprise integration utilities
|
273 |
+
- `shobdhonic.docs`: Document processing pipeline
|
274 |
+
|
275 |
+
---
|
276 |
+
|
277 |
+
## 📈 **Performance Benchmarks**
|
278 |
+
|
279 |
+
| **Task** | **Shôbdhonic** | **Other Bangla NLP** | **Improvement** |
|------------------------------|-----------------|----------------------|-----------------|
| Text Classification | 94.7% | 88.2% | +6.5 pts |
| Named Entity Recognition | 92.3% | 85.9% | +6.4 pts |
| Sentiment Analysis | 89.8% | 81.3% | +8.5 pts |
| Question Answering | 87.6% | 79.1% | +8.5 pts |
| Text Generation (BLEU) | 0.731 | 0.658 | +11.1% (relative) |
| Speech Recognition (WER) | 6.4% | 11.7% | -5.3 pts (lower is better) |
| Text-to-Speech (MOS) | 4.52/5 | 3.87/5 | +16.8% (relative) |
|
288 |
+
|
289 |
+
*Benchmarks conducted using standard Bangla test sets and industry metrics. Full methodology available in our [technical paper](https://shobdhonic.com/research/benchmarks).*
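
The Improvement column combines two kinds of figures: the accuracy and WER rows report the difference in percentage points, while the BLEU and MOS rows report the change relative to the baseline. The short sketch below reproduces both calculations from the table's own numbers; the helper names are illustrative.

```python
def absolute_gain(ours: float, baseline: float) -> float:
    """Difference in the metric's own units (percentage points for accuracy)."""
    return round(ours - baseline, 1)


def relative_gain(ours: float, baseline: float) -> float:
    """Difference expressed as a percentage of the baseline value."""
    return round(100 * (ours - baseline) / baseline, 1)


print(absolute_gain(94.7, 88.2))    # Text Classification: 6.5
print(relative_gain(0.731, 0.658))  # BLEU: 11.1
print(absolute_gain(6.4, 11.7))     # WER: -5.3 (lower is better)
print(relative_gain(4.52, 3.87))    # MOS: 16.8
```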
|
290 |
+
|
291 |
+
---
|
292 |
+
|
293 |
## 📊 **Enterprise Solutions**
|
294 |
<div align="center">
|
295 |
<a href="https://shobdhonic.com/enterprise">
|
|
|
297 |
</a>
|
298 |
</div>
|
299 |
|
300 |
+
### **Banking & Finance**
|
301 |
+
- Fraud detection in Bangla SMS/call transcripts
|
302 |
+
- Customer support automation
|
303 |
+
- Financial document processing
|
304 |
+
- Transaction pattern analysis
|
305 |
+
- Risk assessment NLP
|
306 |
+
|
307 |
+
### **Media & Publishing**
|
308 |
+
- Auto-summarize news articles from Prothom Alo/Ittefaq
|
309 |
+
- Content recommendation engines
|
310 |
+
- Automated content tagging
|
311 |
+
- Engagement prediction
|
312 |
+
- Toxic comment filtering
|
313 |
+
|
314 |
+
### **Education**
|
315 |
+
- Essay grading and feedback
|
316 |
+
- Personalized learning content
|
317 |
+
- Question generation from textbooks
|
318 |
+
- Academic plagiarism detection
|
319 |
+
- Educational chatbots in Bangla
|
320 |
+
|
321 |
+
### **Government & NGOs**
|
322 |
+
- Citizen feedback analysis
|
323 |
+
- Service request categorization
|
324 |
+
- Policy document processing
|
325 |
+
- Public sentiment monitoring
|
326 |
+
- Disinformation detection
|
327 |
+
|
328 |
+
---
|
329 |
+
|
330 |
+
## 💻 **API Integration**
|
331 |
+
|
332 |
+
### **REST API Example**
|
333 |
+
```javascript
// Using fetch in JavaScript
const fetchMeme = async () => {
  const response = await fetch('https://api.shobdhonic.com/v1/create-meme', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      text: 'পরীক্ষার রেজাল্ট দেখার পর আমি', // "Me after seeing the exam results"
      template: 'sad_pepe',
      format: 'jpg'
    })
  });

  // Surface HTTP errors instead of trying to parse an error body as a meme
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.meme_url;
};

// Call the function
fetchMeme().then(url => {
  document.getElementById('meme-image').src = url;
});
```
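
For callers working outside the browser, the same request can be assembled with Python's standard library. This is a sketch under the assumptions of the JavaScript example: the endpoint URL, the JSON field names, and the bearer-token header are copied from it, and the helper `build_meme_request` is a name made up here. The snippet only builds the request and does not contact the server.

```python
import json
import urllib.request

# Endpoint taken from the JavaScript example above
API_URL = "https://api.shobdhonic.com/v1/create-meme"


def build_meme_request(api_key: str, text: str, template: str, fmt: str = "jpg") -> urllib.request.Request:
    """Assemble the POST request without sending it."""
    payload = json.dumps({"text": text, "template": template, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_meme_request("YOUR_API_KEY", "পরীক্ষার রেজাল্ট দেখার পর আমি", "sad_pepe")
print(request.get_method(), request.full_url)

# To send it: urllib.request.urlopen(request, timeout=10), then json.load() the response.
```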
|
358 |
+
|
359 |
+
### **Python SDK Example**
|
360 |
+
```python
from shobdhonic import ShobdhonicClient
import asyncio

async def main():
    # Initialize client
    client = ShobdhonicClient(api_key="YOUR_API_KEY")

    # Use the sentiment analysis API
    result = await client.analyze_sentiment(
        text="এই সিনেমাটা দেখে আমি খুবই মুগ্ধ হয়েছি।",  # "I was really impressed by this movie."
        detailed=True
    )

    print(f"Overall sentiment: {result.sentiment}")
    print(f"Confidence score: {result.confidence:.2f}")
    print(f"Emotional breakdown: {result.emotions}")

    # Use the translation API
    translation = await client.translate(
        text="আমি বাংলায় কথা বলতে পারি।",  # "I can speak Bangla."
        target_language="en"
    )

    print(f"Translation: {translation.text}")
    print(f"Source language detected: {translation.source_language}")

# Run the async function
asyncio.run(main())
```
|
390 |
+
|
391 |
+
### **Webhook Integration**
|
392 |
+
```python
from flask import Flask, request, jsonify
import hmac
import hashlib

app = Flask(__name__)

@app.route('/webhook/shobdhonic', methods=['POST'])
def shobdhonic_webhook():
    # Verify the webhook signature (a missing header becomes '' and fails the check)
    signature = request.headers.get('X-Shobdhonic-Signature', '')
    secret = 'your_webhook_secret'

    computed_signature = hmac.new(
        secret.encode('utf-8'),
        request.data,
        hashlib.sha256
    ).hexdigest()

    if not hmac.compare_digest(signature, computed_signature):
        return jsonify({'error': 'Invalid signature'}), 401

    # Process the webhook data (an empty or non-JSON body yields {})
    data = request.get_json(silent=True) or {}
    event_type = data.get('event_type')

    if event_type == 'sentiment_alert':
        handle_sentiment_alert(data)
    elif event_type == 'content_moderation':
        handle_content_moderation(data)
    elif event_type == 'trend_detected':
        handle_trend_detection(data)

    return jsonify({'status': 'success'}), 200

def handle_sentiment_alert(data):
    # Process sentiment alerts
    pass

def handle_content_moderation(data):
    # Process content moderation events
    pass

def handle_trend_detection(data):
    # Process trend detection events
    pass

if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True, port=5000)
```
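
To test the handler locally you need a request signed the same way it verifies: a hex-encoded HMAC-SHA256 of the raw body. The sketch below shows both sides of that check using only the standard library. The helper names are illustrative, and whether the real service signs payloads exactly like this is an assumption carried over from the handler above.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison; a missing or empty signature is rejected."""
    if not signature:
        return False
    return hmac.compare_digest(signature, sign_payload(secret, body))


secret = "your_webhook_secret"
body = json.dumps({"event_type": "trend_detected"}).encode("utf-8")
signature = sign_payload(secret, body)

print(is_valid_signature(secret, body, signature))         # True
print(is_valid_signature(secret, body + b" ", signature))  # False: body was altered
print(is_valid_signature(secret, body, ""))                # False: header missing
```

The signature covers the exact bytes sent, so it must be computed over the serialized body and not over a re-encoded copy of the parsed JSON.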
|
442 |
+
|
443 |
+
---
|
444 |
+
|
445 |
+
## 🧩 **Project Structure**
|
446 |
+
```
shobdhonic/
├── api/                # API endpoints
├── cli/                # Command-line tools
├── core/               # Core functionality
│   ├── models/         # ML models
│   ├── processors/     # Text processors
│   ├── tokenizers/     # Bangla tokenizers
│   └── vectors/        # Word embeddings
├── data/               # Data handling
│   ├── corpus/         # Text corpora
│   ├── loaders/        # Data loaders
│   └── scrapers/       # Web scrapers
├── media/              # Media generation
│   ├── audio/          # Audio processing
│   ├── images/         # Image generation
│   └── video/          # Video processing
├── security/           # Security tools
├── services/           # External services
├── ui/                 # User interfaces
│   ├── web/            # Web interface
│   ├── mobile/         # Mobile interface
│   └── widgets/        # Embeddable widgets
├── utils/              # Utility functions
└── tests/              # Test suite
```
|
472 |
+
|
473 |
+
---
|
474 |
+
|
475 |
+
## 🛠️ **Development Workflow**
|
476 |
+
|
477 |
+
### **Setting Up Development Environment**
|
478 |
+
```bash
# Clone the development repository
git clone https://github.com/Shobdhonic/shobdhonic-dev.git
cd shobdhonic-dev

# Create development environment
python -m venv dev-env
source dev-env/bin/activate

# Install development dependencies
pip install -r requirements-dev.txt

# Set up pre-commit hooks
pre-commit install
```
|
493 |
+
|
494 |
+
### **Running Tests**
|
495 |
+
```bash
# Run all tests
pytest

# Run specific test category
pytest tests/test_tokenizers.py

# Run with coverage report
pytest --cov=shobdhonic --cov-report=html
```
|
505 |
+
|
506 |
+
### **Building Documentation**
|
507 |
+
```bash
# Generate API documentation
cd docs
make html

# View documentation
python -m http.server -d _build/html
```
|
515 |
+
|
516 |
+
### **CI/CD Pipeline**
|
517 |
+
Our continuous integration and deployment pipeline automatically:
|
518 |
+
1. Runs tests on all pull requests
|
519 |
+
2. Performs code quality checks
|
520 |
+
3. Builds and publishes packages on releases
|
521 |
+
4. Deploys to staging/production environments
|
522 |
+
5. Updates documentation site
|
523 |
|
524 |
---
|
525 |
|
526 |
## 🤝 **Contribute to Bangla AI**
|
527 |
+
We welcome contributions from the community! Here's how to get started:
|
528 |
+
|
529 |
+
1. **Fork the Repository**: [GitHub/Shobdhonic](https://github.com/Shobdhonic)
|
530 |
+
2. **Pick an Issue**: Look for issues labeled `good-first-issue`, `help-wanted`, or `Gen-Z feature`
|
531 |
+
3. **Set Up Your Environment**: Follow the development setup instructions above
|
532 |
+
4. **Make Your Changes**: Write code and tests for your feature or fix
|
533 |
+
5. **Submit a Pull Request**: Follow our [Contribution Guidelines](CONTRIBUTING.md)
|
534 |
+
|
535 |
+
### **Areas We Need Help With**
|
536 |
+
- 🧠 **Model Training**: Fine-tuning transformers on Bangla data
|
537 |
+
- 🎮 **Gen-Z Features**: Cultural memes, slang translators, social integrations
|
538 |
+
- 📱 **Mobile Development**: React Native components for our SDK
|
539 |
+
- 🔊 **Voice Data**: Collection and processing of regional dialects
|
540 |
+
- 📚 **Documentation**: Tutorials, examples, and API documentation
|
541 |
+
|
542 |
+
### **Contributor Code of Conduct**
|
543 |
+
All contributors are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md), which promotes a welcoming, inclusive, and harassment-free experience for everyone.
|
544 |
+
|
545 |
+
---
|
546 |
+
|
547 |
+
## 📒 **Documentation**
|
548 |
+
|
549 |
+
### **API Reference**
|
550 |
+
Complete API documentation is available at [docs.shobdhonic.com](https://docs.shobdhonic.com)
|
551 |
+
|
552 |
+
### **Tutorials**
|
553 |
+
Step-by-step tutorials for common tasks:
|
554 |
+
- [Getting Started with Shôbdhonic](https://docs.shobdhonic.com/tutorials/getting-started)
|
555 |
+
- [Building a Bangla Chatbot](https://docs.shobdhonic.com/tutorials/chatbot)
|
556 |
+
- [Voice Cloning Basics](https://docs.shobdhonic.com/tutorials/voice-cloning)
|
557 |
+
- [Meme Generation](https://docs.shobdhonic.com/tutorials/meme-gen)
|
558 |
+
- [Enterprise Document Processing](https://docs.shobdhonic.com/tutorials/document-processing)
|
559 |
+
|
560 |
+
### **Examples**
|
561 |
+
Explore our [examples directory](https://github.com/Shobdhonic/examples) for complete code samples:
|
562 |
+
- Basic NLP tasks (tokenization, classification, etc.)
|
563 |
+
- Voice synthesis and analysis
|
564 |
+
- Media generation workflows
|
565 |
+
- Enterprise integration patterns
|
566 |
+
- Web and mobile application samples
|
567 |
|
568 |
---
|
569 |
|
570 |
## 📜 **License & Ethics**
|
571 |
```text
MIT License | © 2024 Shôbdhonic

*Bangla Data Ethics Pledge:*
- No misuse of dialects/regional languages
- Cite sources like Ittefaq/Prothom Alo
- Free access for academic research and non-profits/NGOs
- Respecting privacy and data sovereignty
- Preserving Bangla linguistic diversity
```
|
581 |
|
582 |
+
### **Ethical AI Commitment**
|
583 |
+
At Shôbdhonic, we commit to:
|
584 |
+
- Transparency in our AI systems
|
585 |
+
- Fairness and bias mitigation
|
586 |
+
- Protection of user privacy
|
587 |
+
- Responsible data collection practices
|
588 |
+
- Supporting cultural preservation
|
589 |
+
- Making advanced Bangla NLP accessible to all
|
590 |
+
|
591 |
+
Our complete AI Ethics Policy is available [here](https://shobdhonic.com/ethics).
|
592 |
+
|
593 |
+
---
|
594 |
+
|
595 |
+
## 🧪 **Research**
|
596 |
+
Our team publishes open research on Bangla NLP:
|
597 |
+
|
598 |
+
- [BanglaTransformers: Pre-training Transformers for Bengali NLP](https://arxiv.org/abs/xxxx.xxxxx)
|
599 |
+
- [Dialect-Aware Speech Synthesis for Low-Resource Languages](https://arxiv.org/abs/xxxx.xxxxx)
|
600 |
+
- [BanglaEval: Benchmarking NLP Systems for Bengali](https://arxiv.org/abs/xxxx.xxxxx)
|
601 |
+
|
602 |
+
Interested in research collaboration? Contact us at [email protected]
|
603 |
+
|
604 |
---
|
605 |
|
606 |
## 🌐 **Connect**
|
|
|
609 |
[Hugging Face](https://huggingface.co/Shobdhonic)
[YouTube](https://youtube.com/Shobdhonic)
[LinkedIn](https://linkedin.com/company/Shobdhonic)
[Medium](https://medium.com/Shobdhonic)
[Discord](https://discord.gg/shobdhonic)
|
614 |
|
615 |
</div>
|
616 |
|