Upload 14 files
- .streamlit/config.toml +13 -0
- DEPLOYMENT_GUIDE.md +139 -0
- README.md +188 -13
- app.py +683 -0
- data/enterprise_ontology.json +771 -0
- data/enterprise_ontology.txt +10 -0
- huggingface.yml +8 -0
- requirements.txt +14 -0
- src/__init__.py +1 -0
- src/knowledge_graph.py +920 -0
- src/ontology_manager.py +440 -0
- src/semantic_retriever.py +233 -0
- src/visualization.py +1564 -0
- static/css/styles.css +83 -0
.streamlit/config.toml
ADDED
[server]
headless = true
enableCORS = false

[browser]
gatherUsageStats = false

[theme]
primaryColor = "#4B6BFF"
backgroundColor = "#FAFAFA"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"
font = "sans serif"
DEPLOYMENT_GUIDE.md
ADDED
# Deployment Guide for Ontology-Enhanced RAG System

This guide will help you deploy the Ontology-Enhanced RAG demonstration to Hugging Face Spaces.

## Prerequisites

1. **Hugging Face Account**: You need a Hugging Face account.
2. **OpenAI API Key**: You need a valid OpenAI API key.

## Deployment Steps

### 1. Prepare Your Repository

Ensure your repository contains the following files and directories:

- `app.py`: Main Streamlit application
- `src/`: Directory containing all source code
- `data/`: Directory containing the ontology JSON and other data
- `.streamlit/`: Directory containing Streamlit configuration
- `static/`: Directory containing CSS and other static assets
- `requirements.txt`: List of all dependencies
- `huggingface.yml`: Hugging Face Space configuration

### 2. Set Up Hugging Face Space

1. Visit [Hugging Face](https://huggingface.co/) and log in
2. Click "New" → "Space" in the top right corner
3. Fill in the Space settings:
   - **Owner**: Select your username or organization
   - **Space name**: Choose a name for your demo, e.g., "ontology-rag-demo"
   - **License**: Choose MIT or your preferred license
   - **SDK**: Select Streamlit
   - **Space hardware**: Choose according to your needs (minimum requirement: CPU + 4GB RAM)
4. Click "Create Space"

### 3. Configure Space Secrets

You need to add your OpenAI API key as a secret:

1. In your Space page, go to the "Settings" tab
2. Scroll down to the "Repository secrets" section
3. Click "New secret"
4. Add the following secret:
   - **Name**: `OPENAI_API_KEY`
   - **Value**: Your OpenAI API key
5. Click "Add secret"

### 4. Upload Your Code

There are two ways to upload your code:

#### Option A: Upload via Web Interface

1. In your Space page, go to the "Files" tab
2. Use the upload button to upload all necessary files and directories
3. Ensure you maintain the correct directory structure

#### Option B: Upload via Git (Recommended)

1. Clone your Space repository:
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   ```
2. Copy all your files into the cloned repository
3. Add, commit, and push the changes:
   ```bash
   git add .
   git commit -m "Initial commit"
   git push
   ```

### 5. Verify Deployment

1. Visit your Space URL (in the format `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`)
2. Confirm that the application loads and runs correctly
3. Test all features

## Hardware Recommendations

For optimal performance, consider the following hardware configurations:

- **Minimal**: CPU + 4GB RAM (suitable for demos with limited users)
- **Recommended**: CPU + 16GB RAM (for better performance with knowledge graph visualizations)

## Troubleshooting

If you encounter issues:

1. **Application fails to start**:
   - Check that the Streamlit version is compatible
   - Verify all dependencies are correctly installed
   - Check the Space logs for error messages
2. **OpenAI API errors**:
   - Confirm the API key is correctly set as a secret
   - Verify the API key is valid and has sufficient quota
3. **Display issues**:
   - Try simplifying visualizations, as they might be memory-intensive
   - Check logs for any warnings or errors
4. **NetworkX or visualization issues**:
   - Ensure pygraphviz is properly installed
   - For simpler deployment, you can modify the code to use alternative layout algorithms that don't depend on Graphviz

## Deployment Optimizations

For production deployments, consider these optimizations:

1. **Resource Management**:
   - Choose appropriate hardware (CPU + RAM) to meet your application's needs
   - Consider optimizing large visualizations to reduce memory usage
2. **Performance**:
   - Implement result caching for common queries
   - Consider pre-computing common graph layouts
3. **Security**:
   - Ensure no sensitive data is stored in the codebase
   - Store all credentials using environment variables or secrets

## Memory Optimization Tips

If you encounter memory issues with large ontologies:

1. Limit the maximum number of nodes in visualizations
2. Implement pagination for large result sets
3. Use streaming responses for large text outputs
4. Optimize NetworkX operations for large graphs

## Additional Resources

- [Streamlit Deployment Documentation](https://docs.streamlit.io/streamlit-community-cloud/get-started)
- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
- [NetworkX Documentation](https://networkx.org/documentation/stable/)
- [FAISS Documentation](https://github.com/facebookresearch/faiss/wiki)
README.md
CHANGED
# Enhanced Ontology-RAG System

## Project Overview

This repository contains an advanced Retrieval-Augmented Generation (RAG) system that integrates structured ontologies with language models. The system demonstrates how formal ontological knowledge representation can enhance traditional vector-based retrieval methods to provide more accurate, contextually rich, and logically consistent answers to user queries.

The project implements a sophisticated architecture that combines:

- JSON-based ontology representation with classes, relationships, rules, and instances
- Knowledge graph visualization for exploring entity relationships
- Semantic path finding for multi-hop reasoning between concepts
- Comparative analysis between traditional vector-based RAG and ontology-enhanced RAG

The application is built with **Streamlit** for the frontend interface, uses **FAISS** for vector embeddings, **NetworkX** for graph representation, and integrates with **OpenAI's language models** for generating responses.

## Key Features

1. **RAG Comparison Demo**
   - Side-by-side comparison of traditional and ontology-enhanced RAG
   - Analysis of differences in answers and retrieved context
2. **Knowledge Graph Visualization**
   - Interactive network graph for exploring the ontology structure
   - Multiple layout algorithms (force-directed, hierarchical, radial, circular)
   - Entity relationship exploration with customizable focus
3. **Ontology Structure Analysis**
   - Visualization of class hierarchies and statistics
   - Relationship usage and domain-range distribution analysis
   - Graph statistics including node counts, edge counts, and centrality metrics
4. **Entity Exploration**
   - Detailed entity information cards showing properties and relationships
   - Relationship graphs centered on specific entities
   - Neighborhood exploration for entities
5. **Semantic Path Visualization**
   - Path visualization between entities with step-by-step explanation
   - Visual representation of paths through the knowledge graph
   - Connection to relevant business rules
6. **Reasoning Trace Visualization**
   - Query analysis with entity and relationship detection
   - Sankey diagrams showing information flow in the RAG process
   - Explanation of reasoning steps

## Ontology Structure Example

The `data/enterprise_ontology.json` file contains a rich enterprise ontology that models organizational knowledge. Here's a breakdown of its key components:

### Classes (Entity Types)

The ontology defines a hierarchical class structure with inheritance relationships. For example:

- **Entity** (base class)
  - **FinancialEntity** → Budget, Revenue, Expense
  - **Asset** → PhysicalAsset, DigitalAsset, IntellectualProperty
  - **Person** → InternalPerson → Employee → Manager
  - **Process** → BusinessProcess, DevelopmentProcess, SupportProcess
  - **Market** → GeographicMarket, DemographicMarket, BusinessMarket

Each class has a description and a set of defined properties. For instance, the `Employee` class includes properties like role, hire date, and performance rating.

### Relationships

The ontology defines explicit relationships between entity types, including:

- `ownedBy`: Connects Product to Department
- `managedBy`: Connects Department to Manager
- `worksOn`: Connects Employee to Product
- `purchases`: Connects Customer to Product
- `provides`: Connects Customer to Feedback
- `optimizedBy`: Relates Product to Feedback

Each relationship has metadata such as domain, range, cardinality, and inverse relationship name.

### Business Rules

The ontology contains formal business rules that constrain the knowledge model:

- "Every Product must be owned by exactly one Department"
- "Every Department must be managed by exactly one Manager"
- "Critical support tickets must be assigned to Senior employees or managers"
- "Product Lifecycle stages must follow a predefined sequence"

### Instances

The ontology includes concrete instances of the defined classes, such as:

- `product1`: An "Enterprise Analytics Suite" owned by the Engineering department
- `manager1`: A director named "Jane Smith" who manages the Engineering department
- `customer1`: "Acme Corp" who has purchased product1 and provided feedback

Each instance has properties and relationships to other instances, forming a connected knowledge graph.

This structured knowledge representation allows the system to perform semantic reasoning beyond what would be possible with simple text-based approaches, enabling it to answer complex queries that require understanding of hierarchical relationships, business rules, and multi-step connections between entities.

## Getting Started

### Prerequisites

- Python 3.8+
- OpenAI API key

### Installation

1. Clone this repository
2. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Set up your OpenAI API key as an environment variable or in the Streamlit secrets

### Running the Application

To run the application locally:

```
streamlit run app.py
```

For deployment instructions, please refer to the [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md).

## Project Structure

```
ontology-rag/
├── .streamlit/
│   └── config.toml              # Streamlit configuration
├── data/
│   ├── enterprise_ontology.json # Enterprise ontology data
│   └── enterprise_ontology.txt  # Simplified text representation of the ontology
├── src/
│   ├── __init__.py
│   ├── knowledge_graph.py       # Knowledge graph processing
│   ├── ontology_manager.py      # Ontology management
│   ├── semantic_retriever.py    # Semantic retrieval
│   └── visualization.py         # Visualization functions
├── static/
│   └── css/
│       └── styles.css           # Custom styles
├── app.py                       # Main application
├── requirements.txt             # Dependency list
├── huggingface.yml              # Hugging Face Space configuration
├── DEPLOYMENT_GUIDE.md          # Deployment instructions
└── README.md                    # This file
```

## Use Cases

### Enterprise Knowledge Management
The ontology-enhanced RAG system can help organizations effectively organize and access their knowledge assets, connecting information across different departments and systems to provide more comprehensive business insights.

### Product Development Decision Support
By understanding the relationships between customer feedback, product features, and market data, the system can provide more valuable support for product development decisions.

### Complex Compliance Queries
In compliance scenarios where multiple rules and relationships must be considered, the ontology-enhanced RAG can provide rule-based reasoning to ensure recommendations comply with all applicable policies and regulations.

### Diagnostics and Troubleshooting
In technical support and troubleshooting scenarios, the system can connect symptoms, causes, and solutions through multi-hop reasoning to provide more accurate diagnoses.

## Acknowledgments

This project demonstrates the integration of ontological knowledge with RAG systems for enhanced query answering capabilities. It builds upon research in knowledge graphs, semantic web technologies, and large language models.

## License

This project is licensed under the MIT License; see the license file for details.
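The multi-hop semantic path finding described in the README can be illustrated with a tiny, hand-written stand-in for `data/enterprise_ontology.json` and a breadth-first search. The field names in this fragment are illustrative assumptions, not the file's actual schema:

```python
import json
from collections import deque

# A toy ontology fragment in the spirit of data/enterprise_ontology.json
# (illustrative field names; the real schema may differ).
ontology = json.loads("""
{
  "relationships": [
    {"source": "customer1", "type": "purchases", "target": "product1"},
    {"source": "customer1", "type": "provides", "target": "feedback1"},
    {"source": "product1", "type": "ownedBy", "target": "engineering"},
    {"source": "engineering", "type": "managedBy", "target": "manager1"}
  ]
}
""")

# Build an adjacency list from the relationship triples.
adj = {}
for rel in ontology["relationships"]:
    adj.setdefault(rel["source"], []).append((rel["type"], rel["target"]))

def find_path(start, goal):
    """Breadth-first search for a shortest semantic path between entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel_type, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel_type, nxt)]))
    return None

print(find_path("customer1", "manager1"))
# [('customer1', 'purchases', 'product1'), ('product1', 'ownedBy', 'engineering'),
#  ('engineering', 'managedBy', 'manager1')]
```

A path like this is what lets the system answer "who is responsible for the product Acme Corp bought?" by chaining `purchases` → `ownedBy` → `managedBy` rather than relying on textual similarity alone.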
app.py
ADDED
@@ -0,0 +1,683 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import streamlit as st
|
2 |
+
st.set_page_config(page_title="Ontology RAG Demo", layout="wide")
|
3 |
+
|
4 |
+
import os
|
5 |
+
from src.semantic_retriever import SemanticRetriever
|
6 |
+
from src.ontology_manager import OntologyManager
|
7 |
+
from src.knowledge_graph import KnowledgeGraph
|
8 |
+
from src.visualization import (display_ontology_stats, display_entity_details,
|
9 |
+
display_graph_visualization, visualize_path,
|
10 |
+
display_reasoning_trace, render_html_in_streamlit)
|
11 |
+
import networkx as nx
|
12 |
+
from openai import OpenAI
|
13 |
+
import json
|
14 |
+
|
15 |
+
# Setup
|
16 |
+
llm = OpenAI(api_key=st.secrets["OPENAI_API_KEY"])
|
17 |
+
ontology_manager = OntologyManager("data/enterprise_ontology.json")
|
18 |
+
semantic_retriever = SemanticRetriever(ontology_manager=ontology_manager)
|
19 |
+
knowledge_graph = KnowledgeGraph(ontology_manager=ontology_manager)
|
20 |
+
k_val = st.sidebar.slider("Top K Results", 1, 10, 3)
|
21 |
+
|
22 |
+
def main():
|
23 |
+
# Page Navigation
|
24 |
+
st.sidebar.title("Page Navigation")
|
25 |
+
page = st.sidebar.selectbox(
|
26 |
+
"Select function",
|
27 |
+
["RAG comparison demonstration", "Knowledge graph visualization", "Ontology structure analysis", "Entity exploration", "Semantic path visualization", "Inference tracking", "Detailed comparative analysis"]
|
28 |
+
)
|
29 |
+
|
30 |
+
if page == "RAG Comparison Demo":
|
31 |
+
run_rag_demo()
|
32 |
+
elif page == "Knowledge Graph Visualization":
|
33 |
+
run_knowledge_graph_visualization()
|
34 |
+
elif page == "Ontology Structure Analysis":
|
35 |
+
run_ontology_structure_analysis()
|
36 |
+
elif page == "Entity Exploration":
|
37 |
+
run_entity_exploration()
|
38 |
+
elif page == "Semantic Path Visualization":
|
39 |
+
run_semantic_path_visualization()
|
40 |
+
elif page == "Inference Tracking":
|
41 |
+
run_reasoning_trace()
|
42 |
+
elif page == "Detailed comparative analysis":
|
43 |
+
run_detailed_comparison()
|
44 |
+
|
45 |
+
def run_rag_demo():
|
46 |
+
st.title("Ontology Enhanced RAG Demonstration")
|
47 |
+
|
48 |
+
query = st.text_input(
|
49 |
+
"Enter a question to compare RAG methods:",
|
50 |
+
"How does customer feedback influence product development?"
|
51 |
+
)
|
52 |
+
|
53 |
+
if query:
|
54 |
+
col1, col2 = st.columns(2)
|
55 |
+
|
56 |
+
with st.spinner("Run two RAG methods..."):
|
57 |
+
# Traditional RAG
|
58 |
+
with col1:
|
59 |
+
st.subheader("Traditional RAG")
|
60 |
+
vector_docs = semantic_retriever.vector_store.similarity_search(query, k=k_val)
|
61 |
+
vector_context = "\n\n".join([doc.page_content for doc in vector_docs])
|
62 |
+
vector_messages = [
|
63 |
+
{"role": "system", "content": f"You are an enterprise knowledge assistant...\nContext:\n{vector_context}"},
|
64 |
+
{"role": "user", "content": query}
|
65 |
+
]
|
66 |
+
vector_response = llm.chat.completions.create(
|
67 |
+
model="gpt-3.5-turbo",
|
68 |
+
messages=vector_messages
|
69 |
+
)
|
70 |
+
vector_answer = vector_response.choices[0].message.content
|
71 |
+
|
72 |
+
st.markdown("#### answer")
|
73 |
+
st.write(vector_answer)
|
74 |
+
|
75 |
+
st.markdown("#### retrieval context")
|
76 |
+
for i, doc in enumerate(vector_docs):
|
77 |
+
with st.expander(f"Source {i+1}"):
|
78 |
+
st.code(doc.page_content)
|
79 |
+
|
80 |
+
# # Ontology RAG
|
81 |
+
with col2:
|
82 |
+
st.subheader("Ontology RAG")
|
83 |
+
result = semantic_retriever.retrieve_with_paths(query, k=k_val)
|
84 |
+
retrieved_docs = result["documents"]
|
85 |
+
enhanced_context = "\n\n".join([doc.page_content for doc in retrieved_docs])
|
86 |
+
enhanced_messages = [
|
87 |
+
{"role": "system", "content": f"You are an enterprise knowledge assistant with ontology access rights...\nContext:\n{enhanced_context}"},
|
88 |
+
{"role": "user", "content": query}
|
89 |
+
]
|
90 |
+
enhanced_response = llm.chat.completions.create(
|
91 |
+
model="gpt-3.5-turbo",
|
92 |
+
messages=enhanced_messages
|
93 |
+
)
|
94 |
+
enhanced_answer = enhanced_response.choices[0].message.content
|
95 |
+
|
96 |
+
st.markdown("#### answer")
|
97 |
+
st.write(enhanced_answer)
|
98 |
+
|
99 |
+
st.markdown("#### Search context")
|
100 |
+
for i, doc in enumerate(retrieved_docs):
|
101 |
+
source = doc.metadata.get("source", "unknown")
|
102 |
+
label = {
|
103 |
+
"ontology": "Ontology context",
|
104 |
+
"text": "Text context",
|
105 |
+
"ontology_context": "Semantic context",
|
106 |
+
"semantic_path": "Relationship path"
|
107 |
+
}.get(source, f"source")
|
108 |
+
with st.expander(f"{label} {i+1}"):
|
109 |
+
st.markdown(doc.page_content)
|
110 |
+
|
111 |
+
# Store for reasoning trace visualization
|
112 |
+
st.session_state.query = query
|
113 |
+
st.session_state.retrieved_docs = retrieved_docs
|
114 |
+
st.session_state.answer = enhanced_answer
|
115 |
+
|
116 |
+
# Difference Analysis
|
117 |
+
st.markdown("---")
|
118 |
+
st.subheader("Difference Analysis")
|
119 |
+
|
120 |
+
st.markdown("""
|
121 |
+
The above comparison demonstrates several key advantages of ontology-enhanced RAG:
|
122 |
+
|
123 |
+
1. **Structure-aware**: Ontology-augmented methods understand the relationships between entities, not just their textual similarities.
|
124 |
+
|
125 |
+
2. **Multi-hop reasoning**: By using the knowledge graph structure, the enhancement method can connect information across multiple relational jumps.
|
126 |
+
|
127 |
+
3. **Context enrichment**: Ontologies provide additional context about entity types, attributes, and relationships that are not explicit in the text.
|
128 |
+
|
129 |
+
4. Reasoning ability: Structured knowledge allows for logical reasoning that vector similarity alone cannot achieve.
|
130 |
+
|
131 |
+
Try more complex queries that require understanding of relationships to see the differences more clearly!
|
132 |
+
""")
|
133 |
+
|
134 |
+
def run_knowledge_graph_visualization():
|
135 |
+
st.title("Knowledge Graph Visualization")
|
136 |
+
|
137 |
+
# Check if there is a center entity selected
|
138 |
+
central_entity = st.session_state.get('central_entity', None)
|
139 |
+
|
140 |
+
# Check if there is a center entity selected
|
141 |
+
display_graph_visualization(knowledge_graph, central_entity=central_entity, max_distance=2)
|
142 |
+
|
143 |
+
# Get and display graphical statistics
|
144 |
+
graph_stats = knowledge_graph.get_graph_statistics()
|
145 |
+
if graph_stats:
|
146 |
+
st.subheader("Graphical Statistics")
|
147 |
+
|
148 |
+
col1, col2, col3, col4 = st.columns(4)
|
149 |
+
col1.metric("Total number of nodes", graph_stats.get("node_count", 0))
|
150 |
+
col2.metric("Total number of edges", graph_stats.get("edge_count", 0))
|
151 |
+
col3.metric("total number of classes", graph_stats.get("class_count", 0))
|
152 |
+
col4.metric("Total number of instances", graph_stats.get("instance_count", 0))
|
153 |
+
|
154 |
+
# Display the central node
|
155 |
+
if "central_nodes" in graph_stats and graph_stats["central_nodes"]:
|
156 |
+
st.subheader("Central Nodes (by Betweenness Centrality)")
|
157 |
+
central_nodes = graph_stats["central_nodes"]["betweenness"]
|
158 |
+
nodes_df = []
|
159 |
+
for node_info in central_nodes:
|
160 |
+
node_id = node_info["node"]
|
161 |
+
node_data = knowledge_graph.graph.nodes.get(node_id, {})
|
162 |
+
node_type = node_data.get("type", "unknown")
|
163 |
+
if node_type == "instance":
|
164 |
+
node_class = node_data.get("class_type", "unknown")
|
165 |
+
properties = node_data.get("properties", {})
|
166 |
+
name = properties.get("name", node_id)
|
167 |
+
nodes_df.append({
|
168 |
+
"ID": node_id,
|
169 |
+
"Name": name,
|
170 |
+
"type": node_class,
|
171 |
+
"Centrality": node_info["centrality"]
|
172 |
+
})
|
173 |
+
|
174 |
+
st.table(nodes_df)
|
175 |
+
|
176 |
+
def run_ontology_structure_analysis():
|
177 |
+
st.title("Ontology Structure Analysis")
|
178 |
+
|
179 |
+
# Use the existing ontology statistics display function
|
180 |
+
display_ontology_stats(ontology_manager)
|
181 |
+
|
182 |
+
# Add additional class hierarchy visualization
|
183 |
+
st.subheader("class hierarchy")
|
184 |
+
|
185 |
+
# Get class hierarchy data
|
186 |
+
class_hierarchy = ontology_manager.get_class_hierarchy()
|
187 |
+
|
188 |
+
# Create a NetworkX graph to represent the class hierarchy
|
189 |
+
G = nx.DiGraph()
|
190 |
+
|
191 |
+
# Add nodes and edges
|
192 |
+
for parent, children in class_hierarchy.items():
|
193 |
+
if not G.has_node(parent):
|
194 |
+
G.add_node(parent)
|
195 |
+
for child in children:
|
196 |
+
G.add_node(child)
|
197 |
+
G.add_edge(parent, child)
|
198 |
+
|
199 |
+
# Check if there are enough nodes to create the visualization
|
200 |
+
if len(G.nodes) > 1:
|
201 |
+
# Generate HTML visualization using knowledge graph class
|
202 |
+
kg = KnowledgeGraph(ontology_manager)
|
203 |
+
html = kg.generate_html_visualization(
|
204 |
+
include_classes=True,
|
205 |
+
include_instances=False,
|
206 |
+
max_distance=5,
|
207 |
+
layout_algorithm="hierarchical"
|
208 |
+
)
|
209 |
+
|
210 |
+
# Rendering HTML
|
211 |
+
render_html_in_streamlit(html)
|
212 |
+
|
213 |
+
def run_entity_exploration():
|
214 |
+
st.title("Entity Exploration")
|
215 |
+
|
216 |
+
# Get all entities
|
217 |
+
entities = []
|
218 |
+
for class_name in ontology_manager.get_classes():
|
219 |
+
entities.extend(ontology_manager.get_instances_of_class(class_name))
|
220 |
+
|
221 |
+
# Remove duplicates and sort
|
222 |
+
entities = sorted(set(entities))
|
223 |
+
|
224 |
+
# Create a drop-down selection box
|
225 |
+
selected_entity = st.selectbox("Select entity", entities)
|
226 |
+
|
227 |
+
if selected_entity:
|
228 |
+
# Get entity information
|
229 |
+
entity_info = ontology_manager.get_entity_info(selected_entity)
|
230 |
+
|
231 |
+
# Display detailed information
|
232 |
+
display_entity_details(entity_info, ontology_manager)
|
233 |
+
|
234 |
+
# Set this entity as the central entity (for knowledge graph visualization)
|
235 |
+
if st.button("View this entity in the knowledge graph"):
|
236 |
+
st.session_state.central_entity = selected_entity
|
237 |
+
st.rerun()
|
238 |
+
|
239 |
+
# Get and display entity neighbors
|
240 |
+
st.subheader("Entity Neighborhood")
|
241 |
+
max_distance = st.slider("Maximum neighborhood distance", 1, 3, 1)
|
242 |
+
|
243 |
+
neighborhood = knowledge_graph.get_entity_neighborhood(
|
244 |
+
selected_entity,
|
245 |
+
max_distance=max_distance,
|
246 |
+
include_classes=True
|
247 |
+
)
|
248 |
+
|
249 |
+
if neighborhood and "neighbors" in neighborhood:
|
250 |
+
# Display neighbors grouped by distance
|
251 |
+
for distance in range(1, max_distance+1):
|
252 |
+
neighbors_at_distance = [n for n in neighborhood["neighbors"] if n["distance"] == distance]
|
253 |
+
|
254 |
+
if neighbors_at_distance:
|
255 |
+
with st.expander(f"Neighbors at distance {distance} ({len(neighbors_at_distance)})"):
|
256 |
+
for neighbor in neighbors_at_distance:
|
257 |
+
st.markdown(f"**{neighbor['id']}** ({neighbor.get('class_type', 'unknown')})")
|
258 |
+
|
259 |
+
# Display relations
|
260 |
+
for relation in neighbor.get("relations", []):
|
261 |
+
direction = "→" if relation["direction"] == "outgoing" else "←"
|
262 |
+
st.markdown(f"- {direction} {relation['type']}")
|
263 |
+
|
264 |
+
st.markdown("---")
|
265 |
+
|
266 |
+
def run_semantic_path_visualization():
    st.title("Semantic Path Visualization")

    # Get all entities
    entities = []
    for class_name in ontology_manager.get_classes():
        entities.extend(ontology_manager.get_instances_of_class(class_name))

    # Remove duplicates and sort
    entities = sorted(set(entities))

    # Create two columns for selecting source and target entities
    col1, col2 = st.columns(2)

    with col1:
        source_entity = st.selectbox("Select source entity", entities, key="source")

    with col2:
        target_entity = st.selectbox("Select target entity", entities, key="target")

    if source_entity and target_entity and source_entity != target_entity:
        # Provide a maximum path length option
        max_length = st.slider("Maximum path length", 1, 5, 3)

        # Find paths between the two entities
        paths = knowledge_graph.find_paths_between_entities(
            source_entity,
            target_entity,
            max_length=max_length
        )

        if paths:
            st.success(f"Found {len(paths)} paths!")

            # Create an expander for each path
            for i, path in enumerate(paths):
                # Calculate path length and relationship types
                path_length = len(path)
                rel_types = [edge["type"] for edge in path]

                with st.expander(f"Path {i+1} (length: {path_length}, relations: {', '.join(rel_types)})", expanded=(i == 0)):
                    # Build a text description of the path
                    path_text = []
                    entities_in_path = []

                    for edge in path:
                        source = edge["source"]
                        target = edge["target"]
                        relation = edge["type"]

                        entities_in_path.append(source)
                        entities_in_path.append(target)

                        # Look up entity information to get human-readable names
                        source_info = ontology_manager.get_entity_info(source)
                        target_info = ontology_manager.get_entity_info(target)

                        source_name = source
                        if "properties" in source_info and "name" in source_info["properties"]:
                            source_name = source_info["properties"]["name"]

                        target_name = target
                        if "properties" in target_info and "name" in target_info["properties"]:
                            target_name = target_info["properties"]["name"]

                        path_text.append(f"{source_name} ({source}) **{relation}** {target_name} ({target})")

                    # Display the path description
                    st.markdown(" → ".join(path_text))

                    # Prepare the path visualization
                    path_info = {
                        "source": source_entity,
                        "target": target_entity,
                        "path": path,
                        "text": " → ".join(path_text)
                    }

                    # Display the path visualization
                    visualize_path(path_info, ontology_manager)
        else:
            st.warning(f"No path of length {max_length} or shorter was found between these entities.")

def run_reasoning_trace():
    st.title("Inference Trace Visualization")

    if not st.session_state.get("query") or not st.session_state.get("retrieved_docs") or not st.session_state.get("answer"):
        st.warning("Please run a query on the RAG comparison page first to generate inference trace data.")
        return

    # Get data from session state
    query = st.session_state.query
    retrieved_docs = st.session_state.retrieved_docs
    answer = st.session_state.answer

    # Show the inference trace
    display_reasoning_trace(query, retrieved_docs, answer, ontology_manager)

def run_detailed_comparison():
    st.title("Detailed Comparison of RAG Methods")

    # Add comparison query options
    comparison_queries = [
        "How does customer feedback influence product development?",
        "Which employees work in the Engineering department?",
        "What are the product life cycle stages?",
        "How do managers monitor employee performance?",
        "What are the responsibilities of the marketing department?"
    ]

    selected_query = st.selectbox(
        "Select a comparison query",
        comparison_queries,
        index=0
    )

    custom_query = st.text_input("Or enter a custom query:", "")

    if custom_query:
        query = custom_query
    else:
        query = selected_query

    if st.button("Compare RAG methods"):
        with st.spinner("Running detailed comparison..."):
            # Start timing
            import time
            start_time = time.time()

            # Run traditional RAG
            vector_docs = semantic_retriever.vector_store.similarity_search(query, k=k_val)
            vector_context = "\n\n".join([doc.page_content for doc in vector_docs])
            vector_messages = [
                {"role": "system", "content": f"You are an enterprise knowledge assistant...\nContext:\n{vector_context}"},
                {"role": "user", "content": query}
            ]
            vector_response = llm.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=vector_messages
            )
            vector_answer = vector_response.choices[0].message.content
            vector_time = time.time() - start_time

            # Reset the timer
            start_time = time.time()

            # Run the ontology-enhanced RAG
            result = semantic_retriever.retrieve_with_paths(query, k=k_val)
            retrieved_docs = result["documents"]
            enhanced_context = "\n\n".join([doc.page_content for doc in retrieved_docs])
            enhanced_messages = [
                {"role": "system", "content": f"You are an enterprise knowledge assistant with ontology access...\nContext:\n{enhanced_context}"},
                {"role": "user", "content": query}
            ]
            enhanced_response = llm.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=enhanced_messages
            )
            enhanced_answer = enhanced_response.choices[0].message.content
            enhanced_time = time.time() - start_time

            # Save the results for visualization
            st.session_state.query = query
            st.session_state.retrieved_docs = retrieved_docs
            st.session_state.answer = enhanced_answer

            # Display the comparison results
            st.subheader("Comparison Results")

            # Use tabs to show different aspects of the comparison
            tab1, tab2, tab3, tab4 = st.tabs(["Answer Comparison", "Performance Metrics", "Retrieval Source Comparison", "Context Quality"])

            with tab1:
                col1, col2 = st.columns(2)

                with col1:
                    st.markdown("#### Traditional RAG Answer")
                    st.write(vector_answer)

                with col2:
                    st.markdown("#### Ontology-Enhanced RAG Answer")
                    st.write(enhanced_answer)

            with tab2:
                # Performance metrics
                col1, col2 = st.columns(2)

                with col1:
                    st.metric("Traditional RAG response time", f"{vector_time:.2f}s")

                    # Calculate text-related metrics
                    vector_tokens = len(vector_context.split())
                    st.metric("Retrieved context tokens", vector_tokens)

                    st.metric("Retrieved documents", len(vector_docs))

                with col2:
                    st.metric("Ontology-enhanced RAG response time", f"{enhanced_time:.2f}s")

                    # Calculate text-related metrics
                    enhanced_tokens = len(enhanced_context.split())
                    st.metric("Retrieved context tokens", enhanced_tokens)

                    st.metric("Retrieved documents", len(retrieved_docs))

                # Add a chart
                import pandas as pd
                import plotly.express as px

                # Performance comparison chart
                performance_data = {
                    "Metric": ["Response time (seconds)", "Context tokens", "Retrieved documents"],
                    "Traditional RAG": [vector_time, vector_tokens, len(vector_docs)],
                    "Ontology-Enhanced RAG": [enhanced_time, enhanced_tokens, len(retrieved_docs)]
                }

                df = pd.DataFrame(performance_data)

                # Plotly bar chart
                fig = px.bar(
                    df,
                    x="Metric",
                    y=["Traditional RAG", "Ontology-Enhanced RAG"],
                    barmode="group",
                    title="Performance Metric Comparison",
                    labels={"value": "Value", "variable": "RAG method"}
                )

                st.plotly_chart(fig)

            with tab3:
                # Retrieval source comparison
                traditional_sources = ["Traditional vector retrieval"] * len(vector_docs)

                enhanced_sources = []
                for doc in retrieved_docs:
                    source = doc.metadata.get("source", "unknown")
                    label = {
                        "ontology": "Ontology context",
                        "text": "Text context",
                        "ontology_context": "Semantic context",
                        "semantic_path": "Relationship path"
                    }.get(source, "Unknown source")
                    enhanced_sources.append(label)

                # Create a source distribution chart
                source_counts = {}
                for source in enhanced_sources:
                    if source in source_counts:
                        source_counts[source] += 1
                    else:
                        source_counts[source] = 1

                source_df = pd.DataFrame({
                    "Source type": list(source_counts.keys()),
                    "Documents": list(source_counts.values())
                })

                fig = px.pie(
                    source_df,
                    values="Documents",
                    names="Source type",
                    title="Ontology-Enhanced RAG Retrieval Source Distribution"
                )

                st.plotly_chart(fig)

                # Show the relationship between sources and the answer
                st.subheader("Relationship Between Sources and the Answer")
                st.markdown("""
                Ontology-enhanced methods leverage multiple sources of knowledge to construct more comprehensive answers. The figure above shows the distribution of the different sources.

                In particular, semantic context and relation paths provide knowledge that traditional vector retrieval cannot capture, enabling the system to connect concepts and perform multi-hop reasoning.
                """)

            with tab4:
                # Context quality assessment
                st.subheader("Context Quality Assessment")

                # Create an evaluation function (simplified version)
                def evaluate_context(docs):
                    metrics = {
                        "Direct Relevance": 0,
                        "Semantic Richness": 0,
                        "Structure Information": 0,
                        "Relationship Information": 0
                    }

                    for doc in docs:
                        content = doc.page_content if hasattr(doc, "page_content") else ""

                        # Direct relevance - based on keyword overlap with the query
                        if any(kw in content.lower() for kw in query.lower().split()):
                            metrics["Direct Relevance"] += 1

                        # Semantic richness - based on text length
                        metrics["Semantic Richness"] += min(1, len(content.split()) / 50)

                        # Structure information - from ontology sources
                        if hasattr(doc, "metadata") and doc.metadata.get("source") in ["ontology", "ontology_context"]:
                            metrics["Structure Information"] += 1

                        # Relationship information - from semantic paths
                        if hasattr(doc, "metadata") and doc.metadata.get("source") == "semantic_path":
                            metrics["Relationship Information"] += 1

                    # Normalize: cap each metric at 10
                    for key in metrics:
                        metrics[key] = min(10, metrics[key])

                    return metrics

                # Evaluate both methods
                vector_metrics = evaluate_context(vector_docs)
                enhanced_metrics = evaluate_context(retrieved_docs)

                # Create a comparative radar chart
                metrics_df = pd.DataFrame({
                    "Metric": list(vector_metrics.keys()),
                    "Traditional RAG": list(vector_metrics.values()),
                    "Ontology-Enhanced RAG": list(enhanced_metrics.values())
                })

                # Convert the data to long form for the Plotly radar chart
                melted_df = metrics_df.melt(id_vars="Metric", var_name="RAG method", value_name="Score")
                fig = px.line_polar(
                    melted_df,
                    r="Score",
                    theta="Metric",
                    color="RAG method",
                    line_close=True,
                    range_r=[0, 10],
                    title="Context Quality Comparison"
                )

                st.plotly_chart(fig)

                st.markdown("""
                The figure above compares the two RAG methods on context quality. Ontology-enhanced RAG performs better along several dimensions:

                1. **Direct relevance**: how closely the retrieved content matches the query
                2. **Semantic richness**: the information density of the retrieved context
                3. **Structure information**: structured knowledge of entity types, attributes, and relationships
                4. **Relationship information**: explicit relationships and connection paths between entities

                The advantage of ontology-enhanced RAG is that it retrieves structured knowledge and relational information, both of which are missing from traditional RAG methods.
                """)

    # Display the detailed analysis section
    st.subheader("Method Analysis")

    with st.expander("Strengths and weaknesses", expanded=True):
        col1, col2 = st.columns(2)

        with col1:
            st.markdown("#### Traditional RAG")
            st.markdown("""
            **Strengths**:
            - Simple to implement with low computational overhead
            - Works well with unstructured text
            - Usually faster response times

            **Weaknesses**:
            - Cannot capture relationships between entities
            - Lacks structured knowledge context
            - Difficult to perform multi-hop reasoning
            - Retrieval is based mainly on text similarity
            """)

        with col2:
            st.markdown("#### Ontology-Enhanced RAG")
            st.markdown("""
            **Strengths**:
            - Understands relationships and connections between entities
            - Provides rich structured knowledge context
            - Supports multi-hop reasoning and path discovery
            - Combines vector similarity with semantic relationships

            **Weaknesses**:
            - Higher implementation complexity
            - Requires maintaining the ontology model
            - Relatively high computational overhead
            - Retrieval and inference may take longer
            """)

    # Add usage scenario suggestions
    with st.expander("Applicable scenarios"):
        st.markdown("""
        ### Scenarios suited to traditional RAG

        - Simple fact lookups
        - Unstructured document retrieval
        - Applications with strict response time requirements
        - When the document content is clear and direct

        ### Scenarios suited to ontology-enhanced RAG

        - Complex knowledge-association queries
        - Problems that require understanding relationships between entities
        - Applications that require cross-domain reasoning
        - Enterprise knowledge management systems
        - Reasoning scenarios that demand high accuracy and consistency
        - Applications that require discovering implicit knowledge
        """)

    # Add real-world application examples
    with st.expander("Real-world applications"):
        st.markdown("""
        ### Enterprise knowledge management
        Ontology-enhanced RAG systems help enterprises organize and access their knowledge assets, connecting information across departments and systems to provide more comprehensive business insights.

        ### Product development decision support
        By understanding the relationships between customer feedback, product features, and market data, the system can provide more valuable support for product development decisions.

        ### Complex compliance queries
        For compliance problems that involve multiple rules and relationships, ontology-enhanced RAG can provide rule-based reasoning, ensuring that recommendations comply with all applicable policies and regulations.

        ### Diagnostics and troubleshooting
        In technical support and troubleshooting scenarios, the system can connect symptoms, causes, and solutions, providing more accurate diagnoses through multi-hop reasoning.
        """)
data/enterprise_ontology.json
ADDED
@@ -0,0 +1,771 @@
{
  "rules": [
    {
      "id": "rule9",
      "description": "Critical support tickets must be assigned to Senior employees or managers",
      "constraint": "FORALL ?t WHERE type(?t, SupportTicket) AND property(?t, priority, 'Critical') AND relationship(?t, assignedTo, ?e) MUST type(?e, Manager) OR (type(?e, Employee) AND property(?e, experienceLevel, 'Senior'))"
    },
    {
      "id": "rule10",
      "description": "Project end date must be after its start date",
      "constraint": "FORALL ?p WHERE type(?p, Project) AND property(?p, startDate, ?start) AND property(?p, endDate, ?end) MUST date(?end) > date(?start)"
    }
  ],
  "classes": {
    "FinancialEntity": {
      "description": "An entity related to financial matters",
      "subClassOf": "Entity",
      "properties": ["amount", "currency", "fiscalYear", "quarter", "transactionDate"]
    },

    "Budget": {
      "description": "A financial plan for a specified period",
      "subClassOf": "FinancialEntity",
      "properties": ["budgetId", "period", "departmentId", "plannedAmount", "actualAmount", "variance"]
    },

    "Revenue": {
      "description": "Income generated from business activities",
      "subClassOf": "FinancialEntity",
      "properties": ["revenueId", "source", "productId", "recurring", "oneTime", "revenueType"]
    },

    "Expense": {
      "description": "Cost incurred in business operations",
      "subClassOf": "FinancialEntity",
      "properties": ["expenseId", "category", "department", "approvedBy", "paymentStatus", "receiptUrl"]
    },

    "Asset": {
      "description": "A resource with economic value",
      "subClassOf": "Entity",
      "properties": ["assetId", "acquisitionDate", "value", "depreciationSchedule", "currentValue", "location"]
    },

    "PhysicalAsset": {
      "description": "A tangible asset with physical presence",
      "subClassOf": "Asset",
      "properties": ["serialNumber", "manufacturer", "model", "maintenanceSchedule", "condition"]
    },

    "DigitalAsset": {
      "description": "An intangible digital asset",
      "subClassOf": "Asset",
      "properties": ["fileType", "storageLocation", "accessControl", "backupStatus", "version"]
    },

    "IntellectualProperty": {
      "description": "Legal rights resulting from intellectual activity",
      "subClassOf": "Asset",
      "properties": ["ipType", "filingDate", "grantDate", "jurisdiction", "inventors", "expirationDate"]
    },

    "Location": {
      "description": "A physical or virtual place",
      "subClassOf": "Entity",
      "properties": ["locationId", "address", "city", "state", "country", "postalCode", "geoCoordinates"]
    },

    "Facility": {
      "description": "A physical building or site owned or operated by the organization",
      "subClassOf": "Location",
      "properties": ["facilityType", "squareFootage", "capacity", "operatingHours", "amenities", "securityLevel"]
    },

    "VirtualLocation": {
      "description": "A digital space or environment",
      "subClassOf": "Location",
      "properties": ["url", "accessMethod", "hostingProvider", "virtualEnvironmentType", "availabilityStatus"]
    },

    "Market": {
      "description": "A geographic or demographic target for products and services",
      "subClassOf": "Entity",
      "properties": ["marketId", "name", "geography", "demographics", "size", "growth", "competitiveIntensity"]
    },

    "GeographicMarket": {
      "description": "A market defined by geographic boundaries",
      "subClassOf": "Market",
      "properties": ["region", "countries", "languages", "regulations", "culturalFactors"]
    },

    "DemographicMarket": {
      "description": "A market defined by demographic characteristics",
      "subClassOf": "Market",
      "properties": ["ageRange", "income", "education", "occupation", "familyStatus", "interests"]
    },

    "BusinessMarket": {
      "description": "A market consisting of business customers",
      "subClassOf": "Market",
      "properties": ["industryFocus", "companySize", "businessModel", "decisionMakers", "purchasingCriteria"]
    },

    "Campaign": {
      "description": "A coordinated series of marketing activities",
      "subClassOf": "Entity",
      "properties": ["campaignId", "name", "objective", "startDate", "endDate", "budget", "targetAudience", "channels"]
    },

    "DigitalCampaign": {
      "description": "A marketing campaign conducted through digital channels",
      "subClassOf": "Campaign",
      "properties": ["platforms", "contentTypes", "keywords", "tracking", "analytics", "automationWorkflows"]
    },

    "TraditionalCampaign": {
      "description": "A marketing campaign conducted through traditional media",
      "subClassOf": "Campaign",
      "properties": ["mediaTypes", "adSizes", "placementSchedule", "production", "distributionMethod"]
    },

    "IntegratedCampaign": {
      "description": "A campaign that spans multiple marketing channels",
      "subClassOf": "Campaign",
      "properties": ["channelMix", "messageConsistency", "crossChannelMetrics", "customerJourneyMap"]
    },

    "Process": {
      "description": "A defined set of activities to accomplish a specific objective",
      "subClassOf": "Entity",
      "properties": ["processId", "name", "purpose", "owner", "inputs", "outputs", "steps", "metrics"]
    },

    "BusinessProcess": {
      "description": "A process for conducting business operations",
      "subClassOf": "Process",
      "properties": ["businessFunction", "criticality", "maturityLevel", "automationLevel", "regulatoryRequirements"]
    },

    "DevelopmentProcess": {
      "description": "A process for developing products or services",
      "subClassOf": "Process",
      "properties": ["methodology", "phases", "deliverables", "qualityGates", "tools", "repositories"]
    },

    "SupportProcess": {
      "description": "A process for supporting customers or internal users",
      "subClassOf": "Process",
      "properties": ["serviceLevel", "escalationPath", "knowledgeBase", "ticketingSystem", "supportHours"]
    },

    "Skill": {
      "description": "A learned capacity to perform a task",
      "subClassOf": "Entity",
      "properties": ["skillId", "name", "category", "proficiencyLevels", "certifications", "learningResources"]
    },

    "TechnicalSkill": {
      "description": "A skill related to technology or technical processes",
      "subClassOf": "Skill",
      "properties": ["techCategory", "tools", "languages", "frameworks", "platforms", "compatibility"]
    },

    "SoftSkill": {
      "description": "An interpersonal or non-technical skill",
      "subClassOf": "Skill",
      "properties": ["interpersonalArea", "communicationAspects", "leadershipComponents", "adaptabilityMetrics"]
    },

    "DomainSkill": {
      "description": "Knowledge and expertise in a specific business domain",
      "subClassOf": "Skill",
      "properties": ["domain", "industrySpecific", "regulations", "bestPractices", "domainTerminology"]
    },

    "Objective": {
      "description": "A goal or target to be achieved",
      "subClassOf": "Entity",
      "properties": ["objectiveId", "name", "description", "targetDate", "status", "priority", "owner", "metrics"]
    },

    "StrategicObjective": {
      "description": "A high-level, long-term goal",
      "subClassOf": "Objective",
      "properties": ["strategyAlignment", "timeframe", "impactAreas", "successIndicators", "boardApproval"]
    },

    "TacticalObjective": {
      "description": "A medium-term goal supporting strategic objectives",
      "subClassOf": "Objective",
      "properties": ["parentObjective", "implementationPlan", "resourceRequirements", "dependencies", "milestones"]
    },

    "OperationalObjective": {
      "description": "A short-term, specific goal supporting tactical objectives",
      "subClassOf": "Objective",
      "properties": ["parentTacticalObjective", "assignedTeam", "dailyActivities", "progressTracking", "completionCriteria"]
    },

    "KPI": {
      "description": "Key Performance Indicator for measuring success",
      "subClassOf": "Entity",
      "properties": ["kpiId", "name", "description", "category", "unit", "formula", "target", "actual", "frequency"]
    },

    "FinancialKPI": {
      "description": "KPI measuring financial performance",
      "subClassOf": "KPI",
      "properties": ["financialCategory", "accountingStandard", "auditRequirement", "forecastAccuracy"]
    },

    "CustomerKPI": {
      "description": "KPI measuring customer-related performance",
      "subClassOf": "KPI",
      "properties": ["customerSegment", "touchpoint", "journeyStage", "sentimentConnection", "loyaltyImpact"]
    },

    "OperationalKPI": {
      "description": "KPI measuring operational efficiency",
      "subClassOf": "KPI",
      "properties": ["processArea", "qualityDimension", "productivityFactor", "resourceUtilization"]
    },

    "Risk": {
      "description": "A potential event that could negatively impact objectives",
      "subClassOf": "Entity",
      "properties": ["riskId", "name", "description", "category", "probability", "impact", "status", "mitigationPlan"]
    },

    "FinancialRisk": {
      "description": "Risk related to financial matters",
      "subClassOf": "Risk",
      "properties": ["financialExposure", "currencyFactors", "marketConditions", "hedgingStrategy", "insuranceCoverage"]
    },

    "OperationalRisk": {
      "description": "Risk related to business operations",
      "subClassOf": "Risk",
      "properties": ["operationalArea", "processVulnerabilities", "systemDependencies", "staffingFactors", "recoveryPlan"]
    },

    "ComplianceRisk": {
      "description": "Risk related to regulatory compliance",
      "subClassOf": "Risk",
      "properties": ["regulations", "jurisdictions", "reportingRequirements", "penaltyExposure", "complianceStatus"]
    },

    "Decision": {
      "description": "A choice made between alternatives",
      "subClassOf": "Entity",
      "properties": ["decisionId", "name", "description", "date", "decisionMaker", "alternatives", "selectedOption", "rationale"]
    },

    "StrategicDecision": {
      "description": "A decision affecting long-term direction",
      "subClassOf": "Decision",
      "properties": ["strategicImplications", "marketPosition", "competitiveAdvantage", "boardApproval", "communicationPlan"]
    },

    "TacticalDecision": {
      "description": "A decision affecting medium-term operations",
      "subClassOf": "Decision",
      "properties": ["operationalImpact", "resourceAllocation", "implementationTimeline", "departmentalScope"]
    },

    "OperationalDecision": {
      "description": "A day-to-day decision in business operations",
      "subClassOf": "Decision",
      "properties": ["decisionFrequency", "standardProcedure", "delegationLevel", "auditTrail"]
    },

    "Technology": {
      "description": "A technical capability or system",
      "subClassOf": "Entity",
      "properties": ["technologyId", "name", "category", "version", "vendor", "maturityLevel", "supportStatus"]
    },

    "Hardware": {
      "description": "Physical technological equipment",
      "subClassOf": "Technology",
      "properties": ["specifications", "formFactor", "powerRequirements", "connectivity", "lifecycle", "replacementSchedule"]
    },

    "Software": {
      "description": "Computer programs and applications",
      "subClassOf": "Technology",
      "properties": ["programmingLanguage", "operatingSystem", "architecture", "apiDocumentation", "licensingModel", "updateFrequency"]
    },

    "Infrastructure": {
      "description": "Foundational technology systems",
      "subClassOf": "Technology",
      "properties": ["deploymentModel", "scalability", "redundancy", "securityFeatures", "complianceCertifications", "capacityMetrics"]
    },

    "SecurityEntity": {
      "description": "An entity related to security measures",
      "subClassOf": "Entity",
      "properties": ["securityId", "name", "type", "implementationDate", "lastReview", "responsibleParty", "status"]
    },

    "SecurityControl": {
      "description": "A measure to mitigate security risks",
      "subClassOf": "SecurityEntity",
      "properties": ["controlCategory", "protectedAssets", "implementationLevel", "automationDegree", "verificationMethod", "exceptions"]
|
307 |
+
},
|
308 |
+
|
309 |
+
"SecurityIncident": {
|
310 |
+
"description": "An event that compromises security",
|
311 |
+
"subClassOf": "SecurityEntity",
|
312 |
+
"properties": ["incidentDate", "severity", "affectedSystems", "vector", "remediationSteps", "rootCause", "resolution"]
|
313 |
+
},
|
314 |
+
|
315 |
+
"SecurityPolicy": {
|
316 |
+
"description": "A documented security directive",
|
317 |
+
"subClassOf": "SecurityEntity",
|
318 |
+
"properties": ["policyScope", "requiredControls", "complianceRequirements", "exemptionProcess", "reviewSchedule", "enforcementMechanism"]
|
319 |
+
},
|
320 |
+
|
321 |
+
"Competency": {
|
322 |
+
"description": "A cluster of related abilities, knowledge, and skills",
|
323 |
+
"subClassOf": "Entity",
|
324 |
+
"properties": ["competencyId", "name", "category", "description", "importance", "requiredProficiency", "assessmentMethod"]
|
325 |
+
},
|
326 |
+
|
327 |
+
"ManagerialCompetency": {
|
328 |
+
"description": "Competency related to managing people and resources",
|
329 |
+
"subClassOf": "Competency",
|
330 |
+
"properties": ["leadershipAspects", "teamDevelopment", "decisionMaking", "conflictResolution", "changeManagement", "resourceOptimization"]
|
331 |
+
},
|
332 |
+
|
333 |
+
"TechnicalCompetency": {
|
334 |
+
"description": "Competency related to technical knowledge and skills",
|
335 |
+
"subClassOf": "Competency",
|
336 |
+
"properties": ["technicalDomain", "specializations", "toolProficiency", "problemSolvingApproach", "technicalLeadership", "knowledgeSharing"]
|
337 |
+
},
|
338 |
+
|
339 |
+
"BusinessCompetency": {
|
340 |
+
"description": "Competency related to business acumen and operations",
|
341 |
+
"subClassOf": "Competency",
|
342 |
+
"properties": ["businessAcumen", "industryKnowledge", "stakeholderManagement", "commercialAwareness", "strategicThinking", "resultsOrientation"]
|
343 |
+
},
|
344 |
+
|
345 |
+
"Stakeholder": {
|
346 |
+
"description": "An individual or group with interest in or influence over the organization",
|
347 |
+
"subClassOf": "Entity",
|
348 |
+
"properties": ["stakeholderId", "name", "type", "influence", "interest", "expectations", "engagementLevel", "communicationPreference"]
|
349 |
+
},
|
350 |
+
|
351 |
+
"InternalStakeholder": {
|
352 |
+
"description": "A stakeholder within the organization",
|
353 |
+
"subClassOf": "Stakeholder",
|
354 |
+
"properties": ["department", "role", "decisionAuthority", "projectInvolvement", "changeReadiness", "organizationalTenure"]
|
355 |
+
},
|
356 |
+
|
357 |
+
"ExternalStakeholder": {
|
358 |
+
"description": "A stakeholder outside the organization",
|
359 |
+
"subClassOf": "Stakeholder",
|
360 |
+
"properties": ["organization", "relationship", "contractualAgreements", "marketInfluence", "externalNetworks", "publicProfile"]
|
361 |
+
},
|
362 |
+
|
363 |
+
"RegulatoryStakeholder": {
|
364 |
+
"description": "A regulatory body or authority",
|
365 |
+
"subClassOf": "Stakeholder",
|
366 |
+
"properties": ["jurisdiction", "regulations", "enforcementPowers", "reportingRequirements", "auditFrequency", "complianceDeadlines"]
|
367 |
+
}
|
368 |
+
},
|
369 |
+
"relationships": [
|
370 |
+
{
|
371 |
+
"name": "ownedBy",
|
372 |
+
"domain": "Product",
|
373 |
+
"range": "Department",
|
374 |
+
"inverse": "owns",
|
375 |
+
"cardinality": "many-to-one",
|
376 |
+
"description": "Indicates which department owns a product"
|
377 |
+
},
|
378 |
+
{
|
379 |
+
"name": "managedBy",
|
380 |
+
"domain": "Department",
|
381 |
+
"range": "Manager",
|
382 |
+
"inverse": "manages",
|
383 |
+
"cardinality": "one-to-one",
|
384 |
+
"description": "Indicates which manager heads a department"
|
385 |
+
},
|
386 |
+
{
|
387 |
+
"name": "worksOn",
|
388 |
+
"domain": "Employee",
|
389 |
+
"range": "Product",
|
390 |
+
"inverse": "developedBy",
|
391 |
+
"cardinality": "many-to-many",
|
392 |
+
"description": "Indicates which products an employee works on"
|
393 |
+
},
|
394 |
+
{
|
395 |
+
"name": "purchases",
|
396 |
+
"domain": "Customer",
|
397 |
+
"range": "Product",
|
398 |
+
"inverse": "purchasedBy",
|
399 |
+
"cardinality": "many-to-many",
|
400 |
+
"description": "Indicates which products a customer has purchased"
|
401 |
+
},
|
402 |
+
{
|
403 |
+
"name": "provides",
|
404 |
+
"domain": "Customer",
|
405 |
+
"range": "Feedback",
|
406 |
+
"inverse": "providedBy",
|
407 |
+
"cardinality": "one-to-many",
|
408 |
+
"description": "Connects customers to their feedback submissions"
|
409 |
+
},
|
410 |
+
{
|
411 |
+
"name": "pertainsTo",
|
412 |
+
"domain": "Feedback",
|
413 |
+
"range": "Product",
|
414 |
+
"inverse": "hasFeedback",
|
415 |
+
"cardinality": "many-to-one",
|
416 |
+
"description": "Indicates which product a feedback item is about"
|
417 |
+
},
|
418 |
+
{
|
419 |
+
"name": "supports",
|
420 |
+
"domain": "Platform",
|
421 |
+
"range": "Product",
|
422 |
+
"inverse": "supportedBy",
|
423 |
+
"cardinality": "one-to-many",
|
424 |
+
"description": "Indicates which products are supported by the platform"
|
425 |
+
},
|
426 |
+
{
|
427 |
+
"name": "hasLifecycle",
|
428 |
+
"domain": "Product",
|
429 |
+
"range": "Lifecycle",
|
430 |
+
"inverse": "lifecycleOf",
|
431 |
+
"cardinality": "one-to-one",
|
432 |
+
"description": "Connects a product to its lifecycle information"
|
433 |
+
},
|
434 |
+
{
|
435 |
+
"name": "oversees",
|
436 |
+
"domain": "Manager",
|
437 |
+
"range": "Employee",
|
438 |
+
"inverse": "reportsToDirect",
|
439 |
+
"cardinality": "one-to-many",
|
440 |
+
"description": "Indicates which employees report to a manager"
|
441 |
+
},
|
442 |
+
{
|
443 |
+
"name": "optimizedBy",
|
444 |
+
"domain": "Product",
|
445 |
+
"range": "Feedback",
|
446 |
+
"inverse": "optimizes",
|
447 |
+
"cardinality": "many-to-many",
|
448 |
+
"description": "Indicates how feedback is used to optimize product development"
|
449 |
+
},
|
450 |
+
{
|
451 |
+
"name": "allocatesTo",
|
452 |
+
"domain": "Budget",
|
453 |
+
"range": "Department",
|
454 |
+
"inverse": "fundedBy",
|
455 |
+
"cardinality": "one-to-many",
|
456 |
+
"description": "Indicates which departments receive budget allocations"
|
457 |
+
},
|
458 |
+
{
|
459 |
+
"name": "generatesRevenue",
|
460 |
+
"domain": "Product",
|
461 |
+
"range": "Revenue",
|
462 |
+
"inverse": "generatedFrom",
|
463 |
+
"cardinality": "one-to-many",
|
464 |
+
"description": "Connects products to the revenue they generate"
|
465 |
+
},
|
466 |
+
{
|
467 |
+
"name": "incursExpense",
|
468 |
+
"domain": "Department",
|
469 |
+
"range": "Expense",
|
470 |
+
"inverse": "incurredBy",
|
471 |
+
"cardinality": "one-to-many",
|
472 |
+
"description": "Connects departments to their expenses"
|
473 |
+
},
|
474 |
+
{
|
475 |
+
"name": "locatedAt",
|
476 |
+
"domain": "PhysicalEntity",
|
477 |
+
"range": "Location",
|
478 |
+
"inverse": "houses",
|
479 |
+
"cardinality": "many-to-one",
|
480 |
+
"description": "Indicates where a physical entity is located"
|
481 |
+
},
|
482 |
+
{
|
483 |
+
"name": "targetedAt",
|
484 |
+
"domain": "Campaign",
|
485 |
+
"range": "Market",
|
486 |
+
"inverse": "targetedBy",
|
487 |
+
"cardinality": "many-to-many",
|
488 |
+
"description": "Indicates which markets a campaign targets"
|
489 |
+
},
|
490 |
+
{
|
491 |
+
"name": "follows",
|
492 |
+
"domain": "Project",
|
493 |
+
"range": "Process",
|
494 |
+
"inverse": "implementedBy",
|
495 |
+
"cardinality": "many-to-one",
|
496 |
+
"description": "Indicates which process a project follows"
|
497 |
+
},
|
498 |
+
{
|
499 |
+
"name": "requires",
|
500 |
+
"domain": "Role",
|
501 |
+
"range": "Skill",
|
502 |
+
"inverse": "requiredFor",
|
503 |
+
"cardinality": "many-to-many",
|
504 |
+
"description": "Indicates which skills are required for a role"
|
505 |
+
},
|
506 |
+
{
|
507 |
+
"name": "possesses",
|
508 |
+
"domain": "Employee",
|
509 |
+
"range": "Skill",
|
510 |
+
"inverse": "possessedBy",
|
511 |
+
"cardinality": "many-to-many",
|
512 |
+
"description": "Indicates which skills an employee possesses"
|
513 |
+
},
|
514 |
+
{
|
515 |
+
"name": "measures",
|
516 |
+
"domain": "KPI",
|
517 |
+
"range": "Objective",
|
518 |
+
"inverse": "measuredBy",
|
519 |
+
"cardinality": "many-to-many",
|
520 |
+
"description": "Indicates which objectives a KPI measures"
|
521 |
+
},
|
522 |
+
{
|
523 |
+
"name": "affects",
|
524 |
+
"domain": "Risk",
|
525 |
+
"range": "Entity",
|
526 |
+
"inverse": "affectedBy",
|
527 |
+
"cardinality": "many-to-many",
|
528 |
+
"description": "Indicates which entities are affected by a risk"
|
529 |
+
},
|
530 |
+
{
|
531 |
+
"name": "mitigates",
|
532 |
+
"domain": "SecurityControl",
|
533 |
+
"range": "Risk",
|
534 |
+
"inverse": "mitigatedBy",
|
535 |
+
"cardinality": "many-to-many",
|
536 |
+
"description": "Indicates which risks are mitigated by a security control"
|
537 |
+
},
|
538 |
+
{
|
539 |
+
"name": "demonstrates",
|
540 |
+
"domain": "Employee",
|
541 |
+
"range": "Competency",
|
542 |
+
"inverse": "demonstratedBy",
|
543 |
+
"cardinality": "many-to-many",
|
544 |
+
"description": "Indicates which competencies an employee demonstrates"
|
545 |
+
},
|
546 |
+
{
|
547 |
+
"name": "influencedBy",
|
548 |
+
"domain": "Decision",
|
549 |
+
"range": "Stakeholder",
|
550 |
+
"inverse": "influences",
|
551 |
+
"cardinality": "many-to-many",
|
552 |
+
"description": "Indicates which stakeholders influence a decision"
|
553 |
+
},
|
554 |
+
{
|
555 |
+
"name": "implementedWith",
|
556 |
+
"domain": "Process",
|
557 |
+
"range": "Technology",
|
558 |
+
"inverse": "supports",
|
559 |
+
"cardinality": "many-to-many",
|
560 |
+
"description": "Indicates which technologies support a process"
|
561 |
+
}
|
562 |
+
],
|
563 |
+
"instances": [
|
564 |
+
{
|
565 |
+
"id": "product1",
|
566 |
+
"type": "Product",
|
567 |
+
"properties": {
|
568 |
+
"name": "Enterprise Analytics Suite",
|
569 |
+
"version": "2.1",
|
570 |
+
"status": "Active"
|
571 |
+
},
|
572 |
+
"relationships": [
|
573 |
+
{"type": "ownedBy", "target": "dept1"},
|
574 |
+
{"type": "hasLifecycle", "target": "lifecycle1"},
|
575 |
+
{"type": "optimizedBy", "target": "feedback1"}
|
576 |
+
]
|
577 |
+
},
|
578 |
+
{
|
579 |
+
"id": "product2",
|
580 |
+
"type": "Product",
|
581 |
+
"properties": {
|
582 |
+
"name": "Customer Portal",
|
583 |
+
"version": "1.5",
|
584 |
+
"status": "Active"
|
585 |
+
},
|
586 |
+
"relationships": [
|
587 |
+
{"type": "ownedBy", "target": "dept2"},
|
588 |
+
{"type": "hasLifecycle", "target": "lifecycle2"},
|
589 |
+
{"type": "optimizedBy", "target": "feedback2"}
|
590 |
+
]
|
591 |
+
},
|
592 |
+
{
|
593 |
+
"id": "dept1",
|
594 |
+
"type": "Department",
|
595 |
+
"properties": {
|
596 |
+
"name": "Engineering",
|
597 |
+
"function": "Product Development"
|
598 |
+
},
|
599 |
+
"relationships": [
|
600 |
+
{"type": "managedBy", "target": "manager1"},
|
601 |
+
{"type": "owns", "target": "product1"}
|
602 |
+
]
|
603 |
+
},
|
604 |
+
{
|
605 |
+
"id": "dept2",
|
606 |
+
"type": "Department",
|
607 |
+
"properties": {
|
608 |
+
"name": "Marketing",
|
609 |
+
"function": "Customer Engagement"
|
610 |
+
},
|
611 |
+
"relationships": [
|
612 |
+
{"type": "managedBy", "target": "manager2"},
|
613 |
+
{"type": "owns", "target": "product2"}
|
614 |
+
]
|
615 |
+
},
|
616 |
+
{
|
617 |
+
"id": "manager1",
|
618 |
+
"type": "Manager",
|
619 |
+
"properties": {
|
620 |
+
"name": "Jane Smith",
|
621 |
+
"role": "Engineering Director",
|
622 |
+
"managementLevel": "Director"
|
623 |
+
},
|
624 |
+
"relationships": [
|
625 |
+
{"type": "oversees", "target": "employee1"},
|
626 |
+
{"type": "oversees", "target": "employee2"},
|
627 |
+
{"type": "manages", "target": "dept1"}
|
628 |
+
]
|
629 |
+
},
|
630 |
+
{
|
631 |
+
"id": "manager2",
|
632 |
+
"type": "Manager",
|
633 |
+
"properties": {
|
634 |
+
"name": "Michael Chen",
|
635 |
+
"role": "Marketing Manager",
|
636 |
+
"managementLevel": "Manager"
|
637 |
+
},
|
638 |
+
"relationships": [
|
639 |
+
{"type": "oversees", "target": "employee3"},
|
640 |
+
{"type": "manages", "target": "dept2"}
|
641 |
+
]
|
642 |
+
},
|
643 |
+
{
|
644 |
+
"id": "employee1",
|
645 |
+
"type": "Employee",
|
646 |
+
"properties": {
|
647 |
+
"name": "John Doe",
|
648 |
+
"role": "Senior Developer"
|
649 |
+
},
|
650 |
+
"relationships": [
|
651 |
+
{"type": "worksOn", "target": "product1"},
|
652 |
+
{"type": "reportsToDirect", "target": "manager1"}
|
653 |
+
]
|
654 |
+
},
|
655 |
+
{
|
656 |
+
"id": "employee2",
|
657 |
+
"type": "Employee",
|
658 |
+
"properties": {
|
659 |
+
"name": "Sarah Johnson",
|
660 |
+
"role": "QA Engineer"
|
661 |
+
},
|
662 |
+
"relationships": [
|
663 |
+
{"type": "worksOn", "target": "product1"},
|
664 |
+
{"type": "reportsToDirect", "target": "manager1"}
|
665 |
+
]
|
666 |
+
},
|
667 |
+
{
|
668 |
+
"id": "employee3",
|
669 |
+
"type": "Employee",
|
670 |
+
"properties": {
|
671 |
+
"name": "David Wilson",
|
672 |
+
"role": "Marketing Specialist"
|
673 |
+
},
|
674 |
+
"relationships": [
|
675 |
+
{"type": "worksOn", "target": "product2"},
|
676 |
+
{"type": "reportsToDirect", "target": "manager2"}
|
677 |
+
]
|
678 |
+
},
|
679 |
+
{
|
680 |
+
"id": "customer1",
|
681 |
+
"type": "Customer",
|
682 |
+
"properties": {
|
683 |
+
"name": "Acme Corp",
|
684 |
+
"customerSince": "2020-05-15"
|
685 |
+
},
|
686 |
+
"relationships": [
|
687 |
+
{"type": "purchases", "target": "product1"},
|
688 |
+
{"type": "provides", "target": "feedback1"}
|
689 |
+
]
|
690 |
+
},
|
691 |
+
{
|
692 |
+
"id": "customer2",
|
693 |
+
"type": "Customer",
|
694 |
+
"properties": {
|
695 |
+
"name": "GlobalTech",
|
696 |
+
"customerSince": "2021-03-22"
|
697 |
+
},
|
698 |
+
"relationships": [
|
699 |
+
{"type": "purchases", "target": "product2"},
|
700 |
+
{"type": "provides", "target": "feedback2"}
|
701 |
+
]
|
702 |
+
},
|
703 |
+
{
|
704 |
+
"id": "feedback1",
|
705 |
+
"type": "Feedback",
|
706 |
+
"properties": {
|
707 |
+
"date": "2023-09-10",
|
708 |
+
"sentiment": "Positive",
|
709 |
+
"rating": 4.5,
|
710 |
+
"content": "The analytics dashboard is very intuitive and provides excellent insights.",
|
711 |
+
"suggestions": "Would like to see more export options."
|
712 |
+
},
|
713 |
+
"relationships": [
|
714 |
+
{"type": "providedBy", "target": "customer1"},
|
715 |
+
{"type": "pertainsTo", "target": "product1"},
|
716 |
+
{"type": "optimizes", "target": "product1"}
|
717 |
+
]
|
718 |
+
},
|
719 |
+
{
|
720 |
+
"id": "feedback2",
|
721 |
+
"type": "Feedback",
|
722 |
+
"properties": {
|
723 |
+
"date": "2023-10-05",
|
724 |
+
"sentiment": "Mixed",
|
725 |
+
"rating": 3.0,
|
726 |
+
"content": "The portal is functional but navigation could be improved.",
|
727 |
+
"suggestions": "Add better navigation and mobile support."
|
728 |
+
},
|
729 |
+
"relationships": [
|
730 |
+
{"type": "providedBy", "target": "customer2"},
|
731 |
+
{"type": "pertainsTo", "target": "product2"},
|
732 |
+
{"type": "optimizes", "target": "product2"}
|
733 |
+
]
|
734 |
+
},
|
735 |
+
{
|
736 |
+
"id": "lifecycle1",
|
737 |
+
"type": "Lifecycle",
|
738 |
+
"properties": {
|
739 |
+
"currentStage": "Maintenance",
|
740 |
+
"previousStages": ["Development", "Launch"]
|
741 |
+
},
|
742 |
+
"relationships": [
|
743 |
+
{"type": "lifecycleOf", "target": "product1"}
|
744 |
+
]
|
745 |
+
},
|
746 |
+
{
|
747 |
+
"id": "lifecycle2",
|
748 |
+
"type": "Lifecycle",
|
749 |
+
"properties": {
|
750 |
+
"currentStage": "Growth",
|
751 |
+
"previousStages": ["Development", "Launch"]
|
752 |
+
},
|
753 |
+
"relationships": [
|
754 |
+
{"type": "lifecycleOf", "target": "product2"}
|
755 |
+
]
|
756 |
+
},
|
757 |
+
{
|
758 |
+
"id": "platform1",
|
759 |
+
"type": "Platform",
|
760 |
+
"properties": {
|
761 |
+
"name": "Product Management System",
|
762 |
+
"version": "3.0",
|
763 |
+
"capabilities": ["Tracking", "Versioning", "Ownership Management"]
|
764 |
+
},
|
765 |
+
"relationships": [
|
766 |
+
{"type": "supports", "target": "product1"},
|
767 |
+
{"type": "supports", "target": "product2"}
|
768 |
+
]
|
769 |
+
}
|
770 |
+
]
|
771 |
+
}
|
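The `subClassOf` links above form an inheritance hierarchy (e.g. FinancialRisk → Risk → Entity) that can be resolved with a short traversal. A minimal sketch — the `ancestors` helper is hypothetical, and the inline `classes` dict stands in for the full file loaded via `json.load`:

```python
# Inline fragment mirroring the schema above; in the app this dict
# would come from json.load() on data/enterprise_ontology.json.
classes = {
    "Entity": {"subClassOf": None},
    "Risk": {"subClassOf": "Entity"},
    "FinancialRisk": {"subClassOf": "Risk"},
}

def ancestors(classes, name):
    """Return the inheritance chain from `name` up to the root class."""
    chain = []
    parent = classes[name]["subClassOf"]
    while parent is not None:
        chain.append(parent)
        parent = classes[parent]["subClassOf"]
    return chain

print(ancestors(classes, "FinancialRisk"))  # ['Risk', 'Entity']
```

Resolving this chain is what lets a retriever treat a FinancialRisk instance as a match for queries about risks in general.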
data/enterprise_ontology.txt
ADDED
@@ -0,0 +1,10 @@
Product is owned by Department.
Department is managed by Manager.
Employee works on Product.
Customer purchases Product and provides Feedback.
Platform supports Product tracking, versioning, and ownership.
Each Product has an associated Lifecycle.
Product Lifecycle includes stages like development, launch, maintenance, and retirement.
Manager oversees Employee performance and departmental goals.
Feedback includes sentiment, rating, and suggestions.
Platform uses AI agents to optimize Product development based on Feedback trends.
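The statements in this file follow a simple `Subject relation Object.` pattern with capitalized entity names, so they can be turned into triples with a small parser. A hedged sketch — the `parse_triple` helper and its regex are illustrative assumptions, not part of the app:

```python
import re

# Assumes entities are single capitalized words, which holds for the
# sentences in enterprise_ontology.txt; captures the first triple only.
TRIPLE_RE = re.compile(r"^([A-Z]\w+) ((?:\w+ )+?)([A-Z]\w+)[ .]")

def parse_triple(sentence: str):
    """Extract a (subject, relation, object) triple, or None if no match."""
    m = TRIPLE_RE.match(sentence)
    if not m:
        return None
    subject, relation, obj = m.groups()
    return subject, relation.strip(), obj

print(parse_triple("Product is owned by Department."))
# ('Product', 'is owned by', 'Department')
```

Such triples line up with the formal `relationships` entries in enterprise_ontology.json (e.g. `ownedBy` with domain Product and range Department).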
huggingface.yml
ADDED
@@ -0,0 +1,8 @@
title: Ontology RAG Demo
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.44.0
app_file: app.py
pinned: true
license: mit
requirements.txt
ADDED
@@ -0,0 +1,14 @@
streamlit>=1.44.0
openai>=1.2.0
langchain>=0.1.13
langchain-community>=0.0.21
langchain-openai>=0.0.5
faiss-cpu>=1.7.4
networkx>=3.1
pyvis>=0.3.2
plotly>=5.15.0
pandas>=2.0.0
matplotlib>=3.7.1
numpy>=1.24.3
pygraphviz>=1.10  # May require system dependencies, optional
pydantic>=1.10.8
src/__init__.py
ADDED
@@ -0,0 +1 @@
# Package initialization
src/knowledge_graph.py
ADDED
@@ -0,0 +1,920 @@
# src/knowledge_graph.py

import networkx as nx
from pyvis.network import Network
import json
from typing import Dict, List, Any, Optional, Set, Tuple
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from collections import defaultdict

class KnowledgeGraph:
    """
    Handles the construction and visualization of knowledge graphs
    based on the ontology data.
    """

    def __init__(self, ontology_manager=None):
        """
        Initialize the knowledge graph handler.

        Args:
            ontology_manager: Optional ontology manager instance
        """
        self.ontology_manager = ontology_manager
        self.graph = None

        if ontology_manager:
            self.graph = ontology_manager.graph

    def build_visualization_graph(
        self,
        include_classes: bool = True,
        include_instances: bool = True,
        central_entity: Optional[str] = None,
        max_distance: int = 2,
        include_properties: bool = False
    ) -> nx.Graph:
        """
        Build a simplified graph for visualization purposes.

        Args:
            include_classes: Whether to include class nodes
            include_instances: Whether to include instance nodes
            central_entity: Optional central entity to focus the graph on
            max_distance: Maximum distance from central entity to include
            include_properties: Whether to include property nodes

        Returns:
            A NetworkX graph suitable for visualization
        """
        if not self.graph:
            return nx.Graph()

        # Create an undirected graph for visualization
        viz_graph = nx.Graph()

        # If we have a central entity, extract a subgraph around it
        if central_entity and central_entity in self.graph:
            # Get nodes within max_distance of central_entity
            nodes_to_include = set([central_entity])
            current_distance = 0
            current_layer = set([central_entity])

            while current_distance < max_distance:
                next_layer = set()
                for node in current_layer:
                    # Get neighbors
                    neighbors = set(self.graph.successors(node)).union(set(self.graph.predecessors(node)))
                    next_layer.update(neighbors)

                nodes_to_include.update(next_layer)
                current_layer = next_layer
                current_distance += 1

            # Create subgraph
            subgraph = self.graph.subgraph(nodes_to_include)
        else:
            subgraph = self.graph

        # Add nodes to the visualization graph
        for node, data in subgraph.nodes(data=True):
            node_type = data.get("type")

            # Skip nodes based on configuration
            if node_type == "class" and not include_classes:
                continue
            if node_type == "instance" and not include_instances:
                continue

            # Get readable name for the node
            if node_type == "instance" and "properties" in data:
                label = data["properties"].get("name", node)
            else:
                label = node

            # Set node attributes for visualization
            viz_attrs = {
                "id": node,
                "label": label,
                "title": self._get_node_tooltip(node, data),
                "group": data.get("class_type", node_type),
                "shape": "dot" if node_type == "instance" else "diamond"
            }

            # Highlight central entity if specified
            if central_entity and node == central_entity:
                viz_attrs["color"] = "#ff7f0e"  # Orange for central entity
                viz_attrs["size"] = 25  # Larger size for central entity

            # Add the node
            viz_graph.add_node(node, **viz_attrs)

            # Add property nodes if configured
            if include_properties and node_type == "instance" and "properties" in data:
                for prop_name, prop_value in data["properties"].items():
                    # Create a property node
                    prop_node_id = f"{node}_{prop_name}"
                    prop_value_str = str(prop_value)
                    if len(prop_value_str) > 20:
                        prop_value_str = prop_value_str[:17] + "..."

                    viz_graph.add_node(
                        prop_node_id,
                        id=prop_node_id,
                        label=f"{prop_name}: {prop_value_str}",
                        title=f"{prop_name}: {prop_value}",
                        group="property",
                        shape="ellipse",
                        size=5
                    )

                    # Connect instance to property
                    viz_graph.add_edge(node, prop_node_id, label="has_property", dashes=True)

        # Add edges to the visualization graph
        for source, target, data in subgraph.edges(data=True):
            # Only include edges between nodes that are in the viz_graph
            if source in viz_graph and target in viz_graph:
                # Skip property-related edges if we're manually creating them
                if include_properties and (
                    source.startswith(target + "_") or target.startswith(source + "_")
                ):
                    continue

                # Set edge attributes
                edge_type = data.get("type", "unknown")

                # Don't show subClassOf and instanceOf relationships if not explicitly requested
                if edge_type in ["subClassOf", "instanceOf"] and not include_classes:
                    continue

                viz_graph.add_edge(source, target, label=edge_type, title=edge_type)

        return viz_graph

    def _get_node_tooltip(self, node_id: str, data: Dict) -> str:
        """Generate a tooltip for a node."""
        tooltip = f"<strong>ID:</strong> {node_id}<br>"

        node_type = data.get("type")
        if node_type:
            tooltip += f"<strong>Type:</strong> {node_type}<br>"

            if node_type == "instance":
                tooltip += f"<strong>Class:</strong> {data.get('class_type', 'unknown')}<br>"

                # Add properties
                if "properties" in data:
                    tooltip += "<strong>Properties:</strong><br>"
                    for key, value in data["properties"].items():
                        tooltip += f"- {key}: {value}<br>"

            elif node_type == "class":
                tooltip += f"<strong>Description:</strong> {data.get('description', '')}<br>"

                # Add properties if available
                if "properties" in data:
                    tooltip += "<strong>Properties:</strong> " + ", ".join(data["properties"]) + "<br>"

        return tooltip

    def generate_html_visualization(
        self,
        include_classes: bool = True,
        include_instances: bool = True,
        central_entity: Optional[str] = None,
        max_distance: int = 2,
        include_properties: bool = False,
        height: str = "600px",
        width: str = "100%",
        bgcolor: str = "#ffffff",
        font_color: str = "#000000",
        layout_algorithm: str = "force-directed"
    ) -> str:
        """
        Generate an HTML visualization of the knowledge graph.

        Args:
            include_classes: Whether to include class nodes
            include_instances: Whether to include instance nodes
            central_entity: Optional central entity to focus the graph on
            max_distance: Maximum distance from central entity to include
            include_properties: Whether to include property nodes
            height: Height of the visualization
            width: Width of the visualization
            bgcolor: Background color
            font_color: Font color
            layout_algorithm: Algorithm for layout ('force-directed', 'hierarchical', 'radial', 'circular')

        Returns:
            HTML string containing the visualization
        """
        # Build the visualization graph
        viz_graph = self.build_visualization_graph(
            include_classes=include_classes,
            include_instances=include_instances,
            central_entity=central_entity,
            max_distance=max_distance,
            include_properties=include_properties
        )

        # Create a PyVis network
        net = Network(height=height, width=width, bgcolor=bgcolor, font_color=font_color, directed=True)

        # Configure physics based on the selected layout algorithm
        if layout_algorithm == "force-directed":
            physics_options = {
                "enabled": True,
                "solver": "forceAtlas2Based",
                "forceAtlas2Based": {
                    "gravitationalConstant": -50,
                    "centralGravity": 0.01,
                    "springLength": 100,
                    "springConstant": 0.08
                },
                "stabilization": {
                    "enabled": True,
                    "iterations": 100
                }
            }
        elif layout_algorithm == "hierarchical":
            physics_options = {
                "enabled": True,
                "hierarchicalRepulsion": {
                    "centralGravity": 0.0,
                    "springLength": 100,
                    "springConstant": 0.01,
                    "nodeDistance": 120
                },
                "solver": "hierarchicalRepulsion",
                "stabilization": {
                    "enabled": True,
                    "iterations": 100
                }
            }

            # Set hierarchical layout
            net.set_options("""
            var options = {
                "layout": {
                    "hierarchical": {
                        "enabled": true,
                        "direction": "UD",
                        "sortMethod": "directed",
                        "nodeSpacing": 150,
                        "treeSpacing": 200
                    }
                }
            }
            """)
        elif layout_algorithm == "radial":
            physics_options = {
                "enabled": True,
                "solver": "repulsion",
                "repulsion": {
                    "nodeDistance": 120,
                    "centralGravity": 0.2,
                    "springLength": 200,
                    "springConstant": 0.05
                },
                "stabilization": {
                    "enabled": True,
                    "iterations": 100
                }
            }
        elif layout_algorithm == "circular":
            physics_options = {
                "enabled": False
            }

            # Compute circular layout and set fixed positions
            pos = nx.circular_layout(viz_graph)
            for node_id, coords in pos.items():
                if node_id in viz_graph.nodes:
                    x, y = coords
                    viz_graph.nodes[node_id]['x'] = float(x) * 500
                    viz_graph.nodes[node_id]['y'] = float(y) * 500
                    viz_graph.nodes[node_id]['physics'] = False
|
299 |
+
|
300 |
+
# Configure other options
|
301 |
+
options = {
|
302 |
+
"nodes": {
|
303 |
+
"font": {"size": 12},
|
304 |
+
"scaling": {"min": 10, "max": 30}
|
305 |
+
},
|
306 |
+
"edges": {
|
307 |
+
"color": {"inherit": True},
|
308 |
+
"smooth": {"enabled": True, "type": "dynamic"},
|
309 |
+
"arrows": {"to": {"enabled": True, "scaleFactor": 0.5}},
|
310 |
+
"font": {"size": 10, "align": "middle"}
|
311 |
+
},
|
312 |
+
"physics": physics_options,
|
313 |
+
"interaction": {
|
314 |
+
"hover": True,
|
315 |
+
"navigationButtons": True,
|
316 |
+
"keyboard": True,
|
317 |
+
"tooltipDelay": 100
|
318 |
+
}
|
319 |
+
}
|
320 |
+
|
321 |
+
# Set options and create the network
|
322 |
+
net.options = options
|
323 |
+
net.from_nx(viz_graph)
|
324 |
+
|
325 |
+
# Add custom CSS for better visualization
|
326 |
+
custom_css = """
|
327 |
+
<style>
|
328 |
+
.vis-network {
|
329 |
+
border: 1px solid #ddd;
|
330 |
+
border-radius: 5px;
|
331 |
+
}
|
332 |
+
.vis-tooltip {
|
333 |
+
position: absolute;
|
334 |
+
background-color: #f5f5f5;
|
335 |
+
border: 1px solid #ccc;
|
336 |
+
border-radius: 4px;
|
337 |
+
padding: 10px;
|
338 |
+
font-family: Arial, sans-serif;
|
339 |
+
font-size: 12px;
|
340 |
+
color: #333;
|
341 |
+
max-width: 300px;
|
342 |
+
z-index: 9999;
|
343 |
+
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
|
344 |
+
}
|
345 |
+
</style>
|
346 |
+
"""
|
347 |
+
|
348 |
+
# Generate the HTML and add custom CSS
|
349 |
+
html = net.generate_html()
|
350 |
+
html = html.replace("<style>", custom_css + "<style>")
|
351 |
+
|
352 |
+
# Add legend
|
353 |
+
legend_html = self._generate_legend_html(viz_graph)
|
354 |
+
html = html.replace("</body>", legend_html + "</body>")
|
355 |
+
|
356 |
+
return html
|
357 |
+
|
358 |
+
    def _generate_legend_html(self, graph: nx.Graph) -> str:
        """Generate a legend for the visualization."""
        # Collect unique groups
        groups = set()
        for _, attrs in graph.nodes(data=True):
            if "group" in attrs:
                groups.add(attrs["group"])

        # Generate HTML for legend
        legend_html = """
        <div id="graph-legend" style="position: absolute; top: 10px; right: 10px; background-color: rgba(255,255,255,0.8);
             padding: 10px; border-radius: 5px; border: 1px solid #ddd; max-width: 200px;">
            <strong>Legend:</strong>
            <ul style="list-style-type: none; padding-left: 0; margin-top: 5px;">
        """

        # Add items for each group
        for group in sorted(groups):
            color = "#97c2fc"  # Default color
            if group == "property":
                color = "#ffcc99"
            elif group == "class":
                color = "#a1d3a2"

            legend_html += f"""
            <li style="margin-bottom: 5px;">
                <span style="display: inline-block; width: 12px; height: 12px; border-radius: 50%;
                      background-color: {color}; margin-right: 5px;"></span>
                {group}
            </li>
            """

        # Close the legend container
        legend_html += """
            </ul>
            <div style="font-size: 10px; margin-top: 5px; color: #666;">
                Double-click to zoom, drag to pan, scroll to zoom in/out
            </div>
        </div>
        """

        return legend_html

    def get_graph_statistics(self) -> Dict[str, Any]:
        """
        Calculate statistics about the knowledge graph.

        Returns:
            A dictionary containing graph statistics
        """
        if not self.graph:
            return {}

        # Count nodes by type
        class_count = 0
        instance_count = 0
        property_count = 0

        for _, data in self.graph.nodes(data=True):
            node_type = data.get("type")
            if node_type == "class":
                class_count += 1
            elif node_type == "instance":
                instance_count += 1
                if "properties" in data:
                    property_count += len(data["properties"])

        # Count edges by type
        relationship_counts = {}
        for _, _, data in self.graph.edges(data=True):
            rel_type = data.get("type", "unknown")
            relationship_counts[rel_type] = relationship_counts.get(rel_type, 0) + 1

        # Calculate graph metrics
        try:
            # Some metrics only work on undirected graphs
            undirected = nx.Graph(self.graph)
            avg_degree = sum(dict(undirected.degree()).values()) / undirected.number_of_nodes()

            # Only calculate these if the graph is connected
            if nx.is_connected(undirected):
                avg_path_length = nx.average_shortest_path_length(undirected)
                diameter = nx.diameter(undirected)
            else:
                # Fall back to the largest connected component
                largest_cc = max(nx.connected_components(undirected), key=len)
                largest_cc_subgraph = undirected.subgraph(largest_cc)

                avg_path_length = nx.average_shortest_path_length(largest_cc_subgraph)
                diameter = nx.diameter(largest_cc_subgraph)

            # Calculate density
            density = nx.density(self.graph)

            # Calculate clustering coefficient
            clustering = nx.average_clustering(undirected)
        except Exception:
            # Degenerate graphs (e.g. empty or single-node) default to zero metrics
            avg_degree = 0
            avg_path_length = 0
            diameter = 0
            density = 0
            clustering = 0

        # Count instances per class type
        class_counts = defaultdict(int)
        for _, data in self.graph.nodes(data=True):
            if data.get("type") == "instance":
                class_type = data.get("class_type", "unknown")
                class_counts[class_type] += 1

        # Get nodes with highest centrality
        try:
            betweenness = nx.betweenness_centrality(self.graph)
            degree = nx.degree_centrality(self.graph)

            # Get top 5 nodes by each centrality measure
            top_betweenness = sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:5]
            top_degree = sorted(degree.items(), key=lambda x: x[1], reverse=True)[:5]

            central_nodes = {
                "betweenness": [{"node": node, "centrality": round(cent, 3)} for node, cent in top_betweenness],
                "degree": [{"node": node, "centrality": round(cent, 3)} for node, cent in top_degree]
            }
        except Exception:
            central_nodes = {}

        return {
            "node_count": self.graph.number_of_nodes(),
            "edge_count": self.graph.number_of_edges(),
            "class_count": class_count,
            "instance_count": instance_count,
            "property_count": property_count,
            "relationship_counts": relationship_counts,
            "class_instance_counts": dict(class_counts),
            "average_degree": avg_degree,
            "average_path_length": avg_path_length,
            "diameter": diameter,
            "density": density,
            "clustering_coefficient": clustering,
            "central_nodes": central_nodes
        }

    def find_paths_between_entities(
        self,
        source_entity: str,
        target_entity: str,
        max_length: int = 3
    ) -> List[List[Dict]]:
        """
        Find all paths between two entities up to a maximum length.

        Args:
            source_entity: Starting entity ID
            target_entity: Target entity ID
            max_length: Maximum path length

        Returns:
            A list of paths, where each path is a list of edge dictionaries
        """
        if not self.graph or source_entity not in self.graph or target_entity not in self.graph:
            return []

        # Use networkx to find simple paths
        try:
            simple_paths = list(nx.all_simple_paths(
                self.graph, source_entity, target_entity, cutoff=max_length
            ))
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return []

        # Convert paths to edge sequences
        paths = []
        for path in simple_paths:
            edge_sequence = []
            for i in range(len(path) - 1):
                source = path[i]
                target = path[i + 1]

                # There may be multiple edges between nodes
                edges = self.graph.get_edge_data(source, target)
                if edges:
                    for key, data in edges.items():
                        edge_sequence.append({
                            "source": source,
                            "target": target,
                            "type": data.get("type", "unknown")
                        })

            # Only include the path if it has meaningful relationships:
            # filter out paths that only contain structural relationships (subClassOf, instanceOf)
            meaningful_relationships = [edge for edge in edge_sequence
                                        if edge["type"] not in ["subClassOf", "instanceOf"]]

            if meaningful_relationships:
                paths.append(edge_sequence)

        # Sort paths by length (shorter paths first)
        paths.sort(key=len)

        return paths

    def get_entity_neighborhood(
        self,
        entity_id: str,
        max_distance: int = 1,
        include_classes: bool = True
    ) -> Dict[str, Any]:
        """
        Get the neighborhood of an entity.

        Args:
            entity_id: The central entity ID
            max_distance: Maximum distance from the central entity
            include_classes: Whether to include class relationships

        Returns:
            A dictionary containing the neighborhood information
        """
        if not self.graph or entity_id not in self.graph:
            return {}

        # Get nodes within max_distance of entity_id using BFS
        nodes_at_distance = {0: [entity_id]}
        visited = set([entity_id])

        for distance in range(1, max_distance + 1):
            nodes_at_distance[distance] = []

            for node in nodes_at_distance[distance - 1]:
                # Get neighbors in both directions
                neighbors = list(self.graph.successors(node)) + list(self.graph.predecessors(node))

                for neighbor in neighbors:
                    # Skip class nodes if not including classes
                    neighbor_data = self.graph.nodes.get(neighbor, {})
                    if not include_classes and neighbor_data.get("type") == "class":
                        continue

                    if neighbor not in visited:
                        nodes_at_distance[distance].append(neighbor)
                        visited.add(neighbor)

        # Flatten the nodes
        all_nodes = [node for nodes in nodes_at_distance.values() for node in nodes]

        # Extract the subgraph
        subgraph = self.graph.subgraph(all_nodes)

        # Build neighbor information
        neighbors = []
        for node in all_nodes:
            if node == entity_id:
                continue

            node_data = self.graph.nodes[node]

            # Determine the relations to the central entity
            relations = []

            # Check direct relationships where the central entity is the source
            edges_out = self.graph.get_edge_data(entity_id, node)
            if edges_out:
                for key, data in edges_out.items():
                    rel_type = data.get("type", "unknown")

                    # Skip structural relationships if not including classes
                    if not include_classes and rel_type in ["subClassOf", "instanceOf"]:
                        continue

                    relations.append({
                        "type": rel_type,
                        "direction": "outgoing"
                    })

            # Check direct relationships where the central entity is the target
            edges_in = self.graph.get_edge_data(node, entity_id)
            if edges_in:
                for key, data in edges_in.items():
                    rel_type = data.get("type", "unknown")

                    # Skip structural relationships if not including classes
                    if not include_classes and rel_type in ["subClassOf", "instanceOf"]:
                        continue

                    relations.append({
                        "type": rel_type,
                        "direction": "incoming"
                    })

            # Also find paths through intermediate nodes, but only when
            # no direct relationship exists
            if not relations:
                for path_length in range(2, max_distance + 1):
                    try:
                        # all_simple_paths has no "min_edges" argument, so find paths
                        # up to path_length and keep those of exactly that length
                        paths = [
                            p for p in nx.all_simple_paths(
                                self.graph, entity_id, node, cutoff=path_length
                            )
                            if len(p) - 1 == path_length
                        ]

                        for path in paths:
                            if len(path) > 1:  # Path should have at least 2 nodes
                                intermediate_nodes = path[1:-1]  # Skip source and target

                                # Format the path as a relation
                                path_relation = {
                                    "type": "indirect_connection",
                                    "direction": "outgoing",
                                    "path_length": len(path) - 1,
                                    "intermediates": intermediate_nodes
                                }

                                relations.append(path_relation)

                                # Only need one example of an indirect path
                                break
                    except (nx.NetworkXNoPath, nx.NodeNotFound):
                        pass

            # Only include neighbors with relations
            if relations:
                neighbors.append({
                    "id": node,
                    "type": node_data.get("type"),
                    "class_type": node_data.get("class_type"),
                    "properties": node_data.get("properties", {}),
                    "relations": relations,
                    "distance": next(dist for dist, nodes in nodes_at_distance.items() if node in nodes)
                })

        # Group neighbors by distance
        neighbors_by_distance = defaultdict(list)
        for neighbor in neighbors:
            neighbors_by_distance[neighbor["distance"]].append(neighbor)

        # Get central entity info
        central_data = self.graph.nodes[entity_id]

        return {
            "central_entity": {
                "id": entity_id,
                "type": central_data.get("type"),
                "class_type": central_data.get("class_type", ""),
                "properties": central_data.get("properties", {})
            },
            "neighbors": neighbors,
            "neighbors_by_distance": dict(neighbors_by_distance),
            "total_neighbors": len(neighbors)
        }

    def find_common_patterns(self) -> List[Dict[str, Any]]:
        """
        Find common patterns and structures in the knowledge graph.

        Returns:
            A list of pattern dictionaries
        """
        if not self.graph:
            return []

        patterns = []

        # Find common relationship patterns
        relationship_patterns = self._find_relationship_patterns()
        if relationship_patterns:
            patterns.extend(relationship_patterns)

        # Find hub entities (entities with many connections)
        hub_entities = self._find_hub_entities()
        if hub_entities:
            patterns.append({
                "type": "hub_entities",
                "description": "Entities with high connectivity serving as knowledge hubs",
                "entities": hub_entities
            })

        # Find common property patterns
        property_patterns = self._find_property_patterns()
        if property_patterns:
            patterns.extend(property_patterns)

        return patterns

    def _find_relationship_patterns(self) -> List[Dict[str, Any]]:
        """Find common relationship patterns in the graph."""
        # Count relationship triplets (source_type, relation, target_type)
        triplet_counts = defaultdict(int)

        for source, target, data in self.graph.edges(data=True):
            rel_type = data.get("type", "unknown")

            # Skip structural relationships
            if rel_type in ["subClassOf", "instanceOf"]:
                continue

            # Get node types
            source_data = self.graph.nodes[source]
            target_data = self.graph.nodes[target]

            source_type = (
                source_data.get("class_type")
                if source_data.get("type") == "instance"
                else source_data.get("type")
            )

            target_type = (
                target_data.get("class_type")
                if target_data.get("type") == "instance"
                else target_data.get("type")
            )

            if source_type and target_type:
                triplet = (source_type, rel_type, target_type)
                triplet_counts[triplet] += 1

        # Keep patterns with significant frequency (more than one occurrence)
        patterns = []
        for triplet, count in triplet_counts.items():
            if count > 1:
                source_type, rel_type, target_type = triplet

                # Find examples of this pattern
                examples = []
                for source, target, data in self.graph.edges(data=True):
                    if len(examples) >= 3:  # Limit to 3 examples
                        break

                    rel = data.get("type", "unknown")
                    if rel != rel_type:
                        continue

                    source_data = self.graph.nodes[source]
                    target_data = self.graph.nodes[target]

                    current_source_type = (
                        source_data.get("class_type")
                        if source_data.get("type") == "instance"
                        else source_data.get("type")
                    )

                    current_target_type = (
                        target_data.get("class_type")
                        if target_data.get("type") == "instance"
                        else target_data.get("type")
                    )

                    if current_source_type == source_type and current_target_type == target_type:
                        # Get readable names if available
                        source_name = source
                        if source_data.get("type") == "instance" and "properties" in source_data:
                            properties = source_data["properties"]
                            if "name" in properties:
                                source_name = properties["name"]

                        target_name = target
                        if target_data.get("type") == "instance" and "properties" in target_data:
                            properties = target_data["properties"]
                            if "name" in properties:
                                target_name = properties["name"]

                        examples.append({
                            "source": source,
                            "source_name": source_name,
                            "target": target,
                            "target_name": target_name,
                            "relationship": rel_type
                        })

                patterns.append({
                    "type": "relationship_pattern",
                    "description": f"{source_type} {rel_type} {target_type}",
                    "source_type": source_type,
                    "relationship": rel_type,
                    "target_type": target_type,
                    "count": count,
                    "examples": examples
                })

        # Sort by frequency
        patterns.sort(key=lambda x: x["count"], reverse=True)

        return patterns

    def _find_hub_entities(self) -> List[Dict[str, Any]]:
        """Find entities that serve as hubs (many connections)."""
        # Calculate degree centrality
        degree = nx.degree_centrality(self.graph)

        # Get top entities by degree
        top_entities = sorted(degree.items(), key=lambda x: x[1], reverse=True)[:10]

        hub_entities = []
        for node, centrality in top_entities:
            node_data = self.graph.nodes[node]
            node_type = node_data.get("type")

            # Only consider instance nodes
            if node_type == "instance":
                # Get class type
                class_type = node_data.get("class_type", "unknown")

                # Get name if available
                name = node
                if "properties" in node_data and "name" in node_data["properties"]:
                    name = node_data["properties"]["name"]

                # Count relationships by type
                relationships = defaultdict(int)
                for _, _, data in self.graph.edges([node], data=True):
                    rel_type = data.get("type", "unknown")
                    if rel_type not in ["subClassOf", "instanceOf"]:
                        relationships[rel_type] += 1

                hub_entities.append({
                    "id": node,
                    "name": name,
                    "type": class_type,
                    "centrality": centrality,
                    "relationships": dict(relationships),
                    "total_connections": sum(relationships.values())
                })

        # Sort by total connections
        hub_entities.sort(key=lambda x: x["total_connections"], reverse=True)

        return hub_entities

    def _find_property_patterns(self) -> List[Dict[str, Any]]:
        """Find common property patterns in instance data."""
        # Track properties by class type
        properties_by_class = defaultdict(lambda: defaultdict(int))

        for node, data in self.graph.nodes(data=True):
            if data.get("type") == "instance":
                class_type = data.get("class_type", "unknown")

                if "properties" in data:
                    for prop in data["properties"].keys():
                        properties_by_class[class_type][prop] += 1

        # Find common property combinations
        patterns = []
        for class_type, props in properties_by_class.items():
            # Sort properties by frequency
            sorted_props = sorted(props.items(), key=lambda x: x[1], reverse=True)

            # Only include classes with multiple instances
            class_instances = sum(1 for _, data in self.graph.nodes(data=True)
                                  if data.get("type") == "instance" and data.get("class_type") == class_type)

            if class_instances > 1:
                common_props = [prop for prop, count in sorted_props if count > 1]

                if common_props:
                    patterns.append({
                        "type": "property_pattern",
                        "description": f"Common properties for {class_type} instances",
                        "class_type": class_type,
                        "instance_count": class_instances,
                        "common_properties": common_props,
                        "property_frequencies": {prop: count for prop, count in sorted_props}
                    })

        return patterns
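The connectivity fallback used in `get_graph_statistics` above (path-length metrics computed on the largest connected component when the graph is disconnected) can be exercised in isolation. A minimal sketch with networkx and a toy graph of our own, not data from the app:

```python
import networkx as nx

# A disconnected graph: a triangle plus an isolated edge.
g = nx.Graph()
g.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")])

# Average degree over all nodes: degree sum 8 over 5 nodes = 1.6
avg_degree = sum(dict(g.degree()).values()) / g.number_of_nodes()

if nx.is_connected(g):
    diameter = nx.diameter(g)
else:
    # Fall back to the largest connected component, as get_graph_statistics does
    largest_cc = max(nx.connected_components(g), key=len)
    diameter = nx.diameter(g.subgraph(largest_cc))  # triangle -> diameter 1

print(avg_degree, diameter)  # 1.6 1
```

Without the fallback, `nx.diameter` would raise `NetworkXError` on any disconnected graph, which is why the statistics code checks `is_connected` first.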
src/ontology_manager.py
ADDED
@@ -0,0 +1,440 @@
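The `_build_graph` loader below reads three top-level keys from the ontology JSON. A minimal sketch of that shape, inferred from the code (field names `classes`, `relationships`, `instances`, `subClassOf`, `id`, `type`, `properties`, `target`, and `name` come from the loader; the concrete values, and the `domain`/`range` fields, are illustrative assumptions):

```json
{
  "classes": {
    "Person": {"description": "A human being", "properties": ["name"]},
    "Employee": {
      "description": "A person employed by the organization",
      "subClassOf": "Person",
      "properties": ["name", "role"]
    }
  },
  "relationships": [
    {"name": "worksFor", "domain": "Employee", "range": "Organization"}
  ],
  "instances": [
    {
      "id": "emp1",
      "type": "Employee",
      "properties": {"name": "Alice", "role": "Engineer"},
      "relationships": [{"type": "worksFor", "target": "org1"}]
    }
  ]
}
```

Note that `_build_graph` only indexes relationships by `name`; any extra fields on a relationship entry are carried along but not interpreted here.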
# src/ontology_manager.py

import json
import networkx as nx
from typing import Dict, List, Any, Optional, Union, Set


class OntologyManager:
    """
    Manages the ontology model and provides methods for querying and navigating
    the ontological structure.
    """

    def __init__(self, ontology_path: str):
        """
        Initialize the ontology manager with a path to the ontology JSON file.

        Args:
            ontology_path: Path to the JSON file containing the ontology model
        """
        self.ontology_path = ontology_path
        self.ontology_data = self._load_ontology()
        self.graph = self._build_graph()

    def _load_ontology(self) -> Dict:
        """Load the ontology from the JSON file."""
        with open(self.ontology_path, 'r') as f:
            return json.load(f)

    def _build_graph(self) -> nx.MultiDiGraph:
        """Construct a directed graph from the ontology data."""
        G = nx.MultiDiGraph()

        # Add class nodes
        for class_id, class_data in self.ontology_data["classes"].items():
            G.add_node(class_id,
                       type="class",
                       description=class_data.get("description", ""),
                       properties=class_data.get("properties", []))

            # Add subclass relationships
            if "subClassOf" in class_data:
                G.add_edge(class_id, class_data["subClassOf"], type="subClassOf")

        # Index relationship type information by name
        self.relationship_info = {r["name"]: r for r in self.ontology_data["relationships"]}

        # Add instance nodes and their relationships
        for instance in self.ontology_data["instances"]:
            G.add_node(instance["id"],
                       type="instance",
                       class_type=instance["type"],
                       properties=instance.get("properties", {}))

            # Add instance-of-class relationship
            G.add_edge(instance["id"], instance["type"], type="instanceOf")

            # Add relationships between instances
            for rel in instance.get("relationships", []):
                G.add_edge(instance["id"], rel["target"], type=rel["type"])

        return G

    def get_classes(self) -> List[str]:
        """Return a list of all class names in the ontology."""
        return list(self.ontology_data["classes"].keys())

    def get_class_hierarchy(self) -> Dict[str, List[str]]:
        """Return a dictionary mapping each class to its subclasses."""
        hierarchy = {}
        for class_id in self.get_classes():
            hierarchy[class_id] = []

        for class_id, class_data in self.ontology_data["classes"].items():
            if "subClassOf" in class_data:
                parent = class_data["subClassOf"]
                if parent in hierarchy:
                    hierarchy[parent].append(class_id)

        return hierarchy

    def get_instances_of_class(self, class_name: str, include_subclasses: bool = True) -> List[str]:
        """
        Get all instances of a given class.

        Args:
            class_name: The name of the class
            include_subclasses: Whether to include instances of subclasses

        Returns:
            A list of instance IDs
        """
        if include_subclasses:
            # Get all subclasses recursively
            subclasses = set(self._get_all_subclasses(class_name))
            subclasses.add(class_name)

            # Get instances of every class in the set
            instances = []
            for class_id in subclasses:
                instances.extend([
                    n for n, attr in self.graph.nodes(data=True)
                    if attr.get("type") == "instance" and attr.get("class_type") == class_id
                ])
            return instances
        else:
            # Just get direct instances
            return [
                n for n, attr in self.graph.nodes(data=True)
                if attr.get("type") == "instance" and attr.get("class_type") == class_name
            ]

    def _get_all_subclasses(self, class_name: str) -> List[str]:
        """Recursively get all subclasses of a given class."""
        subclasses = []
        direct_subclasses = [
            src for src, dst, data in self.graph.edges(data=True)
            if dst == class_name and data.get("type") == "subClassOf"
        ]

        for subclass in direct_subclasses:
            subclasses.append(subclass)
            subclasses.extend(self._get_all_subclasses(subclass))

        return subclasses

    def get_relationships(self, entity_id: str, relationship_type: Optional[str] = None) -> List[Dict]:
        """
        Get all relationships for a given entity, optionally filtered by type.

        Args:
            entity_id: The ID of the entity
            relationship_type: Optional relationship type to filter by

        Returns:
            A list of dictionaries containing relationship information
        """
        relationships = []

        # Look at outgoing edges
        for _, target, data in self.graph.out_edges(entity_id, data=True):
            rel_type = data.get("type")
            if rel_type not in ("instanceOf", "subClassOf"):
                if relationship_type is None or rel_type == relationship_type:
                    relationships.append({
                        "type": rel_type,
                        "target": target,
                        "direction": "outgoing"
                    })

        # Look at incoming edges
        for source, _, data in self.graph.in_edges(entity_id, data=True):
            rel_type = data.get("type")
            if rel_type not in ("instanceOf", "subClassOf"):
                if relationship_type is None or rel_type == relationship_type:
                    relationships.append({
                        "type": rel_type,
                        "source": source,
                        "direction": "incoming"
                    })

        return relationships

    def find_paths(self, source_id: str, target_id: str, max_length: int = 3) -> List[List[Dict]]:
        """
        Find all paths between two entities up to a maximum length.

        Args:
            source_id: Starting entity ID
            target_id: Target entity ID
            max_length: Maximum path length

        Returns:
            A list of paths, where each path is a list of relationship dictionaries
        """
        paths = []

        # Use networkx to find simple paths
        simple_paths = nx.all_simple_paths(self.graph, source_id, target_id, cutoff=max_length)

        for path in simple_paths:
            path_with_edges = []
            for i in range(len(path) - 1):
                source = path[i]
                target = path[i + 1]
                # There may be multiple edges between nodes
                edges = self.graph.get_edge_data(source, target)
                if edges:
                    for key, data in edges.items():
                        path_with_edges.append({
                            "source": source,
                            "target": target,
                            "type": data.get("type", "unknown")
                        })
            paths.append(path_with_edges)

        return paths

    def get_entity_info(self, entity_id: str) -> Dict:
        """
        Get detailed information about an entity.

        Args:
            entity_id: The ID of the entity

        Returns:
            A dictionary with entity information
        """
        if entity_id not in self.graph:
            return {}

        node_data = self.graph.nodes[entity_id]
        entity_type = node_data.get("type")
|
215 |
+
|
216 |
+
if entity_type == "instance":
|
217 |
+
# Get class information
|
218 |
+
class_type = node_data.get("class_type")
|
219 |
+
class_info = self.ontology_data["classes"].get(class_type, {})
|
220 |
+
|
221 |
+
return {
|
222 |
+
"id": entity_id,
|
223 |
+
"type": entity_type,
|
224 |
+
"class": class_type,
|
225 |
+
"class_description": class_info.get("description", ""),
|
226 |
+
"properties": node_data.get("properties", {}),
|
227 |
+
"relationships": self.get_relationships(entity_id)
|
228 |
+
}
|
229 |
+
elif entity_type == "class":
|
230 |
+
return {
|
231 |
+
"id": entity_id,
|
232 |
+
"type": entity_type,
|
233 |
+
"description": node_data.get("description", ""),
|
234 |
+
"properties": node_data.get("properties", []),
|
235 |
+
"subclasses": self._get_all_subclasses(entity_id),
|
236 |
+
"instances": self.get_instances_of_class(entity_id)
|
237 |
+
}
|
238 |
+
|
239 |
+
return node_data
|
240 |
+
|
241 |
+
def get_text_representation(self) -> str:
|
242 |
+
"""
|
243 |
+
Generate a text representation of the ontology for embedding.
|
244 |
+
|
245 |
+
Returns:
|
246 |
+
A string containing the textual representation of the ontology
|
247 |
+
"""
|
248 |
+
text_chunks = []
|
249 |
+
|
250 |
+
# Class definitions
|
251 |
+
for class_id, class_data in self.ontology_data["classes"].items():
|
252 |
+
chunk = f"Class: {class_id}\n"
|
253 |
+
chunk += f"Description: {class_data.get('description', '')}\n"
|
254 |
+
|
255 |
+
if "subClassOf" in class_data:
|
256 |
+
chunk += f"{class_id} is a subclass of {class_data['subClassOf']}.\n"
|
257 |
+
|
258 |
+
if "properties" in class_data:
|
259 |
+
chunk += f"{class_id} has properties: {', '.join(class_data['properties'])}.\n"
|
260 |
+
|
261 |
+
text_chunks.append(chunk)
|
262 |
+
|
263 |
+
# Relationship definitions
|
264 |
+
for rel in self.ontology_data["relationships"]:
|
265 |
+
chunk = f"Relationship: {rel['name']}\n"
|
266 |
+
chunk += f"Domain: {rel['domain']}, Range: {rel['range']}\n"
|
267 |
+
chunk += f"Description: {rel.get('description', '')}\n"
|
268 |
+
chunk += f"Cardinality: {rel.get('cardinality', 'many-to-many')}\n"
|
269 |
+
|
270 |
+
if "inverse" in rel:
|
271 |
+
chunk += f"The inverse relationship is {rel['inverse']}.\n"
|
272 |
+
|
273 |
+
text_chunks.append(chunk)
|
274 |
+
|
275 |
+
# Rules
|
276 |
+
for rule in self.ontology_data.get("rules", []):
|
277 |
+
chunk = f"Rule: {rule.get('id', '')}\n"
|
278 |
+
chunk += f"Description: {rule.get('description', '')}\n"
|
279 |
+
text_chunks.append(chunk)
|
280 |
+
|
281 |
+
# Instance data
|
282 |
+
for instance in self.ontology_data["instances"]:
|
283 |
+
chunk = f"Instance: {instance['id']}\n"
|
284 |
+
chunk += f"Type: {instance['type']}\n"
|
285 |
+
|
286 |
+
# Properties
|
287 |
+
if "properties" in instance:
|
288 |
+
props = []
|
289 |
+
for key, value in instance["properties"].items():
|
290 |
+
if isinstance(value, list):
|
291 |
+
props.append(f"{key}: {', '.join(str(v) for v in value)}")
|
292 |
+
else:
|
293 |
+
props.append(f"{key}: {value}")
|
294 |
+
|
295 |
+
if props:
|
296 |
+
chunk += "Properties:\n- " + "\n- ".join(props) + "\n"
|
297 |
+
|
298 |
+
# Relationships
|
299 |
+
if "relationships" in instance:
|
300 |
+
rels = []
|
301 |
+
for rel in instance["relationships"]:
|
302 |
+
rels.append(f"{rel['type']} {rel['target']}")
|
303 |
+
|
304 |
+
if rels:
|
305 |
+
chunk += "Relationships:\n- " + "\n- ".join(rels) + "\n"
|
306 |
+
|
307 |
+
text_chunks.append(chunk)
|
308 |
+
|
309 |
+
return "\n\n".join(text_chunks)
|
310 |
+
|
311 |
+
def query_by_relationship(self, source_type: str, relationship: str, target_type: str) -> List[Dict]:
|
312 |
+
"""
|
313 |
+
Query for instances connected by a specific relationship.
|
314 |
+
|
315 |
+
Args:
|
316 |
+
source_type: Type of the source entity
|
317 |
+
relationship: Type of relationship
|
318 |
+
target_type: Type of the target entity
|
319 |
+
|
320 |
+
Returns:
|
321 |
+
A list of matching relationship dictionaries
|
322 |
+
"""
|
323 |
+
results = []
|
324 |
+
|
325 |
+
# Get all instances of the source type
|
326 |
+
source_instances = self.get_instances_of_class(source_type)
|
327 |
+
|
328 |
+
for source_id in source_instances:
|
329 |
+
# Get relationships of the specified type
|
330 |
+
relationships = self.get_relationships(source_id, relationship)
|
331 |
+
|
332 |
+
for rel in relationships:
|
333 |
+
if rel["direction"] == "outgoing" and "target" in rel:
|
334 |
+
target_id = rel["target"]
|
335 |
+
target_data = self.graph.nodes[target_id]
|
336 |
+
|
337 |
+
# Check if the target is of the right type
|
338 |
+
if (target_data.get("type") == "instance" and
|
339 |
+
target_data.get("class_type") == target_type):
|
340 |
+
results.append({
|
341 |
+
"source": source_id,
|
342 |
+
"source_properties": self.graph.nodes[source_id].get("properties", {}),
|
343 |
+
"relationship": relationship,
|
344 |
+
"target": target_id,
|
345 |
+
"target_properties": target_data.get("properties", {})
|
346 |
+
})
|
347 |
+
|
348 |
+
return results
|
349 |
+
|
350 |
+
def get_semantic_context(self, query: str) -> List[str]:
|
351 |
+
"""
|
352 |
+
Retrieve relevant semantic context from the ontology based on a query.
|
353 |
+
|
354 |
+
This method identifies entities and relationships mentioned in the query
|
355 |
+
and returns contextual information about them from the ontology.
|
356 |
+
|
357 |
+
Args:
|
358 |
+
query: The query string to analyze
|
359 |
+
|
360 |
+
Returns:
|
361 |
+
A list of text chunks providing relevant ontological context
|
362 |
+
"""
|
363 |
+
# This is a simple implementation - a more sophisticated one would use
|
364 |
+
# entity recognition and semantic parsing
|
365 |
+
|
366 |
+
query_lower = query.lower()
|
367 |
+
context_chunks = []
|
368 |
+
|
369 |
+
# Check for class mentions
|
370 |
+
for class_id in self.get_classes():
|
371 |
+
if class_id.lower() in query_lower:
|
372 |
+
# Add class information
|
373 |
+
class_data = self.ontology_data["classes"][class_id]
|
374 |
+
chunk = f"Class {class_id}: {class_data.get('description', '')}\n"
|
375 |
+
|
376 |
+
# Add subclass information
|
377 |
+
if "subClassOf" in class_data:
|
378 |
+
parent = class_data["subClassOf"]
|
379 |
+
chunk += f"{class_id} is a subclass of {parent}.\n"
|
380 |
+
|
381 |
+
# Add property information
|
382 |
+
if "properties" in class_data:
|
383 |
+
chunk += f"{class_id} has properties: {', '.join(class_data['properties'])}.\n"
|
384 |
+
|
385 |
+
context_chunks.append(chunk)
|
386 |
+
|
387 |
+
# Also add some instance examples
|
388 |
+
instances = self.get_instances_of_class(class_id, include_subclasses=False)[:3]
|
389 |
+
if instances:
|
390 |
+
instance_chunk = f"Examples of {class_id}:\n"
|
391 |
+
for inst_id in instances:
|
392 |
+
props = self.graph.nodes[inst_id].get("properties", {})
|
393 |
+
if "name" in props:
|
394 |
+
instance_chunk += f"- {inst_id} ({props['name']})\n"
|
395 |
+
else:
|
396 |
+
instance_chunk += f"- {inst_id}\n"
|
397 |
+
context_chunks.append(instance_chunk)
|
398 |
+
|
399 |
+
# Check for relationship mentions
|
400 |
+
for rel in self.ontology_data["relationships"]:
|
401 |
+
if rel["name"].lower() in query_lower:
|
402 |
+
chunk = f"Relationship {rel['name']}: {rel.get('description', '')}\n"
|
403 |
+
chunk += f"This relationship connects {rel['domain']} to {rel['range']}.\n"
|
404 |
+
|
405 |
+
# Add examples
|
406 |
+
examples = self.query_by_relationship(rel['domain'], rel['name'], rel['range'])[:3]
|
407 |
+
if examples:
|
408 |
+
chunk += "Examples:\n"
|
409 |
+
for ex in examples:
|
410 |
+
source_props = ex["source_properties"]
|
411 |
+
target_props = ex["target_properties"]
|
412 |
+
|
413 |
+
source_name = source_props.get("name", ex["source"])
|
414 |
+
target_name = target_props.get("name", ex["target"])
|
415 |
+
|
416 |
+
chunk += f"- {source_name} {rel['name']} {target_name}\n"
|
417 |
+
|
418 |
+
context_chunks.append(chunk)
|
419 |
+
|
420 |
+
# If we found nothing specific, add general ontology info
|
421 |
+
if not context_chunks:
|
422 |
+
# Add information about top-level classes
|
423 |
+
top_classes = [c for c, data in self.ontology_data["classes"].items()
|
424 |
+
if "subClassOf" not in data or data["subClassOf"] == "Entity"]
|
425 |
+
|
426 |
+
if top_classes:
|
427 |
+
chunk = "Main classes in the ontology:\n"
|
428 |
+
for cls in top_classes:
|
429 |
+
desc = self.ontology_data["classes"][cls].get("description", "")
|
430 |
+
chunk += f"- {cls}: {desc}\n"
|
431 |
+
context_chunks.append(chunk)
|
432 |
+
|
433 |
+
# Add information about key relationships
|
434 |
+
if self.ontology_data["relationships"]:
|
435 |
+
chunk = "Key relationships in the ontology:\n"
|
436 |
+
for rel in self.ontology_data["relationships"][:5]: # Top 5 relationships
|
437 |
+
chunk += f"- {rel['name']}: {rel.get('description', '')}\n"
|
438 |
+
context_chunks.append(chunk)
|
439 |
+
|
440 |
+
return context_chunks
|
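The recursive subclass walk in `_get_all_subclasses` above can be sketched independently of networkx as a plain traversal over `(subclass, superclass)` pairs. This is a minimal stand-alone version; the `edges` data below is a hypothetical toy hierarchy, not the ontology shipped with this Space:

```python
from typing import List

# Hypothetical subClassOf edges: (subclass, superclass)
edges = [
    ("Manager", "Employee"),
    ("Executive", "Manager"),
    ("Employee", "Person"),
]

def get_all_subclasses(class_name: str) -> List[str]:
    """Recursively collect every direct and indirect subclass,
    mirroring the depth-first order used by _get_all_subclasses."""
    direct = [src for src, dst in edges if dst == class_name]
    subclasses = []
    for sub in direct:
        subclasses.append(sub)
        subclasses.extend(get_all_subclasses(sub))
    return subclasses

print(get_all_subclasses("Person"))  # ['Employee', 'Manager', 'Executive']
```

Note the pre-order result: each direct subclass is appended before its own descendants, which is why `get_instances_of_class(..., include_subclasses=True)` sees the full transitive closure.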
src/semantic_retriever.py
ADDED
@@ -0,0 +1,233 @@
# src/semantic_retriever.py

from typing import List, Dict, Any, Tuple, Optional
import numpy as np
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from src.ontology_manager import OntologyManager

class SemanticRetriever:
    """
    Enhanced retrieval system that combines vector search with ontology awareness.
    """

    def __init__(
        self,
        ontology_manager: OntologyManager,
        embeddings_model=None,
        text_chunks: Optional[List[str]] = None
    ):
        """
        Initialize the semantic retriever.

        Args:
            ontology_manager: The ontology manager instance
            embeddings_model: The embeddings model to use (defaults to OpenAIEmbeddings)
            text_chunks: Optional list of text chunks to add to the vector store
        """
        self.ontology_manager = ontology_manager
        self.embeddings = embeddings_model or OpenAIEmbeddings()

        # Create a vector store with the text representation of the ontology
        ontology_text = ontology_manager.get_text_representation()
        self.ontology_chunks = self._split_text(ontology_text)

        # Add additional text chunks if provided
        if text_chunks:
            self.text_chunks = text_chunks
            all_chunks = self.ontology_chunks + text_chunks
        else:
            self.text_chunks = []
            all_chunks = self.ontology_chunks

        # Convert to Document objects for FAISS
        documents = [Document(page_content=chunk,
                              metadata={"source": "ontology" if i < len(self.ontology_chunks) else "text"})
                     for i, chunk in enumerate(all_chunks)]

        # Create the vector store
        self.vector_store = FAISS.from_documents(documents, self.embeddings)

    def _split_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
        """Split text into chunks for embedding."""
        chunks = []
        text_length = len(text)

        for i in range(0, text_length, chunk_size - overlap):
            chunk = text[i:i + chunk_size]
            if len(chunk) < 50:  # Skip very small chunks
                continue
            chunks.append(chunk)

        return chunks

    def retrieve(self, query: str, k: int = 4, include_ontology_context: bool = True) -> List[Document]:
        """
        Retrieve relevant documents using a hybrid approach.

        Args:
            query: The query string
            k: Number of documents to retrieve
            include_ontology_context: Whether to include additional ontology context

        Returns:
            A list of retrieved documents
        """
        # Get semantic context from the ontology
        if include_ontology_context:
            ontology_context = self.ontology_manager.get_semantic_context(query)
        else:
            ontology_context = []

        # Perform vector similarity search
        vector_results = self.vector_store.similarity_search(query, k=k)

        # Combine results
        combined_results = vector_results

        # Add ontology context as additional documents
        for i, context in enumerate(ontology_context):
            combined_results.append(Document(
                page_content=context,
                metadata={"source": "ontology_context", "context_id": i}
            ))

        return combined_results

    def retrieve_with_paths(self, query: str, k: int = 4) -> Dict[str, Any]:
        """
        Enhanced retrieval that includes semantic paths between entities.

        Args:
            query: The query string
            k: Number of documents to retrieve

        Returns:
            A dictionary containing retrieved documents and semantic paths
        """
        # Basic retrieval
        basic_results = self.retrieve(query, k)

        # Extract potential entities from the query (simplified approach)
        # A more sophisticated approach would use NER or entity linking
        entity_types = ["Product", "Department", "Employee", "Manager", "Customer", "Feedback"]
        query_words = query.lower().split()

        potential_entities = []
        for entity_type in entity_types:
            if entity_type.lower() in query_words:
                # Get instances of this type
                instances = self.ontology_manager.get_instances_of_class(entity_type)
                if instances:
                    # Just take the first few for demonstration
                    potential_entities.extend(instances[:2])

        # Find paths between potential entities
        paths = []
        if len(potential_entities) >= 2:
            for i in range(len(potential_entities)):
                for j in range(i + 1, len(potential_entities)):
                    source = potential_entities[i]
                    target = potential_entities[j]

                    # Find paths between these entities
                    entity_paths = self.ontology_manager.find_paths(source, target, max_length=3)

                    if entity_paths:
                        for path in entity_paths:
                            # Convert path to text
                            path_text = self._path_to_text(path)
                            paths.append({
                                "source": source,
                                "target": target,
                                "path": path,
                                "text": path_text
                            })

        # Convert paths to documents
        path_documents = []
        for i, path_info in enumerate(paths):
            path_documents.append(Document(
                page_content=path_info["text"],
                metadata={
                    "source": "semantic_path",
                    "path_id": i,
                    "source_entity": path_info["source"],
                    "target_entity": path_info["target"]
                }
            ))

        return {
            "documents": basic_results + path_documents,
            "paths": paths
        }

    def _path_to_text(self, path: List[Dict]) -> str:
        """Convert a path to a text description."""
        if not path:
            return ""

        text_parts = []
        for edge in path:
            source = edge["source"]
            target = edge["target"]
            relation = edge["type"]

            # Get entity information
            source_info = self.ontology_manager.get_entity_info(source)
            target_info = self.ontology_manager.get_entity_info(target)

            # Get names if available
            source_name = source
            if "properties" in source_info and "name" in source_info["properties"]:
                source_name = source_info["properties"]["name"]

            target_name = target
            if "properties" in target_info and "name" in target_info["properties"]:
                target_name = target_info["properties"]["name"]

            # Describe the relationship
            text_parts.append(f"{source_name} {relation} {target_name}")

        return " -> ".join(text_parts)

    def search_by_property(self, class_type: str, property_name: str, property_value: str) -> List[Document]:
        """
        Search for instances of a class with a specific property value.

        Args:
            class_type: The class to search in
            property_name: The property name to match
            property_value: The property value to match

        Returns:
            A list of matched entities as documents
        """
        instances = self.ontology_manager.get_instances_of_class(class_type)

        results = []
        for instance_id in instances:
            entity_info = self.ontology_manager.get_entity_info(instance_id)
            if "properties" in entity_info:
                properties = entity_info["properties"]
                if property_name in properties:
                    # Simple string matching (could be enhanced with fuzzy matching)
                    if str(properties[property_name]).lower() == property_value.lower():
                        # Convert to document
                        doc_content = f"Instance: {instance_id}\n"
                        doc_content += f"Type: {class_type}\n"
                        doc_content += "Properties:\n"

                        for prop_name, prop_value in properties.items():
                            doc_content += f"- {prop_name}: {prop_value}\n"

                        results.append(Document(
                            page_content=doc_content,
                            metadata={
                                "source": "property_search",
                                "instance_id": instance_id,
                                "class_type": class_type
                            }
                        ))

        return results
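`_split_text` above is a fixed-width sliding window: it steps through the text in strides of `chunk_size - overlap`, so consecutive chunks share `overlap` characters. A stand-alone sketch of the same arithmetic, using the class defaults (chunk_size=500, overlap=50, minimum chunk 50 characters):

```python
from typing import List

def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Slide a chunk_size window over the text with a stride of
    (chunk_size - overlap); drop trailing fragments under 50 chars."""
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if len(chunk) < 50:  # skip very small chunks
            continue
        chunks.append(chunk)
    return chunks

chunks = split_text("x" * 1000)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 100]
```

With a 1000-character input the stride is 450, so windows start at 0, 450, and 900; the last window yields a 100-character chunk, which survives the 50-character minimum.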
src/visualization.py
ADDED
@@ -0,0 +1,1564 @@
1 |
+
# src/visualization.py
|
2 |
+
|
3 |
+
import streamlit as st
|
4 |
+
import json
|
5 |
+
import networkx as nx
|
6 |
+
import pandas as pd
|
7 |
+
from typing import Dict, List, Any, Optional, Set, Tuple
|
8 |
+
import plotly.graph_objects as go
|
9 |
+
import plotly.express as px
|
10 |
+
import matplotlib.pyplot as plt
|
11 |
+
import matplotlib.colors as mcolors
|
12 |
+
from collections import defaultdict
|
13 |
+
import math
|
14 |
+
|
15 |
+
def render_html_in_streamlit(html_content: str):
|
16 |
+
"""Display HTML content in Streamlit using an iframe."""
|
17 |
+
import base64
|
18 |
+
|
19 |
+
# Encode the HTML content
|
20 |
+
encoded_html = base64.b64encode(html_content.encode()).decode()
|
21 |
+
|
22 |
+
# Create an iframe with the data URL
|
23 |
+
iframe_html = f"""
|
24 |
+
<iframe
|
25 |
+
srcdoc="{encoded_html}"
|
26 |
+
width="100%"
|
27 |
+
height="600px"
|
28 |
+
frameborder="0"
|
29 |
+
allowfullscreen>
|
30 |
+
</iframe>
|
31 |
+
"""
|
32 |
+
|
33 |
+
# Display the iframe
|
34 |
+
st.markdown(iframe_html, unsafe_allow_html=True)
|
35 |
+
|
36 |
+
|
37 |
+
def display_ontology_stats(ontology_manager):
|
38 |
+
"""Display statistics and visualizations about the ontology."""
|
39 |
+
st.subheader("📊 Ontology Structure and Statistics")
|
40 |
+
|
41 |
+
# Get basic stats
|
42 |
+
classes = ontology_manager.get_classes()
|
43 |
+
class_hierarchy = ontology_manager.get_class_hierarchy()
|
44 |
+
|
45 |
+
# Count instances per class
|
46 |
+
class_counts = []
|
47 |
+
for class_name in classes:
|
48 |
+
instance_count = len(ontology_manager.get_instances_of_class(class_name, include_subclasses=False))
|
49 |
+
class_counts.append({
|
50 |
+
"Class": class_name,
|
51 |
+
"Instances": instance_count
|
52 |
+
})
|
53 |
+
|
54 |
+
# Display summary metrics
|
55 |
+
col1, col2, col3 = st.columns(3)
|
56 |
+
|
57 |
+
with col1:
|
58 |
+
st.metric("Total Classes", len(classes))
|
59 |
+
|
60 |
+
# Count total instances
|
61 |
+
total_instances = sum(item["Instances"] for item in class_counts)
|
62 |
+
with col2:
|
63 |
+
st.metric("Total Instances", total_instances)
|
64 |
+
|
65 |
+
# Count relationships
|
66 |
+
relationship_count = len(ontology_manager.ontology_data.get("relationships", []))
|
67 |
+
with col3:
|
68 |
+
st.metric("Relationship Types", relationship_count)
|
69 |
+
|
70 |
+
# Visualize class hierarchy
|
71 |
+
st.markdown("### Class Hierarchy")
|
72 |
+
|
73 |
+
# Create tabs for different views
|
74 |
+
tab1, tab2, tab3 = st.tabs(["Tree View", "Class Statistics", "Hierarchy Graph"])
|
75 |
+
|
76 |
+
with tab1:
|
77 |
+
# Create a collapsible tree view of class hierarchy
|
78 |
+
display_class_hierarchy_tree(ontology_manager, class_hierarchy)
|
79 |
+
|
80 |
+
with tab2:
|
81 |
+
# Display class stats and distribution
|
82 |
+
if class_counts:
|
83 |
+
# Filter to only show classes with instances
|
84 |
+
non_empty_classes = [item for item in class_counts if item["Instances"] > 0]
|
85 |
+
|
86 |
+
if non_empty_classes:
|
87 |
+
df = pd.DataFrame(non_empty_classes)
|
88 |
+
df = df.sort_values("Instances", ascending=False)
|
89 |
+
|
90 |
+
# Create horizontal bar chart
|
91 |
+
fig = px.bar(df,
|
92 |
+
x="Instances",
|
93 |
+
y="Class",
|
94 |
+
orientation='h',
|
95 |
+
title="Instances per Class",
|
96 |
+
color="Instances",
|
97 |
+
color_continuous_scale="viridis")
|
98 |
+
|
99 |
+
fig.update_layout(yaxis={'categoryorder':'total ascending'})
|
100 |
+
st.plotly_chart(fig, use_container_width=True)
|
101 |
+
else:
|
102 |
+
st.info("No classes with instances found.")
|
103 |
+
|
104 |
+
# Show distribution of classes by inheritance depth
|
105 |
+
display_class_depth_distribution(ontology_manager)
|
106 |
+
|
107 |
+
with tab3:
|
108 |
+
# Display class hierarchy as a graph
|
109 |
+
display_class_hierarchy_graph(ontology_manager)
|
110 |
+
|
111 |
+
# Relationship statistics
|
112 |
+
st.markdown("### Relationship Analysis")
|
113 |
+
|
114 |
+
# Get relationship usage statistics
|
115 |
+
relationship_usage = analyze_relationship_usage(ontology_manager)
|
116 |
+
|
117 |
+
# Display relationship usage in a table and chart
|
118 |
+
if relationship_usage:
|
119 |
+
tab1, tab2 = st.tabs(["Usage Statistics", "Domain/Range Distribution"])
|
120 |
+
|
121 |
+
with tab1:
|
122 |
+
# Create DataFrame for the table
|
123 |
+
df = pd.DataFrame(relationship_usage)
|
124 |
+
df = df.sort_values("Usage Count", ascending=False)
|
125 |
+
|
126 |
+
# Show table
|
127 |
+
st.dataframe(df)
|
128 |
+
|
129 |
+
# Create bar chart for relationship usage
|
130 |
+
fig = px.bar(df,
|
131 |
+
x="Relationship",
|
132 |
+
y="Usage Count",
|
133 |
+
title="Relationship Usage Frequency",
|
134 |
+
color="Usage Count",
|
135 |
+
color_continuous_scale="blues")
|
136 |
+
|
137 |
+
st.plotly_chart(fig, use_container_width=True)
|
138 |
+
|
139 |
+
with tab2:
|
140 |
+
# Display domain-range distribution
|
141 |
+
display_domain_range_distribution(ontology_manager)
|
142 |
+
|
143 |
+
|
144 |
+
def display_class_hierarchy_tree(ontology_manager, class_hierarchy):
    """Display class hierarchy as an interactive tree."""
    # Find root classes (those that aren't subclasses of anything else)
    all_subclasses = set()
    for subclasses in class_hierarchy.values():
        all_subclasses.update(subclasses)

    root_classes = [cls for cls in ontology_manager.get_classes() if cls not in all_subclasses]

    # Create a recursive function to display the hierarchy
    def display_subclasses(class_name, indent=0):
        # Get class info
        class_info = ontology_manager.ontology_data["classes"].get(class_name, {})
        description = class_info.get("description", "")
        instance_count = len(ontology_manager.get_instances_of_class(class_name, include_subclasses=False))

        # Display class with expander for subclasses
        if indent == 0:
            # Root level classes are always expanded
            with st.expander(f"📁 {class_name} ({instance_count} instances)", expanded=True):
                st.markdown(f"**Description:** {description}")

                # Show properties if any
                properties = class_info.get("properties", [])
                if properties:
                    st.markdown("**Properties:**")
                    st.markdown(", ".join(properties))

                # Display subclasses
                subclasses = class_hierarchy.get(class_name, [])
                if subclasses:
                    st.markdown("**Subclasses:**")
                    for subclass in sorted(subclasses):
                        display_subclasses(subclass, indent + 1)
                else:
                    st.markdown("*No subclasses*")
        else:
            # Nested classes use indentation and only show direct instances
            if instance_count > 0:
                class_label = f"📁 {class_name} ({instance_count} instances)"
            else:
                class_label = f"📁 {class_name}"

            with st.expander(class_label, expanded=False):
                st.markdown(f"**Description:** {description}")

                # Show properties if any
                properties = class_info.get("properties", [])
                if properties:
                    st.markdown("**Properties:**")
                    st.markdown(", ".join(properties))

                # Display subclasses
                subclasses = class_hierarchy.get(class_name, [])
                if subclasses:
                    st.markdown("**Subclasses:**")
                    for subclass in sorted(subclasses):
                        display_subclasses(subclass, indent + 1)
                else:
                    st.markdown("*No subclasses*")

    # Display each root class
    for root_class in sorted(root_classes):
        display_subclasses(root_class)


def get_class_depths(ontology_manager) -> Dict[str, int]:
    """Calculate the inheritance depth of each class."""
    depths = {}
    class_data = ontology_manager.ontology_data["classes"]

    def get_depth(class_name):
        # If we've already calculated the depth, return it
        if class_name in depths:
            return depths[class_name]

        # Get the class data
        cls = class_data.get(class_name, {})

        # If no parent, depth is 0
        if "subClassOf" not in cls:
            depths[class_name] = 0
            return 0

        # Otherwise, depth is 1 + parent's depth
        parent = cls["subClassOf"]
        parent_depth = get_depth(parent)
        depths[class_name] = parent_depth + 1
        return depths[class_name]

    # Calculate depths for all classes
    for class_name in class_data:
        get_depth(class_name)

    return depths


def display_class_depth_distribution(ontology_manager):
    """Display distribution of classes by inheritance depth."""
    depths = get_class_depths(ontology_manager)

    # Count classes at each depth
    depth_counts = defaultdict(int)
    for _, depth in depths.items():
        depth_counts[depth] += 1

    # Create dataframe
    df = pd.DataFrame([
        {"Depth": depth, "Count": count}
        for depth, count in depth_counts.items()
    ])

    if not df.empty:
        df = df.sort_values("Depth")

        # Create bar chart
        fig = px.bar(df,
                     x="Depth",
                     y="Count",
                     title="Class Distribution by Inheritance Depth",
                     labels={"Depth": "Inheritance Depth", "Count": "Number of Classes"},
                     color="Count",
                     text="Count")

        fig.update_traces(texttemplate='%{text}', textposition='outside')
        fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')

        st.plotly_chart(fig, use_container_width=True)


def display_class_hierarchy_graph(ontology_manager):
    """Display class hierarchy as a directed graph."""
    # Create a directed graph
    G = nx.DiGraph()

    # Add nodes for each class
    for class_name, class_info in ontology_manager.ontology_data["classes"].items():
        # Count direct instances
        instance_count = len(ontology_manager.get_instances_of_class(class_name, include_subclasses=False))

        # Add node with attributes
        G.add_node(class_name,
                   type="class",
                   description=class_info.get("description", ""),
                   instance_count=instance_count)

        # Add edge for subclass relationship
        if "subClassOf" in class_info:
            parent = class_info["subClassOf"]
            G.add_edge(parent, class_name, relationship="subClassOf")

    # Create a Plotly graph visualization
    # Calculate node positions using a hierarchical layout
    pos = nx.nx_agraph.graphviz_layout(G, prog="dot")

    # Convert positions to lists for Plotly
    node_x = []
    node_y = []
    node_text = []
    node_size = []
    node_color = []

    for node in G.nodes():
        x, y = pos[node]
        node_x.append(x)
        node_y.append(y)

        # Get node info for hover text
        description = G.nodes[node].get("description", "")
        instance_count = G.nodes[node].get("instance_count", 0)

        # Prepare hover text
        hover_text = f"Class: {node}<br>Description: {description}<br>Instances: {instance_count}"
        node_text.append(hover_text)

        # Size nodes by instance count (with a minimum size)
        size = 10 + (instance_count * 2)
        size = min(40, max(15, size))  # Limit size range
        node_size.append(size)

        # Color nodes by depth
        depth = get_class_depths(ontology_manager).get(node, 0)
        # Use a color scale from light to dark blue
        node_color.append(depth)

    # Create edge traces
    edge_x = []
    edge_y = []

    for edge in G.edges():
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]

        # Add a curved line with multiple points
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)  # Add None to create a break between edges

        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)

    # Create node trace
    node_trace = go.Scatter(
        x=node_x, y=node_y,
        mode='markers+text',
        text=[node for node in G.nodes()],
        textposition="bottom center",
        hoverinfo='text',
        hovertext=node_text,
        marker=dict(
            showscale=True,
            colorscale='Blues',
            color=node_color,
            size=node_size,
            line=dict(width=2, color='DarkSlateGrey'),
            colorbar=dict(
                title="Depth",
                thickness=15,
                tickvals=[0, max(node_color)],
                ticktext=["Root", f"Depth {max(node_color)}"]
            )
        )
    )

    # Create edge trace
    edge_trace = go.Scatter(
        x=edge_x, y=edge_y,
        line=dict(width=1, color='#888'),
        hoverinfo='none',
        mode='lines'
    )

    # Create figure
    fig = go.Figure(data=[edge_trace, node_trace],
                    layout=go.Layout(
                        showlegend=False,
                        hovermode='closest',
                        margin=dict(b=20, l=5, r=5, t=40),
                        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                        title="Class Hierarchy Graph",
                        title_x=0.5
                    ))

    # Display the figure
    st.plotly_chart(fig, use_container_width=True)


def analyze_relationship_usage(ontology_manager) -> List[Dict]:
    """Analyze how relationships are used in the ontology."""
    relationship_data = ontology_manager.ontology_data.get("relationships", [])
    instances = ontology_manager.ontology_data.get("instances", [])

    # Initialize counters
    usage_counts = defaultdict(int)

    # Count relationship usage in instances
    for instance in instances:
        for rel in instance.get("relationships", []):
            usage_counts[rel["type"]] += 1

    # Prepare results
    results = []
    for rel in relationship_data:
        rel_name = rel["name"]
        domain = rel["domain"]
        range_class = rel["range"]
        cardinality = rel.get("cardinality", "many-to-many")
        count = usage_counts.get(rel_name, 0)

        results.append({
            "Relationship": rel_name,
            "Domain": domain,
            "Range": range_class,
            "Cardinality": cardinality,
            "Usage Count": count
        })

    return results


def display_domain_range_distribution(ontology_manager):
    """Display domain and range distribution for relationships."""
    relationship_data = ontology_manager.ontology_data.get("relationships", [])

    # Count domains and ranges
    domain_counts = defaultdict(int)
    range_counts = defaultdict(int)

    for rel in relationship_data:
        domain_counts[rel["domain"]] += 1
        range_counts[rel["range"]] += 1

    # Create DataFrames
    domain_df = pd.DataFrame([
        {"Class": cls, "Count": count, "Type": "Domain"}
        for cls, count in domain_counts.items()
    ])

    range_df = pd.DataFrame([
        {"Class": cls, "Count": count, "Type": "Range"}
        for cls, count in range_counts.items()
    ])

    # Combine
    combined_df = pd.concat([domain_df, range_df])

    # Create plot
    if not combined_df.empty:
        fig = px.bar(combined_df,
                     x="Class",
                     y="Count",
                     color="Type",
                     barmode="group",
                     title="Classes as Domain vs Range in Relationships",
                     color_discrete_map={"Domain": "#1f77b4", "Range": "#ff7f0e"})

        fig.update_layout(xaxis={'categoryorder': 'total descending'})

        st.plotly_chart(fig, use_container_width=True)


def display_entity_details(entity_info: Dict[str, Any], ontology_manager):
    """Display detailed information about an entity."""
    if not entity_info:
        st.warning("Entity not found.")
        return

    st.subheader(f"📝 Entity: {entity_info['id']}")

    # Determine entity type and get class hierarchy
    entity_type = entity_info.get("type", "")
    class_type = entity_info.get("class", entity_info.get("class_type", ""))

    class_hierarchy = []
    if class_type:
        current_class = class_type
        while current_class:
            class_hierarchy.append(current_class)
            parent_class = ontology_manager.ontology_data["classes"].get(current_class, {}).get("subClassOf", "")
            if not parent_class or parent_class == current_class:  # Prevent infinite loops
                break
            current_class = parent_class

    # Display entity metadata
    col1, col2 = st.columns([1, 2])

    with col1:
        st.markdown("### Basic Information")

        # Basic info metrics
        st.metric("Entity Type", entity_type)

        if class_type:
            st.metric("Class", class_type)

        # Display class hierarchy
        if class_hierarchy and len(class_hierarchy) > 1:
            st.markdown("**Class Hierarchy:**")
            hierarchy_str = " → ".join(reversed(class_hierarchy))
            st.markdown(f"```\n{hierarchy_str}\n```")

    with col2:
        # Display class description if available
        if "class_description" in entity_info:
            st.markdown("### Description")
            st.markdown(entity_info.get("class_description", "No description available."))

    # Properties
    if "properties" in entity_info and entity_info["properties"]:
        st.markdown("### Properties")

        # Create a more structured property display
        properties = []
        for key, value in entity_info["properties"].items():
            # Handle different value types
            if isinstance(value, list):
                value_str = ", ".join(str(v) for v in value)
            else:
                value_str = str(value)

            properties.append({"Property": key, "Value": value_str})

        # Display as table with highlighting
        property_df = pd.DataFrame(properties)
        st.dataframe(
            property_df,
            column_config={
                "Property": st.column_config.TextColumn("Property", width="medium"),
                "Value": st.column_config.TextColumn("Value", width="large")
            },
            hide_index=True
        )

    # Relationships with visual enhancements
    if "relationships" in entity_info and entity_info["relationships"]:
        st.markdown("### Relationships")

        # Group relationships by direction
        outgoing = []
        incoming = []

        for rel in entity_info["relationships"]:
            if "direction" in rel and rel["direction"] == "outgoing":
                outgoing.append({
                    "Relationship": rel["type"],
                    "Direction": "→",
                    "Related Entity": rel["target"]
                })
            elif "direction" in rel and rel["direction"] == "incoming":
                incoming.append({
                    "Relationship": rel["type"],
                    "Direction": "←",
                    "Related Entity": rel["source"]
                })

        # Create tabs for outgoing and incoming
        if outgoing or incoming:
            tab1, tab2 = st.tabs(["Outgoing Relationships", "Incoming Relationships"])

            with tab1:
                if outgoing:
                    st.dataframe(
                        pd.DataFrame(outgoing),
                        column_config={
                            "Relationship": st.column_config.TextColumn("Relationship Type", width="medium"),
                            "Direction": st.column_config.TextColumn("Direction", width="small"),
                            "Related Entity": st.column_config.TextColumn("Target Entity", width="medium")
                        },
                        hide_index=True
                    )
                else:
                    st.info("No outgoing relationships.")

            with tab2:
                if incoming:
                    st.dataframe(
                        pd.DataFrame(incoming),
                        column_config={
                            "Relationship": st.column_config.TextColumn("Relationship Type", width="medium"),
                            "Direction": st.column_config.TextColumn("Direction", width="small"),
                            "Related Entity": st.column_config.TextColumn("Source Entity", width="medium")
                        },
                        hide_index=True
                    )
                else:
                    st.info("No incoming relationships.")

        # Visual relationship graph
        st.markdown("#### Relationship Graph")
        display_entity_relationship_graph(entity_info, ontology_manager)


def display_entity_relationship_graph(entity_info: Dict[str, Any], ontology_manager):
    """Display a graph of an entity's relationships."""
    entity_id = entity_info["id"]

    # Create graph
    G = nx.DiGraph()

    # Add central entity
    G.add_node(entity_id, type="central")

    # Add related entities and relationships
    for rel in entity_info.get("relationships", []):
        if "direction" in rel and rel["direction"] == "outgoing":
            target = rel["target"]
            rel_type = rel["type"]

            # Add target node if not exists
            if target not in G:
                target_info = ontology_manager.get_entity_info(target)
                node_type = target_info.get("type", "unknown")
                G.add_node(target, type=node_type)

            # Add edge
            G.add_edge(entity_id, target, type=rel_type)

        elif "direction" in rel and rel["direction"] == "incoming":
            source = rel["source"]
            rel_type = rel["type"]

            # Add source node if not exists
            if source not in G:
                source_info = ontology_manager.get_entity_info(source)
                node_type = source_info.get("type", "unknown")
                G.add_node(source, type=node_type)

            # Add edge
            G.add_edge(source, entity_id, type=rel_type)

    # Use a force-directed layout
    pos = nx.spring_layout(G, k=0.5, iterations=50)

    # Create Plotly figure
    fig = go.Figure()

    # Add edges with curved lines
    for source, target, data in G.edges(data=True):
        x0, y0 = pos[source]
        x1, y1 = pos[target]
        rel_type = data.get("type", "unknown")

        # Calculate edge midpoint for label
        mid_x = (x0 + x1) / 2
        mid_y = (y0 + y1) / 2

        # Draw edge
        fig.add_trace(go.Scatter(
            x=[x0, x1],
            y=[y0, y1],
            mode="lines",
            line=dict(width=1, color="#888"),
            hoverinfo="text",
            hovertext=f"Relationship: {rel_type}",
            showlegend=False
        ))

        # Add relationship label
        fig.add_trace(go.Scatter(
            x=[mid_x],
            y=[mid_y],
            mode="text",
            text=[rel_type],
            textposition="middle center",
            textfont=dict(size=10, color="#555"),
            hoverinfo="none",
            showlegend=False
        ))

    # Add nodes with different colors by type
    node_groups = defaultdict(list)

    for node, data in G.nodes(data=True):
        node_type = data.get("type", "unknown")
        node_info = ontology_manager.get_entity_info(node)

        # Get friendly name if available
        name = node
        if "properties" in node_info and "name" in node_info["properties"]:
            name = node_info["properties"]["name"]

        node_groups[node_type].append({
            "id": node,
            "name": name,
            "x": pos[node][0],
            "y": pos[node][1],
            "info": node_info
        })

    # Define colors for different node types
    colors = {
        "central": "#ff7f0e",  # Highlighted color for central entity
        "instance": "#1f77b4",
        "class": "#2ca02c",
        "unknown": "#d62728"
    }

    # Add each node group with appropriate styling
    for node_type, nodes in node_groups.items():
        # Default to unknown color if type not in map
        color = colors.get(node_type, colors["unknown"])

        x = [node["x"] for node in nodes]
        y = [node["y"] for node in nodes]
        text = [node["name"] for node in nodes]

        # Prepare hover text
        hover_text = []
        for node in nodes:
            info = node["info"]
            hover = f"ID: {node['id']}<br>Name: {node['name']}"

            if "class_type" in info:
                hover += f"<br>Type: {info['class_type']}"

            hover_text.append(hover)

        # Adjust size for central entity
        size = 20 if node_type == "central" else 15

        fig.add_trace(go.Scatter(
            x=x,
            y=y,
            mode="markers+text",
            marker=dict(
                size=size,
                color=color,
                line=dict(width=2, color="white")
            ),
            text=text,
            textposition="bottom center",
            hoverinfo="text",
            hovertext=hover_text,
            name=node_type.capitalize()
        ))

    # Update layout
    fig.update_layout(
        title=f"Relationships for {entity_id}",
        title_x=0.5,
        showlegend=True,
        hovermode="closest",
        margin=dict(b=20, l=5, r=5, t=40),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=500
    )

    st.plotly_chart(fig, use_container_width=True)


def display_graph_visualization(knowledge_graph, central_entity=None, max_distance=2):
    """Display an interactive visualization of the knowledge graph."""
    st.subheader("🕸️ Knowledge Graph Visualization")

    # Controls for the visualization
    with st.expander("Visualization Settings", expanded=True):
        col1, col2, col3 = st.columns(3)

        with col1:
            include_classes = st.checkbox("Include Classes", value=True)

        with col2:
            include_instances = st.checkbox("Include Instances", value=True)

        with col3:
            include_properties = st.checkbox("Include Properties", value=False)

        st.markdown("---")

        col1, col2 = st.columns(2)

        with col1:
            max_distance = st.slider("Max Relationship Distance", 1, 5, max_distance)

        with col2:
            layout_algorithm = st.selectbox(
                "Layout Algorithm",
                ["Force-Directed", "Hierarchical", "Radial", "Circular"],
                index=0
            )

    # Generate HTML visualization
    html = knowledge_graph.generate_html_visualization(
        include_classes=include_classes,
        include_instances=include_instances,
        central_entity=central_entity,
        max_distance=max_distance,
        include_properties=include_properties,
        layout_algorithm=layout_algorithm.lower()
    )

    # Render the HTML
    render_html_in_streamlit(html)

    # Entity filter
    with st.expander("Focus on Entity", expanded=central_entity is not None):
        # Get all entities
        entities = []
        for class_name in knowledge_graph.ontology_manager.get_classes():
            entities.extend(knowledge_graph.ontology_manager.get_instances_of_class(class_name))

        # Deduplicate
        entities = sorted(set(entities))

        # Select entity
        selected_entity = st.selectbox(
            "Select Entity to Focus On",
            ["None"] + entities,
            index=0 if central_entity is None else entities.index(central_entity) + 1
        )

        if selected_entity != "None":
            st.button("Focus Graph", on_click=lambda: st.experimental_rerun())

    # Display graph statistics
    stats = knowledge_graph.get_graph_statistics()
    if stats:
        st.markdown("### Graph Statistics")

        col1, col2, col3, col4 = st.columns(4)
        col1.metric("Nodes", stats.get("node_count", 0))
        col2.metric("Edges", stats.get("edge_count", 0))
        col3.metric("Classes", stats.get("class_count", 0))
        col4.metric("Instances", stats.get("instance_count", 0))

        # Display relationship counts
        if "relationship_counts" in stats:
            rel_counts = stats["relationship_counts"]
            rel_data = [{"Relationship": rel, "Count": count} for rel, count in rel_counts.items()
                        if rel not in ["subClassOf", "instanceOf"]]  # Filter out structural relationships

            if rel_data:
                df = pd.DataFrame(rel_data)
                fig = px.bar(df,
                             x="Relationship",
                             y="Count",
                             title="Relationship Distribution",
                             color="Count",
                             color_continuous_scale="viridis")

                st.plotly_chart(fig, use_container_width=True)


def visualize_path(path_info, ontology_manager):
    """Visualize a semantic path between entities with enhanced graphics and details."""
    if not path_info or "path" not in path_info:
        st.warning("No path information available.")
        return

    st.subheader("🔄 Semantic Path Visualization")

    path = path_info["path"]

    # Get entity information for each node in the path
    entities = {}
    all_nodes = set()

    # Add source and target
    if "source" in path_info:
        source_id = path_info["source"]
        all_nodes.add(source_id)
        entities[source_id] = ontology_manager.get_entity_info(source_id)

    if "target" in path_info:
        target_id = path_info["target"]
        all_nodes.add(target_id)
        entities[target_id] = ontology_manager.get_entity_info(target_id)

    # Add all entities in the path
    for edge in path:
        source_id = edge["source"]
        target_id = edge["target"]
        all_nodes.add(source_id)
        all_nodes.add(target_id)

        if source_id not in entities:
            entities[source_id] = ontology_manager.get_entity_info(source_id)

        if target_id not in entities:
            entities[target_id] = ontology_manager.get_entity_info(target_id)

    # Create tabs for different views
    tab1, tab2, tab3 = st.tabs(["Path Visualization", "Entity Details", "Path Summary"])

    with tab1:
        # Display path as a sequence diagram
        display_path_visualization(path, entities)

    with tab2:
        # Display details of entities in the path
        st.markdown("### Entities in Path")

        # Group entities by type
        entities_by_type = defaultdict(list)
        for entity_id in all_nodes:
            entity_info = entities.get(entity_id, {})
            entity_type = entity_info.get("class_type", entity_info.get("class", "Unknown"))
            entities_by_type[entity_type].append((entity_id, entity_info))

        # Create an expander for each entity type
        for entity_type, entity_list in entities_by_type.items():
            with st.expander(f"{entity_type} ({len(entity_list)})", expanded=True):
                for entity_id, entity_info in entity_list:
                    st.markdown(f"**{entity_id}**")

                    # Display properties if available
                    if "properties" in entity_info and entity_info["properties"]:
                        props_markdown = ", ".join([f"**{k}**: {v}" for k, v in entity_info["properties"].items()])
                        st.markdown(props_markdown)

                    st.markdown("---")

    with tab3:
        # Display textual summary of the path
        st.markdown("### Path Description")

        # If path_info has text, use it
        if "text" in path_info and path_info["text"]:
            st.markdown(f"**Path:** {path_info['text']}")
        else:
            # Otherwise, generate a description
            path_steps = []
            for edge in path:
                source_id = edge["source"]
                target_id = edge["target"]
                relation = edge["type"]

                # Get readable names if available
                source_name = source_id
                target_name = target_id

                if source_id in entities and "properties" in entities[source_id]:
                    props = entities[source_id]["properties"]
                    if "name" in props:
                        source_name = props["name"]

                if target_id in entities and "properties" in entities[target_id]:
                    props = entities[target_id]["properties"]
                    if "name" in props:
                        target_name = props["name"]

                path_steps.append(f"{source_name} **{relation}** {target_name}")

            st.markdown(" → ".join(path_steps))

        # Display relevant business rules
        relevant_rules = find_relevant_rules_for_path(path, ontology_manager)
        if relevant_rules:
            st.markdown("### Relevant Business Rules")
            for rule in relevant_rules:
                st.markdown(f"- **{rule['id']}**: {rule['description']}")


def display_path_visualization(path, entities):
    """Create an enhanced visual representation of the path."""
    if not path:
        st.info("Path is empty.")
        return

    # Create nodes and positions
    nodes = []
    x_positions = {}

    # Collect all unique nodes in the path
    unique_nodes = set()
    for edge in path:
        unique_nodes.add(edge["source"])
        unique_nodes.add(edge["target"])

    # Create ordered list of nodes
    path_nodes = []
    if path:
        # Start with the first source
        current_node = path[0]["source"]
        path_nodes.append(current_node)

        # Follow the path
        for edge in path:
            target = edge["target"]
            path_nodes.append(target)
            current_node = target
    else:
        # If no path, just use the unique nodes
        path_nodes = list(unique_nodes)

    # Assign positions along a line
    for i, node_id in enumerate(path_nodes):
        x_positions[node_id] = i

        # Get node info
        entity_info = entities.get(node_id, {})
        properties = entity_info.get("properties", {})
        entity_type = entity_info.get("class_type", entity_info.get("class", "Unknown"))

        # Get display name
        name = properties.get("name", node_id)

        nodes.append({
            "id": node_id,
            "name": name,
            "type": entity_type,
            "properties": properties
        })

    # Create Plotly figure for horizontal path
    fig = go.Figure()

    # Add nodes
    node_x = []
    node_y = []
    node_text = []
    node_hover = []
    node_colors = []

    # Color mapping for entity types
    color_map = {}
    for node in nodes:
        node_type = node["type"]
        if node_type not in color_map:
            # Assign colors from a categorical colorscale
            idx = len(color_map) % len(px.colors.qualitative.Plotly)
            color_map[node_type] = px.colors.qualitative.Plotly[idx]

    for node in nodes:
        node_x.append(x_positions[node["id"]])
        node_y.append(0)  # All nodes at y=0 for a horizontal path
        node_text.append(node["name"])

        # Create detailed hover text
        hover = f"{node['id']}<br>{node['type']}"
        for k, v in node["properties"].items():
            hover += f"<br>{k}: {v}"
        node_hover.append(hover)

        # Set node color by type
        node_colors.append(color_map.get(node["type"], "#7f7f7f"))

    # Add node trace
    fig.add_trace(go.Scatter(
        x=node_x,
        y=node_y,
        mode="markers+text",
        marker=dict(
            size=30,
            color=node_colors,
            line=dict(width=2, color="DarkSlateGrey")
        ),
        text=node_text,
        textposition="bottom center",
        hovertext=node_hover,
        hoverinfo="text",
        name="Entities"
    ))

    # Add edges with relationship labels
    for edge in path:
        source = edge["source"]
        target = edge["target"]
        edge_type = edge["type"]

        source_pos = x_positions[source]
        target_pos = x_positions[target]

        # Add edge line
        fig.add_trace(go.Scatter(
            x=[source_pos, target_pos],
            y=[0, 0],
            mode="lines",
            line=dict(width=2, color="#888"),
            hoverinfo="none",
            showlegend=False
        ))

        # Add relationship label above the line
        fig.add_trace(go.Scatter(
            x=[(source_pos + target_pos) / 2],
            y=[0.1],  # Slightly above the line
            mode="text",
            text=[edge_type],
            textposition="top center",
            hoverinfo="none",
            showlegend=False
        ))

    # Update layout
    fig.update_layout(
        title="Path Visualization",
        showlegend=False,
        hovermode="closest",
        margin=dict(b=40, l=20, r=20, t=40),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
|
1097 |
+
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
|
1098 |
+
height=300,
|
1099 |
+
plot_bgcolor="white"
|
1100 |
+
)
|
1101 |
+
|
1102 |
+
# Add a legend for entity types
|
1103 |
+
for entity_type, color in color_map.items():
|
1104 |
+
fig.add_trace(go.Scatter(
|
1105 |
+
x=[None],
|
1106 |
+
y=[None],
|
1107 |
+
mode="markers",
|
1108 |
+
marker=dict(size=10, color=color),
|
1109 |
+
name=entity_type,
|
1110 |
+
showlegend=True
|
1111 |
+
))
|
1112 |
+
|
1113 |
+
fig.update_layout(legend=dict(
|
1114 |
+
orientation="h",
|
1115 |
+
yanchor="bottom",
|
1116 |
+
y=-0.3,
|
1117 |
+
xanchor="center",
|
1118 |
+
x=0.5
|
1119 |
+
))
|
1120 |
+
|
1121 |
+
st.plotly_chart(fig, use_container_width=True)
|
1122 |
+
|
1123 |
+
# Add step-by-step description
|
1124 |
+
st.markdown("### Step-by-Step Path")
|
1125 |
+
for i, edge in enumerate(path):
|
1126 |
+
source = edge["source"]
|
1127 |
+
target = edge["target"]
|
1128 |
+
relation = edge["type"]
|
1129 |
+
|
1130 |
+
# Get display names
|
1131 |
+
source_info = entities.get(source, {})
|
1132 |
+
target_info = entities.get(target, {})
|
1133 |
+
|
1134 |
+
source_name = source
|
1135 |
+
if "properties" in source_info and "name" in source_info["properties"]:
|
1136 |
+
source_name = source_info["properties"]["name"]
|
1137 |
+
|
1138 |
+
target_name = target
|
1139 |
+
if "properties" in target_info and "name" in target_info["properties"]:
|
1140 |
+
target_name = target_info["properties"]["name"]
|
1141 |
+
|
1142 |
+
st.markdown(f"**Step {i+1}:** {source_name} ({source}) **{relation}** {target_name} ({target})")
|
1143 |
+
|
1144 |
+
|
1145 |
+
def find_relevant_rules_for_path(path, ontology_manager):
    """Find business rules relevant to the entities and relationships in a path."""
    rules = ontology_manager.ontology_data.get("rules", [])
    if not rules:
        return []

    # Extract entity and relationship types from the path
    entity_types = set()
    relationship_types = set()

    for edge in path:
        source = edge["source"]
        target = edge["target"]
        relation = edge["type"]

        # Get entity info
        source_info = ontology_manager.get_entity_info(source)
        target_info = ontology_manager.get_entity_info(target)

        # Add entity types
        if "class_type" in source_info:
            entity_types.add(source_info["class_type"])

        if "class_type" in target_info:
            entity_types.add(target_info["class_type"])

        # Add relationship type
        relationship_types.add(relation)

    # Find rules that mention these entities or relationships
    relevant_rules = []

    for rule in rules:
        rule_text = json.dumps(rule).lower()

        # Check if the rule mentions any of the entity types or relationships
        is_relevant = False

        for entity_type in entity_types:
            if entity_type.lower() in rule_text:
                is_relevant = True
                break

        if not is_relevant:
            for rel_type in relationship_types:
                if rel_type.lower() in rule_text:
                    is_relevant = True
                    break

        if is_relevant:
            relevant_rules.append(rule)

    return relevant_rules


def display_reasoning_trace(query: str, retrieved_docs: List[Dict], answer: str, ontology_manager):
    """Display an enhanced trace of how ontological reasoning was used to answer the query."""
    st.subheader("🧠 Ontology-Enhanced Reasoning")

    # Create a multi-tab interface for different aspects of reasoning
    tab1, tab2, tab3 = st.tabs(["Query Analysis", "Knowledge Retrieval", "Reasoning Path"])

    with tab1:
        # Extract entity and relationship mentions with confidence
        entity_mentions, relationship_mentions = analyze_query_ontology_concepts(query, ontology_manager)

        # Display detected entities with confidence scores
        if entity_mentions:
            st.markdown("### Entities Detected in Query")

            # Convert to DataFrame for visualization
            entity_df = pd.DataFrame([{
                "Entity Type": e["type"],
                "Confidence": e["confidence"],
                "Description": e["description"]
            } for e in entity_mentions])

            # Sort by confidence
            entity_df = entity_df.sort_values("Confidence", ascending=False)

            # Create a horizontal bar chart
            fig = px.bar(entity_df,
                         x="Confidence",
                         y="Entity Type",
                         orientation='h',
                         title="Entity Type Detection Confidence",
                         color="Confidence",
                         color_continuous_scale="Blues",
                         text="Confidence")

            fig.update_traces(texttemplate='%{text:.0%}', textposition='outside')
            fig.update_layout(xaxis_tickformat=".0%")

            st.plotly_chart(fig, use_container_width=True)

            # Display descriptions
            st.subheader("Entity Type Descriptions")
            st.dataframe(
                entity_df[["Entity Type", "Description"]],
                hide_index=True
            )

        # Display detected relationships
        if relationship_mentions:
            st.markdown("### Relationships Detected in Query")

            # Convert to DataFrame
            rel_df = pd.DataFrame([{
                "Relationship": r["name"],
                "From": r["domain"],
                "To": r["range"],
                "Confidence": r["confidence"],
                "Description": r["description"]
            } for r in relationship_mentions])

            # Sort by confidence
            rel_df = rel_df.sort_values("Confidence", ascending=False)

            # Create visualization
            fig = px.bar(rel_df,
                         x="Confidence",
                         y="Relationship",
                         orientation='h',
                         title="Relationship Detection Confidence",
                         color="Confidence",
                         color_continuous_scale="Reds",
                         text="Confidence")

            fig.update_traces(texttemplate='%{text:.0%}', textposition='outside')
            fig.update_layout(xaxis_tickformat=".0%")

            st.plotly_chart(fig, use_container_width=True)

            # Display relationship details
            st.subheader("Relationship Details")
            st.dataframe(
                rel_df[["Relationship", "From", "To", "Description"]],
                hide_index=True
            )

    with tab2:
        # Create an enhanced visualization of the retrieval process
        st.markdown("### Knowledge Retrieval Process")

        # Group retrieved documents by source
        docs_by_source = defaultdict(list)
        for doc in retrieved_docs:
            if hasattr(doc, 'metadata'):
                source = doc.metadata.get('source', 'unknown')
                docs_by_source[source].append(doc)
            else:
                docs_by_source['unknown'].append(doc)

        # Display retrieval visualization
        col1, col2 = st.columns([2, 1])

        with col1:
            # Create a Sankey diagram to show flow from query to sources to answer
            display_retrieval_flow(query, docs_by_source)

        with col2:
            # Display source distribution
            source_counts = {source: len(docs) for source, docs in docs_by_source.items()}

            # Create a pie chart
            fig = px.pie(
                values=list(source_counts.values()),
                names=list(source_counts.keys()),
                title="Retrieved Context Sources",
                color_discrete_sequence=px.colors.qualitative.Plotly
            )

            st.plotly_chart(fig, use_container_width=True)

        # Display retrieved document details in expandable sections
        for source, docs in docs_by_source.items():
            with st.expander(f"{source.capitalize()} ({len(docs)})", expanded=source == "ontology_context"):
                for i, doc in enumerate(docs):
                    # Add separator between documents
                    if i > 0:
                        st.markdown("---")

                    # Display document content
                    if hasattr(doc, 'page_content'):
                        st.markdown("**Content:**")

                        # Format depending on source
                        if source in ["ontology", "ontology_context"]:
                            st.markdown(doc.page_content)
                        else:
                            st.code(doc.page_content)

                    # Display metadata if present
                    if hasattr(doc, 'metadata') and doc.metadata:
                        st.markdown("**Metadata:**")
                        for key, value in doc.metadata.items():
                            if key != 'source':  # Already shown in the section title
                                st.markdown(f"- **{key}**: {value}")

    with tab3:
        # Show the reasoning flow from query to answer
        st.markdown("### Ontological Reasoning Process")

        # Display reasoning steps
        reasoning_steps = generate_reasoning_steps(query, entity_mentions, relationship_mentions, retrieved_docs, answer)

        for i, step in enumerate(reasoning_steps):
            with st.expander(f"Step {i+1}: {step['title']}", expanded=i == 0):
                st.markdown(step["description"])

        # Visualization of how ontological structure influenced the answer
        st.markdown("### How Ontology Enhanced the Answer")

        # Display ontology advantage explanation
        advantages = explain_ontology_advantages(entity_mentions, relationship_mentions)

        for adv in advantages:
            st.markdown(f"**{adv['title']}**")
            st.markdown(adv["description"])


def analyze_query_ontology_concepts(query: str, ontology_manager) -> Tuple[List[Dict], List[Dict]]:
    """
    Analyze the query to identify ontology concepts with confidence scores.
    This is a simplified implementation that would be replaced with NLP in production.
    """
    query_lower = query.lower().split()

    # Entity detection
    entity_mentions = []
    classes = ontology_manager.get_classes()

    for class_name in classes:
        # Simple token matching (would use NER in production)
        if class_name.lower() in query_lower:
            # Get class info
            class_info = ontology_manager.ontology_data["classes"].get(class_name, {})

            # Assign a confidence score (this would come from an ML model in production);
            # here we use a simple heuristic based on word length and specificity
            confidence = min(0.95, 0.5 + (len(class_name) / 20))

            entity_mentions.append({
                "type": class_name,
                "confidence": confidence,
                "description": class_info.get("description", "")
            })

    # Relationship detection
    relationship_mentions = []
    relationships = ontology_manager.ontology_data.get("relationships", [])

    for rel in relationships:
        rel_name = rel["name"]

        # Simple token matching
        if rel_name.lower() in query_lower:
            # Assign confidence
            confidence = min(0.9, 0.5 + (len(rel_name) / 20))

            relationship_mentions.append({
                "name": rel_name,
                "domain": rel["domain"],
                "range": rel["range"],
                "confidence": confidence,
                "description": rel.get("description", "")
            })

    return entity_mentions, relationship_mentions


def display_retrieval_flow(query: str, docs_by_source: Dict[str, List]):
    """Create a Sankey diagram showing the flow from query to sources to answer."""
    # Define node labels
    nodes = ["Query"]

    # Add source nodes
    for source in docs_by_source.keys():
        nodes.append(f"Source: {source.capitalize()}")

    nodes.append("Answer")

    # Define links
    source_indices = []
    target_indices = []
    values = []

    # Links from query to sources
    for i, (source, docs) in enumerate(docs_by_source.items()):
        source_indices.append(0)      # Query is index 0
        target_indices.append(i + 1)  # Source indices start at 1
        values.append(len(docs))      # Width based on number of docs

    # Links from sources to answer
    for i in range(len(docs_by_source)):
        source_indices.append(i + 1)           # Source index
        target_indices.append(len(nodes) - 1)  # Answer is the last node
        values.append(values[i])               # Same width as the query-to-source link

    # Create Sankey diagram
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=nodes,
            color=["#1f77b4"] + [px.colors.qualitative.Plotly[i % len(px.colors.qualitative.Plotly)]
                                 for i in range(len(docs_by_source))] + ["#2ca02c"]
        ),
        link=dict(
            source=source_indices,
            target=target_indices,
            value=values
        )
    )])

    fig.update_layout(
        title="Information Flow in RAG Process",
        font=dict(size=12)
    )

    st.plotly_chart(fig, use_container_width=True)


def generate_reasoning_steps(query: str, entity_mentions: List[Dict], relationship_mentions: List[Dict],
                             retrieved_docs: List[Dict], answer: str) -> List[Dict]:
    """Generate reasoning steps to explain how the system arrived at the answer."""
    steps = []

    # Step 1: Query Understanding
    steps.append({
        "title": "Query Understanding",
        "description": f"""The system analyzes the query "{query}" and identifies key concepts from the ontology.
{len(entity_mentions)} entity types and {len(relationship_mentions)} relationship types are recognized, allowing
the system to understand the semantic context of the question."""
    })

    # Step 2: Knowledge Retrieval
    if retrieved_docs:
        doc_count = len(retrieved_docs)
        ontology_count = sum(1 for doc in retrieved_docs if hasattr(doc, 'metadata') and
                             doc.metadata.get('source', '') in ['ontology', 'ontology_context'])

        steps.append({
            "title": "Knowledge Retrieval",
            "description": f"""Based on the identified concepts, the system retrieves {doc_count} relevant pieces of information,
including {ontology_count} from the structured ontology. This hybrid approach combines traditional vector retrieval
with ontology-aware semantic retrieval, enabling access to both explicit and implicit knowledge."""
        })

    # Step 3: Relationship Traversal
    if relationship_mentions:
        rel_names = [r["name"] for r in relationship_mentions]
        steps.append({
            "title": "Relationship Traversal",
            "description": f"""The system identifies key relationships in the ontology: {', '.join(rel_names)}.
By traversing these relationships, the system can connect concepts that might not appear together in the same text,
allowing for multi-hop reasoning across the knowledge graph."""
        })

    # Step 4: Ontological Inference
    if entity_mentions:
        entity_types = [e["type"] for e in entity_mentions]
        steps.append({
            "title": "Ontological Inference",
            "description": f"""Using the hierarchical structure of entities like {', '.join(entity_types)},
the system makes inferences based on class inheritance and relationship constraints defined in the ontology.
This allows it to reason about properties and relationships that might not be explicitly stated."""
        })

    # Step 5: Answer Generation
    steps.append({
        "title": "Answer Synthesis",
        "description": """Finally, the system synthesizes the retrieved information and ontological knowledge to generate a comprehensive answer.
The structured nature of the ontology ensures that the answer accurately reflects the relationships between concepts
and respects the business rules defined in the knowledge model."""
    })

    return steps


def explain_ontology_advantages(entity_mentions: List[Dict], relationship_mentions: List[Dict]) -> List[Dict]:
    """Explain how the ontology enhanced the RAG process."""
    advantages = []

    if entity_mentions:
        advantages.append({
            "title": "Hierarchical Knowledge Representation",
            "description": """The ontology provides a hierarchical class structure that enables the system to understand
that concepts are related through is-a relationships. For instance, knowing that a Manager is an Employee
allows the system to apply Employee-related knowledge when answering questions about Managers, even if
the specific information was only stated for Employees in general."""
        })

    if relationship_mentions:
        advantages.append({
            "title": "Explicit Relationship Semantics",
            "description": """The ontology defines explicit relationships between concepts with clear semantics.
This allows the system to understand how entities are connected beyond simple co-occurrence in text.
For example, understanding that 'ownedBy' connects Products to Departments helps answer questions
about product ownership and departmental responsibilities."""
        })

    advantages.append({
        "title": "Constraint-Based Reasoning",
        "description": """Business rules in the ontology provide constraints that guide the reasoning process.
These rules ensure the system's answers are consistent with the organization's policies and practices.
For instance, rules about approval workflows or data classification requirements can inform answers
about process-related questions."""
    })

    advantages.append({
        "title": "Cross-Domain Knowledge Integration",
        "description": """The ontology connects concepts across different domains of the enterprise, enabling
integrated reasoning that traditional document-based retrieval might miss. This allows the system to
answer questions that span organizational boundaries, such as how marketing decisions affect product
development or how customer feedback influences business strategy."""
    })

    return advantages
static/css/styles.css
ADDED
@@ -0,0 +1,83 @@
/* Custom styling for ontology-RAG application */

/* Main container styles */
.main-container {
    padding: 20px;
    max-width: 1200px;
    margin: 0 auto;
}

/* Enhance visualization elements */
.vis-network {
    border: 1px solid #ddd;
    border-radius: 8px;
    box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

/* Custom tooltip styling */
.vis-tooltip {
    position: absolute;
    background-color: rgba(255, 255, 255, 0.95);
    border: 1px solid #ccc;
    border-radius: 5px;
    padding: 12px;
    font-family: Arial, sans-serif;
    font-size: 13px;
    color: #333;
    max-width: 350px;
    z-index: 9999;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.15);
}

/* Enhance legend appearance */
.graph-legend {
    background-color: rgba(255, 255, 255, 0.9) !important;
    border: 1px solid #eee !important;
    border-radius: 8px !important;
    box-shadow: 0 2px 6px rgba(0, 0, 0, 0.1) !important;
}

/* Styling for entity detail cards */
.entity-detail-card {
    border: 1px solid #eee;
    border-radius: 5px;
    padding: 15px;
    margin-bottom: 15px;
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
}

/* Highlight for central entities */
.central-entity {
    border-left: 4px solid #ff7f0e;
    padding-left: 12px;
}

/* Enhanced path visualization */
.path-step {
    padding: 8px;
    margin: 8px 0;
    border-left: 3px solid #1f77b4;
    background-color: #f8f9fa;
}

/* Customization for Streamlit components */
.stButton button {
    border-radius: 20px;
    padding: 5px 15px;
}

.stSelectbox label {
    font-weight: 500;
}

/* Tabs customization */
.streamlit-tabs .stTabs [role="tab"] {
    font-size: 15px;
    padding: 8px 16px;
}

/* Expander customization */
.streamlit-expanderContent {
    border-left: 1px solid #ddd;
    padding-left: 10px;
}