ICKG (Integrated Contextual Knowledge Graph Generator) v3.2 is a knowledge graph generator LLM fine-tuned from Mistral-7B.

- **Website**: [https://xiaohui-victor-li.github.io/FinDKG/](https://xiaohui-victor-li.github.io/FinDKG/)
- **Paper**: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4608445](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4608445)

## Use Guidance

The primary use of the ICKG LLM is to generate knowledge graphs (KGs) from text via its instruction-following capability with specialized prompts. It is intended for researchers, data scientists, and developers interested in natural language processing and knowledge graph construction.

- Generative Knowledge Graph Construction (KGC) refers to the process of employing LLMs to systematically extract entities and relationships from textual data via given prompts, subsequently assembling them into event triplets (see [Li, 2023](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4608445) for details).
- Aspect-Based Sentiment Analysis (ABSA) is a refined form of sentiment analysis that targets the sentiments associated with distinct aspects or attributes within a text. This granular approach is crucial for applications where understanding nuanced opinions about specific features is essential. ABSA not only discerns the overall sentiment of a text but also pinpoints and evaluates the sentiment expressed toward each individual aspect mentioned in the document.

## How to Get Started with the Model

- **Python Code**: [https://github.com/xiaohui-victor-li/FinDKG](https://github.com/xiaohui-victor-li/FinDKG)

## Training Details

ICKG v3.2 is fine-tuned from the latest Mistral-7B on ~5K instruction-following demonstrations, each pairing a KG-construction input document with the extracted KG triplets as the response. ICKG thereby learns to extract a list of KG triplets from a given text document via prompt engineering. For more in-depth training details, refer to the "Generative Knowledge Graph Construction with Fine-tuned LLM" section of [the accompanying paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4608445).
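
The exact demonstration schema is not published in this card; as a rough illustration only, one such instruction-following pair might be laid out as below. The field names (`instruction`/`input`/`output`) and the example document are hypothetical, following common instruction-tuning conventions rather than the released dataset format.

```python
import json

# Hypothetical demonstration record: the field names and the example
# document are illustrative assumptions, NOT the released dataset schema.
demo = {
    "instruction": (
        "Extract KG triplets of the form ('h', 'type', 'r', 'o', 'type') "
        "from the document labeled INPUT_TEXT."
    ),
    "input": "INPUT_TEXT: The Federal Reserve raised interest rates in March.",
    "output": "('Federal Reserve', 'ORG', 'Raise', 'interest rates', 'CONCEPT')",
}

print(json.dumps(demo, indent=2))
```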

- **Prompt Template**:

[ Generative Knowledge Graph Construction ]: The entity and relationship types can be customized for specific tasks. `<input_text>` is a placeholder for the document text.

```
From the provided document labeled as INPUT_TEXT, your task is to extract structured information from it in the form of triplet for constructing a knowledge graph. Each tuple should be in the form of ('h', 'type', 'r', 'o', 'type'), where 'h' stands for the head entity, 'r' for the relationship, and 'o' for the tail entity. The 'type' denotes the category of the corresponding entity. Do NOT include redundant triplets, NOT include triplets with relationship that occurs in the past.
<input_text>
```
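
A minimal sketch of how this template might be used programmatically: fill the `<input_text>` placeholder, send the prompt through whatever inference stack you use (not shown here), and parse the model's reply back into Python tuples. The abbreviated template string and the helper names are illustrative assumptions, not part of the released code.

```python
import ast

# Abbreviated copy of the KGC prompt; the full template is shown above.
KGC_PROMPT = (
    "From the provided document labeled as INPUT_TEXT, extract structured "
    "information in the form of ('h', 'type', 'r', 'o', 'type') triplets.\n\n"
    "<input_text>"
)

def build_prompt(document: str) -> str:
    """Substitute the <input_text> placeholder with the actual document."""
    return KGC_PROMPT.replace("<input_text>", document)

def parse_triplets(model_output: str) -> list:
    """Parse reply lines like ('h', 'type', 'r', 'o', 'type') into 5-tuples,
    skipping any line that does not literal-eval to a 5-tuple."""
    triplets = []
    for line in model_output.splitlines():
        line = line.strip().rstrip(",")
        if not (line.startswith("(") and line.endswith(")")):
            continue
        try:
            candidate = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(candidate, tuple) and len(candidate) == 5:
            triplets.append(candidate)
    return triplets

reply = "('Federal Reserve', 'ORG', 'Raise', 'interest rates', 'CONCEPT')"
print(parse_triplets(reply))
```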

[ Aspect-Based Sentiment Analysis ]: `<input_doc>` is a placeholder for the document text; `<input_ent_set>` is a list of the aspects (entities) whose sentiment should be scored.

```
Act as if you are a senior financial analyst, from the provided news article labeled as 'INPUT_TEXT', your task is to analyze and extract sentiment scores for specific key entities. These key entities are marked as 'KEY_ENTITY' in the text.
You are required to evaluate the sentiment surrounding each of these key entities within the context of the transcript. The sentiment score should be a continuous value ranging from -1 (most negative) to +1 (most positive), with 0 representing a neutral sentiment. For each key entity, you will present the results in a JSON format where the entity name is the key, and the sentiment score is the value. Ensure the scores accurately reflect the sentiment expressed in the transcript concerning each key entity. ONLY output the JSON result.
========== Example ==============
"Global markets experienced volatility this week, with tech stocks taking a significant hit due to rising interest rates. However, the energy sector showed resilience, buoyed by increasing oil prices. Meanwhile, consumer confidence remained neutral despite economic uncertainties."
Key Entities: Tech Stocks, Energy Sector, Consumer Confidence
Your formatted output should be: { "Tech Stocks": -0.8, "Energy Sector": 0.6, "Consumer Confidence": 0 }
=================================
INPUT_TEXT: <input_doc>
KEY_ENT: <input_ent_set>
```
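
Since the ABSA template asks the model to output ONLY JSON, the reply can be parsed directly. A small hedged helper (assuming the model may still wrap the JSON object in extra text) might look like:

```python
import json

def parse_sentiment(reply: str) -> dict:
    """Extract the outermost {...} span from the model reply, parse it as
    JSON, and clamp each score to the documented [-1, 1] range."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    scores = json.loads(reply[start : end + 1])
    return {entity: max(-1.0, min(1.0, float(score)))
            for entity, score in scores.items()}

reply = '{ "Tech Stocks": -0.8, "Energy Sector": 0.6, "Consumer Confidence": 0 }'
print(parse_sentiment(reply))
```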

## Evaluation

ICKG-v3.2 has undergone preliminary evaluation comparing its performance to GPT-3.5, GPT-4, Vicuna-7B, the original Mistral-7B, and its earlier variants (e.g., ICKG-v2.0). On the KG construction task, it outperforms GPT-3.5, Vicuna-7B, and Mistral-7B, while exhibiting capability comparable to GPT-4. ICKG excels at generating instruction-based knowledge graphs with a particular emphasis on quality and adherence to format.