Update README_EN.md
README_EN.md CHANGED (+15 -2)
@@ -178,8 +178,11 @@ Here [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC
# 5.Convert script
-
```bash
@@ -196,8 +199,11 @@ python kg2instruction/convert.py \
```
The `schema_path` specifies the path to a schema file (a JSON file). The schema file consists of three lines of JSON strings, organized in a fixed format. Taking Named Entity Recognition (NER) as an example, the meanings of each line are as follows:
```
["BookTitle", "Address", "Movie", ...] # List of entity types
[] # Empty list
@@ -238,8 +244,13 @@ For Event Extraction with Arguments (EEA) tasks:
</details>
- [convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) does not require data to have label (`entity`, `relation`, `event`) fields, only needs to have an `input` field and provide a `schema_path` is suitable for processing test data.
```bash
python kg2instruction/convert_test.py \
@@ -251,6 +262,8 @@ python kg2instruction/convert_test.py \
--sample 0
```
Here is an example of data conversion for the Named Entity Recognition (NER) task:
# 5.Convert script
+ **Training Data Transformation**
+ Before data is fed to the model, it must be formatted to include `instruction` and `input` fields. To help with this, we provide the script [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py), which batch-converts data into a format the model can use directly.
+
+ > Before using the [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) script, please refer to the [data](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/data) directory: `sample.json` shows the format of the data before conversion, `schema.json` illustrates how the schema is organized, and `processed.json` shows the format of the data after conversion.
```bash
```
+ **Negative Sampling**: Suppose dataset A contains the labels [a, b, c, d, e, f], and a given sample s involves only labels a and b. Our objective is to randomly introduce some relationships from the candidate relationship list that are unrelated to s, such as c and d. In the output, the labels c and d are either omitted or output as `NAN`.
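The negative sampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of `convert.py`: the function names `sample_candidate_labels` and `expected_output`, the `num_negatives` and `seed` parameters, and the choice to emit `NAN` (rather than omit the label) are all hypothetical.

```python
import random

NAN = "NAN"  # placeholder value for labels that have no answer in the sample


def sample_candidate_labels(all_labels, positive_labels, num_negatives, seed=None):
    """Return the sample's own labels plus randomly drawn unrelated labels.

    Hypothetical helper: names and parameters are assumptions for illustration.
    """
    rng = random.Random(seed)
    positive_set = set(positive_labels)
    positives = [label for label in all_labels if label in positive_set]
    negative_pool = [label for label in all_labels if label not in positive_set]
    # Never request more negatives than the dataset can supply.
    k = min(num_negatives, len(negative_pool))
    candidates = positives + rng.sample(negative_pool, k)
    rng.shuffle(candidates)  # avoid always listing the true labels first
    return candidates


def expected_output(candidates, answers):
    """Map each candidate label to its answer, or to NAN for a negative label."""
    return {label: answers.get(label, NAN) for label in candidates}


if __name__ == "__main__":
    candidates = sample_candidate_labels(list("abcdef"), ["a", "b"], num_negatives=2, seed=0)
    print(expected_output(candidates, {"a": ["x"], "b": ["y"]}))
```

Here the sample keeps its labels a and b, gains two unrelated labels drawn from c to f, and those two map to `NAN` in the target output.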
+
The `schema_path` specifies the path to a schema file (a JSON file). The schema file consists of three lines of JSON strings, organized in a fixed format. Taking Named Entity Recognition (NER) as an example, the meanings of each line are as follows:
+
```
["BookTitle", "Address", "Movie", ...] # List of entity types
[] # Empty list
</details>
+ For more details on the schema file, refer to the `schema.json` file in each task directory under the [data](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/data) directory.
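Since the schema file is described as three lines of JSON strings in a fixed order, reading it could look roughly like the sketch below. The helper name `load_schema` is hypothetical and is not part of `convert.py`. The sketch assumes the real file holds plain JSON on each line, without the `#` annotations used in the README listing, and it returns the three parsed values without interpreting them, since the meaning of each line depends on the task.

```python
import json


def load_schema(schema_path):
    """Read a schema file made of exactly three non-empty lines of JSON.

    Hypothetical helper for illustration; returns the three parsed values
    in file order and leaves their interpretation to the caller.
    """
    with open(schema_path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) != 3:
        raise ValueError(
            f"expected 3 JSON lines in {schema_path}, found {len(lines)}"
        )
    return [json.loads(line) for line in lines]
```

For an NER schema, the first returned value would be the list of entity types and the second an empty list, matching the listing above.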
+
+
+ **Testing Data Transformation**
+
+ For test data, use the [kg2instruction/convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) script, which does not require the data to contain label fields (`entity`, `relation`, `event`); it needs only an `input` field and the corresponding `schema_path`.
```bash
python kg2instruction/convert_test.py \
--sample 0
```
+ **Data Transformation Examples**
+
Here is an example of data conversion for the Named Entity Recognition (NER) task: