Update README.md
README.md
CHANGED
@@ -5,3 +5,108 @@ language:
- ru
library_name: transformers
---

---

# IsaNLP RST Parser v3

This repository hosts several versions of the IsaNLP RST Parser. For more details, visit the [GitHub repository](https://github.com/tchewik/isanlp_rst).

## Performance

The table below summarizes the end-to-end performance of each model version on the corpora listed here. Seg is the segmentation score; S, N, R, and Full are the standard span, nuclearity, relation, and full-structure parsing scores.

### Corpora
- **English:** GUM<sub>9.1</sub>, RST-DT
- **Russian:** RRT<sub>2.1</sub>, RRG<sub>GUM-9.1</sub>

| Tag          | Language   | Train Data  | Test Data   | Seg  | S    | N    | R    | Full  |
|--------------|------------|-------------|-------------|------|------|------|------|-------|
| `gumrrg`     | En, Ru     | GUM, RRG    | GUM         | 95.5 | 67.4 | 56.2 | 49.6 | 48.7  |
|              |            |             | RRG         | 97.0 | 67.1 | 54.6 | 46.5 | 45.4  |
| `rstdt`      | En         | RST-DT      | RST-DT      | 97.8 | 75.6 | 65.0 | 55.6 | 53.9  |
| `rstreebank` | Ru         | RRT         | RRT         | 92.1 | 66.2 | 53.1 | 46.1 | 46.2  |
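
As a rough illustration of how the table maps a language to a model tag, the sketch below copies the Tag and Language columns into a dictionary and looks up the tags that cover a given language. `SUPPORTED_LANGUAGES` and `tags_for_language` are hypothetical helper names introduced here for illustration; they are not part of `isanlp`.

```python
from typing import List

# Tag -> language codes, copied from the Tag and Language columns of the table.
# This mapping is a hand-written illustration, not something the library exposes.
SUPPORTED_LANGUAGES = {
    "gumrrg": {"en", "ru"},
    "rstdt": {"en"},
    "rstreebank": {"ru"},
}


def tags_for_language(language: str) -> List[str]:
    """Return the model tags whose training data covers the given language code."""
    code = language.lower()
    return sorted(tag for tag, langs in SUPPORTED_LANGUAGES.items() if code in langs)


print(tags_for_language("ru"))  # ['gumrrg', 'rstreebank']
print(tags_for_language("en"))  # ['gumrrg', 'rstdt']
```

Which of the matching tags fits best depends on how close your texts are to the corresponding corpus; the scores in the table are the only guidance given here.
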

## Usage

To use the IsaNLP RST Parser with Hugging Face, follow these steps:

1. **Install the necessary Python package:**

You will need the `isanlp` library, which is available via pip:

```bash
pip install isanlp
```

2. **Example code for parsing RST:**

The following Python code runs a specific version of the parser using the Hugging Face model:

```python
from isanlp.processor_rst3 import ProcessorRST3

# Define the version of the model you want to use
version = 'gumrrg'  # from {'gumrrg', 'rstdt', 'rstreebank'}

# Initialize the parser with the desired version
parser = ProcessorRST3(hf_model_name='tchewik/isanlp_rst_v3', hf_model_version=version, cuda_device=0)

# Example text for parsing
text = """
On Saturday, in the ninth edition of the T20 Men's Cricket World Cup, Team India won against South Africa by seven runs.
The final match was played at the Kensington Oval Stadium in Barbados. This marks India's second win in the T20 World Cup,
which was co-hosted by the West Indies and the USA between June 2 and June 29.

After winning the toss, India decided to bat first and scored 176 runs for the loss of seven wickets.
Virat Kohli top-scored with 76 runs, followed by Axar Patel with 47 runs. Hardik Pandya took three wickets,
and Jasprit Bumrah took two wickets.
"""

# Parse the text to obtain the RST tree
res = parser(text)  # res['rst'] holds the predicted binary discourse tree(s)

# Display the attributes of the first tree's root unit
# (indexed with [0], as in the RS3 export step below)
vars(res['rst'][0])
```

3. **Understanding the Output:**

The `vars(...)` call returns the attributes of the root discourse unit, for example:

```python
{
  'id': 7,
  'left': <isanlp.annotation_rst.DiscourseUnit at 0x7f771076add0>,
  'right': <isanlp.annotation_rst.DiscourseUnit at 0x7f7750b93d30>,
  'relation': 'elaboration',
  'nuclearity': 'NS',
  'start': 0,
  'end': 336,
  'text': "On Saturday, ... took two wickets .",
}
```

- **id**: A unique identifier for the discourse unit.
- **left** and **right**: The left and right children of the current discourse unit.
- **relation**: The rhetorical relation between the two sub-units. In this example, the relation is "elaboration," indicating that one part provides additional detail about the other.
- **nuclearity**: Indicates the nuclearity of the relation. "NS" means that the left unit is the nucleus (N) and the right unit is the satellite (S).
- **start** and **end**: The character offsets in the text for this discourse unit.
- **text**: The text span corresponding to this discourse unit.
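
Because each unit points to its `left` and `right` children, the tree can be walked recursively. The sketch below is a minimal illustration under stated assumptions: `Unit` is a hypothetical stand-in class that only mirrors the attribute names listed above (it is not `isanlp.annotation_rst.DiscourseUnit`), leaf units are assumed to have `left` and `right` set to `None`, and "SN" is assumed to mirror "NS" with the nucleus on the right. The helper names `leaves` and `nucleus` are likewise made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in with the same attribute names as the output above."""
    id: int
    text: str
    start: int
    end: int
    relation: str = ""
    nuclearity: str = ""
    left: Optional["Unit"] = None
    right: Optional["Unit"] = None


def leaves(unit) -> List:
    """Collect the units without children (the elementary units), left to right."""
    if unit.left is None and unit.right is None:
        return [unit]
    found = []
    for child in (unit.left, unit.right):
        if child is not None:
            found.extend(leaves(child))
    return found


def nucleus(unit):
    """Return the nucleus child for 'NS' or 'SN'; None for any other value."""
    return {"NS": unit.left, "SN": unit.right}.get(unit.nuclearity)


# Hand-built toy tree (not parser output), reusing a sentence from the example text.
a = "Virat Kohli top-scored with 76 runs,"
b = "followed by Axar Patel with 47 runs."
left = Unit(id=1, text=a, start=0, end=len(a))
right = Unit(id=2, text=b, start=len(a) + 1, end=len(a) + 1 + len(b))
root = Unit(id=3, text=a + " " + b, start=0, end=right.end,
            relation="elaboration", nuclearity="NS", left=left, right=right)

print([u.id for u in leaves(root)])  # [1, 2]
print(nucleus(root).text)            # the left child's text, since nuclearity is 'NS'
```

If the assumptions hold for the real objects, the same two functions should work when passed a tree returned by the parser, since they only read the attributes described above.
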

4. **(Optional) Save the result in RS3 format:**

To save the resulting RST tree to an `.rs3` file, call `to_rs3` on it:

```python
# Export the RST tree to an RS3 file
res['rst'][0].to_rs3('filename.rs3')
```

The resulting `filename.rs3` can then be opened in RSTTool or rstWeb for visualization or editing:

![RST Example](https://huggingface.co/username/your-model/resolve/main/example-image.png)
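
An RS3 file is XML, so it can also be inspected without a GUI. The sketch below is an illustration only: `SAMPLE_RS3` is a small hand-written document in the usual RSTTool layout (`<segment>` and `<group>` elements inside `<body>`), not actual output of `to_rs3`, and `read_segments` is a hypothetical helper name. It assumes the exported file follows the same layout.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the common RS3 layout; NOT produced by the parser.
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="3" relname="span">India scored 176 runs.</segment>
    <segment id="2" parent="1" relname="elaboration">Virat Kohli top-scored with 76 runs.</segment>
    <group id="3" type="span"/>
  </body>
</rst>"""


def read_segments(rs3_xml: str):
    """Return (id, parent, relname, text) for every <segment> element."""
    root = ET.fromstring(rs3_xml)
    return [
        (seg.get("id"), seg.get("parent"), seg.get("relname"), (seg.text or "").strip())
        for seg in root.iter("segment")
    ]


for segment in read_segments(SAMPLE_RS3):
    print(segment)
```

To read a file from disk instead of a string, `ET.parse('filename.rs3').getroot()` can replace the `ET.fromstring` call.
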

## Citation

If you use the IsaNLP RST Parser in your research, please cite our work as follows:

- **For versions `gumrrg`, `rstdt`, and `rstreebank`:** `TBA`