tchewik committed on
Commit
b5d0e70
1 Parent(s): bfe021a

Update README.md

  1. README.md +105 -0
README.md CHANGED
@@ -5,3 +5,108 @@ language:
  - ru
  library_name: transformers
  ---
+
+ ---
+
+ # IsaNLP RST Parser v3
+
+ This repository hosts several versions of the IsaNLP RST Parser. For more details, see the [GitHub repository](https://github.com/tchewik/isanlp_rst).
+
+ ## Performance
+
+ The table below reports the end-to-end performance of each model version on its test corpora. Columns follow the standard RST parsing metrics: segmentation (Seg), span (S), nuclearity (N), relation (R), and full (Full).
+
+ ### Corpora
+ - **English:** GUM<sub>9.1</sub>, RST-DT
+ - **Russian:** RRT<sub>2.1</sub>, RRG<sub>GUM-9.1</sub>
+
+ | Tag          | Language | Train Data | Test Data | Seg  | S    | N    | R    | Full |
+ |--------------|----------|------------|-----------|------|------|------|------|------|
+ | `gumrrg`     | En, Ru   | GUM, RRG   | GUM       | 95.5 | 67.4 | 56.2 | 49.6 | 48.7 |
+ |              |          |            | RRG       | 97.0 | 67.1 | 54.6 | 46.5 | 45.4 |
+ | `rstdt`      | En       | RST-DT     | RST-DT    | 97.8 | 75.6 | 65.0 | 55.6 | 53.9 |
+ | `rstreebank` | Ru       | RRT        | RRT       | 92.1 | 66.2 | 53.1 | 46.1 | 46.2 |
+
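The tags in the table are the values passed to the parser as the model version. As a minimal sketch of picking a tag by language, the snippet below copies the language and Full columns from the table into a plain dictionary. The dictionary layout and the helper name `tags_for_language` are illustrative assumptions and are not part of the `isanlp` API.

```python
from typing import List

# Rows copied from the table above. The structure of this dictionary and the
# helper below are hypothetical; they are not provided by the isanlp library.
MODELS = {
    "gumrrg": {"languages": {"en", "ru"}, "full": {"GUM": 48.7, "RRG": 45.4}},
    "rstdt": {"languages": {"en"}, "full": {"RST-DT": 53.9}},
    "rstreebank": {"languages": {"ru"}, "full": {"RRT": 46.2}},
}


def tags_for_language(language: str) -> List[str]:
    """Return the model tags whose training data covers the language code."""
    code = language.lower()
    return sorted(tag for tag, info in MODELS.items() if code in info["languages"])


if __name__ == "__main__":
    for lang in ("en", "ru"):
        print(lang, tags_for_language(lang))
```

Scores measured on different test corpora are not directly comparable, so the sketch only filters by language and leaves the final choice of tag to the reader.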
+ ## Usage
+
+ To use the IsaNLP RST Parser with Hugging Face, follow these steps:
+
+ 1. **Install the necessary Python package:**
+
+ You will need the `isanlp` library, which is available via pip:
+
+ ```bash
+ pip install isanlp
+ ```
+
+ 2. **Example code for parsing RST:**
+
+ The following Python code demonstrates how to run a specific version of the parser using the Hugging Face model:
+
+ ```python
+ from isanlp.processor_rst3 import ProcessorRST3
+
+ # Define the version of the model you want to use
+ version = 'gumrrg'  # from {'gumrrg', 'rstdt', 'rstreebank'}
+
+ # Initialize the parser with the desired version
+ parser = ProcessorRST3(hf_model_name='tchewik/isanlp_rst_v3', hf_model_version=version, cuda_device=0)
+
+ # Example text for parsing
+ text = """
+ On Saturday, in the ninth edition of the T20 Men's Cricket World Cup, Team India won against South Africa by seven runs.
+ The final match was played at the Kensington Oval Stadium in Barbados. This marks India's second win in the T20 World Cup,
+ which was co-hosted by the West Indies and the USA between June 2 and June 29.
+
+ After winning the toss, India decided to bat first and scored 176 runs for the loss of seven wickets.
+ Virat Kohli top-scored with 76 runs, followed by Axar Patel with 47 runs. Hardik Pandya took three wickets,
+ and Jasprit Bumrah took two wickets.
+ """
+
+ # Parse the text to obtain the RST tree
+ res = parser(text)  # res['rst'] holds the predicted binary discourse tree(s)
+
+ # Display the attributes of the first tree's root (indexed as in the RS3 export step)
+ vars(res['rst'][0])
+ ```
+
+ 3. **Understanding the Output:**
+
+ The last line of the example displays the attributes of the root discourse unit:
+
+ ```python
+ {
+ 'id': 7,
+ 'left': <isanlp.annotation_rst.DiscourseUnit at 0x7f771076add0>,
+ 'right': <isanlp.annotation_rst.DiscourseUnit at 0x7f7750b93d30>,
+ 'relation': 'elaboration',
+ 'nuclearity': 'NS',
+ 'start': 0,
+ 'end': 336,
+ 'text': "On Saturday, ... took two wickets .",
+ }
+ ```
+
+ - **id**: A unique identifier for the discourse unit.
+ - **left** and **right**: The left and right children of the current discourse unit.
+ - **relation**: The rhetorical relation between the two sub-units. Here it is "elaboration": one unit provides additional detail about the other.
+ - **nuclearity**: Which child is the nucleus. "NS" means that the left unit is the nucleus (N) and the right unit is the satellite (S); "SN" is the reverse, and "NN" marks a multinuclear relation.
+ - **start** and **end**: The character offsets of this discourse unit in the text.
+ - **text**: The text span corresponding to this discourse unit.
+
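Because every unit exposes `left` and `right`, the whole tree can be walked recursively. The sketch below uses a hypothetical `Unit` dataclass as a stand-in for the parser's discourse unit so that it runs without the model. It assumes that leaves have both children set to `None`, which should be checked against the actual objects; the `"elementary"` and `"_"` leaf defaults are placeholders only.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in mirroring the fields listed above."""
    id: int
    text: str
    start: int
    end: int
    relation: str = "elementary"  # placeholder label for leaves (assumption)
    nuclearity: str = "_"         # placeholder value for leaves (assumption)
    left: Optional["Unit"] = None
    right: Optional["Unit"] = None


def leaves(unit) -> Iterator:
    """Yield the leaf units (no children) from left to right."""
    if unit.left is None and unit.right is None:
        yield unit
        return
    for child in (unit.left, unit.right):
        if child is not None:
            yield from leaves(child)


def describe(unit, depth: int = 0) -> List[str]:
    """Return one indented line per node: relation and nuclearity, or leaf text."""
    indent = "  " * depth
    if unit.left is None and unit.right is None:
        return [f"{indent}EDU [{unit.start}:{unit.end}] {unit.text}"]
    lines = [f"{indent}{unit.relation} ({unit.nuclearity}) [{unit.start}:{unit.end}]"]
    for child in (unit.left, unit.right):
        if child is not None:
            lines.extend(describe(child, depth + 1))
    return lines


# Toy two-leaf tree built by hand for illustration (not parser output).
first = Unit(id=1, text="India won the final.", start=0, end=20)
second = Unit(id=2, text="Kohli top-scored with 76 runs.", start=21, end=51)
root = Unit(id=3, text=first.text + " " + second.text, start=0, end=51,
            relation="elaboration", nuclearity="NS", left=first, right=second)

if __name__ == "__main__":
    print("\n".join(describe(root)))
```

Since the functions rely only on attribute access, the same `describe` call should also work on a tree returned by the parser, provided its leaves follow the assumption above.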
+ 4. **(Optional) Save the result in RS3 format:**
+
+ To save the resulting RST tree to an `.rs3` file, call `to_rs3` on the tree:
+
+ ```python
+ # Export the RST tree to an RS3 file
+ res['rst'][0].to_rs3('filename.rs3')
+ ```
+
+ The resulting `filename.rs3` can then be opened in RSTTool or rstWeb for visualization or editing:
+
+ ![RST Example](https://huggingface.co/username/your-model/resolve/main/example-image.png)
+
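RS3 is an XML format, so an exported file can also be inspected programmatically. The sketch below parses a small hand-written sample with the standard library. The sample follows the common RS3 layout (`segment` and `group` elements carrying `id`, `parent`, and `relname` attributes) and is an assumption for illustration; the exact output of `to_rs3` may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written RS3 sample for illustration only; not produced by the parser.
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="3" relname="span">India won the final.</segment>
    <segment id="2" parent="1" relname="elaboration">Kohli top-scored with 76 runs.</segment>
    <group id="3" type="span"/>
  </body>
</rst>"""


def read_segments(rs3_xml: str):
    """Return id, parent, relation name, and text for every <segment> element."""
    root = ET.fromstring(rs3_xml)
    return [
        {
            "id": seg.get("id"),
            "parent": seg.get("parent"),
            "relname": seg.get("relname"),
            "text": (seg.text or "").strip(),
        }
        for seg in root.iter("segment")
    ]


if __name__ == "__main__":
    for segment in read_segments(SAMPLE_RS3):
        print(segment)
```

For a file on disk, the file contents can be read and passed to `read_segments`, or `ET.parse` can be used in place of `ET.fromstring`.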
+ ## Citation
+
+ If you use the IsaNLP RST Parser in your research, please cite our work as follows:
+
+ - **For versions `gumrrg`, `rstdt`, and `rstreebank`:** `TBA`