---
datasets:
- Herelles/lupan
language:
- fr
tags:
- text classification
- pytorch
- camembert
- urban planning
- natural risks
- risk management
- geography
inference: false
---
# CamemBERT LUPAN (Local Urban Plans And Natural risks)
## Overview
In France, urban planning and natural risk management rely on Local Land Plans (PLU – Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), both of which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of these documents for Montpellier, a rapidly evolving agglomeration exposed to natural risks, and then fine-tuned a model on the annotations.
This model classifies input text in French to determine if it contains an urban planning rule. It outputs one of 4 classes: Verifiable (indicating the possibility of verification with satellite images), Non-verifiable (indicating impossibility of verification with satellite images), Informative (containing non-strict rules in the form of recommendations), and Not pertinent (absence of any of the above rules). For better quality results, it is recommended to add a title and a subtitle to each textual input.
For more details please refer to our article: https://www.nature.com/articles/s41597-023-02705-y
## Training and evaluation data
The model is fine-tuned on top of CamemBERT using our corpus:
https://huggingface.co/datasets/Herelles/lupan
This is the first corpus in the French language in the fields of urban planning and natural risk management.
## Example of use
Attention: to run this code you need to have installed `transformers`, `torch` and `numpy`. You can do so with `pip install transformers torch numpy`.
Load necessary libraries:
```
from transformers import CamembertTokenizer, CamembertForSequenceClassification
import torch
import numpy as np
```
Define tokenizer:
```
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```
Define the model:
```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
model.to(device)
```
Define segment to predict:
```
new_segment = '''Article 1 : Occupations ou utilisations du sol interdites
1) Dans l’ensemble de la zone sont interdits :
Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres
que celles visées à l’article 2 paragraphe 1).'''
```
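The overview recommends adding a title and a subtitle to each textual input. A minimal sketch of assembling such an input is shown here; the helper name `build_segment` and the one-part-per-line layout are assumptions modelled on the example segment above, not a format documented for the model.

```python
def build_segment(title: str, subtitle: str, body: str) -> str:
    """Join a title, a subtitle and the rule text into one input string.

    Assumption: one part per line, mirroring the example segment;
    empty or whitespace-only parts are skipped.
    """
    parts = [title, subtitle, body]
    return "\n".join(p.strip() for p in parts if p and p.strip())


# Hypothetical usage with the text of the example segment
new_segment = build_segment(
    "Article 1 : Occupations ou utilisations du sol interdites",
    "1) Dans l’ensemble de la zone sont interdits :",
    "Les constructions destinées à l’habitation ne dépendant pas d’une "
    "exploitation agricole autres que celles visées à l’article 2 paragraphe 1).",
)
```

The resulting `new_segment` can be passed to the tokenizer in the same way as a hand-written string.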
Get the prediction:
```
test_ids = []
test_attention_mask = []

# Apply the tokenizer
encoding = tokenizer(new_segment, padding="longest", return_tensors="pt")

# Extract IDs and Attention Mask
test_ids.append(encoding['input_ids'])
test_attention_mask.append(encoding['attention_mask'])
test_ids = torch.cat(test_ids, dim = 0)
test_attention_mask = torch.cat(test_attention_mask, dim = 0)

# Forward pass, calculate logit predictions
with torch.no_grad():
    output = model(test_ids.to(device), token_type_ids = None, attention_mask = test_attention_mask.to(device))

# Pick the class with the highest logit and map it to its label
prediction = np.argmax(output.logits.cpu().numpy()).flatten().item()
if prediction == 0:
    pred_label = 'Not pertinent'
elif prediction == 1:
    pred_label = 'Pertinent (Soft)'
elif prediction == 2:
    pred_label = 'Pertinent (Strict, Non-verifiable)'
elif prediction == 3:
    pred_label = 'Pertinent (Strict, Verifiable)'

print('Input text: ', new_segment)
print('\n\nPredicted Class: ', pred_label)
```
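The prediction code above reports only the winning class. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python; the names `ID2LABEL` and `decode_logits` are hypothetical, and the index-to-label order is copied from the `if`/`elif` chain above.

```python
import math

# Index-to-label mapping, copied from the if/elif chain in the example above
ID2LABEL = {
    0: 'Not pertinent',
    1: 'Pertinent (Soft)',
    2: 'Pertinent (Strict, Non-verifiable)',
    3: 'Pertinent (Strict, Verifiable)',
}


def decode_logits(logits):
    """Return (label, probability) for the logits of a single segment."""
    # Numerically stable softmax: subtract the maximum before exponentiating
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Illustrative logits only, not real model output
label, prob = decode_logits([0.1, 0.2, 0.3, 2.5])
print(label, round(prob, 3))
```

With the model loaded as above, the helper could be called as `decode_logits(output.logits[0].tolist())`.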
## Online demo
- https://huggingface.co/spaces/Herelles/segments-lupan
## Citation
To cite the data set please use:
```
@article{koptelov2023manually,
title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
journal={Scientific Data},
volume={10},
number={1},
pages={818},
year={2023},
publisher={Nature Publishing Group UK London}
}
```
To cite the code please use:
```
@inproceedings{koptelov2023towards,
title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
pages={1--10},
year={2023},
organization={IEEE}
}
```