---
datasets:
- Herelles/lupan
language:
- fr
tags:
- text classification
- pytorch
- camembert
- urban planning
- natural risks
- risk management
- geography
inference: false
---
# CamemBERT LUPAN (Local Urban Plans And Natural risks)
## Overview
In France, urban planning and natural risk management rely on Local Land Plans (PLU – Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), both of which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of such documents concerning Montpellier, a rapidly evolving agglomeration exposed to natural risks, and then fine-tuned a model on the annotations.
This model classifies French input text to determine whether it contains an urban planning rule. It outputs one of four classes: Verifiable (the rule can be verified with satellite images), Non-verifiable (the rule cannot be verified with satellite images), Informative (non-strict rules in the form of recommendations), and Not pertinent (none of the above). For better results, we recommend adding a title and a subtitle to each input text.
For more details please refer to our article: https://www.nature.com/articles/s41597-023-02705-y
## Training and evaluation data
The model is fine-tuned on top of CamemBERT using our corpus: https://huggingface.co/datasets/Herelles/lupan
This is the first corpus in the French language in the fields of urban planning and natural risk management.
## Example of use
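Note: to run this code, you need to have `transformers`, `torch` and `numpy` installed. You can install them with `pip install transformers torch numpy`. The `CamembertTokenizer` class used below may also require the `sentencepiece` package; if loading the tokenizer fails, install it with `pip install sentencepiece`.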
Load necessary libraries:
```
from transformers import CamembertTokenizer, CamembertForSequenceClassification
import torch
import numpy as np
```
Define tokenizer:
```
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```
Define the model:
```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
model.to(device)
```
Define segment to predict:
```
new_segment = '''Article 1 : Occupations ou utilisations du sol interdites
1) Dans l’ensemble de la zone sont interdits :
Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres
que celles visées à l’article 2 paragraphe 1).'''
```
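The overview recommends adding a title and a subtitle to each input. A minimal sketch of one way to assemble such an input is shown below. The helper name `build_segment` and the choice of newline as separator are assumptions made for illustration (they mirror the layout of the example segment above); the card does not specify the exact format used during training.

```python
def build_segment(title: str, subtitle: str, body: str) -> str:
    """Join title, subtitle and body into a single input string.

    Assumption: parts are separated by newlines, as in the example
    segment of this card. Empty or whitespace-only parts are skipped.
    """
    parts = [title, subtitle, body]
    return "\n".join(p.strip() for p in parts if p and p.strip())


# Title and subtitle taken from the example segment above.
segment = build_segment(
    "Article 1 : Occupations ou utilisations du sol interdites",
    "1) Dans l’ensemble de la zone sont interdits :",
    "Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole.",
)
print(segment)
```

The resulting string can be passed to the tokenizer in place of `new_segment`.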
Get the prediction:
```
test_ids = []
test_attention_mask = []
# Apply the tokenizer
encoding = tokenizer(new_segment, padding="longest", return_tensors="pt")
# Extract IDs and Attention Mask
test_ids.append(encoding['input_ids'])
test_attention_mask.append(encoding['attention_mask'])
test_ids = torch.cat(test_ids, dim = 0)
test_attention_mask = torch.cat(test_attention_mask, dim = 0)
# Forward pass, calculate logit predictions
with torch.no_grad():
    output = model(test_ids.to(device), token_type_ids = None, attention_mask = test_attention_mask.to(device))
prediction = np.argmax(output.logits.cpu().numpy()).flatten().item()
if prediction == 0:
    pred_label = 'Not pertinent'
elif prediction == 1:
    pred_label = 'Pertinent (Soft)'
elif prediction == 2:
    pred_label = 'Pertinent (Strict, Non-verifiable)'
elif prediction == 3:
    pred_label = 'Pertinent (Strict, Verifiable)'
print('Input text: ', new_segment)
print('\n\nPredicted Class: ', pred_label)
```
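The label lookup above can also be written as a table, and the raw logits can be converted into probabilities to get a rough confidence score. The sketch below uses only the Python standard library and a list of hypothetical logits; with the real model, something like `output.logits[0].tolist()` would supply them. The names `ID2LABEL`, `softmax` and `describe` are illustrative and not part of the released code. The labels appear to correspond to the four classes in the overview (for instance, 'Pertinent (Soft)' presumably matches Informative), but that mapping is an inference.

```python
import math

# Label order copied from the if/elif chain in the example above.
ID2LABEL = {
    0: "Not pertinent",
    1: "Pertinent (Soft)",
    2: "Pertinent (Strict, Non-verifiable)",
    3: "Pertinent (Strict, Verifiable)",
}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, for illustration only (not real model output).
example_logits = [-1.2, 0.3, 0.8, 2.9]
label, prob = describe(example_logits)
print(f"{label} ({prob:.2f})")
```

A softmax score is not a calibrated probability, so it is best treated as a relative indicator rather than a guarantee of correctness.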
## Citation
To cite the data set please use:
```
@article{koptelov2023manually,
title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
journal={Scientific Data},
volume={10},
number={1},
pages={818},
year={2023},
publisher={Nature Publishing Group UK London}
}
```
To cite the code please use:
```
@inproceedings{koptelov2023towards,
title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
pages={1--10},
year={2023},
organization={IEEE}
}
```