---
datasets:
- Herelles/lupan
language:
- fr
tags:
- text classification
- pytorch
- camembert
- urban planning
- natural risks
- risk management
- geography
inference: false
---

# CamemBERT LUPAN (Local Urban Plans And Natural risks)

## Overview

In France, urban planning and natural risk management rely on Local Urban Plans (PLU – Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of these documents for Montpellier, a rapidly evolving agglomeration exposed to natural risks, and then fine-tuned a model on the annotated corpus.

This model classifies French input text according to whether it contains an urban planning rule. It outputs one of four classes: Verifiable (a strict rule that can be verified with satellite images), Non-verifiable (a strict rule that cannot be verified with satellite images), Informative (a non-strict rule in the form of a recommendation), and Not pertinent (none of the above). For better results, we recommend adding a title and a subtitle to each input text.
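
The recommendation to add a title and a subtitle can be sketched as a small helper that assembles the three parts into one input string. The function name `build_segment` and the blank-line separator are assumptions made for illustration; they mirror the layout of the sample segment under "Example of use", and the exact format used during training is not specified in this card.

```python
def build_segment(title: str, subtitle: str, body: str) -> str:
    """Join a title, a subtitle and the rule text into one input string.

    Assumption: parts are separated by a blank line, as in the sample
    segment of this card. This is a hypothetical helper, not part of the model.
    """
    parts = [title, subtitle, body]
    # Strip surrounding whitespace and skip empty parts so that a missing
    # title or subtitle does not leave stray blank lines in the input
    return "\n\n".join(part.strip() for part in parts if part and part.strip())


segment = build_segment(
    "Article 1 : Occupations ou utilisations du sol interdites",
    "1) Dans l’ensemble de la zone sont interdits :",
    "Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole.",
)
print(segment)
```

The resulting string can be passed to the tokenizer in place of a bare rule text.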

For more details, please refer to our article: https://www.nature.com/articles/s41597-023-02705-y

## Training and evaluation data

The model is fine-tuned from CamemBERT on our corpus: https://huggingface.co/datasets/Herelles/lupan

This is the first French-language corpus in the fields of urban planning and natural risk management.

## Example of use

Note: to run this code you need to have `transformers`, `torch` and `numpy` installed. `CamembertTokenizer` also relies on the `sentencepiece` package. You can install them all with `pip install transformers torch numpy sentencepiece`.

Load the necessary libraries:
```
from transformers import CamembertTokenizer, CamembertForSequenceClassification
import torch
import numpy as np
```

Define the tokenizer:
```
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```

Define the model:
```
# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the fine-tuned classifier and move it to the selected device
model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
model.to(device)
```

Define the segment to classify:
```
new_segment = '''Article 1 : Occupations ou utilisations du sol interdites

1) Dans l’ensemble de la zone sont interdits :

Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres
que celles visées à l’article 2 paragraphe 1).'''
```

Get the prediction:
```
# Apply the tokenizer and move the resulting tensors to the model's device
encoding = tokenizer(new_segment, padding="longest", return_tensors="pt")
test_ids = encoding['input_ids'].to(device)
test_attention_mask = encoding['attention_mask'].to(device)

# Forward pass without gradient tracking to compute the logits
with torch.no_grad():
    output = model(test_ids, token_type_ids=None, attention_mask=test_attention_mask)

# The index of the highest logit is the predicted class
prediction = int(np.argmax(output.logits.cpu().numpy(), axis=1)[0])

# Map the class index to its label
labels = {
    0: 'Not pertinent',
    1: 'Pertinent (Soft)',
    2: 'Pertinent (Strict, Non-verifiable)',
    3: 'Pertinent (Strict, Verifiable)',
}
pred_label = labels[prediction]

print('Input text: ', new_segment)
print('\n\nPredicted Class: ', pred_label)
```
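
If a score is useful alongside the label, the logits can be turned into probabilities with a softmax. The following is a minimal sketch using only the standard library; the helper name `logits_to_prediction` and the sample logits are hypothetical, and the class indices are taken from the example above. Softmax scores from a fine-tuned classifier are not necessarily well calibrated, so treat them as indicative only.

```python
import math

# Class indices as mapped in the example above
LABELS = {
    0: "Not pertinent",
    1: "Pertinent (Soft)",
    2: "Pertinent (Strict, Non-verifiable)",
    3: "Pertinent (Strict, Verifiable)",
}


def logits_to_prediction(logits):
    """Return (label, probability) for one list of four raw logits."""
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Pick the index with the highest probability (first one on ties)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for illustration only
label, prob = logits_to_prediction([0.1, -1.2, 0.3, 2.5])
print(f"{label} ({prob:.2f})")
```

With the model loaded as above, the input would be `output.logits[0].tolist()`.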

## Citation

To cite the dataset, please use:
```
@article{koptelov2023manually,
  title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={818},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
```

To cite the code, please use:
```
@inproceedings{koptelov2023towards,
  title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
  pages={1--10},
  year={2023},
  organization={IEEE}
}
```