BookNLP-fr
This model, developed as part of the BookNLP-fr project, is a named entity recognition (NER) model built on top of camembert-large embeddings and trained to predict nested entities in French, specifically in literary texts.
The predicted entity types, with their evaluation scores, are:
NER_tag | precision | recall | f1_score | support | support % |
---|---|---|---|---|---|
PER | 90.58% | 93.52% | 92.03% | 31,570 | 83.87% |
FAC | 70.49% | 71.75% | 71.12% | 2,294 | 6.09% |
TIME | 58.40% | 58.68% | 58.54% | 1,670 | 4.44% |
GPE | 76.69% | 74.05% | 75.35% | 871 | 2.31% |
LOC | 60.92% | 44.37% | 51.35% | 773 | 2.05% |
VEH | 66.18% | 49.25% | 56.47% | 465 | 1.24% |
micro_avg | 86.70% | 88.64% | 87.61% | 37,643 | 100.00% |
macro_avg | 70.55% | 65.27% | 67.48% | 37,643 | 100.00% |
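As a quick consistency check on the table above, the sketch below recomputes each F1 score as the harmonic mean of precision and recall, and the macro-averaged F1 as the unweighted mean over the six entity types. The numbers are copied from the table; the names `scores`, `f1`, and `macro_f1` are illustrative only. Because the inputs are already rounded to two decimals, the last digit may differ slightly from the published values.

```python
# Per-class (precision, recall) in percent, copied from the table above.
scores = {
    "PER": (90.58, 93.52),
    "FAC": (70.49, 71.75),
    "TIME": (58.40, 58.68),
    "GPE": (76.69, 74.05),
    "LOC": (60.92, 44.37),
    "VEH": (66.18, 49.25),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 for each entity type, then the unweighted (macro) mean.
per_class_f1 = {tag: f1(p, r) for tag, (p, r) in scores.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for tag, value in per_class_f1.items():
    print(f"{tag}: {value:.2f}%")
print(f"macro_avg f1: {macro_f1:.2f}%")
```

The macro average weights every entity type equally, which is why it sits well below the micro average: PER accounts for most of the support and is also the best-scoring class.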
Model Input: Maximum context camembert-large embeddings (1024 dimensions)
Locked Dropout: 0.5
Projection layer:
BiLSTM layer:
Linear layer:
CRF layer
Model Output: BIOES labels sequence
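To illustrate what a BIOES label sequence encodes, here is a minimal sketch that converts one flat sequence of labels into entity spans. It is not the project's own decoding code. The `B-PER` style label spelling, the helper name `bioes_to_spans`, and the example sentence are assumptions made for illustration, and the sketch handles a single flat layer only; how nested entities are represented is not described here.

```python
def bioes_to_spans(labels):
    """Convert a flat BIOES label sequence into (start, end, type) spans.

    `end` is inclusive. Ill-formed sequences are handled leniently:
    an entity that is opened but never properly closed is dropped.
    """
    spans = []
    start, current = None, None  # state of the entity being built, if any
    for i, label in enumerate(labels):
        if label == "O":
            start, current = None, None
            continue
        prefix, _, tag = label.partition("-")
        if prefix == "S":  # single-token entity
            spans.append((i, i, tag))
            start, current = None, None
        elif prefix == "B":  # beginning of a multi-token entity
            start, current = i, tag
        elif prefix == "I":  # inside: must continue the open entity
            if current != tag:
                start, current = None, None
        elif prefix == "E":  # end: closes the open entity if types match
            if current == tag and start is not None:
                spans.append((start, i, tag))
            start, current = None, None
    return spans


# Hypothetical example sentence and labels.
tokens = ["Le", "baron", "de", "Sigognac", "quitta", "Paris", "."]
labels = ["B-PER", "I-PER", "I-PER", "E-PER", "O", "S-GPE", "O"]
spans = bioes_to_spans(labels)
for start, end, tag in spans:
    print(tag, " ".join(tokens[start:end + 1]))
```

Compared with plain BIO, the explicit `E` and `S` labels mark entity boundaries directly, which gives a CRF layer more transition structure to learn from.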
*** UNDER CONSTRUCTION ***
Index | Document | Token count | Included in model evaluation |
---|---|---|---|
0 | 1836_Gautier-Theophile_La-morte-amoureuse | 14,299 tokens | True |
1 | 1840_Sand-George_Pauline | 12,315 tokens | True |
2 | 1842_Balzac-Honore-de_La-Maison-du-chat-qui-pelote | 24,776 tokens | True |
3 | 1844_Balzac-Honore-de_La-Maison-Nucingen | 30,987 tokens | True |
4 | 1844_Balzac-Honore-de_Sarrasine | 15,408 tokens | True |
5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | True |
6 | 1863_Gautier-Theophile_Le-capitaine-Fracasse | 11,834 tokens | True |
7 | 1873_Zola-Emile_Le-ventre-de-Paris | 12,557 tokens | True |
8 | 1881_Flaubert-Gustave_Bouvard-et-Pecuchet | 12,281 tokens | True |
9 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_1-MADEMOISELLE-FIFI | 5,425 tokens | True |
10 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_2-MADAME-BAPTISTE | 2,554 tokens | True |
11 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_3-LA-ROUILLE | 2,929 tokens | True |
12 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_1-MARROCA | 4,067 tokens | True |
13 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_2-LA-BUCHE | 2,251 tokens | True |
14 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_3-LA-RELIQUE | 2,034 tokens | True |
15 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_1-FOU | 1,864 tokens | True |
16 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_2-REVEIL | 2,141 tokens | True |
17 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_3-UNE-RUSE | 2,441 tokens | True |
18 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_4-A-CHEVAL | 2,860 tokens | True |
19 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_5-UN-REVEILLON | 2,343 tokens | True |
20 | 1901_Lucie-Achard_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,703 tokens | True |
21 | 1903_Conan-Laure_Elisabeth_Seton | 13,023 tokens | True |
22 | 1904_Rolland-Romain_Jean-Christophe_Tome-I-L-aube | 10,982 tokens | True |
23 | 1904_Rolland-Romain_Jean-Christophe_Tome-II-Le-matin | 10,305 tokens | True |
24 | 1917_Adèle-Bourgeois_Némoville | 12,389 tokens | True |
25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,637 tokens | True |
26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 11,902 tokens | True |
27 | 1937_Audoux-Marguerite_Douce-Lumiere | 12,285 tokens | True |
28 | TOTAL | 275,360 tokens | 28 files used for cross-validation |
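The table says that the 28 files were used for cross-validation but does not say how the folds were formed. One common approach, assumed here purely for illustration, is to split at the document level so that no text contributes to both training and evaluation within a fold. The function name `document_folds` and the choice of 4 folds are hypothetical.

```python
def document_folds(doc_ids, n_folds):
    """Yield (train_ids, held_out_ids) pairs, splitting by whole documents."""
    # Round-robin assignment: document i goes to fold i % n_folds.
    folds = [doc_ids[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held_out_set = set(held_out)
        train = [d for d in doc_ids if d not in held_out_set]
        yield train, held_out


# Indices 0..27 correspond to the rows of the table above.
doc_ids = list(range(28))
splits = list(document_folds(doc_ids, n_folds=4))  # 4 is an arbitrary choice

for k, (train, held_out) in enumerate(splits):
    print(f"fold {k}: {len(train)} training files, {len(held_out)} held-out files")
```

Since the documents vary widely in length (from about 1,900 to about 31,000 tokens), a real setup might also balance folds by token count rather than by file count.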
Gold Labels | PER | FAC | TIME | GPE | LOC | VEH | O | support |
---|---|---|---|---|---|---|---|---|
PER | 29,525 | 27 | 13 | 6 | 7 | 26 | 1,966 | 31,570 |
FAC | 43 | 1,646 | 0 | 17 | 12 | 2 | 574 | 2,294 |
TIME | 5 | 1 | 980 | 1 | 1 | 0 | 682 | 1,670 |
GPE | 18 | 28 | 1 | 645 | 27 | 0 | 152 | 871 |
LOC | 5 | 63 | 0 | 54 | 343 | 0 | 308 | 773 |
VEH | 58 | 8 | 1 | 0 | 0 | 229 | 169 | 465 |
O | 2,902 | 532 | 682 | 110 | 167 | 89 | 0 | 4,482 |
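The recall column of the scores table can be recovered from this confusion matrix. The sketch below assumes that rows are gold labels and columns are predicted labels, with `O` standing for gold entities that were missed (last column) and predicted entities with no gold counterpart (last row). Under that reading, recall for a type is its diagonal count divided by its row total. Only recall is recomputed; this sketch does not attempt to reproduce the precision column.

```python
# Column order of the confusion matrix above.
labels = ["PER", "FAC", "TIME", "GPE", "LOC", "VEH", "O"]

# Rows are gold labels; values are copied from the table (support omitted).
matrix = {
    "PER": [29525, 27, 13, 6, 7, 26, 1966],
    "FAC": [43, 1646, 0, 17, 12, 2, 574],
    "TIME": [5, 1, 980, 1, 1, 0, 682],
    "GPE": [18, 28, 1, 645, 27, 0, 152],
    "LOC": [5, 63, 0, 54, 343, 0, 308],
    "VEH": [58, 8, 1, 0, 0, 229, 169],
}

entity_types = labels[:-1]  # every label except "O"

# Per-class recall: correctly predicted / all gold entities of that type.
recall = {}
for tag in entity_types:
    row = matrix[tag]
    correct = row[labels.index(tag)]
    recall[tag] = 100 * correct / sum(row)

# Micro-averaged recall pools the counts over all entity types.
total_correct = sum(matrix[t][labels.index(t)] for t in entity_types)
total_gold = sum(sum(matrix[t]) for t in entity_types)
micro_recall = 100 * total_correct / total_gold

for tag, value in recall.items():
    print(f"{tag}: {value:.2f}%")
print(f"micro_avg recall: {micro_recall:.2f}%")
```

The off-diagonal cells show where confusions concentrate, for example LOC entities predicted as FAC or GPE, while the `O` column shows that most errors for every type are missed entities rather than type confusions.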
mail: antoine [dot] bourgois [at] protonmail [dot] com
Base model
almanach/camembert-large