widget:
  - text: >-
      KOMMISSIONENS BESLUTNING

      af 6. marts 2006

      om klassificering af visse byggevarers ydeevne med hensyn til reaktion ved
      brand for så vidt angår trægulve samt vægpaneler og vægbeklædning i
      massivt træ

      (meddelt under nummer K(2006) 655
datasets:
  - multi_eurlex
metrics:
  - f1
model-index:
  - name: coastalcph/danish-legal-longformer-eurlex-sd
    results:
      - task:
          type: text-classification
          name: Danish EURLEX (Level 2)
        dataset:
          name: multi_eurlex
          type: multi_eurlex
          config: multi_eurlex
          split: validation
        metrics:
          - name: Micro-F1
            type: micro-f1
            value: 0.76144
          - name: Macro-F1
            type: macro-f1
            value: 0.52878

Model description

This model is a fine-tuned version of coastalcph/danish-legal-longformer-base on the Danish part of the MultiEURLEX dataset, trained with an additional Spectral Decoupling penalty (Pezeshki et al., 2020).
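
Spectral Decoupling, as proposed by Pezeshki et al., replaces the usual L2 penalty on the weights with an L2 penalty on the network's logits. The following is a minimal sketch of that idea in plain Python, not the training code used for this model: the binary cross-entropy task loss, the averaging over labels, the `sd_lambda` value, and the example logits are all assumptions chosen for illustration.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def sd_loss(logits, targets, sd_lambda):
    """Task loss plus the Spectral Decoupling term (lambda / 2) * mean(z^2)."""
    penalty = 0.5 * sd_lambda * sum(z * z for z in logits) / len(logits)
    return bce_with_logits(logits, targets) + penalty


# Hypothetical logits for three labels and their gold multi-hot targets
logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 0.0]
print(sd_loss(logits, targets, sd_lambda=0.0))  # plain cross-entropy
print(sd_loss(logits, targets, sd_lambda=0.1))  # with the SD penalty added
```

Because the penalty grows with the squared magnitude of the logits, it discourages the classifier from becoming arbitrarily confident on any single label.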

Training and evaluation data

The Danish part of the MultiEURLEX dataset.

Use of Model

As a text classifier:

from transformers import pipeline

# Init text classification pipeline
text_cls_pipe = pipeline(task="text-classification",
                         model="coastalcph/danish-legal-longformer-eurlex",
                         use_auth_token='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')

# Encode and Classify document
predictions = text_cls_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                            "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                            "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Print prediction
print(predictions)
# [{'label': 'building and public works', 'score': 0.9626012444496155}]
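
MultiEURLEX is a multi-label dataset, so a document may carry more than one concept, while the call above prints only the top-scoring label. A possible post-processing step is sketched below. It assumes per-label scores have already been obtained, for example by passing `return_all_scores=True` and `function_to_apply="sigmoid"` to the pipeline (check both against your installed Transformers version). The helper name, the 0.5 threshold, the scores, and the two "hypothetical" labels are illustrative and are not model outputs.

```python
def labels_above_threshold(scores, threshold=0.5):
    """Return (label, score) pairs with score >= threshold, highest first."""
    kept = [(s["label"], s["score"]) for s in scores if s["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up scores in the pipeline's per-label output format (illustrative only)
example_scores = [
    {"label": "hypothetical label B", "score": 0.71},
    {"label": "building and public works", "score": 0.96},
    {"label": "hypothetical label C", "score": 0.08},
]
print(labels_above_threshold(example_scores))
```

The threshold trades precision against recall and would normally be tuned on the validation split.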

As a feature extractor (document embedder):

from transformers import pipeline
import numpy as np

# Init feature extraction pipeline
feature_extraction_pipe = pipeline(task="feature-extraction",
                                   model="coastalcph/danish-legal-longformer-eurlex",
                                   use_auth_token='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')

# Encode document; the pipeline returns nested lists of shape [1, num_tokens, hidden_size]
token_wise_features = feature_extraction_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                                              "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                                              "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Use CLS token representation (first token) as document embedding,
# converted to a NumPy array so that .shape is available
document_features = np.array(token_wise_features[0][0])

print(document_features.shape)
# (768,)
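
One common use of such document embeddings is comparing documents by cosine similarity. The sketch below is a plain-Python illustration under that assumption; the three-dimensional vectors are toy stand-ins for the 768-dimensional `document_features`, and the helper name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings standing in for two documents' CLS vectors
doc_a = [0.2, 0.7, -0.1]
doc_b = [0.1, 0.9, 0.0]
print(cosine_similarity(doc_a, doc_b))
```

Values close to 1 indicate embeddings pointing in nearly the same direction; how well that tracks topical similarity for this model is something to verify on your own data.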

Framework versions

  • Transformers 4.18.0
  • PyTorch 1.12.0+cu113
  • Datasets 2.0.0
  • Tokenizers 0.12.1