nishan-chatterjee
title: multilingual-persuasion-detection-from-text
app_file: inference.py
pinned: false
license: gpl-3.0
language:
  - multilingual
tags:
  - mbart-50
  - text-classification
  - multi-label-classification
  - persuasion-detection
  - meme-analysis
  - social-media-analysis
  - propaganda-detection
  - hierarchical-classification
  - multilingual
pipeline_tag: text-classification
inference: true
widget:
  - text: THIS IS WHY YOU NEED. A SHARPIE WITH YOU AT ALL TIMES.
    example_title: Sentence 1
  - text: WHEN YOU'RE THE FBI, THEY LET YOU DO IT.
    example_title: Sentence 2
  - text: |-
      Move your ships away!

      oooook

      Move your ships away!

      No, and I just added 10 more
    example_title: Sentence 3
  - text: Let's Make America Great Again!
    example_title: Sentence 4

Multilingual Persuasion Detection in Memes

Given only the textual content of a meme, the goal is to identify which of 20 persuasion techniques, organized in a hierarchy, it uses. Predicting only an ancestor node of the correct technique earns partial credit. This is a hierarchical multi-label classification problem based on Subtask 1 of SemEval 2024 Task 4, "Multilingual Detection of Persuasion Techniques in Memes".

The source code to train the model along with additional implementations can be found here. The paper describing our method was accepted at SemEval 2024. Link to paper coming soon!!

Hierarchy

Usage Example

  • Input: "I HATE TRUMP\n\nMOST TERRORIST DO"
  • Outputs:
    • Child-only Label List: ['Name calling/Labeling', 'Loaded Language']
    • Complete Hierarchical Label List: ['Ethos', 'Ad Hominem', 'Name calling/Labeling', 'Pathos', 'Loaded Language']
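The relationship between the two output lists can be sketched as follows. This is a minimal illustration, not the code in inference.py: the `PARENT` map and both helper functions are hypothetical, the map contains only the edges visible in the example above (not the full 20-technique hierarchy), and it assumes each node has a single parent.

```python
# Partial parent map: only the edges visible in the usage example.
# Hypothetical structure; the full hierarchy is not reproduced here.
PARENT = {
    "Ad Hominem": "Ethos",
    "Name calling/Labeling": "Ad Hominem",
    "Loaded Language": "Pathos",
}


def path_from_root(label):
    """Return [root, ..., label] by following parent links upward."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def expand_with_ancestors(child_labels):
    """Expand child labels into the complete hierarchical list, without duplicates."""
    expanded = []
    for label in child_labels:
        for node in path_from_root(label):
            if node not in expanded:
                expanded.append(node)
    return expanded


children = ["Name calling/Labeling", "Loaded Language"]
print(expand_with_ancestors(children))
# ['Ethos', 'Ad Hominem', 'Name calling/Labeling', 'Pathos', 'Loaded Language']
```

Each child label contributes its full path from the root, so a shared ancestor appears only once even when several child techniques sit beneath it.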

Note:

  • Make sure the dependencies from requirements.txt are installed in your environment
  • Make sure the trained model and tokenizer are in the same directory as inference.py

Training Hyperparameters

  • Base Model: "facebook/mbart-large-50-many-to-many-mmt"
  • Learning Rate: 5e-05
  • Max Length: 256
  • Batch Size: 64
  • Epochs: 3
  • Seed: 42
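For anyone reproducing the run, the values above can be collected in a single configuration object. The sketch below is only one way to do that: the `TrainConfig` class and its field names are hypothetical and are not taken from the training code; only the values come from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container; field names are illustrative, values are from this card."""
    base_model: str = "facebook/mbart-large-50-many-to-many-mmt"
    learning_rate: float = 5e-05
    max_length: int = 256   # maximum sequence length
    batch_size: int = 64
    epochs: int = 3
    seed: int = 42


config = TrainConfig()
print(asdict(config))
```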

Model Statistics

The model obtained the following metrics on the Development Set as of March 31st, 2024:

  • Hierarchical F1: 63.58%
  • Hierarchical Precision: 58.3%
  • Hierarchical Recall: 69.9%
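To illustrate how a hierarchical metric can give partial credit, the sketch below computes set-based precision, recall, and F1 over ancestor-expanded label lists. This is a common formulation and is assumed here for illustration; the official task scorer may differ in detail. The function names and the gold and predicted labels are hypothetical. The last lines check that the reported F1 is consistent with the harmonic mean of the reported (rounded) precision and recall.

```python
def hierarchical_prf(gold, pred):
    """Set-based precision, recall, and F1 over ancestor-expanded label sets."""
    gold, pred = set(gold), set(pred)
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical case: the prediction stops at the ancestor nodes.
gold = ["Ethos", "Ad Hominem", "Name calling/Labeling"]
pred_ancestors_only = ["Ethos", "Ad Hominem"]
p, r, f = hierarchical_prf(gold, pred_ancestors_only)
print(p, r, f)  # precision 1.0, recall 2/3, F1 about 0.8: partial credit


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Sanity check against the reported development-set figures (in percent).
print(round(f1_from(58.3, 69.9), 2))  # 63.58
```

Predicting only the ancestors is fully precise but misses the leaf technique, so recall drops and the F1 lands between zero and a perfect score.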

Licensing

The model is available under the GNU General Public License v3.0 (GPL-3.0), which allows for free use, modification, and distribution under the same license. However, it is strictly for research purposes only and cannot be used for malicious activities, including but not limited to manipulation, targeted harassment, hate speech, deception, and discrimination.

The dataset is available on the competition website. Users must accept an online agreement before downloading and using the data. This agreement stipulates that the data is for research purposes only and cannot be redistributed or used for malicious purposes as outlined above.