Model Details (readme.md English Version)

1. Overview

This model is trained to detect whether a Korean sentence contains harmful expressions and, if so, which types of harmful expression it contains.
It performs multi-label classification, deciding whether a sentence contains harmful expressions or is an ordinary sentence.
As an AI task, it falls under text-classification (multi-label). The dataset used is TTA-DQA/hate_sentence.

The classes are as follows.

  • 0: 'insult'
  • 1: 'abuse'
  • 2: 'obscenity'
  • 3: 'TVPC' #Threats of violence/promotion of crime
  • 4: 'sexuality'
  • 5: 'age'
  • 6: 'race_region' #race and region
  • 7: 'disabled'
  • 8: 'religion'
  • 9: 'politics'
  • 10: 'job'
  • 11: 'no_hate'
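
Because the task is multi-label, a sentence can carry several of these classes at once. The sketch below shows one way to represent that as a 12-dimensional 0/1 vector. The class ids and names come from the list above; the helper `to_multi_hot` is a hypothetical name used only for illustration, and the dataset's actual storage format is not described here.

```python
# Class ids and names as listed in the model card.
ID2LABEL = {
    0: "insult", 1: "abuse", 2: "obscenity", 3: "TVPC",
    4: "sexuality", 5: "age", 6: "race_region", 7: "disabled",
    8: "religion", 9: "politics", 10: "job", 11: "no_hate",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_multi_hot(labels):
    """Hypothetical helper: turn a list of class names into a 0/1 vector."""
    vector = [0.0] * len(ID2LABEL)
    for name in labels:
        vector[LABEL2ID[name]] = 1.0
    return vector


# A sentence annotated with two classes sets two positions to 1.0.
example = to_multi_hot(["insult", "politics"])
print(example)
```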

2. Training Information

  • Base Model: KcElectra (a pre-trained Korean language model based on Electra)
  • Source: beomi/KcELECTRA-base-v2022 (https://huggingface.co/beomi/KcELECTRA-base-v2022)
  • Model Type: Encoder-only Language Model (ELECTRA discriminator)
  • Pre-training (Korean): approximately 17GB (over 180 million sentences)
  • Fine-tuning (hate dataset): approximately 28.9MB (TTA-DQA/hate_sentence)
  • Learning Rate: 5e-6
  • Weight Decay: 0.01
  • Epochs: 30
  • Batch Size: 16
  • Data Loader Workers: 2
  • Tokenizer: BertWordPieceTokenizer
  • Model Size: Approximately 511MB
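
The hyperparameters above can be collected in one place, as in the minimal sketch below. The dictionary keys are keyword names accepted by `transformers.TrainingArguments`, but using that class is an assumption: the training script is not part of this README. Treating the batch size as per-device, and the 10,000-example dataset size, are also assumptions made only to show the arithmetic.

```python
import math

# Values copied from the Training Information list; the key names follow
# transformers.TrainingArguments (an assumption about the training setup).
HPARAMS = {
    "learning_rate": 5e-6,
    "weight_decay": 0.01,
    "num_train_epochs": 30,
    "per_device_train_batch_size": 16,  # assumed to be per device
    "dataloader_num_workers": 2,
}


def total_optimizer_steps(num_examples, hparams=HPARAMS):
    """Optimizer steps on one device with no gradient accumulation (assumed)."""
    steps_per_epoch = math.ceil(num_examples / hparams["per_device_train_batch_size"])
    return steps_per_epoch * hparams["num_train_epochs"]


# Hypothetical dataset size, used only to illustrate the calculation.
print(total_optimizer_steps(10_000))  # 18750
```

If the `Trainer` API is used, the dictionary could be passed as `TrainingArguments(output_dir="out", **HPARAMS)`, where `"out"` is a placeholder path.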

3. Requirements

  • pytorch ~= 1.8.0
  • transformers ~= 4.11.3
  • emoji ~= 0.6.0
  • soynlp ~= 0.0.493

4. Quick Start

  • python
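from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and the fine-tuned weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
# AutoModel loads the encoder without a task-specific classification head.
model = AutoModel.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")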
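
To turn model outputs into class names, a common approach for multi-label classifiers is to apply a sigmoid to each logit and keep the classes above a threshold. The sketch below illustrates that step on toy logits. The 0.5 threshold, the helper name `decode`, and the assumption that the checkpoint exposes a 12-way classification head loadable with `AutoModelForSequenceClassification` are not confirmed by this README.

```python
import math

# Class names in id order, as listed in the Overview section.
LABELS = [
    "insult", "abuse", "obscenity", "TVPC", "sexuality", "age",
    "race_region", "disabled", "religion", "politics", "job", "no_hate",
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score reaches the threshold."""
    probs = [sigmoid(v) for v in logits]
    return [(LABELS[i], p) for i, p in enumerate(probs) if p >= threshold]


# With the real model, the logits might be obtained like this (assumption:
# the checkpoint includes a sequence-classification head):
#   from transformers import AutoModelForSequenceClassification
#   clf = AutoModelForSequenceClassification.from_pretrained(
#       "TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
#   inputs = tokenizer("...", return_tensors="pt")
#   logits = clf(**inputs).logits[0].tolist()

# Toy logits standing in for a model output.
toy_logits = [2.0, -3.0, -4.0, -5.0, -4.0, -6.0, -5.0, -6.0, -5.0, 1.0, -4.0, -2.0]
print([name for name, _ in decode(toy_logits)])  # ['insult', 'politics']
```

Lowering the threshold trades precision for recall; the right value depends on the application and should be checked on held-out data.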

5. Citation

  • This model was built as part of the Hyperscale AI Training Data Quality Verification Project (2024 Hyperscale AI Training Data Quality Verification).

6. Bias, Risks, and Limitations

  • The amount of training data per class is somewhat imbalanced.
  • In addition, because of linguistic and interpretive characteristics, there may be disagreement about labels with respect to the class criteria.
  • Harmful expressions are partly subjective, depending on language, culture, field of application, and personal views, so the results may be biased or controversial.
  • Therefore, the results cannot serve as an absolute standard for harmful expressions in Korean.

Experimental Results

  • type : multi-label classification(text-classification)
  • f1-score : 0.8279
  • accuracy : 0.7013
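
In multi-label evaluation, accuracy is often lower than F1 because accuracy is frequently computed as exact match: a sample counts as correct only if every label is right. The sketch below computes both on toy data to show the gap. It assumes micro-averaged F1 and exact-match accuracy; this README does not state which variants produced the figures above.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (sample, class) cells of 0/1 matrices."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Toy data with 3 classes and 4 samples (illustrative only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print(round(micro_f1(y_true, y_pred), 4))    # 0.8333
print(exact_match_accuracy(y_true, y_pred))  # 0.5
```

Two of the four samples have a single wrong cell each, which halves exact-match accuracy while micro-F1 stays above 0.8.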