library_name: transformers
language:
  - ko
base_model:
  - openai/whisper-small

Model Description

This model is OpenAI's whisper-small trained on the datasets below. On the test sets used here, its average performance is better than that of whisper-large-v3.
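Since the card declares `library_name: transformers`, the checkpoint can presumably be loaded through the standard speech-recognition pipeline. The following is a minimal sketch under that assumption. The function name `transcribe_korean` is hypothetical, and `model_id` is left as a parameter because this card does not state the repository id. Passing a file path also assumes ffmpeg is available for audio decoding.

```python
def transcribe_korean(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a Whisper checkpoint.

    model_id is the Hub repository id of this model; it is not given in
    the card, so the caller has to supply it.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Forcing the language and task is an assumption; a Korean fine-tune
    # may work without it.
    result = asr(audio_path, generate_kwargs={"language": "ko", "task": "transcribe"})
    return result["text"]
```

A call would look like `transcribe_korean("sample.wav", model_id)`, where `sample.wav` is a placeholder for a local recording.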

Training setup

train_steps: 50000
warmup_steps: 500
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
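The schedule named above can be sketched in plain Python. This is an illustration of a generic linear-warmup, cosine-decay curve built from the listed values, not the author's training code. The decay floor (`final_lr = 0.0`) and the exact handling of the warmup boundary are assumptions, since the card does not specify them.

```python
import math

# Values taken from the training setup above.
MAX_LR = 1e-4
WARMUP_STEPS = 500
TRAIN_STEPS = 50_000


def learning_rate(step: int, final_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step.

    final_lr is an assumed decay floor; the card does not state one.
    """
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 past the last step.
    progress = min(1.0, (step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (MAX_LR - final_lr) * cosine


if __name__ == "__main__":
    for step in (0, 250, 500, 25_250, 50_000):
        print(step, learning_rate(step))
```

Under these assumptions the rate peaks at step 500 and falls to half of the maximum at step 25,250, the midpoint of the decay phase.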

Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

These are the results on the test sets from the repository above, excluding the "주요 영역별 회의 음성" (meeting speech by major domain) test set. In the table below, the whisper_small_komixv2 row is this model. Lower is better.

| Model | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|---|---|---|---|---|---|---|---|---|---|
| whisper_tiny | 36.63 | 31.03 | 18.48 | 58.57 | 36.02 | 33.52 | 35.74 | 42.22 | 37.42 |
| whisper_tiny_komixv2 | 11.6 | 14.56 | 6.54 | 9.12 | 13.19 | 11.62 | 13.16 | 12.13 | 12.52 |
| whisper_base | 40.61 | 22.45 | 15.7 | 85.94 | 41.95 | 32.38 | 39.24 | 46.92 | 40.29 |
| whisper_base_komixv2 | 8.73 | 10.27 | 5.14 | 6.23 | 10.86 | 7.01 | 10.38 | 9.98 | 9.99 |
| whisper_small | 17.52 | 11.56 | 6.33 | 30.79 | 18.96 | 13.57 | 18.71 | 22.02 | 18.23 |
| whisper_small_komixv2 | 7.36 | 7.07 | 4.19 | 5.6 | 9.67 | 5.5 | 8.55 | 9.26 | 9.07 |
| whisper_medium | 13.92 | 8.2 | 4.38 | 25.73 | 15.66 | 10.1 | 14.9 | 17.16 | 15.22 |
| whisper_medium_komixv2 | 7.3 | 6.62 | 4.52 | 5.85 | 9.42 | 5.47 | 8.38 | 9.19 | 8.97 |
| whisper_large_v3 | 7.99 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
| whisper_large_v3_turbo | 10.75 | 5.38 | 3.99 | 10.93 | 10.27 | 4.21 | 9.42 | 26.66 | 15.16 |
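The Average column appears to be the unweighted mean of the eight per-test-set scores. That is an inference from the numbers rather than something the card states, and the short check below recomputes it for two rows copied from the table above. The helper name `recomputed_average` is only illustrative.

```python
from statistics import mean

# (reported Average, per-test-set scores) copied from the table above,
# in column order from cv_15_ko to kspon_other.
rows = {
    "whisper_small_komixv2": (7.36, [7.07, 4.19, 5.6, 9.67, 5.5, 8.55, 9.26, 9.07]),
    "whisper_large_v3": (7.99, [5.11, 3.72, 5.45, 9.35, 3.83, 8.46, 15.08, 12.89]),
}


def recomputed_average(scores):
    """Unweighted mean over test sets (assumed definition of 'Average')."""
    return mean(scores)


if __name__ == "__main__":
    for name, (reported, scores) in rows.items():
        print(f"{name}: reported={reported:.2f} recomputed={recomputed_average(scores):.2f}")
```

Because each test set counts equally, the large kspon_clean and kspon_other gaps account for much of the difference between the two averages, while whisper_large_v3 still scores lower on several individual sets.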

Acknowledgement

  • This model was trained with support from Google's TRC program.
  • Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)