Overview

  • ModernBertMultilingual is a multilingual model trained from scratch
  • It uses the ModernBERT-base architecture
  • It supports four languages and their variants: Chinese (Simplified and Traditional), English, Japanese, and Korean
  • It handles mixed-language East Asian text tasks well (see the usage sketch below)
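
Below is a minimal inference sketch for masked-language modeling. The repo id is a placeholder (the card does not state it), and it assumes a transformers version with ModernBERT support (>= 4.48) plus a mask token defined in the modified Qwen2.5 tokenizer; treat it as a sketch under those assumptions, not a confirmed usage recipe.

```python
# Minimal inference sketch. Assumptions (not stated in the card): the repo id
# below is a placeholder, transformers >= 4.48 (ModernBERT support), and the
# modified Qwen2.5 tokenizer defines a mask token for MLM.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "your-namespace/modern_bert_multilingual"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

# Mixed Chinese/Japanese input with one masked position.
text = f"この{tokenizer.mask_token}は简体中文と日本語が混ざった文です。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the top prediction at the masked position.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = int(logits[0, mask_index].argmax())
print(tokenizer.decode([predicted_id]))
```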

Technical Metrics

  • Uses a slightly modified version of the Qwen2.5 vocabulary to support multilingual text
  • Trained for approximately 100 hours on 7× NVIDIA L40 GPUs, on roughly 60B tokens
  • Main training parameters (reconstructed as a configuration sketch after this list):
    • Batch Size: 1792
    • Learning Rate: 5e-04
    • Maximum Sequence Length: 512
    • Optimizer: adamw_torch
    • LR Scheduler: warmup_stable_decay
    • Train Precision: bf16 mix
  • For additional technical details, refer to the original ModernBERT-base release notes and paper
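
As a rough scale check: an effective batch of 1792 sequences at 512 tokens is about 0.92M tokens per optimizer step, so ~60B tokens corresponds to roughly 65k steps. The sketch below reconstructs the listed settings as transformers TrainingArguments; only the values named in the list come from the card, while the per-device/accumulation split, scheduler step counts, and output path are illustrative assumptions.

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments
# (recent transformers, which supports the "warmup_stable_decay" scheduler).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="modern_bert_multilingual",  # assumption
    # Effective batch size 1792, e.g. 32 per device x 8 accumulation x 7 GPUs
    # (the split across devices is an assumed decomposition).
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,
    learning_rate=5e-4,
    optim="adamw_torch",
    lr_scheduler_type="warmup_stable_decay",
    warmup_steps=2_000,                     # assumption
    lr_scheduler_kwargs={                   # step counts are assumptions
        "num_stable_steps": 56_500,
        "num_decay_steps": 6_500,
    },
    max_steps=65_000,  # ~60B tokens / (1792 x 512 tokens per step)
    bf16=True,         # "bf16 mix" precision
)
# Note: the 512-token maximum sequence length is enforced at tokenization time
# (tokenizer(..., truncation=True, max_length=512)), not via TrainingArguments.
```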

Release Versions

  • Three weight versions are provided (a loading sketch follows the version table below):
    • base: fully trained on general-purpose data, suitable for texts from a variety of domains (default)
    • nodecay: the checkpoint taken before the annealing phase begins; you can continue annealing it on domain-specific data to better adapt it to a target domain
    • keyword_gacha_multilingual: the version annealed on ACGN-related texts (e.g., light novels, game scripts, comic scripts)

Model                             Version   Description
modern_bert_multilingual          20250128  base
modern_bert_multilingual_nodecay  20250128  nodecay
keyword_gacha_base_multilingual   20250128  keyword_gacha_multilingual
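
A hedged sketch of selecting one of the three weight versions. It assumes the Hub repo ids mirror the model names in the table above under a placeholder namespace; the actual repo ids may differ.

```python
# Hedged version selector. Assumption: repo ids mirror the model names in the
# table above, under a placeholder namespace.
from transformers import AutoModelForMaskedLM

VERSIONS = {
    "base": "your-namespace/modern_bert_multilingual",
    "nodecay": "your-namespace/modern_bert_multilingual_nodecay",
    "keyword_gacha_multilingual": "your-namespace/keyword_gacha_base_multilingual",
}

# "nodecay" is the pre-annealing checkpoint: continue training it on
# domain-specific data before use, rather than running inference directly.
model = AutoModelForMaskedLM.from_pretrained(VERSIONS["base"])
```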

Others

  • Training script available on GitHub
