Jörg Tiedemann

tiedeman

https://blogs.helsinki.fi/tiedeman/

AI & ML interests

machine translation, multilingual NLP

tiedeman's activity

authored a paper 19 days ago

An Expanded Massive Multilingual Dataset for High-Performance Language Technologies

Paper • 2503.10267 • Published 25 days ago

New activity in Helsinki-NLP/opus-mt-tc-big-en-tr 5 months ago

how many parameters are there in the model?

#9 opened 7 months ago by

authored 11 papers 6 months ago

Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

Paper • 2304.04726 • Published Apr 10, 2023

Sentence Embeddings in NLI with Iterative Refinement Encoders

Paper • 1808.08762 • Published Aug 27, 2018

Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health

Paper • 2304.10447 • Published Apr 20, 2023 • 1

The University of Helsinki submissions to the WMT19 news translation task

Paper • 1906.04040 • Published Jun 10, 2019

Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Paper • 1908.02262 • Published Aug 6, 2019

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Paper • 2011.01612 • Published Nov 3, 2020 • 1

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

Paper • 2104.04751 • Published Apr 10, 2021

How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets

Paper • 2201.04467 • Published Jan 12, 2022

A New Massive Multilingual Dataset for High-Performance Language Technologies

Paper • 2403.14009 • Published Mar 20, 2024 • 1

EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models

Paper • 2409.17892 • Published Sep 26, 2024 • 2

The Tatoeba Translation Challenge -- Realistic Data Sets for Low Resource and Multilingual MT

Paper • 2010.06354 • Published Oct 13, 2020

updated 4 models 6 months ago

Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-mul

Translation • Updated Oct 12, 2024 • 172 • 1

Helsinki-NLP/opus-mt-tc-bible-big-mul-deu_eng_fra_por_spa

Translation • Updated Oct 12, 2024 • 168 • 2

Helsinki-NLP/opus-mt-tc-bible-big-mul-deu_eng_nld

Translation • Updated Oct 12, 2024 • 26 • 2

Helsinki-NLP/opus-mt-tc-bible-big-mul-mul

Translation • Updated Oct 12, 2024 • 533 • 4

updated a collection 6 months ago

OPUS-MT multilingual (TC+Bible)

Multilingual translation models trained on the Tatoeba Translation Challenge dataset (from OPUS) and a massively multilingual Bible corpus • 85 items • Updated Oct 9, 2024 • 7