Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 14 days ago • 331
Centurio Collection Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models" • 6 items • Updated 5 days ago • 4
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models Paper • 2501.05122 • Published Jan 9 • 18
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 130
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 203
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024 • 18
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 73
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108
LLaVA-Onevision Collection LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 13
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 29
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks Paper • 2407.03791 • Published Jul 4, 2024 • 1
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 39
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 126
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47