CompVis Community

university

Recent Activity

compvis-community's activity

anton-l 
posted an update about 2 months ago
Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs. By training on FineMath, we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
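
For readers who want a feel for the filtering step above, here is a minimal sketch of score-based page filtering. It is not the FineMath pipeline: the scorer `toy_math_score`, the keyword list, the helper `filter_pages`, and the 0.5 threshold are all hypothetical stand-ins for the learned classifier described in the post.

```python
from typing import Callable, Iterable, List


def toy_math_score(text: str) -> float:
    """Hypothetical stand-in for a learned quality classifier.

    Returns the fraction of a few hand-picked math markers found in the text.
    A real pipeline would call a trained model here instead.
    """
    markers = ("=", "theorem", "proof", "solve", "equation", "therefore")
    lowered = text.lower()
    hits = sum(1 for marker in markers if marker in lowered)
    return hits / len(markers)


def filter_pages(
    pages: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep only the pages whose score reaches the threshold."""
    return [page for page in pages if score_fn(page) >= threshold]


# Toy corpus, invented for illustration only.
pages = [
    "Solve the equation 2x + 3 = 7. Therefore x = 2.",
    "Top ten travel destinations for the summer.",
    "Proof: by the theorem above, a = b.",
]

kept = filter_pages(pages, toy_math_score, threshold=0.5)
for page in kept:
    print(page)
```

In this toy run, the two math-like pages pass the threshold and the travel page is dropped. Swapping `toy_math_score` for a real classifier would leave `filter_pages` unchanged.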

We ran a series of ablations with continued pre-training of Llama-3.2-3B-Base on FineMath and observed notable gains over both the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We’re also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2

toshas 
posted an update about 2 months ago
Introducing ⇆ Marigold-DC — our training-free zero-shot approach to monocular Depth Completion with guided diffusion! If you have ever wondered how else a long denoising diffusion schedule can be useful, we have an answer for you!

Depth Completion addresses sparse, incomplete, or noisy measurements from photogrammetry or sensors like LiDAR. Sparse points aren’t just hard for humans to interpret — they also hinder downstream tasks.

Traditionally, depth completion was framed as image-guided depth interpolation. We leverage Marigold, a diffusion-based monodepth model, to reframe it as sparse-depth-guided depth generation. How the turntables! Check out the paper anyway 👇

🌎 Website: https://marigolddepthcompletion.github.io/
🤗 Demo: prs-eth/marigold-dc
📕 Paper: https://arxiv.org/abs/2412.13389
👾 Code: https://github.com/prs-eth/marigold-dc
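
To give some intuition for how a handful of sparse measurements can anchor a dense depth prediction, here is a toy sketch. It is explicitly not the Marigold-DC method, which guides a diffusion model. It only fits a least-squares scale and shift between a relative depth map and the sparse points, and the function name `align_to_sparse` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def align_to_sparse(
    pred: List[List[float]],
    sparse: Dict[Tuple[int, int], float],
) -> List[List[float]]:
    """Fit depth ~ scale * pred + shift on the sparse points, then apply it densely.

    `pred` is a dense relative depth map (rows of floats).
    `sparse` maps (row, col) pixel coordinates to measured depth values.
    """
    pairs = [(pred[r][c], d) for (r, c), d in sparse.items()]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sparse measurements")

    mean_p = sum(p for p, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    if var_p == 0:
        raise ValueError("predictions at sparse points must not all be equal")

    # Closed-form ordinary least squares for a line through the sparse pairs.
    scale = sum((p - mean_p) * (d - mean_d) for p, d in pairs) / var_p
    shift = mean_d - scale * mean_p
    return [[scale * p + shift for p in row] for row in pred]


# Toy 2x2 relative depth map and two hypothetical sparse measurements.
pred = [[0.0, 0.5], [1.0, 0.25]]
sparse = {(0, 0): 2.0, (1, 0): 6.0}
dense = align_to_sparse(pred, sparse)
print(dense)
```

A global scale and shift cannot correct local errors in the prediction, which is one reason to look for richer ways of injecting sparse depth, such as the guided generation described in the post.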

Team ETH Zürich: Massimiliano Viola ( @mviola ), Kevin Qu ( @KevinQu7 ), Nando Metzger ( @nandometzger ), Bingxin Ke ( @Bingxin ), Alexander Becker, Konrad Schindler, and Anton Obukhov ( @toshas ). We thank Hugging Face for their continuous support.