to_read - a hitchhiker3010 Collection

to_read

updated 5 days ago

DocGraphLM: Documental Graph Language Model for Information Extraction

Paper • 2401.02823 • Published Jan 5 • 35
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 62
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 125
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 18
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37
Unifying Vision, Text, and Layout for Universal Document Processing

Paper • 2212.02623 • Published Dec 5, 2022 • 10
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21 • 8
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Paper • 2407.07071 • Published Jul 9 • 11
Transformer Layers as Painters

Paper • 2407.09298 • Published Jul 12 • 13
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Paper • 2402.03216 • Published Feb 5 • 4
Visual Text Generation in the Wild

Paper • 2407.14138 • Published Jul 19 • 8
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17 • 19
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 117
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 124
Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 138
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

Paper • 2307.03692 • Published Jul 5, 2023 • 25
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Paper • 2307.06304 • Published Jul 12, 2023 • 28
Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published Oct 3 • 33
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Paper • 2410.02762 • Published Oct 3 • 9
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published Oct 3 • 47
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Paper • 2410.01731 • Published Oct 2 • 16
Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3 • 18
Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19 • 38
LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 57
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Paper • 2401.08500 • Published Jan 16 • 5
Automatic Prompt Optimization with "Gradient Descent" and Beam Search

Paper • 2305.03495 • Published May 4, 2023 • 1
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 37
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22 • 89
Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29 • 31
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Paper • 2411.00412 • Published Nov 1 • 9
Human-inspired Perspectives: A Survey on AI Long-term Memory

Paper • 2411.00489 • Published Nov 1
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7 • 111
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 16
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 104
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 95
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7 • 48
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Paper • 2411.07232 • Published Nov 11 • 62
MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published Nov 14 • 57
Distilling System 2 into System 1

Paper • 2407.06023 • Published Jul 8 • 3
Altogether: Image Captioning via Re-aligning Alt-text

Paper • 2410.17251 • Published Oct 22
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents

Paper • 2409.15594 • Published Sep 23
Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21 • 43
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 21 days ago • 118
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published 20 days ago • 55
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published 29 days ago • 76
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published 15 days ago • 20
Phi-4 Technical Report

Paper • 2412.08905 • Published 14 days ago • 92
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Paper • 2410.18779 • Published Oct 24 • 1
Asynchronous LLM Function Calling

Paper • 2412.07017 • Published 16 days ago
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 7 days ago • 103
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published 7 days ago • 43
GUI Agents: A Survey

Paper • 2412.13501 • Published 8 days ago • 20
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 8 days ago • 13
Alignment faking in large language models

Paper • 2412.14093 • Published 7 days ago • 7
