Last Week in Medical AI: Top Research Papers/Models (September 21 - September 27, 2024) 6 days ago • 1
Performance Comparison: Llama-3.2 vs. Llama-3.1 LLMs and Smaller Models (3B, 1B) in Medical and Healthcare AI Domains 🩺🧬 8 days ago • 5
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
TraDiffusion: Trajectory-Based Training-Free Image Generation Paper • 2408.09739 • Published Aug 19 • 7
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges Paper • 2408.08946 • Published Aug 16 • 9
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19 • 15
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Paper • 2408.10195 • Published Aug 19 • 12
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
MambaEVT: Event Stream based Visual Object Tracking using State Space Model Paper • 2408.10487 • Published Aug 20 • 5
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model Paper • 2408.10764 • Published Aug 20 • 7
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20 • 10
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency Paper • 2408.11054 • Published Aug 20 • 10
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published Aug 20 • 11
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering Paper • 2408.09174 • Published Aug 17 • 51
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification Paper • 2408.11237 • Published Aug 20 • 4
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer Paper • 2408.08793 • Published Aug 16 • 4
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published Aug 21 • 5
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published Aug 21 • 16
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models Paper • 2408.11817 • Published Aug 21 • 7
FocusLLM: Scaling LLM's Context by Parallel Decoding Paper • 2408.11745 • Published Aug 21 • 23
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 53
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published Aug 21 • 54
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published Aug 22 • 15
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design Paper • 2408.12503 • Published Aug 22 • 21
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published Aug 20 • 49
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments Paper • 2408.10945 • Published Aug 20 • 6
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23 • 10
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23 • 20
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 110
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22 • 15
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published Aug 24 • 13
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 21
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24 • 20
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 33
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 23
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26 • 59
DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification Paper • 2408.14236 • Published Aug 26 • 3
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27 • 23
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27 • 27
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 36
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published Aug 28 • 11
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature Paper • 2408.15836 • Published Aug 28 • 11
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published Aug 28 • 19
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 20
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 41
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27 • 51
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements Paper • 2408.15666 • Published Aug 28 • 9
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs Paper • 2407.02485 • Published Jul 2 • 5
Life Science, Health and Medical Datasets for ML Collection A collection of datasets for Medical Domain • 4 items • Updated Jun 24 • 2
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251