Shivaen Ramshetty's picture

Shivaen Ramshetty

shivr

·

sramshetty

AI & ML interests

NLP, CV, Multimodal

Organizations

shivr's activity

upvoted 6 papers 4 months ago

Tokenization Falling Short: The Curse of Tokenization

Paper • 2406.11687 • Published Jun 17 • 15

TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18 • 34

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published Jun 10 • 36

The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6 • 52

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Paper • 2405.18669 • Published May 29 • 11

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

upvoted 6 papers 5 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 108

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 73

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 53

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22 • 18

upvoted 9 papers 6 months ago

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 36

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 90

Are large language models superhuman chemists?

Paper • 2404.01475 • Published Apr 1 • 15

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 56

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Paper • 2403.20327 • Published Mar 29 • 47

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 51

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 77

LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 64

upvoted 13 papers 7 months ago

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 66

DiPaCo: Distributed Path Composition

Paper • 2403.10616 • Published Mar 15 • 12

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

Paper • 2403.09919 • Published Mar 14 • 20

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12 • 75

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Paper • 2403.03853 • Published Mar 6 • 63

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Paper • 2403.03950 • Published Mar 6 • 13

Learning to Decode Collaboratively with Multiple Language Models

Paper • 2403.03870 • Published Mar 6 • 17

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 592

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Paper • 2402.15021 • Published Feb 22 • 12

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 70

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21 • 43

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22 • 82

upvoted 24 papers 8 months ago

User-LLM: Efficient LLM Contextualization with User Embeddings

Paper • 2402.13598 • Published Feb 21 • 18

In deep reinforcement learning, a pruned network is a good network

Paper • 2402.12479 • Published Feb 19 • 16

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19 • 40

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Paper • 2402.11131 • Published Feb 16 • 41

FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48

How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15 • 38

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 98

Lumos : Empowering Multimodal LLMs with Scene Text Recognition

Paper • 2402.08017 • Published Feb 12 • 24

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12 • 54

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10 • 16

Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

Paper • 2402.06187 • Published Feb 9 • 9

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Paper • 2402.06118 • Published Feb 9 • 13

Direct Language Model Alignment from Online AI Feedback

Paper • 2402.04792 • Published Feb 7 • 27

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7 • 65

MusicRL: Aligning Music Generation to Human Preferences

Paper • 2402.04229 • Published Feb 6 • 16

Learning Universal Predictors

Paper • 2401.14953 • Published Jan 26 • 18

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Paper • 2402.01831 • Published Feb 2 • 13

Rethinking Interpretability in the Era of Large Language Models

Paper • 2402.01761 • Published Jan 30 • 21

Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 44

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Paper • 2401.12963 • Published Jan 23 • 12

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Paper • 2401.12168 • Published Jan 22 • 24

upvoted 2 papers 9 months ago

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 142

Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18 • 15