Foundation Models and Tools - a Temus Collection

updated Jul 18, 2024

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

Paper • 2402.10986 • Published Feb 16, 2024 • 78
bigcode/starcoder2-15b

Text Generation • Updated Jun 5, 2024 • 17.3k • 592
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123

Note Zephyr is by far the best-aligned open-source LLM I've used. The team has also recently released -beta and -gemma versions (the latter fine-tuned from Gemma).
mixedbread-ai/mxbai-rerank-large-v1

Text Classification • Updated about 10 hours ago • 38.8k • 124
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 63

Note "We clean data"
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Paper • 2402.18334 • Published Feb 28, 2024 • 12
QLoRA: Efficient Finetuning of Quantized LLMs

Paper • 2305.14314 • Published May 23, 2023 • 50

Note Trades extra computation for lower memory use. It is like a math class: if you do not want to memorize every corollary, you have to re-derive each one from the 3 axioms whenever you need it.
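The compute-for-memory trade in the note can be illustrated with a toy sketch. This is not the QLoRA implementation: the paper uses a 4-bit NormalFloat data type together with LoRA adapters, while the sketch below assumes plain absmax integer quantization, and the names `quantize_block`, `dequantize_block`, and `linear` are hypothetical. It only shows the idea that weights are stored as small integer codes plus one scale, and are re-derived every time they are needed.

```python
def quantize_block(weights, levels=7):
    """Store a block of floats as small integer codes plus one shared scale.

    Assumption: simple absmax quantization to the range [-levels, levels],
    used here as a stand-in for the 4-bit data type in the paper.
    """
    scale = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    codes = [round(w / scale * levels) for w in weights]
    return codes, scale


def dequantize_block(codes, scale, levels=7):
    """Re-derive approximate float weights from the stored codes."""
    return [c / levels * scale for c in codes]


def linear(x, codes, scale, levels=7):
    """Dot product that dequantizes on every call (extra compute, less memory)."""
    weights = dequantize_block(codes, scale, levels)
    return sum(xi * wi for xi, wi in zip(x, weights))


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_block(weights)
approx = dequantize_block(codes, scale)
```

Only `codes` and `scale` need to stay in memory; the full-precision weights are recomputed inside `linear` on each call, which is the "derive it from the axioms" side of the analogy.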
Large language models surpass human experts in predicting neuroscience results

Paper • 2403.03230 • Published Mar 4, 2024 • 4

Note Perplexity is used to decide which abstract makes more sense given all the previous work in the field of neuroscience, and it beats the experts' annotations. Advice to experts: pay attention when you annotate, otherwise you might lose your job (!)
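The perplexity comparison described in the note can be sketched as follows. This is a minimal illustration, assuming the per-token log-probabilities of each abstract have already been obtained from a language model; the function names and the numbers are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_more_plausible(logprobs_a, logprobs_b):
    """Return the label of the abstract the model finds less surprising."""
    return "A" if perplexity(logprobs_a) <= perplexity(logprobs_b) else "B"


# Made-up log-probabilities for two versions of the same abstract.
abstract_a = [-0.1, -0.2, -0.3]
abstract_b = [-1.0, -2.0, -1.5]
choice = pick_more_plausible(abstract_a, abstract_b)
```

Lower perplexity means the model assigns the text a higher average probability, so the version with the lower score is treated as the one that "makes more sense".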
Equall/Saul-7B-Instruct-v1

Text Generation • Updated Mar 10, 2024 • 29.5k • 81
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Paper • 2403.04696 • Published Mar 7, 2024 • 4
AutoMerger

Paused • 71
Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 5

Note A fixed control vector is added to each layer's output. A convenient package is available at https://github.com/vgel/repeng; it is quite easy to use and allows linear control along a user-defined semantic dimension.
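The mechanism in the note, a fixed vector added to every layer's output, can be shown with a toy forward pass. This sketch does not use the repeng API; the layers are plain Python functions, and `steer`, `forward`, and `strength` are hypothetical names chosen for illustration.

```python
def steer(hidden, control_vector, strength):
    """Shift a hidden state along the control direction by `strength`."""
    return [h + strength * c for h, c in zip(hidden, control_vector)]


def forward(hidden, layers, control_vector, strength=0.0):
    """Run the layers in order, adding the same control vector after each one."""
    for layer in layers:
        hidden = layer(hidden)
        hidden = steer(hidden, control_vector, strength)
    return hidden


def identity(hidden):
    """Stand-in for a transformer layer (assumption: leaves the state unchanged)."""
    return list(hidden)


layers = [identity, identity]
control = [1.0, -1.0]
unsteered = forward([0.0, 0.0], layers, control, strength=0.0)
steered = forward([0.0, 0.0], layers, control, strength=0.5)
```

With these identity layers the shift grows in proportion to `strength`, which mirrors the linear control the note mentions; a negative `strength` pushes the state in the opposite direction.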
Editing Conceptual Knowledge for Large Language Models

Paper • 2403.06259 • Published Mar 10, 2024 • 1
Learning to Edit: Aligning LLMs with Knowledge Editing

Paper • 2402.11905 • Published Feb 19, 2024 • 1
Knowledge Editing on Black-box Large Language Models

Paper • 2402.08631 • Published Feb 13, 2024 • 3
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

Paper • 2311.06668 • Published Nov 11, 2023 • 5
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries

Paper • 2402.13043 • Published Feb 20, 2024 • 2
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 77

Note Revolutionises pre-training based on next-token prediction to enhance reasoning. Routing through multiple rationales for next-k-token prediction, combined with RL-based survival of the fittest rationales, achieves a significant improvement in reasoning at the cost of a huge increase in training compute. Likely lags behind Q* because it lacks adaptive control of the thought process.
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14, 2024 • 55

Note An arsenal for training a VLM-based front-end designer.
Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 95

Note Stanford's drop-in replacement model for automating front-end design. Image (or a sketch) of the target website in, front-end code out.
SALT-NLP/Design2Code-18B-v0

Updated Mar 16, 2024 • 39

Note This LLM does front-end engineering for you at the cost of your electricity.
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 126

Note Apple's own VLM.
vikhyatk/moondream2

Image-Text-to-Text • Updated Jan 9 • 133k • 1.07k

Note Very performant small VLM. It appears an extra vision encoder might just do the trick?
prometheus-eval/prometheus-13b-v1.0

Text2Text Generation • Updated Oct 14, 2023 • 2.12k • 137

Note An LLM fine-tuned to act as an LLM-as-a-Judge.
Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 5

Note From UC Berkeley: how to train a GPT-4-level function-calling LLM.
gorilla-llm/gorilla-openfunctions-v2

Text Generation • Updated Apr 18, 2024 • 718 • 227

Note GPT-4-level function-calling LLM from UC Berkeley.
Learning to Compress Prompt in Natural Language Formats

Paper • 2402.18700 • Published Feb 28, 2024 • 2

Note Soft Prompt Compression from Samsung
MemGPT: Towards LLMs as Operating Systems

Paper • 2310.08560 • Published Oct 12, 2023 • 7

Note LLM OS with MemGPT from UC Berkeley
Get an A in Math: Progressive Rectification Prompting

Paper • 2312.06867 • Published Dec 11, 2023 • 2
Qwen/Qwen-VL

Text Generation • Updated Jan 25, 2024 • 17.7k • 232
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

Paper • 2110.03742 • Published Sep 24, 2021 • 4
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Paper • 2006.16668 • Published Jun 30, 2020 • 3
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • Updated Apr 30, 2024 • 9.9k • 349
instruction-pretrain/instruction-synthesizer

Text Generation • Updated 13 days ago • 451 • 77
jinaai/jina-embeddings-v2-base-en

Feature Extraction • Updated Jan 6 • 232k • 715
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Paper • 2407.11062 • Published Jul 10, 2024 • 8