Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark Paper • 2410.14702 • Published Oct 6, 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 23, 2024
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space Paper • 2402.05195 • Published Feb 7, 2024
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Paper • 2312.04655 • Published Dec 7, 2023
LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks Paper • 2311.09564 • Published Nov 16, 2023
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis Paper • 2302.08624 • Published Feb 16, 2023
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility Paper • 2210.07471 • Published Oct 14, 2022
InstructExcel: A Benchmark for Natural Language Instruction in Excel Paper • 2310.14495 • Published Oct 23, 2023