AK

akhaliq

AI & ML interests

None yet

Recent Activity

commented on a paper about 10 hours ago
LearnLM: Improving Gemini for Learning
commented on a paper about 11 hours ago
OpenAI o1 System Card

Organizations

Hugging Face, PromptHero Diffusion Models, Org for Gradio Tests, TEXTurePaper, EleutherAI, pix2pix-zero-library, 🧨Diffusers, Spaces-explorers, Gradio, DALLE mini, ControlNet 1.1 Preview, Demo Crafters 🤗, controlnet-library, damo-vilab, Docs Demos, CVPR Demo Track, Adobe, make-a-audio, AUBMC AIM, autonomousvision, ELITE, SenseTime X-Lab, pytorch, mindspore-ai, test, PaddlePaddle, isl-org, Visual-Attention-Network, Coursera, Tsinghua Machine Learning Group, tensorflow, onnx, custom diffusion, video-p2p-library, Gradio-Themes-Party, Picsart AI Research, Gradio-Blocks-Party, group2, Open-Source AI Meetup, lora concepts library, Huggingface Projects, EuroPython 2022, Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University, ICML 2022, ECCV 2022, NAACL 2022, YOLOv7, Kornia AI, Tune a video concepts library, CompVis, MMLab@NTU, AttendAndExcite, Nyx AI, SIGGRAPH 2022, Gradio PR Deploys, VideoCrafter, Gradio Test Deploy, Generative AI For Audio, Interspeech2022, EuroSciPy 2022, CompVis Community, CarperAI, DeepFloyd, SIGGRAPH Asia 2022 Demos, Stable Diffusion Dreambooth Concepts Library, Stable Diffusion Dreamfusion Library, Startup Shell, Musika, OpenShape, meta-private, Editing Images, ICCV2023, ICML2023, meta-mms, TTS Eval (OLD), ZeroGPU Explorers, Editing Audio, gg-hf, Gradio Templates, FlagOpen, AGI Workshop @ Tsinghua, ICLR2024, Gradio Community, TTS AGI, Narra, Social Post Explorers, Top Contributors: Space Likes, Top Contributors: Profile Followers, SIGGRAPH Asia 2024, Vchitect-XL, GradioSingaporeHackathon, Rhymes.AI, community-science-team, Sambanova-Gradio-Hackathon

akhaliq's activity

posted an update 5 days ago
Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more

now available in anychat, try it out: akhaliq/anychat
reacted to freddyaboulton's post with 🤗🚀🔥 11 days ago
Version 0.0.21 of gradio-pdf now properly loads Chinese characters!
posted an update 27 days ago
QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: akhaliq/anychat
  • 1 reply
posted an update 27 days ago
New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: akhaliq/anychat
posted an update about 1 month ago
anychat

supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app

try it out there: akhaliq/anychat
reacted to singhsidhukuldeep's post with ❤️ 2 months ago
If you have ~300+ GB of VRAM, you can run Mochi from @genmo

A SOTA model that dramatically closes the gap between closed and open video generation models.

Mochi 1 introduces revolutionary architecture featuring joint reasoning over 44,520 video tokens with full 3D attention. The model implements extended learnable rotary positional embeddings (RoPE) in three dimensions, with network-learned mixing frequencies for space and time axes.

The model incorporates cutting-edge improvements, including:
- SwiGLU feedforward layers
- Query-key normalization for enhanced stability
- Sandwich normalization for controlled internal activations
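
To make the first item in that list concrete, here is a minimal pure-Python sketch of a SwiGLU feedforward layer. It is an illustration of the general technique only, not Mochi's code: the function names, the weight layout, and the absence of bias terms are all assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i] is one output row


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def silu(value: float) -> float:
    """SiLU (Swish) activation: value * sigmoid(value)."""
    return value / (1.0 + math.exp(-value))


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """SwiGLU feedforward: w_down @ (silu(w_gate @ x) * (w_up @ x)).

    The weight names are hypothetical; biases are omitted for brevity.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)
```

The gate branch decides, per hidden unit, how much of the linear "up" projection passes through, which is the difference from a plain two-layer MLP with a single activation.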

What is currently available?
The base model delivers impressive 480p video generation with exceptional motion quality and prompt adherence. Released under the Apache 2.0 license, it's freely available for both personal and commercial applications.

What's Coming?
Genmo has announced Mochi 1 HD, scheduled for release later this year, which will feature:
- Enhanced 720p resolution
- Improved motion fidelity
- Better handling of complex scene warping
  • 2 replies
reacted to prithivMLmods's post with 👍 2 months ago
SambaNova ☁️
⚡ Inference API with cURL Demo: https://huggingface.co/spaces/prithivMLmods/sambanova-inference-api

🔗 Sambanova API Documentation (grab your APIs here): https://cloud.sambanova.ai/apis

export SAMBANOVA_API_KEY=<your token>

Sambanova's Inference API.

pip install sambanova-gradio

SambaNova X Gradio

import gradio as gr
import sambanova_gradio

gr.load(
    name='Meta-Llama-3.1-405B-Instruct',
    src=sambanova_gradio.registry,
).launch()

📃 Documentation: https://community.sambanova.ai/docs

reacted to sagar007's post with 👍👀 4 months ago
Excited to share my new Gradio app featuring the impressive Llama-3.1-Storm-8B model!
This app demonstrates the capabilities of Llama-3.1-Storm-8B, an 8B parameter language model created by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, @akjindal53244
Key highlights of Llama-3.1-Storm-8B:

Outperforms Llama-3.1-8B-Instruct on multiple benchmarks:

Instruction Following (IFEval): +3.93%
Knowledge-driven QA (GPQA): +7.21%
Reduced Hallucinations (TruthfulQA): +9%
Function Calling (BFCL): +7.92%


Achieves impressive results with only 8B parameters
Uses innovative techniques like self-curation and model merging

Try out the model yourself: sagar007/lama_storm_8b

Kudos to the creators for pushing the boundaries of smaller language models! This work makes advanced AI more accessible and efficient.
#AI #NLP #MachineLearning #GradioApp #Llama3
posted an update 7 months ago
Phased Consistency Model

Phased Consistency Model (2405.18407)

The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves even superior or comparable 1-step generation results to previously state-of-the-art specifically designed 1-step methods. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train the state-of-the-art few-step text-to-video generator.
posted an update 7 months ago
Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
posted an update 8 months ago
A Careful Examination of Large Language Model Performance on Grade School Arithmetic

A Careful Examination of Large Language Model Performance on Grade School Arithmetic (2405.00332)

Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 13%, with several families of models (e.g., Phi and Mistral) showing evidence of systematic overfitting across almost all model sizes. At the same time, many models, especially those on the frontier, (e.g., Gemini/GPT/Claude) show minimal signs of overfitting. Further analysis suggests a positive relationship (Spearman's r^2=0.32) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that many models may have partially memorized GSM8k.
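
The abstract reports a Spearman correlation between how likely a model is to generate GSM8k examples and its GSM8k-to-GSM1k accuracy gap. As a rough illustration of how such a statistic is computed, the sketch below ranks both variables (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the sample numbers in the usage note are hypothetical and are not data from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

With made-up per-model values such as `spearman([0.1, 0.4, 0.2, 0.9, 0.5], [1.0, 3.0, 4.0, 8.0, 2.0])`, squaring the result gives a quantity comparable in form to the r^2 quoted above.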
posted an update 8 months ago
Octopus v4

Graph of language models

Octopus v4: Graph of language models (2404.19296)

Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs functional tokens to integrate multiple open-source models, each optimized for particular tasks. Our newly developed Octopus v4 model leverages functional tokens to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in selection and parameter understanding and reformatting. Additionally, we explore the use of graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and functional tokens.
posted an update 8 months ago
Layer Skip

Enabling Early Exit Inference and Self-Speculative Decoding

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (2404.16710)

We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.
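
The draft-then-verify loop described in the abstract can be sketched with toy stand-ins for the model. This is a simplified greedy version under stated assumptions, not the LayerSkip implementation: `full_next` plays the role of the full model, `draft_next` plays the role of an early exit from the same model, and both toy functions below are invented for illustration. A real system would verify all drafted tokens in one batched forward pass and reuse activations; here verification is sequential for clarity.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to the next token id.
NextToken = Callable[[List[int]], int]


def greedy_decode(next_token: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding, used as the reference output."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def self_speculative_decode(
    full_next: NextToken,
    draft_next: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    draft_len: int = 4,
) -> List[int]:
    """Draft tokens with the cheap early exit, then verify with the full model."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # Draft stage: propose up to draft_len tokens with the early exit.
        draft: List[int] = []
        for _ in range(min(draft_len, target_len - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # Verify stage: keep drafted tokens while the full model agrees;
        # on the first disagreement keep the full model's token and stop.
        accepted: List[int] = []
        for drafted in draft:
            expected = full_next(tokens + accepted)
            accepted.append(expected)
            if expected != drafted:
                break
        tokens.extend(accepted)
    return tokens


def toy_full_model(tokens: List[int]) -> int:
    """Hypothetical deterministic 'full model' over a 50-token vocabulary."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def toy_early_exit(tokens: List[int]) -> int:
    """Hypothetical early exit: agrees with the full model except on some prefixes."""
    guess = toy_full_model(tokens)
    return (guess + 1) % 50 if len(tokens) % 3 == 0 else guess
```

Because every emitted token is the full model's own greedy choice, the output matches `greedy_decode(toy_full_model, ...)` exactly; the draft only changes how many tokens are confirmed per verification round.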
posted an update 8 months ago
CatLIP

CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (2404.15653)

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable 2.7× acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality.
posted an update 8 months ago
OpenELM

An Efficient Language Model Family with Open-source Training and Inference Framework

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (2404.14619)

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
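
The abstract does not spell out how layer-wise scaling allocates parameters, so the following is only a sketch of one plausible scheme: linearly interpolating a per-layer multiplier from a minimum at the first layer to a maximum at the last, then deriving each layer's feedforward width from it. The function names, the linear schedule, and the example values are assumptions, not OpenELM's published configuration.

```python
from typing import List


def layerwise_scaling(num_layers: int, min_value: float, max_value: float) -> List[float]:
    """Linearly interpolate a per-layer multiplier from min_value to max_value."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [min_value]
    step = (max_value - min_value) / (num_layers - 1)
    return [min_value + step * layer for layer in range(num_layers)]


def ffn_hidden_dims(model_dim: int, multipliers: List[float]) -> List[int]:
    """Turn per-layer multipliers into integer feedforward hidden sizes."""
    return [round(multiplier * model_dim) for multiplier in multipliers]
```

For instance, `ffn_hidden_dims(8, layerwise_scaling(5, 0.5, 4.0))` gives narrower feedforward blocks in early layers and wider ones in later layers, in contrast to a uniform transformer where every layer has the same width.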
posted an update 8 months ago
Phi-3 Technical Report

A Highly Capable Language Model Locally on Your Phone

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (2404.14219)

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).
posted an update 8 months ago
Dynamic Typography

Bringing Words to Life

Dynamic Typography: Bringing Words to Life (2404.11614)

Text animation serves as an expressive medium, transforming static communication into dynamic experiences by infusing words with motion to evoke emotions, emphasize meanings, and construct compelling narratives. Crafting animations that are semantically aware poses significant challenges, demanding expertise in graphic design and animation. We present an automated text animation scheme, termed "Dynamic Typography", which combines two challenging tasks. It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts. Our technique harnesses vector graphics representations and an end-to-end optimization-based framework. This framework employs neural displacement fields to convert letters into base shapes and applies per-frame motion, encouraging coherence with the intended textual concept. Shape preservation techniques and perceptual loss regularization are employed to maintain legibility and structural integrity throughout the animation process. We demonstrate the generalizability of our approach across various text-to-video models and highlight the superiority of our end-to-end methodology over baseline methods, which might comprise separate tasks. Through quantitative and qualitative evaluations, we demonstrate the effectiveness of our framework in generating coherent text animations that faithfully interpret user prompts while maintaining readability.