Last Week in Medical AI: Top Research Papers/Models (November 2 - November 9, 2024)
Medical AI Paper of the Week: Exploring Large Language Models for Specialist-level Oncology Care
Medical LLM & Other Models:
- GSCo: Generalist-Specialist AI Collaboration
- PediatricsGPT: Chinese Pediatric Assistant
- MEG: Knowledge-Enhanced Medical QA
- AutoProteinEngine: Multimodal Protein LLM

Frameworks and Methodologies:
- BrainSegFounder: 3D Neuroimage Analysis
- PASSION: Sub-Saharan Dermatology Dataset
- SAM for Lung X-ray Segmentation
- Label Critic: Data-First Approach
- Medprompt Runtime Strategies

Medical LLM Applications:
- CataractBot: Patient Support System
- CheX-GPT: X-ray Report Enhancement
- CardioAI: Cancer Cardiotoxicity Monitor
- HealthQ: Healthcare Conversation Chain
- PRObot: Diabetic Retinopathy Assistant

Medical LLMs & Benchmarks:
- MediQ: Clinical Reasoning Benchmark
- Touchstone: Segmentation Evaluation
- Medical LLM Adaptation Progress
- Fine-Tuning Medical QA Strategies

AI in Healthcare Ethics:
- Healthcare Robotics with LLMs
- XAI in Clinical Practice
- Precision Rehabilitation Framework
- Multimodal AI Challenges
Now you can watch and listen to the latest Medical AI papers daily on our YouTube and Spotify channels as well!
Looks like @Meta thinks we forgot they created PyTorch, so now they've open-sourced Lingua, a powerful and flexible library for training and running inference on large language models.
Things that stand out:
- Architecture: Pure PyTorch nn.Module implementation for easy customization.
- Checkpointing: Uses the new PyTorch distributed saving method (.distcp format) for flexible model reloading across different GPU configurations.
- Configuration: Utilizes data classes and YAML files for intuitive setup and modification (see the sketch after this list).
- Profiling: Integrates with xFormers' profiler for automatic MFU and HFU (model and hardware FLOPs utilization) calculation, plus memory profiling.
- Slurm Integration: Includes stool.py for seamless job launching on Slurm clusters.
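Lingua's actual config schema lives in its repo; the sketch below only illustrates the dataclass-plus-YAML pattern the Configuration bullet describes, with field names that are hypothetical rather than Lingua's:

```python
# Minimal sketch of the dataclass + YAML config pattern
# (field names are hypothetical, not Lingua's actual schema).
from dataclasses import dataclass, field

import yaml  # pip install pyyaml


@dataclass
class ModelConfig:
    dim: int = 1024
    n_layers: int = 8
    n_heads: int = 8


@dataclass
class TrainConfig:
    lr: float = 3e-4
    steps: int = 10_000
    model: ModelConfig = field(default_factory=ModelConfig)


def load_config(path: str) -> TrainConfig:
    """Read a YAML file and overlay it onto the dataclass defaults."""
    with open(path) as f:
        raw = yaml.safe_load(f) or {}
    model = ModelConfig(**raw.pop("model", {}))
    return TrainConfig(model=model, **raw)
```

The appeal of the pattern: a run is reproducible from a single YAML file, and any field left out of the file falls back to the dataclass default.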
Hyperdimensional Computing + Neural Networks, tell your friends. To my knowledge, this is a completely novel implementation of HDC + neural networks. It would be a direct competitor to Transformers, and it is off-the-charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place). It is also far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise from my testing: adding an attention mechanism to the model does nothing at all, maybe a 1% performance increase. Weirdest thing. I guess Attention Is Not All You Need?
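The post shares no code, but the classic HDC primitives it builds on (random hypervectors, binding, bundling, similarity) are standard; here is a minimal NumPy sketch, with dimensionality and encoding choices that are my assumptions rather than the author's:

```python
# Minimal sketch of classic hyperdimensional computing primitives
# (bipolar hypervectors; parameters are illustrative, not the author's model).
import numpy as np

D = 10_000  # hypervector dimensionality
rng = np.random.default_rng(0)


def random_hv() -> np.ndarray:
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)


def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Binding (elementwise multiply): associates two concepts; result is ~orthogonal to both."""
    return a * b


def bundle(*hvs: np.ndarray) -> np.ndarray:
    """Bundling (majority sum): superposes items; result stays similar to each input."""
    return np.sign(np.sum(hvs, axis=0))


def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized dot product; near 0 for unrelated hypervectors."""
    return float(a @ b) / D


# Encode role-filler pairs and bundle them into one memory vector.
color, shape = random_hv(), random_hv()
red, square = random_hv(), random_hv()
memory = bundle(bind(color, red), bind(shape, square))

# Unbinding (multiply again, since x * x = 1) recovers a noisy filler.
print(similarity(bind(memory, color), red))     # high (~0.5)
print(similarity(bind(memory, color), square))  # near 0
```

All three operations are elementwise over fixed-width vectors, which is where the computational-efficiency claim comes from: there is no quadratic attention over sequence length.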
@GuangyuRobert from MIT has created Project Sid, which simulates over 1,000 autonomous AI agents collaborating in a Minecraft environment, operating for extended periods without human intervention. This simulation demonstrates unprecedented levels of agent interaction, decision-making, and societal development.
Agents operate independently for hours or days, showcasing advanced decision-making algorithms and goal-oriented behavior.
The simulation produced complex, emergent phenomena, including:
- Economic systems with currency (gems) and trading
- Cultural development and religious practices
- Agents even understood bribing: priests were moving the most gems to bribe people into following them!
- Governmental structures and democratic processes
Project Sid addresses fundamental challenges in AI research:
- Coherence: Maintaining consistent agent behavior over extended periods.
- Multi-agent Collaboration: Enabling effective communication and coordination among numerous AI entities.
- Long-term Progression: Developing agents capable of learning and evolving over time.
While Minecraft serves as the initial testbed, the underlying AI architecture is designed to be game-agnostic, suggesting potential applications in various digital environments and real-world simulations.
Imagine a policy being debated by the government and how it might affect society; Sid can simulate its impact!
Even if this remains just a game experiment, the project successfully manages 1,000+ agents simultaneously, a feat that requires robust distributed computing and efficient agent architecture.
The first fully multi-GPU, advanced batch image captioner app with a Gradio interface has been published (as far as I know, the first of its kind).
Multi-GPU batch captioning with JoyCaption. JoyCaption uses Meta-Llama-3.1-8B, google/siglip-so400m-patch14-384, and a fine-tuned image captioning neural network.
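JoyCaption's own loader glues Llama-3.1-8B and SigLIP together, so the sketch below only shows the generic one-worker-per-GPU sharding pattern an app like this would use; `load_captioner` and its `.caption()` method are hypothetical stand-ins, not the app's real API:

```python
# Generic multi-GPU batch-captioning pattern: shard the image list across GPUs,
# one worker process per device. `load_captioner` is a hypothetical placeholder;
# the real app wires up Meta-Llama-3.1-8B + SigLIP per the JoyCaption repo.
import torch
import torch.multiprocessing as mp


def load_captioner(device: str):
    """Hypothetical placeholder for loading the captioning model on one GPU."""
    raise NotImplementedError


def worker(rank: int, image_paths: list[str]) -> None:
    device = f"cuda:{rank}"
    model = load_captioner(device)
    # Each worker captions only its shard: every world_size-th image.
    shard = image_paths[rank :: torch.cuda.device_count()]
    for path in shard:
        caption = model.caption(path)  # hypothetical API
        with open(path + ".txt", "w") as f:
            f.write(caption)


if __name__ == "__main__":
    images = ["img_0001.png", "img_0002.png"]  # placeholder list
    mp.spawn(worker, args=(images,), nprocs=torch.cuda.device_count())
```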
JUST RELEASED: Fireplace 2 for Llama 3.1 8b Instruct!
Fireplace 2 is an 'expansion pack' of structured outputs you can request during your chat, using special request tokens to let Llama know you're looking for specific types of responses:
- Inline function calls
- SQL queries
- JSON objects
- Data visualization with matplotlib
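The actual request tokens and repo id are defined in the Fireplace 2 model card; the sketch below only shows the shape of the interaction, with `<request: sql>` and the model id as placeholders rather than the real identifiers:

```python
# Shape of a structured-output request. The request token and model id below
# are placeholders; check the Fireplace 2 model card for the real ones.
from transformers import pipeline

chat = pipeline("text-generation", model="your-org/fireplace-2-llama-3.1-8b")

messages = [
    {"role": "user",
     "content": "<request: sql> List the ten most recent orders per customer."},
]
out = chat(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # expected: a SQL query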
Sparse MoE (SMoE) has an unavoidable drawback: the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the number of experts.
Identifying optimal hyper-parameters without a sufficient number of ablation studies is also challenging. As models continue to grow, this limitation can waste significant computational resources and, in turn, hinder the efficiency of training MoE-based models in practice.
Now, our DynMoE addresses these challenges! DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate;
(2) an adaptive process that automatically adjusts the number of experts during training.
Extensive numerical results across vision, language, and vision-language tasks demonstrate that our approach achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while maintaining efficiency by activating fewer parameters.
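The exact gating rule is in the DynMoE paper and repo; as a rough sketch of the idea in point (1), the gate below lets each token activate every expert whose score clears a learned threshold, so the per-token expert count is data-dependent rather than a fixed top-k (all shapes and names are illustrative, not the paper's):

```python
# Sketch of threshold-based "top-any" gating: each token activates however many
# experts clear the gate, instead of a fixed top-k. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopAnyGate(nn.Module):
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.expert_emb = nn.Parameter(torch.randn(num_experts, dim))
        # Learned per-expert activation thresholds.
        self.threshold = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Similarity between each token and each expert embedding.
        scores = F.normalize(x, dim=-1) @ F.normalize(self.expert_emb, dim=-1).T
        mask = (scores > self.threshold).float()  # 0, 1, or many experts pass
        weights = mask * scores
        # Normalize over the activated experts (clamp avoids divide-by-zero).
        return weights / weights.sum(-1, keepdim=True).clamp_min(1e-9)


gate = TopAnyGate(dim=64, num_experts=8)
tokens = torch.randn(4, 64)
routing = gate(tokens)        # (4, 8); rows have variable support
print((routing > 0).sum(-1))  # experts activated per token: varies
```

Because the thresholds are parameters, training can shift how many experts a token recruits, which is the mechanism point (2) builds on when it grows or prunes the expert pool.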