Llama 3.1 405B Instruct beats GPT-4o on MixEval-Hard
Just ran MixEval for 405B, Sonnet-3.5, and GPT-4o, with 405B landing right between the other two at 66.19.
The GPT-4o result of 64.7 replicated locally, but Sonnet-3.5 actually scored 70.25/69.45 in my replications 🤔 Still well ahead of the other two, though.
Do you want to improve AI in your language? Here's how you can help.
I'm exploring different AI techniques for an upcoming journalism project, and I wanted to test a cool idea by @davanstrien, Data Is Better Together, which aims to foster a community of people creating DPO datasets in different languages.
This project gives the opportunity to explore various concepts:
- Direct Preference Optimization (DPO)
- Synthetic data
- Data annotation
- LLM as a judge
1️⃣ Take the Aya dataset of human-annotated prompt-completion pairs across 71 languages and filter it to include only those in the language you're interested in.
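A minimal sketch of that first step with the `datasets` library, assuming the Hub id `CohereForAI/aya_dataset` and a `language` column with full language names (adjust both to the copy you actually use):

```python
# Minimal sketch: filter the Aya dataset down to a single language.
# Assumes the Hub id "CohereForAI/aya_dataset" and a "language" column;
# adjust both if your copy of the dataset differs.
from datasets import load_dataset

aya = load_dataset("CohereForAI/aya_dataset", split="train")

# Keep only the French prompt-completion pairs (swap in your language).
aya_fr = aya.filter(lambda row: row["language"] == "French")

print(aya_fr)      # how many pairs were kept
print(aya_fr[0])   # one human-written prompt/completion pair
```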
2️⃣ Use distilabel from Argilla to generate a second response for each prompt and evaluate which response is best.
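distilabel wraps this in a pipeline; as a rough, hand-rolled illustration of the same idea (not distilabel's actual API), here is a sketch using `huggingface_hub.InferenceClient`, with placeholder model ids and a judge prompt of my own:

```python
# Hand-rolled illustration of what the pipeline does conceptually:
# generate a second completion per prompt, then ask an LLM judge which is better.
# Model ids and the judge prompt are placeholders, not distilabel's defaults.
from huggingface_hub import InferenceClient

generator = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")
judge = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct")

def second_response(prompt: str) -> str:
    """Generate an alternative completion for an Aya prompt."""
    out = generator.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return out.choices[0].message.content

def pick_better(prompt: str, response_a: str, response_b: str) -> str:
    """LLM-as-a-judge: return 'A' or 'B' for the preferred response."""
    question = (
        f"Question:\n{prompt}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response is better? Answer with a single letter, A or B."
    )
    out = judge.chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=5,
    )
    return out.choices[0].message.content.strip()
```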
Basically, DPO datasets contain a chosen and a rejected response to each question, which helps align models on specific tasks. To quote Daniel: "Currently, there are only a few DPO datasets available for a limited number of languages. By generating more DPO datasets for different languages, we can help to improve the quality of generative models in a wider range of languages."
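Concretely, one record in such a dataset might look like this (illustrative values only):

```python
# One illustrative DPO record: the preferred completion goes in "chosen",
# the weaker one in "rejected" (the values below are made up).
dpo_record = {
    "prompt": "Explique la différence entre la météo et le climat.",
    "chosen": "La météo décrit l'état de l'atmosphère à un moment précis, "
              "alors que le climat décrit des tendances sur de longues périodes.",
    "rejected": "La météo et le climat désignent exactement la même chose.",
}
```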
3️⃣ Send this dataset, along with the LLM's evaluations, to an easy-to-use annotation interface where humans can review those evaluations.
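One possible way to wire that up, assuming the interface is an Argilla instance and its 1.x FeedbackDataset API (the URL, API key, workspace, and field names below are all placeholders):

```python
# Sketch of pushing the prompt/response pairs plus the LLM judgment to Argilla
# for human review. Assumes the Argilla 1.x FeedbackDataset API; the URL,
# API key, workspace, and dataset names are placeholders.
import argilla as rg

rg.init(api_url="https://my-argilla-space.hf.space", api_key="owner.apikey")

feedback = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response_a"),
        rg.TextField(name="response_b"),
        rg.TextField(name="llm_judgement"),
    ],
    questions=[
        rg.LabelQuestion(
            name="agree_with_judge",
            title="Is the response preferred by the LLM really the better one?",
            labels=["A", "B", "tie"],
        )
    ],
)

feedback.add_records(
    [
        rg.FeedbackRecord(
            fields={
                "prompt": "Explique la différence entre la météo et le climat.",
                "response_a": "La météo décrit l'état de l'atmosphère à un moment précis.",
                "response_b": "La météo et le climat sont la même chose.",
                "llm_judgement": "A",
            }
        )
    ]
)

feedback.push_to_argilla(name="demo-aya-dpo-french", workspace="admin")
```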
This is where you can help. :) You can rate the LLM's evaluation of the prompt-response pairs. For my example, I built a dataset in French. And without wanting to start a debate about homeopathy, the second response is clearly better in the example below! fdaudens/demo-aya-dpo-french
Happy to announce a collection called "Blackhole": a black hole of high-quality, multilingual data across many fields for training LLMs with SFT and DPO. There are now over 30 high-quality datasets available, so you can start creating interesting models. It will be updated in the future; glad if it helps someone.
- GPU Acceleration: RAPIDS cuDF leverages GPU computing, letting users switch to GPU-accelerated operations without modifying existing pandas code (see the sketch after this list).
- Unified Workflows: Seamlessly integrates GPU and CPU operations, falling back to the CPU when necessary.
- Optimized Performance: By exploiting the massive parallelism of GPUs, it achieves up to 150x speedups in data processing, as demonstrated in benchmarks such as the DuckDB data-processing benchmark.
New Limitations:
- GPU Availability: Requires a GPU (and not everything should need a GPU).
- Library Compatibility: Still in its early stages; not all pandas functionality has been ported yet.
- Data Transfer Overhead: Moving data between the CPU and GPU can introduce latency if not managed efficiently, since some operations still run on the CPU.
- User Adoption: pandas already had vectorization support, but people didn't use it because it was harder to write, and Dask already existed for parallelization. It's not that solutions didn't exist.
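For the zero-code-change point above, here is a minimal sketch of cuDF's pandas accelerator mode, assuming the `cudf` package is installed and a CUDA GPU is available:

```python
# Minimal sketch of cuDF's pandas accelerator mode: enable it before importing
# pandas, then run ordinary pandas code. Supported operations are dispatched to
# the GPU; unsupported ones fall back to the CPU. Needs a CUDA GPU and cudf.
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # imported after enabling the accelerator

df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]})
print(df.groupby("group")["value"].mean())  # runs on the GPU when supported
```

In notebooks, the same mode is typically enabled with the `%load_ext cudf.pandas` magic instead.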
New updates to OpenGPT 4o:
1. Live Chat (also known as video chat): very powerful and fast; it can even identify famous places and people.
2. Powerful image generation.