Feynman Innovations

ajibawa-2023

AjinkyaBawase

AI & ML interests

LLM, RL, DL, ML, AGI. Developing LLMs (preferably fully fine-tuned) for various use cases.

Recent Activity

reacted to their post with ❤️ 34 minutes ago

Hi all, I recently released two audio datasets generated from my earlier dataset: https://huggingface.co/datasets/ajibawa-2023/Children-Stories-Collection

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5600+ stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.

reacted to their post with 🚀 34 minutes ago

reacted to their post with 👍 34 minutes ago

ajibawa-2023's activity

reacted to their post with ❤️🚀👍🔥 34 minutes ago

posted an update 36 minutes ago

liked a dataset 6 days ago

ajibawa-2023/Audio-Children-Stories-Collection-Large

Viewer • Updated 9 days ago • 2.1k • 343 • 4

updated a dataset 9 days ago

ajibawa-2023/Audio-Children-Stories-Collection-Large

Viewer • Updated 9 days ago • 2.1k • 343 • 4

published a dataset 9 days ago

ajibawa-2023/Audio-Children-Stories-Collection-Large

Viewer • Updated 9 days ago • 2.1k • 343 • 4

New activity in ajibawa-2023/Children-Stories-Collection 12 days ago

Information and DOI request

#1 opened 11 months ago by wiirginia

liked a dataset 14 days ago

ajibawa-2023/Audio-Children-Stories-Collection

Viewer • Updated 14 days ago • 600 • 129 • 2

updated a dataset 14 days ago

ajibawa-2023/Audio-Children-Stories-Collection

Viewer • Updated 14 days ago • 600 • 129 • 2

published a dataset 14 days ago

ajibawa-2023/Audio-Children-Stories-Collection

Viewer • Updated 14 days ago • 600 • 129 • 2

liked a model 21 days ago

sesame/csm-1b

Text-to-Speech • Updated 24 days ago • 88.1k • 1.83k

New activity in ajibawa-2023/General-Stories-Collection 24 days ago

Which models were used to generate this dataset?

#2 opened 7 months ago by sam-paech

liked a dataset about 1 month ago

aurora-m/books-generation

Viewer • Updated Mar 3 • 713k • 254 • 2

New activity in ajibawa-2023/Code-290k-ShareGPT about 1 month ago

How is this dataset created?

#3 opened 5 months ago by oo22010

New activity in ajibawa-2023/Python-Code-23k-ShareGPT about 1 month ago

Origin

#2 opened about 1 month ago by danfperam

New activity in cognitivecomputations/Code-290k-ShareGPT-Vicuna about 1 month ago

Data generation process and LLM used

#2 opened about 1 month ago by Chintan-Shah

reacted to singhsidhukuldeep's post with 🔥 2 months ago

Post

Exciting Research Alert: Revolutionizing Complex Information Retrieval!

A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.

>> Key Innovations

Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.

Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.

Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.

>> Performance Highlights

The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the Bird dataset
- Achieves F1 scores 19.3 points higher than existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality

>> Technical Implementation

The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration

This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
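To make the "Information Alignment" idea above more concrete, here is a minimal sketch of scoring data objects with both BM25 and a similarity measure. It is not the ARM authors' code: the function names (`bm25_scores`, `hybrid_rank`), the `alpha` weight, and the toy documents are all hypothetical, and a bag-of-words cosine stands in for real embedding similarity so the example needs no external libraries.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use better ones)."""
    return text.lower().split()


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, docs, alpha=0.5):
    """Rank document indices by a weighted mix of BM25 and cosine similarity.

    The cosine over term counts is a stand-in for embedding similarity.
    """
    q = tokenize(query)
    toks = [tokenize(d) for d in docs]
    lexical = bm25_scores(q, toks)
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    qv = Counter(q)
    combined = [
        alpha * (lex / top) + (1 - alpha) * cosine(qv, Counter(t))
        for lex, t in zip(lexical, toks)
    ]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical table descriptions, invented for illustration only.
    docs = [
        "customer orders table with order date and total",
        "employee salary records by department",
        "product catalog with prices",
    ]
    print(hybrid_rank("orders total by customer", docs))
```

Mixing a lexical score with a similarity score is one common way to cover both exact keyword matches and looser semantic matches; how ARM actually combines the two signals is described in the paper, not here.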

reacted to Tonic's post with 🔥 2 months ago

Post

🙋🏻‍♂️hey there folks ,

Goedel's Theorem Prover is now being demoed on Hugging Face: Tonic/Math

Give it a try!