
Ankit Aglawe

AnkitAI

AI & ML interests

Text Classification | Multimodality

Recent Activity

Organizations

huggingPartyParis

AnkitAI's activity

reacted to Tar9897's post with 👍 6 months ago
As we advance on the path towards true Artificial General Intelligence (AGI), it's crucial to recognize and address the limitations inherent in current technologies, particularly in large language models (LLMs) like those developed by OpenAI. While LLMs excel in processing and generating text, their capabilities are largely constrained to the domains of natural language understanding and generation. This poses significant limitations when dealing with more complex, abstract mathematical concepts such as topological analysis, 3D geometry, and homotopy type theory.

Topological Analysis and 3D Geometry: LLMs currently do not possess the inherent ability to understand or interpret the spatial and geometric data that is critical in fields like robotics, architecture, and advanced physics. These models lack the capacity to visualize or manipulate three-dimensional objects or comprehend the underlying properties that govern these forms.

Homotopy Type Theory: This branch of mathematics combines homotopy theory and type theory. It provides tools for a more robust handling of equivalences and transformations, something LLMs are not designed to handle directly.
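To make the notion of "equivalences" concrete (an illustration added here, not part of the original post): homotopy type theory's univalence axiom identifies equality of types with equivalence of types,

\[ (A =_{\mathcal{U}} B) \simeq (A \simeq B) \]

so proving that two structures are "equal" amounts to exhibiting a structure-preserving transformation between them. This is precisely the kind of reasoning about equivalence that plain next-token text prediction does not capture.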

For the development of AGI, it is not sufficient to merely enhance existing models' capacities within their linguistic domains. Instead, a synthesis of symbolic AI with an understanding of homotopy type theory could pave the way. Symbolic AI, which manipulates symbols and performs logical operations, when combined with the abstract mathematical reasoning of homotopy type theory, could lead to breakthroughs in how machines understand and interact with the world.

To address these limitations, we have developed Tenzin, a one-of-a-kind model planned for release within the next 1-2 weeks. To learn more, join the waitlist at https://octave-x.com/.
reacted to gokaygokay's post with 👍 7 months ago
I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

https://huggingface.co/spaces/gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation: gokaygokay/sd3-long-captioner
- Prompt enhancers for SD3 models: gokaygokay/Lamini-Prompt-Enchance-Long and gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!
reacted to nikgr's post with 🔥 7 months ago
๐Ÿฆ Do you remember IBIS? Not a fancy bird but the open challenge in Inferring Binding Specificities of unexplored human transcription factors. Check our site (https://ibis.autosome.org/) and have a sip of fresh news below.

👥 More than 100 teams have registered for the challenge, yet only two dozen are using the opportunity to explore their models on the Leaderboard. Don't miss the chance to participate in the Leaderboard stage, although you can submit a final solution independently of it.

๐ŸŒ Remember, the training data for Leaderboard and Final are available online, and you are free to mix-and-match it in any combination.

🌌 For the Leaderboard, we have received 650 total submissions of AAA (advanced ML) models and 296 PWM models (a whopping set of 6682 PWMs in total).
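For readers new to the terminology (this example is mine, not IBIS code): a PWM (position weight matrix) assigns a weight to each base at each position of a motif, and a sequence window is scored by summing the weights along it. The weights below are made up for illustration. A minimal sketch:

```python
# Score DNA sequences with a position weight matrix (PWM).
# Rows are motif positions; each maps base -> log-odds weight.
# These weights are invented for illustration only.
PWM = [
    {"A": 0.6, "C": -1.2, "G": 0.3, "T": -0.9},
    {"A": -0.4, "C": 1.1, "G": -0.7, "T": 0.2},
    {"A": 0.9, "C": -0.3, "G": -1.5, "T": 0.8},
]

def pwm_score(seq: str) -> float:
    """Sum the per-position weights for a window of PWM length."""
    assert len(seq) == len(PWM)
    return sum(row[base] for row, base in zip(PWM, seq))

def best_window(seq: str) -> float:
    """Slide the PWM along a longer sequence, keep the best score."""
    k = len(PWM)
    return max(pwm_score(seq[i:i + k]) for i in range(len(seq) - k + 1))

print(round(pwm_score("ACA"), 2))  # 0.6 + 1.1 + 0.9 = 2.6
```

Real PWM tooling normalizes counts into probabilities and log-odds against a background model; the scanning logic, though, is exactly this sliding-window sum.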

🚀 For PWMs, the baseline has been left far behind, but some TFs remain tough nuts to crack (see the attached figure 1).

📈 For AAAs, there is a solid improvement over the best submitted PWMs in the A2G discipline, but G2A remains unpopular (see the attached figure 2). Free hint: this is your chance!

💡 Another free hint: if your model tends to overfit given the limited data available for some TFs, don't forget to use reverse-complement and shift augmentations. Also, don't hesitate to use multitarget models, i.e. models predicting the binding of multiple TFs at the same time.
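The two augmentations mentioned in the hint are easy to implement; a minimal sketch (illustrative, not challenge code):

```python
# Reverse-complement and shift augmentations for DNA sequences,
# as suggested for TF-binding models that overfit on small data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand.

    A TF binds double-stranded DNA, so the reverse complement is an
    equally valid training example with the same label.
    """
    return seq.translate(COMPLEMENT)[::-1]

def shifts(seq: str, window: int, max_shift: int):
    """Yield windows of `seq` offset by 0..max_shift positions."""
    for offset in range(min(max_shift + 1, len(seq) - window + 1)):
        yield seq[offset:offset + window]

print(reverse_complement("AACG"))      # CGTT
print(list(shifts("ACGTAC", 4, 2)))    # ['ACGT', 'CGTA', 'GTAC']
```

Each augmented window keeps the original label, so a small per-TF dataset can be multiplied several-fold without new experiments.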

💡 Last but not least, try to combine knowledge from all accessible experiment types in a single model, especially for the G2A discipline (ChIP-Seq & genomic HT-SELEX)!

📣 Finally and importantly, following requests from the community, we have decided to EXTEND the Leaderboard until the final submission deadline.

๐Ÿ—“๏ธ The final submission deadline is also EXTENDED until Aug 15. The final submission form and details will be posted on the IBIS website in the first half of July, follow our Telegram group and mailing list (see the links at https://ibis.autosome.org).
reacted to m-ric's post with 🔥 7 months ago
๐—ฌ๐—ผ๐˜‚ ๐—ฑ๐—ผ๐—ป'๐˜ ๐—ป๐—ฒ๐—ฒ๐—ฑ "๐—ณ๐˜‚๐—ป๐—ฐ๐˜๐—ถ๐—ผ๐—ป ๐—ฐ๐—ฎ๐—น๐—น๐—ถ๐—ป๐—ด ๐—ณ๐—ถ๐—ป๐—ฒ-๐˜๐˜‚๐—ป๐—ถ๐—ป๐—ด" ๐˜๐—ผ ๐—ฏ๐˜‚๐—ถ๐—น๐—ฑ ๐—ด๐—ผ๐—ผ๐—ฑ ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€ โ›”

It's trendy to share models "fine-tuned for function calling", but from my observations this fine-tuning is neither necessary nor sufficient to build good agent systems.
To name only a few:
๐Ÿฆโ€โฌ› Nexusflow/๐—ก๐—ฒ๐˜…๐˜‚๐˜€๐—ฅ๐—ฎ๐˜ƒ๐—ฒ๐—ป-๐—ฉ๐Ÿฎ-๐Ÿญ๐Ÿฏ๐—•
โŒ˜ CohereForAI/๐—ฐ๐Ÿฐ๐—ฎ๐—ถ-๐—ฐ๐—ผ๐—บ๐—บ๐—ฎ๐—ป๐—ฑ-๐—ฟ-๐—ฝ๐—น๐˜‚๐˜€
โ›ต๏ธ mistralai/๐— ๐—ถ๐˜…๐˜๐—ฟ๐—ฎ๐—น-๐Ÿด๐˜…๐Ÿฎ๐Ÿฎ๐—•-๐—œ๐—ป๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜-๐˜ƒ๐Ÿฌ.๐Ÿญ
"Fine-tuned for function-calling" generally means "fine-tuned to generate function calls in correct JSON for extremely simple tasks". In other terms, it means "improve the formatting of the tool calls".

Yet I discovered two things while improving Transformers Agents:
๐Ÿง Even when used as JSON agents, these fine-tuned models don't perform very well
๐Ÿ… ๐™‚๐™ค๐™ค๐™™ ๐™—๐™–๐™จ๐™š ๐™ข๐™ค๐™™๐™š๐™ก๐™จ ๐™ฅ๐™š๐™ง๐™›๐™ค๐™ง๐™ข ๐™—๐™š๐™ฉ๐™ฉ๐™š๐™ง ๐™ฌ๐™ž๐™ฉ๐™๐™ค๐™ช๐™ฉ ๐™–๐™ฃ๐™ฎ ๐™›๐™ž๐™ฃ๐™š-๐™ฉ๐™ช๐™ฃ๐™ž๐™ฃ๐™œ, ๐™Ÿ๐™ช๐™จ๐™ฉ ๐™ฅ๐™ก๐™–๐™ž๐™ฃ ๐™ฅ๐™ง๐™ค๐™ข๐™ฅ๐™ฉ๐™ž๐™ฃ๐™œ. (Llama-3-70B-Instruct, GPT-4o, Claude-3.5-Sonnet)

👇 The graph below shows the count of errors for my GPT-4o validation run on the GAIA benchmark: AgentParsingError and AgentExecutionError are the ones caused by incorrect formatting.
➤ As you can see, their count is already close to 0!
And given that GPT-4o is certainly not fine-tuned for our Code tool calling format, this shows that "function calling fine-tuning" is not necessary!

The hardest thing to get right in an agent is still to plan good task-solving trajectories over several steps.
To improve this, we could:
- Use more powerful base models
- Make tool calling datasets with complex solving trajectories
- Use RL! cc @lvwerra
replied to DmitryRyumin's post 7 months ago
reacted to DmitryRyumin's post with 🔥 7 months ago
🔥🎭🌟 New Research Alert - ECCV 2024 (Avatars Collection)! 🌟🎭🔥
📄 Title: Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture 🔍

📝 Description: Topo4D is a novel method for automated, high-fidelity 4D head tracking that optimizes dynamic topological meshes and 8K texture maps from multi-view time-series images.

👥 Authors: @Dazz1e , Y. Cheng, @Ryan-sjtu , H. Jia, D. Xu, W. Zhu, Y. Yan

📅 Conference: ECCV, 29 Sep – 4 Oct, 2024 | Milano, Italy 🇮🇹

📄 Paper: Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture (2406.00440)

🌐 Github Page: https://xuanchenli.github.io/Topo4D/
📁 Repository: https://github.com/XuanchenLi/Topo4D

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #Topo4D #4DHead #3DModeling #4DCapture #FacialAnimation #ComputerGraphics #MachineLearning #HighFidelity #TextureMapping #DynamicMeshes #GaussianSplatting #VisualEffects #ECCV2024
reacted to merve's post with 🔥 7 months ago
reacted to alvdansen's post with 🔥 7 months ago
New LoRA Model!

I trained this model on a new spot I'm really excited to share (soon!)

This Monday I will be posting my first beginning-to-end blog post showing the tools I've used, the dataset, captioning techniques, and the parameters to fine-tune this LoRA.

For now, check out the model in the link below.

alvdansen/m3lt