lab

lab212

AI & ML interests

None yet

Organizations

None yet

lab212's activity

replied to chansung's post about 2 months ago

Thanks pardner.

reacted to chansung's post with 👍 about 2 months ago

Post

1868

🎙️ Listen to the audio "Podcast" of every single Hugging Face Daily Papers.

Now, the "AI Paper Reviewer" project can automatically generate audio podcasts for any paper published on arXiv, and this is integrated into the GitHub Actions pipeline. It sounds pretty similar to #NotebookLM in my opinion.

🎙️ Try it out yourself at https://deep-diver.github.io/ai-paper-reviewer/

This audio podcast is powered by Google technologies: 1) the Google DeepMind Gemini 1.5 Flash model generates the podcast script, then 2) Google Cloud Vertex AI's Text-to-Speech model synthesizes the script into natural-sounding voices (with the latest addition of the "Journey" voice style).

"AI Paper Reviewer" is also an open source project. Anyone can use it to build and own a personal blog on any papers of interest. If you are interested, check out the project repository: https://github.com/deep-diver/paper-reviewer

This project is going to support other models soon, including open-weights ones, for both text-based content generation and voice synthesis for the podcast. The only reason I chose the Gemini model is that it offers a "free tier", which is enough to shape up this project with non-realtime batch generation. I'm excited to see how others will use this tool to explore the world of AI research, so feel free to share your feedback and suggestions!
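
The two-stage flow described in the post (a text model writes a script, then a speech model voices it, run as a non-realtime batch) can be sketched roughly as below. This is only an illustration, not the project's actual code: `Episode`, `make_podcasts`, `fake_write_script`, and `fake_synthesize` are hypothetical names, and the two stand-in functions take the place of real Gemini and Text-to-Speech calls, which are deliberately not shown. Passing the two stages in as plain callables is one way to leave room for the other models the post says are planned.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Episode:
    """One generated podcast episode (hypothetical container)."""
    paper_id: str
    script: str
    audio: bytes


def make_podcasts(
    papers: Iterable[Tuple[str, str]],
    write_script: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[Episode]:
    """Run the two stages over a batch of (paper_id, paper_text) pairs."""
    episodes = []
    for paper_id, paper_text in papers:
        script = write_script(paper_text)   # stage 1: text model writes the script
        audio = synthesize(script)          # stage 2: speech model voices the script
        episodes.append(Episode(paper_id, script, audio))
    return episodes


# Stand-ins (assumptions): real backends would call an LLM and a TTS service.
def fake_write_script(paper_text: str) -> str:
    return f"Host: Today we discuss: {paper_text[:40]}"


def fake_synthesize(script: str) -> bytes:
    return script.encode("utf-8")


if __name__ == "__main__":
    batch = [("paper-1", "A study of flow matching for video generation.")]
    for episode in make_podcasts(batch, fake_write_script, fake_synthesize):
        print(episode.paper_id, len(episode.audio), "bytes of audio")
```

Swapping in a different model would then mean passing a different `write_script` or `synthesize` function, with the batch loop left unchanged.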

3 replies

liked a Space 3 months ago

Running on CPU Upgrade

9.08k

👩‍🎨

AI Comic Factory

Create your own AI comic with a single prompt

reacted to nicolay-r's post with 🧠 3 months ago

Post

1008

📢 Two weeks ago I got a chance to present the most recent reasoning 🧠 capabilities of Large Language Models in Sentiment Analysis at NLPSummit-2024.

For those who missed it and still wish to find out about the advances of GenAI in that field, the recording is now available:
https://www.youtube.com/watch?v=qawLJsRHzB4

You will learn:
☑️ how well LLM reasoning works for sentiment analysis in a zero-shot setting,
☑️ how to improve reasoning by applying step-by-step chains (Chain-of-Thought),
☑️ how to prepare the most advanced model in sentiment analysis using Chain-of-Thought.

Links:
📜 Paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
⭐ Code: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework
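
As a rough illustration of the difference between the zero-shot and step-by-step (Chain-of-Thought) setups mentioned above, the sketch below builds both kinds of prompt for a sentence and a target. Everything here is an assumption for illustration: the function names, the wording of the questions, and the `ask` callable standing in for a model are hypothetical, and are not the prompts or code from the linked paper or repository.

```python
from typing import Callable, List


def zero_shot_prompt(sentence: str, target: str) -> str:
    """Ask for the label directly, with no intermediate reasoning."""
    return (
        f'Sentence: "{sentence}"\n'
        f'What is the sentiment towards "{target}"? '
        "Answer with one word: positive, negative, or neutral."
    )


def chain_of_thought_steps(sentence: str, target: str) -> List[str]:
    """Split the task into smaller questions answered one after another."""
    return [
        f'Given the sentence "{sentence}", which aspect of "{target}" is mentioned?',
        f'Based on that aspect, what opinion is expressed about "{target}"?',
        f'Based on that opinion, is the sentiment towards "{target}" '
        "positive, negative, or neutral?",
    ]


def run_chain(steps: List[str], ask: Callable[[str], str]) -> str:
    """Feed each question plus all earlier answers to the model; return the last answer."""
    history = ""
    answer = ""
    for step in steps:
        history += f"Q: {step}\n"
        answer = ask(history)          # `ask` is a stand-in for a real LLM call
        history += f"A: {answer}\n"
    return answer


if __name__ == "__main__":
    sentence = "The battery lasts long, but the screen is dim."
    print(zero_shot_prompt(sentence, "screen"))
    # A dummy model that always answers "negative", just to show the flow.
    print(run_chain(chain_of_thought_steps(sentence, "screen"), lambda _: "negative"))
```

The point of the chained version is that each later question sees the earlier answers, so the final label is conditioned on the intermediate reasoning rather than produced in one step.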

reacted to reach-vb's post with 👍 3 months ago

Post

3128

NEW: Open Source Text/Image-to-Video model is out - MIT licensed - Rivals Gen-3, Pika & Kling 🔥

> Pyramid Flow: Training-efficient Autoregressive Video Generation method
> Utilizes Flow Matching
> Trains on open-source datasets
> Generates high-quality 10-second videos
> Video resolution: 768p
> Frame rate: 24 FPS
> Supports image-to-video generation

> Model checkpoints available on the hub 🤗: rain1011/pyramid-flow-sd3

liked a Space 3 months ago

Running on Zero

859

⚡

Screenshot to HTML

liked a Space 8 months ago

Running on Zero

541

⚡

Instant Video

Fast Text 2 Video Generator

reacted to KingNish's post with ❤️ 8 months ago

Post

5116

Introducing OpenGPT-4o
KingNish/OpenGPT-4o

Features:
1️⃣ Possible inputs are Text ✏️, Text + Image 📝🖼️, Audio 🎧, WebCam 📸
and possible outputs are Image 🖼️, Image + Text 🖼️📝, Text 📝, Audio 🎧
2️⃣ Flat 100% FREE 💸 and Super-fast ⚡.
3️⃣ Publicly available before GPT-4o.

Future Features:
1️⃣ Chat with PDF (Both voice and text)
2️⃣ Video generation.
3️⃣ Sequential Image Generation.
4️⃣ Better UI and customization.

Note: It's not possible to reach the level of complexity of GPT-4o, because OpenAI has been developing GPT-4o for six months with a team of more than 450 experienced members, whereas I am only one person. Moreover, they haven't released it fully to the public, so it remains a test model.

31 replies