lab
lab212
AI & ML interests
None yet
Recent Activity
Organizations
None yet
lab212's activity
reacted to
chansung's
post
about 2 months ago
Post
1868
🎙️ Listen to the audio "Podcast" of every paper in Hugging Face Daily Papers.
Now, the "AI Paper Reviewer" project can automatically generate audio podcasts for any paper published on arXiv, and this is integrated into the GitHub Actions pipeline. It sounds pretty similar to #NotebookLM in my opinion.
🎙️ Try it out yourself at https://deep-diver.github.io/ai-paper-reviewer/
This audio podcast is powered by Google technologies: 1) the Google DeepMind Gemini 1.5 Flash model generates the podcast script, then 2) Google Cloud Vertex AI's Text-to-Speech model synthesizes the script into natural-sounding voices (with the latest addition of the "Journey" voice style).
"AI Paper Reviewer" is also an open source project. Anyone can use it to build and own a personal blog on any papers of interest. So check out the project repository below if you are interested!
https://github.com/deep-diver/paper-reviewer
This project is going to support other models soon, including open-weight ones, for both text-based content generation and voice synthesis for the podcast. The only reason I chose the Gemini model is that it offers a "free tier", which is enough to shape up this project with non-realtime batch generation. I'm excited to see how others will use this tool to explore the world of AI research, so feel free to share your feedback and suggestions!
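The two-stage flow described in the post (a language model writes the script, then a speech model voices it) can be sketched roughly as follows. This is only an illustration under assumptions: `make_podcast`, `fake_writer`, and `fake_tts` are hypothetical names, not code from the project, and the stand-in backends replace the real Gemini and Text-to-Speech calls so the sketch runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: in the real project these would wrap a text model
# and a speech model. Neither signature is taken from the project itself.
ScriptWriter = Callable[[str], List[Tuple[str, str]]]  # paper text -> [(speaker, line), ...]
Synthesizer = Callable[[str, str], bytes]              # (speaker, line) -> audio bytes


def make_podcast(paper_text: str, write_script: ScriptWriter, synthesize: Synthesizer) -> bytes:
    """Stage 1: turn the paper into a dialogue script. Stage 2: voice each line."""
    script = write_script(paper_text)
    clips = [synthesize(speaker, line) for speaker, line in script]
    # Naive byte concatenation; real audio formats usually need proper stitching.
    return b"".join(clips)


# Stand-in backends (assumptions for illustration only).
def fake_writer(paper_text: str) -> List[Tuple[str, str]]:
    title = paper_text.splitlines()[0]
    return [("host", f"Today we look at {title}."), ("guest", "Let's dive in.")]


def fake_tts(speaker: str, line: str) -> bytes:
    return f"[{speaker}] {line}\n".encode("utf-8")


audio = make_podcast("Pyramid Flow\nAbstract ...", fake_writer, fake_tts)
```

Passing the two stages in as callables is one way to make swapping in other models, as the post plans, a matter of replacing one function.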
reacted to
nicolay-r's
post with 🧠
3 months ago
Post
1008
📢 Two weeks ago I got a chance to share the most recent reasoning 🧠 capabilities of Large Language Models in sentiment analysis at NLPSummit-2024.
For those who missed it and still wish to find out about the advances of GenAI in that field, the recording is now available:
https://www.youtube.com/watch?v=qawLJsRHzB4
You will learn:
- how well LLM reasoning can be used in sentiment analysis in a zero-shot learning setting,
- how to improve reasoning by applying step-by-step chains (Chain-of-Thought),
- how to prepare the most advanced model in sentiment analysis using Chain-of-Thought.
Links:
Paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
⭐ Code: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework
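To make the difference between the zero-shot and Chain-of-Thought settings mentioned above concrete, here is a minimal sketch. The prompt wording, the three steps, and the names `zero_shot_prompt`, `run_chain`, and `fake_llm` are assumptions made for illustration; they are not taken from the paper or the linked framework.

```python
from typing import Callable, List


def zero_shot_prompt(sentence: str, target: str) -> str:
    """Ask for the label directly, with no intermediate reasoning."""
    return (
        f'Sentence: "{sentence}"\n'
        f'What is the sentiment towards "{target}"? '
        "Answer with one word: positive, negative, or neutral."
    )


# Hypothetical step-by-step questions; each builds on the previous answer.
STEPS = [
    'Which aspect of "{target}" is being discussed?',
    'Given that aspect, what opinion is expressed about "{target}"?',
    'Given that opinion, is the sentiment towards "{target}" positive, negative, or neutral?',
]


def run_chain(sentence: str, target: str, ask: Callable[[str], str]) -> List[str]:
    """Query the model once per step, feeding earlier answers into later prompts."""
    history = f'Sentence: "{sentence}"\n'
    answers = []
    for step in STEPS:
        prompt = history + step.format(target=target)
        answer = ask(prompt)
        answers.append(answer)
        history = prompt + "\n" + answer + "\n"
    return answers


# Stand-in for a real model call, so the sketch runs offline.
seen: List[str] = []


def fake_llm(prompt: str) -> str:
    seen.append(prompt)
    return f"answer {len(seen)}"


answers = run_chain("The battery dies fast.", "battery", fake_llm)
```

The last entry of `answers` is the final label; the earlier ones are the intermediate reasoning the model was asked to produce.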
reacted to
reach-vb's
post
3 months ago
Post
3128
NEW: Open-source text/image-to-video model is out - MIT licensed - rivals Gen-3, Pika & Kling 🔥
> Pyramid Flow: Training-efficient Autoregressive Video Generation method
> Utilizes Flow Matching
> Trains on open-source datasets
> Generates high-quality 10-second videos
> Video resolution: 768p
> Frame rate: 24 FPS
> Supports image-to-video generation
> Model checkpoints available on the Hub 🤗: rain1011/pyramid-flow-sd3
reacted to
KingNish's
post with ❤️
8 months ago
Post
5116
Introducing OpenGPT-4o
KingNish/OpenGPT-4o
Features:
1️⃣ Possible inputs are Text, Text + Image 🖼️, Audio 🎧, and WebCam 📸,
and possible outputs are Image 🖼️, Image + Text 🖼️, Text, and Audio 🎧.
2️⃣ Flat 100% FREE 💸 and Super-fast ⚡.
3️⃣ Publicly available before GPT-4o.
Future Features:
1️⃣ Chat with PDF (both voice and text).
2️⃣ Video generation.
3️⃣ Sequential image generation.
4️⃣ Better UI and customization.
Note: It's not possible to reach the level of complexity of GPT-4o, because OpenAI has been developing GPT-4o for six months with a team of more than 450 experienced members, whereas I am only one person. Moreover, they haven't released it fully to the public, so it remains a test model.