AI & ML interests

Generative AI - AI Architectures - GPT, Llama, Flan, Flamingo, BLOOM, Meta, AR/VR/XR. Open Source and Teaching/Learning: https://github.com/AaronCWacker/Yggdrasil

Welcome - This classroom organization holds examples and links for this session. Begin by adding a bookmark.

Chat and Clinical

🥫Open Datasets for Health Care📊

  1. Open-source and Creative Commons Zero (CC0) datasets, plus links to PDFs for public clinical use:

Examples and Exercises - Create These Spaces in Your Account and Test / Modify

Easy Examples

  1. FastSpeech - https://huggingface.co/spaces/AIZero2HeroBootcamp/FastSpeech2LinerGradioApp
  2. Memory - https://huggingface.co/spaces/AIZero2HeroBootcamp/Memory
  3. StaticHTML5PlayCanvas - https://huggingface.co/spaces/AIZero2HeroBootcamp/StaticHTML5Playcanvas
  4. 3DHuman - https://huggingface.co/spaces/AIZero2HeroBootcamp/3DHuman
  5. TranscriptAILearnerFromYoutube - https://huggingface.co/spaces/AIZero2HeroBootcamp/TranscriptAILearnerFromYoutube
  6. AnimatedGifGallery - https://huggingface.co/spaces/AIZero2HeroBootcamp/AnimatedGifGallery
  7. VideoToAnimatedGif - https://huggingface.co/spaces/AIZero2HeroBootcamp/VideoToAnimatedGif

Hard Examples:

  1. ChatGPTandLangChain - https://huggingface.co/spaces/AIZero2HeroBootcamp/ChatGPTandLangchain
     a. Keys: https://platform.openai.com/account/api-keys
  2. MultiPDFQAChatGPTLangchain - https://huggingface.co/spaces/AIZero2HeroBootcamp/MultiPDF-QA-ChatGPT-Langchain
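
Both hard examples call the OpenAI API, so each copy of the Space needs its own key from the Keys link. The snippet below is a minimal sketch of reading that key from the environment instead of hard-coding it. It assumes the key is stored under the name `OPENAI_API_KEY` (for example as a Space secret, which is exposed to the app as an environment variable). The helper names `load_openai_key` and `mask_key` are hypothetical and do not come from the example Spaces.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises a clear error up front instead of letting the app fail later
    with an authentication problem. The variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {env_var} (for example as a Space secret) before starting the app."
        )
    return key


def mask_key(key: str) -> str:
    """Return a short, log-safe form of the key (never print the full key)."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
```

Keeping the key out of the source file matters here because Space code is public by default.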

👋 Two easy ways to turbo boost your AI learning journey - Let's go 100X! 💻

🌐 AI Pair Programming with GPT

Open 2 Browsers to:

  1. 🌐 ChatGPT URL or URL2 and
  2. 🌐 Huggingface URL in separate browser windows.
  3. 🤖 Use prompts to generate a streamlit program on Huggingface or locally to test it.
  4. 🔧 For advanced work, add Python 3.10 and VSCode locally, and debug as gradio or streamlit apps.
  5. 🚀 Use these two superpower processes to reduce the time it takes you to make a new AI program! ⏱️

🎥 YouTube University Method:

  1. 🏋️‍♀️ Plan two hours each weekday to exercise your body and brain.
  2. 🎬 Make a playlist of videos you want to learn from on YouTube. Save the links to edit later.
  3. 🚀 Try watching the videos at a faster speed while exercising, and sample the first five minutes of each video.
  4. 📜 Reorder the playlist so the most useful videos are at the front, and take breaks to exercise.
  5. 📝 Practice note-taking in markdown to instantly save what you want to remember. Share your notes with others!
  6. 👥 AI Pair Programming Using Long Answer Language Models with Human Feedback

🎥 2023 AI/ML Learning Playlists for ChatGPT, LLMs, and Recent Events in AI:

  1. AI News: https://www.youtube.com/playlist?list=PLHgX2IExbFotMOKWOErYeyHSiikf6RTeX
  2. ChatGPT Code Interpreter: https://www.youtube.com/playlist?list=PLHgX2IExbFou1pOQMayB7PArCalMWLfU-
  3. Ilya Sutskever and Sam Altman: https://www.youtube.com/playlist?list=PLHgX2IExbFovr66KW6Mqa456qyY-Vmvw-
  4. Andrew Huberman on Neuroscience and Health: https://www.youtube.com/playlist?list=PLHgX2IExbFotRU0jl_a0e0mdlYU-NWy1r
  5. Andrej Karpathy: https://www.youtube.com/playlist?list=PLHgX2IExbFovbOFCgLNw1hRutQQKrfYNP
  6. Medical Futurist on GPT: https://www.youtube.com/playlist?list=PLHgX2IExbFosVaCMZCZ36bYqKBYqFKHB2
  7. ML APIs: https://www.youtube.com/playlist?list=PLHgX2IExbFovPX9z4m61rQImM7cDDY79L
  8. FastAPI and Streamlit: https://www.youtube.com/playlist?list=PLHgX2IExbFosyX2jzJJimPAI9C0FHflwB
  9. AI UI UX: https://www.youtube.com/playlist?list=PLHgX2IExbFosCUPzEp4bQaygzrzXPz81w
  10. ChatGPT Streamlit 2023: https://www.youtube.com/playlist?list=PLHgX2IExbFotDzxBRWwUBTb0_XFEr4Dlg

LLM Base Model Overview and Evolutionary Tree: https://github.com/Mooler0410/LLMsPracticalGuide

🎥 2023 AI/ML Advanced Learning Playlists:

  1. 2023 QA Models and Long Form Question Answering NLP
  2. FHIR Bioinformatics Development Using AI/ML and Python, Streamlit, and Gradio - 2022
  3. 2023 ChatGPT for Coding Assistant Streamlit, Gradio and Python Apps
  4. 2023 BigScience Bloom - Large Language Model for AI Systems and NLP
  5. 2023 Streamlit Pro Tips for AI UI UX for Data Science, Engineering, and Mathematics
  6. 2023 Fun, New and Interesting AI, Videos, and AI/ML Techniques
  7. 2023 Best Minds in AGI AI Gamification and Large Language Models
  8. 2023 State of the Art for Vision Image Classification, Text Classification and Regression, Extractive Question Answering and Tabular Classification
  9. 2023 AutoML DataRobot and AI Platforms for Building Models, Features, Test, and Transparency

Azure Development Architectures in 2023:

  1. ChatGPT: https://azure.github.io/awesome-azd/?tags=chatgpt
  2. Azure OpenAI Services: https://azure.github.io/awesome-azd/?tags=openai
  3. Python: https://azure.github.io/awesome-azd/?tags=python
  4. AI LLM Architecture - Guidance by MS: https://github.com/microsoft/guidance

Dockerfile and Azure ACR->ACA Easy Robust Deploys from VSCode:

  1. Set up VSCode with the Azure and Remote extensions, and install the Azure CLI locally.
  2. Get access to your Azure subscriptions. From there, in VSCode, expand the subscription to Container Apps.
  3. In Container Apps, create a new app and pick the Dockerfile to deploy. Azure builds the image, pushes it to Azure Container Registry (ACR), and then spins it up in Azure Container Apps (ACA).

Dockerfile for Streamlit and Dockerfile for FastAPI:

Show two examples.
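
As a minimal sketch of the two Dockerfiles, the Python helper below holds one template for a Streamlit app and one for a FastAPI app and writes either into an app folder. Several details are assumptions rather than anything stated here: the entry-point files `app.py` (Streamlit) and `main.py` exposing an object named `app` (FastAPI), a `requirements.txt` next to them, the `python:3.10-slim` base image, and ports 8501 and 8000. The function name `write_dockerfile` is hypothetical.

```python
from pathlib import Path

# Assumed layout (not from the original): each app folder has a
# requirements.txt plus app.py (Streamlit) or main.py (FastAPI, exposing `app`).
STREAMLIT_DOCKERFILE = """\
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
"""

FASTAPI_DOCKERFILE = """\
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
"""


def write_dockerfile(app_dir: str, content: str) -> Path:
    """Write `content` to <app_dir>/Dockerfile and return the path."""
    target = Path(app_dir) / "Dockerfile"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

For example, `write_dockerfile("streamlit_app", STREAMLIT_DOCKERFILE)` creates `streamlit_app/Dockerfile`, which can then be picked in the Container Apps step described earlier. Both templates bind to `0.0.0.0` so the app is reachable from outside the container.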

Example Starter Prompts for AIPP:

Write a streamlit program that demonstrates Data synthesis. Synthesize data from multiple sources to create new datasets. Use two datasets and demonstrate pandas dataframe query merge and join with two datasets in python list dictionaries: List of Hospitals that are over 1000 bed count by city and state, and State population size and square miles. Perform a calculated function on the merged dataset.
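
For orientation, here is a rough sketch of the kind of program this prompt might produce; a real AI pair programming session will return something different. All records are invented placeholders (made-up hospital names, two-letter codes, and numbers), not real hospital or census figures. The function name `synthesize` and the derived columns `beds_per_100k` and `people_per_sq_mile` are hypothetical choices for the "calculated function" the prompt asks for.

```python
import pandas as pd

try:
    import streamlit as st
except ImportError:  # keeps the data logic usable without Streamlit installed
    st = None

# Placeholder records only: names and numbers are invented for illustration
# and are NOT real hospital or census figures.
HOSPITALS = [
    {"hospital": "Hospital A", "city": "City 1", "state": "AA", "beds": 1500},
    {"hospital": "Hospital B", "city": "City 2", "state": "AA", "beds": 1200},
    {"hospital": "Hospital C", "city": "City 3", "state": "BB", "beds": 1100},
    {"hospital": "Hospital D", "city": "City 4", "state": "BB", "beds": 800},
]
STATES = [
    {"state": "AA", "population": 10_000_000, "square_miles": 50_000},
    {"state": "BB", "population": 2_000_000, "square_miles": 100_000},
]


def synthesize(hospitals, states, min_beds=1000):
    """Filter large hospitals, join state facts, and add calculated columns."""
    # query: keep only hospitals above the bed-count threshold
    large = pd.DataFrame(hospitals).query("beds > @min_beds")
    # merge: left join on the shared "state" column
    merged = large.merge(pd.DataFrame(states), on="state", how="left")
    # calculated columns on the merged dataset
    merged["beds_per_100k"] = merged["beds"] / merged["population"] * 100_000
    merged["people_per_sq_mile"] = merged["population"] / merged["square_miles"]
    return merged


if st is not None:
    st.title("Data synthesis: hospitals and states")
    min_beds = st.slider("Minimum bed count", min_value=0, max_value=2000,
                         value=1000, step=100)
    st.dataframe(synthesize(HOSPITALS, STATES, min_beds))
```

Saved as `app.py`, this would be started with `streamlit run app.py`. Keeping the merge logic in a plain function makes it easy to check outside the UI.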

Comparison of Large Language Models

Model Name               Model Size (in Parameters)
BigScience-tr11-176B     176 billion
GPT-3                    175 billion
OpenAI's DALL-E 2        3.5 billion
NVIDIA's Megatron        8.3 billion
Transformer-XL           250 million
XLNet                    340 million (large)
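
To make these sizes more concrete, the sketch below converts a parameter count into the approximate memory needed just to hold the weights. It assumes 2 bytes per parameter (16-bit weights) by default and ignores activations, optimizer state, and caches, so treat the results as a lower bound. The function name `weight_memory_gb` is hypothetical.

```python
# Parameter counts copied from the table above.
PARAMS = {
    "BigScience-tr11-176B": 176e9,
    "GPT-3": 175e9,
    "NVIDIA's Megatron": 8.3e9,
}


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 assumes 16-bit weights (fp16/bf16); use 4 for fp32.
    Activations, optimizer state, and caches are not included.
    """
    return num_params * bytes_per_param / 1e9


for name, n in PARAMS.items():
    print(f"{name}: ~{weight_memory_gb(n):.1f} GB in 16-bit")
```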

ChatGPT Datasets 📚

  • WebText
  • Common Crawl
  • BooksCorpus
  • English Wikipedia
  • Toronto Books Corpus
  • OpenWebText

ChatGPT Datasets - Details 📚

Big Science Model 🚀

Datasets:

    • Universal Dependencies: A collection of annotated corpora (treebanks) for natural language processing in a range of languages, with a focus on dependency parsing.
    • WMT 2014: The ninth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
    • The Pile: An English-language corpus of diverse text, assembled by EleutherAI from 22 smaller datasets.
    • HumanEval: A benchmark of hand-written Python programming problems with unit tests, used to evaluate code generation.
    • FLORES-101: An evaluation benchmark of parallel sentences in 101 languages, designed for multilingual machine translation.
    • CrowS-Pairs: A dataset of sentence pairs (one more stereotyping, one less), designed for measuring social bias in language models.
    • WikiLingua: A multilingual abstractive summarization dataset of article and summary pairs in 18 languages, sourced from WikiHow.
    • MTEB: The Massive Text Embedding Benchmark, a collection of tasks such as classification, clustering, and retrieval for evaluating text embedding models.
    • xP3: The Crosslingual Public Pool of Prompts, a collection of prompted datasets covering 46 languages, used for multitask fine-tuning.
    • DiaBLa: A dataset of English-French bilingual dialogues, built for evaluating machine translation of informal written conversation.

Deep RL ML Strategy 🧠

The AI strategies are:

  • Language Model Preparation using Human Augmented with Supervised Fine Tuning 🤖
  • Reward Model Training with Prompts Dataset Multi-Model Generate Data to Rank 🎁
  • Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score 🎯
  • Proximal Policy Optimization Fine Tuning 🤝
  • Variations - Preference Model Pretraining 🤔
  • Use Ranking Datasets Sentiment - Thumbs Up/Down, Distribution 📊
  • Online Version Getting Feedback 💬
  • OpenAI - InstructGPT - Humans generate LM Training Text 🔍
  • DeepMind - Advantage Actor Critic Sparrow, GopherCite 🦜
  • Reward Model Human Preference Feedback 🏆
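
Two of the strategies above, reward model training on ranked outputs and Proximal Policy Optimization, each come down to a short formula. The sketch below uses the commonly published forms: a pairwise ranking loss, -log(sigmoid(r_chosen - r_rejected)), and the PPO clipped surrogate objective. It is a single-sample illustration in plain Python, not the training code of any system named in the list. The function names and the default `clip_eps=0.2` are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a reward difference to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Ranking loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the preferred answer higher.
    """
    return -math.log(sigmoid(reward_chosen - reward_rejected))


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (a quantity to maximize).

    ratio = pi_new(action) / pi_old(action); clip_eps=0.2 is an assumed default.
    Clipping removes the incentive to move the policy far from the old one.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

In practice both quantities are averaged over batches and optimized with automatic differentiation, and language model fine-tuning usually adds a penalty for drifting from the supervised model, which is omitted here.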

For more information on specific techniques and implementations, check out the following resources:

  • OpenAI's GPT-3 paper, which details its language model preparation approach
  • DeepMind's A3C paper ("Asynchronous Methods for Deep Reinforcement Learning"), which describes the Advantage Actor-Critic algorithm
  • OpenAI's paper on reward learning, which explains its approach to training reward models
  • OpenAI's blog post on GPT-3's fine-tuning process
