GPT-4 Vision Experiments. LLama Rephraser 13B beats GPT-4. Lyria SOTA Music GenAI. GPU AI Survival Kit. SQLCoder. ML Optimisation. AI Alignment Handbook.
GPT-4 Vision Experiments. Just reading in the news how the OpenAI Shakespearean drama is unfolding… Oh well.. I’ve been doing a bit of research on GPT-4 Vision capabilities recently. I’m particularly interested in the business applications of GPT-4V. Yes, its early days, and many use cases may seem simple or basic, but the possibilities are impressive and endless. Lots of potential! Let me share a few notes on GPT-4V.
Reviewing GPT-4 Vision capabilities. This is a great guide in which James & Piotr @Roboflow share their first impressions on GPT-4 image input feature and vision API. They run through a series of experiments to test the functionality of GPT-4V, showing where the model performs well and where it struggles. First impressions on GPT-4V(ision.)
Building interactive web apps with GPT-4 Vision. Nice article, in which Charly walks you through 8 practical use cases that exemplify new possibilities using GPT-4 with Vision. The article shows how to use GPT-4V to: build Streamlit apps from sketches and static images, refine your apps' user experience (including debugging and documentation,) and overcome LLMs limitations and hallucinations. Blogpost: 7 ways GPT-4 with Vision can uplevel your Streamlit apps.
Screenshot-to-code with GPT-4 Vision. Another use case with lots of potential. The idea is to show an image capture to GPT-4V and then generate code from that. Here is a simple app that converts a screenshot to HTML/Tailwind CSS code. It uses GPT-4 Vision to generate the code, and DALL-E 3 to generate similar looking images. Checkout repo, demos, examples here: screenshot-to-code
Understand, review images with GPT-4V and provide commentary. This can be applied to many use cases. Here is an example: An art co-pilot. It uses OpenAI Vision API to create a personal art critic that provides commentary narrated by “David Attenborough.” Checkout repo, demo, examples: ruskin. An Art Copilot.
Live webcam apps with GPT-4. I’ve seen many of these recently. Here is an example: WebcamGPT-Vision, a lightweight web app that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The app captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results. See demo, repo: WebcamGPT-Vision.
And this one is very funny: a live Jim Carrey webcam narrator
GPT-4 for insurance adjustment. I really like this use case, application. This is a fantastic complete tutorial on how to use GPT-4 for insurance adjustments. After e.g. a fire, insurance adjusters have to process the damage and evaluate claims. The tutorial walks through how to build an image analysis pipeline and AI copilot for insurance that provides insights from images with GPT-4 Vision model. Blogpost: Multimodal RAG: Using Graphlit, OpenAI GPT-4 Vision for Insurance Adjustment.
Exploring GPT-4 Vision for self-driving. This is pretty interesting. A big team of Chinese researchers, recently published a paper in which they explore GPT-4V capabilities. The researchers share the original test images and in-depth results demonstrating the model's capabilities in understanding complex driving scenes and making decisions like a seasoned driver. Checkout paper, examples and repo: On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
OpenAI Vision API Experiments. A must-have resource for anyone who wants to experiment with and build on the OpenAI vision API. Repo: awesome openai vision api experiments.
Free live, deep dive into GPT-4 Vision. This seems like an interesting virtual event which promises a live, hands-on, coding sharing session. How to Chat with Images Data Using New GPT-4 Vision API.
Have a nice week.
10 Link-o-Troned
Catch Me If You Can! How to Beat GPT-4 with a 13B Model
My Failed AI Girlfriend Product, and My Lessons
GPU AI Survival Toolkit: The Bare Minimum for Developers
Building an AI Video Editor Prototype in 100 Days
How to Train Neural Networks Effectively
DeepMind + Youtube - Lyria SOTA GenAI Music
[free course] Algorithmic Aspects of ML
Are Your NLP models Deteriorating Post-deployment?
Hugging Face AI Alignment Handbook
the ML Pythonista
Pandas2 and Polars for Feature Engineering
SQLCoder - SOTA Natural Language to SQL
Meta AI - How to Build a Llama-enabled WhatsApp Chatbot
Deep & Other Learning Bits
Trends in Deep Learning Hardware: Bill Dally @NVIDIA
[free course] MIT HAN LAB - TinyML & Efficient Deep Learning
60 Implementations of DL Papers with Side-by-side Notes
AI/ DL ResearchDocs
A Survey on Language Models for Code
Chain-of-Note: Enhancing Robustness in Retrieval-Augmentation
Deepmind: Transformers Can’t Generalise Beyond Their Training Data
data v-i-s-i-o-n-s
The Multi-Chord Diagram: Visualizing Complex Set Relationships
Twenty Ways to visualise Percentages
Interactive Visualisation of Shortest Path in Major Cities
MLOps Untangled
Scaling and Evolving MLOps @Doordash
LoRAX Opensource: Serve 100s of Fine-Tuned LLMs in Prod
AI startups -> radar
Zelig - AI for Fashion Virtual Try-ons
Retorio- An AI Coach for Behavioural Intelligence
ML Datasets & Stuff
Universal Named Entity Recognition (UNER) Dataset
Song Describer Dataset - 1.1K Captions for Music Generation
FinanceBench - Financial Question Answering (QA) Pairs
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.