Data Machina #228

one year ago 187

GPT-4 Vision Experiments. LLama Rephraser 13B beats GPT-4. Lyria SOTA Music GenAI. GPU AI Survival Kit. SQLCoder. ML Optimisation. AI Alignment Handbook.

GPT-4 Vision Experiments. Just reading in the news how the OpenAI Shakespearean drama is unfolding� Oh well.. I�ve been doing a bit of research on GPT-4 Vision capabilities recently. I�m particularly interested in the business applications of GPT-4V. Yes, its early days, and many use cases may seem simple or basic, but the possibilities are impressive and endless. Lots of potential! Let me share a few notes on GPT-4V.

Reviewing GPT-4 Vision capabilities. This is a great guide in which James & Piotr @Roboflow share their first impressions on GPT-4 image input feature and vision API. They run through a series of experiments to test the functionality of GPT-4V, showing where the model performs well and where it struggles. First impressions on GPT-4V(ision.)

Building interactive web apps with GPT-4 Vision. Nice article, in which Charly walks you through 8 practical use cases that exemplify new possibilities using GPT-4 with Vision. The article shows how to use GPT-4V to: build Streamlit apps from sketches and static images, refine your apps' user experience (including debugging and documentation,) and overcome LLMs limitations and hallucinations. Blogpost: 7 ways GPT-4 with Vision can uplevel your Streamlit apps.

Screenshot-to-code with GPT-4 Vision. Another use case with lots of potential. The idea is to show an image capture to GPT-4V and then generate code from that. Here is a simple app that converts a screenshot to HTML/Tailwind CSS code. It uses GPT-4 Vision to generate the code, and DALL-E 3 to generate similar looking images. Checkout repo, demos, examples here: screenshot-to-code

Understand, review images with GPT-4V and provide commentary. This can be applied to many use cases. Here is an example: An art co-pilot. It uses OpenAI Vision API to create a personal art critic that provides commentary narrated by �David Attenborough.� Checkout repo, demo, examples: ruskin. An Art Copilot.

Live webcam apps with GPT-4. I�ve seen many of these recently. Here is an example: WebcamGPT-Vision, a lightweight web app that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The app captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results. See demo, repo: WebcamGPT-Vision.

And this one is very funny: a live Jim Carrey webcam narrator

GPT-4 for insurance adjustment. I really like this use case, application. This is a fantastic complete tutorial on how to use GPT-4 for insurance adjustments. After e.g. a fire, insurance adjusters have to process the damage and evaluate claims. The tutorial walks through how to build an image analysis pipeline and AI copilot for insurance that provides insights from images with GPT-4 Vision model. Blogpost: Multimodal RAG: Using Graphlit, OpenAI GPT-4 Vision for Insurance Adjustment.

Exploring GPT-4 Vision for self-driving. This is pretty interesting. A big team of Chinese researchers, recently published a paper in which they explore GPT-4V capabilities. The researchers share the original test images and in-depth results demonstrating the model's capabilities in understanding complex driving scenes and making decisions like a seasoned driver. Checkout paper, examples and repo: On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.

OpenAI Vision API Experiments. A must-have resource for anyone who wants to experiment with and build on the OpenAI vision API. Repo: awesome openai vision api experiments.

Free live, deep dive into GPT-4 Vision. This seems like an interesting virtual event which promises a live, hands-on, coding sharing session. How to Chat with Images Data Using New GPT-4 Vision API.

Have a nice week.

Subscribe now