Data Machina #228

12 months ago 159

GPT-4 Vision Experiments. LLama Rephraser 13B beats GPT-4. Lyria SOTA Music GenAI. GPU AI Survival Kit. SQLCoder. ML Optimisation. AI Alignment Handbook.

GPT-4 Vision Experiments. Just reading in the news how the OpenAI Shakespearean drama is unfolding… Oh well.. I’ve been doing a bit of research on GPT-4 Vision capabilities recently. I’m particularly interested in the business applications of GPT-4V. Yes, its early days, and many use cases may seem simple or basic, but the possibilities are impressive and endless. Lots of potential! Let me share a few notes on GPT-4V.

Reviewing GPT-4 Vision capabilities. This is a great guide in which James & Piotr @Roboflow share their first impressions on GPT-4 image input feature and vision API. They run through a series of experiments to test the functionality of GPT-4V, showing where the model performs well and where it struggles. First impressions on GPT-4V(ision.)

Building interactive web apps with GPT-4 Vision. Nice article, in which Charly walks you through 8 practical use cases that exemplify new possibilities using GPT-4 with Vision. The article shows how to use GPT-4V to: build Streamlit apps from sketches and static images, refine your apps' user experience (including debugging and documentation,) and overcome LLMs limitations and hallucinations. Blogpost: 7 ways GPT-4 with Vision can uplevel your Streamlit apps.

Screenshot-to-code with GPT-4 Vision. Another use case with lots of potential. The idea is to show an image capture to GPT-4V and then generate code from that. Here is a simple app that converts a screenshot to HTML/Tailwind CSS code. It uses GPT-4 Vision to generate the code, and DALL-E 3 to generate similar looking images. Checkout repo, demos, examples here: screenshot-to-code

Understand, review images with GPT-4V and provide commentary. This can be applied to many use cases. Here is an example: An art co-pilot. It uses OpenAI Vision API to create a personal art critic that provides commentary narrated by “David Attenborough.” Checkout repo, demo, examples: ruskin. An Art Copilot.

Live webcam apps with GPT-4. I’ve seen many of these recently. Here is an example: WebcamGPT-Vision, a lightweight web app that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The app captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results. See demo, repo: WebcamGPT-Vision.

And this one is very funny: a live Jim Carrey webcam narrator

GPT-4 for insurance adjustment. I really like this use case, application. This is a fantastic complete tutorial on how to use GPT-4 for insurance adjustments. After e.g. a fire, insurance adjusters have to process the damage and evaluate claims. The tutorial walks through how to build an image analysis pipeline and AI copilot for insurance that provides insights from images with GPT-4 Vision model. Blogpost: Multimodal RAG: Using Graphlit, OpenAI GPT-4 Vision for Insurance Adjustment.

Exploring GPT-4 Vision for self-driving. This is pretty interesting. A big team of Chinese researchers, recently published a paper in which they explore GPT-4V capabilities. The researchers share the original test images and in-depth results demonstrating the model's capabilities in understanding complex driving scenes and making decisions like a seasoned driver. Checkout paper, examples and repo: On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.

OpenAI Vision API Experiments. A must-have resource for anyone who wants to experiment with and build on the OpenAI vision API. Repo: awesome openai vision api experiments.

Free live, deep dive into GPT-4 Vision. This seems like an interesting virtual event which promises a live, hands-on, coding sharing session. How to Chat with Images Data Using New GPT-4 Vision API.

Have a nice week.

Subscribe now

10 Link-o-Troned

Catch Me If You Can! How to Beat GPT-4 with a 13B Model

Four Kinds of Optimisation

My Failed AI Girlfriend Product, and My Lessons

GPU AI Survival Toolkit: The Bare Minimum for Developers

Building an AI Video Editor Prototype in 100 Days

How to Train Neural Networks Effectively

DeepMind + Youtube - Lyria SOTA GenAI Music

[free course] Algorithmic Aspects of ML

Are Your NLP models Deteriorating Post-deployment?

Hugging Face AI Alignment Handbook


Share Data Machina with your friends


the ML Pythonista

Pandas2 and Polars for Feature Engineering

SQLCoder - SOTA Natural Language to SQL

Meta AI - How to Build a Llama-enabled WhatsApp Chatbot

Deep & Other Learning Bits

Trends in Deep Learning Hardware: Bill Dally @NVIDIA

[free course] MIT HAN LAB - TinyML & Efficient Deep Learning

60 Implementations of DL Papers with Side-by-side Notes

AI/ DL ResearchDocs

A Survey on Language Models for Code

Chain-of-Note: Enhancing Robustness in Retrieval-Augmentation

Deepmind: Transformers Can’t Generalise Beyond Their Training Data

data v-i-s-i-o-n-s

The Multi-Chord Diagram: Visualizing Complex Set Relationships

Twenty Ways to visualise Percentages

Interactive Visualisation of Shortest Path in Major Cities

MLOps Untangled

The MLOps Workbook

Scaling and Evolving MLOps @Doordash

LoRAX Opensource: Serve 100s of Fine-Tuned LLMs in Prod

AI startups -> radar

Kyutai- Open Sience AI Lab

Zelig - AI for Fashion Virtual Try-ons

Retorio- An AI Coach for Behavioural Intelligence

ML Datasets & Stuff

Universal Named Entity Recognition (UNER) Dataset

Song Describer Dataset - 1.1K Captions for Music Generation

FinanceBench - Financial Question Answering (QA) Pairs

Postscript, etc

Enjoyed this post? Tell your friends about Data Machina. Thanks for reading.

Share

Tips? Suggestions? Feedback? email Carlos

Curated by @ds_ldn in the middle of the night.


View Entire Post

Read Entire Article