How to create a YouTube video chapters' timings generator using Gemini over Vertex AI


While watching a long YouTube video, have you ever wished it had relevant chapters? Especially for long podcasts or talks, you want to jump to the part that is most important and relevant to you, but instead you have to drag the progress bar back and forth to find it. In this blog post, you will create a YouTube video chapter timings generator using Google’s Gemini LLM over Vertex AI. Let’s get going!

YouTube video chapters a huge time-saver #

It is no secret that YouTube video chapters are a huge time-saver for your audience. Be it a long 2-hour podcast or a 40-minute talk, if the video has relevant chapters your audience can jump to the parts that are most meaningful for them.

Creating chapter data manually is a tedious and time-consuming task. You need to sit down with something to take notes on, be it pen and paper or your mobile phone’s notes app. Then you play the video and move the progress bar back and forth to note down the timings and list them as chapters.

What if there was an automated and easier way to do this? Why not use an LLM like Gemini, which now has a 2-million-token context window? In February, the 1M-token context window of Gemini 1.5 Pro could process 1 hour of video, so with a 2M-token context window it should be able to process 2 hours of content. Even Gemini 1.5 Flash has a 1M-token context window. So, technically, your YouTube video could be 2 hours long and Gemini 1.5 Pro with the 2M-token context window should be able to generate chapter timings for it.

Prerequisites #

To begin, you will need to have the following prerequisites sorted:

- Have a working Google Cloud account (with some credit, because processing long videos uses GCP credit)
- Have at least one video on your YouTube account that you want to generate chapter data for

It is also wise to be aware of Vertex AI Pricing for the models on offer. Next, you will create a GCP project to build your YouTube video chapter timings generator.
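Before setting up the project, the context-window reasoning above can be turned into a quick back-of-the-envelope check. The sketch below is only an estimate: it assumes the rate quoted in this post (about 1M tokens per hour of video) holds for your video, it ignores the tokens used by the prompt itself, and the dictionary keys are just labels for the two models discussed here.

```python
# Rough estimate only: the tokens-per-hour figure comes from this post's
# statement that a 1M-token context window handles about 1 hour of video.
TOKENS_PER_HOUR_OF_VIDEO = 1_000_000  # assumption derived from the post

# Context window sizes mentioned in the post, in tokens.
CONTEXT_WINDOWS = {
    "gemini-1.5-flash": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
}


def estimated_video_tokens(duration_minutes: float) -> int:
    """Estimate how many tokens a video of the given length would use."""
    return round(duration_minutes / 60 * TOKENS_PER_HOUR_OF_VIDEO)


def models_that_fit(duration_minutes: float) -> list[str]:
    """Return the models whose context window covers the estimated tokens."""
    needed = estimated_video_tokens(duration_minutes)
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


# A 40-minute talk and a 2-hour podcast, the two examples used in the post.
for minutes in (40, 120):
    print(minutes, estimated_video_tokens(minutes), models_that_fit(minutes))
```

With these assumptions, the 40-minute talk fits either model, while the 2-hour podcast only fits the 2M-token window.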
You need 10-20 minutes to create the basic version and have a working proof of concept by following the steps below.

Vertex AI on your GCP Project #

Below are the steps to create a new GCP project (if you have an existing one, you can use that too):

1. Go to your Google Cloud Console and create a new project called yt-chapters, or anything relevant.
2. Make sure you have selected the project created in Step 1 if you have multiple projects. You can do this by clicking on the project name in the top left corner of your Google Cloud Console.
3. Go to Vertex AI from your Google Cloud Console. The easiest way to do it is to search for vertex in the search bar.
4. Click on Vertex AI.
5. On the Vertex AI page, click “Agree & Continue”.
6. After that, on the Vertex AI page, click on “Enable All Recommended APIs” (it will take some time).
7. After the APIs are enabled, click on Freeform in the left menu.

Crafting the prompt #

Now you can add a prompt in the Prompt text box that will generate the YouTube video chapter timings:

Please provide relevant chapter information to put on YouTube description for this video with timings for the start of the chapter and do not add any formatting. If you are not sure about any info, please do not make it up. Give the output with the timings first and the chapter name after that.

You can edit the prompt to suit your needs. Then turn off the Markdown slider and keep your mouse cursor at the start of the prompt.
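Because the prompt asks for the timing first and the chapter name after it, with no formatting, each line of the response should be easy to read programmatically. Here is a minimal sketch of such a parser. The exact shape of Gemini's output is not guaranteed, so the accepted line format (a timestamp, an optional separator, then the title) is an assumption, and the sample text is made up for illustration.

```python
import re

# Matches lines such as "0:00 Introduction" or "1:02:45 - Q&A":
# a timestamp (M:SS, MM:SS or H:MM:SS), an optional separator, then the title.
CHAPTER_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s*[-–:]?\s*(.+?)\s*$")


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_chapters(response_text: str) -> list[tuple[int, str]]:
    """Return (start_seconds, title) pairs for every line that looks like a chapter."""
    chapters = []
    for line in response_text.splitlines():
        match = CHAPTER_LINE.match(line)
        if match:  # lines without a leading timestamp are skipped
            chapters.append((to_seconds(match.group(1)), match.group(2)))
    return chapters


# Hypothetical response text, made up for illustration only.
sample = """0:00 Introduction
2:15 Setting up the project
1:02:45 - Questions and answers"""

print(parse_chapters(sample))
```

Having the chapters as (seconds, title) pairs makes it easier to spot timings that are out of order or beyond the length of the video before you paste them anywhere.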
You can only use your own YouTube videos; otherwise you get the error “Must enter a YouTube video that you own”. To link the YouTube video, click on Insert Media and select the YouTube video URL option. After that, paste the URL of a video that is on your channel and click the Validate button. Once the ownership is verified, you can Insert the video. Your YouTube video is now included in the prompt.

Saving the prompt #

At this point you can save your prompt for further use. To save it, click on Save in the top right corner and give it a name like yt-timings. You can use the saved prompt from the Saved Prompts tab on the left menu later.

Configuring the model and parameters #

Let’s look into the configuration a bit. First, select gemini-1.5-flash-001 as the model. If you want the chapter timings to be less random, you can set the temperature to 0.1. It is also worth playing around with Top-p in the Advanced config to fine-tune your result. I used 0.5 and it worked well for me. You can learn more about configuration options like temperature, top-K, etc. in this blog post about Product description generator.

Generating and saving chapter information #

I am not covering safety settings in this tutorial. Now you can click on the > button to see what chapter timings Gemini Flash generates for your video. Keep in mind that, depending on the length of the video, it will take time and also use up your Google Cloud credit. It took a minute or more for me to get the response.

Now you can copy the response and paste it into your YouTube video description in YouTube Studio. Save it and your viewers can benefit from the “generated” chapter information. This is the video I used for this tutorial, and I did edit the chapter information a bit to make it better.

You can delete the video from the prompt, link another YouTube video, and then generate the chapter information for that one too. Amazing!
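For readers who prefer code to the console, here is a minimal sketch of the same request using the Vertex AI Python SDK, with the prompt, model, temperature, and Top-p from this post. It assumes the google-cloud-aiplatform package is installed and authenticated. The names project_id, location, and video_url are placeholders you supply, and passing a YouTube URL through Part.from_uri with a video/* MIME type is an assumption to check against the current Vertex AI documentation.

```python
# Prompt and settings copied from the steps in this post.
PROMPT = (
    "Please provide relevant chapter information to put on YouTube description "
    "for this video with timings for the start of the chapter and do not add any "
    "formatting. If you are not sure about any info, please do not make it up. "
    "Give the output with the timings first and the chapter name after that."
)
MODEL_NAME = "gemini-1.5-flash-001"


def build_generation_config(temperature: float = 0.1, top_p: float = 0.5) -> dict:
    """Return the sampling settings used in this post, after a basic range check."""
    if temperature < 0.0:
        raise ValueError("temperature must not be negative")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    return {"temperature": temperature, "top_p": top_p}


def generate_chapters(project_id: str, location: str, video_url: str) -> str:
    """Send the video and the prompt to Gemini on Vertex AI and return the raw text."""
    # Imported here so the helpers above work without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(MODEL_NAME)
    # Assumption: the YouTube URL is passed as a file URI with a video MIME type.
    video = Part.from_uri(video_url, mime_type="video/*")
    response = model.generate_content(
        [video, PROMPT],
        generation_config=build_generation_config(),
    )
    return response.text
```

Calling generate_chapters with your own project ID, region, and video URL should return the same kind of text you see in the console, and it uses Google Cloud credit in the same way.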
There you have a fully reusable YouTube video chapter timings generator that works with your own YouTube videos. All the experiments I did to generate video chapter timings using Gemini Flash over Vertex AI cost me 35 cents. Google Cloud credits are provided for this project and blog post, thanks to Google for that; it is part of the #AISprint.

As LLMs give probabilistic rather than deterministic output, the output is a good starting point. As a human, you will need to cross-check and edit the output to fit your needs. The same applies to the chapter information generated by Gemini. I used it for a podcast video and, interestingly enough, it can generate chapter information even when there are no visual cues in the video. So it can use the transcribed captions too for generating the chapter information. This was the podcast video I used to generate chapters.

You can also click Get Code and play around with it. You can look at an example of generating and running the code in this example with Gemini and Vertex AI.

Conclusion #

In this blog post, you created a YouTube video chapter timings generator using Gemini Flash over Vertex AI. If your videos are longer, you can use Gemini Pro, which has a 2-million-token context window but is more expensive than the Flash version. You started by creating a prompt useful for generating video chapter timings and used it to generate chapters for one video.

Take this as scratching the surface. As the LLM has the context of the video’s content, you can also craft a prompt to generate a compelling and SEO-optimized video description. Use AI as a tool to simplify your day-to-day tasks. Keep exploring and learning!

