The killer app of Gemini Pro 1.5 is video

Gemini Pro 1.5 has a 1,000,000 token context size. This is huge—previously that record was held by Claude 2.1 (200,000 tokens) and gpt-4-turbo (128,000 tokens)—though the difference in tokenizer implementations between the models means this isn’t a perfectly direct comparison.
I’ve been playing with Gemini Pro 1.5 for a few days, and I think the most exciting feature isn’t so much the token count… it’s the ability to use video as an input.
The ability to extract structured content from text is already one of the most exciting use-cases for LLMs. GPT-4 Vision and LLaVA expanded that to images. And now Gemini Pro 1.5 expands that to video.
The ability to analyze video like this feels SO powerful. Being able to take a 20 second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.

Simon Willison: The killer app of Gemini Pro 1.5 is video

news/ai/the_killer_app_of_gemini_pro_1.5_is_video.txt · Last modified: 2024/02/21 22:15 by lmuszkie