Gemini Pro 1.5 has a 1,000,000 token context size. This is huge—previously that record was held by Claude 2.1 (200,000 tokens) and gpt-4-turbo (128,000 tokens)—though the difference in tokenizer implementations between the models means this isn’t a perfectly direct comparison.
I’ve been playing with Gemini Pro 1.5 for a few days, and I think the most exciting feature isn’t so much the token count… it’s the ability to use video as an input.
The ability to extract structured content from text is already one of the most exciting use-cases for LLMs. GPT-4 Vision and LLaVA expanded that to images. And now Gemini Pro 1.5 expands that to video.
The ability to analyze video like this feels SO powerful. Being able to take a 20 second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.
Simon Willison: The killer app of Gemini Pro 1.5 is video
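The bookshelf experiment maps onto a fairly small amount of code. Here is a minimal sketch, assuming the google-generativeai Python SDK with File API support for video uploads; the file name, prompt wording, model id, and polling loop are illustrative assumptions, not details taken from the post.

```python
# Sketch: ask Gemini Pro 1.5 for a JSON array of books seen in a short video.
# Assumes the google-generativeai SDK; file name and prompt are placeholders.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key passed directly

# Upload the bookshelf video, then wait for server-side processing to finish.
video = genai.upload_file(path="bookshelf.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    video,
    "Return a JSON array of the books visible in this video, "
    "as objects with 'title' and 'author' keys.",
])
print(response.text)  # ideally a JSON array of {title, author} objects
```

In practice the output may still need validation (the model can return prose around the JSON), but the structure of the call is the same as for image or text prompts: the video file simply becomes another part of the prompt list.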
…there is a new version of Google’s Gemini (right after the release of the previous one!), which has a context window of over a million tokens. The context window is the information that the AI can have in memory at one time, and most chatbots have been frustratingly limited, holding a couple dozen pages, at most. This is why it is very hard to use ChatGPT to write long programs or documents; it starts to forget the start of the project as its context window fills up. But now Gemini 1.5 can hold something like 750,000 words in memory, with near-perfect recall. I fed it all my published academic work prior to 2022 — over 1,000 pages of PDFs spread across 20 papers and books — and Gemini was able to summarize the themes in my work and quote accurately from among the papers.
Ethan Mollick: Strategies for an Accelerating Future
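Mollick's million-token figure invites a practical question: will a given pile of documents actually fit? A rough sketch of that check is below, assuming the google-generativeai SDK and pypdf for text extraction; the file names and the 1,000,000-token limit used for the comparison are illustrative assumptions, not taken from the piece.

```python
# Sketch: check whether a set of papers fits in Gemini 1.5's context window
# before asking for a cross-document summary. File names are placeholders.
import google.generativeai as genai
from pypdf import PdfReader

genai.configure(api_key="YOUR_API_KEY")  # assumption: key passed directly
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Concatenate the extracted text of several papers (hypothetical filenames).
corpus = ""
for path in ["paper1.pdf", "paper2.pdf"]:
    reader = PdfReader(path)
    corpus += "\n\n".join(page.extract_text() or "" for page in reader.pages)

token_count = model.count_tokens(corpus).total_tokens
print(f"{token_count:,} tokens")

if token_count < 1_000_000:  # assumption: nominal 1M-token window
    response = model.generate_content(
        ["Summarize the recurring themes across these papers:", corpus]
    )
    print(response.text)
```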