Gemini (language model) facts for kids
| Developer(s) | Google DeepMind, Google AI |
| --- | --- |
| Initial release | December 6, 2023 |
| Predecessor | Google Assistant, PaLM |
| Available in | English |
| Type | Large language model |
| License | Proprietary |
Google Gemini is a group of very smart computer programs called large language models. These models were created by Google DeepMind. They are special because they are multimodal, meaning they can understand and work with many types of information. This includes text, pictures, sounds, and even videos.
Gemini is the next step after Google's earlier AI models, LaMDA and PaLM 2. It was first announced on December 6, 2023. Gemini is designed to compete with other advanced AI models, like GPT-4 from OpenAI. It also powers a chatbot of the same name, Gemini.
How Gemini Was Created
Building the AI
Google first talked about Gemini at an event called Google I/O on May 10, 2023. It was introduced as a much stronger AI than Google's previous model, PaLM 2. Sundar Pichai, the CEO of Google, said that Gemini was still being developed.
What makes Gemini special is that it wasn't just trained on text. It was built to be multimodal. This means it can understand many different kinds of data at the same time. It can process text, images, audio, video, and even computer code.
Gemini was created by a team from Google DeepMind and Google Brain. These two parts of Google had joined together just a month before. Demis Hassabis, the CEO of DeepMind, said that Gemini would be very powerful. He believed it could be even better than OpenAI's ChatGPT. ChatGPT uses an AI model called GPT-4. Hassabis mentioned DeepMind's AlphaGo program, which famously beat a Go champion in 2016. He said Gemini would combine the strengths of AlphaGo with Google's other AI models.
In August 2023, a report said that Google planned to launch Gemini by the end of the year. Google wanted Gemini to be better than other AI models. They hoped to do this by combining chat abilities with AI-powered image creation. This would allow Gemini to create images that fit the conversation. It could also be used for many different things. Sergey Brin, one of Google's founders, even came out of retirement to help develop Gemini. Hundreds of other engineers also worked on it.
Because Gemini was trained using videos from YouTube, lawyers checked the content. They made sure no copyrighted material was used without permission. Other companies, like OpenAI, also sped up work on their own AI models so they could offer similar features.
Gemini's Official Release
On December 6, 2023, Sundar Pichai and Demis Hassabis announced "Gemini 1.0." They held a press conference online. Gemini 1.0 came in three different versions:
- Gemini Ultra: This version is for very difficult tasks.
- Gemini Pro: This version is for many different kinds of tasks.
- Gemini Nano: This version is for tasks done directly on devices, like smartphones.
When it first launched, Gemini Pro was added to Google's Bard chatbot. Gemini Nano was put into the Pixel 8 Pro smartphone. Gemini Ultra was planned for a more advanced version of Bard and for software developers in early 2024. Google also planned to add Gemini to other products. These included Google Search, Google Ads, Google Chrome, and Duet AI in Google Workspace.
At first, Gemini was only available in English. Google said Gemini was their "largest and most capable AI model." They designed it to act like a human. However, they said it would not be widely available until the next year. This was because it needed a lot of safety testing. Gemini was trained using Google's special computer chips called Tensor Processing Units (TPUs). The name "Gemini" refers to the joining of DeepMind and Google Brain. It also refers to NASA's Project Gemini.
Gemini Ultra was said to perform better than other top AI models. These included GPT-4, Claude 2, Inflection-2, LLaMA 2, and Grok-1. Gemini Pro was also said to be better than GPT-3.5. Gemini Ultra was the first AI model to score higher than human experts on a difficult test called MMLU (Massive Multitask Language Understanding). It got a score of 90% on this test, which covers 57 different subjects.
Gemini Pro became available to Google Cloud customers on December 13. Gemini Nano would also be made available to Android developers. Demis Hassabis said that DeepMind is looking into how Gemini could work with robots. This would allow robots to interact with the real world. Google also said they would share their testing results for Gemini Ultra with the U.S. government. They are also talking with the UK government about following AI safety rules.
Recent Updates
In January 2024, Google worked with Samsung. They added Gemini Nano and Gemini Pro to Samsung's Galaxy S24 smartphones. The next month, Google combined Bard and Duet AI under the Gemini brand. A new, more advanced version called "Gemini Advanced with Ultra 1.0" was released. This came with a new "AI Premium" plan for Google One. Gemini Pro also became available around the world.
In February, Google launched "Gemini 1.5" for a small group of users. This version is even more powerful than 1.0 Ultra. It has new technology, including a special way of working called "mixture-of-experts." It also has a much larger "context window." This means it can understand a lot more information at once. For example, it can process about an hour of silent video, 11 hours of audio, 30,000 lines of code, or 700,000 words.
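To get a feel for what such a large context window means, here is a rough back-of-the-envelope calculation in Python. The words-per-token ratio used below is an assumption (English tokens are usually a bit smaller than words), not an official Gemini figure; it only shows why about 700,000 words is in the right ballpark for roughly one million tokens.

```python
# Rough arithmetic behind Gemini 1.5's large context window.
# WORDS_PER_TOKEN is an assumed average for English text, not an
# official number; it only shows why about 700,000 words can fit.

CONTEXT_TOKENS = 1_000_000   # roughly the 1.5 Pro window at launch
WORDS_PER_TOKEN = 0.7        # assumption: a token is a bit less than a word

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
print(f"Roughly {approx_words:,} words can fit in a single request")
# -> Roughly 700,000 words can fit in a single request
```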
In the same month, Google also released Gemma. Gemma is a group of free and open-source AI models. They are lighter versions of Gemini. They come in two sizes, with different numbers of "parameters" (parts of the neural network). Many people saw this as Google's way of responding to other companies that were making their AI models open for everyone to use. This was a big change for Google, as they usually kept their AI technology private.
At the Google I/O 2024 event, Gemini 1.5 Flash was released.
How Gemini Works
The first version of Gemini, called "Gemini 1," has three models. They all share a similar computer design. They are built using a special type of AI structure called a Transformer. This design helps them learn and work efficiently on Google's TPUs. They can remember and understand a lot of information at once, up to 32,768 "tokens." A token can be a word or part of a word.
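To show the idea of "tokens" and a fixed context window, here is a toy Python sketch. The splitting rule below is made up for illustration; Gemini's real tokenizer uses a learned subword vocabulary, which works differently.

```python
# Toy illustration of tokens and a context window (NOT Gemini's real tokenizer).
# Real models use learned subword vocabularies; this just shows that text is
# cut into small pieces and only a fixed number of pieces fit at once.

CONTEXT_WINDOW = 32_768  # token limit of the Gemini 1 models

def toy_tokenize(text: str) -> list[str]:
    """Split text into rough word pieces (a deliberately simple stand-in)."""
    pieces = []
    for word in text.split():
        while len(word) > 4:          # pretend long words get split up
            pieces.append(word[:4])
            word = word[4:]
        pieces.append(word)
    return pieces

tokens = toy_tokenize("Gemini can understand text, pictures, sounds and video.")
print(tokens)                          # the pieces the model would see
print(len(tokens) <= CONTEXT_WINDOW)   # True: this prompt easily fits
```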
Two versions of Gemini Nano, Nano-1 and Nano-2, are smaller versions of the larger Gemini models. They are designed to work on smaller devices like smartphones. Since Gemini is multimodal, it can take in many different types of information at the same time. This information can be mixed together in any order. For example, you could start a conversation with text, then add a picture, then a video, and then some audio. Gemini can then respond using any of these types of information too.
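As a concrete example of mixing input types, here is a minimal sketch that sends text and a picture together to a Gemini model through Google's public google-generativeai Python library. The API key, image file, and model name are placeholders, and this is only one simple way to use the API.

```python
# Minimal sketch: a mixed text + image request to a Gemini model.
# Assumes the `google-generativeai` package is installed; "YOUR_API_KEY",
# "photo.jpg", and the model name are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # example model name

picture = Image.open("photo.jpg")                  # any local image
response = model.generate_content(
    ["Describe what is happening in this picture.", picture]
)
print(response.text)  # the model's text answer about the image
```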
Images can be of different sizes. Videos are processed as a series of images. Audio is turned into tokens by a special model called the Universal Speech Model. Gemini learned from a huge amount of data that included text, books, code, images, audio, and video. This data was also in many different languages.
Demis Hassabis said that training Gemini 1 used about the same amount of computer power as GPT-4.
The second version of Gemini, "Gemini 1.5," has two models released so far:
- Gemini 1.5 Pro: This is a multimodal model that uses a "sparse mixture-of-experts" approach; a simplified sketch of this idea appears after this list. It can understand millions of tokens of information at once.
- Gemini 1.5 Flash: This model is a lighter version of Gemini 1.5 Pro. It can also understand over 2 million tokens of information.
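Below is a very simplified sketch of the "sparse mixture-of-experts" idea mentioned above. The experts and router here are made-up toy functions; Google has not published Gemini 1.5's actual architecture in this detail, so this only illustrates the general concept of sending each input to a few experts instead of the whole network.

```python
# Toy "sparse mixture-of-experts": many experts exist, but a router picks
# only a few of them for each input, so most of the model does no work.
# The experts and scoring rule are invented for illustration only.

NUM_EXPERTS = 8
TOP_K = 2  # only 2 of the 8 experts handle any given input

def expert(i: int, x: float) -> float:
    """Stand-in for expert network number i."""
    return (i + 1) * x

def router(x: float) -> list[int]:
    """Toy gating: score each expert and keep the top-k (real routers are learned)."""
    scores = sorted(((x * (i + 3)) % 1.0, i) for i in range(NUM_EXPERTS))
    return [i for _, i in scores[-TOP_K:]]

def sparse_moe(x: float) -> float:
    chosen = router(x)                        # e.g. two of the eight experts
    outputs = [expert(i, x) for i in chosen]  # only these experts run
    return sum(outputs) / len(outputs)        # combine their answers

print(sparse_moe(0.37))
```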
See also
- Gato, another smart AI model developed by DeepMind