Google's Multimodal AI Model Gemini

A Step Closer to AGI with Comprehensive Holistic Coverage

ai generated image of a modern artificial brain

The AI scene has grown rapidly in the past year, with an increasing number of companies developing their own AI models after OpenAI’s ChatGPT set off what is now called the AI boom.

Google, one of the big tech companies that has released its own large language model, Bard, has taken a further step to rival competitors thriving in the scene by developing its new, next-generation multimodal AI model, Gemini, which is now incorporated into Bard.

google deepmind website homepage welcome to the gemini era

Photo Courtesy of Google DeepMind

Multimodality for Complex Tasks

With the goal of providing the broadest range of information and solutions to highly complex tasks for Bard users, Google built Gemini with multimodality in mind. The model can understand text, images, video, audio, and code.

While other models, such as ChatGPT, have also started playing in the multimodality scene, most still rely on plugins and integrations to have fully multimodal capabilities. Gemini, on the other hand, was built with multimodal characteristics, allowing it to understand different types of information and seamlessly operate across them.

With these capabilities, Gemini is said to have the ability to handle simple, more generalized tasks, as well as intricate processes in fields that require high-level skills, such as math, physics, programming, and so on.

Created by Google and its parent company, Alphabet, Gemini continues to be improved by a team of researchers from Google’s DeepMind and Google Brain, groups that specialize in AI development.

The team utilizes Google’s in-house AI chips (TPUs) and the techniques used to build AlphaGo, an AI-powered computer program also developed by DeepMind that was designed to defeat professional human Go players. The program succeeded when it defeated one of the world’s greatest Go players, Lee Sedol, back in 2016.

look at me meme lee sedol alphago im the best go player now

Training for Diverse Needs and Prompting

Trained on knowledge bases across the internet as well as data from Google’s consumer products, Gemini is said not only to provide a broad range of information and solutions to various problems but also to understand each user’s needs and intentions more accurately.

michael the office meme i... understand... everything gemini ai

In the Google for Developers blog, it is apparent that Gemini can not only identify and describe what’s going on in an image but also reason across multiple images together, especially when provided with a little context.

gemini identifying hand gestures as a game of rock paper scissors

Photo Courtesy of Google

Users can even create game prototypes with Gemini by presenting the core ideas of what they want their game to be and look like. In addition, Gemini can generate executable code, bringing users’ games closer to life; users only need to give Gemini simple coding instructions in natural language.

gemini creating video with natural language coding

Photo Courtesy of Google

Gemini 1.0 comes in three sizes: Gemini Nano, Gemini Pro, and Gemini Ultra. Nano is designed to run efficiently on smartphones, like the Pixel 8, performing on-device AI tasks without any need for external servers. Pro, available in Bard, can understand complex queries across various fields with swift response times. Ultra, as Google describes it, is its most capable model, designed to complete highly complex tasks, though it is still undergoing testing and further development.

gemini three sizes ultra pro nano

Photo Courtesy of Google DeepMind

Google gave early samples of Gemini Ultra to a small group of companies last September. One user believes Gemini may someday become the most powerful model on the AI market, as during use, the model avoided most of the hallucinations that today’s LLMs still often produce.

During Google’s tech demo for Gemini’s release on December 7, a software developer alleged that the model’s capabilities were faked and the video edited, “cut to look like it was faster and more capable than it actually is,” causing a brief public outcry. Google has since clarified that some editing, such as reduced latency, was done for brevity, but that the user prompts and model outputs were genuine.

shit just went from 0 to 100 real quick meme google during gemini's demo

Gemini Nano is currently available on phones like the Pixel 8, and Pro has been integrated into Bard. Google plans to make Gemini available in Search, Ads, Chrome, and more of its services soon. As for Gemini Pro, developers can access the model through the Gemini API in Google AI Studio and Google Cloud Vertex AI starting December 13.
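For developers curious what that API access looks like, here is a minimal sketch of a single-turn text request to Gemini Pro over the REST endpoint Google documented for AI Studio at launch. The endpoint path, payload shape, and response layout are taken from that quickstart and may change, so treat this as an illustration and verify against the current documentation; the helper names are our own.

```python
# Minimal sketch of a text request to the Gemini API's generateContent
# REST endpoint (v1beta at launch). Requires an API key from Google AI
# Studio; endpoint and payload shape are assumptions from the quickstart.
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"

def build_generate_request(prompt: str, api_key: str, model: str = "gemini-pro"):
    """Build the URL and JSON body for a single-turn text prompt."""
    url = f"{API_ROOT}/models/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body).encode("utf-8")

def ask_gemini(prompt: str, api_key: str) -> str:
    """Send the prompt and return the first candidate's reply text."""
    url, data = build_generate_request(prompt, api_key)
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Vertex AI exposes the same models through Google Cloud's own SDK and authentication, so the request above applies only to the AI Studio key-based route.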

Meme & AI-Generated Picture

cat fighting gpt-4 and gemini
titanic on top of the world meme
ai generated image of a super computer brain

Job Posting

  • Atlassian - Senior Backend Software Engineer - San Francisco, CA (Remote)

  • DeepMind - Principal, Strategy & Operations - Mountain View, CA (In Office)

  • Grammarly - Senior Data Scientist, Acquisition - San Francisco, CA+ (Remote/Hybrid)

  • Notion - Software Engineer, Fullstack - San Francisco, CA (Hybrid)

Promote your product/service to Digger Insights’ Community

Advertise with Digger Insights. Digger Insights’ Miners are professionals and business owners with diverse industry backgrounds who are looking for interesting and helpful tools, products, services, jobs, events, apps, and books. Email us at [email protected]

Your feedback would be greatly appreciated. Send it to [email protected] 
