Google Expands Its Vision-Language Models with PaliGemma 2, But How Will It Help Its AI Charge?

Users can choose from many types of AI models, and the right one depends largely on what they need from the technology. Google has now followed up its earlier release with PaliGemma 2, an open vision-language model (VLM) designed to understand images and other forms of non-text media.

The company is ramping up its multi-faceted approach to artificial intelligence, having already given the world its well-known Gemini models, which are multimodal and can accept different types of input.


Google Unveils PaliGemma 2, Its New Vision Language Model

Google has revealed its latest AI model, PaliGemma 2, part of the company's Gemma family of open models, with a focus on vision. It builds on Gemma 2, which Google announced at I/O 2024 in May, and it is a vision-language model (VLM) that specializes in understanding visual input for users.

PaliGemma 2 succeeds the original PaliGemma, launched last May. That model could already generate short captions for images and short videos, detect and segment objects, and perform "visual question answering."

With PaliGemma 2, Google now adds "long captioning" for images and videos, which produces far more detailed descriptions of what a specific photo shows.

Google said the model is currently offered in 3B, 10B, and 28B parameter sizes, at input resolutions of 224px, 448px, and 896px. PaliGemma 2 can also describe the actions, emotions, and overall narrative of a scene.


How Can PaliGemma 2 Help Google’s Latest AI Tech?

Google is now offering PaliGemma 2 as an open VLM for developers through Kaggle, Hugging Face, and Ollama, so they can strengthen the vision-based features of their applications.

PaliGemma 2 also handles more complex, technical vision tasks, such as recognizing chemical formulas and music scores, generating chest X-ray reports, and spatial reasoning.

According to Google, developers already using the original PaliGemma can switch to the new version and see "immediate performance gains on most tasks without major code modifications."

Google Went All-in on AI in 2024

Google had a massive 2024 centered on expanding its artificial intelligence offerings, building on the release of Gemini in December 2023. Over the year, the company added new features and technologies to its language models and now offers several model variants, the latest being Gemini 2.0.

One of Google's biggest shifts this year was AI Overviews, the AI-powered Search feature announced at I/O 2024. Beyond Search, other Google products such as the Workspace suite, YouTube, and Pixel also benefited from the company's latest AI developments.

Google DeepMind also advanced its robotics work, using vision models to teach robots new skills and expand the capabilities of autonomous machines.
