Looking
back at the beginning of the year, it's remarkable to consider the significant
advancements since generative AI burst onto the mainstream scene. This month,
Google introduced three notable developments: Gemini, AI Hypercomputer,
and Duet AI for Developers, which is now generally available. These additions
join numerous other gen AI products and hundreds of gen AI updates released in
2023, reflecting the astonishing pace of progress.
This rapid innovation is
evident across the board. Within Google Cloud, the number of active gen AI
projects on Vertex AI has grown by more than 7X. Already, Gemini is greatly
enhancing the Vertex AI platform, empowering developers to create sophisticated
AI agents. Moreover, it is set to become part of our Duet AI portfolio,
ensuring customers have AI support accessible whenever and wherever needed.
There has been a notable surge in activity within the open-source generative AI
community, accompanied by the emergence of remarkable models from various
organizations in the industry. This represents a truly thrilling period of
growth.
Additionally,
at the outset of 2023, most models were restricted to their training data;
however, we now have robust solutions for fine-tuning models and integrating
them with external and proprietary sources. This enables organizations to
leverage the intelligence of AI models across their data. From
empowering question-answering chatbots that access an organization's entire range
of data to synthesizing and evaluating a variety of information, these
capabilities are driving remarkable use cases.
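To make the pattern concrete, here is a minimal sketch of a retrieval-augmented question-answering loop. The `embed`, `vector_store`, and `llm` objects are hypothetical stand-ins for whatever embedding model, vector database, and foundation model an organization uses; this illustrates the grounding pattern, not any specific product's API.

```python
# A minimal retrieval-augmented QA loop. `embed`, `vector_store`, and `llm`
# are hypothetical stand-ins, not a specific product's API.

def answer_with_company_data(question: str, embed, vector_store, llm) -> str:
    # 1. Embed the question and retrieve the most relevant internal passages.
    query_vector = embed(question)
    passages = vector_store.search(query_vector, top_k=5)

    # 2. Ground the model: include the retrieved passages as context so the
    #    answer draws on proprietary data, not training data alone.
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```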
Without
exaggeration, my initial experiences with Gemini felt akin to a
magical "Eureka" moment. I eagerly anticipate the opportunity for
others to experience their own moments of revelation. This marks the point at
which an increasing number of leaders will not only identify new applications
for generative AI, but will personally integrate it into nearly every aspect of
their operations. Gemini
has been specifically designed to be multimodal, allowing it to effectively
process, comprehend, and integrate diverse forms of information such as text,
code, audio, images, and video concurrently. Consequently, Gemini is capable of
responding to inquiries like: "What was the cash dividend payout ratio for
this bank or online retailer over the last five years?"
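As a rough illustration of posing such a question programmatically, the sketch below uses the Vertex AI Python SDK; the module path and model name reflect the preview SDK at the time of writing and may differ in your environment, and the project ID is a placeholder.

```python
# Sketch: sending the payout-ratio question to Gemini on Vertex AI.
# Module path and model name follow the preview SDK at the time of
# writing and may change; "your-project-id" is a placeholder.
import vertexai
from vertexai.preview.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-pro")
response = model.generate_content(
    "What was the cash dividend payout ratio for this bank "
    "over the last five years?"
)
print(response.text)
```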
Payout
ratios represent the share of a company's earnings distributed to
shareholders as dividends. To address this, a
model must possess a thorough understanding of the various definitions of
cash, cash equivalents, and dividends, and be capable of applying them within
the context of mathematical ratios. Furthermore, it should accurately retrieve
financial data from external systems over the past five years and leverage
other AI models to compute the ratio.
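Once the right figures have been retrieved, the arithmetic itself is straightforward; the hard part is the retrieval and interpretation. For illustration only, with made-up figures:

```python
# Payout ratio = cash dividends paid / net income, per year.
# All figures below are made up, purely for illustration (in $B).
dividends_paid = {2019: 2.1, 2020: 2.3, 2021: 2.4, 2022: 2.6, 2023: 2.8}
net_income     = {2019: 7.0, 2020: 5.9, 2021: 8.3, 2022: 8.1, 2023: 9.1}

for year in sorted(dividends_paid):
    ratio = dividends_paid[year] / net_income[year]
    print(f"{year}: {ratio:.1%}")  # e.g. 2019: 30.0%
```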
Multimodality is what distinguishes models that merely predict the next word
in a sentence from sophisticated models that not only comprehend but also act
on information across diverse data types. In order
to respond to the aforementioned query, a model must be able to recognize
mathematical ideas such as equations and locate the precise components required
– two tasks that were unthinkable less than a year ago.
With
models like Gemini, a completely new era of generative AI is beginning, one
that will get us closer to true language understanding and enable systems to
synthesize a wide range of data types and generate much more value for
businesses across sectors. Because
models like Gemini can handle so many more scenarios, their
applicability across domains and real-world environments is that much
stronger. Our on-device, mobile-sized Gemini Nano model opens up new
possibilities for running artificial intelligence (AI) at the edge, allowing
for faster, more secure data analysis and response even where connectivity is
constrained. These mobile-first models can enhance a wide range of tasks,
including augmented gaming, mobile banking, and emergency services.
Additionally,
multimodal capabilities give enterprises fresh approaches to combining
disparate data types to address real-world problems. Numerous sectors deal with
unstructured, unforeseen issues that might not be resolved by using a single
type of analysis or a small number of data sources.
For
example, increasing safety on building sites necessitates the analysis and
integration of a wide range of data. An organization may have visual data, such
as photos or video feeds; incident reports from building sites; and other kinds
of data, such as schedule delays or financial costs. With the aid of
multimodal gen AI models, it will be possible to combine all of this data,
identify where, when, and how accidents occur, and develop safer, more
effective practices.
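A hedged sketch of what such a multimodal request could look like follows; the bucket path, report text, and prompt are placeholders, and the Part and GenerativeModel interface again follows the Vertex AI preview SDK at the time of writing.

```python
# Sketch: one multimodal request combining a site photo and an incident
# report. Paths and text are placeholders; the API follows the Vertex AI
# preview SDK at the time of writing.
from vertexai.preview.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-pro-vision")

site_photo = Part.from_uri(
    "gs://your-bucket/site-photos/crane-lift.jpg", mime_type="image/jpeg"
)
incident_report = "Incident report: near-miss during crane lift on the east wing."

response = model.generate_content([
    site_photo,
    incident_report,
    "Identify hazards visible in the photo, relate them to the incident "
    "report, and suggest preventative measures.",
])
print(response.text)
```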