Case Study: Gemini Live Next

Original Case Study JUNIUSG.COM

Expanding on Junius Gunaratne's obstacles in building a multi-modal experience for Google Gemini

Leadership

Mentorship

Strategy

Brainstorming

Background of previous problem:

Gemini Live is a feature that enables natural, conversational voice interaction with Google's Gemini LLM. It builds on recent advances in voice technology, including on-device and server-side speech recognition and text-to-speech APIs. The core idea is to move beyond simple command-and-response and create a more fluid, human-like interaction. The original prototype used an early version of the LLM, but the final product incorporates more sophisticated technologies, such as advanced TTS engines and graphics shaders, to create a visually compelling and responsive user interface. Gemini Live also integrates with other Google apps like Calendar and Keep, allowing for a more seamless, cross-application experience.

Original Case Study

2023’s Problems

Original Case Study JUNIUSG.COM

Problem types:

UI

Prototyping challenges

LLM limitations (tech)

Anthropomorphism

Mentioned Difficulties:

The original poster highlighted several key challenges in designing and prototyping conversational user interfaces for LLMs:

  • Lack of a comprehensive visual UI: Unlike traditional applications, voice-based UIs are often "invisible," making it difficult to convey system states (listening, processing, responding) to the user.
  • Prototyping challenges: Achieving realism in a prototype requires extensive use of on-device APIs, which can be limited in their capabilities (e.g., archaic-sounding text-to-speech).
  • LLM-specific issues: The current limitations of LLMs, such as the tendency to "hallucinate" or a lack of real-time internet access and conventional voice assistant capabilities (like setting timers), make it difficult to prototype for the "near future" capabilities of the technology.
  • Anthropomorphism: Making the system feel human and natural, rather than robotic, is a significant design challenge.
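
The first difficulty above, conveying system states in an otherwise "invisible" UI, can be illustrated with a small state machine. This is only a sketch under stated assumptions, not how Gemini Live is implemented: the `IDLE` state, the allowed transitions, and the visual cue labels are hypothetical; only the listening, processing, and responding states come from the original case study.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"              # assumption: a resting state, not named in the case study
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"


# Hypothetical visual cue shown for each state (placeholder labels).
VISUAL_CUES = {
    VoiceState.IDLE: "dim orb",
    VoiceState.LISTENING: "pulsing waveform",
    VoiceState.PROCESSING: "shimmer animation",
    VoiceState.RESPONDING: "speaking glow",
}

# Assumed transitions: every state may also fall back to IDLE (e.g. on cancel).
TRANSITIONS = {
    VoiceState.IDLE: {VoiceState.LISTENING},
    VoiceState.LISTENING: {VoiceState.PROCESSING, VoiceState.IDLE},
    VoiceState.PROCESSING: {VoiceState.RESPONDING, VoiceState.IDLE},
    VoiceState.RESPONDING: {VoiceState.LISTENING, VoiceState.IDLE},
}


def advance(current: VoiceState, target: VoiceState):
    """Move to `target` if allowed and return the new state with its visual cue."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target, VISUAL_CUES[target]
```

Tying every state change to exactly one cue means the user always has a visible signal for what the system is doing, which is the gap the "invisible UI" problem describes.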

Summarized

No Visual Signal...

Invisible UI

Technology & tools limited for use case

iOS

OS

Android

Prototyping limitations in 2023

Lack of trust in the system

Anthropomorphism challenges

2025’s Problems

Case Study by Arturo Garcia

Expanding on what works and solving for the biggest challenge: support for invisible cognitive processing.

Leadership

Mentorship

Strategy

Brainstorming

Background:

As a Product Consultant and avid user, my contribution was to address the cognitive challenges of using Gemini's multi-modal features. This role involved finding novel solutions to the difficulties of interacting with the system, especially when multiple inputs (like voice and images) are at play.

My work focused on the user's side of the conversation, ensuring that the interface was not just technologically advanced but also intuitive and helpful for the human using it. This meant looking at how to make multi-modal interaction feel seamless and effortless, reducing the cognitive load on the user.

Summarized

Knowledge is processed at system intake, not at human intake.

Cognitive processing starts well before Gemini receives Video, Screen-share, Images or Files.

New Problem

Thinking with eyes closed (left), thinking with eyes observing (middle), thinking while talking (right)

Intentional effort begins before the user even touches devices.

Analyzing 2025 Trends

Focus remains on improving system intake.

The existing system and the team behind it have released impressive features whose public focus has been on information intake (input modalities) and the quality of analysis/evaluation (A.I. benchmarks). The looming problem, and the potential solution that reframes the accessibility and utility of A.I., lies in supporting the invisible cognition we humans go through every day.

Team Direction

Alignment

Public ownership of quality of intake/evaluation

Source: LinkedIn

Industry trend of empowering cognitive intake

Source: Meta Neural Band

Worn on the wrist, the band interprets your muscle activity so you can control your experiences in a more intuitive way.

Gemini Live

Problem 1 - Shorter turnaround for sunk cost fallacy

Increased dependence on external processing of contextualized phenomena is leading tool-focused experiences to mature quickly into complex monoliths.

User Problem

Systemic Problem

User Problem

Systemic Problem

Problem 2 - More entrenched user experiences.

Users put less effort into, and lose some ability for, fully comprehending subject matter, because systems are now fully capable of executing layered interpretation & comprehension for them.

Public releases of quality of intake/evaluation

Source: Gemini.Google

Information Intake - Video

Information Intake - Screenshare

Information Intake - Images

Information Intake - Files

Gemini Live

New Problem

In-Depth

Over-prioritizing the intake of ever-simplified (A.I.-accelerated), observable, and repeatable evaluative phenomena neglects invisible cognition.

How might we bring invisible cognition back into focus?

Gemini Live

Solution


Organize invisible cognition

If the user has a thought, enable a quick embed.

Swipe to contextualize thought with relevant categories (subject matter provided)

Swipe

Quick Structured Embed

Live Talk

Swipe to contextualize thought with relevant categories (user provided)

Long Press

Quick Structured Embed

Live Talk

Summarized

Gesture Embeds

Case Study by Arturo Garcia

Leadership

Mentorship

Strategy

Brainstorming

To counter the growing prioritization of information intake, and to keep any live feature from becoming nothing more than an entrenched problem-solving tool, we must intervene by building the experience needed to act on even the smallest thoughts.

My 5-year vision is to evolve Gemini Live beyond simply improving its input modalities, so that it embraces fleeting thoughts and drives their discovery and sharing.

Live Talk

Swipe to contextualize thought with relevant categories (subject matter provided)

Swipe

Quick Structured Embed

Gemini Chat

Gemini processing & organizing embed clip of information

References subject matter for later processing

Live Talk

Swipe to contextualize thought with relevant categories (user provided)

Long Press

Quick Structured Embed

Gemini Chat

Gemini processing & organizing embed clip of information

References past conversations and existing knowledge breadth & depth

Swipe

Long Press

5:15 PM

Location: Creek Trail

Gesture Embed
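
The gesture flow above could be modeled as a small record, sketched here under loud assumptions. The case study only shows a time (5:15 PM), a location (Creek Trail), and two gestures: a swipe, where categories come from the subject matter, and a long press, where categories come from the user. Every name below (`Gesture`, `GestureEmbed`, `create_embed`), the category values, and the calendar date are hypothetical placeholders, not part of any Gemini API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Gesture(Enum):
    SWIPE = "swipe"            # categories are provided from the subject matter
    LONG_PRESS = "long_press"  # categories are provided by the user


@dataclass
class GestureEmbed:
    """Hypothetical record for one quick structured embed."""
    gesture: Gesture
    captured_at: datetime
    location: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    category_source: str = ""


def create_embed(gesture, captured_at, location=None,
                 suggested=None, user_supplied=None):
    """Pick the category list that matches the gesture and build the embed."""
    if gesture is Gesture.SWIPE:
        categories, source = list(suggested or []), "subject matter"
    else:
        categories, source = list(user_supplied or []), "user"
    return GestureEmbed(gesture, captured_at, location, categories, source)


# Example mirroring the slide; the date and category names are placeholders.
embed = create_embed(
    Gesture.SWIPE,
    datetime(2025, 1, 1, 17, 15),
    location="Creek Trail",
    suggested=["trail", "wildlife"],
)
```

Keeping the category source on the record lets a later chat session tell whether a thought was contextualized by the system or by the user when it references the embed for later processing.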