Case Study: Gemini Live Next
Original Case Study JUNIUSG.COM
Expanding on Junius Gunaratne's obstacles in building a multimodal experience for Google Gemini
Leadership
Mentorship
Strategy
Brainstorming
Background of previous problem:
Gemini Live is a feature that allows for a natural, conversational voice interaction with Google's Gemini LLM. It's built on a foundation of recent advances in voice technology, including on-device and server-side speech recognition and text-to-speech APIs. The core idea is to move beyond simple command-and-response and create a more fluid, human-like interaction. The original prototype used an early version of the LLM, but the final product incorporates more sophisticated technologies like advanced TTS engines and graphics shaders to create a visually compelling and responsive user interface. Gemini Live also integrates with other Google apps like Calendar and Keep, allowing for a more seamless, cross-application experience.
2023’s Problems
Original Case Study JUNIUSG.COM
Problem types:
UI
Prototyping challenges
LLM limitations (tech)
Anthropomorphism
Mentioned Difficulties:
The original poster highlighted several key challenges in designing and prototyping conversational user interfaces for LLMs:
Summarized
No Visual Signal...
Invisible UI
Technology & tools limited for the use case
iOS
OS
Android
Prototyping limitations in 2023
Lack of trust in the system
Anthropomorphism challenges
2025’s Problems
Case Study by Arturo Garcia
Expanding on what works and solving the biggest challenge: support for invisible cognitive processing.
Leadership
Mentorship
Strategy
Brainstorming
Background:
As a Product Consultant and avid user, my contribution was to address the cognitive challenges of using Gemini's multi-modal features. This role involved finding novel solutions to the difficulties of interacting with the system, especially when multiple inputs (like voice and images) are at play.
My work focused on the user's side of the conversation, ensuring that the interface was not just technologically advanced but also intuitive and helpful for the human using it. The aim was to make the multi-modal interaction feel seamless and effortless, reducing the cognitive load on the user.
Summarized
Knowledge is processed at system intake, not at human intake.
Cognitive processing starts well before Gemini receives video, screen shares, images, or files.
New Problem

Thinking with eyes closed (left), thinking with eyes observing (middle), thinking while talking (right)

Intentional effort begins before the user even touches devices.
Analyzing 2025 Trends
Focus remains on improving system intake.
The existing system and the team behind it have released impressive features whose public focus is information intake (input modalities) and the quality of analysis and evaluation (A.I. benchmarks). The looming problem, and the potential solution that reframes the accessibility and utility of A.I., lies in supporting the invisible cognition we humans go through every day.
Team Direction
Alignment

Public ownership of quality of intake/evaluation
Source: LinkedIn

Industry trend of empowering cognitive intake
Source: Meta Neural Band

Worn on the wrist, the band interprets your muscle activity so you can control your experiences in a more intuitive way.
Gemini Live
Problem 1 - Shorter turnaround for sunk cost fallacy
Increased dependence on external processing of contextualized phenomena leads tool-focused experiences to mature quickly into complex monoliths.
User Problem
Systemic Problem
User Problem
Systemic Problem
Problem 2 - More entrenched user experiences.
Users' effort and ability to fully comprehend subject-matter phenomena decrease as systems become fully capable of layered interpretation and comprehension.
Information Intake - Video
Information Intake - Screenshare
Information Intake - Images
Information Intake - Files
Public releases of quality of intake/evaluation
Source: Gemini.Google

New Problem
In-Depth
Over-prioritizing the intake of ever-simplified (A.I.-accelerated), observable, and repeatable evaluative phenomena neglects invisible cognition.
How might we bring invisible cognition back into focus?

Solution
Organize invisible cognition
If the user has a thought, enable a quick embed.
Live Talk
Swipe to contextualize thought with relevant categories (subject matter provided)

Swipe
Quick Structured Embed
Live Talk
Swipe to contextualize thought with relevant categories (user provided)

Long Press
Quick Structured Embed
Summarized
Gesture Embeds
To counter a future that prioritizes information intake, and to keep any live feature from becoming nothing more than an entrenched problem-solving tool, we must intervene by building the experience required to act on even the smallest thoughts.
My five-year vision is to evolve Gemini Live beyond simply improving its input modalities, toward embracing fleeting thoughts and driving their discovery and shared experience.
Live Talk
Swipe to contextualize thought with relevant categories (subject matter provided)

Swipe
Quick Structured Embed
Gemini Chat
Gemini processing & organizing embed clip of information

References subject matter for later processing
Live Talk
Swipe to contextualize thought with relevant categories (user provided)

Long Press
Quick Structured Embed
Gemini Chat
Gemini processing & organizing embed clip of information

References past conversations and existing knowledge breadth & depth




Swipe
Long Press


5:15 PM
Location: Creek Trail
Gesture Embed
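
As a rough illustration of the Quick Structured Embed in the mock-up above, the sketch below models one captured thought as a small record. Everything in it is an assumption made for illustration: the names `Gesture`, `GestureEmbed`, and `make_embed`, the field layout, the category strings, and the calendar date in the usage example are hypothetical and do not describe how Gemini Live is actually built. Only the two gestures, the 5:15 PM time, and the Creek Trail location come from the mock-up.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Gesture(Enum):
    """The two gestures shown in the mock-up."""
    SWIPE = "swipe"            # categories come from the subject matter
    LONG_PRESS = "long_press"  # categories come from the user


@dataclass
class GestureEmbed:
    """One fleeting thought captured during Live Talk (hypothetical shape)."""
    gesture: Gesture
    captured_at: datetime
    categories: List[str]
    category_source: str          # "subject_matter" or "user"
    location: Optional[str] = None
    note: str = ""


def make_embed(
    gesture: Gesture,
    captured_at: datetime,
    suggested: List[str],
    user_supplied: Optional[List[str]] = None,
    location: Optional[str] = None,
    note: str = "",
) -> GestureEmbed:
    """Build an embed, choosing the category source from the gesture."""
    if gesture is Gesture.SWIPE:
        # Swipe: contextualize with categories provided by the subject matter.
        categories, source = list(suggested), "subject_matter"
    else:
        # Long press: contextualize with categories provided by the user.
        categories, source = list(user_supplied or []), "user"
    return GestureEmbed(gesture, captured_at, categories, source, location, note)


# Usage: the date is a placeholder; time and location follow the mock-up.
embed = make_embed(
    Gesture.SWIPE,
    datetime(2025, 1, 1, 17, 15),
    suggested=["outdoors"],  # placeholder category
    location="Creek Trail",
)
print(embed.category_source, embed.location)
```

Keeping the category source on the record is one way to let a later Gemini Chat step tell subject-matter suggestions apart from user-provided labels when it organizes the embed.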