GEMINI RESEARCH SPRINT

Conducting rapid evaluative research to validate core journeys within Gemini's new personalization feature.

Timeline

Oct - Nov 2025

Role

UX Researcher

Team

3 Researchers, 1 PM

Scope

Usability Testing, Synthesis

Video asset via Google Blog / Google Design Team

OUTCOME

3/3

insights translated into shipped UI changes.

24hr

turnaround on daily insights for the product team.

Millions

of Gemini users impacted at launch.

Research delivered during a 3-week pre-launch sprint for Gemini's Personal Intelligence feature, now live to millions of users globally.

Designer's Note

This was my first AI-related research sprint with a prestigious company. Coming from government and startup backgrounds, the general structure of moderation and synthesis was familiar, but the required rigor was a masterclass in speed. The work was highly iterative, without the drag of waiting for multiple teams’ approval. Observations from one meeting were sent directly to engineering for immediate model testing, and specific user phrases were surfaced to solve problems instantly. If the product team decided something wasn't working, we iterated on the moderator's guide on the spot. Within three weeks, we evaluated whether users could trust the feature, whether it processed personal information accurately, and whether it walked the fine line between helpful and invasive.

To evaluate successful AI personalization, we built a framework analyzing friction points, trust signals, transparency, and ethical flags. We needed a structure that captured both functional and relational layers, while accounting for how different contexts shifted user reactions. This mattered especially when the model's language began to breach trust—triggering years of internalized dystopian fear about what AI is for and who it serves. Above all, users needed to feel in complete control of their data and their experience, with the clear ability to step away at any moment.


The biggest takeaway was that evaluating AI requires clear frameworks for both the model and the user, given that it is a two-way experience. I witnessed creative freedom and a large budget at work, but packaged in a constrained manner, and I learned how to incorporate that into my personal flow at a smaller scale. Constant communication and immediate response loops allowed us to fix glaring issues and test new versions throughout the sessions. Looking at Personal Intelligence today, I can see how the contextual onboarding and copy rewrites have shifted since our study. Designing for AI will continually require this intentionality from researchers as products become more disruptive, constantly balancing the finite resources we have with the infinite ideas we can build.

CONTEXT

I partnered with two researchers to evaluate the new Personal Intelligence feature. Our tasks included moderating sessions and synthesizing data under strict confidentiality.

The Stakes

Rapid end-to-end usability research (moderation, synthesis, final insight report) had to be conducted within a three-week period to inform the final launch.

Rationale

While previous research showed positive sentiment, no formal usability testing had been conducted. Potential usability issues and gaps in users' comprehension of the feature's controls needed to be identified and turned into actionable insights to inform the final launch and future states.

Do users understand what the feature is and what value it provides?

Key Questions

The study was framed through these four core questions that targeted value proposition, comprehension, usability and sense of control:

01

Experience & Trust

Do people like it? Why or why not?

02

Usability & Control

What doesn't work as expected?

03

Comprehension

How would they describe it to a friend?

04

Value Proposition

What causes people to try it (or stop using it)?

METHODS

Recruitment

18 active Gemini users within a confidential age range.

Participants also had to use a core Google service (e.g., Gmail, Photos).


Protocol

Moderated usability sessions (60 minutes)

10 desktop and 8 mobile sessions, conducted via Google Meet.

If a certain element was not accessible, we pivoted to a slide deck of mock-ups.

Analysis

Used a structured coding sheet to tag and synthesize patterns.

Notes & broad insights were taken for daily reviews.

This is a recreation of the sheet we used to track key insights by screen for each participant, tagging them and sorting them into the correct categories.
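A minimal sketch of how a coding sheet like this could be tallied, assuming each row records a participant, a screen, a category, and a tag. The field names, category labels, and rows below are all hypothetical and are not study data; the point is only that a tag is counted once per participant, which is how an "N of 18" figure is usually derived.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    participant: str  # participant ID, e.g. "P1"
    screen: str       # hypothetical screen name
    category: str     # hypothetical category label
    tag: str          # hypothetical tag within the category
    note: str         # free-text observation


def participants_per_tag(observations):
    """Count distinct participants behind each (category, tag) pair."""
    seen = defaultdict(set)
    for obs in observations:
        seen[(obs.category, obs.tag)].add(obs.participant)
    return {key: len(people) for key, people in seen.items()}


# Illustrative rows only; invented for this sketch.
SHEET = [
    Observation("P1", "opt-in", "comprehension", "unclear-value", "Unsure what changes"),
    Observation("P2", "opt-in", "comprehension", "unclear-value", "Skimmed the copy"),
    Observation("P1", "settings", "comprehension", "unclear-value", "Repeats earlier confusion"),
    Observation("P3", "settings", "control", "wants-granularity", "Expected separate toggles"),
]

counts = participants_per_tag(SHEET)
```

Here P1 raises the same tag on two screens but is counted once, so the sheet reports two participants for that tag rather than three observations.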

COLLABORATION

Throughout the sprint, moderated sessions and post-session debriefs kept stakeholders engaged.

01

Live Observation Streams

I helped set up private live-streaming links so invited stakeholders could watch sessions and provide Q&A feedback in real time.

02

Daily Insights

Beyond the post-session debriefs, I also delivered 24-hour turnaround highlights and assisted with a mid-sprint topline report. 

THE PRIORITIZATION FRAMEWORK

The raw data coded into the first user observation sheet was processed again through a higher-level system that captured broad overviews. We triaged observations using a severity scale.
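A minimal sketch of severity-based triage, assuming a three-level scale and a count of affected participants per finding. The level names, the tie-breaking rule, and the findings listed are hypothetical; the scale actually used in the study is not described here.

```python
from dataclasses import dataclass

# Hypothetical severity levels, most severe first.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str               # one of the keys in SEVERITY_ORDER
    participants_affected: int  # distinct participants who hit the issue


def triage(findings):
    """Sort by severity first, then by how many participants were affected."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f.severity], -f.participants_affected),
    )


# Illustrative findings only; invented for this sketch.
FINDINGS = [
    Finding("Label wording", "minor", 10),
    Finding("Blocked flow", "critical", 2),
    Finding("Hidden setting", "major", 5),
    Finding("Unclear toggle", "major", 7),
]

ordered_titles = [f.title for f in triage(FINDINGS)]
```

Sorting on severity before frequency keeps a rare but blocking issue above a common cosmetic one, which is one reasonable way to read a severity scale.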

RESULTS

Personal Intelligence has now launched to millions of Gemini users globally. Three of our research recommendations had already been incrementally implemented through the daily insights and live observation streams that we provided, giving the product team real-time access to user feedback throughout the sprint. The result was research that shipped successfully!

INSIGHT 01

Activation Gap

Summary

A lack of explicit activation cues left users unsure of what had changed.

Evidence

11/18

participants completed the setup flow due to brand trust but were unclear about what the feature would do.

[To a friend] I’ll describe it as ... it connects everything, but I’m not sure what it does.

PARTICIPANT P1, Desktop

RECOMMENDATION

01

Demonstration & In-depth Explanation

Let users see and review more within the opt-in flow to clarify their understanding.

02

Instructive Completion Overlay

Add an orientation screen after opt-in, with instructions and examples to guide usage.

LIVE UI

Contextual Onboarding

A dedicated orientation screen was implemented to guide users through the new aspects of the feature.

INSIGHT 02

Controls & Transparency Gap

Summary

Controls were clear once found, but groupings and intrusive phrases led to hesitation regarding data boundaries.

Evidence

9/18

participants requested more granular control within the feature. A few more reacted negatively to the copy, which they found uncomfortable.

I didn’t realize Google Workspace would be grouped into one. I was expecting it to be separate.

PARTICIPANT P21, Mobile

RECOMMENDATION

01

Granular Control & Freedom

Break Google Workspace into its constituent parts, giving users even more control over their permissions.

02

Factual & Assistive Tone Adherence

Rewrite descriptive copy to focus less on psychological framing (e.g., "analyze you...") and more on assistance (e.g., "find this...").

LIVE UI

Copy Rewrite

Previously worded "show some interesting hidden patterns", the initial prompt now has a friendlier, non-invasive tone.

INSIGHT 03

Personalization Gap

Summary

When the feature works, users are positive. However, inconsistency and a lack of visual attribution prevent users from being fully satisfied.

Evidence

9/18

participants couldn’t tell when suggestions were personalized versus generic. Others ran into generated mistakes or recommendations based on outdated data or tenuous links, which lowered trust.

I don’t see what AI and automation has to do with football and basketball.. it feels like it’s making assumptions that aren’t there.

PARTICIPANT P10, Desktop

RECOMMENDATION

01

Inline Citations

Personalized claims should have clickable citations to allow for verification.

02

Relevance Adjustment Mechanism

A quick action for giving feedback on mistakes or tweaking suggestions, used to retrain the model (e.g., thumbs up/down).

Video asset via Google Design Team

LIVE UI

Model Retraining

Users can provide feedback when encountering inaccurate responses or "overpersonalization" with a thumbs up/down mechanism.

IMPACT

3/3

insights translated into shipped UI changes.

Contextual onboarding, copy rewrites, and a model retraining mechanism each landed in the live product before the final deck was delivered.

24hr

turnaround on daily insights for the product team.

Live observation streams, same-day debriefs, and per-participant summaries gave stakeholders structured findings to act on throughout the sprint.

Millions

of Gemini users impacted at launch.

Personal Intelligence is now live with research-backed adjustments to activation, transparency, and trust signals shaping the experience users see.

Takeaways

As this was my first AI-related research sprint, I found that while the conduct of the research stayed the same, the areas being tested, the team communication, and the iteration differed in key ways.

KEY LEARNINGS

01

In AI research, "Wizard of Oz" prototyping is critical. We can't just test the UI flow; we have to simulate the personality and latency of the model to get authentic trust signals.

02

Finishing a project of this scale within three weeks takes detailed daily communication, forward planning, and on-the-spot iteration. We made several tweaks after the first session and kept building on them whenever an error was found or important data needed capturing. These changes were communicated after every usability session.

03

To get the analysis done for the final report, it was important to build on it day by day, and a detailed coding sheet helped the researchers work in tandem. The sheet itself needs to be set up to track insights by category and tag.

Jaclyn Chin

©

2026



CONTEXT

I partnered with two researchers to evaluate a high-priority Gemini personalization feature. Our tasks included moderating sessions and synthesizing data under strict confidentiality.

The Stakes

Rapid end-to-end usability research (moderation, synthesis, and a final insight report) had to be completed within a three-week period to inform the final launch.

Rationale

While previous research showed positive sentiment, no formal usability testing had been conducted. Potential usability issues and gaps in user comprehension of the feature’s controls needed to be identified and turned into actionable insights to inform the final launch and future states.

Do users understand what the feature is and what value it provides?

Key Questions

The study was framed through these four core questions that targeted value proposition, comprehension, usability and sense of control:

01

Experience & Trust

Do people like it? Why or why not?

02

Usability & Control

What doesn't work as expected?

03

Comprehension

How would they describe it to a friend?

04

Value Proposition

What causes people to try it (or stop using it)?

METHODS

Recruitment

18 active Gemini users within an age range (confidential).

Participants also had to use one core Google service (e.g., Gmail, Photos).


Protocol

Moderated usability sessions (60 minutes)

10 desktop and 8 mobile sessions, conducted via Google Meet.

If a certain element was not accessible, we pivoted to a slide deck of mock-ups.

Analysis

Used a structured coding sheet to tag and synthesize patterns.

Notes & broad insights were taken for daily reviews.

This is a recreation of the sheet we used to track key insights by screen for each participant, tagging them and sorting them into the correct categories.

Collaboration

During this process, moderation sessions & post-session debriefs were conducted to ensure stakeholders were engaged throughout the sprint.

01

Live Observation Streams

I helped facilitate private live-streaming links so that invited stakeholders could watch and provide Q&A feedback in real time.

02

Daily Insights

Beyond the post-session debriefs, I also delivered 24-hour turnaround highlights and assisted with a mid-sprint topline report. 

The Prioritization Framework

The raw data coded into the first user observation sheet was processed again through a higher-level system that captured broad overviews. We triaged observations using a severity scale.
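As a rough illustration of how a coding sheet like this can feed a severity-based triage, here is a minimal Python sketch. Everything in it is hypothetical: the `Observation` fields, the three severity levels, the `triage` helper, and the sample rows are invented for illustration and are not the study's actual sheet, scale, or data. The tag names loosely borrow from the framework categories mentioned earlier (friction points, trust signals, transparency).

```python
from dataclasses import dataclass

# Hypothetical severity levels; the study's actual scale is not described here.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Observation:
    participant: str  # participant ID, e.g. "P1"
    screen: str       # screen or step where the note was taken
    tag: str          # category tag from the coding sheet
    severity: str     # one of the keys in SEVERITY_ORDER
    note: str         # free-text observation


def triage(observations):
    """Group observations by (tag, severity), count distinct participants,
    and sort so the most severe, most widespread issues come first."""
    participants = {}
    for obs in observations:
        participants.setdefault((obs.tag, obs.severity), set()).add(obs.participant)
    rows = [(tag, sev, len(people)) for (tag, sev), people in participants.items()]
    rows.sort(key=lambda r: (SEVERITY_ORDER[r[1]], -r[2], r[0]))
    return rows


# Made-up sample rows, for illustration only.
sample = [
    Observation("P1", "opt-in", "friction point", "major", "Unsure what changed after setup"),
    Observation("P2", "opt-in", "friction point", "major", "Could not explain the feature"),
    Observation("P2", "settings", "transparency", "critical", "Copy felt invasive"),
    Observation("P3", "chat", "trust signal", "minor", "Wanted a source for a suggestion"),
]

for tag, severity, count in triage(sample):
    print(f"{severity:<9} {tag:<15} {count} participant(s)")
```

Counting distinct participants rather than raw notes keeps one talkative participant from inflating an issue, which fits how the evidence in each insight is reported as a share of the 18 participants.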

OUTCOME

Personal Intelligence has now launched! Three of our research recommendations had already been incrementally implemented through the daily insights and live observation streams that we provided, giving the product team real-time access to user feedback throughout the sprint. The result was research that shipped successfully!

INSIGHT 01

Activation Gap

Summary

A lack of explicit activation cues left users unsure of what had changed.

Evidence

11/18

participants completed the setup flow because of brand trust but were unclear about what the feature would do.

[To a friend] I’ll describe it as ... it connects everything, but I’m not sure what it does.

PARTICIPANT P1, Desktop

RECOMMENDATION

01

Demonstration & In-depth Explanation

Let users see and review more detail within the opt-in flow to clarify their understanding.

02

Instructive Completion Overlay

Add an orientation screen after opt-in, with instructions and examples, to guide users on how to use the feature.

LIVE UI

Contextual Onboarding

A dedicated orientation screen was implemented to guide users through the new aspects of the feature.

INSIGHT 02

Controls & Transparency Gap

Summary

Controls were clear once found, but the groupings and intrusive phrasing led to hesitation about data boundaries.

Evidence

9/18

participants requested more granular control within the feature. A few more reacted negatively to the copy, which they found uncomfortable.

I didn’t realize [redacted] would be grouped into one. I was expecting [redacted] to be separate.

PARTICIPANT P21, Mobile

RECOMMENDATION

01

Granular Control & Freedom

Break a specific toggle into its constituent parts, giving users even more control over their permissions.

02

Factual & Assistive Tone Adherence

Rewrite descriptive copy to focus less on psychological framing (e.g., "analyze you...") and more on assistance (e.g., "find this...").

LIVE UI

Copy Rewrite

Previously worded "show some interesting hidden patterns", the initial prompt now has a friendlier, non-invasive tone.

INSIGHT 03

Personalization Gap

Summary

When the feature works, users are positive. However, inconsistency and a lack of visual attribution prevent users from being fully satisfied.

Evidence

8/18

participants couldn’t tell when suggestions were personalized versus generic. Others ran into generation mistakes or were recommended content based on outdated data or tenuous links, which lowered trust.

I don’t see what AI and automation has to do with football and basketball.. it feels like it’s making assumptions that aren’t there.

PARTICIPANT P10, Desktop

RECOMMENDATION

01

Inline Citations

Personalized claims should have clickable citations to allow for verification.

02

Relevance Adjustment Mechanism

A quick action to give feedback on mistakes or request tweaks to suggestions, used to retrain the model (e.g., thumbs up/down).

Video asset via Google Design Team

LIVE UI

Model Retraining

Users can provide feedback when encountering inaccurate responses or "overpersonalization" with a thumbs up/down mechanism.

IMPACT

3/3

insights translated into shipped UI changes.

Contextual onboarding, copy rewrites, and a model retraining mechanism each landed in the live product before the final deck was delivered.

24hr

turnaround on insights for the product team.

Live observation streams, same-day debriefs, and per-participant summaries gave stakeholders structured findings to act on throughout the sprint.

Millions

of Gemini users impacted at launch.

Personal Intelligence is now live with research-backed adjustments to activation, transparency, and trust signals shaping the experience users see.

Takeaways

As this was my first AI-related research sprint, I found that while the way sessions were conducted stayed the same, the areas being tested, the team communication, and the iteration differed in key ways.

KEY LEARNINGS

01

In AI research, "Wizard of Oz" prototyping is critical. We can't just test the UI flow; we have to simulate the personality and latency of the model to get authentic trust signals.

02

Finishing a project of this scale within three weeks takes detailed daily communication, forward planning, and on-the-spot iteration. We made several tweaks after one session and built on them whenever an error was found or important data needed to be captured. These changes were communicated after every usability session.

03

To get the analysis done for the final report, it was important to build on it day by day. A detailed coding sheet helped the researchers work in tandem; the sheet itself needs to be set up to track insights by category and tag.
