
Visual Prompting for Conversational AI
Visual Context for Conversational AI
OVERVIEW
This project explores how drawings, annotations and images can be integrated directly into AI chat to make communication clearer and more precise.
Making visual changes with AI tools often requires multiple prompts, or switching to external tools to annotate images. This breaks the conversational flow and slows down iteration.
I explored how visual input could become part of the conversation itself by embedding a whiteboard directly inside the chat.
COMPANY
tldraw
COLLABORATORS
Product
Engineering
GTM
RESPONSIBILITIES
Research & Analysis
Concept Development
UX Exploration
PROBLEM
Conversational AI relies heavily on text, which makes it difficult to communicate visual ideas.
When users want to reference specific parts of an image or explain a visual change, they often need to take screenshots, annotate them in another tool and re-upload them into the chat.
This breaks the conversational flow and makes iterative workflows slower and more error-prone, limiting what conversational AI can do for visual work.
People prefer to show rather than tell when explaining complex ideas, but most existing AI chats rely heavily on text
Iterative prompting is common–users refine images over multiple steps and need to focus the AI on precise areas within those images
Visual context is fragmented across tools, so users have to switch between chat and external annotation tools
RESEARCH & INSIGHTS
To understand how users combine visuals and text, I explored common workflows across AI chat tools, visual collaboration platforms and messaging apps with annotation features.
These insights suggested an opportunity to integrate visual context directly into the chat interface.
This isn't a whiteboard with AI. It's an AI with a whiteboard.
THE SOLUTION
A chat interface with an integrated whiteboard that allows users to create visual prompts in real time.
Users can draw freehand, upload screenshots or images, annotate specific areas, or combine visuals and text in a single prompt.
This allows users to communicate visually without leaving the conversation.
HOW PEOPLE USE IT
(A) Instructional:
Give visual instructions to the AI or to collaborators (e.g. ‘Make this part bigger.’)
(B) Explanatory:
Clarify context in conversations, helping someone interpret an image (e.g. ‘Here is the hole in the fabric I mentioned.’)
(C) Expressive:
Enhance images for social sharing or personalisation (e.g. by adding drawings, stickers, or text)
WHY IT'S VALUABLE
Behavioural Familiarity
Matches real-world visual workflows–users naturally communicate by pointing, circling and layering notes directly on top of visuals
Communication Efficiency
Reduces reliance on text: users don’t have to explain everything in words, because visuals let them ‘show’ instead of ‘tell’
Clarity of Interpretation
Visual cues reduce miscommunication, especially in complex images
Convenience
All tools exist in one flow, keeping users focused
Iterative prompting is the norm in AI chats
APPROACH
This was a conceptual project: I wrote a brief, sketched low-fidelity wireframes and built a vibe-coded demo to explore interaction ideas. The goal was to see how visual prompting could work within a conversational flow.
FEATURES
COMMAND FRAME
Highlight a specific part of an image and attach an instruction. This helps the AI focus on the exact part users are referring to.
MULTI-EDIT MODE
Compare generated versions side by side and annotate the differences to guide further prompts.