Visual Prompting for Conversational AI
Visual Context for Conversational AI
OVERVIEW
Most AI chat is text-only. But when users want to reference a specific part of an image, explain a visual change, or communicate something they can see but can't name, words alone are slow and imprecise. This project explores how visual input (drawings, annotations, and commands) can become part of the conversation itself, without leaving the chat.
COMPANY
tldraw
COLLABORATORS
Product
Engineering
GTM
RESPONSIBILITIES
Research & Analysis
Concept Development
UX Exploration
PROBLEM
Conversational AI relies heavily on text, which makes it difficult to communicate visual ideas. When users want to reference specific parts of an image or explain a visual change, they often need to take a screenshot, annotate it in another tool, and re-upload it into the chat. This breaks the conversational flow and makes iterative workflows slower and more error-prone.
RESEARCH & INSIGHTS
To understand how users combine visuals and text, I explored common workflows across AI chat tools, visual collaboration platforms, and messaging apps with annotation features.
These insights suggested an opportunity to integrate visual context directly into the chat interface:
People prefer to show rather than tell when explaining complex ideas, but most existing AI chats rely heavily on text.
Iterative prompting is the norm. Users rarely get what they want in one prompt. They refine, focus, and redirect. Without visual precision, each iteration requires more explanation.
Visual context is fragmented across tools, so users have to switch between the chat and external annotation tools.
This isn't a whiteboard with AI; it's an AI with a whiteboard.
THE SOLUTION
A chat interface where users can interact visually in two ways: upload an image and annotate it directly, or open a blank canvas and draw freehand–then ask the AI about what they've drawn.
A canvas inside the thread: lets users draw the visual in their head when they don't have the words to describe it
HOW PEOPLE USE IT
(A) Instructional:
Give visual instructions to the AI or to collaborators (e.g. 'Make this part bigger.')
(B) Explanatory:
Clarify context in conversation, with or without accompanying text (e.g. 'Here is the hole in the fabric I mentioned.' to help someone interpret an image)
(C) Expressive:
Enhance images for personal use or social sharing by drawing, marking up, or layering notes directly on top
Why it's valuable
Behavioural Familiarity
Matches real-world visual workflows: users naturally communicate by pointing, circling, and layering notes directly on top of visuals
Communication Efficiency
Reduces reliance on text: instead of explaining everything in words, users can 'show' rather than 'tell'
Clarity of Interpretation
Visual cues reduce miscommunication, especially in complex images
Convenience
All tools exist in one flow, keeping users focused
APPROACH
This was a conceptual project: I created a brief, low-fidelity wireframes, and a vibe-coded demo to explore interaction ideas. The goal was to see how visual prompting could work within a conversational flow.