Visual Prompting for Conversational AI
Visual Context for Conversational AI
OVERVIEW
Most AI chat is text-only. But when users want to reference a specific part of an image, explain a visual change, or communicate something they can see but can't name, words alone are slow and imprecise. This project explores how visual input (drawings, annotations, and commands) can become part of the conversation itself, without leaving the chat.
COMPANY
tldraw
COLLABORATORS
Product
Engineering
GTM
RESPONSIBILITIES
Research & Analysis
Concept Development
UX Exploration
PROBLEM
Conversational AI relies heavily on text, which makes it difficult to communicate visual ideas. When users want to reference specific parts of an image or explain a visual change, they often need to take a screenshot, annotate it in another tool, and re-upload it into the chat. This breaks the conversational flow and makes iterative workflows slower and more error-prone.
RESEARCH & INSIGHTS
To understand how users combine visuals and text, I explored common workflows across AI chat tools, visual collaboration platforms, and messaging apps with annotation features.
These insights suggested an opportunity to integrate visual context directly into the chat interface:
People prefer to show rather than tell when explaining complex ideas, but most existing AI chats rely heavily on text.
Iterative prompting is the norm. Users rarely get what they want in one prompt. They refine, focus, and redirect. Without visual precision, each iteration requires more explanation.
Visual context is fragmented across tools, so users have to switch between the chat and external annotation tools.
This isn't a whiteboard with AI; it's an AI with a whiteboard.
THE SOLUTION
A chat interface where users can interact visually in two ways: upload an image and annotate it directly, or open a blank canvas and draw freehand–then ask the AI about what they've drawn.
A canvas inside the thread: lets users draw the visual in their head when they don't have the words to describe it
HOW PEOPLE USE IT
(A) Instructional:
Give visual instructions to the AI or to collaborators (e.g. 'Make this part bigger.')
(B) Explanatory:
Clarify context in conversation, with or without accompanying text (e.g. 'Here is the hole in the fabric I mentioned.' to help someone interpret an image)
(C) Expressive:
Enhance images for personal use or social sharing by drawing, marking up, or layering notes directly on top
Why it's valuable
Behavioural Familiarity
Matches real-world visual workflows: users naturally communicate by pointing, circling, and layering notes directly on top of visuals
Communication Efficiency
Reduces reliance on text: instead of explaining everything in words, users can 'show' rather than 'tell'
Clarity of Interpretation
Visual cues reduce miscommunication, especially in complex images
Convenience
All tools exist in one flow, keeping users focused
APPROACH
This was a conceptual project: I created a brief, low-fidelity wireframes, and a vibe-coded demo to explore interaction ideas. The goal was to see how visual prompting could work within a conversational flow.