VCCAI

Visual Context for Conversational AI

OVERVIEW

This project explores how drawings, annotations, and images can be integrated directly into AI chat to make communication clearer and more precise.

Visual changes in AI tools often require multiple prompts or switching to external tools to annotate images. This breaks the conversational flow and slows down iteration.

I explored how visual input could become part of the conversation itself by embedding a whiteboard directly inside chat.

COMPANY

tldraw

COLLABORATORS

Product
Engineering
GTM

RESPONSIBILITIES

Research & Analysis
Concept Development
UX Exploration

PROBLEM

Conversational AI relies heavily on text, which makes it difficult to communicate visual ideas.

When users want to reference specific parts of an image or explain a visual change, they often need to take screenshots, annotate them in another tool, and re-upload them into the chat.

This breaks the conversational flow and makes iterative workflows slower and more error-prone, limiting the potential of conversational AI.


People prefer to show rather than tell when explaining complex ideas, but most existing AI chats rely heavily on text

Iterative prompting is common–users refine images over multiple steps and need to focus AI on precise areas within images

Visual context is fragmented across tools, so users have to switch between chat and external annotation tools

RESEARCH & INSIGHTS

To understand how users combine visuals and text, I explored common workflows across AI chat tools, visual collaboration platforms, and messaging apps with annotation features.

These insights suggested an opportunity to integrate visual context directly into the chat interface.


This isn't a whiteboard
with AI–
it’s an AI
with a whiteboard

THE SOLUTION

A chat interface with an integrated whiteboard that allows users to create visual prompts in real time.

Users can draw freehand, upload screenshots or images, annotate specific areas, or combine visuals and text in a single prompt.

This allows users to communicate visually without leaving the conversation.


HOW PEOPLE USE IT

(A) Instructional:
Give visual instructions to the AI or to collaborators (e.g. ‘Make this part bigger.’)

(B) Explanatory:
Clarify context in conversations (e.g. ‘Here is a hole in the fabric, as I mentioned.’ to help someone interpret an image)

(C) Expressive:
Enhance images for social sharing or personalisation (e.g. by adding drawings, stickers, or text)

Why it's valuable

Behavioural Familiarity

Matches real-world visual workflows–users naturally communicate by pointing, circling, layering notes directly on top of visuals

Communication Efficiency

Reduces reliance on text–visuals let users ‘show’ instead of ‘tell’, so they don’t have to explain everything in words

Clarity of Interpretation

Visual cues reduce miscommunication, especially in complex images

Convenience

All tools exist in one flow, keeping users focused

Iterative prompting is the norm in AI chats

APPROACH

This was a conceptual project: I created a brief, low-fidelity wireframes, and a vibe-coded demo to explore interaction ideas. The goal was to see how visual prompting could work in a conversational flow.

FEATURES

COMMAND FRAME

Highlight a specific part of an image and attach an instruction. This helps the AI focus on the exact part users are referring to.
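To make the Command Frame idea concrete, here is a minimal sketch of how a highlighted region and its attached instruction might be bundled into a single prompt for a multimodal model. All names and shapes here are hypothetical, for illustration only; they are not tldraw's actual API or the project's real implementation.

```typescript
// Hypothetical data shape for a visual prompt: an image, one highlighted
// region, and the instruction attached to that region.
interface Region {
  x: number; // top-left corner, in image pixels
  y: number;
  width: number;
  height: number;
}

interface CommandFrame {
  imageUrl: string;
  region: Region;
  instruction: string;
}

// Serialise a command frame into a text prompt that could accompany the
// image when it is sent to a multimodal model.
function toPrompt(frame: CommandFrame): string {
  const { x, y, width, height } = frame.region;
  return (
    `In the attached image, focus on the region at (${x}, ${y}) ` +
    `with size ${width}x${height}. ${frame.instruction}`
  );
}

// Example: the 'Make this part bigger.' instruction from above,
// anchored to a drawn rectangle.
const frame: CommandFrame = {
  imageUrl: "https://example.com/mockup.png",
  region: { x: 120, y: 80, width: 200, height: 150 },
  instruction: "Make this part bigger.",
};

console.log(toPrompt(frame));
```

The key design point is that the region and the instruction travel together as one unit, so the AI receives the spatial context and the text in a single turn instead of across multiple prompts.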

MULTI-EDIT MODE

Compare generated versions side-by-side and annotate differences to guide further prompts.
