
Exploring multimodal AI at Design Outlook 2025, Naarm Melbourne
This is Part 1 of a 2-part series
Part 1: The conceptual framework and testing multimodal AI prototypes
Part 2: Teaching designers to build with Cursor →
A Year Makes All the Difference
Last year, in 2024, I ran a workshop on multimodal AI. I was very pleased to return and revisit the topic at Design Outlook 2025, and my oh my, a lot has changed in a single year.
Multimodal AI holds real promise for designers because it gives us ways to use a user’s microphone, speaker, and video camera to create interfaces. With services like OpenAI’s GPT-4o and Google’s Gemini now offering fast, affordable multimodal capabilities, the technical barriers have dropped dramatically.
Last year the conversation focused on whether these AI services were fast, cheap, good, and reliable enough to create a multimodal experience. This year, with costs falling and speeds rising rapidly, the conversation shifted to what makes an AI experience genuinely good.
What does good AI look like?
This shift in focus became clear through my own journey with AI. Like many passionate home cooks, I love cooking but struggle with the mental load of planning meals for my family. When ChatGPT first arrived, I thought I’d found the perfect solution.
The initial results were impressive - quick suggestions, organized shopping lists, and recipe ideas. But as I used it more, the cracks began to show. The AI-generated meal plans looked good on the surface but lacked substance. They were what’s become known as “slop” - output that appears helpful but doesn’t account for real-world complexity:
- My kid’s food preferences changing daily
- The half-used jar of tahini in the back of the fridge
- The fact that Wednesdays are always chaotic
- That moment when a hospital visit throws all plans out the window
This frustration became the catalyst for deeper exploration: How do we build AI applications that actually work in the messy reality of life?
Beyond the Chat Box: What does good AI sound like?
Through the workshop, we explored this question together. My disappointment with ChatGPT’s meal planning had led me down a rabbit hole of prototyping. I built various tools, each teaching me something new about what was missing. Eventually, this journey resulted in Tiny Talking Todos - a broader productivity app designed to handle the interconnected mental load of family life.
Through months of building and testing, three crucial concepts emerged that separate meaningful AI applications from “slop” - what I call the three C’s:
The Three C’s Framework
🗂️ Context
All the data that goes into AI conversations - not just the current prompt, but the full picture of your life’s complexity
🛠️ Capabilities
What the AI agent can actually do with that context - beyond just generating text to taking meaningful actions
🤝 Consent
How to bring other people into AI workflows meaningfully - because life is multiplayer, not single-player
These three C’s became the foundation for exploring multimodal AI at Design Outlook 2025.
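To make the framework a little more concrete before we get to the prototypes, here is a minimal TypeScript sketch of how the three C’s might be modelled inside an application. The interface names and fields are illustrative assumptions, not code from Tiny Talking Todos.

```typescript
// Hypothetical shapes for the three C's; names and fields are illustrative only.
interface Context {
  prompt: string;            // the immediate request
  household: string[];       // who the output is actually for
  pantry: string[];          // what's really in the fridge
  constraints: string[];     // "Wednesdays are chaotic", hospital visits, and so on
}

interface Capability {
  name: string;              // e.g. "add_to_shopping_list"
  description: string;
  execute: (args: Record<string, unknown>) => Promise<void>;
}

interface Consent {
  participants: string[];    // everyone affected, not just the account owner
  microphoneAllowed: boolean;
  dataRetention: "none" | "session" | "persistent";
}

// An interaction worth shipping carries all three, not just a prompt string.
interface Interaction {
  context: Context;
  capabilities: Capability[];
  consent: Consent;
}
```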
Testing Multimodal Prototypes: What We Learned
With the three C’s framework established, I built three prototypes to help workshop participants experience what multimodal AI looks like in practice. Each prototype was designed to illuminate a different aspect of Context, Capabilities, and Consent through hands-on interaction:
- Recipe Generation with Vision AI: How context shapes AI understanding
- Voice-Enabled Shopping Lists: Capabilities beyond text generation
- Real-time Recipe Guidance: The complexity of consent in always-listening systems
These weren’t just demos - they were conversation starters about what makes AI genuinely useful versus merely impressive.
📸 Visual Recipe Generation: The Context Challenge
Participants used AI to generate recipes from food photos, immediately encountering the importance of context:

The recipe generator prototype - turning photos into personalized recipes
Key Insights:
- Users struggled to provide personalized instructions without scaffolding
- Metric vs imperial measurements revealed localization needs
- Dietary preferences, spice tolerance, and nutritional goals all mattered
- Side-by-side comparison between photo and recipe helped verification
The exercise revealed that context isn’t just about the immediate input (the photo), but encompasses user preferences, cultural background, and specific constraints that make a recipe actually useful.
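For a sense of how a prototype like this wires context into the model call, here is a minimal sketch using the OpenAI Node SDK with a GPT-4o vision request. The household context object, file path, and prompt wording are illustrative assumptions rather than the workshop prototype’s actual code.

```typescript
// Minimal sketch: a food photo plus household context in, a recipe suggestion out.
// Assumes the official `openai` Node SDK and an OPENAI_API_KEY in the environment.
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

// Context is more than the photo: preferences, pantry, and constraints all shape the answer.
const householdContext = {
  dietary: ["vegetarian"],
  pantry: ["half a jar of tahini", "brown rice", "tinned chickpeas"],
  units: "metric",
  constraints: ["30 minutes max on weeknights", "Wednesdays are chaotic"],
};

async function recipeFromPhoto(photoPath: string): Promise<string | null> {
  const imageBase64 = fs.readFileSync(photoPath).toString("base64");

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Suggest a recipe based on this photo. Household context: " +
              JSON.stringify(householdContext),
          },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
        ],
      },
    ],
  });

  return response.choices[0].message.content;
}

recipeFromPhoto("dinner.jpg").then(console.log).catch(console.error);
```

The photo is only one input; serialising the household context alongside it is what turns a generic recipe into one that fits the constraints above.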
🎤 Voice Shopping Lists: Capabilities in Action
When participants tested voice-controlled shopping lists, the focus shifted to what AI could actually do:

Voice-controlled shopping lists demonstrated AI's capability to understand and organize spoken items
Capability Discoveries:
- Duplicate detection and item consolidation (sketched in the code below)
- Synonym recognition (e.g., "tissues" vs "kleenex")
- Multilingual support with mixed results
- Audio feedback for confirming actions
Technical Challenges:
- Latency issues impacted usability
- Background noise handling varied by model
- Response time expectations carried over from Alexa/Siri experiences
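To show what “capabilities beyond transcription” can look like, here is a small TypeScript sketch of a consolidation step that runs after speech has been transcribed into item names. The synonym map and items are illustrative; a real prototype might delegate this matching to the model itself.

```typescript
// Illustrative capability layer: consolidate duplicates and map synonyms
// instead of appending raw transcribed words to the list.
type ListItem = { name: string; quantity: number };

// Hypothetical synonym map; a fuller version could come from the model or a product catalogue.
const SYNONYMS: Record<string, string> = {
  kleenex: "tissues",
  "kitchen roll": "paper towel",
};

function canonical(name: string): string {
  const key = name.trim().toLowerCase();
  return SYNONYMS[key] ?? key;
}

function addSpokenItems(list: ListItem[], spoken: string[]): ListItem[] {
  const merged = new Map<string, ListItem>();
  for (const item of list) {
    merged.set(canonical(item.name), { ...item });
  }
  for (const raw of spoken) {
    const name = canonical(raw);
    const existing = merged.get(name);
    if (existing) {
      existing.quantity += 1; // duplicate detection: bump quantity instead of adding a new row
    } else {
      merged.set(name, { name, quantity: 1 });
    }
  }
  return [...merged.values()];
}

// "Add tissues... add kleenex" ends up as one line item with quantity 2.
console.log(addSpokenItems([], ["tissues", "kleenex"]));
```

The same structure extends naturally to audio feedback: once the list operation is known, the app can speak back a short confirmation rather than echoing the raw transcript.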
👨‍🍳 Real-time Recipe Assistant: The Consent Conundrum
The most revealing prototype was the real-time recipe assistant that could listen and help while cooking:

The always-listening recipe assistant raised important questions about privacy and consent
Consent Considerations:
- Privacy concerns about always-listening devices (see the consent-gate sketch below)
- Need for clear data collection transparency
- Hesitation from users about home integration
- Questions about who has access to conversations
Functional Insights:
- Proactive timer suggestions at recipe steps
- Helpful for emergency queries (e.g., burn treatment)
- Multiple timer management challenges
- Potential to outperform existing voice assistants
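To make those consent questions concrete, here is a small browser-side sketch of a consent gate that must be satisfied before the assistant ever opens the microphone. The ConsentRecord shape is an illustrative assumption; a real app would persist it per household and surface retention details in the interface.

```typescript
// Illustrative consent gate for an always-listening assistant (browser environment).
type ConsentRecord = {
  microphone: boolean;
  retention: "none" | "session" | "persistent";
  grantedBy: string[]; // everyone within earshot, not just the account owner
};

let consent: ConsentRecord = { microphone: false, retention: "none", grantedBy: [] };

function grantConsent(person: string, retention: ConsentRecord["retention"]): void {
  consent = {
    microphone: true,
    retention,
    grantedBy: [...consent.grantedBy, person],
  };
}

async function startListening(): Promise<MediaStream> {
  // Refuse to listen until someone has explicitly opted in.
  if (!consent.microphone) {
    throw new Error("Microphone consent has not been granted.");
  }
  // The browser's own permission prompt still applies on top of this check.
  return navigator.mediaDevices.getUserMedia({ audio: true });
}
```

The gate itself is trivial; the design work is deciding who counts as having granted consent, how retention is explained, and how family members who never opened the app can opt out.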
The Three C’s in Practice
Through these exercises, participants experienced firsthand how the three concepts interconnect:
Context Makes AI Relevant
Without rich context, AI produces generic “slop.” The recipe generator needed to know not just what dish was in the photo, but who would eat it, what ingredients were available, and what cooking skills the user possessed.
Capabilities Make AI Useful
Knowing something and doing something are different. The shopping list that could consolidate duplicates and understand synonyms was far more valuable than one that just transcribed words.
Consent Makes AI Trustworthy
The moment we introduced always-listening features, the room’s energy shifted. Participants immediately asked: “Who hears this? Where does it go? Can my family opt out?” These aren’t technical questions - they’re human ones.
From Understanding to Building
After exploring these concepts through testing existing prototypes, participants were eager to build their own. But this wasn’t just enthusiasm - it was a direct response to feedback from last year’s workshop. When I asked participants what they wanted most, the answer was clear: “How can I build my own AI prototypes?”
Last year, I didn’t have a good answer. This year, I came prepared with a concrete approach: using Cursor to bridge the gap between understanding AI concepts and building with them. That approach became Part 2 of the workshop: teaching designers to create these experiences themselves.
The progression from testing to building was intentional. By first understanding the challenges through hands-on experimentation, participants were better equipped to appreciate why tools like Cursor are essential for keeping pace with AI’s rapid evolution. More importantly, they now had a path forward to continue their learning beyond the workshop.
Key Takeaways for Designers
🎯 Moving Beyond "Slop"
- Most people's AI experience is limited to chat interfaces
- Real value comes from understanding user complexity
- Multimodal interfaces offer richer interaction possibilities
- The best AI acknowledges life's messiness
🚀 The Design Opportunity
- Frontier AI capabilities are evolving weekly
- Traditional design tools can't keep pace
- Designers who understand the three C's can guide better products
- The gap between idea and prototype is shrinking
What’s Next?
The Design Outlook 2025 workshop revealed that designers are hungry to move beyond conceptual understanding to hands-on creation. The energy in the room was palpable as participants discovered they could not only critique AI experiences but actively shape them.
In Part 2, we explore how the same group of designers went from testing prototypes to building their own Next.js applications in just three hours - a transformation that surprised even the most experienced participants.
The future of AI design isn’t about choosing between human creativity and machine capabilities - it’s about understanding how Context, Capabilities, and Consent work together to create experiences that genuinely improve people’s lives. This workshop at Design Outlook 2025 proved that designers are ready to lead this charge.
Continue to Part 2
Ready to see how designers learned to build their own AI prototypes?
Read Part 2: Teaching Designers to Cursor with AI →
This workshop was presented at Design Outlook 2025 in Melbourne.
Interested in running a similar workshop for your team?
Explore our workshop offerings →