From Artifacts to Agents: Building a Better Todo List Generator

The Challenge

Creating AI-generated React components that actually work in production is harder than it looks. This is the story of how I evolved from a simple artifacts-inspired approach to building a proper agentic system.

The Solution

A Cloud Run service powered by Claude Code SDK that can iteratively generate, validate, and refine React components before they reach the user's browser.


The Artifact That Started It All

When Claude introduced artifacts, I was immediately captivated. The idea that you could prompt an AI to generate a complete, functional React component was revolutionary. It felt like the future of development had arrived—until I tried to build something real with it.

I discovered a repository by Sully that had recreated Claude artifacts, and it became my foundation. What fascinated me most was how it used the original system prompt with XML tags that encouraged the model to “think”:

<thinking>
The user wants a todo list with categories. 
I should create a component with state management 
for todos and filtering by category...
</thinking>

This was before reasoning models became mainstream—a glimpse into how thoughtful prompting could dramatically improve code generation quality. The model would generate these thinking tags to build context, improving the final output without showing the reasoning to users.


The One-Shot Problem

My initial proof of concept worked… sometimes. I could prompt the system to generate a todo list component, and it would produce something that looked right. But more often than not, the component would be broken:

  • React Hook violations - The classic “Rendered more hooks than during the previous render”
  • Missing imports - Especially for icon libraries
  • State management issues - Race conditions and stale closures
  • Type errors - When TypeScript was involved

For a developer, these are solvable problems. But asking a non-technical user to debug React errors? That’s a non-starter. The experience felt like handing someone a half-assembled IKEA furniture kit without instructions.

Error: Invalid hook call. Hooks can only be called inside of the body of a function component.

The dreaded error that every React developer knows—but your users shouldn't have to.


Incremental Improvements

As models improved—particularly with GPT-4 and Claude 3.5—the success rate increased. My requirements were deliberately simple: just generate a single React component for a todo list. This constraint helped, but challenges remained:

The Icon Hallucination Problem

AI models love to hallucinate icon names. They’d confidently import CheckCircleIcon when the library only had CheckIcon. My solution was a dynamic icon component:

const DynamicIcon = ({ name, fallback = "QuestionMarkIcon" }) => {
  // JSX can't render a computed member like <iconLibrary[fallback] /> directly,
  // so resolve the component to a capitalized variable first. A plain object
  // lookup never throws, which also removes the need for try/catch.
  const Icon = iconLibrary[name] ?? iconLibrary[fallback];
  return Icon ? <Icon /> : null;
};

This gracefully handled hallucinations, but it was a band-aid on a larger problem.

The Error Message Challenge

Browser-rendered components produce minified errors. Asking users to copy-paste cryptic error messages felt wrong. I needed a better approach—one that would validate components before they reached the user.


The Agentic Revelation

Watching tools like Cursor and other agentic development environments, I realized what was missing: iteration. These tools don’t just generate code once—they refine it, test it, and fix errors automatically.

My one-shot approach was like asking a developer to write perfect code on the first try with no ability to test or debug. Real developers iterate, and AI should too.

“I want to replicate the agentic experience I’m seeing in development tools, but in the browser for end users.”


Building the Solution: Claude Code Service

The solution became clear: build a dedicated service that could:

  1. Generate components iteratively - Up to 10 turns of refinement
  2. Validate before delivery - Catch errors before users see them
  3. Learn from examples - Reference a catalog of working components
  4. Run asynchronously - Let the AI take its time to get things right
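The four requirements above boil down to a generate–validate–refine loop. Here is a minimal sketch of that loop; `generate` and `validate` are hypothetical stand-ins for the Claude Code SDK call and a syntax/type check, not real SDK functions:

```javascript
// Hypothetical refinement loop: generate a component, validate it, and feed
// any errors back into the next generation attempt, up to maxTurns times.
async function refineUntilValid(prompt, { maxTurns = 10, generate, validate }) {
  let code = await generate(prompt, /* feedback */ null);
  for (let turn = 1; turn <= maxTurns; turn++) {
    const errors = await validate(code);
    if (errors.length === 0) {
      return { code, turns: turn, ok: true };
    }
    // The validation errors become feedback for the next attempt
    code = await generate(prompt, errors);
  }
  return { code, turns: maxTurns, ok: false };
}
```

The key design point is that validation failures never leave the loop; they become context for the next turn, the same way a developer reads a compiler error before editing again.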

Architecture Overview

┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Client    │────▶│  Cloud Run API  │────▶│  Claude Code    │
│  (Browser)  │     │   (Express.js)  │     │     (SDK)       │
└─────────────┘     └─────────────────┘     └─────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Cloud Storage  │
                    │   (File Store)  │
                    └─────────────────┘

Key Implementation Details

The service is deliberately simple:

// Start a job
POST /jobs
{
  "prompt": "Create a todo list with priority levels and due dates"
}

// Check status
GET /jobs/:jobId
{
  "status": "processing",
  "turns": 3,
  "currentStep": "Validating component syntax..."
}

// Get the result
GET /jobs/:jobId/file
// Returns the validated React component
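On the client, consuming this API is a simple polling loop. Here is a hedged sketch; the endpoint paths match the examples above, but the `"complete"` and `"failed"` status values are assumptions, and `fetchJson` is injected so the loop can be exercised without a live service:

```javascript
// Hypothetical browser-side polling loop for the job API sketched above.
async function waitForJob(jobId, fetchJson, { intervalMs = 1000, maxPolls = 60 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchJson(`/jobs/${jobId}`);
    if (job.status === "complete") {
      // Fetch the validated component source once the job finishes
      return fetchJson(`/jobs/${jobId}/file`);
    }
    if (job.status === "failed") {
      throw new Error(`Job ${jobId} failed`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} timed out`);
}
```

Because the service runs asynchronously, the browser never blocks on the 10–30 seconds of refinement; it just polls until the component is ready.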

The magic happens in the Claude Code configuration:

const query = `
Generate a React component based on this request: ${prompt}
Rules:
- Self-contained with all imports
- Use only standard React hooks
- Include error boundaries
- Validate all props
- Test the component logic
`;

const { status } = await client.codeQuery({
  query,
  config: { maxTurns: 10 }
});

The Catalog System

One breakthrough was adding a /catalog directory of example components. Claude Code can search through these for patterns and best practices:

catalog/
├── TodoWithFilters.tsx
├── PriorityTodoList.tsx
├── KanbanBoard.tsx
└── ShoppingList.tsx

This gives the AI concrete examples to reference, dramatically improving the quality and consistency of generated components.
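One lightweight way to surface the catalog is to list its files in the prompt so the model knows where to look. This is a hypothetical sketch, not the service's actual implementation; `buildPrompt` and its parameters are names I've made up for illustration:

```javascript
// Hypothetical prompt builder that points Claude Code at catalog examples.
function buildPrompt(userRequest, catalogFiles) {
  const examples = catalogFiles.map((f) => `- catalog/${f}`).join("\n");
  return [
    `Generate a React component based on this request: ${userRequest}`,
    "Before writing code, read these working examples for patterns:",
    examples,
  ].join("\n");
}
```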


Results and Learnings

The shift from one-shot to agentic generation has been transformative:

  • Success rate: From ~60% to >95% for working components
  • Error handling: Users never see React errors
  • Iteration time: 10-30 seconds for a fully validated component
  • User confidence: No more “will this work?” anxiety

What’s Next

The MVP is deliberately minimal, but the foundation enables exciting possibilities:

  • Real-time updates via WebSockets
  • Component testing with automated test generation
  • Version history to track iterations
  • Collaborative editing for team environments
  • Performance optimization through caching

Reflections on AI Development

This journey taught me several lessons about building with AI:

  1. One-shot is rarely enough - Real-world applications need iteration
  2. Context is king - Examples and references dramatically improve output
  3. Validation matters - Never trust, always verify
  4. User experience trumps technology - Hide complexity, deliver value
  5. Simple infrastructure wins - Start minimal, expand based on needs

The evolution from artifacts to agents represents a fundamental shift in how we think about AI-assisted development. It’s not about getting it right the first time—it’s about creating systems that can refine, improve, and deliver reliable results.

Try It Yourself

The Cloud Run service code is designed to be minimal and deployable. With just an Anthropic API key and a Google Cloud project, you can have your own agentic component generator running in minutes.


Code and Resources

The implementation focused on simplicity and reliability:

  • No complex databases - Just Cloud Storage for persistence
  • No Redis queues - Simple in-memory job tracking
  • No authentication - Add it when you need it
  • No fancy frameworks - Just Express.js and the Claude SDK

This minimal approach means you can understand the entire system quickly and extend it based on your specific needs. Sometimes the best architecture is the one you don’t have to explain.

The journey from Claude artifacts to a production-ready agentic system shows how AI development is evolving. We’re moving beyond simple prompts to building intelligent systems that can iterate, learn, and deliver consistent results. The future isn’t just about AI that can code—it’s about AI that codes like a developer: iteratively, thoughtfully, and with the end user always in mind.