From a Box to an Intelligence Layer

Building something for myself and discovering what actually needs fixing.

Feb 14, 2026

I’ve had this thought for a while now.

I’ve been building products for others for more than eight years, almost nine. Maybe it’s time to start building something for myself. It doesn’t have to be big or complex, but it should make me genuinely happy while building it.

Even though we’re all trying to speed up everything and launch our next app in 45 minutes, I’m not sure I want that. Maybe that’s one of the reasons I put this idea on hold for a while.

I started sketching out this idea months ago and, after questioning it a lot, I left it sitting in a corner.

But a note I wrote last week brought it back to me.

Before going too far, I needed to know: would this actually work?

What I’m Actually Building

I’m building a tool that lets teams capture thoughts, ideas, and opinions throughout the sprint and automatically synthesizes them before the retrospective.

Not a digital whiteboard.
Not another sticky note clone.

A lightweight space to drop notes in real time, paired with an intelligence layer that surfaces themes, patterns, and suggested actions before the meeting even starts.

The goal isn’t to replace retros.

It’s to make them sharper.

To shift time away from clustering and organizing, and toward actual problem-solving.

That’s the direction at least.

But it didn’t start there.

The First Idea Was Much Simpler

The original idea wasn’t AI.

It was a box.

A simple space where I could drop notes during the sprint. Not during retro. Not when everyone else was typing. Just when something happened.

→ A deployment that felt chaotic.
→ A code review that worked unusually well.
→ A moment of friction I couldn’t quite name yet.

I didn’t want structure. I didn’t want templates. I wanted something low-pressure. A place to capture thoughts while they were still fresh, before they got rationalized away.

Because that’s what happens.

By retro time, we mostly remember the catastrophic failures and whatever happened in the last day or two. But what about the subtle patterns that could actually improve our workflow? What about what happened in the middle of the sprint? Those take real effort to recall, and most of the time they simply disappear.

And I’ve sat through enough retros to know this isn’t a facilitation problem.

It’s a memory problem.

The Thing About Retros

I didn’t start this to create another retro tool.

I started because I kept noticing the same pattern across multiple teams and companies over the years: twenty minutes of sticky note writing, ten minutes of clustering and organizing, ten minutes deciding what to discuss, and maybe, if we’re lucky, ten to fifteen minutes actually talking about solutions.

Retro sessions can be structured differently, of course, but most of the time the real value comes from those last ten to fifteen minutes. Everything else is ceremony.

And the problem isn’t the meeting.

It’s that we wait until the meeting to do all the thinking.

Throughout a sprint, things happen.

→ A CI pipeline breaks for the third time.
→ Someone improves the review process.
→ Someone fixes a critical bug.
→ A deployment creates tension.
→ A miscommunication leads to a rollback of an entire feature.

We notice these moments.

Then we forget some of them.

Stress-Testing the Idea

At first, I thought the box alone would fix this.

Then I started questioning it.

Would people actually use a sprint note box?
Would I?

Or would it become another well-intentioned tool that only gets attention five minutes before the meeting?

And even if notes were captured, they would still be messy. Emotional. Context-light. Duplicated. Contradictory.

That’s when I realized something important.

Raw notes are just ideas.

The real value wouldn’t be in capturing them.

It would be in synthesizing them.

Why I Built a POC

Before designing flows or thinking about integrations, I needed to validate one thing:

If I dump messy sprint notes into an AI model, can it extract something teams would actually discuss?

So I simulated a sprint.

I created realistic notes from different “team members.” Mixed technical details with frustration. Added repetition on purpose.

Here’s what the raw notes looked like:

Then I wired up a simple AI integration.

A single API route calling Claude with a structured prompt. Nothing fancy. Just clear instructions about grouping themes, citing sources, and suggesting concrete actions.
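As a minimal sketch of what "clear instructions" could look like in TypeScript: the `SprintNote` shape, the prompt wording, and the `buildUserMessage` helper below are assumptions for illustration, not the actual code behind the POC. The idea is only that each note carries an id, so the model has something to cite.

```typescript
// Hypothetical note shape; the post does not show the real schema.
interface SprintNote {
  id: string;
  author: string;
  text: string;
}

// Illustrative system prompt covering the three instructions mentioned above:
// group themes, cite sources, suggest concrete actions.
const SYSTEM_PROMPT = [
  "You synthesize sprint notes before a team retrospective.",
  "Group related notes into themes.",
  "Cite the ids of the notes that support each theme.",
  "Suggest one concrete action per theme.",
].join("\n");

// Turns the raw notes into a single user message, one note per line,
// prefixed with its id so it can be referenced in the answer.
function buildUserMessage(notes: SprintNote[]): string {
  const lines = notes.map(
    (note) => `[${note.id}] ${note.author}: ${note.text.trim()}`
  );
  return `Sprint notes:\n${lines.join("\n")}`;
}
```

The API route would then send `SYSTEM_PROMPT` and the result of `buildUserMessage(notes)` to the model; the call itself is left out here because it depends on the SDK and model in use.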

Instead of manually testing prompts in a playground, I embedded the synthesis directly into the app. Click “Generate Summary,” send the notes to the model, render the result.

This is what came back:

The result wasn’t perfect.

But it wasn’t generic either.

It can definitely be improved.

But it delivered something workable:

It grouped deployment delays into a clear theme.
It connected context switching across different roles.
It separated wins from friction.
It suggested concrete actions.

That’s when I knew the intelligence layer was viable.

Not because it looked impressive.

Because it felt usable and useful.

What Actually Needs Intelligence

Most retro tools are either digital whiteboards or structured templates. They give you a place to write sticky notes.

They don’t give you insight.

The intelligence layer doesn’t really exist yet.

That’s the layer I validated in the POC.

And that’s when this stopped being “a box for notes” and started becoming something else.

The Tech Stack (And Why It Matters)

I went with Next.js, Supabase, and the Claude API.

Not exotic, but deliberate.

Next.js because I wanted one codebase for frontend and API routes. No context switching between services.

Supabase because PostgreSQL with built-in auth and real-time capabilities meant I could focus on the product, not infrastructure.

And the AI layer sits exactly where it should: behind a single API route.

No orchestration framework.
No complex agent setup.
Just structured input, structured output, and clear boundaries.

The interesting part wasn’t choosing a model.

It was designing the contract between human input and structured team insight.

→ The system prompt defines expectations.
→ The output format enforces clarity.
→ The UI renders it without decoration.
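One way to picture that contract is a typed output shape plus a small validator, sketched below in TypeScript. Everything here is an assumption for illustration: the `Theme` and `Synthesis` field names, the wins/friction split as top-level keys, and the choice to ask the model for JSON at all. The point is that the UI only renders what passes the check.

```typescript
// Hypothetical output contract; field names are illustrative only.
interface Theme {
  title: string;
  sourceNoteIds: string[]; // ids of the notes the theme is based on
  suggestedAction: string;
}

interface Synthesis {
  wins: Theme[];
  friction: Theme[];
}

// Runtime check that an unknown value has the shape of a Theme.
function isTheme(value: unknown): value is Theme {
  if (typeof value !== "object" || value === null) return false;
  const t = value as Record<string, unknown>;
  const ids = t["sourceNoteIds"];
  return (
    typeof t["title"] === "string" &&
    typeof t["suggestedAction"] === "string" &&
    Array.isArray(ids) &&
    ids.every((id) => typeof id === "string")
  );
}

// Parses the model's raw text and returns null when it is not valid JSON
// or does not match the contract, so the UI never renders a malformed result.
function parseSynthesis(raw: string): Synthesis | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const wins = d["wins"];
  const friction = d["friction"];
  if (!Array.isArray(wins) || !Array.isArray(friction)) return null;
  if (!wins.every(isTheme) || !friction.every(isTheme)) return null;
  return { wins, friction };
}
```

Returning `null` instead of throwing keeps the failure case explicit at the call site: the route can retry or show a plain error rather than a half-rendered summary.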

The model will evolve, but the core idea will remain the same: a shared space for conversation.

What I Learned About Product Decisions

The first version was bare-bones: add notes, generate synthesis, done.

Then I started thinking more about it.

The tool shouldn’t replace retros. It should enhance them.

I briefly considered going fully async. Skip the meeting. Just read the synthesis and vote on action items.

More efficient.

But, in my opinion, wrong.

Retros aren’t just about identifying problems. They’re about psychological safety. Shared context. The moment someone says, “I felt that too.”

You can’t automate that.

So the product evolved.

Use AI to eliminate the busywork.
But keep the human conversation.

Before: 40 minutes writing and grouping notes, 20 minutes of discussion.
After: 10 minutes writing additional notes, 30 to 40 minutes of discussion.

That’s the goal.

Where This Goes Next

Phase 1 validated that AI synthesis works.

Phase 2 adds the template system and polish. Better synthesis. Different themes for different team preferences. Better rendering. Better mobile experience.

Phase 3 brings third-party tool integrations. Capture notes where teams already work. Post synthesis before the meeting.

Phase 4 gets more interesting: tracking patterns across sprints. Identifying recurring issues that never get resolved. Showing whether action items actually get done.

But right now, I’m focused on one question:

Do team members actually capture notes during the sprint, or do they still wait until retro time?

Because that behavior change matters more than any feature.

What This Shift Means to Me

This isn’t my first side project.

But it’s the first one I walked away from and came back to.

That pause mattered.

The first time, I was solving a problem I understood.
The second time, I was asking better questions.

Not just “Can AI improve retros?”
But “What part of retros actually needs fixing?”

Somewhere along the way, I stopped chasing something technically impressive and started focusing on something genuinely useful.

RetroBox sits in that space now.

Smart enough to handle the busywork.
Humble enough to know that AI can’t replace a team talking to each other.

I’ll keep building.
And I’ll keep documenting what I learn.

Until next time,
Stefania

Thanks for sticking around until the end! If you found this post helpful, I’d appreciate it if you’d share it. 🫶

Currently in Phase 2. If you want to follow along or try it when it’s ready, I’ll share updates here.
