From Fragile to Flexible: Rebuilding Our Slotting Brain from Scratch

Written by Evgenii Korobkov · Jul 29, 2025 · 23 min read

đŸ§© Introduction

Building the best milkman on Earth requires thinking fast. You come up with a bold idea, trial it with a PoC, convince the stakeholders – and bam! Your service is now helping hundreds of people across dozens of warehouses decide where to put each product.

That’s how the original Slotting Service (SLS) came to life: a system designed to assist with making thousands of tiny decisions every day, all of which add up to something huge. We figure out where each product should live within a warehouse so that warehouse workers can grab it faster, restock it smarter, and move through their shifts with fewer steps and less hassle. Whether it’s placing fast-moving items close to the outbound gates or keeping heavy items at waist height to avoid injuries, every decision is made with purpose. It’s all about speed, safety, and efficiency – and SLS became the brain behind it.

As the years passed, the service began to show its age. It started to resemble an old man – once sharp and nimble, now slower, fragile, and set in his ways. It groaned under pressure when more data came in. A simple code change could throw it off balance. It was no longer fit to keep up with the demands of a rapidly evolving warehouse landscape. For example, implementing a new capacity calculation strategy for a subset of products required careful tweaks inside a thousand-line module where all the potential options were mixed together.

Developers began to feel the burden. Investigating a bug meant navigating a maze of assumptions, legacy logic, and unclear boundaries. The service had become not only fragile but frustrating.

We mean it when we say that the SLS was an old and slow service

Finally, the team made its case, and leadership agreed: it was time for a new service. Hooray! Pop the champagne!

But after the celebration, a sobering question emerged:

We’re building a new service. Now what?

In this post, we’ll walk through how we approached designing the new Warehouse Slotting Service (WSLS): the decisions we made, the traps we avoided (or fell into), and what we learned along the way.

More shiny on the surface, much more flexible behind the scenes. A sneak peek at the end result of WSLS

🌀 The Design Spiral

Designing WSLS wasn’t straightforward. Initially we believed that it would be a simple task. After all, we had a working slotting service already. This was just a cleaner, more flexible version, right?
That illusion lasted about a week.

📩 When the Domain Fought Back

We tried modeling it the straightforward way: define core entities (Article, Location, Assignment, etc.), draw some arrows, and clean it up in post.

But nothing stayed clean.

Every concept blurred into three others. For instance:

  • A Candidate Location wasn’t a real location – it was a possible future, maybe dependent on other futures.
  • A Task wasn’t a work unit – it was a container for trees of dependency-ordered moves.
  • The “same” warehouse had multiple simultaneous truths: what planners believed, what the main Warehouse Management System held, and what a human walking the aisles would see.

We weren’t designing a system. We were negotiating ontologies.

Each time we thought we’d pinned things down, new edge cases or cross-cutting concerns would emerge. To make progress, we started drafting proposals – rough cuts of what individual services might look like, how they might relate, and where boundaries could sit. Each draft clarified some parts while raising new questions about ownership, state, and sequencing. The more we iterated, the more we realized how much we didn’t know yet.

So we changed tactics.

One of the early sketches of service interaction within the system. Guess how much of it made it to production?*

🐣 The Birdhouse Breakthrough

When regular meetings stopped being helpful, we literally left the building. If the problem was stuck in our heads, maybe the solution needed a change of air. We took a day off from the usual rhythm and drove out to a quiet spot somewhere in the countryside near Amsterdam. The place was modern enough, with big windows and clean whiteboards, but the air conditioning picked the worst possible day to give up. By mid-morning, it felt more like a greenhouse than a meeting room. Still, it had what we needed: four walls, a bit of quiet, and no distractions.

We split into pairs and revisited the services we’d loosely sketched in earlier sessions. This time, each pair took ownership of a single service. It wasn’t a pure blank slate. We carried baggage from earlier brainstorms – assumptions, mental models, rough ideas – and poured them out onto the tabula rasa of each service. The goal was to extend our knowledge of the services and understand where their boundaries really lay. It was structured brainstorming with permission to forget just enough to see things differently.

How it felt explaining your service to your peers

Somewhere between the heat, the whiteboards, and a steady flow of scribbled diagrams, something clicked. The mess of overlapping concerns began to settle. Concepts that had felt blurry in isolation made more sense when framed by neighboring responsibilities. By the end of the day, we had a rough MVP of the service design and, more importantly, a shared mental map of how the system held together. It was the first time the whole thing felt buildable.

đŸ€ Defining the Edges

Once the fog started to lift, we still weren’t ready to build. We had emerging concepts and cleaner boundaries, but it was time to test whether they held up under pressure – not in theory, but through actual flows.

So we started working the system.

We mapped out core user journeys like loading slotting, making a slotting move, canceling changes, submitting assignments, and starting a task. If a concept didn’t show up clearly in a diagram, it probably wasn’t ready. If two services needed the same data for different reasons, we had to ask why and who should really own it.

The flow diagram looked huge

We created sequence diagrams for each flow, reviewing and iterating on them until the handoffs felt clean. We defined each component’s role in terms of behavior and data: what it did, what it owned, and how it talked to others. We turned these into RFCs and refined them in parallel, constantly checking for friction at the seams.

From this, the responsibilities crystallized.

The Assigner service mastered the slotting states. It handled assignments from the moment of creation to execution on the floor. It provided structure and sequence, making the rest of the system feel grounded.

The Validator service enforced rules like “don’t put glass products too high” or “don’t tear down packages if we can pick from them”. It ran checks for strict correctness, soft warnings, and hard violations. This was a huge leap from the old service, where such rules lived in scattered corners, if at all. Now they shared a proper home, and we could flexibly toggle specific checks per warehouse or adjust their configuration with ease. If Assigner was the backbone, Validator was the gut check.

The Tasker service managed the choreography. It translated updates from one state to the next and propagated them to downstream systems.
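To make that flexibility concrete, here’s a minimal sketch of what a toggleable, per-warehouse check could look like. Everything in it – the rule names, the config shape, the severity levels – is illustrative rather than a peek at the real Validator code.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"      # soft: surfaced to the planner, but not blocking
    VIOLATION = "violation"  # hard: blocks the assignment


@dataclass
class CheckResult:
    rule: str
    severity: Severity
    message: str


# Hypothetical per-warehouse configuration: which checks run, and with what thresholds.
WAREHOUSE_CONFIG = {
    "AMS-1": {"max_glass_height_cm": 120, "enabled_checks": {"glass_height"}},
    "UTR-2": {"max_glass_height_cm": 150, "enabled_checks": {"glass_height", "no_tear_down"}},
}


def check_glass_height(assignment: dict, config: dict) -> CheckResult | None:
    """Flag glass products placed above the configured height."""
    if assignment["is_glass"] and assignment["location_height_cm"] > config["max_glass_height_cm"]:
        return CheckResult(
            rule="glass_height",
            severity=Severity.VIOLATION,
            message=f"Glass product placed at {assignment['location_height_cm']} cm",
        )
    return None


def validate(assignment: dict, warehouse: str) -> list[CheckResult]:
    """Run only the checks enabled for this warehouse."""
    config = WAREHOUSE_CONFIG[warehouse]
    checks = {"glass_height": check_glass_height}
    results = []
    for name, check in checks.items():
        if name in config["enabled_checks"]:
            result = check(assignment, config)
            if result:
                results.append(result)
    return results
```

The point isn’t the specific rule; it’s that adding a check or switching it off for one warehouse becomes a configuration change instead of an archaeology dig through a thousand-line module.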

These services didn’t emerge fully formed. They were shaped by the flows they needed to support. A concept like “submit slotting” wasn’t just a button in the UI; it carried a trail of meaning behind it. It meant validation had to be strict, transitions atomic, and task generation downstream-safe – no edge cases left to chance.
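In code, that trail of meaning roughly translates into an ordering guarantee. Here’s a sketch of the shape, reusing the illustrative validate and Severity from above; db and tasker are stand-ins for this post, not real WSLS interfaces:

```python
def submit_slotting(assignments: list[dict], warehouse: str, db, tasker) -> None:
    """Illustrative submit flow: validate strictly, transition atomically, then hand off."""
    # 1. Strict validation: any hard violation stops the whole submission.
    violations = [
        result
        for assignment in assignments
        for result in validate(assignment, warehouse)
        if result.severity is Severity.VIOLATION
    ]
    if violations:
        raise ValueError(f"Cannot submit: {len(violations)} violation(s)")

    # 2. Atomic transition: either every assignment moves to TARGET, or none do.
    #    `db` is a stand-in for whatever persistence layer sits underneath.
    with db.transaction():
        for assignment in assignments:
            db.update_state(assignment["id"], new_state="TARGET")

    # 3. Downstream-safe task generation: only after the state change is committed.
    tasker.generate_tasks(assignments)
```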

We could finally trace a full journey from user intent to system action to physical movement. But even then, something still felt tentative. Our diagrams were cleaner. Our boundaries were tighter. But did the model actually work?

⚙ When the Model Clicked

There was a moment when things finally began to click. Not all at once – more like puzzle pieces locking into place, one at a time. The flows made sense. The data models weren’t fighting us anymore. The roles of each service had become coherent, and we could talk about system behavior without running into mismatched assumptions every ten minutes.

The clearest signal? We could finally fully embrace this layered architecture diagram:

Each layer in it – from Angular frontend to SQL state – represented a set of contracts we could finally define, separating our concerns:

  • The FE team knew exactly what data to expect, and from where.
  • The API surface had stabilized enough to be documented and versioned.
  • Each route mapped cleanly to one or more service responsibilities.
  • Core services and resource owners could be implemented independently and tested in isolation.
  • Repositories respected the shape of the data models instead of working around them.
  • And the state layer no longer had to carry the burden of business logic.
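As a rough illustration of that layering – with every name and signature invented for this post – each layer only talks to the one below it, and the business rule lives in the service rather than in SQL:

```python
class LocationRepository:
    """State layer access: persistence only, no business logic."""

    def __init__(self, connection):
        self.connection = connection

    def occupant(self, location_id: str) -> str | None:
        row = self.connection.execute(
            "SELECT article_id FROM assignments WHERE location_id = ?",
            (location_id,),
        ).fetchone()
        return row[0] if row else None

    def assign(self, article_id: str, location_id: str) -> None:
        self.connection.execute(
            "INSERT INTO assignments (article_id, location_id) VALUES (?, ?)",
            (article_id, location_id),
        )


class AssignerService:
    """Core service: the business rule lives here, not in the database."""

    def __init__(self, locations: LocationRepository):
        self.locations = locations

    def assign_article(self, article_id: str, location_id: str) -> None:
        if self.locations.occupant(location_id) is not None:
            raise ValueError(f"Location {location_id} is already occupied")
        self.locations.assign(article_id, location_id)


# API route: maps a request onto exactly one service responsibility.
def post_assignment(body: dict, service: AssignerService) -> dict:
    service.assign_article(body["article_id"], body["location_id"])
    return {"status": "created"}
```

Swap the repository for an in-memory fake and the service can be tested in isolation – exactly the property the layering was meant to buy us.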

Once this emerged, we could finally dig into writing real contracts: endpoints, input/output schemas, edge case handling, and test expectations. And perhaps more importantly, we could explain the system – to each other, to new joiners, and to stakeholders.

This was the moment we shifted from modeling reality to building with confidence.

đŸ› ïž Tips, Tricks, and Pitfalls: What Worked and What Didn’t

Every project teaches you something. WSLS taught us a lot, sometimes the hard way. Here are the lessons we’d pass on to any team building a new service in a complex domain.

Know your users. All of them.

Our first rollout at a warehouse looked like a success. No errors, no complaints, and nothing unexpected in the logs during working hours – a perfect launch, at least on paper. But then came the evening shift.
In the old setup, 95% of usage came from product slotters – the planners working during the day. Operators on the evening shift simply interacted with the output, usually outside of the old system. In WSLS that changed: a key step in the process – previewing and accepting instructions before pushing changes to the floor – moved inside the system. This fell directly under the warehouse operators’ scope, and we hadn’t accounted for them.

When your PO is happy that your system didn’t work properly

The oversight cost us more than a few bug fixes. We needed to escalate support, patch workflows on the fly, and rethink how we structured our release planning. What we took away from that experience was simple but lasting: understanding your core user group isn’t enough. You need to know the full cast, especially those whose use cases aren’t visible in your daily test cycle.

Brainstorms are gold. Until they go stale.

One of the most productive days on this project wasn’t spent coding or debugging. It was spent away from our desks – laptops open, diagrams flying, deep in a birdhouse offsite.

That session let us question our assumptions, explore the unknowns, and sketch out the actual shape of the system with just enough structure to stay focused, and few enough distractions to actually think. It was a breakthrough. Compare that to our earlier pattern of weekly brainstorms, which slowly lost focus and energy over time. The same topics kept resurfacing, the same ideas kept getting rehashed, and forward momentum gave way to fatigue.

We realized that the best brainstorms aren’t just about time or frequency, but timing and intent. Used well, they unlock clarity. Used routinely, they create noise. Brainstorms are like seasoning – powerful, but best used sparingly or under the hand of a great cook.

Think big. Build small.

We started with bold ideas: guided workflows on handheld scanners, a rule engine that planners could tweak on the fly, and more. These weren’t far-fetched – just a clear picture of where the system could eventually go. But good systems don’t start that way.

An MVP should prove the value, not showcase the vision. Ours used matplotlib to render instructions on how to implement the changes to assignments on the floor – a direct copy-paste from the old service. It was painfully slow and not fit for scale, but enough to test core assumptions. That restraint paid off: six months later, with real data and confidence, we swapped in an HTML renderer that was 24x faster without breaking anything.

Ambition is healthy. But build small first, so your system can grow safely when the time comes.

Define contracts. Not just code.

At Picnic, we lean heavily on component tests. They are fast, focused, and work well with our pace of iteration. We use the human-readable format of Python’s behave framework, which not only speeds up development but also makes tests accessible to non-engineers. At least in theory.

In practice, we often chose to move fast rather than spend time with users aligning on expectations up front. Tests were written after the fact, more as safety nets than shared agreements. If we had involved stakeholders earlier and used tests as contracts, we might’ve spotted critical flow gaps sooner, like the one we missed during the warehouse rollout.
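For a taste of what “tests as contracts” could look like, here’s a minimal behave-style example. The scenario wording, step names, and context.client are invented for this post – the point is that the feature text reads like an agreement a planner could review, not like a unit test:

```python
# features/submit_slotting.feature (Gherkin, readable by non-engineers):
#
#   Scenario: A planner cannot submit a slotting with hard violations
#     Given a draft assignment that places a glass product at 180 cm
#     When the planner submits the slotting
#     Then the submission is rejected
#     And the planner sees a "glass_height" violation
#
# features/steps/submit_slotting.py (the step implementations):
from behave import given, when, then


@given("a draft assignment that places a glass product at {height:d} cm")
def step_draft_assignment(context, height):
    context.assignment = {"is_glass": True, "location_height_cm": height, "state": "DRAFT"}


@when("the planner submits the slotting")
def step_submit(context):
    # `context.client` stands in for whatever test client the suite wires up.
    context.response = context.client.submit([context.assignment])


@then("the submission is rejected")
def step_rejected(context):
    assert context.response.status_code == 422


@then('the planner sees a "{rule}" violation')
def step_violation_visible(context, rule):
    assert rule in [v["rule"] for v in context.response.json()["violations"]]
```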

We also don’t have full end-to-end tests for the flow, and that was a deliberate choice. E2E tests come with significant complexity and cost, and not every system needs them. Observability, manual QA, and strong component test coverage carried us far. That said, as our system becomes more interconnected, we’re beginning to see this as a valuable piece of tech debt worth addressing.

We’re learning to treat tests not just as technical checks, but as tools for communication. That shift isn’t about writing more tests. It’s about writing the right ones, at the right time, with the right people.

Define your language.

Obey the DDD goat

It’s not about choosing Python over Java, or vice versa. It’s about aligning terminology.

A “shelf-ready article” can mean one thing to a developer, another to a warehouse planner, and something else entirely to a supply chain analyst, if there’s no shared vocabulary in place. We learned that the hard way, repeatedly misunderstanding each other despite good intentions and detailed specs.

One of the biggest improvements over the old service is that we can now “trace” the full lifecycle of assigning a product to a location: from a draft created by a planner, through a target awaiting implementation, to a factual physical placement. We went through 69 versions of a glossary page where we, among other things, tried to define “assignment” and its states as unambiguously as possible. Painful? Absolutely. But necessary. That single word touched nearly every component in the system, and without a shared definition, we were building software for three different realities.

That notorious page

Agreeing on language didn’t just clean up documentation. It changed how we structured our models, how we tested our rules, and even how we explained failures. Sharing a language turned into sharing ownership – and that’s when things really started to work.
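One way to keep such a glossary honest is to encode it directly, so the code refuses anything the vocabulary doesn’t allow. A minimal sketch – the state names follow the draft → target → physical placement description above, but the transition table is illustrative, not the 69th revision of our page:

```python
from enum import Enum


class AssignmentState(Enum):
    """Shared vocabulary: each member carries the definition the whole team agreed on."""

    DRAFT = "draft"    # created by a planner, not yet visible on the floor
    TARGET = "target"  # accepted and awaiting physical implementation
    PLACED = "placed"  # the product is factually at the location


# The only legal moves through the lifecycle (cancelling a target returns it to draft).
TRANSITIONS = {
    AssignmentState.DRAFT: {AssignmentState.TARGET},
    AssignmentState.TARGET: {AssignmentState.PLACED, AssignmentState.DRAFT},
    AssignmentState.PLACED: set(),
}


def transition(current: AssignmentState, new: AssignmentState) -> AssignmentState:
    """Refuse any move the glossary does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"Cannot go from {current.value} to {new.value}")
    return new
```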

Perfect is the enemy of good.

We know this one. We say it a lot. But we’re still learning how to live it.

There’s a temptation to get things “right” the first time. That pressure makes it easy to delay delivery in favor of “just one more improvement,” and to overlook debt that builds up beneath the surface. The challenge isn’t just technical, it’s cultural. The more success we have, the harder it becomes to justify shipping something rough around the edges, even when it’s safe to do so.

We’ve found ourselves struggling with tech debt lately. It piles up quietly, often disguised as small decisions or shortcuts taken in good faith. What we lack is a reliable way to measure it, something that helps us see when the cost of change is creeping up or where complexity is starting to calcify. Without that, it’s hard to have honest conversations about when to invest in cleanup, and when it’s okay to let it ride.

So we’re working on it. Not just to make things perfect – but to make it safe not to.

Do not optimize too early.

We didn’t just learn this late. We’re still learning it now.

In one of our early versions, we introduced a cache to make the system feel faster. It worked, sort of. But the moment we needed to evolve the logic underneath, we realized how tightly we had bound ourselves to a shortcut that was meant to be temporary. What started as a performance win quickly became a maintenance trap.

Looking back, the cache wasn’t the problem – the timing was. We added it before we had a full picture of how the system would be used, before the logic was stable, and before we had proper instrumentation to tell if performance was actually an issue. It was a case of building for imagined scale, not real needs.

The result was avoidable complexity, and it made future iterations harder than they needed to be. Now, we remind ourselves: if your MVP needs a cache to function, that’s not a performance problem – it’s a design problem. And optimization should follow understanding, not precede it.

Measure what matters. Early.

One of the best ways to avoid getting surprised by your system is to make it tell you how it’s doing, before someone else does.

Make your system make the first move

We learned that the hard way. We shipped a seemingly harmless improvement to our logging, which quietly turned into a flood. The system started slowing down. But no alerts went off. Instead, users started reporting lag. Then came a message from our observability team: we’d blown through our entire monthly log budget in just a few hours. Turns out, the PR had turned our log stream into a firehose, and it overloaded both our system and our logging infrastructure.

That’s the thing about observability: it’s not just a nice-to-have, it’s your smoke detector. The earlier you wire up good alerts, the more you can steer with confidence, both technically and operationally. Are assignments being generated at the right rate? Are suboptimal placements decreasing over time? Is performance degrading silently under load? These aren’t questions you want to answer when the fire starts.
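In practice, letting the system speak up first can start with a handful of domain-level metrics and alerts on top of them. Here’s a sketch using prometheus_client as an example stack – the metric names, labels, and the suboptimal flag are made up for this post, not WSLS’s actual setup:

```python
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Domain-level metrics: they answer business questions, not just "is the CPU busy".
ASSIGNMENTS_CREATED = Counter(
    "wsls_assignments_created_total",
    "Assignments created, by warehouse",
    ["warehouse"],
)
SUBOPTIMAL_PLACEMENTS = Gauge(
    "wsls_suboptimal_placements",
    "Current number of placements flagged as suboptimal",
    ["warehouse"],
)
SUBMIT_LATENCY = Histogram(
    "wsls_submit_seconds",
    "Time spent handling a slotting submission",
)


def handle_submission(assignments: list[dict], warehouse: str) -> None:
    with SUBMIT_LATENCY.time():
        ASSIGNMENTS_CREATED.labels(warehouse=warehouse).inc(len(assignments))
        flagged = [a for a in assignments if a.get("suboptimal")]
        SUBOPTIMAL_PLACEMENTS.labels(warehouse=warehouse).set(len(flagged))
        # ... the actual submission logic would go here ...


if __name__ == "__main__":
    # Expose /metrics so the alerting stack can scrape it and raise the alarm first.
    start_http_server(8000)
```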

Don’t wait until your users notice. Let your system speak up first.

🔚 Conclusion

Building a new service isn’t just about writing code. It’s about reshaping workflows, making implicit knowledge explicit, and getting a group of people to see the same picture at the same time.

WSLS taught us that thoughtful design pays off, but only if you’re willing to stay flexible when reality shows up with a plot twist. The glossary drafts, system diagrams, and early flow mappings gave us a solid starting point. The MVP constraints (and the times we broke them) helped keep us focused.

It also wouldn’t have been possible without the solid platform foundations we stood on. The ability to spin up a Python service quickly, provision infrastructure with confidence, and rely on stable tooling meant we could focus on the hard stuff – the domain, the flows, the humans.

Rolling out WSLS was a coordinated dance: multiple warehouses, different systems, staggered testing, lots of Slack channels. But it worked. And even though it’s “just a service,” it now sits at the heart of a daily operation – helping real people work more safely and efficiently.

The biggest impact, though, may be what it’s unlocked.

In a fast-paced environment like Picnic’s – especially within the warehouse domain, where new types of warehouses and fulfillment models are constantly being explored (like our Automated Fulfilment Center) – iteration speed is everything. With WSLS in place, we’ve gained a stable foundation to try new ideas faster. Whether it’s testing alternative placement logic, refining workflows, or scaling to new formats, we’re no longer rebuilding the basics each time. We’re iterating at the business level and not just the technical one.

In the end, every service is a mini-startup. You need a vision, a team, a plan, and a launch. Success didn’t come from nailing it up front. It came from building something that could adapt, improve, and grow – just like the warehouses it serves.

*Approximately 30% of the early sketches made it to production, which I would consider a strong outcome.




Want to join Evgenii Korobkov in finding solutions to interesting problems?

From Fragile to Flexible: Rebuilding Our Slotting Brain from Scratch - Picnic Careers