Taming the baratsuki

When I am asked to describe what changed when we added kanbans to our work practices, I usually start by explaining the difference between driving the pace in the office with a due date (« this piece of work should be done by Friday afternoon ») and with a start date (« you should start working on this piece of work on Tuesday »).

In the first context, the not-so-subtle advice is that you can work overtime or over the weekend to finish it off. The manager will only see it on his desk early the following Monday morning anyway. If the work is not there, he will unleash the compliance spirit and try to find out who is to blame.

In the second context, an andon is triggered as early as Tuesday if the work hasn’t started. The chain of help is activated and the manager should be at hand, trying to understand the problem, its sources and its consequences, and then trying to get everything back in shape before the customer is impacted: he has until Friday afternoon to mitigate the problem.
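The start-date rule can be sketched in a few lines of Python. This is only an illustration, not the tool we use: the `Card` class, the `andon_alerts` function, the card titles and the dates are all made up for the example, and it assumes the check runs at the end of each working day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    """A hypothetical kanban card driven by a start date rather than a due date."""
    title: str
    start_date: date        # day the work is supposed to begin
    started: bool = False   # set to True once someone has picked the card up


def andon_alerts(cards, today):
    """Return the cards that should have been started by `today` but were not."""
    return [c for c in cards if not c.started and c.start_date <= today]


# Invented sample board; 2024-01-09 is a Tuesday.
cards = [
    Card("iOS deployment", start_date=date(2024, 1, 9)),
    Card("Web fix", start_date=date(2024, 1, 9), started=True),
    Card("Newsletter", start_date=date(2024, 1, 11)),
]

# Tuesday evening check: only the card that was due to start and did not is flagged.
late = andon_alerts(cards, today=date(2024, 1, 9))
print([c.title for c in late])
```

The point of the sketch is that the alert fires on the start date, which leaves the rest of the week to react, whereas a due-date check would only fire once the customer is already impacted.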

The role of the manager changes completely. The « power over » is gone. In the words of Masaaki Imai in his Strategic Kaizen, he becomes an entropy fighter:

Shop floors are fraught with abnormalities that disrupt smooth flow. Every time it happens, it is management’s task to bring it back to normalcy. Even when no problem seems to exist and everything seems to be under control, one should be reminded that anything and everything that goes on the shop floor is destined to deteriorate on its own if left alone.

And since no Lean concept would be entirely adequate without its Japanese word, Masaaki Imai introduces baratsuki (ばらつき), i.e. scattering or dispersion. In Nassim Nicholas Taleb’s tradition of Via Negativa, it becomes taming the baratsuki.

Baratsuki Control

The Lean tools become a means to ensure the entire team can produce good quality work. And kanbans are no exception.

Missed a kanban? Asking why five times...

Me — Did you manage to make any progress last night on your first deployment for the iOS app?

X — Unfortunately not. And I’m pretty sure I'm not going to be able to have another go today.

Me — But I thought your code was fine and your team leader was happy with the results…

X — Unfortunately Apple won’t let us deploy the end result to the App Store. Apparently because the version numbers for Xcode and the SDK are too low. Anyway, that’s the current blocking message. And it's always possible that a new error message will appear later in the process.

Me — Can you remind me why we’re discovering this problem when there are already 5 kanbans to deploy?

X — Because I've never put any iOS-related stuff into production before, so this is the first time I've seen these errors. This exact setup (hardware and software) worked last year with the last intern, but apparently Apple has decided it isn’t good enough anymore.

Me — But why are we discovering this today? When there are already 5 kanbans that are "finished". Why wasn't it discovered just after the first ticket was ready for deployment?

X — Because when a kanban is finished, I just push it to the main repository and stop there. It’s very different from the continuous deployment we use for the web-related stuff. Since those 5 improvements were quite small, I thought I would finish them first and then send everything live in one go. And that’s when the errors appeared.

Me — I'm not sure why you didn't push your iOS work to production after each kanban.

X — Well, I'm sure it's not a good idea to have the App Store update the application 5 times in 2 days. That's why I was aiming for 1 or 2 updates per week: it makes more sense.

Me — Let’s dig a little deeper here: how many updates could have been possible this week?

X — Well, I did the 5 kanbans in 3 days, so the first update would have been on Tuesday with two small features, the second on Wednesday with another pair of features and the third on Thursday with the last one.

Me — So imagine if we had tried to deploy the first ticket on Tuesday. What do you think would have happened next? You can do the exercise with two scenarios in mind: 1/ everything goes smoothly and 2/ reality bites and it stalls because of a configuration mismatch as we’ve just seen.

X — I see where you’re trying to go… In the first scenario, we know on Tuesday evening that everything is going according to plan and we can decide with the marketing team or even with our users if we deploy a second time on Wednesday, on Thursday or even on the following Monday. In the second scenario, we stumble on the actual errors early in the week but we can still deploy the first batch of features by Friday, possibly as soon as Wednesday.

Me — How does that sound?

X — I’m still not quite sure if it's a good idea to have two updates per week, but I'm pretty sure it's a better idea to have one update instead of none. It’s somewhat frustrating to realize that even though we've been working on this stuff all week, the end users can’t see anything at all.

The App Store deals successfully with our Opentime app

Before using a kanban, rule 1

The first prerequisite of a kanban tool is expressed very explicitly in the book Kanban / Just-In-Time At Toyota.

Do not send defective products to the subsequent process
Making defective products means investing materials, equipment and labor in something that cannot be sold. This is the greatest waste of all.
Rule 1 before using a kanban

To illustrate this underappreciated aspect of kanban management, nothing beats a little anecdote. About ten years ago I taught a programming class to non-scientific students: they all wanted to be librarians of some sort and, with digitalization already well underway, the university had planned a coding course. In the third session, after explaining what an algorithm is, I wrote a simple one on the blackboard. Their next task was to translate it into actual code. After 10 minutes a girl shouted happily « eureka, it works » and everyone else envied the speed at which she had finished the work. When I got to her desk, I realized how far off the mark my teaching was: the result she was so proud of was a perfectly white screen. She had tediously and painstakingly managed to get rid of all the notices and warnings the compiler had thrown at her after copying my pseudo-code from the blackboard word for word. And while her program did literally nothing at all, she felt it was OK.

To be able to tell good work from bad, the operator (or the developer, or the student in this case) needs to be able to get some help, on the spot. In a classroom, it should be easy: the teacher is at hand.

In Lean, the way to get some help is the Andon: a simple mechanism (usually a sound or a light turned on) to bring the manager to the worker’s Gemba, forcing him to have a conversation and engaging him in the task of undertaking measures against recurrence.

That’s when the learning curve can kick in: when you don’t know if your work is OK or not, it’s impossible to improve. But when you’ve learned to distinguish the two, you can start experimenting at your own pace and gradually get better at it. Until this chain of help is materialized, the worker is condemned to subpar work or to luck.

The kanban is kicking me back to reality

X — Isn’t the kanban a pain?

Me — It sure can be… Like right now: there’s a kanban card « publish a new article » at the very top of my own board. And I haven’t got any at hand: I was too busy skiing last week during a well-deserved holiday.

Back from the Alps

X — But it’s not just you, I’ve always felt it was too difficult for anyone: it’s like being pinned to the wall and not being able to escape.

Me — My sensei would say it’s the entire point. And he would add « make sure you pull the andon cord: it’s there precisely for that moment ».

X — But I guess that being your own boss, you don’t have any andon cord to reach for. And you can’t rely on a team leader ready to help you out either.

Me — Obviously. I’m back to WWSD, aka What Would Sensei Do. First, don’t let the customer down, ever: just do what you have to do, stop complaining and write the article. Then think long and hard about how you can avoid the pain next time.

X — And…

Me — Well, someone is reading the aforementioned article (i.e. this one). And it was written on the spot in less than 60 minutes. Maybe I can let my stock go down to 0 from time to time and feel the adrenaline. Maybe I can prepare rough drafts to rely on and feel the serenity. It’s all about tradeoffs.

Red bins for the digital world

X — I know Lean started in manufacturing. Why on earth are you trying to use this system in the digital world?

Me — That’s simple enough, because Lean provides a full set of tools to frame the world differently. And Toyota (among others) has shown they can be useful.

— Example?

— Red bins, the place where operators put defective parts.

— Why not use a regular scrap bin directly? We’re piling up bad stuff: surely a case of muda.

— Not so fast, we want built-in quality. So you need to make waste visible and to train operators to detect good and bad parts: when a part is put in the red bin, it’s actually an opportunity to check with the team or line manager if the part is indeed not OK.

— Alright. But why not simply have one red bin per plant? Surely it should be enough for highly automated and perfectly functional plants.

— That’s because those plants don’t exist! There’s always some deviation and you want to start your investigation as close as possible to the crime scene both in time and in space. Remember: whenever you have unexpected things happening, you have an opportunity to learn.

— But we’re back in the hard stuff zone. I thought we’d be talking about code and developers and digital stuff.

— We’re getting there, we do use red bins in our digital world as well. But first we needed to ask: who’s the operator here?

— It can’t be the computer engineers: surely they should know better!

— Remember we’re trying to project a vision of the world in another universe, so we need to keep an open mind. But if you consider we’re operating a SaaS product, then it’s indeed the server that’s doing the valuable work for the end user, so you can just as well assume the operator is the actual server. And each server has a simple built-in mechanism to identify bad requests: the log file.

— Are you actually considering the log file as a red bin?

— A layer just above it: we filter out lines we’ve found irrelevant over the years. And aim for the lowest number of errors possible, i.e. zero, from there.

— With Pareto tables to manage the overwhelming onslaught of notices and warnings.

— Maybe. At the beginning. To clean up the existing mess. What you really want is to be fast at patching: so you really need close interactive loops between developers and sysadmins. This gap is one of the space of the digital world. And one of the insights leading to DevOps by the way.

— You mean DevOps is related to Lean? I thought it was more Agile-like.

— Please don’t make me angry. I was on the XP bandwagon when it lost all its momentum to Scrum. That’s actually when I first jumped ship and scrapped both sprints and retrospectives.

— Sorry, didn’t want to make it personal. Going back to new entries in the log file, are you implying you’re fixing every error as it comes in? It surely sounds like a hell of a job to me.

— It could also be an entry point to the holy grail of one-piece flow, aka sell one, make one.

— With the server creating a ticket in the bug tracker for each error? At our place, they would get a Won’t fix / Didn’t replicate answer 99 times out of 100.

— Here, we know that every time we postpone a fix, we’re faced with two problems: 1/ the original problem of course and 2/ the difficulty of recreating the initial conditions.

X, interrupting — Again we’re back to lead time stuff.

Me, mumbling — Funny how we switched from Jidoka to Just-in-Time. Not quite sure what to make of it though…
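For the curious, the layer just above the log file mentioned in the conversation can be sketched as follows. This is a minimal Python illustration and not our actual setup: the ignore patterns, the definition of a defect (any line containing ERROR or WARNING), the `red_bin` function and the sample log lines are all assumptions made up for the example.

```python
import re

# Hypothetical patterns for lines found irrelevant over the years.
IGNORED = [
    re.compile(r"favicon\.ico"),
    re.compile(r"\bDEBUG\b"),
]

# Assumed definition of a defective part: a line flagged as an error or a warning.
DEFECT = re.compile(r"\b(ERROR|WARNING)\b")


def red_bin(lines):
    """Keep the log lines that are defects and do not match an ignore pattern."""
    return [
        line for line in lines
        if DEFECT.search(line) and not any(p.search(line) for p in IGNORED)
    ]


# Invented sample log, in an invented format.
sample_log = [
    "2024-01-09 10:00:01 INFO request served in 12ms",
    "2024-01-09 10:00:02 ERROR file not found: /favicon.ico",
    "2024-01-09 10:00:03 WARNING undefined index 'user_id'",
    "2024-01-09 10:00:04 ERROR database connection lost",
]

bin_content = red_bin(sample_log)
print(len(bin_content))  # the number we aim to drive down to zero
for line in bin_content:
    print(line)
```

Keeping the ignore list short and explicit matters here: every pattern added to it is a decision that a class of lines is not a defect, and that decision deserves the same conversation as a part dropped in a physical red bin.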
