Red bins for the digital world

X — I know Lean started in manufacturing. Why on earth are you trying to use this system in the digital world?

Me — Simple: Lean provides a full set of tools to frame the world differently, and Toyota (among others) has shown how useful they can be.

— Example?

Red bins, the place where operators put defective parts.

— Why not throw them straight into a regular scrap bin? We’re piling up bad stuff: surely a case of muda.

— Not so fast: we want built-in quality. So you need to make waste visible and train operators to tell good parts from bad ones. When a part goes into the red bin, it’s actually an opportunity to check with the team or line manager whether the part really is defective.

— Alright. But why not simply have one red bin per plant? Surely that’s enough for a highly automated, perfectly functioning plant.

— Because those plants don’t exist! There’s always some deviation, and you want to start your investigation as close as possible to the crime scene, both in time and in space. Remember: whenever something unexpected happens, you have an opportunity to learn.

— But we’re back in hardware territory. I thought we’d be talking about code, developers and digital stuff.

— We’re getting there: we do use red bins in our digital world as well. But first we need to ask: who’s the operator here?

— It can’t be the computer engineers: surely they should know better!

— Remember, we’re trying to project a vision of the world onto another universe, so we need to keep an open mind. If you consider that we’re operating a SaaS product, it’s the server that does the valuable work for the end user, so you may as well treat the server itself as the operator. And every server has a simple built-in mechanism for identifying bad requests: the log file.

— Are you actually considering the log file as a red bin?

— A layer just above it: we filter out the lines we’ve found irrelevant over the years, and from there we aim for the lowest possible number of errors, i.e. zero.

— With Pareto tables to manage the overwhelming onslaught of notices and warnings?

— Maybe, at the beginning, to clean up the existing mess. What you really want is to be fast at patching, so you need tight, interactive loops between developers and sysadmins. That gap between them is one of the distances in space of the digital world, and closing it is, by the way, one of the insights that led to DevOps.

— You mean DevOps is related to Lean? I thought it was more Agile-like.

— Please don’t make me angry. I was on the XP bandwagon when it lost all its momentum to Scrum; that’s actually when I first jumped ship, scrapping both sprints and retrospectives.

— Sorry, I didn’t mean to make it personal. Going back to new entries in the log file: are you saying you fix every error as it comes in? That sounds like a hell of a job to me.

— It could also be an entry point to the holy grail of one-piece flow, a.k.a. sell one, make one.

— With the server creating a ticket in the bug tracker for each error? At our place, those tickets would be closed as « Won’t fix » or « Didn’t replicate » 99 times out of 100.

— Here, we know that every time we postpone a fix, we face two problems: 1/ the original problem, of course, and 2/ the difficulty of recreating the initial conditions.

X, interrupting — Again, we’re back to lead-time stuff.

Me, mumbling — Funny how we switched from Jidoka to Just-in-Time. Not quite sure what to make of it though…
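The log-file red bin from this conversation can be sketched in a few lines of Python. This is a minimal sketch under loudly stated assumptions: the log format (`LEVEL source: message`), the ignore patterns and the function names `red_bin` and `pareto` are all hypothetical, not the setup described above. It drops the lines judged irrelevant, keeps the rest as the red bin, and tallies what remains by source so a Pareto table can steer the initial cleanup toward zero.

```python
import re
from collections import Counter

# Hypothetical patterns for log lines judged irrelevant "over the years".
IGNORED_PATTERNS = [
    re.compile(r"^DEBUG\b"),
    re.compile(r"health[- ]check"),
]


def red_bin(lines):
    """Keep only the log lines that no ignore pattern matches."""
    return [line for line in lines
            if not any(p.search(line) for p in IGNORED_PATTERNS)]


def pareto(lines):
    """Count red-bin entries by 'LEVEL source' prefix, most frequent first.

    Assumes the hypothetical format 'LEVEL source: message'.
    """
    return Counter(line.split(":", 1)[0] for line in lines).most_common()


if __name__ == "__main__":
    log = [
        "DEBUG cache: miss for key 42",
        "INFO health-check: ok",
        "ERROR billing.invoice: division by zero",
        "WARNING auth.login: slow response",
        "ERROR billing.invoice: division by zero",
    ]
    remaining = red_bin(log)
    print(len(remaining))     # 3
    print(pareto(remaining))  # [('ERROR billing.invoice', 2), ('WARNING auth.login', 1)]
    # The target from here on is an empty red bin: len(remaining) == 0.
```

The Pareto tally only matters while cleaning up the backlog; once the bin is near empty, each new surviving line is a prompt to fix the error while its conditions are still fresh.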

Kanbans, and the art of not choosing

X — Look, this kanban is really easy, I’m sure I can do it in less than fifteen minutes.

Me — I’m sure you can, but the kanban board tells you something else, doesn’t it?

— I know, it’s in second position. There’s another kanban at the top.

— And…

— I guess it’s the one I should be working on next. The problem is, I don’t know how long it’s going to take: it’s one of those « red bin » kanbans, and we never know whether it’s a 5-second fix or a 2-hour deep dive into old, obscure code.

— And which one are you going to learn more from?

— Easy… the 2-hour deep dive. The other type is usually a simple condition someone forgot about: just looking at the log trace is usually enough to tell which method needs updating.

— Is this « 5 seconds / 2 hours » split the same for you as for the rest of the team?

— Of course not. For example, if it’s related to JavaScript, it’s usually « 5 seconds » stuff for M.

Me, laughing — And a « 2 hours » nightmare for me.

— But what if I’m really stuck?

— We’ve talked about the other button, haven’t we? The orange « andon », right next to « red bin ».

— But if I click on it, I’ll be interrupting somebody else…

— That’s exactly the point: making sure you can get the attention of anybody in the company while you deal with your share of potentially difficult problems or bugs, improving your skills and learning new ones along the way.

— You mean it’s a privilege to be assigned those « red bin » kanbans?

Me, smiling — I hadn’t thought of it that way, but I guess it is…
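The "art of not choosing" (pull the top card, not the one that looks quickest) can be sketched as a tiny board model. Everything here is hypothetical: the `Card` and `KanbanBoard` names, and especially the assumption that « red bin » cards are queued ahead of regular work, first in, first out within each kind, which the dialogue suggests but never spells out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Card:
    title: str
    red_bin: bool = False  # True for cards raised from the log "red bin"


class KanbanBoard:
    """Hypothetical board worked strictly from the top."""

    def __init__(self):
        # Assumption: red-bin cards always sit ahead of regular work.
        self._red_bin = deque()
        self._regular = deque()

    def add(self, card):
        (self._red_bin if card.red_bin else self._regular).append(card)

    def pull(self):
        """Take the top card, whatever it is: no choosing.

        Raises IndexError when the board is empty.
        """
        if self._red_bin:
            return self._red_bin.popleft()
        return self._regular.popleft()


if __name__ == "__main__":
    board = KanbanBoard()
    board.add(Card("Add CSV export"))
    board.add(Card("TypeError in checkout.js", red_bin=True))
    print(board.pull().title)  # TypeError in checkout.js
    print(board.pull().title)  # Add CSV export
```

Keeping two queues rather than sorting a single list preserves arrival order within each kind of card; when someone gets stuck on the top card, the answer in this model is the andon, not skipping ahead.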
