Value stream analysis is typically conducted for one specific product or service family at a time. In order to identify and distinguish families, lean practitioners use what is called a product family analysis matrix (a.k.a. product quantity process matrix, or PQPr). Many times the families can be easily discerned once the matrix is populated; other times it is more difficult. The application of a dendrogram or binary sort can be helpful in these situations.

Value stream analysis and, with it, flow kaizen, are central to any lean transformation and are specific to product or service families. This means typically one map set, both current and future state, per family.

Clearly, it is important that the lean practitioner properly identify and discriminate between families before beginning any value stream mapping effort. Families are represented by products or services that share, more or less, the same common processing steps. The product family analysis matrix, also known as a product family matrix, process routing matrix or product-quantity-process (PQPr) matrix is a common tool for product family identification.

At their most basic level, the matrices reflect the product or service offerings on the y-axis and the process steps on the upper x-axis (Figure 1). The intersection between the x's and y's is indicated by an "x" or checkmark, a zero or one, or the frequency/quantity (i.e., the product's annual volume through a given process). The intersections, or "clusters," represent product family candidates.

There are three methods of identifying product clusters: sorting by inspection, cluster identification using dendrograms, and binary sorting. We'll cover these topics in future postings.
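As a rough illustration of how clusters emerge from such a matrix, the sketch below (in Python, with entirely hypothetical products and process steps) groups products that share identical routings — the simplest form of sorting by inspection:

```python
from collections import defaultdict

# Hypothetical PQPr matrix: 1 = product passes through the step, 0 = it does not.
steps = ["Cut", "Weld", "Paint", "Assemble", "Pack"]
routings = {
    "Product 1": (1, 1, 0, 1, 1),
    "Product 2": (1, 1, 0, 1, 1),
    "Product 3": (1, 0, 1, 1, 1),
    "Product 4": (1, 0, 1, 1, 1),
    "Product 5": (1, 1, 1, 1, 1),
}

# Group products with identical routings -- each group is a candidate
# product family (a "cluster" in the matrix).
families = defaultdict(list)
for product, routing in routings.items():
    families[routing].append(product)

for routing, members in families.items():
    used = [s for s, flag in zip(steps, routing) if flag]
    print(members, "->", used)
```

Real matrices are rarely this clean, which is where the dendrogram and binary sort methods earn their keep.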

Born February 29, 1912, Taiichi Ohno envisioned a way of working that would evolve into the Toyota Production System (TPS), now widely known as Lean. Several key elements of TPS include takt (the interval or pace of customer demand), muda (the elimination of waste), jidoka (the injection of quality), and kanban (the cards used as part of a system of just-in-time stock control).

Ohno was born in Dalian, in northeastern China, and joined Toyoda Automatic Loom Works in 1932. In 1943 Ohno switched to work as a production engineer for the Toyota car company, at a time when its productivity was far below that of America's mighty Detroit car industry. In 1956 Ohno visited US automobile plants, but his most important discovery was the supermarket, something unknown in Japan at the time. He marveled at the way customers chose exactly what they wanted, in the quantities that they wanted, and the way that supermarkets supplied merchandise in a simple, efficient, and timely manner. He took these observations back and incorporated them in further enhancements to the TPS kanban system.

[1] History of Toyota Motor Manufacturing Kentucky (TMMK)
[2] Guru: Taiichi Ohno, in The Economist, July 3, 2009

How many different ways can this work schedule be filled out?

How many different ways are there to arrange your books in a bookshelf?

These are all examples of combinations and permutations. Knowing how to calculate them is a helpful tool for decision making. The basic equations are:

Permutations with repetition: n^r
Permutations without repetition: n!/(n − r)!
Combinations with repetition: (n + r − 1)!/(r!(n − 1)!)
Combinations without repetition: n!/(r!(n − r)!)

Permutations (with repetitions)

Example: How many different ways can this work schedule be filled out?

This is very straightforward. Suppose there are three operators (i.e., n = 3) who have to cover two shifts (i.e., r = 2); then there are 9 possible work schedules that can be created.

Schedule   Shift 1   Shift 2
1          A         A
2          A         B
3          A         C
4          B         A
5          B         B
6          B         C
7          C         A
8          C         B
9          C         C

The number of permutations is simply: n^r = 3^2 = 9.
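The same count can be verified by brute force. A minimal Python sketch, enumerating every schedule for the three hypothetical operators:

```python
from itertools import product

operators = ["A", "B", "C"]  # n = 3 operators
shifts = 2                   # r = 2 shifts to cover

# With repetition allowed, any operator can take any shift: n**r schedules.
schedules = list(product(operators, repeat=shifts))
print(len(schedules))  # 9
```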

Permutations (without repetitions)

Example: How many different ways are there to arrange your books in a bookshelf, if you have 3 books (called A, B, and C)?

Again this is a straightforward question. It is easy to quickly determine that the correct answer is 6. For the first book there are three choices, for the second there are only two choices, and there is only one choice for the last. So the total number of permutations in this case is 3 × 2 × 1 = 6.

This simple example gives insight into constructing the equation for the general case. For the first choice there are n options, for the second choice there are n − 1 options, and this process repeats r times. This can be written as: n × (n − 1) × … × (n − r + 1) = n!/(n − r)!
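The general case is easy to check by enumeration; a brief Python sketch using the three-book example:

```python
from itertools import permutations
from math import factorial

books = ["A", "B", "C"]  # n = 3
n = len(books)
r = 3  # arranging all three books, so r = n here

# Enumerate all orderings and compare against n! / (n - r)!.
arrangements = list(permutations(books, r))
formula = factorial(n) // factorial(n - r)
print(len(arrangements), formula)  # 6 6
```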

Combinations (with repetitions)

Example: How many different pizzas can be made, assuming there are n different kinds of pizza and you are ordering r pizzas? The number of combinations with repetition is (n + r − 1)!/(r!(n − 1)!).

The six pizza orders (pizza pies come in flavors A, B, or C) that can be made with two pies are:

Order   Pie 1   Pie 2
1       A       A
2       A       B
3       A       C
4       B       B
5       B       C
6       C       C
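A quick way to check both the enumeration and the formula (a Python sketch, assuming the same three flavors):

```python
from itertools import combinations_with_replacement
from math import factorial

flavors = ["A", "B", "C"]  # n = 3 kinds of pizza
r = 2                      # ordering 2 pies

orders = list(combinations_with_replacement(flavors, r))
n = len(flavors)
# Combinations with repetition: (n + r - 1)! / (r! * (n - 1)!)
formula = factorial(n + r - 1) // (factorial(r) * factorial(n - 1))
print(len(orders), formula)  # 6 6
```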

Combinations (without repetitions)

Example: How many different poker hands are there? With a 52-card deck and a 5-card hand, the answer is 52!/(5! × 47!) = 2,598,960.
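The count follows directly from the combinations-without-repetition formula; a one-line check in Python:

```python
from math import comb

# 5-card hands from a 52-card deck: 52! / (5! * 47!), order irrelevant.
hands = comb(52, 5)
print(hands)  # 2598960
```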

Pitch interval (I_{p}) can be thought of in two ways: 1) as a unit of time representing the (usually) smallest common pitch shared among a range of products, services, or transactions that are being produced, conveyed, performed, or executed by a given resource(s), and 2) as a count of the number of intervals of a common pitch over a period of time, typically a shift or day.

I_{p} often serves as the time intervals reflected in the typical design of heijunka, leveling, or scheduling boxes or boards in which instruction or withdrawal kanban are loaded within the heijunka sequence (as accommodated by actual demand).

Figure 1 captures the I_{p} math. This post is specific to products that share the same pitch. A future post will address I_{p} for products that have different pitches. Figure 2 provides some insight into heijunka box design and loading in the context of I_{p}.

Figure 1. Pitch interval formula tree

I_{p} = T_{a} ÷ P

Where:
T_{a} = available time for the period, typically a shift or day, expressed in seconds or minutes.
P = pitch for the resource(s) related to T_{a}, expressed in the same unit of time. When the products share a common pitch, P is simply that pitch; when they do not, P = GCD(P_{1}, P_{2}, …), where GCD represents the greatest common divisor (also known as greatest common factor or highest common factor) for a given set (a, b…) and P_{n} = each non-equal pitch amongst the various products for the resource(s) related to T_{a}, expressed in the same unit of time.

Same Pitch Example:
There are three products (A, B, and C), all of which share the same 20-minute pitch. See the table below for the pitch calculation.

Table 1. Pitch calculation

See Figure 2 for example heijunka box as loaded in an ABACABACAB sequence (a.k.a. heijunka cycle).

Remember, life is messy…and sometimes the math is too. The lean practitioner often needs to use his or her judgment when determining whether or not to round and how to round (up or down). Unfortunately, math rarely comes out perfect (as it magically does in most lean books). When addressing things like pitch intervals, know that rounding has practical implications. For example, rounding up the number of pitch intervals may require either the shortening of the pitch (remember I_{p} x P should closely approximate T_{a}) equally across all or some intervals. In the example above, by rounding up to 21 intervals, we are artificially speeding up takt time by two seconds per unit, and thus our pitch by 20 seconds. The cumulative effect is that the last pitch interval of the day actually finishes up 5 minutes early. The lean practitioner has some options here: 1) don’t sweat it and do nothing about the 5 minutes early thing, 2) use a 21 minute pitch after every third interval, or 3) tinker with something else. As you may discern from Figure 2, we think option one is fine. Know that the road to figuring out the best option often requires a good bit of PDCA.
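The arithmetic above can be sketched in a few lines of Python. The 415 available minutes is an assumption inferred from the rounding discussion (21 intervals of a 20-minute pitch finishing 5 minutes early):

```python
import math

T_a = 415  # assumed available minutes for the day (inferred from the example above)
P = 20     # common pitch, in minutes, shared by products A, B, and C

# I_p = T_a / P, rounded up so the whole day is covered by pitch intervals.
# (With differing pitches, P would instead be the GCD of the individual pitches.)
I_p = math.ceil(T_a / P)
slack = I_p * P - T_a  # how many minutes early the last interval finishes
print(I_p, slack)  # 21 intervals, 5 minutes of slack
```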

Life happens. Sometimes it rains on wedding days. Sometimes a supplier misses a ship date, and sometimes there are glitches in our processes. The challenge for lean practitioners is what to do about this – especially since all of these things could happen.

One good answer is to construct a Failure Modes and Effects Analysis (FMEA). This can be done for a process, or for a new or existing product. The basic concept is to identify all of the potential failure modes and then rank them according to risk. Risk Priority Numbers (RPNs) are used to assess the risk of each failure mode.

A Risk Priority Number (RPN) is calculated as follows:

RPN = Severity × Likelihood × Detectability

That is, the risk associated with any failure mode is comprised of three parts:

Severity – The more severe the failure mode, the higher the risk.

Likelihood – The more likely the failure mode, the higher the risk.

Detectability – The harder it is to detect and control the causes and conditions of a given failure mode, the higher the risk.

Usually each of these factors is graded on a 1 to 10 scale. For Severity and Likelihood, the more severe the consequences or the more likely the failure mode is to occur, the higher the score. With Detectability, the harder the cause is to detect and control, the higher the score.

Once Risk Priority numbers have been established for all of the failure modes, the next step is to develop a risk mitigation strategy for the high risk items. That way these risks can be addressed proactively. The risk mitigation strategies are given a predictive Risk Priority Number (pRPN) which estimates what the risk priority number will be once the risk mitigation strategy has been put in place. These risk priority numbers can be helpful for identifying which risk mitigation strategies to implement first.
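A minimal sketch of the scoring and ranking step, with entirely hypothetical failure modes and 1–10 scores:

```python
# Hypothetical failure modes scored (severity, likelihood, detectability), each 1-10.
failure_modes = {
    "supplier misses ship date":  (7, 4, 3),
    "wrong part picked":          (5, 6, 2),
    "fixture out of calibration": (8, 3, 7),
}

# RPN = Severity x Likelihood x Detectability; rank highest risk first.
rpns = {name: s * l * d for name, (s, l, d) in failure_modes.items()}
for name, rpn in sorted(rpns.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{rpn:4d}  {name}")
```

The highest-RPN items are the ones that warrant a mitigation strategy (and a pRPN estimate) first.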

It all started when my colleague and I noted that we had used the same data to calculate Cpk, but ended up with different results. This led us down an Alice in Wonderland-like path of Google searching, Wikipedia reading, and blogosphere scanning.

After several days of investigation, we determined that there was no consensus on how to properly calculate estimated standard deviation.

Knowing that there must be a misunderstanding and that this should be purely an effort based on science, we decided to get to the bottom of this. My colleague and I decided that there was a need for a simple, accurate tool that anyone could use and afford. We wanted to break the economic and educational barriers that got in the way of conducting needed process capability studies. More on that in a bit.

Our investigation revealed that the biggest confusion out there was with two symbols: s, the regular sample standard deviation, vs. σ̂, the estimated standard deviation (sporting that little hat over the sigma).

Regular sample standard deviation is used to calculate process performance, or Pp/Ppk. It is based on the actual data that your process has actually proven to perform in current reality (overall performance).

Estimated standard deviation is used to calculate process capability, or Cp/Cpk. In other words, what is your process capable of when at its current “best” state (within subgroups)?

This leads us to the simple tool that I referenced above.

There’s an App for that

The creation of the "Cpk Calculator App" has been a long and winding road with a lot of research and validation (also known as PDCA). But, in the end, we created a tool that automatically calculates standard deviation in 1 of 3 ways depending on data set characteristics (the biggest dilemma on the web):

1. If data is in one large group, we use the regular sample standard deviation calculation:

s = sqrt( Σ(x_{i} − x̄)² / (n − 1) )

Many people use the calculation above to calculate standard deviation and call it Cpk, when in reality what they are calculating is Pp, or Ppk as they are not using estimated standard deviation. Ppk is definitely the more conservative of the two as it’s based on the actual standard deviation, but for whatever reason Cpk has become the more famous of the two.

And, they are often confused.

2/3. If you collect your data in subgroups, there are two preferred methods of estimating standard deviation using unbiasing constants:

Rbar / d2 is used to estimate standard deviation when subgroup size is at least two, but not more than four: the average of the subgroup ranges (Rbar) is divided by the d2 unbiasing constant for that subgroup size. This calculation is best when you tend to have many small subgroups of data.

A second method (typically a pooled standard deviation) should be used when calculating estimated standard deviation for uneven subgroups, or for subgroups larger than 4 data points.

Pp and Ppk are based on actual, "overall" performance regardless of how the data is subgrouped, and use the normal standard deviation calculation of all data (n − 1). Cp and Cpk are based on variation within subgroups, and use estimated standard deviation. Cp and Cpk show statistical capability based on multiple subgroups. Without getting into too much detail on the difference in calculations, think of the estimated standard deviation as the average of all of the subgroups' standard deviations, and 'regular' standard deviation as the standard deviation of all data collected.
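The two calculations can be contrasted in a short Python sketch. The measurements are hypothetical; d2 = 2.059 is the unbiasing constant for subgroups of size 4:

```python
import statistics

# Hypothetical subgroups of 4 consecutive measurements each.
subgroups = [
    [10.1, 10.3, 9.9, 10.2],
    [10.0, 10.4, 10.1, 9.8],
    [10.2, 10.0, 10.3, 10.1],
]

# Overall (Pp/Ppk) sigma: sample standard deviation of all data, n - 1 denominator.
all_data = [x for sg in subgroups for x in sg]
sigma_overall = statistics.stdev(all_data)

# Within-subgroup (Cp/Cpk) sigma estimate: Rbar / d2, with d2 = 2.059 for n = 4.
d2 = 2.059
rbar = statistics.mean(max(sg) - min(sg) for sg in subgroups)
sigma_within = rbar / d2

print(round(sigma_overall, 4), round(sigma_within, 4))
```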

Cp (process capability). The amount of variation that you have versus how much variation you’re allowed based on statistical capability. It doesn’t tell you how close you are to the center, but it tells you the range of variation. Note that nowhere in this formula is the average of your actual data referenced.

Cpk (process capability index). Tells you how centered your process capability range is in relation to your specification limits. This only accounts for variation within subgroups and does not account for differences between sub groups. Cpk is “potential” capability because it presumes that there is no variation between subgroups (how good you are when you’re at you best). When your Cpk and Ppk are the same, it shows that your process is in statistical control.

Pp (process performance). The amount of variation that you have versus how much variation you’re allowed based on actual performance. It doesn’t tell you how close you are to the center, but it tells you the range of variation.

Ppk (process performance index). Ppk indicates how centered your process performance range is in relation to your specification limits (how well you are performing currently).

What’s a “Good” Cpk?

A Cpk of 1.00 will produce a 0.27% fail rate, or a theoretical 2,700 defects per million parts produced. A Cpk of 1.33 will produce roughly a 0.007% fail rate, or a theoretical 66 defects per million parts produced. In reality, the Cpk that is acceptable depends on your particular industry standard. As a rule of thumb, a Cpk of 1.33 is traditionally considered a minimum standard.
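These theoretical fail rates follow from the normal distribution, since a Cpk of 1.00 puts the nearer specification limit 3 sigma from the mean. A sketch for a centered, normally distributed process:

```python
from statistics import NormalDist

def ppm_from_cpk(cpk):
    """Theoretical two-sided defects per million for a centered, normal process."""
    tail = NormalDist().cdf(-3 * cpk)  # area beyond one spec limit
    return 2 * tail * 1_000_000

print(round(ppm_from_cpk(1.00)))  # ~2700
print(round(ppm_from_cpk(1.33)))  # ~66
```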

Confidence Interval

The confidence interval shows the statistical range of your capability (Cpk) based on sample size. Basically, the larger the sample size, the tighter the range. The confidence interval shows that there is an x% confidence that your capability is within "a" and "b." The higher the confidence level, the wider the range.

For example, if we report a Cpk of 1.26, what we are really saying is something like, "I don't know the true Cpk, but based on a sample of n = 145, I am 95% confident that it is between 1.10 and 1.41 Cpk." This tells us that the larger your sample size, the tighter the range. Therefore, the more data you collect, the more accurate your measurement of your actual process capability, or performance. In most calculations 90% or 95% confidence is required, but a confidence interval can be calculated at any percentage; just remember, the fewer the data points, the wider the confidence interval range.
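The numbers in this example can be reproduced (to within rounding) with a commonly used normal approximation for the Cpk confidence interval; note this is our assumption about the method, not a documented formula from the app:

```python
from statistics import NormalDist
from math import sqrt

def cpk_confidence_interval(cpk, n, confidence=0.95):
    """Approximate CI: cpk +/- z * sqrt(1/(9n) + cpk^2 / (2(n - 1)))."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(1 / (9 * n) + cpk ** 2 / (2 * (n - 1)))
    return cpk - half_width, cpk + half_width

lo, hi = cpk_confidence_interval(1.26, 145)
print(round(lo, 2), round(hi, 2))  # roughly 1.10 and 1.42
```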

Real Life Application

During the creation and testing of the Cpk Calculator App, we had the opportunity to test every scenario that we encountered in the real world. One of the real life scenarios that we ran into included a routine hourly check of a “widget’s” thickness that determined that the part was out of specification. After 15 minutes of data collection and testing on the floor using the app, we found that our process that normally had a Cpk of 1.3, now reflected a Cpk of 0.80. This led us to discover that the cutting machine cycle time had been reduced in an attempt to improve throughput and productivity by the machine operator. With that in mind, we reset the machine to original settings to confirm that we had found the root cause. Subsequently, we used the Cpk calculator as we gradually reduced cycle time as much as possible without negatively affecting process capability. In the end, we confirmed root cause, and implemented a new and improved cycle time for the piece of equipment.

________________________________________________________ This post was authored by Levi McKenzie, a continuous improvement kind of guy who enjoys exploring new facets of lean methodology, facts, data, and making things faster and better. Levi is a co-founder of Brown Belt Institute, a mobile app development company that focuses on providing useful lean six sigma tools that are inexpensive and easy to use for the "blue collar brown belt" sector.

Did first-class passengers on the Titanic get preferential treatment during the evacuation? James Cameron’s movie certainly seems to suggest so, but let’s look at the data.

              Survived   Died
First class        203    122
Third class        178    528

The data is compelling. 75% of the third-class passengers perished compared to only 38% for the first-class passengers. The statistically inclined among you might run a Chi-squared test to confirm these observations, and not surprisingly the results will be statistically significant. The difference in the proportion of first-class passengers that perished versus the proportion of third-class passengers that perished is unlikely to have occurred by chance. Well, that must be the end of the story. An analyst might create some pie charts or some stacked bar charts to illustrate the results, but that is the end of the story…right???…not quite.
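For those who want to run the numbers, the chi-squared statistic for this 2×2 table can be computed directly in Python, no statistics library needed:

```python
# 2x2 chi-squared test of independence using the shortcut formula
# N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
a, b = 203, 122  # first class: survived, died
c, d = 178, 528  # third class: survived, died

n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Critical value for 1 degree of freedom at alpha = 0.05 is 3.841.
print(round(chi2, 1), chi2 > 3.841)
```

The statistic comes out far above the critical value, so the class/survival association is indeed statistically significant.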

Consider a more detailed breakdown of the same data:

                                 Survived   Died
First class   Men                      57    118
              Women and children      146      4
Third class   Men                      75    387
              Women and children      103    141

Now the data suggests a possibly different story. With this data, it is now evident that 79% of the men died, compared to 37% of the women and children. So which was it? Was it class privilege or chivalry? Or was it something else?

These are questions of history, and there are many lessons to be learned from history. And there is much to be learned from the overly simplistic analysis that suggested the cause was class privilege:

Just because the numbers are overwhelming, it doesn’t mean your hypothesis is true.

When analyzing data, it is wise to remember the words of Sherlock Holmes: "when you have eliminated the impossible, whatever remains, however improbable, must be the truth."

The failure in the initial analysis was not a failure of mathematics or statistics, but a failure of the analyst. They failed to consider other alternatives. Richard Feynman described this error in his essay: Cargo Cult Science, in which he recommends, among other things, that we should not fool ourselves and we should not fool others. We accomplish these goals with a profound honesty, by challenging ourselves to look for other explanations, and by carefully performing and re-performing experiments. And while the systems that we study may be more complex and more dynamic than the systems that a physicist studies, there is no excuse for cargo cult statistics.

Days inventory on hand, also known as days of supply, along with inventory turns, is a measure of inventory investment. While turns may be one of the most basic measures of an organization's "leanness," days inventory on hand perhaps helps lean practitioners better visualize the magnitude of (excess) inventory and its impact on a value stream's lead time. This is especially applicable when the notion of inventory extends beyond parts and finished goods to transactional (i.e., files, contracts, etc.) and healthcare (i.e., tests, reports, etc.) value streams.

There are two basic approaches to calculate days inventory on hand: 1) divide the number of days that the value stream is operating by the inventory turns, or 2) divide average inventory by daily usage. Mathematically, it gets you to the same place. It is often more actionable and meaningful if the days inventory on hand is not only calculated with total inventory, but also by raw material and finished goods and even by other inventory sub-categories.
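A quick sketch of both approaches with hypothetical figures, confirming they land in the same place:

```python
# Two equivalent ways to compute days inventory on hand (hypothetical figures).
operating_days = 250             # days the value stream runs per year
annual_usage_units = 50_000      # units consumed per year
average_inventory_units = 4_000  # average on-hand inventory

# Approach 1: operating days / inventory turns
turns = annual_usage_units / average_inventory_units
days_on_hand_1 = operating_days / turns

# Approach 2: average inventory / daily usage
daily_usage = annual_usage_units / operating_days
days_on_hand_2 = average_inventory_units / daily_usage

print(days_on_hand_1, days_on_hand_2)  # both 20.0
```

The same arithmetic can be repeated per category (raw material, WIP, finished goods) for a more actionable picture.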

Like with many of the Lean Math entries, some math convention considerations bear discussion:

Number of days. Financial folks will often use 365 or 360 days as their numerator. That is reflective of reality IF the value stream is in operation virtually every day of the year, like Walmart®. However, most value streams are working something less than that – often 250 days a year or so. The purpose of the measure is to provide insight into how much cholesterol is really accumulating in the value stream. Use a number that mirrors the value stream's available days during the year or use the second basic approach of dividing average inventory by daily usage (of course, apply the same logic when determining daily usage). Bottom line – understand your math convention and those against whom you might be benchmarking.

Inventory value versus inventory units. Inventory value is often used to calculate inventory turns and, as reflected in the separate inventory turns entry, it has its pros and cons. A unit-based approach does eliminate much of the “noise” that inventory valuation methods and high mix may introduce. Furthermore, units, especially in the area of finished goods, are what the customer “feels,” and the value stream experiences. See below for examples using value and units.

Full time equivalent(s), commonly referred to as FTE(s), represents the number of equivalent employees working full time. One full time equivalent is equivalent to one employee working full time. Typically, FTEs are measured to one or two decimal points.
FTEs are NOT people. Rather, FTEs are a ratio of worked time, within a specific scope like a department, to the number of full-time working hours during a given period of time. As such, an FTE count often does not equate to the number of employees actually on staff.
FTEs are a mathematical tool used to compare and help understand workloads, and the fragmentation of those workloads, across processes, teams, departments, value streams, and businesses. This is especially relevant in environments where employees work multiple processes, are shared among multiple teams, work odd schedules, and/or work part time. Managers use FTE insight for things like:

calculating current and future state staffing requirements,

calculating real or potential labor savings from process improvement,

understanding resource requirements for projects, and

normalizing staff count for the purpose of generating performance metrics such as revenue per person and productivity per person per hour (where the FTE is used as a mathematical proxy for “person”)

Some FTE math follows.
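As a minimal sketch of the core ratio, with a hypothetical department and an assumed 40-hour full-time week:

```python
# FTE = worked hours within a scope / full-time hours for the period.
full_time_hours_per_week = 40  # assumed full-time schedule

# Hypothetical staff: hours each person actually works in this department per week.
worked_hours = {
    "full-timer": 40,
    "half-timer": 20,
    "shared across two teams": 24,
    "part-timer": 10,
}

fte = sum(worked_hours.values()) / full_time_hours_per_week
print(f"{fte:.2f} FTEs across {len(worked_hours)} people")
```

Note how four actual people work out to well under four FTEs, which is exactly why FTEs and headcount should never be treated as interchangeable.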

As with any measure (no pun intended) of Lean Math, there are at least a few things to be mindful of:

Just because the math "works" doesn't mean that FTE-based conclusions are sound. As previously stated, FTEs are NOT people, they are ratios. Real people do the work. As lean practitioners try to understand improvement opportunities, and the chance, for example, to redeploy a worker to a process that needs additional capacity, they must pragmatically and respectfully consider things like how the work is/will be designed, standardized (which means understanding steps, sequence, and cycle time), balanced among team members, "levelized" in the context of volume and mix, and apportioned given cross-training gaps, gap closure opportunities and limitations, and political dynamics.

The math required to determine optimal staffing is specific to balancing line staffing for a given product or service (or family of products or services) and is rarely the same thing as FTEs. This is, among other things, because optimal staffing can be calculated for multiple "playbook" scenarios based upon different demand rates and, simply put, optimal staffing is often different than actual staffing.