What Osterwalder gets wrong (and right) about innovation metrics

What does the firm behind the business model canvas have to say about innovation metrics?

Strategyzer, the innovation services firm co-founded by Alexander Osterwalder, has popularized several innovation tools and methodologies (e.g., the business model canvas). So, we were intrigued to see what they had to say about innovation metrics in their most recent book The Invincible Company. What we found was a focus on some of the right problems—but also a fundamental flaw in their solutions.

Strategyzer’s Approach to Innovation Metrics—Overview

Strategyzer introduce their approach to innovation metrics by highlighting a key difference between the purpose of measurement in an innovation context and in a traditional “execution” context. Innovation metrics need to help you “[determine] whether you are reducing the risk and uncertainty of new business ideas.” This contrasts with the focus, in an execution context, on tracking whether a project is on time and on budget.

At a high level, Strategyzer point to two main jobs of innovation metrics:

  • Informing investment decisions–i.e., which new projects should be funded, and which existing projects should continue to be funded or retired.
  • Guiding innovation teams’ activities to ensure they de-risk an innovation as quickly and cost-effectively as possible.
Measure at three levels

To generate this insight, Strategyzer describe an approach to measuring innovation at three levels—the hypothesis level, the business model level, and the portfolio level (see table). Below we describe the innovation metrics Strategyzer recommend collecting at each of these three levels.


Hypothesis

A hypothesis is any assumption that must be true for an innovation to be successful. This might include, for example, hypotheses about each of the components of the business model canvas.

Business model

An individual innovation project that, if successful, would result in a new product, service, experience, etc. being brought to market.

Portfolio

Any set of innovation projects that are collectively managed. The set could be defined by organizational structure (e.g., business unit), technology area, market segment, etc.

Hypothesis Level Metrics
Tracking progress at the hypothesis level is core to Strategyzer’s approach to measuring whether the experiments you are conducting are helping to reduce risk and uncertainty. For each hypothesis, they suggest monitoring (and recording in an experiment log and a learning log):

  • Success Metric—i.e., a technical metric, specific to this hypothesis, you use to establish whether you’ve proven it.
  • Cost—i.e., how much the experiment has cost.
  • Time Running—i.e., how long the experiment has taken (so far).
  • Confidence Level—i.e., a score between 0 and 1 measuring how confident you are that the evidence supports your conclusion.

Business Model Level Metrics
At a business model level, a composite picture of progress is provided by collating data from each of the business model’s hypotheses. For each hypothesis, Cost and Confidence Level are reported, and two new metrics are added to monitor de-risking:

  • Risk—each hypothesis’ contribution to the overall risk of the project (measured in percentage terms).
  • Risk Reduction—a measure of how much your experiment(s) have contributed towards de-risking the innovation. Mathematically, Risk Reduction is the hypothesis’ Risk multiplied by the Confidence Level from your experiment.

Next, Strategyzer calculate an innovation project’s overall Innovation Risk Level. This metric quantifies the risk remaining after the cumulative de-risking achieved by all of the individual experiments you’ve undertaken. It is calculated by adding up the individual hypotheses’ Risk Reduction values and subtracting the total from 100%.
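The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than Strategyzer’s own tooling: the class name, field names, hypothesis labels, and numbers are all hypothetical, and the check that the Risk shares add up to 100% is our assumption (it follows from Risk being a percentage contribution to the project’s total risk).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str          # hypothetical label, for illustration only
    risk: float        # share of the project's total risk, 0..1
    confidence: float  # Confidence Level from the experiment(s), 0..1


def risk_reduction(h: Hypothesis) -> float:
    """Risk Reduction = Risk x Confidence Level."""
    return h.risk * h.confidence


def innovation_risk_level(hypotheses: list[Hypothesis]) -> float:
    """Innovation Risk Level = 100% minus the sum of Risk Reduction."""
    total_share = sum(h.risk for h in hypotheses)
    # Assumption: Risk shares across hypotheses add up to 100%.
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("Risk shares should sum to 1.0 (100%)")
    return 1.0 - sum(risk_reduction(h) for h in hypotheses)


# Made-up example project with three hypotheses.
project = [
    Hypothesis("customers want it", risk=0.5, confidence=0.8),
    Hypothesis("we can build it", risk=0.3, confidence=0.5),
    Hypothesis("it makes money", risk=0.2, confidence=0.0),  # not yet tested
]

level = innovation_risk_level(project)
print(f"Innovation Risk Level: {level:.0%}")
```

In this made-up example the three Risk Reduction values are 40%, 15%, and 0%, so roughly 45% of the initial 100% remains.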

Finally, the following metrics are also reported for the overall project:

  • Project Duration—i.e., the total elapsed time for the project.
  • Overall Cost—i.e., the sum of all costs across the project.
  • Expected Return—what you choose to measure to track “expected return” should fit with the innovation’s context e.g., profitability, growth potential, environment impact, lives saved, etc.

Source: Strategyzer (available to download from their website)

Portfolio Level Metrics
At a Portfolio Level, Strategyzer focus on a visualization of the innovation portfolio’s Business Model metrics. The authors intend for this visual to inform resource allocation decisions, i.e., which new projects to fund and which existing projects to continue to support or kill.

Strategyzer’s recommended visualization is a standard risk versus reward matrix. They plot each innovation project’s Expected Return on the vertical axis and Innovation Risk Level on the horizontal (displayed with an inverse scale, so the highest risk projects are at left, while the lowest risk projects are at right). The authors also recommend including the following as data labels for each project on the matrix:

  • Overall Cost
  • Project Duration

The authors do not mention any aggregation or synthesis of Business Model Level metrics and/or calculation of new metrics for the Portfolio Level.

Source: Strategyzer (available to download from their website)

Project Scorecard

Apparently unrelated to the above collection of innovation metrics, Strategyzer also introduce their Project Scorecard, which they position as a tool to both inform project selection and evaluate project progress.

The scorecard enables assessment of an innovation’s current state against a standardized set of criteria. These criteria cover each aspect of Osterwalder’s business model canvas (e.g., value proposition, key resources, revenues, etc.). Additional criteria include consideration of the innovation’s strategic fit, the scale of the opportunity, and the adaptability of the innovation to changes in the external environment (e.g., competitive forces, technology trends, macroeconomic trends). Users assign their innovation a score, from 0 (worst) to 10 (best), for each criterion, based on the strength of evidence and their confidence that each criterion has been proven.

Source: Strategyzer (available to download from their website)

Strategyzer’s Approach Has Some Key Strengths…

Based on our experience designing and researching innovation metrics and measurement systems, Strategyzer’s approach has several strengths:

1. It’s anchored in “functions”—that is, each metric collected has a specific job or function, to inform a specific decision (e.g., which innovation projects should we fund?). This ensures the approach is clear, efficient, and cost-effective. We’ll come back to whether Strategyzer’s approach covers all necessary functions.

2. It uses “levels.” By explicitly separating out innovation metrics into different levels, you make it easier to communicate clearly and efficiently. Critically, Strategyzer have recognized the need for coherence across levels: they don’t suggest reporting entirely different metrics at different levels.

3. It seeks to measure learning, that is, the extent to which learning (i.e., insight from experiments) is de-risking an innovation. This marks a laudable move away from the common focus on activity metrics and other output indicators (e.g., milestones), which are poor proxies for learning and/or don’t enable effective communication (and therefore transparency) with stakeholders outside of the project team.

4. It seeks to enable prioritization at the hypothesis level. The Risk metric helps the team identify which hypotheses contribute the most to the overall risk of the innovation. This information can guide prioritization of time and effort across different hypotheses.

…but there is a fundamental flaw

When I first looked at Strategyzer’s approach, I really liked the apparent ability to measure de-risking at the hypothesis level (as opposed to relying on traditional milestone-based reporting) and to use the same information, synthesized, to communicate the risk of the individual project, e.g., in the risk / reward visualization. Closer inspection, however, reveals a fairly fundamental flaw.

Recall that, at a portfolio level, Strategyzer recommend using their Innovation Risk Level metric to map risk on their Portfolio Level visualization. However, the Innovation Risk Level, as defined by Strategyzer, does not actually enable comparison of “risk” (in the general sense of the word) across projects. Instead, Innovation Risk Level is set at 100% at the beginning of every project, and the project team then estimates the level of de-risking achieved (see above). The metric simply measures what percentage of that initial 100% remains today (based on how much you’ve learned from your experiments). So, for example, if you were to map a group of proposed innovation projects onto the visualization, the Innovation Risk Level for all projects would be 100%. Similarly, if your portfolio included a mix of new and in-progress projects, a brand-new project with an excellent probability of success could well appear to have a much higher risk level than a much more speculative project on which the team has made good progress in addressing the initial risks.

Clearly this is inaccurate (and I don’t for a moment think it was Strategyzer’s intention!) and would render the Portfolio Level visualization worthless for at least one of its intended jobs—informing resource allocation decisions.
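To make the problem concrete, here is a small sketch with two made-up projects. The function simply restates the definition of Innovation Risk Level given earlier (100% minus the sum of Risk multiplied by Confidence Level); the project names and figures are hypothetical.

```python
def innovation_risk_level(hypotheses):
    """hypotheses: list of (risk_share, confidence) pairs, both in 0..1."""
    return 1.0 - sum(risk * confidence for risk, confidence in hypotheses)


# A brand-new project: no experiments run yet, so every Confidence Level is 0,
# regardless of how likely the project actually is to succeed.
new_low_risk_project = [(0.6, 0.0), (0.4, 0.0)]

# A far more speculative project whose team has already run several experiments.
speculative_in_progress = [(0.5, 0.8), (0.5, 0.6)]

new_level = innovation_risk_level(new_low_risk_project)
speculative_level = innovation_risk_level(speculative_in_progress)

print(f"New project:         {new_level:.0%}")
print(f"Speculative project: {speculative_level:.0%}")
```

The new project is plotted at 100% while the speculative one lands at roughly 30%, even though nothing in the calculation reflects how risky either project was to begin with.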

The simplest solution to this apparent oversight would be to just use a different metric to assess innovation risk for the Portfolio Level visualization. However, that would mean you lose the direct link between an experiment which proves your hypothesis (to a certain degree of confidence) and the resulting reduction in the risk reported for the overall innovation. That lack of coherence between levels can create a gap between the information being used to inform decision making at the project team level and at a leadership level.

It’s worth highlighting a related issue at this point: Strategyzer’s approach does not define how to assess risk. Somewhat confusingly, their hypothesis level Risk metric is not actually a measure of risk at all. Instead, it’s a measure of the contribution that hypothesis makes to the total risk of the project. But on what basis are innovation teams meant to assess the risk of individual hypotheses and/or the total risk of the project? Perhaps the intent is to rely on intuition. The intuition of experienced practitioners is a valuable input in many innovation decisions. However, where consistency across projects is necessary, it’s critical to enable practitioners to consider the same set of factors using a common basis.

A couple of solutions to these two problems immediately come to mind.

Firstly, you could use a scorecard to calculate a composite risk score, e.g., using Strategyzer’s Project Scorecard¹. That risk score could be the (arithmetic) mean of all the individual scores. Or you could calculate a weighted average, taking into consideration what experience tells you are more or less important criteria in your business.

This approach would still be compatible with a hypothesis-based approach to planning and executing your project. You would simply need to map each hypothesis to the Scorecard criterion it should impact. As you complete experiments, you would update the scoring for the mapped criterion from the Scorecard and recalculate the overall risk score. This approach would still help with prioritization, e.g., based on where risk can be most quickly and cost-effectively reduced. And, finally, it would make it easier to achieve a degree of consistency in how different innovation teams assess risk.
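A rough sketch of this first option follows. It assumes scorecard scores from 0 (worst) to 10 (best) and inverts them so that a higher composite means higher risk. The criterion names, weights, and scores are hypothetical, and the function is our illustration rather than anything Strategyzer prescribe.

```python
def composite_risk_score(scores, weights=None):
    """Return a 0..10 risk score (10 = highest risk).

    scores:  dict of criterion -> evidence score, 0 (worst) to 10 (best)
    weights: optional dict of criterion -> relative importance (default: equal)
    """
    if weights is None:
        weights = {criterion: 1.0 for criterion in scores}
    for criterion, score in scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"score for {criterion!r} must be between 0 and 10")
    total_weight = sum(weights[criterion] for criterion in scores)
    # Invert each score so that weak evidence translates into high risk.
    weighted_risk = sum(weights[c] * (10 - s) for c, s in scores.items())
    return weighted_risk / total_weight


# Made-up scorecard for one project.
scores = {"value proposition": 8, "key resources": 4, "revenues": 3, "strategic fit": 9}

simple_mean = composite_risk_score(scores)
weighted = composite_risk_score(
    scores,
    weights={"value proposition": 1, "key resources": 1, "revenues": 2, "strategic fit": 1},
)

# An experiment mapped to the "revenues" criterion raises its score from 3 to 7.
after_experiment = composite_risk_score({**scores, "revenues": 7})

print(simple_mean, weighted, after_experiment)
```

Because the starting score depends on the evidence for each criterion rather than being fixed at 100%, two untested projects can begin at different risk levels, which is what the portfolio comparison needs.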

Secondly, you could quantitatively assess the uncertainty within your innovation projects. This would require constructing a simple quantitative model that links the intended outcome (e.g., 5-year sales) to your underlying hypotheses. The uncertainty inherent in these hypotheses could then be modeled via some basic scenario analysis (Monte Carlo analysis might be appropriate for projects requiring larger investment) to determine both the overall uncertainty and the contribution of each hypothesis (which helps with prioritization). The project’s initial uncertainty could be estimated, and then, as experiments are run and hypotheses are tested, the estimates could be updated. This approach would also help to enable consistency in risk assessment across projects.
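The second option could be prototyped along the following lines. Everything here is an assumption made for illustration: the three input names, their ranges, the uniform distributions, the multiplicative sales model, and the crude “resolve one hypothesis at a time” measure of contribution. A real model would be built around your own hypotheses and data.

```python
import random
import statistics

# Hypothetical (low, high) ranges expressing current uncertainty per hypothesis.
RANGES = {
    "customers_reached": (10_000, 50_000),
    "conversion_rate": (0.01, 0.05),
    "revenue_per_customer": (200.0, 600.0),
}


def five_year_sales(inputs):
    """Toy model linking the outcome to the underlying hypotheses."""
    return (
        inputs["customers_reached"]
        * inputs["conversion_rate"]
        * inputs["revenue_per_customer"]
    )


def simulate(ranges, fixed=None, n=10_000, seed=0):
    """Monte Carlo draws of 5-year sales; `fixed` pins resolved hypotheses."""
    rng = random.Random(seed)
    fixed = fixed or {}
    outcomes = []
    for _ in range(n):
        draw = {}
        for name, (low, high) in ranges.items():
            draw[name] = fixed[name] if name in fixed else rng.uniform(low, high)
        outcomes.append(five_year_sales(draw))
    return outcomes


# Overall uncertainty: spread (standard deviation) of the simulated outcome.
baseline_spread = statistics.pstdev(simulate(RANGES))

# Contribution of each hypothesis: how much the spread shrinks if that
# hypothesis alone were resolved (pinned here at the midpoint of its range).
contribution = {}
for name, (low, high) in RANGES.items():
    resolved = simulate(RANGES, fixed={name: (low + high) / 2})
    contribution[name] = baseline_spread - statistics.pstdev(resolved)

for name, value in sorted(contribution.items(), key=lambda kv: -kv[1]):
    print(f"{name}: resolving it would cut the spread by {value:,.0f}")
```

As experiments come in, you would narrow the relevant range (or pin the value) and rerun the simulation, so the reported uncertainty falls in step with what the team has actually learned.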

Other concerns with Strategyzer’s approach

As we reviewed Strategyzer’s approach, several additional concerns emerged, including:
   1. Potential for inconsistent application of scoring-based indicators
   2. An incomplete set of innovation metrics

Potential for inconsistent application of scoring-based indicators
Strategyzer’s approach includes some “scoring-based” indicators, that is, indicators for which the user selects a score (e.g., between 0 and 10). Strategyzer use these for the Project Scorecard and for estimating Confidence Level at the hypothesis level. Scoring-based indicators are helpful for converting qualitative insight into a quantitative format. However, a weakness we often encounter is a lack of consistent application, whether across teams or over time. When only minimal qualitative guidance is provided (see examples below), the chance of varying interpretations is high. This weakness can be partly overcome by providing more detailed qualitative descriptions of each score that “anchor” the user (see the example below).

An incomplete set of innovation metrics
Strategyzer’s approach does not describe the full set of innovation metrics an organization would need to achieve all the necessary functions of an innovation performance measurement system. While Strategyzer don’t claim their approach is complete, it’s still important to acknowledge that it is only a subset of what you need.

In our guide Innovation Metrics: How to Measure Innovation Performance we identify basic needs that a complete set of innovation metrics should address. The table below illustrates which aspects of this set Strategyzer’s approach covers. Coverage at the project level (equivalent to Strategyzer’s Business Model Level) is excellent, while coverage is only partial at the portfolio level and non-existent at the organizational level.

Source: Commodore

1 — It would probably make sense to invert the scale such that 0 = least risk and 10 = highest risk.