Cost Accounting with Scrum

Here is another in my occasional series on the finer points of scrum.  See also Sprint Planning with Time Separation.  Cost accounting seems inimical to scrum philosophically, and infeasible in practice.  We use story points for a reason, and we let the team discover its velocity through experience.  Neither of these numbers is readily convertible into dollars, and yet converting them is exactly what we’re going to do.

If the latest sprint delivered 70 story points, those points are worth $402 each.

Our goal is to calculate how much was spent to develop a certain feature.  This can be to support a cost-benefit analysis, to track development as a capital investment, or to claim an R&D tax credit.

The team’s velocity is the number of story points it can complete in one sprint, typically two weeks.  Velocity changes from one sprint to the next, but the key is that we know the velocity of the most recent sprint, and that’s the one we need to account for.

We also know how much it costs to run a sprint.  Let’s say that we have a seven-person scrum team with an aggregate base salary of $677,500.  Adding an 8% burden rate and dividing by 26 two-week sprints per year, we calculate that the cost per sprint is $28,142.

So, if the latest sprint delivered 70 story points, those points are worth $402 each.  Now, let’s say that two capital projects absorbed 65 of the 70 points, and the remaining five points went to a stray story that fixed a bug or something.  That one is a regular expense.  Here is how the cost allocation works out:
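
Below is a minimal sketch of the arithmetic in Python; the 40/25 split between the two capital projects is hypothetical, just to make the numbers concrete.

    # Cost per story point for the most recent sprint
    aggregate_salary = 677_500        # seven-person team, base salaries
    burden_rate = 0.08                # benefits, payroll taxes, etc.
    sprints_per_year = 26             # two-week sprints

    cost_per_sprint = aggregate_salary * (1 + burden_rate) / sprints_per_year
    velocity = 70                     # story points delivered this sprint
    cost_per_point = cost_per_sprint / velocity      # roughly $402

    # Allocation by project tag (the 40/25 split is a made-up example)
    points_by_project = {"CAP-001": 40, "CAP-002": 25, "expense": 5}
    allocation = {proj: round(pts * cost_per_point, 2)
                  for proj, pts in points_by_project.items()}
    print(f"Cost per sprint: ${cost_per_sprint:,.2f}, per point: ${cost_per_point:,.2f}")
    print(allocation)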

It’s easy enough for the scrum master to load these figures into an accounting system at the end of each sprint, but it does require each user story to be tagged with the project it represents.  If you’re using Jira, it’s best to group the stories into epics, which represent new features, and include the capital project identifier (i.e., an account number) on the epic.
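
If you want to automate the tally, something like the sketch below could pull per-epic point totals from Jira’s issue-search REST API.  The site URL, credentials, and field names are placeholders; in particular, story points live in a custom field whose ID varies by Jira instance, and older projects may expose the epic through an “epic link” field rather than the parent.

    import requests

    JIRA = "https://yourcompany.atlassian.net"     # placeholder site
    AUTH = ("scrum.master@yourcompany.com", "api-token")
    STORY_POINTS = "customfield_10016"             # story points field; ID varies by instance

    def points_by_epic(sprint_id):
        """Sum completed story points per epic for one sprint."""
        resp = requests.get(
            f"{JIRA}/rest/api/2/search",
            auth=AUTH,
            params={"jql": f"sprint = {sprint_id} AND status = Done",
                    "maxResults": 200,
                    "fields": f"{STORY_POINTS},parent"})
        resp.raise_for_status()
        totals = {}
        for issue in resp.json()["issues"]:
            fields = issue["fields"]
            epic = (fields.get("parent") or {}).get("key", "no-epic")
            totals[epic] = totals.get(epic, 0) + (fields.get(STORY_POINTS) or 0)
        return totals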

Rethinking Electric Cars

While battery-electric vehicles (BEVs) may help reduce atmospheric greenhouse gas (GHG) emissions, their production damages the environment in other ways, including water pollution.  Furthermore, BEV production is so energy-intensive that, on a lifecycle basis, these vehicles produce almost as much carbon as traditional internal combustion engine (ICE) vehicles.

The lithium-ion battery in an Audi e-tron weighs 1,500 pounds, making this “green” vehicle heavier than a Dodge Ram pickup truck.  The same goes for Tesla.  Upscale electric cars are monstrously heavy, with battery packs of 1,000 pounds or more.  It’s easy to see how this extra weight must entail extra mining, milling, and manufacturing.

More plebeian vehicles, like the Chevy Bolt, still carry 400 pounds of battery.  Depending on the battery type, this might include 10-15 pounds of lithium, similar amounts of cobalt and manganese, and maybe 100 pounds of aluminum.  These elements all come from nasty, toxic, open-pit mines in places like Mongolia, Chile, and the Congo.

“In Chile’s Atacama salt flats, mining consumes, contaminates and diverts scarce water resources away from local communities”

Sixty percent of the world’s cobalt comes from the Congo, much of it from “artisanal” mines.  That’s a fancy way of saying that African children dig for it in the mud.  If you don’t believe me, believe the photos from Amnesty and the UN.  The Katanga region has been named one of the world’s ten most polluted areas.  As one Twitter wag put it, “electric cars transfer pollution to poor communities and sanctimony to rich ones.”

Lithium is produced either by mining or from brine evaporation.  The latter process is cheap and effective, but uses roughly 500,000 gallons of water per ton of lithium.  This has been a problem for local farmers in Chile.  Apart from direct water consumption, both processes have the potential to leak toxic chemicals into the water supply.

Lithium production has been growing rapidly to meet the demand for electric vehicles, and now stands at 100,000 tons per year.  Demand forecasts to 2030 range from 2 to 3 million tons – that is, 20 to 30 times current production capacity.

The mine at Thacker Pass in Nevada sheds some light on the economics.  Lithium recently hit a record $71,000 per ton.  Producing one ton of lithium entails strip mining 500 tons of earth, and Thacker Pass has the potential to produce 60,000 tons of lithium per year.
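
A back-of-envelope calculation, using only the figures cited above and ignoring processing and refining costs:

    # Back-of-envelope economics for Thacker Pass, from the figures above
    price_per_ton = 71_000        # USD, recent record price for lithium
    earth_per_ton = 500           # tons of earth strip-mined per ton of lithium
    annual_output = 60_000        # tons of lithium per year, potential

    print(f"Gross revenue: ${price_per_ton * annual_output / 1e9:.2f} billion per year")
    print(f"Earth moved:   {earth_per_ton * annual_output / 1e6:.0f} million tons per year")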

You may think that we have no choice but to despoil the planet in search of battery metals, because climate change is the greater threat.  Consider, though, how much diesel fuel is burned by all of these mining operations.  On a lifecycle basis, electric vehicles barely improve upon the GHG emissions of a traditional ICE vehicle.

“We estimate the GWP from EV production to be 87 to 95 grams carbon dioxide equivalent per kilometer (g CO2-eq/km), which is roughly twice the 43 g CO2-eq/km associated with ICEV production”

An electric vehicle begins its service life with roughly double the carbon footprint of an ICE vehicle.  Thereafter, it produces less GHG than the ICE vehicle, with the margin depending on the local power source.  As of this writing, 60% of electricity in the U.S. is generated from fossil fuels.

Lifecycle emissions for the BEV break even with the ICE vehicle around 80,000 miles of use.  Here is a recent research note placing the breakeven point at 124,000 miles, and here is an ambitious study which calculates the total global warming potential (GWP) along with other forms of ecological damage.

Below is the chart from the study.  You can see that the various electric vehicles improve slightly on ICE vehicles for GWP, but look at those other metrics!

In case you don’t have the legend in front of you, those four metrics where the electric vehicles far exceed ICE vehicles are:

  • Human toxicity
  • Freshwater eco-toxicity
  • Freshwater eutrophication
  • Mineral resource depletion

It’s the water pollution that bothers me.  Water scarcity is one of the principal threats from global warming, already a clear and present danger, and yet here we are polluting tons of it to make batteries.

The great hope here is recycling.  To the extent that minerals can be reclaimed from batteries at the end of their service life, this could reduce demand for new mining.  Unfortunately, current capabilities for recycling are not great.  They have low yields, and they’re energy-intensive.

So, it’s that catch-22 again, where we burn a load of fossil fuel to recycle our “green” batteries.  If car makers really had faith in recycling, they would not be pressing the government to relax environmental protections around lithium, cobalt, and nickel mining.

The tragedy is that ICE vehicles were making good progress toward the fabled circular economy.  When I worked for BMW, there was a goal to make cars 95% recyclable.  Our engineers designed everything to be removed, refurbished, and recycled into new cars.  If someone ever figures out “net zero” recycling, I’m sure it will be BMW, but meanwhile we are facing a growing pile of battery waste.

I have kept this post short by focusing only on the ecological dangers of battery-electric vehicles, and overlooking other challenges, like grid capacity.  Nor have I discussed alternatives, of which there are many.  There are social solutions, like mass transit and remote work, as well as engineering solutions.

“Electrification is a technology chosen by politicians, not by industry.”

In this interview, Carlos Tavares alludes to EV hybrids.  Other solutions, like fuel cells and hydrogen combustion, have received a fraction of the attention and investment given to electric vehicles.  As Tavares says, this is a result of politicians’ need to be seen taking action, even if that action is ill-advised.

With mandates in one hand, and billions of incentive money in the other, politicians are stampeding the industry toward their chosen technology.  This is not the right way to stimulate innovation.  Regulators should specify a carbon-emissions target, taking the full lifecycle into account, and then allow industry R&D to find the best solution.

Data Lakes Explained

Last month, I wrote an explainer on AI and it was well-received, so here is one on data lakes.  If you already know the concepts, you may still find this framing helpful in client discussions.  Our audience this time is the CFO, or maybe the CMO, and our motivation is that their analytical needs are not well-served by the transactional database.

Transactional Processing with a Relational Database

The data that runs your business – most of it, anyway – is probably stored in a relational database like Microsoft’s venerable SQL Server.  Without going into details about the “relational” structure, the key is that this database is optimized for the daily operations of the business.

New policies are booked, premiums collected, and claims paid.  These are transactions that add, change, or delete records.  There are also “read only” operations, like producing invoices, but the database is designed primarily for transaction processing.

A well-designed transactional database will resist anomalies

A well-designed transactional database will resist anomalies, like a line item with no invoice, or two sales of the same item.  The database designer will have used a technique called normalization, breaking the data up into smallish tables with relationships that enforce integrity.

Think of how your chart of accounts is organized.  Everything you need to account for is broken down to the lowest relevant level, and then rolled up for reporting.  Every journal entry hits two accounts, debit and credit, so that they’re kept in balance.  Your meticulously normalized database is kind of like that.

When a customer places an order, a row is added to the Order table.  You don’t need to open the Customer table unless there’s a change to the customer.  Built around these normalized tables is the machinery of indexes, clusters, and triggers, which support speed and integrity.
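
As a minimal sketch (SQLite standing in for SQL Server, with made-up table names), the normalized design keeps customers and orders in separate tables, and a foreign key guarantees that every order belongs to a real customer:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE Customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE OrderHeader (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
            order_date  TEXT NOT NULL,
            total       REAL NOT NULL
        );
    """)
    conn.execute("INSERT INTO Customer VALUES (1, 'Acme Motors')")
    # Placing an order touches only the order table, not Customer
    conn.execute("INSERT INTO OrderHeader VALUES (100, 1, '2024-04-01', 995.00)")
    # An order for a nonexistent customer (id 99) would raise an IntegrityError:
    # conn.execute("INSERT INTO OrderHeader VALUES (101, 99, '2024-04-01', 49.00)")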

Pro Tip: Take time to confirm that the transactional database is stable and supporting the business satisfactorily.  You don’t want to start building pipelines and then discover there’s a problem with your data source.

Analytical Processing with a Data Warehouse

Transaction processing involves adding and changing data, with carefully limited scope.  Analytical processing, by contrast, is mostly reading data – not changing it – and holistic in scope.  To support this, the data must be copied into a separate database and denormalized.

Let’s say you want to know whether Dent protection sells better as a standalone product, or as part of a bundle – corrected for the number of dealers who don’t offer the bundle, and segmented by the vehicle’s make and price range.

You could run this query against the transactional database, but it would be difficult.  The query is complicated enough without having to piece together data from multiple tables.  The normalization which served so well for transaction processing is now an obstacle.

Confession: I am a normalization bigot.  I bought C.J. Date’s textbook, read the original papers in the ACM journal, and even coded Bernstein’s algorithm.  To me, organized data is normalized data, and de-normalizing is like leaving your clothes on the floor.

So, this is a good guide to denormalization.  Everything we learned not to do in relational databases – wide tables, nested data, repeating groups – is useful here.
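
Here is a sketch of what that looks like in practice: one wide, denormalized sales table (the column names are hypothetical) that answers the Dent-protection question in a single pass, with no joins.

    import pandas as pd

    # One wide, denormalized row per sale -- everything the analyst needs
    sales = pd.DataFrame(
        [("Dent", False, True,  "Ford",   "under_30k", 12),
         ("Dent", True,  True,  "Ford",   "under_30k", 30),
         ("Dent", False, False, "BMW",    "over_50k",   8),
         ("Dent", True,  True,  "Toyota", "30k_to_50k", 22)],
        columns=["product", "bundled", "dealer_offers_bundle",
                 "make", "price_range", "units"])

    # Standalone vs. bundle, counting only dealers that offer the bundle,
    # segmented by make and price range
    result = (sales[sales.dealer_offers_bundle]
              .groupby(["make", "price_range", "bundled"])["units"]
              .sum()
              .unstack("bundled", fill_value=0))
    print(result)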

Analytical data is stored in cubes, stars, snowflakes, hearts, and clovers

Analytical work requires not only a new database design, but a new database system.  Out goes SQL Server, and in come BigQuery, Redshift, and Snowflake.  You may hear the buzzword OLAP, which means “online analytical processing.”  The term was invented for marketing purposes, to describe the new category of software.

Analytical data is stored in cubes, stars, snowflakes, hearts, and clovers (see sidebar).  Just kidding about the hearts and clovers.  Also, while your transactional database may be running SQL Server “on premise,” the analytical database will almost certainly be on a cloud service from Amazon, Microsoft, or Google.
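
To make the “star” concrete, here is a minimal sketch (again in SQLite, with hypothetical names): one central fact table of sales, surrounded by small dimension tables that carry the descriptive attributes.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: small and descriptive
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, bundle TEXT);
        CREATE TABLE dim_dealer  (dealer_id  INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, month TEXT, year INTEGER);

        -- Fact table: one row per sale, pointing at each dimension
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units      INTEGER,
            revenue    REAL
        );
    """)
    # A snowflake schema is the same idea, with the dimensions further normalized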

To be honest, not everyone needs an OLAP database.  As CIO for BMW Financial Services, I did not recommend one because our analytical workload was small, at the time, and could be served adequately without a lot of new gear and expensive consultants.  Since then, I have gone over to the side of the consultants.

Sidebar: What’s an OLAP Cube?

In the early days of analytical processing, software vendors thought it would be a good idea to use a multidimensional data structure called a hypercube. Think of a typical spreadsheet, with rows representing an income statement and one column for each month. That’s two dimensions. Now, add a stack of spreadsheets, one for each region. That makes three dimensions, like a cube. I put myself through grad school working at Comshare, one of the first OLAP software vendors. It supported seven dimensions. That’s a hypercube. Nowadays, there are better data structures, and this leads to some confusion. Older analysts may assume that if they’re doing OLAP, then they must be using a cube. They may use the term “OLAP cube” to mean any analytical database, even though cubes have largely been replaced by newer structures.
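
In code, a cube is just a multidimensional array.  Here is a three-dimensional sketch in NumPy, accounts by months by regions, with made-up labels:

    import numpy as np

    accounts = ["Revenue", "COGS", "SG&A"]
    months   = ["Jan", "Feb", "Mar"]
    regions  = ["East", "West"]

    # One cell per (account, month, region) combination
    cube = np.zeros((len(accounts), len(months), len(regions)))
    cube[0, 0, 0] = 125_000        # Revenue, Jan, East
    cube[0, 0, 1] =  90_000        # Revenue, Jan, West

    # "Slice" the cube: total revenue by month, summed across regions
    revenue_by_month = cube[0].sum(axis=1)
    print(dict(zip(months, revenue_by_month)))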

Pooling Data in a Data Lake

You can think of the data lake as a way station between the transactional database and the data warehouse.  We want to collect all the data into a common repository before loading it into the data warehouse.

Why not simply extract, transform, and load data straight from the transactional database?  Well, we could, but it would be brittle.  Any change on either side would require an update to the pipeline.  The data lake decouples the OLTP and OLAP data stores.

The data lake serves the very important function of storing all the data, in whatever format, whether or not it’s amenable to organization.  The term’s originator, James Dixon, wanted to suggest a large volume of data with no preconceived organization.

The key thing is to collect all the data in one place, and think about organization later.  This calls for an “object store,” like Google Cloud Storage.  GCP and AWS both use “buckets.”  You get the idea – this is where you leave your clothes on the floor.
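
A sketch of landing a raw extract in the lake, using the google-cloud-storage Python client; the bucket and object names here are hypothetical.

    # pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()                    # picks up your default GCP credentials
    bucket = client.bucket("acme-data-lake")     # hypothetical bucket name

    # Dump the raw extract as-is; organizing it is a problem for later
    blob = bucket.blob("raw/oltp/orders/2024-04-01.csv")
    blob.upload_from_filename("orders_extract.csv")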

Most of your data will indeed be structured data coming from the transactional database, and on its way into the OLAP database – but not all of it.  Here are some real-life examples I have encountered:

    • Logs of API traffic. Details of who is using our ecommerce API, including copies of the payload for each request and response.
    • Text snippets. A file of the several paragraphs that make our standard Texas contract different from the one in Wisconsin, so that we can produce new contracts automatically.  Same goes for product copy on the web site.
    • Telephone metadata. A list of timestamps, durations, phone numbers, and extensions for all calls in the call center, both inbound and outbound.

These examples are better served by special-purpose data stores like Hadoop, Bigtable, and MongoDB.  It’s best to take stock of all the data your analysts might need, broadly speaking, and start collecting it before you go too far with designing the OLAP database.

The Power of Experience

I have been rereading Gary Klein’s landmark book on decision-making, Sources of Power.  Klein’s genius was something other sciences take for granted: field work.  Klein and his team spent years studying how experts make high-stakes decisions in real life.  This is truly “what they don’t teach you in business school.”

The short version is that formal methods for decision making are rarely used in real-life conditions.  Indeed, the people studied by Klein were not even conscious of making decisions.  They just knew what to do.  When a surgeon must make a snap decision, with someone’s life on the line, there’s no time for a weighted-factor analysis.

Most research on decision-making bleaches out the importance of prior experience

Klein points out that most psychology research, in an effort to produce controlled conditions, bleaches out the importance of prior experience.  If you do all your research in a laboratory, then you will only learn how people make decisions in a laboratory – not in combat, say, or a forest fire.

Like that of his better-known colleagues Kahneman and Tversky, much of Klein’s research was funded by military organizations.  They would like their gunners and squadron leaders not to make fatal blunders under fire.  His subjects also included doctors, firefighters, and nuclear power plant operators.

The power of experience seems obvious enough, but Klein figured out exactly how it works, in a framework called the Recognition-Primed Decision Model.  This consists of using imagination plus experience to generate possible courses of action, and then conducting mental simulations to predict the likely results.

Sources of Power

Various “sources of power” follow from the model:

  • Expert Intuition
  • Mental Simulation
  • Finding Leverage Points
  • Detecting Anomalies
  • Reasoning by Analogy
  • Anticipating Intentions

What we think of as intuition is really expert recognition.  One firefighter recounted a narrow escape because he’d had a “premonition” the building he was working in was about to collapse.  This might have been a warning from God – or it might have been the million subtle cues he was unconsciously observing.

This may seem like a different realm from business, where we have ample time to make decision trees, compute expected values, perform cost-benefit analyses, and – there’s always time for one more Big Four consulting study.  This is an illusion, however.  Whether they know it or not, managers are under constant pressure to make decisions and take action faster than their competitors.

A good plan, executed right now, beats a perfect plan executed next week.

My mentor at AutoNation, Kevin Westfall, had a plaque in his office with this quote from General George S. Patton: “A good plan, executed right now, is far better than a perfect plan executed next week.”  Kevin and I had both arrived from our previous employer with some impatience over its decision protocols.

Recognition-Primed Decision Making

In an area that could easily devolve into pop psychology, I was impressed by Klein’s scientific rigor.  Every study is cross-checked, blind, double-blind, sanitized, etc.  Every result is turned into a training program, and then the trainees are tested.  In one project, his team redesigned the user interface for a computerized weapons system, making its operators 20% more effective.

Since experience is so powerful, Klein takes up the question of how best to gain it.  That is, what are the key lessons from the old-timers in various domains?  In the infantry, this might mean knowing how fast your squad can move over terrain, what their best range is for engagement, and being able to gauge those distances by eye.

The cornerstone of the book is the RPD framework, and then Klein spends a chapter on each “source of power,” plus his research methods and training programs.  If that sounds like too much psychology for you, skip the text and just read the case studies.  They’re amazing.