# Error control in mathematical thinking and practice

Next week I’m going to have what I expect will be a very stimulating discussion with Dirk Schlimm and Valeria Giardino, and in preparation I am thinking through some of Valeria’s arguments about mathematical proof and diagrams. In doing so, I’ve realized how some of my perspectives on mathematical practice have changed. (I haven’t read enough of Valeria’s work to know how much of what I’ll say replicates her ideas, which in any event are well worth checking out.)

1) On the one hand: Symbols vs. Diagrams

One of the classic arguments in philosophy of mathematics is over the epistemic value of symbolic proofs vs. diagrammatic arguments. Many people in the classical period (say, the early 1900s) argued that symbolic proofs were primary, because they conceptualized mathematics in terms of absolute proof and certainty, and symbolic proofs seemed to them to provide a kind of certainty that diagrams could not–diagrams, after all, can easily mislead.

2) On the other hand: Cognitive Psychology and common sense

More recently, decades of research in cognitive science have established what should be very obvious: symbols can also mislead! Learners and experts can be misled by the surface form of symbols, and can make plentiful errors even when the symbols are helpfully configured. Indeed, keeping track of symbolic rules seems to be extremely difficult, even with training. My daughter and I capture this in a basic rule of doing mathematics: first you write down your solution, then you look at it to see what you got wrong. (Not whether you got something wrong, but what.)

3) Where does this leave us?

My perspective now is that error control mechanisms (the same kind that guarantee quality control in factories and form the intellectual foundation of null hypothesis testing) are massively important in mathematical practice, and under-appreciated in philosophy of mathematics. Many of our actual bodily practices (such as writing one line of a derivation carefully below another) exist to facilitate comparison and minimize error risk; error risk is one of the key components that must be managed when turning oneself into a mathematical doer. I guess when I last looked at the symbols-diagrams debate I had not really thought this through, and spent my time taking seriously arguments about in-principle differences between linguistic and visual persuasion, etc.
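As a toy illustration of this kind of error control (not a tool from the post, just a sketch), the habit of writing each line below the previous one can be turned into a mechanical comparison: check every line of a derivation against the line above it and report the first place they disagree. The sketch assumes each line is written as a Python function of one integer variable; `first_bad_step` is a name made up here. Agreement at a handful of points is error *detection*, not proof in general, although two polynomials of degree at most 10 that agree at these 11 probe points really are identical.

```python
from typing import Callable, Optional, Sequence

# Each line of a one-variable derivation, encoded as a function of x (illustrative encoding).
Step = Callable[[int], int]


def first_bad_step(steps: Sequence[Step], probes: Sequence[int] = range(-5, 6)) -> Optional[int]:
    """Return the index of the first line that disagrees with the line above it
    at some probe point, or None if every adjacent pair agrees on all probes."""
    for i in range(1, len(steps)):
        previous, current = steps[i - 1], steps[i]
        if any(previous(x) != current(x) for x in probes):
            return i
    return None


# A short derivation with a slip in the last line: x**2 + 2*x should factor as x*(x + 2).
derivation = [
    lambda x: (x + 1) ** 2 - 1,
    lambda x: x**2 + 2 * x + 1 - 1,
    lambda x: x**2 + 2 * x,
    lambda x: x * (x + 1),  # the error
]

print(first_bad_step(derivation))  # -> 3
```

The point is the one made above: the check does not ask *whether* something went wrong in the abstract, it localizes *what* went wrong by comparing neighbouring lines.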

4) Why did the ‘classical’ mathematicians mess this up?

On my reading now, the early 20th century debate about persuasion and certainty was, at heart, a practical craft-based conversation about error-management. The reason some people have the intuition that diagrams are less secure (more risky) than symbols is that, well, sometimes they are!  But this is a difference in cognitive profile, not philosophical type: e.g., diagrams sometimes involve angles which may appear similar but be crucially different, or involve exemplars (take an arbitrary triangle…) that may yield incorrect generalizations. On the other hand, though, symbols also provide error opportunities that may have been underappreciated: each symbol looks very similar to other symbols, and specific complex transformations are hard to memorize without error. This is part of why kids hate algebra!

# Graspable Order of Operations

Note: this is really designed as a sample page for my fifth-grade daughter to practice order of operations on. If you don’t have a fifth grader studying order of operations, you may not find this riveting:

My daughter is studying order of operations right now, and I made her a page of problems in Graspable Math to solve, and thought I’d share them publicly, in case someone finds them useful someday.

If you don’t know Graspable Math, the big thing you should know is that it’s dynamic software: it lets you control the actions taken in an algebraic setting, but you don’t have to write them out yourself. The program was conceived and written by Erik Weitnauer, Erin Ottmar, myself, and some other folks. It allows you to do actions that aren’t strictly PEMDAS, as long as the action yields the PEMDAS-approved answer, so YMMV, depending on your personal pedagogical goals. Here’s a video demonstrating the basics of this particular page:

And here’s the page itself:

And here’s her actual homework; she’ll use this to check her answers:
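The Graspable Math rule mentioned above (an action is allowed as long as it yields the PEMDAS-approved answer) can be sketched in a few lines of Python. This is a hypothetical illustration, not how Graspable Math is actually implemented: `evaluate` and `is_acceptable_step` are names invented here, expressions are assumed to be whole-number arithmetic written in Python syntax (`**` for powers), and Python's own grammar stands in for PEMDAS, since it ranks `**` above `*` and `/`, which rank above `+` and `-`, and groups `*`, `/`, `+`, `-` left to right.

```python
import ast
import operator
from fractions import Fraction

# Binary operators we allow, mapped to their arithmetic meaning.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate whole-number arithmetic exactly, using Python's precedence rules."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


def is_acceptable_step(before: str, after: str) -> bool:
    """A step is acceptable if it preserves the order-of-operations value."""
    return evaluate(before) == evaluate(after)


print(evaluate("8 - 2 * 3"))                               # -> 2
print(is_acceptable_step("4 * 5 + 2 + 3", "4 * 5 + 5"))    # -> True (adds 2 + 3 "early")
print(is_acceptable_step("8 - 2 * 3", "6 * 3"))            # -> False (subtracted first)
```

Exact `Fraction` arithmetic keeps division from introducing floating-point mismatches when two steps are compared.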

# A "Graspable" Proof of Viviani’s Theorem

Viviani’s theorem is a really cool result about equilateral triangles. I’m not sure who first proved it (it’s named for Viviani, obviously). It states that if you take any point inside the triangle and draw the shortest straight lines from it to the three sides (the perpendiculars), the lengths of those three lines sum to the height of the triangle.
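A quick numerical sanity check of the statement (not a proof, and not the Graspable Math derivation described below): put an equilateral triangle with side `side` at coordinates chosen here for convenience, pick random interior points, and compare the sum of the three perpendicular distances with the height √3·side/2. The helper names are invented for this sketch.

```python
import math
import random


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)


def random_interior_point(a, b, c, rng):
    """Uniformly random point in triangle abc, folding samples that land outside back in."""
    u, v = rng.random(), rng.random()
    if u + v > 1:
        u, v = 1 - u, 1 - v
    return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
            a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))


def max_viviani_deviation(side=1.0, trials=10_000, seed=0):
    """Largest |(d1 + d2 + d3) - height| over random interior points."""
    rng = random.Random(seed)
    height = side * math.sqrt(3) / 2
    a, b, c = (0.0, 0.0), (side, 0.0), (side / 2, height)
    worst = 0.0
    for _ in range(trials):
        p = random_interior_point(a, b, c, rng)
        total = (point_line_distance(p, a, b)
                 + point_line_distance(p, b, c)
                 + point_line_distance(p, c, a))
        worst = max(worst, abs(total - height))
    return worst


print(max_viviani_deviation())  # tiny: only floating-point round-off remains
```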

I’ve been playing around with making representations and demos in Graspable Math, a research project and free teaching tool my post-doc Erik Weitnauer, Erin Ottmar, and I are making together (all beta disclaimers apply).  I thought I’d make a quick proof of Viviani’s theorem, and I was pretty pleased with the result. If you want to play with it yourself, here’s a link to the canvas–but be forewarned, as of Dec 1, 2016, there’s some glitch with our saving and loading, which breaks some of the links. You’re better off deriving the proof yourself next to the proof that loads. I demo that here.

Viviani’s theorem has some funny implications. For instance, barycentric coordinate plots, the coolest way to plot three values constrained to a constant sum, couldn’t work without it.

Beauty powered by Viviani

These plots come up all the time in my work, because we often have tasks where subjects have to choose among three items. They are also the best way to think about how you pay your money to Humble Bundle: you gotta pay all your money, so the sum is fixed, but the values are ‘free’ to vary.

Giving money to Humble Bundle is a good idea, no matter where in the triangle you end up.
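Here is a sketch of how a barycentric (ternary) plot places three values with a fixed total, assuming the common convention of a unit-side equilateral triangle with the corners given below (plotting tools differ in orientation and scaling). Each value's share of the total ends up equal to the point's distance from the side opposite that value's corner, divided by the height; Viviani's theorem is what guarantees those three distances always add up to the full height, so all three shares can be read straight off the picture. The function names and the example amounts are made up for illustration.

```python
import math

HEIGHT = math.sqrt(3) / 2
# Corners for the first, second and third value (unit-side equilateral triangle).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, HEIGHT)]


def to_ternary(values):
    """Map three non-negative values to a 2-D point: the share-weighted average of the corners."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("need non-negative values with a positive total")
    shares = [v / total for v in values]
    x = sum(s * cx for s, (cx, _) in zip(shares, CORNERS))
    y = sum(s * cy for s, (_, cy) in zip(shares, CORNERS))
    return x, y


def distance_to_side(point, i):
    """Perpendicular distance from point to the side opposite corner i."""
    px, py = point
    (ax, ay), (bx, by) = [c for j, c in enumerate(CORNERS) if j != i]
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)


# Hypothetical three-way split of a payment: 9, 5 and 1 units.
point = to_ternary([9, 5, 1])
shares = [distance_to_side(point, i) / HEIGHT for i in range(3)]
print([round(s, 3) for s in shares])  # -> [0.6, 0.333, 0.067]
```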

# An exploration of the beauty and value of algebra in apparently geometric problems

What fraction is shaded?

It’s pretty clear from counting triangles that the answer is 1/8, but I like algebra, so I made a quick solution in Graspable Math, and learned something from it, so I thought I’d share.

Here’s the basic idea:
So at first glance, this is one of those proofs that feels like drudgery (didn’t the geometric flipping thing make more sense?), but the great thing about algebra is that it gives you new insights you probably wouldn’t have had otherwise. Let me ask you this: why is it 1/8? Why did this construction lead to that ratio? Counting provides little insight; it just is. Now look at the algebraic proof, and you get a quick hint: the 8 came from three factors of 2.

Two of these 2’s appeared from the size difference between the large and small squares, and one appeared because we used the Pythagorean theorem, essentially because we rotated the square while keeping it bounded in extent.

We can picture it like this: we started with the big square, then we shrunk it down, and rotated it in a particular way (keeping its total width fixed). The first of those operations took the area down by a factor of four, and the second by a factor of two.
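Written out, the three factors of 2 come from the computation below. Since the figure is not reproduced here, this assumes the small square is rotated by 45° so that its corners touch the midpoints of the sides of a bounding square of width $w$; the symbols $s$, $w$, $t$, and $A_i$ are introduced just for this sketch.

$$
\begin{aligned}
\text{large square: } & A_0 = s^2 \\
\text{shrink to width } w = s/2: \quad & A_1 = w^2 = \tfrac{s^2}{4} \\
\text{rotate, width fixed (Pythagoras): } & t^2 = \left(\tfrac{w}{2}\right)^2 + \left(\tfrac{w}{2}\right)^2 = \tfrac{w^2}{2} \\
\text{shaded square: } & A_2 = t^2 = \tfrac{1}{2}\cdot\tfrac{s^2}{4} = \tfrac{s^2}{8} \\
\text{fraction shaded: } & \tfrac{A_2}{A_0} = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8}
\end{aligned}
$$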

This immediately suggests, and tells us the answer to, two related problems, both of the "find the shaded area" type:

and

Some people say that algebra ‘proves things rigorously’. Maybe, but the real advantage of algebra is that it gives you the opportunity to see things in new ways, that help you understand why things are the way they are. Of course there are lots of answers to the ‘why’ question, and indeed lots of ways to get to the three magical 2’s in this example.  But the insights we get from algebra tend to be compelling, non-obvious, and powerful. And that’s what algebra is for (okay, that and a lot of other things too).

# Millions and Billions and Obamacare

For a long time now, I’ve been writing journal articles and blog posts about large number innumeracy. A basic take-away from my research has been that untrained members of the general population have a wide range of skills for dealing with large numbers. Basically, people are very competent at arithmetic up to about 1,000; they can order the basic scale words thousand, million, billion, and trillion, and can generally even write them correctly. What people struggle to do is to relate across orders of magnitude. In one study, for instance, 65 of 67 college undergraduates represented one thousand, one million, and one billion correctly as numerals, even as one third of them made huge errors estimating their relative magnitudes. This week, a popular internet meme has made my point beautifully.

# Every Abstraction is A Concreteness, Somewhere Else

I have often argued that the idea of an ‘abstraction’, as usually conceived by Cognitive Science, is a myth. The specific notion of abstraction that I’m accusing of mythological status is one in which concepts can be divided into those which primarily encode specific features or instances or particulars (concrete concepts), and those which do not (abstract concepts); it’s epitomized in the ‘schema abstraction’ notion of category learning. On this notion, initial encounters with an abstract concept involve encoding lots of features that are fundamentally irrelevant to some (abstract, relational) concept. Later, you prune away those irrelevancies, and are left with just the pure relational entity.

Of course, something can be a myth either because it is an unattainable ideal, or because it is fundamentally wrongheaded. Some people have suggested that sure, it might be that even abstract ideas still retain some concrete features, but have fewer of them, somehow. So ‘abstraction’ is, itself, an abstract principle which maybe never happens purely in practice, but is still the limit of real abstractions in the same way that a Platonic circle is the limit of real circle-like shapes. In contrast, I’ve usually argued that the myth of ‘abstraction’ is the latter: fundamentally wrongheaded, at least as an explanation of interesting complex reasoning in supposedly abstract fields like math or physics. Maybe this feature-stripping thing happens, but it’s not what we call "abstraction" in mathematics.

# What is Bedtime Math Really Good for?*

Attention conservation notice:  Short (500 words), but entirely misleading, though honest, criticism of actually really solid work on a great idea.  Take whatever time you were going to spend reading this, and use it to read Bedtime Math with your kids instead.

# No, the Pentagon didn’t lose \$8.5 trillion.

Reuters has an interesting article that has been making the rounds recently, and that has been summarized by ignorant, inattentive articles asserting that the Pentagon “doesn’t have a clue” how it spent \$8.5 trillion. Now, I’m no fan of military spending, but I’m an implacable enemy of lying about numbers. This one is total crap. Worse, it’s obvious crap: the number is too large for it to have been ‘lost’ by the Pentagon. In fact, a moment’s calculation will tell you, as it told me, that \$8.5 trillion is the total spending by the Pentagon since 1996. This is confirmed by reading the actual article. We have a pretty good idea of what the Pentagon has spent nearly all of that money on; if we didn’t, we couldn’t make any of the nice graphs of Pentagon spending that the Reuters article includes.
What’s weird to me is how many news outlets seem unable to reason about this for themselves and realize that this money just can’t be ‘lost’. All it takes is knowing the single-year budget for the Pentagon, roughly, and multiplying by roughly 20. I understand how intelligent readers can be misled, but how can so many headline and content writers not stop to evaluate or think about the numbers they are writing about?
The Reuters article would like you to consider the entire Pentagon budget “unaccounted for”, which is true in the sense the article details (the Pentagon’s myriad accounting practices are very shoddy), but it’s nothing like the common person’s understanding of the word ‘lost’.
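The back-of-the-envelope check described above takes two lines of arithmetic. This sketch assumes 20 years, the rough figure used in the text for 1996 to the time of writing; `implied_annual_spending` is a name made up for this example.

```python
def implied_annual_spending(total: float, years: int) -> float:
    """Average yearly spending implied by a multi-year total."""
    return total / years


UNAUDITED_TOTAL = 8.5e12  # dollars, from the Reuters Q&A
YEARS = 20  # roughly 1996 to the time of writing

annual = implied_annual_spending(UNAUDITED_TOTAL, YEARS)
print(f"about ${annual / 1e9:,.0f} billion per year")  # -> about $425 billion per year
```

An average of roughly \$425 billion a year is the scale of a single year's defense budget, which is the point: the \$8.5 trillion is, roughly, everything spent over the period.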
For instance, from the original article:

Q: How much taxpayer money has the Defense Department spent that has never been audited since the 1996 deadline?
A: About \$8.5 trillion.

True, but not exactly transparent. The article doesn’t give any estimate of plausible bounds on the total error, leaving the reader in a murk of uncertainty. Not giving any summary of the total error allows a particular sort of dishonest move, common to large-number conversations: we leap from tiny uncertainty to complete ignorance. The trouble is that, in practice, accounting is always filled with small uncertainties, even after auditing. This is just the truth of life, and by itself it isn’t problematic. Let’s compare a few statements that are comparable in terms of relative uncertainty:
• What did the DOD spend its last 10 billion dollars on?: It isn’t known
• How old is your ten-year old child, in minutes? What’s that? You don’t know? Then your child’s age isn’t known.
• Exactly how many words did Shakespeare write, including in his letters, journal, diary, and receipts? Oh dear, the literary output of Shakespeare is totally unknown; it’s all gone.
• What’s Newton’s constant of gravitation?  According to the National Institute of Standards and Technology, and the dishonest logic I’m mocking here, we have absolutely no idea. We lost it.
You see my point? We just can’t turn errors on the scale of 10^-5 into complete failures of data tracking. Both may be bad, but they aren’t the same.
To be fair, the accounting practices do appear to be pretty bad. Unfortunately, as far as I can tell, the authors of the Reuters article give no summary estimate of how bad the problem might be. The individual errors reported actually seem mostly very small, from the particular accounting-relevant perspective of tracking how much money is lost. For instance, the article notes: “In the Cleveland DFAS office where Woodford worked, for example, ‘unsupported adjustments’ to ‘make balances agree’ totaled \$1.03 billion in 2010 alone, according to a December 2011 GAO report.” Well, that sounds bad, but DFAS apparently handles the full defense budget of about \$500 billion. This would be like finding that a moderately wealthy family could not account for \$150 of spending over a year, and concluding that they had squandered their whole \$80k income. Caveat: the Cleveland office is just one office, so the total may be more like \$1,000 out of \$80k, say. That is rather more than I spend in a year on coffee, but not much more.
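The household comparison above is a simple proportion: scale the discrepancy by the ratio of a family's income to the budget. A sketch using the figures from the paragraph above, with helper names made up for this example:

```python
def household_equivalent(discrepancy: float, budget: float, household_income: float) -> float:
    """Rescale a discrepancy in a large budget to the same fraction of a household income."""
    return discrepancy / budget * household_income


def budget_equivalent(household_amount: float, household_income: float, budget: float) -> float:
    """Inverse: the budget-scale discrepancy matching a household-scale amount."""
    return household_amount / household_income * budget


# $1.03 billion of unsupported adjustments against a ~$500 billion budget, for an $80k household
print(round(household_equivalent(1.03e9, 500e9, 80_000)))   # -> 165
# The $1,000-out-of-$80k guess corresponds to this many dollars across the budget:
print(f"{budget_equivalent(1_000, 80_000, 500e9):,.0f}")    # -> 6,250,000,000
```

So the single Cleveland figure comes out near \$165 for the \$80k family (rounded to about \$150 in the text), and the \$1,000 guess corresponds to roughly \$6 billion of discrepancies across the whole budget.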
The article is inflammatory, fascinating, important, hugely misleading, and I think largely right in its broader message. It’s important.  Read it.  What bothers me, though, is just that if anyone thought about the magnitudes of the numbers they were talking about, they would come to very different conclusions than anyone in fact seems to be coming to.  Don’t come away thinking that the Pentagon spent half our national debt on corruption and graft: come away thinking that the Pentagon has standards of accounting that are fairly typical of your average small business, and that we probably want them to be higher than that (and the budget itself to be lower!!).
Just to be really specific:
The Watertown Daily Times says: “The Pentagon can’t account for \$8.5 trillion it spent (Reuters), money that might better have covered the Department of Veterans Affairs’ \$2 billion budget shortfall.” This is double-counting nonsense. The \$8.5 trillion already includes whatever was spent on DoVA, no matter what. It’s the whole budget, as common sense would have told you.
Daily Kos asserts: “Combine ‘Known’ Pentagon waste (like the 1.5 Trillion dollar F35) with missing pentagon money and you have a good chunk of our entire national debt represented.” Only, you guessed it, you can’t combine any money paid for the F35 with the \$8.5 trillion, because the \$8.5 trillion already includes it. Actually, I guess this might be okay, because as far as I can tell, the cost referred to includes about \$1 trillion in costs that would be paid over the next 30 years. Still, you can’t really include it in our national debt.
Daily Kos also says: “Oh really, you’re concerned about deficit spending and the debt? Fully 1/3 of the national debt it is money we sent the Pentagon and they can’t tell us where it went. It’s just gone.” This is right, if by ‘just gone’ you mean ‘tracked quite a lot more accurately than anyone I know tracks their own expenses’.
Sigh.

# What makes a mathematical representation "grounded"?

I’m just leaving the 2015 CogSci conference, and am pondering what I saw there.  I’m particularly dismayed at the lack of sophistication surrounding the notion of an external representation, especially the role and nature of ‘grounding’ in a representation—so I’ll focus on that in this post.

First, though, some quick impressions: It was exciting to see so much mathematical and numerical cognition going on. There were many fascinating posters, and also several great talks, a symposium, and a keynote focused on mathematics.  People are saying exciting, new things, and it was fascinating to hear.  I particularly enjoyed, as did many people, Kevin Mickey’s demonstration of the power of the unit circle as a central representation for trigonometry.

# Short version

Scientists have done a spectacularly poor job explaining to taxpayers what we do, in many ways. One failure, which is perhaps not entirely our fault, is that we have done a poor job explaining just how cheap our research is. Here I tell you about a project my lab conducted, which suggests (a) that people vary dramatically in how they map the cost of objectively small budget items onto a number line, even when given numerical information about costs, and (b) that support for these budget items is elastic in terms of psychological relative cost: people who are better at mapping the true cost of the programs onto number lines view them more favorably than those who are not.

## Budgeting Science

It’s that time of year again.  House Republicans have noticed that the National Science Foundation still exists, and have once again demanded that science research—and social science in particular—be cut substantially.  It’s actually not as bad this time around as in some past years: social science is facing a 42% proposed cut; in past years, the starting proposal has been even higher. The proposal also puts heavy restrictions on climate change research.

And, once again, it’s time to face the fact that we scientists have done a spectacularly bad job explaining what we do, and why it is worth public investment. Some of the reason for our failure is perhaps that we scientists feel entitled to do our work; some is that, objectively, science is an amazingly good investment, and social science has arguably led to growth in GDP, as well as to better outcomes for veterans, better escape plans in the face of natural disasters, and better educational practice.

Nevertheless, support for public research is relatively low, and funding for the public universities that are the major site for this research is under pressure. One problem is a widespread misconception that professors spend the majority of their effort teaching in classrooms. Of course, teaching is an important part of our job, but classroom-related teaching is about 20-30% of most faculty members’ effort. The bulk of our time goes to research, research that creates much of the new knowledge we go on to teach in our courses. As a result, students end up paying for research that benefits the entire tax base, and taxpayers don’t realize how this value is achieved.

But over the last few years my lab has been researching another likely cause of opposition to the NSF and other research budgets*. Budgets for NIH, NSF, IES, DARPA, and other large, famous federal research funders are typically expressed as numbers. For instance, the NSF budget is about \$7 billion annually. And people don’t know how much that is. Worse, they work with those numbers incorrectly, and when they do, they tend to make predictable bad judgments that likely mismatch their real desires.

# Perception of Cost

I give a super-fast overview of our methods here. There are a lot more details in the published papers. If what you want is less detail, here’s the one-sentence version: about 40% of people are biased on number lines such that they systematically and hugely overestimate the value of smaller ‘big’ numbers relative to much larger ones, when those numbers cross between millions, billions, and trillions.

The major, but not the only, way we have examined large-number use is the number-to-position task. Here, we ask people to put a number on a number line. For instance, we might ask people to put 280 million on a line from 1 thousand (or 0) to 1 billion. There is quite a bit of complex structure in how people respond to this task, and I won’t explain it all in detail (but see our papers). The short version is that people divide the line into ‘chunks’ based on the scale word used. For instance, a line from 1 thousand to 1 billion would be divided into a ‘thousands’ chunk and a ‘millions’ chunk**, like this:

The thing is, the way I just drew this, it’s very wrong. You see, there are 1,000 millions in a billion (that’s what a billion is, right? 1,000 million, at least here in the US). But about 40% of our subjects do something quite like this, placing “million” somewhere between 20% and 50% of the way across the line. The rest also seem to divide the line up, but they do it more or less at the right place, which is about here:

So a large fraction of people (about 40%) not only get big numbers wrong, but get them systematically and grossly wrong. Does that matter? It might, if these behaviors reflect something that happens when we compare costs. Let’s look at how this might work in an example: the \$11 million that was budgeted at one point in 2013 for political science research, out of the \$7 billion total the NSF was getting that year.

If you’re one of the more accurate, linear responders, that looks (on the line!) like not that much money. But if you’re one of the non-linear people, then \$11 million looks like a lot. I collected data from 50 Mechanical Turk workers to verify this. First I asked them to place 11 million on a line from 0 to 7 billion (the NSF was not mentioned). Then I gave them 8 other number line judgments on our standard “thousand to billion” line. I used the latter 8 judgments to sort people into two groups, which I’ll call linear (the people who get it right) and categorical. The difference is large and right on track with our model predictions. Categorical responders put \$11 million about 20% of the way to \$7 billion, while linear responders were closer to 1%.
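To make the two response patterns concrete, here is a toy model in Python. It is an illustrative assumption, not the model from our papers: a linear responder places a number in proportion to its true value, while a categorical responder splits the thousand-to-billion line into a ‘thousands’ segment and a ‘millions’ segment at some point `boundary_pos` (30% here, inside the 20% to 50% range described above) and is linear within each segment. The last line computes the normatively correct placement of \$11 million on a 0 to \$7 billion line.

```python
def linear_position(value: float, lo: float, hi: float) -> float:
    """Proportion of the way along a linear number line from lo to hi."""
    return (value - lo) / (hi - lo)


def categorical_position(value: float, lo: float = 1e3, hi: float = 1e9,
                         boundary: float = 1e6, boundary_pos: float = 0.3) -> float:
    """Toy two-segment responder: [lo, boundary] fills [0, boundary_pos] of the line,
    and [boundary, hi] fills the rest, each segment linear on its own."""
    if value <= boundary:
        return boundary_pos * linear_position(value, lo, boundary)
    return boundary_pos + (1 - boundary_pos) * linear_position(value, boundary, hi)


print(f"{linear_position(1e6, 1e3, 1e9):.4f}")     # 1 million, linear       -> 0.0010
print(f"{categorical_position(1e6):.4f}")          # 1 million, categorical  -> 0.3000
print(f"{linear_position(280e6, 1e3, 1e9):.3f}")   # 280 million, linear     -> 0.280
print(f"{categorical_position(280e6):.3f}")        # 280 million, categorical -> 0.495
print(f"{linear_position(11e6, 0, 7e9):.2%}")      # 11 million of 7 billion -> 0.16%
```

In this toy version the categorical responder puts 1 million about 300 times further along the line than it belongs. Setting `boundary_pos` to the true position of 1 million makes the two functions agree everywhere, which is one way to see the categorical pattern as a misplaced boundary. Note also that the correct placement of \$11 million, about 0.16%, sits below even the linear group's typical responses.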

# Does this affect actual views on funding federal programs?

We don’t know yet whether number line judgments actually causally impact people’s political views. But we have some evidence that they might at least correlate with them***. Last summer Brian Guay conducted a research study in my lab through Time-Sharing Experiments in Social Science (TESS). TESS conducts nationally representative online surveys using standard polling methods on important topics for social scientists, and that’s just what they did for us. The survey sampled about 2,100 adults.

Here’s what we did: First, we gave each person 4 number line judgments, and used those to divide them into two groups. Then we asked people to make 4 judgments about the federal budget****. In each, we gave people a total budget for an entity, and an amount allocated to some particular program in that budget. These were actual spending figures that had been recently reported in the media. Then we asked whether the entity should spend “a lot less”, “a little less”, “about the same”, “a little more” or “a lot more” on that particular program.

The four items were: spending on climate change research in the NSF (\$133.53 million of a \$5 billion NSF research budget); spending on weapons systems by the federal government (\$114.9 billion of a \$3.45 trillion federal budget); spending on unmanned drones by the U.S. Customs & Border Protection agency (\$88.6 million of a \$10.35 billion CBP budget); and US federal government foreign aid (\$52 billion of a \$3.45 trillion federal budget, and fairly notorious).
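For reference, here is the fraction of its total budget that each of the four items represents, computed directly from the figures above (the dictionary layout and the `budget_shares` name are just for this sketch).

```python
# (program amount, total budget) in dollars, as given in the four survey items
ITEMS = {
    "NSF climate change research": (133.53e6, 5e9),
    "Federal weapons systems": (114.9e9, 3.45e12),
    "CBP unmanned drones": (88.6e6, 10.35e9),
    "Federal foreign aid": (52e9, 3.45e12),
}


def budget_shares(items):
    """Fraction of its total budget that each program represents."""
    return {name: amount / total for name, (amount, total) in items.items()}


for name, share in budget_shares(ITEMS).items():
    print(f"{name}: {share:.2%}")
# Prints 2.67%, 3.33%, 0.86% and 1.51% respectively.
```

All four are under 4% of their budgets; two are stated in millions against budgets in billions, and two in billions against a budget in trillions.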

# The results

Obviously the details depend on exactly how you measure things. We decided to add together***** the numerically coded ratings to get a ‘total support measure’, because that seemed simple, and also to analyze separate effects for each question, because that seemed interesting. We included only people who answered all the questions. The graphs present something slightly easier on the eyes, but basically tell the same story. What they indicate is that, overall, responding linearly on the number line task was associated with a shift in support for maintaining or increasing funding for these government programs (i.e., giving a response of at least “about the same”). The total raw shift in support was about 4 percentage points, from 59% supporting these programs among linear responders on average to 55% among categorical responders (standard error around 0.9%).
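A minimal sketch of the two summary measures described above, assuming the five answers are coded 1 (“a lot less”) through 5 (“a lot more”), which is the simple metric treatment that footnote ***** warns about. The respondents below are made up purely to show the mechanics; they are not our data, and `summarize` is a name invented here.

```python
from typing import List, Optional, Tuple

SCALE = {"a lot less": 1, "a little less": 2, "about the same": 3,
         "a little more": 4, "a lot more": 5}


def supportive_fraction(answers: List[str]) -> float:
    """Fraction of one respondent's items rated 'about the same' or higher."""
    return sum(SCALE[a] >= 3 for a in answers) / len(answers)


def summarize(respondents: List[List[Optional[str]]]) -> Tuple[List[int], float]:
    """Keep complete responders only; return each one's summed rating and the
    average fraction of items on which they supported maintaining or increasing funding."""
    complete = [r for r in respondents if None not in r]
    totals = [sum(SCALE[a] for a in r) for r in complete]
    support = sum(supportive_fraction(r) for r in complete) / len(complete)
    return totals, support


# Made-up respondents: four items each; None marks a skipped question.
toy = [
    ["about the same", "a little more", "a lot less", "a little less"],
    ["a lot more", "about the same", "about the same", "a little less"],
    ["a little more", None, "a lot more", "about the same"],  # incomplete: excluded
]
totals, support = summarize(toy)
print(totals)            # -> [10, 13]
print(f"{support:.3f}")  # -> 0.625
```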

Of course, some of that is explained by correlations between the groups: accurate number line responding was moderately correlated with income, education, and gender. However, even when these were included as covariates in a multiple regression, linearity continued to carry unique variance; perhaps more importantly, a preliminary SEM analysis suggests that linearity is affected by overall education level, but also mediates education’s effect on these judgments. There are lots of ways that education probably influences support for cheap government programs, of course; however, our Mechanical Turk studies suggest a possible causal intervention: training people on the number line affected their number line judgments immediately afterward.

Nor does political affiliation easily explain the results: more linear people take a more liberal position by supporting increased NSF spending on climate research, but a more conservative one by approving more spending on drones to secure the US border.

If you want the full details, here’s the same graph as above, broken down by question. You can see that support for climate change research and spending on drones are much more sensitive to these phenomena than foreign aid and weapons. Is that a real difference? I don’t know. It would be interesting to see how elastic people’s support is for each program, but until these patterns are better replicated****** we won’t really know for sure.

# The Moral

The main moral is this: Giving people context to help them understand the significance of large numbers may lead a fairly large proportion of them to misinterpret the relative values involved in predictable ways. Practically, this matters, because contextualizing information is often used by the media to frame values and often crosses scales in just this way. It’s important for people to realize that, even when it doesn’t shift their position all that much, it may have a larger impact on how they interpret these statements. Saying that the NSF spends \$11 million of their \$7 billion budget on political science******* may sound either of two very different ways, depending on how the reader interprets the numbers.

This failure to deal correctly with large numbers impacts our support for cheap programs, but as my friend John Opfer points out, it also plausibly impairs our ability to cut deficits appropriately. Politicians often propose budget cuts that are objectively tiny but are probably accepted as moderate progress by a fairly large proportion of the population. Again, we think it’s important to express numerical information carefully, in context, in a way that avoids these typical misinterpretations.

The second moral is more fraught, but relates to the question of how we should present large numbers. We don’t know; we don’t have all the right data to determine what methods of presentation will be most effective. Here are some guesses, though, some of which are informed by data. 1) Present all your numbers using the same base. That is, don’t say “The proposal cuts \$300 million in climate research, from \$1.4 billion to \$1.1 billion.” Do say “The proposal cuts \$300 million in climate research, from \$1,400 million to \$1,100 million” (as in xkcd). 2) Present linear visualizations of your quantities. 3) Remind people of how the number system works, every time. 4) Give percentages where meaningful and possible. Fanny Chevalier has collected a large number of scale representations, and done some interesting analysis of the kinds of scales people use. You might find her analyses helpful too.
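The same-base suggestion is easy to automate. A sketch, assuming dollar amounts arrive as plain numbers; `in_millions` is a hypothetical helper, not a library function.

```python
def in_millions(amount: float, decimals: int = 0) -> str:
    """Express a dollar amount in millions, with thousands separators, so that
    figures of very different sizes share one scale word."""
    return f"${amount / 1e6:,.{decimals}f} million"


old, new = 1.4e9, 1.1e9
print(f"The proposal cuts {in_millions(old - new)} in climate research, "
      f"from {in_millions(old)} to {in_millions(new)}.")
# -> The proposal cuts $300 million in climate research, from $1,400 million to $1,100 million.
```

Fixing the scale word to "million" is a design choice: it keeps the comparison on one linear scale, at the cost of longer digit strings for very large amounts.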

* This research was partially funded by the NSF.  We weren’t NSF-funded (and I never have been yet), but TESS, the group who funded and conducted our survey, is funded by the NSF.

** Before you ask, it doesn’t seem to matter much whether numbers are printed as numerals (e.g., 280,000,000) or hybrid number words (e.g., 280 million).

*** Full disclosure: this data has not yet been published in a peer-reviewed journal, or even presented at a peer-reviewed conference.  (For that, we’ll do structural equation modeling, so the analysis won’t even be the same). You heard it here first.  Lots of things you hear on the internet turn out to be wrong.  Caveat emptor.

**** We were unable to counterbalance the order of the number line and political judgments in this experiment, though the internal scales were presented in random order.  There is, of course, some possibility that mere exposure to the number lines changed people’s views. No study is final. As I said in ***, caveat emptor.

***** This treats the Likert scale as a fully metric scale, which is inappropriate.  Better techniques exist, and our results generalize to most.  But they are harder to describe, so here I’m sticking with the simple.

****** In the lab, as part of piloting these materials, we have replicated an effect of linearity on the NSF item three times with Mechanical Turk populations: that’s three out of three attempts. In each, we also included Foreign Aid spending, and as I recall in two there was a significant effect, but not in the third. These are NOT preregistered trials, and mix those intended as exploratory and those intended as confirmatory. As I said, more care is needed.

******* Just to fully connect the dots here, this research was itself funded partially by NSF funding to TESS, a social sciences project! I wouldn’t call that a conflict of interest, necessarily (I am not on the TESS grant, nor have I received any federal dollars through NSF, for any project, though heaven knows I’ve tried!), and I’m not claiming that these data by themselves, say, demonstrate the intrinsic value of public funding for research. However, if you are inclined to see self-interest in this research line, well, I can only state that that wasn’t my conscious motivation, and that I want to be clear and up-front with my readers about the concerns that they might have. Nobody is free from implicit biases, and I want you to be able to scour my behavior for it.