The average of numbers, defined to be the sum of the numbers divided by the count of numbers being summed, is a familiar way of extracting a single-number summary of a numerical dataset. While popular, averaging can often lead to misleading observations about the data at hand.

As an example, let us take a look at the income statistics across the world. The following table shows per capita income of three countries:

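| Year | Zimbabwe (USD) | Russia (USD) | Singapore (USD) |
|------|----------------|--------------|-----------------|
| 2000 | 502.95 | 1304.7 | 20318 |
| 2002 | 471.56 | 1822.8 | 17942 |
| 2004 | 409.57 | 3203.2 | 21349 |
| 2006 | 363.79 | 5392.9 | 27791 |
| 2008 | 265.51 | 9279.1 | 32379 |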

Let's take a look at the income statistics of Russia and Singapore in relation to Zimbabwe's income statistics. Typically, this is done by declaring Zimbabwe's statistics to be 1 and scaling the other statistics accordingly—a process known as normalization. In this case, we divide each year's income statistics by Zimbabwe's income per capita:

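| Year | Zimbabwe | Russia | Singapore |
|------|----------|--------|-----------|
| 2000 | 1 | 2.59 | 40.4 |
| 2002 | 1 | 3.87 | 38.0 |
| 2004 | 1 | 7.82 | 52.1 |
| 2006 | 1 | 14.8 | 76.4 |
| 2008 | 1 | 34.9 | 122.0 |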
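The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the dictionary layout and the `normalize` helper are names chosen here for convenience, and the figures are copied from the first table.

```python
# Per capita income in USD, copied from the first table.
years = [2000, 2002, 2004, 2006, 2008]
income = {
    "Zimbabwe": [502.95, 471.56, 409.57, 363.79, 265.51],
    "Russia": [1304.7, 1822.8, 3203.2, 5392.9, 9279.1],
    "Singapore": [20318, 17942, 21349, 27791, 32379],
}


def normalize(data, reference):
    """Divide each country's yearly value by the reference country's value for that year."""
    base = data[reference]
    return {
        country: [value / ref for value, ref in zip(values, base)]
        for country, values in data.items()
    }


normalized = normalize(income, "Zimbabwe")

# Print one row per year, rounded to three significant digits.
for i, year in enumerate(years):
    row = [normalized[c][i] for c in ("Zimbabwe", "Russia", "Singapore")]
    print(year, " ".join(f"{v:.3g}" for v in row))
```

Passing a different country name as `reference` would normalize against that country instead.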

Now, let's take the average of each country's income data, so we can compare them with ease:

|         | Zimbabwe | Russia | Singapore |
|---------|----------|--------|-----------|
| Average | 1 | 12.8 | 65.8 |

The above summary statistics suggest that Singapore's income levels are approximately $$65.8 \div 12.8 \approx 5$$ times those of Russia.
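As a quick check of these averages and their ratio, here is a minimal sketch that uses the rounded normalized values from the table above. The variable names are illustrative.

```python
from statistics import mean

# Normalized values from the table above (Zimbabwe = 1).
russia = [2.59, 3.87, 7.82, 14.8, 34.9]
singapore = [40.4, 38.0, 52.1, 76.4, 122.0]

mean_russia = mean(russia)
mean_singapore = mean(singapore)

# Ratio of the two arithmetic means.
ratio = mean_singapore / mean_russia

print(f"Russia: {mean_russia:.1f}")
print(f"Singapore: {mean_singapore:.1f}")
print(f"Singapore / Russia: {ratio:.2f}")
```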

Is this true? Let us compare the income statistics of Russia and Singapore by computing the ratio of income data from Singapore to those from Russia, both unnormalized:

| Year | Singapore $$\div$$ Russia |
|------|---------------------------|
| 2000 | 15.6 |
| 2002 | 9.84 |
| 2004 | 6.66 |
| 2006 | 5.15 |
| 2008 | 3.49 |

The above computations reveal that Singapore's income levels are, in fact, more than 5 times the income levels of Russia in four of the five years in our dataset, and far more than that in the earliest years. Taking the average of the ratios yields approximately 8.1, a more realistic summary statistic.
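The year-by-year ratios and their average can be reproduced from the unnormalized figures. This is a minimal sketch with illustrative variable names; the mean of the ratios comes out at roughly 8.1.

```python
from statistics import mean

# Unnormalized per capita income in USD, from the first table.
russia = [1304.7, 1822.8, 3203.2, 5392.9, 9279.1]
singapore = [20318, 17942, 21349, 27791, 32379]

# Ratio of Singapore's income to Russia's, one value per year.
ratios = [s / r for s, r in zip(singapore, russia)]
mean_ratio = mean(ratios)

# Number of years in which Singapore's income exceeds 5 times Russia's.
years_above_5 = sum(1 for x in ratios if x > 5)

print([round(x, 2) for x in ratios])
print(f"mean ratio: {mean_ratio:.2f}, years above 5x: {years_above_5}")
```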

What is going on here? Normalizing with respect to Zimbabwe's income data assigns a different weight to each year of Russia's and Singapore's income statistics. Since Zimbabwe's per capita income in 2002 is higher than that in 2008, the 2008 data in the normalized dataset carries more weight than the 2002 data: dividing by a larger number results in smaller numbers. This results in averages that do not quite reflect the true income levels of Russia and Singapore.
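To spell out the weighting in symbols (the notation is introduced here for illustration and is not part of the original discussion), write $$z_i$$ and $$r_i$$ for the per capita incomes of Zimbabwe and Russia in year $$i$$. The arithmetic mean of Russia's normalized values is then

\begin{align*} \frac{1}{N} \sum_{i=1}^{N} \frac{r_i}{z_i} = \frac{1}{N} \sum_{i=1}^{N} w_i r_i, \qquad w_i = \frac{1}{z_i}. \end{align*}

With the figures from the first table, $$w_{2008} \div w_{2002} = 471.56 \div 265.51 \approx 1.78$$, so each dollar of 2008 income counts roughly 1.8 times as much as a dollar of 2002 income.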

To explore this phenomenon further, we consider a substantially simpler dataset:

|        | Monday | Tuesday | Wednesday | Thursday | Friday |
|--------|--------|---------|-----------|----------|--------|
| Coffee | 3 cups | 1 cup | 4 cups | 5 cups | 8 cups |

Here, the average number of cups of coffee consumed is

$(3 + 1 + 4 + 5 + 8) \div 5 = 4.2.$

Now, let's say a cup of coffee is usually 2 dollars. On Tuesdays, the café near work serves coffee brewed from special beans, so a cup of coffee costs twice as much. On Fridays, the café serves extra cheap coffee, at 50 cents a cup. So, the average amount of money spent on coffee is

\begin{align*} (&3 \times \$2.00 + 1 \times \$4.00 + 4 \times \$2.00 \\ & + 5 \times \$2.00 + 8 \times \$0.50) \div 5 = \$6.40, \end{align*}

which, at the usual price of 2 dollars a cup, buys only 3.2 cups, well short of the average number of cups actually consumed. This, as you can see, is the result of assigning different weights to Tuesday and Friday. For this reason, the average obtained by assigning (potentially different) weights to each item in a dataset is called the weighted average.
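The coffee example is small enough to check directly. The sketch below uses illustrative variable names and treats each day's price as the weight attached to that day's cup count.

```python
# Cups of coffee per day, Monday through Friday.
cups = [3, 1, 4, 5, 8]

# Price per cup in dollars: double on Tuesday, 50 cents on Friday.
price = [2.00, 4.00, 2.00, 2.00, 0.50]

# Plain arithmetic mean of the cup counts.
mean_cups = sum(cups) / len(cups)

# Mean daily spending: each day's cups weighted by that day's price.
mean_spend = sum(c * p for c, p in zip(cups, price)) / len(cups)

# Mean spending expressed in cups at the usual 2-dollar price.
equivalent_cups = mean_spend / 2.00

print(mean_cups, mean_spend, equivalent_cups)
```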

What if we're only given a normalized dataset? Let's go back to the income dataset and assume that

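| Year | Zimbabwe | Russia | Singapore |
|------|----------|--------|-----------|
| 2000 | 1 | 2.59 | 40.4 |
| 2002 | 1 | 3.87 | 38.0 |
| 2004 | 1 | 7.82 | 52.1 |
| 2006 | 1 | 14.8 | 76.4 |
| 2008 | 1 | 34.9 | 122.0 |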

is all we have available. How do we compare the income levels of Russia and Singapore?

The answer is to compute the geometric mean instead of the average. The usual average, also known as the arithmetic mean, of $$N$$ numbers is computed by taking the sum of all $$N$$ numbers and then dividing the sum by $$N$$, the total count of the numbers. Since adding up $$N$$ copies of a number $$x$$ is the same as multiplying $$x$$ by $$N$$, the division by $$N$$ makes sense.

In contrast, the geometric mean is computed by multiplying all $$N$$ numbers and then taking the $$N$$th root of the product. This is to be understood as the multiplicative analogue of the arithmetic mean. Indeed, multiplying together $$N$$ copies of a number $$x$$ is the same as taking the $$N$$th power of $$x$$, and so the $$N$$th root operation, which reverses the exponentiation operation, is appropriate here.
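Written side by side for positive numbers $$x_1, \ldots, x_N$$ (the symbols $$A$$ and $$G$$ are labels chosen here for illustration), the two means are

\begin{align*} A &= \frac{x_1 + x_2 + \cdots + x_N}{N}, & G &= \sqrt[N]{x_1 \times x_2 \times \cdots \times x_N}. \end{align*}

When every $$x_i$$ equals the same number $$x$$, both formulas return $$x$$: the sum is $$N x$$ and dividing by $$N$$ gives $$x$$, while the product is $$x^N$$ and its $$N$$th root is $$x$$.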

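Applied to the normalized income data, this gives:

|                | Russia | Singapore |
|----------------|--------|-----------|
| Geometric mean | 8.35 | 59.5 |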

Now, the ratio of the two summary statistics is $$59.5 \div 8.35 \approx 7$$, a more reasonable value than the ratio of arithmetic means.
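These geometric means can be reproduced with a short sketch. The `geometric_mean` helper below is written out by hand for clarity and is an illustrative name; Python 3.8 and later also provide `statistics.geometric_mean`, which could be used instead.

```python
from math import prod


def geometric_mean(values):
    """Return the N-th root of the product of N positive numbers."""
    return prod(values) ** (1 / len(values))


# Normalized values from the table above (Zimbabwe = 1).
russia = [2.59, 3.87, 7.82, 14.8, 34.9]
singapore = [40.4, 38.0, 52.1, 76.4, 122.0]

gm_russia = geometric_mean(russia)
gm_singapore = geometric_mean(singapore)

# Ratio of the two geometric means.
ratio = gm_singapore / gm_russia

print(f"Russia: {gm_russia:.2f}")
print(f"Singapore: {gm_singapore:.1f}")
print(f"Singapore / Russia: {ratio:.2f}")
```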

The difference lies in the fact that multiplication plays nicely with itself, whereas addition does not mix as well with multiplication.

As we have seen, taking the arithmetic mean of normalized values is equivalent to taking the weighted average, which is a multiply-then-add operation. Once the weights are assigned and the resulting weighted values are averaged, there is no easy way to get rid of the weights. Removing them requires division (which is a form of multiplication, after all), and we cannot switch the order of multiplication and addition.

On the other hand, computing the geometric mean of normalized values is a divide-then-multiply operation. Since division is equivalent to multiplication (by the reciprocal), the entire operation consists of a sequence of multiplications, whose order we can swap without changing the final answer. This is known as the commutative property of multiplication.
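The cancellation can be written out explicitly. In notation introduced here for illustration, let $$z_i$$, $$r_i$$, and $$s_i$$ be the incomes of Zimbabwe, Russia, and Singapore in year $$i$$. The ratio of the geometric means of the normalized values is

\begin{align*} \frac{\sqrt[N]{\prod_{i=1}^{N} s_i / z_i}}{\sqrt[N]{\prod_{i=1}^{N} r_i / z_i}} = \sqrt[N]{\prod_{i=1}^{N} \frac{s_i / z_i}{r_i / z_i}} = \sqrt[N]{\prod_{i=1}^{N} \frac{s_i}{r_i}}, \end{align*}

which no longer involves $$z_i$$ at all. The corresponding ratio of arithmetic means, $$\left( \sum_i s_i / z_i \right) \div \left( \sum_i r_i / z_i \right)$$, admits no such simplification, because the $$z_i$$ sit inside the sums.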

In summary, it is best to resist the temptation to take averages right away when faced with a task of comparing data about multiple items. Normalization can easily render arithmetic means meaningless, and geometric means perform far better in such cases.
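A small experiment makes the point concrete. The sketch below (the `compare` and `geometric_mean` helpers are illustrative names) normalizes the original income figures by each of the three countries in turn and compares Singapore with Russia both ways. Under these assumptions, the ratio of arithmetic means should change with the choice of reference country, while the ratio of geometric means should stay the same.

```python
from math import prod
from statistics import mean

# Per capita income in USD, copied from the first table.
income = {
    "Zimbabwe": [502.95, 471.56, 409.57, 363.79, 265.51],
    "Russia": [1304.7, 1822.8, 3203.2, 5392.9, 9279.1],
    "Singapore": [20318, 17942, 21349, 27791, 32379],
}


def geometric_mean(values):
    """Return the N-th root of the product of N positive numbers."""
    return prod(values) ** (1 / len(values))


def compare(reference):
    """Normalize by `reference`, then return the Singapore-to-Russia ratio
    of arithmetic means and of geometric means, in that order."""
    base = income[reference]
    russia = [r / b for r, b in zip(income["Russia"], base)]
    singapore = [s / b for s, b in zip(income["Singapore"], base)]
    am_ratio = mean(singapore) / mean(russia)
    gm_ratio = geometric_mean(singapore) / geometric_mean(russia)
    return am_ratio, gm_ratio


for reference in income:
    am_ratio, gm_ratio = compare(reference)
    print(f"{reference:10s} AM ratio {am_ratio:.2f}  GM ratio {gm_ratio:.2f}")
```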

As a matter of fact, it is possible to prove that the geometric mean is the only correct mean to use when averaging normalized values. If you are interested, take a look at Fleming/Wallace, "How Not To Lie With Statistics: The Correct Way To Summarize Benchmark Results" (Communications of the ACM, 1986) for details.

Thanks to Ahn Heejong for corrections!