Thursday, October 26, 2017

A minus times a minus is a plus

The reason why, we really should discuss.

So, why is the result of multiplying two negatives numbers together a positive number? There are various answers to that.
First, let's think a little about why we care.

The rules of the (arithmetic) game

We start off with the non-negative integers \[ \mathbb{N} = \{0,1,2,3,\ldots\} \] which we all learn to manipulate from a very young age. We know how to add and multiply them, and (at least sometimes) to subtract, at least as long as the number we want to subtract is no bigger than the one we want to subtract it from.
Then somebody comes along and says that when you subtract \(2\) from \(1\), the answer is \(-1\): a bit like \(2-1\), but with a minus sign in front of it. This is a new kind of number called a negative number, and they're useful for situations where you have to subtract a bigger number from a smaller one, and in certain unpleasant circumstances to describe the amount of money you have in the bank.
But now we have to ask: when we do arithmetic with them, how do they work?
One solution is just to present some rules, including the rule that the product of two negative numbers is the same as the product of their positive counterparts. In other words:

That's just the rule, isn't it?

Ooh, not at all satisfactory. The rule seems completely arbitrary. But did we really have any choice? Are there any constraints that determine this? What if we'd chosen the rule that the product of two negative numbers was negative?
Let's hold this thought for the moment and think about how \(\mathbb{N}\) behaves.
So, let's take a careful look at the non-negative integers, and what we know about them. In fact, there is a bunch of stuff that we know. I'm too lazy to write in an explicit multiplication symbol, so I'll use \(mn\) to represent the product of \(m\) and \(n\).
In the list below, \(m,n\) and \(p\) are arbitrary elements of \(\mathbb{N}\). Then
  1. \(m+n = n+m\) (addition is commutative)
  2. \(m+(n+p) = (m+n)+p\) (addition is associative)
  3. \(mn = nm\) (multiplication is commutative)
  4. \(m(np)= (mn)p\) (multiplication is associative)
  5. \((m+n)p = mp+np\) (multiplication distributes over addition)
  6. \(0+n = n\) (\(0\) is the additive identity)
  7. \(1n = n\) (\(1\) is the multiplicative identity)
  8. If \(m+n = p+n\) then \(m=p\) (cancellation law for addition)
  9. If \(mn=pn\) and \(n \neq 0\) then \(m=p\) (cancellation law for multiplication)
There are lots of other rules which we could write down, but these are all familiar, and the others follow from them, so we'll take them as a useful working collection.
In particular, there's one very familiar one which is conspicuous by its absence. It is the rule \(0n = 0\). I left it out on purpose, because we can show that it follows from the others, by the following argument:
We know that \(0+0=0\), because \(0\) is the additive identity. But that means that \((0+0)n=0n\), no matter what \(n\) is. By the distributive law, then, \(0n+0n=0n\), so that \(0n+0n=0n+0\), and then by the cancellation law, \(0n=0\).
There is an important lesson hiding in here. It is that our familiar rules aren't independent: once we have agreed on some of them, others are forced upon us. (So it's a good idea to choose a useful collection of rules to start with!)
This should raise an ugly possibility. What if my collection of rules isn't even consistent to begin with? How can I check that?
That's actually quite hard. Let's settle for the moment by accepting that there really is a set of non-negative integers with operations of addition and multiplication which satisfies the above rules. So there are lots of consequences from them that I haven't written above, but at least it all makes sense.
All this doesn't really help yet, but I can make use of it. Whatever the negative integers are, I want arithmetic to keep obeying the same rules as before. I want \(0\) and \(1\) to have their familiar properties, and I want the usual laws of algebra to continue to hold. So for every positive integer \(n\) I introduce a negative integer \(-n\) with the property that \(-n+n=0\). The set of all integers, positive, zero, and negative, is denoted \(\mathbb{Z}\). Now we can investigate what happens if we insist that all the rules above continue to hold.
Then \(\ldots\)

It follows from the algebra

Once we have all the stuff above, it then follows by a fairly short argument that \((-1)(-1)=1\), if we insist that all the above rules continue to hold when we allow \(m,n\) and \(p\) to be negative.
First, we all agree that whatever we mean by \(-1\), it has the property that \(-1+1=0\). Then we know that \[ (-1+1)(-1) = 0(-1) = 0 \] and so that \[ \begin{split} (-1+1)(-1)&=(-1)(-1)+1(-1)\\ &=(-1)(-1)+(-1)\\ &=0 \end{split} \] But now adding \(1\) to each side gives \[ (-1)(-1)+(-1)+1=0+1 \] so that \[ (-1)(-1)+ 0 = (-1)(-1)= 1 \] So if we insist that all the algebraic rules for non-negative numbers continue to hold, we have no option. A minus times a minus is a plus.
Well, now we can see that if we decide that a minus times a minus is minus, then the ordinary rules of algebra must break somewhere. So this isn't really an arbitrary rule, it's forced upon us by the requirement that the algebraic properties of \(\mathbb{Z}\) are the familiar algebraic properties of \(\mathbb{N}\).
We can see various other things, too: one very important one is that \((-1)n = -n\). The argument is fairly similar: \[ \begin{split} (-1)n & =(-1)n + 0\\ &= (-1)n + n + (-n)\\ &= (-1)n+1n + (-n)\\ &=(-1+1)n + (-n)\\ &=0n + (-n) \\ &= -n \end{split} \]
So, this tells us that if we extend the non-negative integers to include negative ones, and insist that algebra works as before, there are certain logical consequences, including the familiar a minus times a minus is a plus. If. But how do we know that these rules describe anything at all? What are these negative numbers that we have introduced, and how do we know that it is even possible to introduce negative numbers while preserving the algebraic rules?
You might be tempted to think I'm about to kick up a lot of dust and then complain about how hard it is to see, but when the dust settles the view is wonderful.
So here's the awkward question \(\ldots\)

What is a negative number anyway?

Negative numbers are there to be the solutions to equations of the form \[ m+x = n \] when \(m>n\). In particular, \(-n\) is the solution to \(n+x=0\), if this makes sense.
So, let's make sense of it.
We're going to do this a bit indirectly, thinking about pairs of integers \((m,n)\). Secretly we want to think of this pair as \(m-n\), but we don't yet have a meaning for this when \(m \lt n\).
So we're going to lump together certain pairs.
We say that the pairs \((m,n)\) and \((p,q)\) are equivalent if \(m+q=n+p\). A collection of pairs which are all equivalent to each other is called an equivalence class.
This is actually quite a familiar idea: it's analogous to thinking of \(3/4\) and \(6/8\) as different ways of representing the same fraction. There are infinitely many pairs of integers \((m,n)\) which we can use to do it, namely all the ones satisfying \(3n=4m\). We don't (often) talk explicitly about a fraction being an equivalence class of pairs of integers, but that's what's really going on.
We're going to do the same trick, but with pairs where it's the difference than matters, not the ratio.
We need to do three things now:
  1. Define addition on these equivalence classes of pairs.
  2. Define multiplication on these equivalence classes of pairs.
  3. Show how we can think of these equivalence classes as the familiar integers.


This is the easy one.
We define addition on pairs by \[ (m,n)+(p,q)=(m+p,n+q) \]
But we're not done yet, because the collection that the answer lies in should only depend on the collections of the original pairs, not on the specific pairs. We can check this.
So suppose that \((m,n)\) and \((M,N)\) are equivalent to each other, as are \((p,q)\) and \((P,Q)\). Then we need to check that \((m+p,n+q)\) is equivalent to \((M+P,N+Q)\).
First, we know that \[ m+N=n+M \] and \[ p+Q=q+P \] so adding these together we have \[ m+N+p+Q=n+M+q+P \] i.e. \[ (m+p)+(N+Q)=(n+q)+(M+P) \] which is exactly what we need.
So this operation of addition works in the sense that the equivalence class of the sum depends only on the equivalence classes of the pairs we add.
Multiplication is a bit tougher.


This is not so obvious, so let's cheat a little. We are thinking of \((m,n)\) and \(m-n\), and \((p,q)\) as \(p-q\). Then the product ought to be \[ \begin{split} (m-n)(p-q) &= mp-mq-np+nq\\ & = (mp+nq)-(mq+np) \end{split} \] So we use the definition \[ (m,n)(p,q)= (mp+nq,mq+np) \] We can quickly check that \[ \begin{split} (p,q)(m,n) &= (pm+qn,qm+pn)\\ &=(mp+nq,mq+np)\\ &= (m,n)(p,q) \end{split} \] so that this is commutative, but now we have to check that the result is also independent of the choice of pairs.
So we consider \((m,n)\), \((p,q)\) and \((P,Q)\) where \((p,q)\) and \((P,Q)\) are equivalent. Then \[ (m,n)(p,q) = (mp+nq,mq+np) \] and \[ (m,n)(P,Q) = (mP+nQ,mQ+nP) \] and we see that \[ \begin{split} mp+nq+mQ+nP &= m(p+Q)+n(q+P)\\ &=m(P+q)+n(Q+p)\\ &=mP+nQ+mq+np \end{split} \] so that the two answers are equivalent.
Nearly there.
In just the same way, if \((m,n)\) is equivalent to \((M,N)\), then \((m,n)(p,q)\) is equivalent to \((M,N)(p,q)\). Putting this together, \((m,n)(p,q)\) is equivalent to \((M,N)(p,q)\), which is equivalent to \((M,N)(P,Q)\), so the equivalence class of the product only depends on the equivalence classes of the pairs we are multiplying.
Now we know that we have a well-defined arithmetic on our (equivalence classes of) pairs. But we still need to see how we can think of this as supplementing the non-negative integers with the negative integers.
We make use of the fact that we can choose any representative we like for each equivalence class. So we use the representative of the form \((n,0)\) for all pairs of the form \((p+n,q)\) where \(n \geq 0\), and the representative of the form \((0,n)\) for the pairs of the form \((p,q+n)\) where \(n \gt 0\).
Using this, we can now see that \((m,0)+(n,0)=(m+n,0)\) and \((m,0)(n,0)=(mn,0)\), so the non-negative integers live inside this strange new object.
We also see that \((n,0)+(0,n)=(n,n)\), which is equivalent to \((0,0)\). If we think of \((n,0)\) representing \(n\), then \((0,n)\) represents \(-n\).
It is then tedious but straightforward to check that all the rules of algebra for the non-negative integers still work. (This is mathematics code for 'I can't be bothered to write it all out, but I don't see any reason why you shouldn't suffer.')
So, now we know that is is, indeed, possible to add negative integers into the mix while preserving the well-known algebraic structure of the non-negative integers, and that the argument up above for why \((-1)(-1)=1\) does actually apply.
Of course, now that we have defined multiplication of our pairs of positive numbers, we can also do this explicitly and see the same answer from an entirely different perspective. Given than \(-1\) is represented by \((0,1)\), we have \[ (0,1)(0,1)=(0+1,0+0)=(1,0) \] and \((1,0)\) represents \(1\) in our model.

What was the point of all that again?

After all this, you might wonder what the point of all that peculiar stuff about equivalence classes was. It just made a complicated picture of the positive and negative numbers, which everybody was happy with anyway. The snag is that you can't always just add a new kind of number into the mix and preserve the algebra that you like, but now we know that it can be done in this particular case of interest.
Maybe the most famous example of this is Hamilton's search for a three-dimensional version of complex numbers. You can start off with the real numbers, then include a new quantity \(i\) which satisfies \(i^2=-1\), and just do algebra with combinations of this \(i\) and the real numbers, and it all works: complex numbers make sense, and we can think of their algebra as a way of adding and multiplying two-dimensional vectors.
Hamilton spent some time trying to add in another quantity like \(i\) so get a three-dimensional version of algebra. In fact, that isn't possible. He eventually realised that it could almost be done in four dimensions, but you had to give up on the multiplication being commutative.
So there are problems with adding the new kind of numbers in and just assuming that the old rules of algebra will continue to hold. But now, after quite a lot of work, we can say with confidence that it is possible to introduce negative integers in a way which is compatible with the algebra of the non-negative ones. This is a Good Thing.

Food for thought

  1. I wrote down a multiplication rule which was motivated by me knowing how the answer should behave. Could I have chosen another rule with difference consequences?
  2. I wrote down a bunch of rules above, which are followed by the non-negative integers. But are there other sets of quantities which satisfy those same rules? What happens if I try this trick on these other sets?

Wednesday, August 2, 2017

Schroedinger's cat: not one thing and the other.

Schroedinger's cat is not alive, is not dead, and is not simultaneously alive and dead. It's more interesting than that.

Schroedinger's Cat

We're all familiar with the Schroedinger's cat (thought) experiment. A cat is placed in a box along with a sample of a deadly poison and a radioactive sample all hooked together in such a way that at a certain time the probability of the sample decaying, and releasing the poison resulting in a dead cat is exactly \(1/2\). The question then is, what is the state of the cat at just that time? The standard understanding of quantum mechanics tells us that if we have a lot of identical boxes of this type and we look inside them all, then half of them will contain a live cat and the other half will contain a dead cat. But what is the state of the cat before the box is opened?

It is quite common to see statements to the effect that the cat is simultaneously dead and alive up until the moment the box is opened and it is observed. But observing the cat forces it into one state or the other, and so we never see a cat simultaneously alive and dead: but if we do the experiment many times, we expect just to see a live cat half of the time and a dead cat the other half.

There are a couple of questions which spring pretty irresistibly to mind. The first is, what on earth does it mean for the cat to be simultaneously dead and alive? And the second is, if it is in fact simultaneously dead and alive, why don't we ever see that?

Decoherence-a digression

Let's address the second question first. One current understanding is that when the interior of the box interacts with the outside world, the state of the interior is forced into a well-defined classical state by a phenonemon known as decoherence. This gives a purely physical explanation for the 'collapse of the state', a central aspect of the Copenhagen interpretation of quantum mechanics, avoiding the earlier speculations about a conscious observer being required, and the consequent problems raised by Wigner's friend.

But this is not an easy question, nor one with a universally accepted answer. The issue has been vigorously debated for many years and will almost certainly continue to be for many to come. I am going to take the coward's way out, and avoid the question entirely. Instead, I will consider the notion of a state and what kind of state describes the cat in the box.

From now on, then, when I talk about 'observation', you can think about 'what I see when I look at the system' or 'the state of the system when decoherence due to interaction with the exterior world results in a classical state', as you prefer.

Quantum states

In the quantum mechanics picture of the world, the state of a system is described by a vector: this contains all the information that there is about the system, and the results of any measurement that can be made. In (fairly) general, the state of a system can be written \[ |\psi\rangle = \sum_i \alpha_i | \psi_i \rangle \] where each of the \(|\psi_i\rangle\) is a vector corresponding to a classical state of the system, such as 'alive' or 'dead' for the cat, or 'decayed' or 'not decayed' for the radioactive sample. The \(|\psi_i\rangle\) have unit length and are orthogonal to each other, and the \(\alpha_i\) are complex numbers whose squared moduli add up to \(1\). Finally, the probability of the system being in the \(i\)th state when observed is \(|\alpha_i|^2\).

The use of the notation \(| \;\; \rangle\) to represent a state-vector is due to Dirac, and is part of a fairly awful pun he was very fond of. If you think of \(|\psi\rangle\) as a column vector, then the corresponding row vector is \(\langle \psi |\). Dirac called the row vector a bra, and the column vector a ket, so that when you take the inner product of two vectors \(|\psi\rangle\) and \(|\phi\rangle\) you get \(\langle \psi | \phi \rangle\), a bra-ket or bracket. I said it was awful.

There's a long story here which I'm not going into. As time passes, the \(\alpha_i\) evolve according to Schroedinger's (differential) equation, which relates the rate of change of the \(\alpha_i\) to the surroundings. The details of how this works are, fortunately, irrelevant here.

In addition, there can be more than one way of expressing the state \(|\psi\rangle\) as a combination of classical states, and the relationship between these expressions is tied to the famous uncertainty principle of Heisenberg. We are doubly fortunate, because we don't need to worry about that here either.

The cat's state

So, what's actually going on inside the box? We have the radioactive sample, which at a certain time is in a quantum state which means that an observation will show decayed or not decayed with equal probability. Such a microscopic system being in a weird quantum state doesn't raise any immediate alarm bells, it's when the state of a macroscopic object (such as a cat, and whether the cat is alive or dead) is weird in this way that we start to worry.

Looking at this again, it seems reasonable that we can identify the two classical states of the radioactive samples as 'decayed' and 'not decayed', and the relevant aspects of the cat's state as 'dead' (if the sample has decayed) and 'alive' (if the sample has not yet decayed). I'll use \(|a\rangle\) to represent the state 'alive' and \(|d\rangle\) to represent 'dead'.

So if the cat is in the state \(|a\rangle\) then the cat will certainly be alive when it is observed; likewise, if it is in the state \(|d\rangle\) then it will certainly be dead when observed. So you can see that it is not true that observation of a quantum system unavoidably changes its state. That depends on the state and the type of observation: the uncertainty principle is much more fundamental than the idea that poking a system to see where it is pushes it about a bit. But that's another story.

But in general the cat is not in either of these states. Instead, it is in some state of the form \[ |\psi\rangle = \alpha |a\rangle + \beta |d\rangle) \] where \(|\alpha|^2+|\beta|^2=1\). In this state, the probability that the observation reveals a live cat is \(|\alpha|^2\), and the probability of a dead cat is \(|\beta|^2\). In the classical thought experiment, we have both of these probabilities equal to \(1/2\), so one possible state would be \[ |\psi\rangle = \frac{1}{\sqrt{2}}(|a \rangle + |d \rangle) \]

So what is the state of the cat here? It is not alive. That would mean it was in the state \(|a\rangle\). It is not dead. That would mean it was in the state \(|d\rangle\). It certainly isn't simultaneously alive and dead, any more than I can be simultaneously in the living room and in the hallway. It also isn't in some kind of intermediate state, such as when I am standing in the doorway with one foot in the living room and one in the hallway. It is in a different kind of state, one without a classical analogue, and this kind of state is called a superposition.

To say that the cat is simultaneously alive and dead is like saying that somebody pointing north-east is simultaneously pointing north and east. They aren't: but the direction they are pointing in has a northern and an eastern component. In the same way, the cat isn't simultaneously alive and dead: but the state it is in has an 'alive' and a 'dead' component.

This actually has enormous consequences. The different components of the state can evolve indepenently, almost as if there are two universes involved, in one of which the cat is alive, and in the other of which it is dead. Developing this idea leads to the many worlds, or Everett-Wheeler model of quantum mechanics, which Sean Caroll describes here.

In any case, if we take quantum mechanics seriously, and the evidence for it it is pretty compelling, then we have to learn to live with and accept (even if we can never be entirely comfortable with) the idea that the state of a system might not be a classically recognized state, but rather a superposition of such states. We even have to accept that the acceptable states may not quite correspond to what we think of as classically acceptable states. They are more general in some respects (we have to allow for superpositions of classical states), but more restricted in others (there is no quantum state for a particle with a simultaneously well-defined position and momentum).

Decoherence again

All this has been about what the quantum state of the (unobserved) cat in the box is. But maybe you aren't any happier with the cat being in a superposition of states than with the cat somehow being in two states at once. There is a possible way of avoiding this, and it is to argue that when the radioactive sample interacts with the rest of the apparatus in the box (including the cat), that already causes decoherence, so at box opening time what we actually have is a box with either a dead cat or a live cat in it: or again, if we did it many times, a collection of boxes of which about half contain live cats and half contain dead one.

It would be easy to give the impression that decoherence is supposed to be the cure for all our problems with why macroscopic systems look classical, rather than quantum. It would also be wrong: I'm not going to go into any detail at all here, but if you've made it this far you can probably get quite a lot from reading (at least substantial chunks of) Schlosshauer's review article.

Sunday, July 9, 2017

The RIGHT way to think about derivatives

At secondary school and then university, we meet a variety of definitions of the derivative, all motivated by the fundamental idea that the derivative is telling us something about the rate of change of some quantity as another varies. The definitions are likely to start with a fairly intuitive one and become more rigorous, incorporating the notion of limit. Unfortunately, as the rigour and degree of precision grow, it is possible for the student to become lost in a thicket of notation, and to lose sight of just what they are studying. However, I claim that if one thinks of the derivative the right way, then one has a fully rigorous concept of derivative which keeps fruitful contact with the intuitive one.

Let's take a look at (some of) these definitions, before settling on the right one.

A menu of definitions

The slope of the tangent to a graph

This is a fairly straightforward idea. You start off with the graph of a function, say (just to be conventional) \(y=f(x)\), and at some chosen value of \(x\) you draw the tangent to the curve. Then the gradient is how fast the height of the graph changes as \(x\) changes, so that sounds good.

At least, it sound good until you start thinking about just what you might mean by 'tangent'. If the curve is a circle, that's pretty unambiguous: you take the line perpendicular to the radius at the point of interest, and we all know that is what is meant by the tangent.

Actually, since I am talking about the tangent to a graph, I've just used the upper half of the circle, given by \(y=\sqrt{1-x^2}\) for \(-1 \leq x \leq 1\).

But what if the graph isn't (part of) a circle?

We could try 'the tangent to \(y=f(x)\) at \((x,f(x))\) is the line that touches the graph at just the point \((x,f(x))\), without crossing it'. That works for circles, but now also includes some other well-known curves, including the parabola and hyperbola.


So we have generalized the notion of tangent usefully, to include some new curves. But what about this?

It looks like what we want, but cuts the graph in more than one place. But somehow that looks all right: the 'bad' intersection is a long way away from the point where we want the tangent. Clearly being a tangent is a local propery, so we can fix this by saying 'the tangent to \(y=f(x)\) at \((x,f(x)\)) is line that touches the graph at just one point in the vicinity of \((x,f(x))\) without crossing it'.

But what about this?

The straight line certainly looks like a tangent, except that it crosses the graph at the point of tangency. There's no obvious way of retaining the notion of 'touching the graph without crossing it' here.

So eventually we accept that this is going to be hard to fix, and that something a little subtler is required.

The chord slope definition

Instead of thinking directly about the tangent to a point on the graph, we think of an approximation to it, and see how that can lead to a notion of tangent.

So although we want the tangent at \((x,f(x))\) we decide to look instead at the straight line joining two points on the graph: \((x,f(x))\) and \((x+h,f(x+h))\), where we insist that \(h \neq 0\).

The gradient is \[ \frac{f(x+h)-f(x)}{(x+h)-x} = \frac{f(x+h)-f(x)}{h} \] and it seems entirely reasonable (because it is) to say that the slope of the tangent is the limiting value of this as \(h\) gets closer and closer to \(0\). So the tangent to \(y=f(x)\) at \((x,f(x))\) is the line through \((x,f(x))\) with this gradient. At first exposure, we might not go into too many details of just what we mean by the limiting value, sticking with examples where no obvious complications arise.

This is a meaningful definition. Using it we can calculate the derivative of a variety of simple functions, such as powers of \(x\) and, with the help of a little geometric ingenuity, the trig functions \(\sin\) and \(\cos\).

It's also a sensible definition, because it really does seem to do the job we want to do. It doesn't make sense to say it's 'correct', because this defines the notion. But it does have the properties we'd hope for, and it's something we can calculate with. It's worth noticing that in the process we have shifted from defining the derivative in terms of the tangent to defining the tangent in terms of the derivative!

I claimt that even so, it isn't the right way to think about the derivative.

The formal limit definition

After spending some time with the chord slope definition, and informally working out some simple limits, it's usually time to give a more rigorous idea of what is going in here.

We then introduce the following incantation: \[ \lim_{x \to a}f(x) = L \] means \[ \forall \epsilon>0, \exists \delta>0 \text{ such that } 0 \lt |x-a| \lt \delta \Rightarrow |f(x)-L| \lt \epsilon \] At first sight, and especially for the novice, this is likely to cause a great deal of fear and trembling. It is a complex statement which makes very precise a simple idea. The idea is that we can make \(f(x)\) as close as we want to \(f(a)\) by making \(x\) as close as we have to to \(a\).

With this in hand, we can tighten up some of what is done with the chord slope idea by writing \[ f'(x)= \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} \] It isn't really anything new, it's just a more precise way of stating the previous idea.

There's also the minor variation that \(f\) has derivative \(f'(x)\) at \(x\) if \[ \lim_{h \to 0} \left| \frac{f(x+h)-f(x)}{h} - f'(x) \right| = 0 \]

Working with these, including showing that they are equivalent, gives some practice in understanding the formal statement of what is meant by a limit, and how that formal definition is used. But I also claim that this is not the right way to think about the derivative.

Best Linear Approximation

Once the derivative has been defined and calculated in some tractable cases, various theorems about derivatives are presented. Here's one that gets used a lot. \[ f(x+h) \approx f(x)+hf'(x) \] or rather, with some foresight, \[ f(x+h)-f(x) \approx f'(x)h \] Now, as it stands, this says less than you might think. As long as the function \(f\) is continuous at \(x\), then for very small \(h\), \(f(x)\) and \(f(x+h)\) are very close together, and \(f'(x)h\) is also very small, so the the approximation gets better and better as \(h\) approaches \(0\), no matter what value we use for \(f'(x)\). But this is just saying that both sides are approaching \(0\), and this obviously isn't all we mean by that approximate formula.

Let's tease it out by looking a little more carefully. Instead of having an approximate equality, we will have an exact equality which includes an error term. \[ f(x+h) - f(x) = f'(x)h+e(h) \] where \(e(h)\) is the error, which in general will depend on the size of \(h\). Now, it turns out that if \(f\) is differentiable at \(x\), with derivative \(f'(x)\), then as \(h\) gets closer and closer to \(0\), not only does \(e(h)\) get arbitrarily close to \(0\), but \(e(h)/h\) also gets arbitrarily small. When this happens I say that \(e(h)\) is suitably small.

In other words, for very small \(h\), the error is a very small fraction of \(h\), and we can make that fractional error as small as we want by choosing \(h\) small enough.

It's this behaviour of the error term that makes it all work. If we choose any other value, say \(K\) instead of \(f'(x)\) and try to use that instead, we have \[ f(x+h) - f(x) = Kh + E(h) \] where \(E(h)=(f'(x)-K)h +e(h)\). But for very small \(h\), we see that \[ \frac{E(h)}{h}=f'(x)-K + \frac{e(h)}{h} \] and we cannot make this small by making \(h\) small. So the special property of \(f'(x)\) is that \(f'(x)h\) is the best linear approximation to \(f(x+h)-f(x)\).

To make contact with the usual limit definition, we note that there is a best linear approximation if and only if the error term is suitably small.

I claim that this is the best way to think about the derivative.

What's so great about it?

The first thing to say is, it doesn't make it any easier to calculate a derivative. That's still just the same as it always was.

What's so great about it is that it gives insight into how derivatives behave. This is not to say that it makes the proofs of behaviour easier: but it does make the results easier to see and understand.

Differentiation rules

We learn the various rules for differentiating combinations of functions in our first calculus course: linearity, the product, quotient and chain rules. Let's see how this point of view does with them.

In each case, we'll think about two functions \(f\) and \(g\), and use \(e_f\) and \(e_g\) for the corresponding errors.


Suppose we have two real numbers \(\alpha\) and \(\beta\). Then \[ \begin{split} (\alpha f+\beta g)(x+h)-(\alpha f+\beta g)(x) &= \alpha f(x+h)+ \beta g(x+h) - \alpha f(x)-\beta g(x)\\ &=\alpha(f(x+h)-f(x))+\beta(g(x+h)-g(x))\\ &= \alpha (f'(x)h+e_f(h))+\beta(g'(x)h+e_g(h)\\ &=(\alpha f'(x)+\beta g'(x))h +\alpha e_f(h)+\beta e_g(h)\\ &=(\alpha f'(x)+\beta g'(x))h + e(h) \end{split} \] where \(e(h)\) is obviously suitable small.

So differentiation is linear.

Product rule

\[ \begin{split} (fg)(x+h)-(fg)(h) &= f(x+h)g(x+h)-f(x)g(x)\\ &= (f(x)+f'(x)h+e_f(x))g(x+g'(x)h+e_g(x))-f(x)g(x)\\ &= (f'(x)g(x)+f(x)g'(x))h +f'(x)he_g(x)+g'(x)he_f(x)\\ &= (f'(x)g(x)+f(x)g'(x))h + e(h) \end{split} \] where \(e(h)\) is still fairly obviously suitably small.

Quotient rule

Actually, I'll cheat slightly. Assuming that \(f(x) \neq 0\), we have \[ \begin{split} \frac{1}{f(x+h)} - \frac{1}{f(x)} &= \frac{1}{f(x)+f'(x)h+e_f(h}-\frac{1}{f(x)}\\ &=\frac{-f'(x)h-e_f(h)}{f(x)(f(x)+f'(x)h+e_f(h))}\\ &= -\frac{f'(x)}{f(x)(f(x)+f'(x)h+e_f(h))} - \frac{e_f(h)}{f(x)(f(x)+f'(x)h+e_f(h))} \end{split} \] which is less obviously, but quite plausibly \[ -\frac{f'(x)}{f(x)^2}h + e(h) \] where \(e(h)\) is suitably small.

Then the quotient rule is just the product rule combined with this one.

Chain rule

And finally, we have \[ \begin{split} f(g(x+h))-f(g(x)) &= f(g(x)+g'(x)h+e_g(h))-f(g(x))\\ &= f(g(x))+f'(g(x))(g'(x)h+e_g(h)) -f(g(x))\\ &= f'(g(x))g'(x)h+e(h)) \end{split} \] where again it is not immediately obvious, but is quite plausible that \(e(h)\) is suitably small.

Insight, not proof

So we can see that in each case, assuming that the behaviour of error terms is not unreasonable, this idea of the derivative as the best linear approximation to the change in the function value gives us insight into why the various rules for differentiation work as they do.

But there's more

This works very well. The mental picture of a best linear approximation is much more meaningful than that of some limiting value, and the arguments involving it are rather more intuitive than those involving the usual limit definition.

But, as is often the way with a good idea, we get more than we seem to have paid for.

Functions of several variables

Suppose that now instead of having a function \(f:\mathbb{R} \to \mathbb{R}\) we have \(f:\mathbb{R}^n \to \mathbb {R}\). The picture is much the same except that \(x\) and \(h\) are now vectors with \(n\) components, and I will adopt the standard convention that they are represented as column vectors. So we can still say that \(f\) is differentiable at \(x\) with derivative \(f'(x)\) if this \(f'(x)\) has the property that \[ f(x+h)-f(x)=f'(x)h +e(h) \] where \(e(h)\) is suitably small.

But now we can think about what sort of thing this \(f'(x)\) actually is. \(h\) is a vector with \(n\) components, and so \(f'(x)\) must be a linear mapping from the space of these vectors to \(\mathbb{R}\), in other words it must be a row vector, or covector, with \(n\) components.

This is where we first see why I wrote \(f'(x)h\) rather than \(hf'(x)\) previously. When \(f\) was a real valued function of a real variable, it made no difference, but now the order is significant, and is determined by the requirement that we end up with a scalar quantity.

The next obvious question is, what are these components of \(f'(x)\)? We can answer this by choosing \(h\) carefully. To pick out components of vectors, I'll use a superscript for a component of a column vector and a subscript for a component of a row vector. Then we have \[ f(x+h)-f(x) = f'(x)h +e(h) = f'(x)_i h^i +e(h) \] which tells us that \[ f'(x)_i = \frac{\partial f}{\partial x^i}(x) \]

So our definition of 'derivative' naturally produces the object we learn to call the gradient of \(f\) in vector calculus (and maybe gives a little more explanation for the name).

In fact, it tells us more. It tells us that partial derivatives are really components of the best linear approximation, i.e. the actual derivative of the function. The partial derivatives are not best thought of as objects which we define as ordinary derivatives 'holding all but one variable constant' but as components of a (row) vector naturally associated with the function.

Again, thinking of the derivative as the best linear approximation gives us a deeper insight into a chunk of familiar calculus. As a bonus, it also gives us our first glimpse of the fact that the derivative of a function of this type is not a vector, but is a covector. This is a distinction that grows in importance as one generalizes from functions on \(\mathbb{R}^n\) to functions on a surface in \(\mathbb{R}^n\), and finally to functions on a manifold which may or may not be equipped with a (semi-)Riemannian metric. It is a gift which keeps on giving.

Tangent (velocity) vector

In the last example, we looked at functions which associate a value to each point of space. This time, we'll do the opposite, and think about functions which associate a point in space to each value of some parameter which we can think of as time. So we have functions of the form \(f:\mathbb{R} \to \mathbb{R}^n\), and use \(t\) for the argument of the function.

We can now play the same game and see where it takes us. \(f\) is differentiable with derivative \(f'(t)\) if \[ f(t+h)-f(t) = f'(t) h + e(h) \] where \(e(h\) is suitably small.

This time, we see that \(h\) is just a number, and so \(f'(t)\) really is a vector with \(n\) components. And by looking at the components of \(f(t)\), we see that \[ f'(t)^i = \frac{d f^i}{dt}(t) \] and so the derivative of \(f\) really is the velocity of a point whose position is given by as a function of \(t\) by \(f(t)\).

It's useful to note that the character of \(f'\), i.e. whether it is a scalar quantity, or a vector, or a covector, is entirely determined by the dimensions of the domain and codomain of \(f\). We don't choose whether to make a vector or a covector out of the derivatives of \(f\); that is decided by the fact that \(f'\) is the best linear approximation.

The whole hog

Let's combine the two ideas up above, and consider \(f:\mathbb{R}^n \to \mathbb{R}^m\), so now \(f(x)\) is a vector with \(m\) components and \(x\) is a vector with \(n\) components. As always, \(f\) is differentiable with derivative \(f'(x)\) if \[ f(x+h)-f(x) = f'(x)h + e(h) \] where \(e(h)\) is suitably small.

Now, \(f'(x)\) is a linear transformation which takes a vector with \(n\) components and produces a vector with \(m\) components. In other words, it is an \(m \times n\) matrix. Using my previous notational device of labelling rows with superscripts and columns with subscripts, we now have \[ f'(x)^i_j = \frac{\partial f^i}{\partial x^j} \] and the hidden agenda behind the placement of the indices now becomes slightly exposed. (But trust me, there is plenty still lurking behind the tapestry.)

So the power of thinking of the derivative as the best linear approximation should now be rather clearer: we can't always use the chord-slope definition, though we can generally pick an aspect of the set-up where it can be used to define, say, partial derivatives. But the best linear approximation makes sense in great generality, and the familiar objects built out of partial or component derivatives can now be seen as the natural objects, and the partial or component derivatives as analogous to the components of vectors.


In general, we can't multiply, divide, or compose functions. But whenever we can, the rules for differentiating combinations work in just the same way. In fact, if you revisit the (plausibility) argument for each rule, you can see that all that matters is that the algebraic operation makes sense. Although they were written with functions of the form \(\mathbb{R} \to \mathbb{R}\) in mind, in fact that is irrelevant.

Probably the most powerful aspect of this comes from the chain rule. It tells us that the derivative of the composition is the product (in general, a matrix product) of the derivatives of the functions. A beautiful consequenc is that if \(f,g:\mathbb{R}^n \to \mathbb{R}^n\) are inverses of each other, so are the derivatives. So the partial derivatives of \(f\) are not the inverses of the partial derivatives of \(g\) (a standard error of students in advanced calculus courses), but the matrices built out of the partial derivatives are in inverse relation to one another.

Algebra for and from calculus

So we are just beginning to see that what we know about linear algebra (vector spaces, dual vector spaces, linear transformations) is relevant to calculus. Since the derivative is the best linear approximation, and derivatives of combinations of functions are linear algebraic combinations of the derivatives, we can use the apparatus of linear algebra to tell us about the (local) behaviour of non-linear functions.

In some ways, this is the entire point of differential calculus: it is a tool for turning the difficult analysis of nonlinear functions into the relatively easy analysis of linear ones. The identification of the derivative as the best linear transformation is what makes this all work.

This all extends to even more general situations. The spaces involved don't actually have to be finite dimensional, as long as there is a way of talking about the size of the vectors.

It leads on to, for example, ways of considering fluid flow, where the velocity field of the fluid can be thought of as the tangent velocity to a curve in an infinite dimensional space of fluid configurations, and of thinking about the Euler-Lagrange equations in the calculus of variations as the derivative of a functional, to name just two.

Simply the best?

This gives some of the reasons I think that the right way to think of the derivative is as the best linear approximation. It both gives insight into what the rules for derivatives of combinations of functions are actually saying, and makes the various generalizations easier to understand. It lets us see how we can use linear algebra to understand aspects of the behaviour of nonlinear functions. It generalizes still further, in ways that I have just hinted at.

But it isn't the right way for everybody. I don't think it's the right way for it to be introduced to novices, or indeed for novices to try to think about it. It really makes sense best when you already have a bunch of different-looking bits of differentiation at hand, which seem to be myusteriously linked together, when it provides a unifying framework to remove the mystery. Then it lets things drop into place together, and you can see how they're all really different aspects of the same thing. But you do have to go through a certain amount of getting to grips with the more traditional approaches in order to have a mental place for this to live.

Wednesday, June 28, 2017

Finding greatest common divisors and all that.

Greatest Common Divisor

The problem is simple: given two positive integers, \(a\) and \(b\), how do we find \(\gcd(a,b)\), the greatest common divisor of \(a\) and \(b\) (i.e. the largest integer that \(a\) and \(b\) are both multiples of)?

Just about everybody learns how to do this at school. You break each of \(a\) and \(b\) down into a product of prime factors (where a prime number is one with exactly two factors, itself and \(1\)) and then multiply together all the common prime factors to get the biggest number that each is a multiple of. But this actually relies on quite a deep result. The procedure works because The Fundamental Theorem of Arithmetic says that any positive integer can be expressed as a product of primes, and it can only be done in one way (except for the order of the factors).

So, there are two issues here.

The first is in plain sight. It is the question of why you should believe the fundamental theorem of arithmetic. This might not seem like an issue. In fact, after a lot of experience in factorizing integers you may well feel that it is pretty obvious. But it isn't, and part of the intention of this post is to persuade you that it isn't.

The other is hidden, but comes into view when the numbers \(a\) and \(b\) get larger. It is the problem that working out the prime factorization of a number is really quite hard when the number is large.

There is, fortunately, an answer for both of these: Euclid's algorithm gives us a different way to compute the greatest common divisor of two integers, and, as a bonus, we can use it to prove that every integer has a unique prime factorization. This algorithm is to be found in Euclid's Elements, and is probably the oldest non-trivial algorithm in significant current use, being a vital ingredient in the RSA public key cryptosystem.

Euclid's Algorithm

The algorithm relies on a familiar fact from basic arithmetic, which goes by the appallingly inappropriate name of the division algorithm. It isn't an algorithm: an algorithm is a set of step-by-step instructions which tell you how to do something. This just tells you that a certain type of question has a particular type of answer. A couple of sources call it the division theorem because of this, but, alas, the terminology is very well established. I would rather be right than popular, though, so I will call it the division theorem.

But enough whining. What does this result say?

The Division Theorem If \(a\) and \(b\) are positive integers, then there is a unique positive integer \(q\), and a unique non-negative integer \(r \lt b\) such that \(a=qb+r\).

In other words, if you divide \(a\) by \(b\) you get a quotient and remainder, and there's only one right answer. This is something so entrenched in our experience that we take it very much for granted, and that's just what I'm going to do. But it has consequences.

If the remainder, \(r\), is zero, then \(a\) is a multiple of \(b\): we also say that \(b\) divides \(a\).

Back to the original question: I have two positive integers, \(a\) and \(b\), and I want to know the biggest number that divides into both of them. Let's suppose that \(a \gt b\). (If not, I can swap them over). Now, by the division theorem, I know that I can write \(a\) as \(qb+r\), where \(0\le r \lt b\). What's more, if \(d\) is any number that divides \(a\) and \(b\), since \(a-qb=r\), \(d\) divides \(r\). And since \(a=qb+r\), if \(d\) divides into \(b\) and \(r\), it also divides \(a\). That means that \(b\) and \(r\) have exactly the same common divisors as \(a\) and \(b\).

But something really useful has happened. I've replaced the problem of finding the greatest common divisor of \(a\) and \(b\) by finding the greatest common divisor of \(b\) and \(r\), and \(r\) is smaller than \(b\), which is smaller than \(a\). I've made the problem easier.

There are two possibilities: if \(r=0\), then \(b\) divides into \(a\), so it is the greatest common divisor of \(a\) and \(b\). If not, I can repeat the trick. Eventually the remainder will be \(0\), and I will have the greatest common divisor of my original \(a\) and \(b\).

This procedure is the essence of Euclid's algorithm. Let's see how it works with an example.

What is \(\gcd(60,25)\)? We calculate as follows: \[ \begin{split} 60 &= 2 \times 25 + 10\\ 25 &= 2 \times 10 + 5\\ 10 &= 2 \times 5 \end{split} \] This tells us that \(\gcd(60,25)=5\). But we get something not at all obvious from it. We can rewind the calculation like this: \[ \begin{split} 5 &= 25 - 2 \times 10\\ &= 25-2 \times (60-2 \times 25)\\ &= 5 \times 25 - 2 \times 60 \end{split} \] This tells us that we can express \(5\) as a combination of \(60\) and \(25\). But there's nothing special about these numbers: I could have rewound any appliation of Euclid's algorithm in just the same way.

So, given any \(a\) and \(b\), we can use Euclid's algorithm to find \(\gcd(a,b)\), and also to find integers \(m,n\) with the property that \(\gcd(a,b)=ma+nb\). This wasn't at all obvious how to do from the prime factorization approach. In fact, with the prime factorization approach it would never have occurred to me that this could even be done.

Thinking about this approach to finding the gcd of two numbers, we can see that \(\gcd(a,b)\) isn't just the biggest number that divides into both \(a\) and \(b\): it is actually a multiple of any other divisor. The reasoning is just what was used above: any divisor of \(a\) and \(b\) is also a divisor of \(r\), the remainder. Repeating this, we eventually find that it must be a divisor of the last non-zero remainder, i.e. it must be a divisor of \(\gcd(a,b)\).

You might wonder who cares? I think there are two immediate reasons to care, namely that this addresses both of the issues I raised earlier. The first reason is that you don't have to take it on trust that prime factorization works as advertised, since you can see just why this works. The second is more practical. It is that for large numbers, this algorithm is very fast: enormously fast compared to the best known algorithms for factorizing large numbers, which is important. It is the fact that factorizing large integers is (as far as we know) expensive while finding the greatest common divisor is cheap that makes RSA public key cryptography work.

There are other, less immediate reasons too. Using this, we can actually prove that prime factorization works as advertised. But before getting there, we need to see something useful that this tells us about prime numbers.

Prime numbers

Suppose that \(p\) is a prime number, and \(a\) is any number which is not a multiple of \(p\). Then we must have \(\gcd(a,p)=1\), which tells us that there are integers \(m,n\) such that \(ma+np=1\). Now comes the clever bit.

Let \(p\) be a prime number, and suppose that \(a\) and \(b\) are integers such that \(ab\) is a multiple of \(p\). Then either \(a\) or \(b\) must be a multiple of \(p\).

"But that's obvious!" you may be tempted to object. But if you think it's obvious, think about why: it's almost certainly because you're so accustomed to how prime factorization works. Unfortunately that's exactly the thing that I want to prove.

So, let's suppose that \(ab\) is a multiple of \(p\). If \(a\) is a multiple of \(p\), we're done. But what if \(a\) is not a multiple of \(p\)? Well, in that case \(\gcd(a,p)=1\), so there are integers \(m,n\) such that \(ma+np=1\). Now multiply this equation by \(b\). We get \[ b=mab+npb. \] But \(ab\) is a multiple of \(p\) and \(npb\) is a multiple of \(p\), so their sum - which is \(b\) - must also be a multiple of \(p\). That's what we needed.

Now we have everything we need to prove the

Fundamental Theorem of Arithmetic

At last, the proof

We can now consider an integer, \(n\). There are two possibilities: either \(n\) is a prime, or \(n\) is the product of two smaller factors. Then we think about each of them in turn. Eventually, we must reach a point where each of our factors is a prime.

But this is only part of the problem, and the easy part at that. How do we know that there is only one way to express \(n\) as a product of primes?

This is where we use that last fact. Suppose we have two prime factorizations of \(n\), say \[ n=p_1p_2 \ldots p_k = q_1q_2 \ldots q_l \] Then \(p_1\) is a factor of \(q_1q_2 \ldots q_l\). If \(p_1\) is not a factor of \(q_1\) (in which case \(p_1=q_1\)), then \(p_1\) is a factor of \(q_2\ldots q_l\), and by repeating this argument we find that \(p_1\) must be one of the \(q_i\). Then divide \(n\) by \(p_1\) and repeat the argument. Eventually we find that each of the prime factors in either factorization must also be a factor in the other factorization, so the two are actually the same (except possibly in a different order).

What is a prime number again?

I rushed past this idea up at the top, with the brief statement that a prime number is one with exactly two factors, itself and \(1\). This isn't quite satisfactory if we allow negative numbers, so let's be a little more general than that, and try to give a definition which will work in more generality.

We say that a number is a unit if it has a multiplicative inverse, and then that a factorization is proper if no factor is a unit. Then a prime is a number with no proper factorization. So this works when we allow negative numbers, since although we can write \(3=1\times 3 = -1 \times -3\), in each case one of the factors is a unit.

But this property wasn't really what we used to prove the fundamental theorem of arithmetic. We used this to show that any integer has a prime factorization, but not to show that it was unique. For that part of the argument, the important fact was that if a product is a multiple of a prime, then at least one of the factors in the product must be a multiple of the prime.

In fact there two ways of characterizing prime numbers.

One characterization is that \(p\) is prime if it has no proper factorization.

The other is that \(p\) is prime if, whenever \(ab\) is a multiple of \(p\) then at least one of \(a\) and \(b\) must be a multiple of \(p\). We saw above that every prime has this property: we just have to make sure that any number with this property is prime. But if \(n\) has a proper factorization into \(ab\) then \(n\) divides into \(ab\), but since \(n\) is larger than both \(a\) and \(b\) neither is a multiple of \(n\), so no number with a proper factorization can satisfy this property.

But it doesn't have to be like that.

Another number system

The integers aren't the only number system we could work with. For example, we might consider the Gaussian integers \(\mathbb{Z}[i]\), i.e. the complex numbers whose real and imaginar parts are both integers. More interestingly, we could, and we will, do something rather more exotic. We consider numbers of the form \(m+n\sqrt{-5}\), where \(m\) and \(n\) are both integers, and call this collection \(\mathbb{Z}[\sqrt{-5}]\). Note that as for the integers, the units here are just \(\pm 1\).

We now get something happening which cannot happen with the integers.

First, notice that \(|m+n\sqrt{-5}| = \sqrt{m^2+5n^2}\), and since these objects are complex numbers, if \(M,N \in \mathbb{Z}[\sqrt{-5}]\), \(|MN|=|M||N|\). Then in this number system, just as in \(\mathbb{Z}\), neither \(2\) nor \(3\) has a proper factorization; and neither do \(1 \pm \sqrt{-5}\). You can check this in each by writing down all the numbers with modulus less than the target value, and checking that you can't get the target by multiplying any collection of them together (unless one is a unit). But \[ 2 \times 3 = (1+\sqrt{-5})(1-\sqrt{-5}) = 6. \] So in this number system we see that \(1+\sqrt{-5}\) is a factor of \(2 \times 3\) but it is not a factor or either \(2\) or \(3\).

So in this case we do not get a unique factorization. It can't really be obvious that there is only one way to factorize a number into a product of terms which can't be factorized any further, because there are number systems where it doesn't happen!

Prime versus irreducible

In this more general context, it is necessary to split up our two notions of prime, because they no longer coincide. The standard way to do this is to call a number with no proper factorization an irreducible number, and to define a prime to be a number which has the property that if it is a factor of \(ab\) then it must be a factor of at least one of \(a\) or \(b\). (This is why on your first encounter with abstract algebra, the definition of prime can seem to have nothing to do with the familiar prime numbers.)

Then any prime number must be irreducible, but a number can be irreducible without being prime. For each number to have a unique prime factorization, it must also be the case that every irreducible number is prime.

Moving on

This may have already been more than enough for you. But if not, there are lots of extenstions to play with here.

We could play the game again with the Gaussian integers, and it turns out that again there is a unique factorization into irreducibles. And we could consider variations, such as numbers of the form \(m+n\sqrt{-k}\) where \(k\) is a positive integer. How does the choice of \(k\) affect the behaviour? We already know that when \(k=5\), unique factorization fails. But is this typical, or does it happen for other vaues? Is there a property of \(k\) that we can calculate that tells us?

We could try with polynomials with real coefficients: it turns out that division of polynomials is similar enough to division of integers that we have a division theorem and a Euclidean algorithm, that prime and irreducible coincide, and that any polynomial can be expressed uniquely as a product of primes (which here means linear factors or quadratics with no roots). We could ask how things change if we take different types of coefficients, say complex, or rational, or integer.

These two alone give a large toy box to play with and explore these ideas, but they're only a start. Explore, investigate, make up your own variations. If all else fails, look in a book on abstract algebra or ask the internet.

Calculating gcd redux

So, after all that, how should you actually calculate the gcd of two integers?

The answer is, as it almost always is, "it depends". If the numbers are small enough to be factorized easily, using the prime factorization is probably the quicker of the two. If they are large enough that factorization is a pain, then Euclid's algorithm will do the job with less pain.

And I think it's worth calculating a few both ways just to see how the same result is arrived at by such very different routes.

Tuesday, June 20, 2017

When i grows up...

...\(i\) wants to be a quaternion.

We can think of the complex numbers as two dimensional vectors, with a rule for multiplication. This multiplication rule is actually very well behaved, and for some time mathematicians tried and failed to find a similarly well behaved multiplication rule for three dimensional vectors. In fact the failure was not due to any lack of ability: it's impossible to find such a well-behaved multiplication rule for these vectors. Then William Rowan Hamilton had a flash of inspiration which gave him the insight that although a multiplication rule couldn't be found for three dimensional vectors, it could be done for four dimensional vectors. He was so excited by the realization that he carved the basic algebraic identities into the stonework of the bridge he was crossing at that time. The structure resulting from the multiplication rule he discovered is the quaternions.

Hamilton then devoted an immense amount of time and effort developing the mathematics of these quaternions and their physical applications. Unfortunately, he was far less effective a communicator of mathematics then he was an inventor (or discoverer) and quaternions were not adopted by the mathematical community.

So what is so nice about the complex number multiplication, what are the quaternions, and how well do the nice properties of the complex numbers generalize to them?

Geometry of complex multiplication

Multiplication of complex numbers is very rich geometrically. Probably the simplest observation we can make has to do with multiplication by \(i\). If \(z=a+ib\), so that \(z\) corresponds to the point \((a,b)\) in the plane, then \(iz=-b+ia\), which corresponds to the point \((-b,a)\): so multiplication by \(i\) has rotated this point by a right angle about the origin.

This is pretty, and suggests that a closer look is warranted, so let's take one.

The complex conjugate of \(z\) is \(\overline{z}=a-ib\), and the modulus of \(z\) is \[ |z| = \sqrt{z\overline{z}} = \sqrt{a^2+b^2}. \] Thinking of \(z\) as the point \((a,b)\), \(|z|\) is the distance to the origin, so it tells us the size of \(z\).

We also see that \(1/z=\overline{z}/|z|^2\) so that every complex number apart from zero has a multiplicative inverse. An excellent start. But complex multiplication has a more remarkable property still.

If \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} |zw|^2&=|(ac-bd)+i(ad+bc)|^2\\ &=(ac-bd)^2+(ad+bc)^2\\ &=(ac)^2-2abcd+(bd)^2+(ad)^2+2abcd+(bc)^2\\ &=a^2c^2+b^2d^2+a^2d^2+b^2c^2\\ &=(a^2+b^2)(c^2+d^2)\\ &=|z|^2|w|^2 \end{split} \] so that \(|zw|=|z||w|\), so the size of the product is the product of the sizes.

A special collection of complex numbers are the ones satisfying \(|z|=1\), so they form the unit circle in the complex plane. The point on the circle with polar coordinates \((1,\theta)\) has Cartesian coordinates \((\cos(\theta),\sin(\theta)\)), and expressed as a complex number is \(\cos(\theta)+i\sin(\theta)\).

We can do a little more with these special complex numbers.

If \(z\) and \(w\) both have modulus \(1\), then so do \(1/z=\overline{z}\) and \(zw\), so the complex numbers of unit modulus are closed under multiplication and multiplicative inverse: when a collection of quantities has this properties we can it a group, and this particular group goes by the name \(U(1)\).

We can put this together with in a very pretty way.

If \(z,w \in U(1)\) with \(z=\cos(\theta)+i\sin(\theta)\) and \(w= \cos(\phi)+i\sin(\phi)\), the \(zw=\cos(\theta+\phi)+i\sin(\theta+\phi)\). Depending on how you define things, this either follows from or gives a proof of the standard trigonometry addition formulae.

There's also a useful special case, already mentioned above. If \(|z|=1\), so \(z=\cos(\theta)+i\sin(\theta)\), then \(iz=-\sin(\theta)+i\cos(\theta)\), which associates with each point on the unit circle a unit vector pointing tangent to the circle in the anti-clockwise direction.

Three dimensional vectors

This is all so neat and elegant that a great deal of effort was invested into trying to do the same thing with three component vectors. However, it couldn't be done. There's no way of multiplying three dimensional vectors to give a three dimensional vector with all these nice properties.

All is not actually lost: it is possible to multiply three vectors in a useful way, but you have to be prepared to extend what you mean by multiplication. The result is geometric algebra, and it has many attractions. I'll come back to this later on, but for the moment let's just accept that you just can't define a nice multiplication on three dimensional vectors, and go on to see how you can do it with four dimensional vectors.

In fact we will see that just about everything described above for the complex numbers has an analogue in the quaternions. (Though sometimes in a surprising way.)


So, what is a quaternion?

One way of describing them is that a quaternion, \(q\), is a sum of the form \[ q=a+bi+cj+dk \] where \(i,j\) and \(k\) are a new kind of number, like the usual square root of minus one, but related in a particular way. We have \[ i^2=j^2=k^2 = ijk = -1 \] and then we carry out arithmetic by multiplying out the expressions and using these relationships. It's like complex arithmetic but with more component parts (and a more complicated set of rules for how these different square roots of minus one interact).

Just as with the complex numbers, we can think of these objects as vectors: but now they have four components, a real one and three imaginary ones.

We can also define a quaternionic conjugate, \[ \overline{q} = a-bi-cj-dk \] and then we quickly get \[ q\overline{q}=a^2+b^2+c^2+d^2=|q|^2 \] which gives the modulus, or size, of the quaterion, and \[ q^{-1} = \frac{\overline{q}}{|q|^2} \] is the inverse of \(q\).

There's a lot packed up into that.

The complex numbers are hiding in there

The quaternions contain the complex numbers. This is easy to see, because it's obvious that if you set \(c=0=d\) then what is left is just the arithmetic of complex numbers.

What's slightly less obvious is that the complex numbers sit in there in more than one way. If you set \(b=0=d\), you get exactly the same thing except that the square root of minus one is now \(j\) (which should please the electrical engineers), and if you set \(b=0=c\) it is now \(k\).

In fact, you can think of the quaternions as the result of building complex numbers out of complex numbers: if \(z\) and \(w\) are both complex numbers, than \(z+wj\) is a quaternion, as long as \(i,j\) and \(k\) satisfy the rules up above.

Vector algebra is hiding in there

The rules for multiplying \(i,j\) and \(k\) have some very evocative consequences.

If I multiply \(ijk=-1\) by \(i\) I get \(i^2jk=-i\), so that \(jk=i\), and similarly \(ij=k\), \(ki=j\).

If I square \(ij=k\) I get \(ijij=-1\), so multiplying on the left by \(i\) and on the right by \(j\) I get \(ji=-ij\), and similarly \(ik=-ki\), \(jk=-kj\).

The relations look alot like what happens with the usual unit vectors \(\pmb{i,j,k}\) pointing in the \(x,y,z\) directions in three dimensional vector geometry. The square of each unit vector is like the dot product (except with the wrong sign), and the product of two different ones is just like the cross product.

In fact, this is such an important observation that I'll actually think of \(i,j\) and \(k\) as those unit vectors, and regard the purely imaginary quaterions as vectors in three dimensions, with the rules I gave up above for multiplication.

So, what do I get if I have two purely imaginary quaternions (or two three dimensional vectors) and multiply them together using the quaternion rule? If you haven't seen this before, it really is quite amazing. So let's start with \(q_1=a_1 i+b_1 j+c_1 k\) and \(q_2=a_2 i+b_2 j+c_2 k\). Then \[ \begin{split} q_1 q_2 &= (a_1 i+b_1 j+c_1 k)(a_2 i+b_2 j+c_2 k)\\ &= a_1a_2i^2 + b_1b_2 j^2 + c_1c_2 k^2 + a_1b_2 ij +a_1c_2ik + b_1a_2ji + b_1c_2jk+ c_1a_2 ki + c_1 b_2 kj\\ &= -(a_1b_1+a_2b_2+a_3c_3) +(b_1c_2-c_cb_2)i + (c_1a_2-a_1c_2)j+(a_1b_2-b_1a_2)k \end{split} \] and if we think of \(q_1\) and \(q_2\) as vectors, then \[ q_1 q_2 = -q_1.q_2 + q_1 \times q_2 \]


I haven't mentioned this explicitly, but we do have to be a bit careful. There is a price to pay for extending the multiplication of two dimensional vectors to that of four dimensional vectors, and it is that the resulting multiplication is not commutative, so we have to pay attention to the order that they come in when multiplying: in general, \(q_1q_2 \neq q_2 q_1\).

So, for example, if we have two quaternions \(q\) and \(r\), and we want to solve the equation \[ qs=r \] for the unknown quaternion \(s\), we have to calculate \(q^{-1}r\), not \(r q^{-1}\) which is (usually) different.

But this is OK. It's no worse that the situation with matrices, and we all have to learn to cope with that. In fact, one can represent quaternions as \((4 \times 4)\) matrices with their entries made out of the real and imaginary parts so that the matrix multiplication matches the quaternion multiplication. An interesting challenge is to try to figure out how to do that, and the representation of the complex numbers as \((2 \times 2)\) matrices gives a good starting point.

What have the quaternions ever done for us?

OK, so we can invent a quaternion algebra, and it looks as if it has a fair bit of the usual vector algebra coded into it somehow. But what is it good for?

I think a piece of maths is worthwhile if it can be used to solve problems, or if it gives a new insight into something, or if it's just downright pretty. Quaternions meet all three criteria. In fact Hamilton spent a long time writing a multi-volume treatise on the quaternions and their applications. For good or ill, he didn't succeed in persuading the mainstream community of their utility, though they do have their proponents even now, and are heavily used in a few niche areas.

I am going to take a look at just two aspects of the quaternions. The first is an application, involving using them to represent rotations. The second is a bit of pure geometry, involving the sphere in four dimensions.


The unit complex numbers can be used to describe rotations in the plane, so you might expect, or at least hope, that unit quaternions (i.e quaternions of modulus \(1\)) can be used to describe rotations in space.

They can, and what's more it can be done in a very beautiful way.

You might expect (I certainly did) that it would work like this. Given a point in three dimensional space, with coordinates \((x,y,z)\), I think of this as the purely imaginary quaternion \(X=xi+yj+zk\), then if \(q\) is any quaternion with \(|q|=1\), we see that \(|qX|=|q||X|=|X|\), so this preserves the size of \(X\) and it looks quite promising. But it's quite hard to see just which rotation (if any) corresponds to a given \(q\),and whether this gives all rotations. Fortunately, there's a better way.

First, let's think what determines a rotation: we need an axis of rotation, and an angle of rotation. Let's say that the axis of rotation is given by the vector \(n=n_xi+n_yj+n_zk\) where \(|n|=1\), and the angle of rotation is \(\theta\). Then we build the quaternion \[ q(\theta) = \cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right) \] Now, here comes the sneaky bit.

Take any point \((x,y,z)\) and represent it by the purely imaginary quaternion \(X=xi+yj+zk\). Then \[ qX\overline{q} \] is the result of rotating \(X\) about \(n\) by the angle \(\theta\).

That is a bold claim. You need a reason to believe it. So write \[ X=\alpha n + \beta m \] where \(m\) is orthogonal to \(n\). (You'll notice that I've completely given up distinguishing between vectors and purely imaginary quaternions.) Then a couple of lines of algebra shows that \[ \begin{split} qX\overline{q} &= \left(\cos\left(\frac{\theta}{2}\right) + n \sin\left(\frac{\theta}{2}\right)\right) (\alpha n + \beta m) \left(\cos\left(\frac{\theta}{2}\right) - n \sin\left(\frac{\theta}{2}\right)\right)\\ &= \alpha n + \beta(m\cos(\theta)+(n \times m)\sin(\theta) \end{split} \] which makes it explicit that the component of \(X\) parallel to \(n\) is unchanged, and the component perpendicular to \(n\) is rotated by the angle \(\theta\) in the plane determined by \(m\) and \(n \times m\), which is the plane perpendicular to \(n\), so \(X\) has been rotated about the axis of rotation \(n\), just as claimed above.

This is unreasonably nice. Not only is there a quaternion corresponding to any rotation, but we have an extremely simple recipe for building it out of the axis and angle of rotation.

There is, of course, more. Not only is this a pretty way to represent rotations, it is a very useful one. If you want to control a robot arm, or model how a three dimensional object moves under the influence of a force, or calculate what the view looks like from a different position then you need an efficient way to represent rotations. Quaternions give us an efficient way to do this, and for this reason are used in robotics, dynamical simulations of systems of molecules and computer graphics.

On top of all this, you can use this quaternion representation to obtain more insight into the structure of \(SO(3)\), the group of three dimensional rotations. I won't develop that idea here, but just leave you with the observation that if \(q\) is a unit quaternion, then \(-q\) gives the same rotation as \(q\), so there's something interesting going on.

I've just given a very brief description of how to get rotations out of the quaternions. For much more, from a rather different perspective, @mathforge gives a development of the quaternions from rotations here, including an implementation.

The 3 sphere

I described how the unit complex numbers give us a way of finding a vector tangent to each point in a circle in the two dimensional plane. We can use the unit quaternions to see a nice bit of geometry in four dimensions. Now, the unit quaternions are defined by the squares of their components adding up to \(1\), so they are the points a distance \(1\) from the origin in four dimensional space. This is an object sometimes called a hypersphere, but which I will call the three-sphere, and denote by \(S^3\).

So if we take a point on \(S^3\), and multiply it by \(i\), what do we get? \[ \begin{split} X&=a+bi+cj+dk\\ iX&=-b+ai-dj+ck \end{split} \] so we see that \(iX.X=-ab+ba-cd+dc=0\). In other words, \(iX\) is orthogonal to \(X\). But that means that \(iX\) is a tangent vector to \(S^3\) at \(X\). (This is just the same as in two and three dimensions: the tangent space to a circle or a sphere at a point is given by all the vectors orthogonal to the position vector at that point.)

This means that by this process we can obtain a unit tangent vector at each point on \(S^3\), which clearly varies continuously over the sphere, since small changes in \(X\) give small changes in \(iX\).

We can now extract a rather indirect way of seeing why we can't have a multiplication on three dimensional vectors which has all the nice properties of complex and quaternionic multiplication. If there were, we could use this to build a vector field on the surface of a sphere in three dimensions: but the hairy ball theorem tells us that this is impossible, so there can't be a multiplication of three vectors which behaves like this.

You can start from here and start asking questions such as: Which surfaces admit a non-vanishing vector field? How many linearly independent vector fields can there be? What kind of things go wrong when there can't be one? This leads into worlds of differential and algebraic topology, which is maybe a step or several too far for this article. Instead, I'll return to a couple of topics raised up above.

But what about...

Geometric algebra

An alternative approach to multiplying vectors is given by geometric algebra. For three-dimensional vectors, this works in the following way. \[ i^2=j^2=k^2=1, \qquad ij=-ji \text{ etc} \] and \(ij,jk,ki,ijk\) are new objects. With this in place, we can multiply vectors together, at the expense of everything taking place in an eight-dimensional space.

On the one hand, this gives a powerful tool for calculating with vectors, as well demonstrated in Jason Merrill's blog post and it is well-behaved in that every vector has a multiplicative inverse. On the other hand, it is not completely well-behaved algebraically. If \(\pmb{n}\) is a unit vector, than \((1+\pmb{n})(1-\pmb{n})=1-1=0\), so non-zero quantities can multiply together to give zero.

Of course, neither geometric algebra nor quaternion algebra is right or wrong. Each gives a useful tool for working with vectors. I find the quaternions more aesthetically satisfying, but if I had to do a calculation I'd use whichever was the most convenient. (Assuming I could tell in advance, of course.)

Higher dimensions

So we've seen how to think of a pair of real numbers as a complex number; and how to think of a pair of complex numbers as a quaternion, though the result is no longer commutative. Can the trick be repeated? Can we take pairs of quaternions and get a multiplication on eight dimensional vectors?

Yes, we can, and the resulting objects are called octonions. This time the price we have to pay is that the multiplication is not associative. This is quite serious, as it means that we can't use matrices to represent these objects. Nevertheless, they give another interesting geometry which has been investigated both for its mathematical structure and in mathematical physics.

To finish with

I have to admit that hand calculation with quaternions can be a fairly tedious business. But you do get quite a lot of power in return, and I think they that we should all have at least a passing acquaintance with them.

There is, of course, a wealth of material available on the internet, and Google can be your friend if you ask it nicely. If you don't already know about it, the Internet Archive is an amazing resource where you can access many out of print texts, including Hamilton's books and many of his research papers. I'll just add one more link, to Pertti Lounesto's Clifford algebras and spinors. This is an exceptionally careful text, which provides a general framework where both quaternionic and geometric algebra live, and also demonstrates many applications to physics.

Tuesday, June 13, 2017

What is this thing called i?

Where did complex numbers come from?

You might think that people invented \(i\) just so that they had a solution to the quadratic equation \(x^2+1=0\). But no: it made perfect sense for a long time just to say that this quadratic (and any other quadratic with a negative discriminant) just doesn't have a solution. There was a much more compelling reason for pretending that there was a number whose square is \(-1\) than just making up solutions to quadratics that don't have them.

A few centuries ago, an algorithm was found to solve cubic equations. But something weird happened: on the way to finding a root of a cubic, sometimes you had to pretend that \(-1\) had a square root, and work with that for a while. It was all right, though, because this non-existent number always cancelled out, leaving behind a root that made sense. The fact that you can find genuine solutions to problems by using these funny objects as book-keeping devices suggested that maybe we should take the book-keeping devices seriously as a new kind of number. And once they were taken seriously, they turned out to be a door into a mathematical paradise of wonderful (and useful) structure.

It is a sad accident of history that this new kind of number, the square root of a negative quantity, has become known as an imaginary number. That gives the impression that they are somehow less real, somehow inferior to the ones that we call real numbers. In fact, they're no more imaginary than negative numbers, non-integer rational numbers, irrational algebraic numbers, or transcendental numbers. All of them are convenient extensions of the natural numbers with interesting and useful properties, and all of them are just as imaginary as the 'imaginary' numbers.

Of course, the imaginary numbers by themselves aren't very interesting or useful. They naturally cropped up in combination with the real numbers in objects of the form \(a+ib\) where \(a\) and \(b\) are real numbers and \(i^2=-1\), which we call complex numbers. \(a\) is called the real part of \(z\) and \(b\) is called the imaginary part. (Yes, the imaginary part is a real number. Deal with it.) If you forget for the moment that there is no such thing as \(i\), you can do algebra with them just by replacing \(i^2\) with \(-1\) whenever it occurs, and that is what was done to solve cubic equations.

This suggests that it's worth taking these complex numbers seriously as objects in their own right.

But what are they?

The simplest answer is basically what I just wrote above. Think of complex numbers as polynomials in \(i\) when you do algebra, and replace \(i^2\) by \(-1\) whenever it occurs. Then you can add and multiply complex numbers without any more difficulty than you have in dealing with polynomials.

If you have two complex numbers \(z=a+ib\) and \(w=c+id\), then \[ \begin{split} z+w&=(a+ib)+(c+id)\\ &=a+c+ib+id\\ &=(a+c)+i(b+d) \end{split} \] and \[ \begin{split} zw&=(a+ib)(c+id)\\ &=ac+ibc+aid+ibid\\ &= ac+i^2bd+bci+adi=(ac-bd)+(bc+ad)i \end{split} \]

And now the first bit of magic happens. Given \(z=a+ib\) we define a new thing called the complex conjugate of \(z\), which is \(\overline{z}=a-ib\). Then multiplying out the brackets gives \[ z\overline{z} = a^2+b^2 \] which is never negative (since \(a\) and \(b\) are both real numbers) and is only zero when \(z=0\). This quantity is so useful it gets a name: \(z\overline{z}=|z|^2\), and we call \(|z|\) the modulus of \(z\). Then we see that \[ z \times \frac{\overline{z}}{|z|^2} = 1 \] which means that we can divide one complex number by another: \[ \frac{w}{z} = w \times \frac{\overline{z}}{|z|^2} \] And, just as with real numbers, this works whenever \(z \neq 0\). Because of the way we've done this, the complex numbers have exactly the same algebraic properties as the real numbers, so all the calculations we could do before, we can still do: but now we can do more.

But what are they really?

So far, this is just a formal game. We pretend that there is a number whose square is \(-1\), and find that we can treat it just like an ordinary number and still do algebra. Amazingly, all seems to work. But it doesn't really tell us what this thing, this complex number, actually is.

One thing we can do is notice that a complex number is built out of two real numbers. We could think of those numbers as the coordinates of a point in the \((x,y)\)-plane, or as the components of the position vector of that point.

Then if we think of \(z\) as \((a,b)\) and \(w\) as \((c,d)\) then we have what we always had, a way of adding vectors: \[ z+w=(a+b,c+d) \] but on top of that, we have a way of multiplying vectors, \[ zw=(ac-bd,ad+bc) \] and the multiplication is algebraically well-behaved.

Looking at it in more detail, we see some more magic.

\(z=(a,b)\) gives us a picture of \(z\) in terms of Cartesian coordinates. But we could just as well use polar coordinates \((r,\theta)\) to represent this point, where \(r\) is the distance from \(z\) to the origin, and \(\theta\) is the angle between the line joining the origin to \(z\) and the \(x\)-axis. \(r\) is just \(|z|\), the modulus of \(z\), and \(\theta\) is called the argument of \(z\).

If \(z\) has modulus \(r\) and argument \(\theta\), while \(w\) has modulus \(\rho\) and argument \(\phi\), it turns out that \(zw\) has modulus \(r\rho\) and argument \(\theta+\phi\).

The multiplication isn't just algebraically well-behaved, it has a nice geometric interpretation.

So we can think of complex numbers as vectors in the plane with a well-behaved multiplication on top of the addition which we can always do for vectors.

But what are they really?

Once we start thinking about vectors, we notice that the real and imaginary parts of \(zw\) look like inner products. With a bit of fiddling about, we finally notice that we can represent \(z\) by a \(2\times 2\) matrix, \[ z=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] \] and the matrix addition and multiplication correspond to adding and multplying complex numbers: \[ z+w=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] +\left[ \begin{array}{cc}c&d\\-d&c \end{array} \right] =\left[ \begin{array}{cc}a+c&b+d\\-(b+d)&a+c \end{array} \right] \] and \[ zw=\left[ \begin{array}{cc}a&b\\-b&a \end{array} \right] \left[ \begin{array}{cc}c&d\\-d&c \end{array} \right] =\left[ \begin{array}{cc}ac-bd&ac+bd\\-(ac+bd)&ac-bd \end{array} \right] \] Just to check: \[ i^2 = \left[ \begin{array}{cc}0&1\\-1&0 \end{array} \right]^2 = -\left[ \begin{array}{cc}1&0\\0&1 \end{array} \right] = -1 \] So complex numbers are just a particular kind of \(2 \times 2\) matrices, made entirely out of real numbers.

This means that we can get all the algebraic properties of complex numbers without inventing any new, imaginary, numbers, as long as we're willing to thing of the real number \(x\) as the matrix \[ \left[ \begin{array}{cc}x&0\\0&x \end{array} \right]. \]

You may feel that the introduction of matrices is a high price to pay to get rid of non-existent numbers. There is another, more algebraic approach.

But what are they, really?

We can also build complex numbers in a familiar kind of way: it's very similar to congruence arithmetic, sometimes called clock arithmetic. All this means is that we choose an integer \(n \gt 1\), and whenever we do sums, we only keep the remainder when we divide by \(n\). A useful we to think of this is to think of each number as a multiple of \(n\) plus a remainder, then treat \(n\) as \(0\). We call this the arithmetic of \(\mathbb{Z}_n\).

For example, if \(n=3\) the possible remainders are \(0,1,2\),and we denote this set by \(\mathbb{Z}_3\). Then in \(\mathbb{Z}_3\), we have \[1+2=3 = 1 \times 3+0 = 0, \] \[ 2+2=4=1\times3+1= 1, \] and \[ 2 \times 2=4 = 1 \times 3 +1 =1. \]

Now we run very far and fast with this ball.

We start with the set of all polynomials in \(X\) with real coefficients, but we only ever keep the remainder when we divide by \(1+X^2\).

This is a little more abstract looking, but given any polynomial \(P(X)\), we can write it as a multiple of \(X^2+1\) and a remainder. Keeping only the remainder has the same effect as treating \(X^2+1\) as \(0\); in other words, as treating \(X^2\) as \(-1\).

Actually, this is almost back to the beginning, where I said you can just think of a complex number as a polynomial in \(i\) where \(i^2=-1\); the only difference is that now we don't have to pretend that anything squared is \(-1\), we can just work with remainders like in congruence arithmetic.

Again, we haven't introduced any new objects, any imaginary numbers with negative square roots. We've taken mathematical objects we're already familiar with, and seen how to get out of them something which behaves in exactly the same way as these so-called complex numbers.

They are all of the above

There isn't really an answer to the question what are they, really?. They're all of the above. Mathematically, they are all exactly the same, in the sense that you can think of each of them as just an alternative way to write the others. But none of those representations is the 'right' one. They all give exactly the same arithmetic of complex numbers.

So we don't have to worry about 'making up imaginary numbers', and worrying that what we do doesn't make sense because they aren't 'real'. They're just as 'real' (in that they make mathematical sense) as a number whose square is \(2\). (There was some resistance to accepting \(\sqrt{2}\) as a number initially.)

And we also get some accidental benefits. We have different ways of thinking about complex numbers, which means that if we want to do something involving them we can use whichever way of thinking about them is most convenient.

If, that is, we want to do something involving them. Since we aren't sixteenth century mathematicians trying to persuade a nobleman to be our patron by being better at solving cubics than the competition, there remains the question of why we should care.

So what was the point of all that, then?

This is where it actually gets more interesting. The different representations make it natural to ask different questions, which means that we have more avenues for exploration than just having one.


Complex numbers were introduced because they cropped up naturally in the solution of cubic equations. What might we need to introduce to solve higher order polynomial equations? What if we let the coefficients of our polynomials be complex numbers themselves?

The astonishing answer is: nothing. Once you've allowed yourself complex numbers it turns out that you don't need anything else for higher order polynomials, or even polynomials with complex coefficients. The fundamental theorem of algebra tells us that any polynomial of degree \(n\) with complex coefficients (and that includes real ones, since a real number is a complex number with an imaginary part of zero) has \(n\) complex roots.

That's a beautiful mathematical result.

If beauty doesn't motivate you sufficiently, then take solace in the fact that the properties of some differential equations which arise in physics and engineering are determined by polynomials you can build out of them, and the systems modelled by these differential equations are stable if the real part of all the roots are negative. Google for control theory to see lots more.


So we can multiply two dimensional vectors together...what about three dimensional vectors? It turns out that there is no way to do this that is well behaved. But you can do it with four dimensional vectors, if you're willing to lose the commutative law. And with eight dimensional ones, but you lose the associative law. Lots of fun mathematics to play with.

But again, if you want something more useful, the complex numbers can be thought of as determining a magnitude and an angle, or a phase. This turns out to be a useful way of thinking about alternating current, and you can analyse AC circuits using complex arithmetic in much the same way as you use real algebra to analyse DC circuits.

More Algebraic

Can I try the same trick with a different polynomial from \(1+X^2\)?

Can you ever. This procedure is very similar to working in congruence arithmetic. when you work with the remainders on dividing by any \(n\), you always get well-behaved addition, subtraction and multiplication. but division only works if you can't factorize \(n\). In just the same way, with polynomials you always get well-behaved addition, subtraction and multiplication, and if the polynomial you divide by cannot be factorized, you also get division.

This turns out to useful in (for example) the construction of error-correcting codes which have brought us back beautiful pictures from space probes, and are responsible for the robustness of CDs and digital television signals.

So the algebra of complex numbers not only makes sense, it turns out to be useful in a variety of ways.

The miracle

All the above is fascinating stuff, but it misses out a big area of mathematics, namely calculus. Given two complex numbers, \(z\) and \(w\), the distance between them in the plane is just \(|z-w|\): and since we have a notion of what it means for two complex numbers to be close to one another, we can think about what it means for a function to be continuous, or differentiable, and so on.

This leads to some entirely unexpected consequences.

It turns out that if a complex function can be differentiated once, it can be differentiated infinitely often, so we don't get monsters like the real functions which are differentiable just once. It turns out that not only can the function be differentiated infinitely often, it is actually equal to its Taylor series (as long as you don't go too far away from the point where you calculated the derivatives). It turns out that differentiable functions give mappings of the plane that preserve angles, and this can be used to analyse fluid flow and electric fields. It turns out that you can do a kind of integration which lets you compute things that are much harder to do without the complex numbers. It even turns out that they can tell you a lot about the behaviour of the normal integers, which is where the whole thing started.

And all because people in the middle ages had competitions to solve cubic equations.


This whistle stop tour of complex numbers was partly motivated by a Twitter discussion with @SallyBiskupic, @nomad_penguin and @DavidKButlerUoA.

Thursday, June 8, 2017

0.999...: is nonstandard analysis the answer?

I previously blogged here about the perennial discussion on \(0.\dot{9}=1\), carefully avoiding nonstandard analysis. I also blogged here about nonstandard analysis, being equally careful not to mention \(0.\dot{9}\). Some consequent conversations with have made it evident that a look at the interaction of these two might be of interest.

\(0.\dot{9}\) meets nonstandard analysis

It is tempting (so tempting) to think that although \(0.\dot{9}=1\) in normal analysis, things will be different in nonstandard analysis. After all, there are infinitesimals there, so presumably that's a place where it makes sense for \(0.\dot{9}\) to be infinitesimally less than \(1\). But of course, just as in the real analysis case, we have to decide what we mean by \(0.\dot{9}\).

Well, whatever it means in nonstandard analysis, it has to be a natural extension of what it means in standard analysis, where \[ 0.\dot{9}= \lim_{N\to \infty} \sum_{n=1}^N \frac{9}{10^n}. \]

There are different ways of trying to transport this into nonstandard analysis, with different consequences. Here are the ones that come to my mind.

\(0.999\ldots\) with is infinitesimally less than \(1\)

First, we might try to interpret that string of digits in a plausible way, forgetting about the 'infinite sum as a limit' semantics.

Then one might try to interpret \(0.\dot{9}\) via the definition of a nonstandard real as a sequence of real numbers, and a fairly natural choice of sequence is \(0.9,0.99,0.999\ldots\), i.e. the sequence of partial sums for the limit. Clearly, since every term in the sequence is less than \(1\), the corresponding nonstandard real is less than \(1\), and since eventually every term in the sequence is bigger than any real number less than \(1\), the corresponding nonstandard real must be less than \(1\) by an infinitesimal quantity, say \(\epsilon\); so the sequence defines a nonstandard real, \(1-\epsilon\). In fact, this \(\epsilon\) is the infinitesimal defined by the sequence \(1/10,1/10^2,1/10^3,\ldots\).

But there's a problem with this. Why choose this particular sequence? We could equally have used \(0.99,0.999,0.999,\ldots\) which also defines a nonstandard real infinitesimally less than \(1\), but this time the result is \(1-\epsilon/10\); or \(0.99,0.9999,0.9999999\ldots\), which gives \(1-\epsilon^2\). There are a great many possible choices, all giving different limits infinitesimally less than \(1\).

This is not good: if \(0.\dot{9}\) is to be infinitesimally less than \(1\), it should be a well-defined nonstandard real number, and the infinitesimal should be unambiguously defined. This is a fail on both criteria. We need to try something else.

\(0.9\ldots9\) with an infinite number of \(9\)'s is infinitesimally less than 1

Let \(N\) be a hyperfinite natural number (i.e. a natural number in the nonstandard reals that is greater than any \(n \in \mathbb{N}\)). Then the transfer principle tells us that just as for finite \(N\), \[ \sum_{n=1}^N \frac{9}{10^n} = 1- \frac{1}{10^N} \] and \(1/10^N\) is, of course, an infinitesimal.

This looks a bit like the previous answer: the different choices of \(N\) correspond to different sequences of terms of the form \(0.9...9\), and give different infinitesimals. But it does have some more information. A bigger \(N\) corresponds to a sequence of natural numbers that grows faster, and so a sequence that converges to \(1\) faster: and this is what it means for the infinitesimal difference to be smaller. Nevertheless, we still don't obtain a well-defined infinitesimal that is the difference between \(0.\dot{9}\) and \(1\).

In fact, it's not just a bit like the previous answer: it is the previous answer, but expressed in a different language. Let's take a closer look.

If \(N\) is a hyperfinite natural number, then it corresponds to a divergent sequence \(n_1,n_2,n_3\ldots\) of natural numbers. This defines a new sequence \(p_1,p_2,p_3,\ldots\) where \(p_i\) is \(0.9\ldots 9\) with \(n_i\) nines. This sequence of real numbers, \(p_i\), determines the nonstandard real \(\sum_{n=1}^N 9/10^n\), and the sequence \(1/10^{n_i}\) determines the infinitesimal difference \(1/10^N\) between this number and \(1\).

So each of these approaches gives us something infinitesimally less than \(1\) but is really satisfactory.

In fact, we get a more satisfactory situation by saying that

\(0.999\ldots = 1\) in nonstandard analysis

Perhaps the most mathematically natural interpretation of \(0.\dot{9}\) is by simply to use the definition above: \[ 0.\dot{9}= \lim_{N\to \infty} \sum_{n=1}^N \frac{9}{10^n} \] but remembering that now \(N \in {}^*\mathbb{N}\), so that \(N\) can be larger than any finite integer.

We also have to remember what this means: given any \(\epsilon \gt 0\), there is some \(K\) with the property that \(\sum_{n=1}^N 9/10^n\) is within \(\epsilon\) of the limit whenever \(N \geq K\). If \(\epsilon\) is infinitesimal, then \(K\) is hyperfinite.

But then the transfer principle guarantees that the standard analysis argument still works, and the limit is \(1\), just as in standard analysis. This is clear and unambiguous, if a little disappointing.

There is another way of thinking about it that I'm going to consider, though, and it's the strangest one, and the hardest to understand (which is partly why I've kept it till last).

\(0.999\ldots\) isn't even a thing in nonstandard analysis.

We might think of \(0.\dot{9}\) in a slightly different way: \[ 0.\dot{9} = \sum_{n \in \mathbb{N}} \frac{9}{10^n} \] forgetting (for the moment) that this is really just an evocative shorthand for \[ \lim_{N \to \infty} \sum_{n=1}^N\frac{9}{10^n} \] which is itself just a notation for 'the thing I can get arbitrarily close to by making \(N\) sufficiently large'.

Let's take the notation seriously, and think of it as representing the sum of a collection of numbers indexed by \(\mathbb{N}\).

There is something very strange here. It seems that summing over a finite number of terms gives an answer less than \(1\), summing over all natural numbers gives exactly \(1\), but then summing a hyperfinite number of terms gives an answer that is less than \(1\). How can adding more terms make the answer smaller again, even by an infinitesimal amount?

This is where the waters become relatively murky.

When we talk about a property of (a set of) real numbers in nonstandard analysis, what we're really doing is talking about a property of the representatives of (the set of) these real numbers in the nonstandard reals. Now, a single real number does have a natural nonstandard real that it corresponds to, and it's the obvious one. This works for any finite set of real numbers. But the representative of an infinite set of real numbers always acquires new elements. So the natural representative of the set of all natural numbers, \(\mathbb{N}\), is \({}^*\mathbb{N}\), which includes all the hyperfinite natural numbers too.

In fact there is no way to specify the set of finite natural numbers in \({}^*\mathbb{N}\) in the language of nonstandard analysis. It simply isn't a subset of \({}^*\mathbb{N}\), just as there is no set of all infinitesimals, or of all hyperfinite natural numbers.

Actually, simply is something of a lie here: Sylvia Wenmackers (@SylviaFysica) called me on this, so here's my attempt to explain what I mean by it.

Again, we have a rather subtle idea to struggle with. There are collections which we can describe in the language of nonstandard analysis, which we call internal sets. These are the meaningful sets, since they are the only ones we can talk about when we are working within nonstandard analysis. The other sets, the ones which require the use of descriptions such as infinite or infinitesimal, are called external sets. We can only talk about them when we talk about nonstandard analysis, looking at it from outside. They don't correspond to anything meaningful inside nonstandard analysis. Depending on the way that you like to do your nonstandard analysis, you can think of internal sets (part of the system) and external sets (not part of the system); or you can think of sets and descriptions which don't describe sets at all.

The situation is a little like that of trying to talk about the collection of all sets in set theory. If you try to make that collection a set, then you get into all kinds of problems, so you are forced into the position that some collections are sets, and have all the properties we require of sets but others, which are called proper classes are not, and do not.

This is hard to come to terms with, but we really don't have any option. It's the price we have to pay for having access to infinitesimal and infinite quantities in our number system, if we want to retain the properties that make things work well, such as every set of natural numbers having a smallest member (there is no smallest hyperfinite natural number), or every set with an upper bound having a least upper bound (there is no least upper bound for the infinitesimals).

To summarize: the only objects which exist meaningfully in nonstandard analysis are the ones which can be described using the language we already have for real analysis: and that doesn't allow us to talk about infinitesimal or infinite quantities. If we work purely inside the nonstandard reals, we can't tell that some are infinitesimal and some are infinite, because we can't pick out the standard reals to compare them with. It's only when we look at the system from outside that we can tell that some nonstandard reals are smaller than all positive reals, and some nonstandard natural numbers are larger than all standard natural numbers.

So the answer to the question up above is likely to feel unsatisfactory: it is that that isn't actually a question, because it refers to things that don't make any sense in the context of nonstandard analysis.

So where does that leave us?

It leaves us with the conclusion that the situation for \(0.999\ldots\) is exactly the same in nonstandard analysis as it is in standard analysis. Either the string of \(9\)'s terminates, in which case the number is less than \(1\); or it does not terminate, in which case it represents a limit, and the corresponding number is exactly \(1\).

What we gain in the nonstandard interpretation is that the string might terminate after a hyperfinite number of terms, in which case the result is less than \(1\) by an infinitesimal. In this case we can think of the hyperfinite number of terms as specifying a particular rate at which the number of \(9\)'s increases, and the infinitesimal as telling at what rate the sequence of numbers of the form \(0.9\ldots9\) approaches \(1\): the hyperfinite and infinitesimal quantities have an interpretation in terms of the standard reals in which they tell us about rates of convergence.

It has been argued that one reason for the widespread conviction that \(0.\dot{9} \lt 1\) is that people are somehow (implicitly, inchoately, even unwittingly) thinking of \(0.\dot{9}\) as a terminating infinite, i.e. hyperfinite, sum, requiring the addition of an infinitesimal \(0.0\ldots01\) term (with a terminating infinite, i.e. hyperfinite, number of zeros) to give \(1\). I'm not entirely convinced by this: it seems more to me that they're taking the 'obvious' inductive limit of the behaviour of a finite terminating decimal expansion, and that this interpretation is more eisegesis than exegesis of the phenomenon. But I have encountered people trying to argue that the difference is a decimal expansion with infinitely many \(0\)'s followed by a \(1\), so the idea can't be entirely discounted.

After all that, is nonstandard analysis the answer?

No. nonstandard analysis isn't the answer. But it does give us a framework for thinking about the question in a fruitful way, and for asking (and answering) some interesting questions. Maybe that's even better.