Tensors in Machine Learning for Software Engineers

In the context of machine learning, a tensor is a multidimensional array. That's it! There's nothing to it!

PyTorch and TensorFlow agree, and even academic papers are OK with this:

Tensors (of order higher than two) are arrays indexed by three or more indices, say \((i, j, k, \cdots)\) — a generalization of matrices, which are indexed by two indices, say \((r, c)\) for (row, column).

Sidiropoulos, et al. "Tensor Decomposition for Signal Processing and Machine Learning"

But, of course, this is not the impression you'd get from googling "tensor":

A screenshot of a Google search result for the keyword tensor, showing a lengthy mathematical definition, various images with many symbols, and a bunch of People Also Ask items ending with 'What the heck is tensor?'

This is because the concept of a tensor was first introduced in differential geometry, a subject on which mathematicians and physicists have strong, divergent viewpoints. As a result, the two most dominant uses of the term "tensor" have different meanings, both of them with elaborate, well-developed theoretical machinery. Machine learning, for the most part, does not use much of this machinery, so what we mean by a tensor in machine learning is also different from the math tensors and the physics tensors.

How are we to understand tensors, then? Seriously:

A screenshot of the 'What the heck is tensor?' item from the previous Google search screenshot

Let's try to approach the subject from the data-structures angle. Recall that defining different operations on the same "shape" of data can yield distinct data structures: think about stacks, queues, arrays, and linked lists, for example. From this perspective, a tensor is a multidimensional array with standard operations defined on it.
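To make the analogy concrete, here is a small illustrative sketch (using PyTorch, though any array library would do; the variable names are just for illustration): the same nested collection of numbers supports very different operations depending on which data structure we wrap it in.

```python
import torch

# The same "shape" of data yields different data structures depending on
# which operations we define on it (an illustrative sketch, not a definition).
data = [[1.0, 2.0], [3.0, 4.0]]

# As a nested Python list, the natural operations are list operations:
data.append([5.0, 6.0])     # grow it, index it, iterate over it, ...

# As a tensor, the natural operations are whole-array numerical operations:
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(t.T)                  # transpose
print(t.sum())              # reduce over all entries
print(t * 2.0)              # elementwise scaling
```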

In machine learning, some of these operations behave differently depending on what the underlying array looks like. The prime example is the matrix multiplication operation on tensors (PyTorch / TensorFlow), which computes a dot product of vectors, the usual matrix multiplication from linear algebra, or a batched matrix multiplication, depending on the shapes of its inputs.
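For instance, here is how this looks with PyTorch's torch.matmul (a small sketch; TensorFlow's tf.matmul is similar for the 2-D and batched cases, though it requires inputs of rank two or higher):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(torch.matmul(a, b))            # 1-D x 1-D -> dot product: tensor(32.)

A = torch.ones(2, 3)
B = torch.ones(3, 4)
print(torch.matmul(A, B).shape)      # 2-D x 2-D -> matrix product: (2, 4)

X = torch.ones(10, 2, 3)
Y = torch.ones(10, 3, 4)
print(torch.matmul(X, Y).shape)      # 3-D x 3-D -> batched matrix product: (10, 2, 4)
```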

Such a piecemeal way of defining the computational logic of an operation is common in software engineering, but it limits what we can say about the operation. It is hard to reason about, and hard to say anything broadly useful about, an operation that behaves differently across many different cases.

To ensure consistent behaviors, mathematics considers a more limited set of operations on multidimensional arrays, ending up with a different notion of tensors. For example, it does not make sense in mathematics to talk about the "tensor product" of two arbitrary multidimensional arrays. We consider only tensor products of basis vectors and talk instead about how we can represent arbitrary multidimensional arrays as sums of simpler elements, i.e., the tensor products of basis vectors. This approach allows us to say quite a bit about tensors and their behaviors without knowing much about the specifics, in the form of multilinear algebra. Tensors in theoretical machine learning research often refer to the math tensors: see, for example, "Tensor Methods in Machine Learning" or "Tensor Decomposition for Signal Processing and Machine Learning".
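As a concrete illustration in the order-two (matrix) case, with the standard basis vectors \(e_1, e_2\) of \(\mathbb{R}^2\), every \(2 \times 2\) matrix is a sum of tensor products of basis vectors, weighted by its entries:

\[
M = \sum_{i,j} M_{ij}\, e_i \otimes e_j,
\qquad\text{e.g.}\qquad
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
= 1\, e_1 \otimes e_1 + 2\, e_1 \otimes e_2 + 3\, e_2 \otimes e_1 + 4\, e_2 \otimes e_2 .
\]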

In both machine learning and mathematics, we assume that the underlying multidimensional array of a tensor is immutable. If we wanted to say something about a tensor whose underlying multidimensional array changes its values over time, we would define a tensor-valued function whose input is a time and whose output is a tensor. We could, of course, define a separate data structure that collects tensor-valued functions of a certain kind and define operations on it, but the resulting structure would be distinct from what we consider a tensor in machine learning or mathematics.
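As a minimal sketch of this idea (the function rotation_at and its particular form are purely illustrative), a tensor-valued function takes a time and returns a tensor; the "change" lives in the function, not in any single tensor:

```python
import torch

def rotation_at(t: float) -> torch.Tensor:
    # A tensor-valued function of time: each call returns a fresh 2x2 tensor;
    # no single tensor's underlying array is mutated over time.
    theta = torch.tensor(t)
    return torch.stack([
        torch.stack([torch.cos(theta), -torch.sin(theta)]),
        torch.stack([torch.sin(theta),  torch.cos(theta)]),
    ])

print(rotation_at(0.0))     # the identity matrix at t = 0
print(rotation_at(1.57))    # approximately a quarter-turn rotation at t ≈ π/2
```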

But, in physics, things always change. (There wouldn't be much point in studying physics if nothing ever changed, would there?) And so, physics only considers multidimensional arrays that keep changing, and the operations that make sense over such things. As a result, tensors in physics are what we'd call tensor fields in mathematics, which are tensor-valued functions with appropriate restrictions so that we can say interesting things about them. This leads, for example, to the theory of general relativity.

In conclusion, a tensor is (really!) a multidimensional array. But, as is the case with any other data structure, what matters is the operations we define on it.