Schröder’s Equation

Introduction

You probably know that, for natural number exponents anyways, exponentiation can be treated as repeated multiplication.

\displaystyle b^n:=\underbrace{bb\cdots b}_n

More formally, we define exponentiation recursively by

b^0:=1

and

b^{n+1}:=bb^n

for n\in\mathbb{N}. You probably also know, however, that exponentiation isn’t solely defined for natural number exponents. Raising a number to the power of -1, for example, gives us its multiplicative inverse (if it exists), and b^{1/2} is just the square root of b. We can even raise complex numbers to complex numbers (e.g., \mathrm{i}^{\mathrm{i}}=\mathrm{e}^{-\pi/2}\approx0.207879\ldots) (*).

When I first learned this, I began to wonder if repeated exponentiation, or tetration, could be extended to non-integer heights. In other words, I wondered if it was possible to make sense of \underbrace{b^{b^{\cdot^{\cdot^{\cdot^{b}}}}}}_t for t\notin\mathbb{N} in the same way we can make sense of b^t=\underbrace{bb\cdots b}_t for t\notin\mathbb{N}. By the way, we denote \underbrace{b^{b^{\cdot^{\cdot^{\cdot^{b}}}}}}_t with ^{t}b or \mathrm{tet}_b(t).

My first thought was to find any nice properties that tetration satisfies so that I could use those to at least extend tetration to real heights. However, because exponentiation isn’t associative, properties like {\left(b^x\right)}^y=b^{xy} don’t carry over to tetration (so {\phantom{\left({}^xb\right)}}^y{\left({}^xb\right)}\neq{}^{xy}b in most cases). In fact, the only real nice property of tetration comes from its recursive definition: {}^{x+1}b=b^{{}^xb}. This meant I’d have to be more clever to find the most ‘natural’ extension of the tetrational to non-integer heights. Eventually, I gave up on the problem and moved on to other things.

Around six years later, however, I learned that there is, in fact, a way to systematically find non-integer iterates of certain functions, including exponential functions. This makes evaluating, say, {\phantom{\left(\sqrt{2}\right)}}^{1/2}{\left(\sqrt{2}\right)} or ^{\mathrm{i}}\mathrm{i} possible. Not only that, but the way in which we can find non-integer iterates of functions is strikingly similar to one of the more well-known methods of finding non-integer powers of certain square matrices.

Matrix Diagonalization

As a refresher, linear transformations are functions between vector spaces which preserve addition and scalar multiplication. Specifically, if V and V^{\prime} are vector spaces over a field F and T:V\rightarrow V^{\prime} is a linear transformation between V and V^{\prime}, then for all vectors \mathbf{u},\mathbf{v}\in V and scalars c\in F,

T(\mathbf{u}+\mathbf{v})=T(\mathbf{u})+T(\mathbf{v})

and

T(c\mathbf{v})=cT(\mathbf{v}).

Put differently,

\displaystyle T{\left(\sum_{k=1}^nc_k\mathbf{v}_k\right)}=\sum_{k=1}^nc_kT(\mathbf{v}_k)

for any c_1,c_2,\ldots,c_n\in F and \mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\in V.

Every finitely-dimensional vector space V has a basis B, a subset of V which satisfies two properties:

  • B is linearly independent, i.e., there does not exist a vector in B that is a linear combination of other vectors in B.
  • B spans V, i.e., every vector in V is a linear combination of some vectors in B.

This allows us to represent vectors as columns of scalars. If V is a vector space over the field F, \mathbf{v} is a vector in V, B=\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n\} is an ordered basis of V, and

\displaystyle \mathbf{v}=\sum_{j=1}^nc_j\mathbf{x}_j=c_1\mathbf{x}_1+c_2\mathbf{x}_2+\cdots+c_n\mathbf{x}_n

for some c_1,c_2,\ldots,c_n\in F, then we may represent \mathbf{v} by the coordinate vector

\displaystyle [\mathbf{v}]_B=\begin{bmatrix}c_1\\c_2\\c_3\\\vdots\\c_n\end{bmatrix}.

c_1,c_2,\ldots,c_n, by the way, are known as the coordinates of \mathbf{v}.

Furthermore, if V^{\prime} is another vector space over F, B^{\prime}=\{\mathbf{x}_1^{\prime},\mathbf{x}_2^{\prime},\ldots,\mathbf{x}_m^{\prime}\} is an ordered basis of V^{\prime}, and T:V\rightarrow V^{\prime} is a linear transformation, then for each j=1,2,\ldots,n, T(\mathbf{x}_j)\in V^{\prime} can be written as a linear combination of vectors in B^{\prime}. In other words,

\displaystyle T(\mathbf{x}_j)=\sum_{i=1}^ma_{i,j}\mathbf{x}_i^{\prime}=a_{1,j}\mathbf{x}_1^{\prime}+a_{2,j}\mathbf{x}_2^{\prime}+\cdots+a_{m,j}\mathbf{x}_m^{\prime}

for some a_{1,j},a_{2,j},\ldots,a_{m,j}\in F. Hence,

\displaystyle T(\mathbf{v})=T{\left(\sum_{j=1}^nc_j\mathbf{x}_j\right)}=\sum_{j=1}^nc_jT(\mathbf{x}_j)=\sum_{j=1}^nc_j\sum_{i=1}^ma_{i,j}\mathbf{x}_i^{\prime}=\sum_{j=1}^n\sum_{i=1}^ma_{i,j}c_j\mathbf{x}_i^{\prime}=\sum_{i=1}^m{\left(\sum_{j=1}^na_{i,j}c_j\right)}\mathbf{x}_i^{\prime}.

This means that, as long as we know the coordinates of \mathbf{v}, all we need in order to find the coordinates of T(\mathbf{v}) are the coordinates of T(\mathbf{x}_j) for j=1,2,\ldots,n. We may write these coordinates in an m-by-n rectangular array known as a matrix.

\displaystyle [T]_{B^{\prime},B}=\begin{bmatrix}a_{1,1}&a_{1,2}&a_{1,3}&\cdots&a_{1,n}\\a_{2,1}&a_{2,2}&a_{2,3}&\cdots&a_{2,n}\\a_{3,1}&a_{3,2}&a_{3,3}&\cdots&a_{3,n}\\\vdots&\vdots&\vdots&\ddots&\vdots\\a_{m,1}&a_{m,2}&a_{m,3}&\cdots&a_{m,n}\end{bmatrix}

Notice how for each j=1,2,\ldots,n, the j-th column of the matrix above is just

\displaystyle [T(\mathbf{x}_j)]_{B^{\prime}}=\begin{bmatrix}a_{1,j}\\a_{2,j}\\a_{3,j}\\\vdots\\a_{m,j}\end{bmatrix}.

Because we want [T]_{B^{\prime},B} to behave like T, we define

\begin{bmatrix}a_{1,1}&a_{1,2}&a_{1,3}&\cdots&a_{1,n}\\a_{2,1}&a_{2,2}&a_{2,3}&\cdots&a_{2,n}\\a_{3,1}&a_{3,2}&a_{3,3}&\cdots&a_{3,n}\\\vdots&\vdots&\vdots&\ddots&\vdots\\a_{m,1}&a_{m,2}&a_{m,3}&\cdots&a_{m,n}\end{bmatrix}\begin{bmatrix}c_1\\c_2\\c_3\\\vdots\\c_n\end{bmatrix}=[T]_{B^{\prime},B}[\mathbf{v}]_B:=[T(\mathbf{v})]_{B^{\prime}}=\begin{bmatrix}\sum_{j=1}^na_{1,j}c_j\\\sum_{j=1}^na_{2,j}c_j\\\sum_{j=1}^na_{3,j}c_j\\\vdots\\\sum_{j=1}^na_{m,j}c_j\end{bmatrix}.

Furthermore, if V^{\prime\prime} is yet another vector space over F, B^{\prime\prime} is a basis of V^{\prime\prime}, and S:V^{\prime}\rightarrow V^{\prime\prime} is a linear transformation, then, letting

\displaystyle [S]_{B^{\prime\prime},B^{\prime}}=\begin{bmatrix}b_{1,1}&b_{1,2}&b_{1,3}&\cdots&b_{1,m}\\b_{2,1}&b_{2,2}&b_{2,3}&\cdots&b_{2,m}\\b_{3,1}&b_{3,2}&b_{3,3}&\cdots&b_{3,m}\\\vdots&\vdots&\vdots&\ddots&\vdots\\b_{\ell,1}&b_{\ell,2}&b_{\ell,3}&\cdots&b_{\ell,m}\end{bmatrix},

we define the matrix product of [S]_{B^{\prime\prime},B^{\prime}} and [T]_{B^{\prime},B} by

\begin{bmatrix}b_{1,1}&b_{1,2}&b_{1,3}&\cdots&b_{1,m}\\b_{2,1}&b_{2,2}&b_{2,3}&\cdots&b_{2,m}\\b_{3,1}&b_{3,2}&b_{3,3}&\cdots&b_{3,m}\\\vdots&\vdots&\vdots&\ddots&\vdots\\b_{\ell,1}&b_{\ell,2}&b_{\ell,3}&\cdots&b_{\ell,m}\end{bmatrix}\begin{bmatrix}a_{1,1}&a_{1,2}&a_{1,3}&\cdots&a_{1,n}\\a_{2,1}&a_{2,2}&a_{2,3}&\cdots&a_{2,n}\\a_{3,1}&a_{3,2}&a_{3,3}&\cdots&a_{3,n}\\\vdots&\vdots&\vdots&\ddots&\vdots\\a_{m,1}&a_{m,2}&a_{m,3}&\cdots&a_{m,n}\end{bmatrix}=[S]_{B^{\prime\prime},B^{\prime}}[T]_{B^{\prime},B}:=[S\circ T]_{B^{\prime\prime},B}=\begin{bmatrix}\sum_{i=1}^mb_{1,i}a_{i,1}&\sum_{i=1}^mb_{1,i}a_{i,2}&\sum_{i=1}^mb_{1,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{1,i}a_{i,n}\\\sum_{i=1}^mb_{2,i}a_{i,1}&\sum_{i=1}^mb_{2,i}a_{i,2}&\sum_{i=1}^mb_{2,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{2,i}a_{i,n}\\\sum_{i=1}^mb_{3,i}a_{i,1}&\sum_{i=1}^mb_{3,i}a_{i,2}&\sum_{i=1}^mb_{3,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{3,i}a_{i,n}\\\vdots&\vdots&\vdots&\ddots&\vdots\\\sum_{i=1}^mb_{\ell,i}a_{i,1}&\sum_{i=1}^mb_{\ell,i}a_{i,2}&\sum_{i=1}^mb_{\ell,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{\ell,i}a_{i,n}\end{bmatrix},

so that matrix multiplication acts like the composition of linear transformations.

Note that [S]_{B^{\prime\prime},B^{\prime}} has the same number of columns as [T]_{B^{\prime},B} has rows (m in this case). This is because the domain (the set of inputs) of S is the same as the codomain (the set of possible outputs) of T. In fact, two matrices can be multiplied if and only if the number of columns in the left-hand matrix is the same as the number of rows in the right-hand matrix.

Matrices with the same number of rows and columns are known as square matrices, and what makes square matrices special is that they can be multiplied by themselves. This means it makes sense to recursively define non-negative integer powers of square matrices. Specifically, if \mathbf{A} is an m\times m matrix, then we define

\displaystyle \mathbf{A}^0:=\mathbf{I}_m:=\begin{bmatrix}1&0&0&\cdots&0\\0&1&0&\cdots&0\\0&0&1&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&1\end{bmatrix}_{m\times m}

and

\displaystyle \mathbf{A}^{n+1}:=\mathbf{A}\mathbf{A}^n

for all n\in\mathbb{N}.

Furthermore, if \mathbf{A} is invertible (i.e., there exists a matrix \mathbf{A}^{-1} such that \mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_m), we can define

\displaystyle \mathbf{A}^{-n}:={\left(\mathbf{A}^{-1}\right)}^n

for all n\in\mathbb{N}.

Square matrices whose entries outside the main diagonal are all equal to 0 are known as diagonal matrices, and they have the nice property that raising them to a given power is the same as raising their entries along the main diagonal to that power.

\displaystyle \begin{bmatrix}d_1&0&0&\cdots&0\\0&d_2&0&\cdots&0\\0&0&d_3&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&d_m\end{bmatrix}^n=\begin{bmatrix}d_1^n&0&0&\cdots&0\\0&d_2^n&0&\cdots&0\\0&0&d_3^n&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&d_m^n\end{bmatrix}

This gives us an easy way to compute arbitrarily high powers of diagonalizable matrices, matrices which can be written in the form \mathbf{PD}\mathbf{P}^{-1} where \mathbf{D} is a diagonal matrix and \mathbf{P} is an invertible matrix.

\displaystyle \mathbf{A}^n={\left(\mathbf{PD}\mathbf{P}^{-1}\right)}^n=\underbrace{{\left(\mathbf{PD}\mathbf{P}^{-1}\right)}{\left(\mathbf{PD}\mathbf{P}^{-1}\right)}\cdots{\left(\mathbf{PD}\mathbf{P}^{-1}\right)}}_n=\mathbf{P}\underbrace{\mathbf{DD\cdots D}}_n\mathbf{P}^{-1}=\mathbf{P}\mathbf{D}^n\mathbf{P}^{-1}

Notice how the adjacent \mathbf{P}s and \mathbf{P}^{-1}s cancel each other out.

Interestingly, it also gives us a way to raise diagonalizable matrices to fractional powers, and I’ll end this section with an example of a ‘square root’ of a diagonalizable matrix.

\displaystyle \begin{bmatrix}1&2\\3&4\end{bmatrix}^{1/2}=\begin{bmatrix}-\frac{3+\sqrt{33}}{6}&\frac{\sqrt{33}-3}{6}\\1&1\end{bmatrix}\begin{bmatrix}\frac{5-\sqrt{33}}{2}&0\\0&\frac{5+\sqrt{33}}{2}\end{bmatrix}^{1/2}\begin{bmatrix}-\frac{\sqrt{33}}{11}&\frac{11-\sqrt{33}}{22}\\\frac{\sqrt{33}}{11}&\frac{11+\sqrt{33}}{22}\end{bmatrix}=\\\begin{bmatrix}-\frac{3+\sqrt{33}}{6}&\frac{\sqrt{33}-3}{6}\\1&1\end{bmatrix}\begin{bmatrix}\sqrt{\frac{5-\sqrt{33}}{2}}&0\\0&\sqrt{\frac{5+\sqrt{33}}{2}}\end{bmatrix}\begin{bmatrix}-\frac{\sqrt{33}}{11}&\frac{11-\sqrt{33}}{22}\\\frac{\sqrt{33}}{11}&\frac{11+\sqrt{33}}{22}\end{bmatrix}=\\\frac{1}{\sqrt{33}}\begin{bmatrix}\sqrt{3+12\sqrt{-2}}&\sqrt{20-8\sqrt{-2}}\\\sqrt{45-18\sqrt{-2}}&\sqrt{102+12\sqrt{-2}}\end{bmatrix}\approx\begin{bmatrix}0.554+0.464\mathrm{i}&0.807-0.212\mathrm{i}\\1.21-0.319\mathrm{i}&1.76+0.146\mathrm{i}\end{bmatrix}.

Exercise 1:

Show that

[S\circ T]_{B^{\prime\prime},B}=\begin{bmatrix}\sum_{i=1}^mb_{1,i}a_{i,1}&\sum_{i=1}^mb_{1,i}a_{i,2}&\sum_{i=1}^mb_{1,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{1,i}a_{i,n}\\\sum_{i=1}^mb_{2,i}a_{i,1}&\sum_{i=1}^mb_{2,i}a_{i,2}&\sum_{i=1}^mb_{2,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{2,i}a_{i,n}\\\sum_{i=1}^mb_{3,i}a_{i,1}&\sum_{i=1}^mb_{3,i}a_{i,2}&\sum_{i=1}^mb_{3,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{3,i}a_{i,n}\\\vdots&\vdots&\vdots&\ddots&\vdots\\\sum_{i=1}^mb_{\ell,i}a_{i,1}&\sum_{i=1}^mb_{\ell,i}a_{i,2}&\sum_{i=1}^mb_{\ell,i}a_{i,3}&\cdots&\sum_{i=1}^mb_{\ell,i}a_{i,n}\end{bmatrix}.

Exercise 2:

Show that

\displaystyle \begin{bmatrix}d_1&0&0&\cdots&0\\0&d_2&0&\cdots&0\\0&0&d_3&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&d_m\end{bmatrix}^n=\begin{bmatrix}d_1^n&0&0&\cdots&0\\0&d_2^n&0&\cdots&0\\0&0&d_3^n&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&d_m^n\end{bmatrix}

for all n\in\mathbb{N}.

Schröder’s Equation

A function f:X\rightarrow X composed with itself a certain number of times is known as an iterated function. Normally, we denote \underbrace{f\circ f\circ\cdots\circ f}_n by f^n, but f^n can also mean f raised to the power of n (e.g., \sin^2(x) almost always means \sin(x)^2 and not \sin(\sin(x))), so we’ll denote \underbrace{f\circ f\circ\cdots\circ f}_n by f^{\circ n} instead to avoid ambiguity.

Formally, we define f^{\circ n} recursively as follows:

f^{\circ0}:=\mathrm{id}_X

and

f^{\circ n+1}:=f\circ f^{\circ n}.

\mathrm{id}_X, by the way, is the identity function, which maps each element of X to itself.

If f has an inverse f^{\circ-1}, then we can define f^{\circ-n} by

\displaystyle f^{\circ-n}:={\left(f^{\circ-1}\right)}^{\circ n}

for all n\in\mathbb{N}.

Now in dynamics, its often useful to consider dynamical systems defined by functional iteration, treating the number of iterations t as time.

\mathit{\Phi}(t,x):=f^{\circ t}(x)

For t\in\mathbb{Z}, evaluating such a system’s state at t is as simple as applying some function f or its inverse to the initial state x a certain amount of times.

For t\notin\mathbb{Z}, however, it’s not that easy. I mean, you can’t apply f to x a non-integer number of times, so if we want to construct a smooth orbit for our dynamical system, we’ll have to find the fractional iterates of f. This is akin to the problem of finding fractional powers of square matrices, which we’re able to do via diagonalization, so the question is: can we do something similar to find fractional iterates of real/complex analytic functions?

Surprisingly, the answer is yes, and the analogue to diagonalization for analytic functions is Schröder’s equation, named after German mathematician Ernst Schröder (not to be confused with Erwin Schrödinger).

Given a function f and a constant s, we are asked to solve for a function \psi such that

\displaystyle \psi\circ f=s\psi,

or alternatively,

\displaystyle \psi(f(x))=s\psi(x)

for all x in the domain of f. If \psi has an inverse \psi^{\circ-1}, then we can apply \psi^{\circ-1} to both sides and obtain

\displaystyle f=\psi^{\circ-1}\circ s\psi=\psi^{\circ-1}\circ(x\mapsto sx)\circ\psi,

and you might see where this is going from here. Iterating f gives us

\displaystyle f^{\circ n}=\underbrace{\left(\psi^{\circ-1}\circ(x\mapsto sx)\circ\psi\right)\circ\left(\psi^{\circ-1}\circ(x\mapsto sx)\circ\psi\right)\circ\cdots\circ\left(\psi^{\circ-1}\circ(x\mapsto sx)\circ\psi\right)}_n=\psi^{\circ-1}\circ{(x\mapsto sx)}^{\circ n}\circ\psi=\psi^{\circ-1}\circ(x\mapsto s^nx)\circ\psi=\psi^{\circ-1}\circ s^n\psi,

or alternatively,

\displaystyle f^{\circ n}(x)=\psi^{\circ-1}(s^n\psi(x)).

Functional iteration has now been transformed into exponentiation! What this means is that, as long as we can solve for \psi, we can solve for f^{\circ t} with t not necessarily in \mathbb{Z}.

But how do we solve for \psi? How can we tell whether or not a solution for \psi even exists?

Solving Schröder’s Equation I

No matter our choice of f, there’s always one trivial solution to Schröder’s equation: \psi:x\mapsto0. Indeed, we see that \psi(f(x))=s\psi(x)=0 for all x. However, this solution isn’t helpful because it’s not invertible, or in other words, \psi^{\circ-1} doesn’t exist.

So, let’s assume that a non-trivial solution to Schröder’s equation exists. Typically, solving for constants is easier than solving for functions in a functional equation, so a good first course of action would be to solve for s.

Because s doesn’t depend on x, we should substitute x with a value which simplifies our expression somehow. A good candidate would be a fixed point of f, which we’ll denote by a, since f(a)=a by definition.

Assuming \psi(a) exists, substituting a into Schröder’s equation gives us

\displaystyle \psi(f(a))=\psi(a)=s\psi(a),

and we’re now left with only two possibilities: either \psi(a)=0 or \psi(a)\neq0 and s=1. If s=1, then Schröder’s equation becomes effectively useless for finding iterates of f. This is because 1^t=1 for all t, so we’re left with

\displaystyle f^{\circ t}(x)=\psi^{\circ-1}{\left(s^t\psi(x)\right)}=\psi^{\circ-1}(\psi(x))=x.

Thus, we’ll only focus on the case where \psi(a)=0. Taking the derivative of both sides of Schröder’s equation with respect to x yields

\displaystyle f^{\prime}(x)\psi^{\prime}(f(x))=s\psi^{\prime}(x),

and once again, we’ll substitute a into this new equation.

\displaystyle f^{\prime}(a)\psi^{\prime}(a)=s\psi^{\prime}(a)

Because we might want \psi to be analytic at a, we’ll assume \psi^{\prime}(a) exists. We’re again left with two possibilities: either \psi^{\prime}(a)=0 or \psi^{\prime}(a)\neq0 and s=f^{\prime}(a). If \psi^{\prime}(a)=0, then

\displaystyle {\left(\psi^{\circ-1}\right)}^{\prime}(0)=\frac{1}{\psi^{\prime}(\psi^{\circ-1}(0))}=\frac{1}{\psi^{\prime}(a)}=\frac{1}{0},

so (\psi^{\circ-1})^{\prime}(0) doesn’t exist, and hence, \psi^{\circ-1} is not analytic at 0. However, one might want \psi^{\circ-1} to be analytic at 0, so we’ll assume \psi^{\prime}(a)\neq0. Consequently, s=f^{\prime}(a).

We can now rewrite Schröder’s equation like this:

\displaystyle \psi(f(x))=f^{\prime}(a)\psi(x).

Fixed Points

As a side note, there are three types of fixed points of differentiable functions.

The first is the attracting fixed point. A fixed point a of f is said to be attracting if 0\leq|f^{\prime}(a)|<1.

Cobweb plot of a function f with an attracting fixed point a. Notice how f^{\circ n}(x_0)\rightarrow a as n\rightarrow\infty.

The second is the repelling fixed point. A fixed point a of f is said to be repelling if |f^{\prime}(a)|>1.

Cobweb plot of a function f with a repelling fixed point a.

Finally, the third is the neutral fixed point. A fixed point a of f is said to be neutral if |f^{\prime}(a)|=1.

A subtype of the attracting fixed point is the superattracting fixed point. A fixed point a of f is said to be superattracting if f^{\prime}(a)=0.

Cobweb plot of a function f with a superattracting fixed point a.

Solving Schröder’s Equation II

To answer whether or not a non-trivial solution for \psi exists, we turn to a sufficient, though not necessary, condition given by French mathematician Gabriel Koenigs in 1884: if a is an attracting, but not superattracting, fixed point of f and f is analytic on the unit disk centered at a, then we are guaranteed a unique analytic function \mathit{\Psi} such that \mathit{\Psi}^{\prime}(a)=1 and \mathit{\Psi}\circ f=f^{\prime}(a)\mathit{\Psi} (). \mathit{\Psi}, by the way, is known as the Koenigs function.

To actually compute \mathit{\Psi}, we first note that \mathit{\Psi}(a)=0 and \mathit{\Psi}^{\prime}(a)=1. With this, we can find the linear approximation of \mathit{\Psi} at a.

\displaystyle \mathit{\Psi}(x)\approx x-a

We can improve this approximation by substituting it into Schröder’s equation as such:

\displaystyle f(x)-a\approx\mathit{\Psi}(f(x))=f^{\prime}(a)\mathit{\Psi}(x).

Cancelling f^{\prime}(a) on the right-hand side, we get

\displaystyle \mathit{\Psi}(x)\approx {f^{\prime}(a)}^{-1}(f(x)-a).

We can do this again to get an even better approximation.

\displaystyle {f^{\prime}(a)}^{-1}(f(f(x))-a)\approx\mathit{\Psi}(f(x))=f^{\prime}(a)\mathit{\Psi}(x)

\displaystyle \mathit{\Psi}(x)\approx {f^{\prime}(a)}^{-2}(f(f(x))-a)

In fact, we can keep doing this to obtain better and better approximations of \mathit{\Psi}.

\displaystyle \mathit{\Psi}(x)\approx {f^{\prime}(a)}^{-n}(f^{\circ n}(x)-a)

Taking the limit as n\rightarrow\infty makes our approximation exact.

\displaystyle \mathit{\Psi}(x)=\lim_{n\rightarrow\infty}{f^{\prime}(a)}^{-n}(f^{\circ n}(x)-a)

Indeed, we can check that

\displaystyle \mathit{\Psi}(f(x))=\lim_{n\rightarrow\infty}{f^{\prime}(a)}^{-n}(f^{\circ n}(f(x))-a)=\lim_{n\rightarrow\infty}{f^{\prime}(a)}^{-n}{\left(f^{\circ n+1}(x)-a\right)}=\\\lim_{n\rightarrow\infty}{f^{\prime}(a)}^{1-n}(f^{\circ n}(x)-a)=f^{\prime}(a)\lim_{n\rightarrow\infty}{f^{\prime}(a)}^{-n}(f^{\circ n}(x)-a)=f^{\prime}(a)\mathit{\Psi}(x).

Graph of \mathit{\Psi} with f=\cos and a\approx0.739085\ldots.

It should be noted that the reason this method works is because a is an attracting fixed point of f, so for x sufficiently close to a, f^{\circ n}(x)\rightarrow a as n\rightarrow\infty. Specifically, for x really close to a, f^{\circ n}(x)-a roughly exponentially decays at the same rate that {f^{\prime}(a)}^{-n} grows as n\rightarrow\infty, so the entire limit converges for x within a certain neighborhood of a.

Through similar reasoning, one can show that \mathit{\Psi}^{\circ-1} can be expressed by the limit

\displaystyle \mathit{\Psi}^{\circ-1}(x)=\lim_{n\rightarrow\infty}f^{\circ-n}{\left({f^{\prime}(a)}^nx+a\right)}.

Therefore,

\displaystyle f^{\circ t}(x)=\mathit{\Psi}^{\circ-1}{\left(s^t\mathit{\Psi}(x)\right)}=\lim_{n\rightarrow\infty}f^{\circ-n}{\left({f^{\prime}(a)}^t(f^{\circ n}(x)-a)+a\right)}.

Challenge Exercise (Complex Analysis):

Prove Gabriel Koenigs’ 1884 theorem.

Carleman Matrices

The representation theorists among you might recall that linear transformations (and hence, matrices) can represent more abstract mathematical objects. For example, rotations about the origin of Euclidean space can be represented by rotation matrices, with the multiplication of rotation matrices representing the composition of rotations.

This raises the question: does a representation of analytic functions exist, and if so, would Schröder’s equation translate into matrix diagonalization under such a representation?

Let f be analytic at 0. By the analyticity of f at 0, the Maclaurin series

\displaystyle \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n

converges to f(x) for x sufficiently close to 0. We may interpret this sum as a dot product of two infinite-dimensional vectors.

\displaystyle \begin{bmatrix}f(0)\\f^{\prime}(0)\\f^{\prime\prime}(0)/2\\f^{\prime\prime\prime}(0)/6\\\vdots\end{bmatrix}\cdot\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix}=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n=f(x)

Alternatively, we may view it as the sole entry of the matrix product of an infinite-dimensional row vector and an infinite-dimensional column vector.

\displaystyle \begin{bmatrix}f(0)&f^{\prime}(0)&f^{\prime\prime}(0)/2&f^{\prime\prime\prime}(0)/6&\cdots\end{bmatrix}\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix}=\begin{bmatrix}\displaystyle \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n\end{bmatrix}=\begin{bmatrix}f(x)\end{bmatrix}

It’s tempting to consider the row vector a representation of f, the column vector a representation of x, and the 1\times1 matrix a representation of f(x). However, if x is to be represented by an infinite-dimensional column vector, then so should f(x). Specifically, f(x) should be represented by the column vector

\displaystyle \begin{bmatrix}1\\f(x)\\{f(x)}^2\\{f(x)}^3\\\vdots\end{bmatrix},

and not the 1\times 1 matrix \begin{bmatrix}f(x)\end{bmatrix}, so instead of representing f by an infinite-dimensional row vector, we must instead represent it by the infinite matrix

\displaystyle \mathcal{C}[f]:=\begin{bmatrix}\mathcal{C}[f]_{0,0}&\mathcal{C}[f]_{0,1}&\mathcal{C}[f]_{0,2}&\mathcal{C}[f]_{0,3}&\cdots\\\mathcal{C}[f]_{1,0}&\mathcal{C}[f]_{1,1}&\mathcal{C}[f]_{1,2}&\mathcal{C}[f]_{1,3}&\cdots\\\mathcal{C}[f]_{2,0}&\mathcal{C}[f]_{2,1}&\mathcal{C}[f]_{2,2}&\mathcal{C}[f]_{2,3}&\cdots\\\mathcal{C}[f]_{3,0}&\mathcal{C}[f]_{3,1}&\mathcal{C}[f]_{3,2}&\mathcal{C}[f]_{3,3}&\cdots\\\vdots&\vdots&\vdots&\vdots&\ddots\end{bmatrix}

where

\displaystyle \mathcal{C}[f]_{m,n}:=\left.\frac{1}{n!}\frac{\mathrm{d}^n}{{\mathrm{d}x}^n}{f(x)}^m\right|_{x=0}

is the n-th coefficient of the Maclaurin series of {f(x)}^m. Indeed one can check that,

\displaystyle \mathcal{C}[f]\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix}=\begin{bmatrix}\mathcal{C}[f]_{0,0}&\mathcal{C}[f]_{0,1}&\mathcal{C}[f]_{0,2}&\mathcal{C}[f]_{0,3}&\cdots\\\mathcal{C}[f]_{1,0}&\mathcal{C}[f]_{1,1}&\mathcal{C}[f]_{1,2}&\mathcal{C}[f]_{1,3}&\cdots\\\mathcal{C}[f]_{2,0}&\mathcal{C}[f]_{2,1}&\mathcal{C}[f]_{2,2}&\mathcal{C}[f]_{2,3}&\cdots\\\mathcal{C}[f]_{3,0}&\mathcal{C}[f]_{3,1}&\mathcal{C}[f]_{3,2}&\mathcal{C}[f]_{3,3}&\cdots\\\vdots&\vdots&\vdots&\vdots&\ddots\end{bmatrix}\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix}=\\\begin{bmatrix}\sum_{n=0}^{\infty}\mathcal{C}[f]_{0,n}x^n\\\sum_{n=0}^{\infty}\mathcal{C}[f]_{1,n}x^n\\\sum_{n=0}^{\infty}\mathcal{C}[f]_{2,n}x^n\\\sum_{n=0}^{\infty}\mathcal{C}[f]_{3,n}x^n\\\vdots\end{bmatrix}=\begin{bmatrix}1\\f(x)\\{f(x)}^2\\{f(x)}^3\\\vdots\end{bmatrix},

so \mathcal{C}[f] is a valid representation of f. Furthermore,

\displaystyle \mathcal{C}[f]\mathcal{C}[g]\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix}=\mathcal{C}[f]\begin{bmatrix}1\\g(x)\\{g(x)}^2\\{g(x)}^3\\\vdots\end{bmatrix}=\begin{bmatrix}1\\f(g(x))\\{f(g(x))}^2\\{f(g(x))}^3\\\vdots\end{bmatrix}=\mathcal{C}[f\circ g]\begin{bmatrix}1\\x\\x^2\\x^3\\\vdots\end{bmatrix},

so

\displaystyle \mathcal{C}[f\circ g]=\mathcal{C}[f]\mathcal{C}[g].

We refer to \mathcal{C}[f] as the Carleman matrix of f, named after Swedish mathematician Torsten Carleman.

Under this representation, Schröder’s equation translates into

\displaystyle \mathcal{C}[\psi]\mathcal{C}[f]=\mathcal{C}[x\mapsto sx]\mathcal{C}[\psi],

so if \mathcal{C}[\psi] is invertible, then

\displaystyle \mathcal{C}[f]={\mathcal{C}[\psi]}^{-1}\mathcal{C}[x\mapsto sx]\mathcal{C}[\psi].

\mathcal{C}[x\mapsto sx] can then be readily worked out as

\displaystyle \mathcal{C}[x\mapsto sx]=\begin{bmatrix}1&0&0&0&\cdots\\0&s&0&0&\cdots\\0&0&s^2&0&\cdots\\0&0&0&s^3&\cdots\\\vdots&\vdots&\vdots&\vdots&\ddots\end{bmatrix},

which is a diagonal matrix! The punchline is that Schröder’s equation does, in fact, translate into diagonalization under Carleman’s matrix representation of analytic functions.

Exercise:

Prove that \mathcal{C}[x\mapsto sx] is a diagonal matrix for any s\in\mathbb{C}.

Addendum: Abel’s Equation and Böttcher’s Equation

Applying \log_s to both sides of Schröder’s equation and replacing \mathrm{log}_s\circ\psi with \alpha yields the Abel equation, named after Norwegian mathematician Niels Henrik Abel.

\alpha(f(x))=\alpha(x)+1

Meanwhile, applying \exp to both sides of Schröder’s equation and replacing \mathrm{exp}\circ\psi with \phi gives us Böttcher’s equation, named after Polish mathematician Lucjan Böttcher.

\phi(f(x))={\phi(x)}^s

When a is a superattracting fixed point of f, solving Böttcher’s equation tends to be less cumbersome than solving Schröder’s equation, so it’s sometimes useful to convert Schröder’s equation into Böttcher’s equation.

Addendum: Tetration

As stated in the introduction, tetration can be seen as repeated exponentiation in the same way that exponentiation can be seen as repeated multiplication and multiplication can be seen as repeated addition. For natural heights, we define

\displaystyle ^{0}b:=1

and

\displaystyle ^{n+1}b:=b^{^{n}b}

for n\in\mathbb{N}.

Alternatively,

\displaystyle ^{n}b:=\exp_b^{\circ n}(1).

For b in the interior of the Shell-Thron region, the set of z\in\mathbb{C} such that \lim_{n\rightarrow\infty}{}^nz exists, \exp_b has the attracting fixed point a=-\mathrm{W}(-\ln(b))/\ln(b) where \mathrm{W} is the principal branch of the Lambert W function (the inverse of x\mapsto x\mathrm{e}^x), so Schröder’s equation can be used to find and calculate \exp_b^{\circ t} for t\notin\mathbb{Z}, and hence, extend \mathrm{tet}_b to non-integer heights.

We can also calculate the inverse of tetration, known as the super-logarithm, using Schröder’s equation. We denote the base-b super-logarithm by \mathrm{slog}_b.

The Shell-Thron region.
Graphs of \mathit{\Psi} and \mathit{\Psi}^{\circ-1} with f=\exp_{\sqrt{2}} and a=2.
Domain coloring of \mathit{\Psi} with f=\exp_{\sqrt{2}} and a=2.
Domain coloring of \mathit{\Psi}^{\circ-1} with f=\exp_{\sqrt{2}} and a=2.
Graphs of the identity function, \exp_{\sqrt{2}}^{\circ1/2}, and \exp_{\sqrt{2}}.
Domain coloring of \exp_{\sqrt{2}}^{\circ1/2}.
Graphs of \mathrm{tet}_{\sqrt{2}} and \mathrm{slog}_{\sqrt{2}}.
Domain coloring of \mathrm{tet}_{\sqrt{2}}.
Domain coloring of \mathrm{slog}_{\sqrt{2}}.
Domain coloring of \mathrm{tet}_{1/4}.
Domain coloring of \mathrm{tet}_{1/2}.
Domain coloring of \mathrm{tet}_{3/4}.
Domain coloring of \mathrm{tet}_{5/4}.
Domain coloring of \mathrm{tet}_{\mathrm{i}}.
Domain coloring of \mathrm{tet}_{1+\mathrm{i}}.
Domain coloring of \mathrm{slog}_{1/2}.
Domain coloring of \mathrm{slog}_{1+\mathrm{i}}.

We can also define a function, which I’ll call the product tetrational function and denote with \mathrm{\Pi tet}_b, by

\displaystyle \mathrm{\Pi tet}_b(n):=\prod_{k=0}^n{}^kb=bb^bb^{b^b}\cdots{}^nb.

As it turns out, the equality

\displaystyle \mathrm{\Pi tet}_b(n)=\frac{\mathrm{tet}_b^{\prime}(n)}{\mathrm{tet}_b^{\prime}(0){\ln(b)}^n}

holds for n\in\mathbb{N}, so we’re left with a valid extension of \mathrm{\Pi tet}_b to complex inputs.

\displaystyle \mathrm{\Pi tet}_b(z)=\frac{\mathrm{tet}_b^{\prime}(z)}{\mathrm{tet}_b^{\prime}(0){\ln(b)}^z}

A similar function, which I’ll call the sum tetrational function and denote with \mathrm{\Sigma tet}_b, can be defined by

\displaystyle \mathrm{\Sigma tet}_b(n):=\sum_{k=0}^n{}^kb=1+b+b^b+\cdots+{}^nb,

and it can readily be shown that

\displaystyle b^{\mathrm{\Sigma tet}_b(n-1)}=b^{1+b+b^b+\cdots+{}^{n-1}b}=bb^bb^{b^b}\cdots{}^nb=\mathrm{\Pi tet}_b(n),

so we can extend \mathrm{\Sigma tet}_b as such:

\displaystyle \mathrm{\Sigma tet}_b(z)=\log_b(\mathrm{\Pi tet}_b(z+1))+\frac{2\pi\mathrm{i}}{\ln(b)}N(z)

where N:\mathbb{C}\rightarrow\mathbb{Z} controls the branch cuts of \mathrm{\Sigma tet}_b.

Graphs of \mathrm{\Pi tet}_{\sqrt{2}} and \mathrm{\Sigma tet}_{\sqrt{2}}.
Domain coloring of \mathrm{\Pi tet}_{\sqrt{2}}.
Domain coloring of \mathrm{\Sigma tet}_{\sqrt{2}}.

Exercise 1:

Show that

\displaystyle a=-\frac{\mathrm{W}(-\ln(b))}{\ln(b)}

is a fixed point of \exp_b for b\neq0,1.

Exercise 2:

Show that \mathrm{tet}_b has period

\displaystyle T=\frac{2\pi\mathrm{i}}{\ln(-\mathrm{W}(-\ln(b)))}

for all b\neq1 in the interior of the Shell-Thron region.

Exercise 3:

Prove that

\displaystyle \mathrm{\Pi tet}_b(n)=\frac{\mathrm{tet}_b^{\prime}(n)}{\mathrm{tet}_b^{\prime}(0){\ln(b)}^n}

holds for all n\in\mathbb{N}.

Notes

* Here, b^x:=\mathrm{e}^{x\ln(b)} where \ln is the principal branch of the natural logarithm. Complex exponentiation isn’t always well-defined since logarithms are multivalued functions, so in most cases, we have to choose a principal value. For example, we choose \mathrm{i}^{\mathrm{i}} to be equal to \mathrm{e}^{\mathrm{i}\ln(\mathrm{i})}=\mathrm{e}^{-\pi/2} by convention, but we could also choose \mathrm{i}^{\mathrm{i}} to be equal to \mathrm{e}^{\mathrm{i}(\ln(\mathrm{i})+2\pi\mathrm{i}n)}=\mathrm{e}^{-\pi/2-2\pi n} for any n\in\mathbb{Z}.

† Koenigs’ original statement has the extra condition that a be equal to 0. However, it’s possible to show that this seemingly stronger statement in this post is, in fact, equivalent to the original, which I’ll leave as an exercise for the reader.

Sources and Additional Reading

https://linear.axler.net/

https://en.wikipedia.org/wiki/Diagonalizable_matrix

https://en.wikipedia.org/wiki/Square_root_of_a_matrix

https://en.wikipedia.org/wiki/Iterated_function

https://en.wikipedia.org/wiki/Schr%C3%B6der%27s_equation

https://en.wikipedia.org/wiki/Koenigs_function

https://arxiv.org/abs/math/9201272

Click to access 39SchroederEqSV.pdf

https://en.wikipedia.org/wiki/Carleman_matrix

https://en.wikipedia.org/wiki/Abel_equation

https://en.wikipedia.org/wiki/B%C3%B6ttcher%27s_equation

https://en.wikipedia.org/wiki/Tetration

Click to access tetration2.pdf

https://link.springer.com/article/10.1007/s10444-018-9615-7


Special thanks to Grant Sanderson and James Schloss for hosting the annual Summer of Math Exposition.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: