## Multiplying polynomials in nearly-linear time without leaving the ring of integers

The problem is really simple: given two polynomials: $p(x) = p_0 + p_1 x + p_2 x^2 + \hdots + p_{a} x^{a}$ and $q(x) = q_0 + q_1 x + q_2 x^2 + \hdots + q_{b} x^{b}$ with integer coefficients, how to compute the product $r(x) := p(x) \cdot q(x)$. The product is a polynomial of degree $a+b$ where the coefficient $r_k = \sum_{i,j \vert i+j=k} p_i \cdot q_j$. The definition suggests an algorithm of complexity $O(a\cdot b) = O(n^2)$ where $n = a+b+1$.

Surprisingly, one can do it a lot faster: in fact, one can calculate the coefficients of $p(x) \cdot q(x)$ in time $O(n \log n)$ if we are allowed to use complex numbers. This is one of those “a-ha” algorithms: the idea is very simple but incredibly ingenious — and a proof that surprising algorithm do exist. The algorithm is based on the following idea: if we know the value of a polynomial of degree $d$ on $d+1$ points, then we can reconstruct it exactly. So, if one can evaluate polynomials $p$ and $q$ on $ab+1=n$ points, say $\{ x_0 x_1, \hdots, x_{n-1}\}$ then we can calculate $r(x_i) = p(x_i) \cdot q(x_i)$ and reconstruct $r$. Notice that once we calculate $p(x_i), q(x_i)$, obtaining $r(x_i)$ is takes only $O(n)$ operations.

So far, we only shifter the complexity around: while the actual multiplication takes $O(n)$ we still need to evaluate the polynomials $n$ times and then reconstruct $r$ from the values $r(x_0), r(x_1), \hdots, r(x_{n-1})$. The idea is to choose values $x_0, \hdots, x_{n-1}$ carefully in such a way that we can evaluate those polynomials really fast.

Enter complex numbers ! A choice that works is to take $x_i = \omega_n^i$ where $\omega_n = e^{2 \pi i / n} \in \mathbb{C}$, i.e., $n$-th root of the unit. And here is the reason why we can multiply polynomials so fast:

$\begin{bmatrix} p(\omega_n^0) \\ p(\omega_n^1) \\ \vdots \\ p(\omega_n^{n-1}) \end{bmatrix} = \begin{bmatrix} 1 & 1 & \hdots & 1 \\ 1 & \omega_n^1 & \hdots & \omega_n^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \omega_n^{n-1} & \hdots & \omega_n^{(n-1)^2} \end{bmatrix} \cdot \begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_n \end{bmatrix}$

In other words, if we see $p$ and $p(\omega^i_n) = [p(\omega_n^i)]_i$ as vectors of dimension $n$, then we can write them as $p(\omega^i_n) = A_n \cdot p$ where $A$ is a matrix $n \times n$ with entries $\omega_n^{ij}$ for $i,j=0,\hdots,n-1$. Now, we use the power of recursion. Assuming $n$ is a power of $2$ to simplify things, we first note that:

\begin{aligned} p(x_i) & = \sum_{j=0}^{n-1} p_j \omega_n^{ji} = \sum_{j=0}^{n/2 - 1} p_{2j} \omega_n^{2ji} + \sum_{j=0}^{n/2 - 1} p_{2j+1} \omega_n^{(2j+1)i} = \\ & = \sum_{j=0}^{n/2 - 1} p_{2j} \omega_{n/2}^{ji} + \omega_n^i \cdot \sum_{j=0}^{n/2 - 1} p_{2j+1} \omega_{n/2}^{ji} \end{aligned}

since $\omega_n^2 = (e^{2 \pi i / n})^2 = e^{2 \pi i / (n/2)} = \omega_{n/2}$. So, if we define $p_{\text{even}} = [p_{2j}]_{j=0..n/2 - 1}$ and $p_{\text{odd}} = [p_{2j+1}]_{j=0..n/2 - 1}$, then we see that we can compute in linear time the coefficients of $p(\omega^i_n)$ from the coefficients of $A_{n/2} p_{\text{even}}$ and $A_{n/2} p_{\text{odd}}$.

So, the problem of computing the coefficients of $p(\omega^i_n)$ corresponds to computing two problems of half the size plus some $O(n)$ computation. This gives a total time of $O(n \log n)$. The algorithm works as follows:

% assumes length of p is a power of 2
function w = fft(p)
n = length(p);  omega = exp(2 * pi * i * sgn / n);
if (n == 1)
w = p;
return;
endif
p_even = p(1:2:n);  w_even = fft(p_even);
p_odd  = p(2:2:n);  w_odd  = fft(p_odd);
for j = 0 : n-1
w(j+1) = w_even(mod(j,n/2)+1) + omega^j * w_odd(mod(j,n/2)+1);
endfor
endfunction


We call the method fft since this is also known as the Fast Fourier Transform. We are almost there: we know how to compute $p(\omega_n^i), q(\omega_n^i)$ and therefore $r(\omega_n^i)$. We only need to get the coefficients of $r$. Notice that $r(\omega_n^i) = A_n r$, so we need only to solve this linear system.

How to do it ? Another “a-ha”: the matrix $A_n^{-1} = \frac{1}{n}[\omega_{n}^{-ij}]$ for $i,j = 0, \hdots, n-1$. The reason for that is that: $\sum_{k=0}^{n-1} \omega_n^{ik} \omega_n^{-kj} = \sum_{k=0}^{n-1} \omega_n^{k(i-j)}$. If $i=j$, then this is clearly $n$. If $i \neq j$, this zero, since this is the sum of all roots of the unit. Therefore, we can calculate $A_n$ and its inverse in a very easy way:

function w = fft_aux(p, sgn)
n = length(p);  omega = exp(2 * pi * i * sgn / n);
if (n == 1)
w = p;
return;
endif
p_even = p(1:2:n);  w_even = fft_aux(p_even, sgn);
p_odd  = p(2:2:n);  w_odd  = fft_aux(p_odd, sgn);
for j = 0 : n-1
w(j+1) = w_even(mod(j,n/2)+1) + omega^j * w_odd(mod(j,n/2)+1);
endfor
endfunction

function w = fft(p)
w = fft_aux(p, +1);
endfunction

function w = fft_inv(p)
w = fft_aux(p,-1) ./ length(p);
endfunction


Now, we have all the pieces to multiply to polynomials:

function r = multiply_poly(p, q)
n = length(p) + length(q) - 1;
p = expand_to_power_of_2(p, n);
q = expand_to_power_of_2(q, n);
r = fft_inv(fft(p) .* fft(q));
r = resize(r, 1, n);
endfunction

% pads p with zeros until it has length equal to
% the first power of 2 greater of equal n
function p = expand_to_power_of_2(p, n)
plen = 1;
while (plen < n) plen = plen * 2; endwhile
p = resize(p, 1, plen);
endfunction


And now we are done ! We multiplied two polynomials in time $O(n \log n)$. Even though our initial objective was to multiply two polynomials with integer coefficients, we had to leave the ring of integers, go to the field of complex numbers, multiply things over there and then come back to the ring of integers. Those things happen all the time in algebra, that the answer for a question involves leaving the space in which a question is asked, solving the question in a higher space and coming back in the end. But here we ask: can we multiply two polynomials doing only operations over the ring of integers ?

Schonhage and Strassen give an algorithm that runs in time $O(n \log n \log \log n)$ that does exactly that (this later improved here and here). The idea is to try to mimic the FFT-algorithm without using complex numbers. What we needed was an element $\omega$ with three properties: (i) $\omega^n = 1$, (ii) $\sum_{i=0}^{n-1} \omega^i = 0$ and (iii) $\{\omega^0, \omega^1, \hdots, \omega^{n-1}\}$ are $n$ distinct elements.

The idea of the algorithm is to work with some sort of symbolic $\omega$ that has the properties listed above. Let's begin by fixing some notation, let $\mathbb{Z}[x]$ denote the ring of polynomials over integers, which is just another name for the set of polynomials with integer coefficients. Now, fixed a certain polynomial $q(x) \in \mathbb{Z}[x]$, we define the class of residues module $\mathbb{Z}[x] / (q(x))$ as the residual we get from dividing a polynomial in $\mathbb{Z}[x]$ by $q(x)$. More formally, associate with each $p \in \mathbb{Z}[x]$ the set $\bar{p} := \{p(x) + q(x) \cdot a(x); a(x) \in \mathbb{Z}[x]\}$, then: $\mathbb{Z}[x] / (q(x)) = \{ \bar{p}; p \in \mathbb{Z}[x]\}$.

This is just some formalism. here we are only concerned with two such rings: (i) $\mathbb{Z}[x] / (x^n - 1)$. This is the ring in which $x^n = 1$. This means that for every $p(x) = \sum_{i=0}^K p_i x^i \in \mathbb{Z}[x]$ we associate the polynomial $\bar{p} = \sum_{i=0}^K p_i x^{i \text{ mod } n} \in \mathbb{Z}[x] / (x^n - 1)$. (ii) $\mathbb{Z}[x] / (x^n + 1)$. This is the ring in which $x^n = -1$. This means that for every $p(x) = \sum_{i=0}^K p_i x^i \in \mathbb{Z}[x]$ we associate the polynomial $\bar{p} = \sum_{i=0}^K p_i (-1)^{\lfloor i / n \rfloor} x^{i \text{ mod } n} \in \mathbb{Z}[x] / (x^n - 1)$.

The great thing is that if instead of $\mathbb{C}$ we have $\mathbb{Z}[x] / (x^n + 1)$, then we have an element like $\omega$, which is $x^2$. Notice that $(x^2)^n = x^{2n} = (-1)^2 = 1$, and $\sum_{i=0}^n x^{2i} = 0$ since $x^{2(i + n/2)} = x^{2i} \cdot x^n = -x^{2i}$. This is essentially the main trick: we will do arithmetic on $\mathbb{Z}[x] / (x^n + 1)$, doing FFT and all... Now, comes the last piece of the puzzle: notice that if $\text{deg}(p) \cdot \text{deg}(q) < n[/latex], then multiplying $p \cdot q \in \mathbb{Z}[x]$ is equivalent to $\bar{p} \cdot \bar{q} \in \mathbb{Z}[x] / (x^n + 1)$, so we only need to multiply in the ring $\mathbb{Z}[x] / (x^n + 1)$. The idea to multiply in this ring is as follows: write $\bar{p} = \sum_{i=0}^n p_i x^i = \sum_{i=0}^{\sqrt{n}} (\sum_{j=0}^{\sqrt{n}} p_{j + \sqrt{n}i} x^j) x^{i \sqrt{n}}$. Notice that this maps to a polynomial with degree $\sqrt{n}$ taking values in $\mathbb{Z}[x] / (x^{\sqrt{n}} + 1)$. Now, the coefficients (which are in $\mathbb{Z}[x] / (x^{\sqrt{n}} + 1)$) have a suitable $\omega$ we can use and multiplying them can be done recursively by calling calling the multiplication for $\mathbb{Z}[x] / (x^{\sqrt{n}} + 1)$. Following this plan, a multiplication in $\mathbb{Z}[x] / (x^{n} + 1)$, generates $\sqrt{n}$ multiplications in $\mathbb{Z}[x] / (x^{\sqrt{n}} + 1)$, each of which generates $\sqrt{n}$ multiplications in $\mathbb{Z}[x] / (x^{\sqrt{\sqrt{n}}} + 1)$, and so on. Since taking square roots $\log \log (n)$ gets to $O(1)$, this gives an algorithm of complexity $O(n \log n \log \log n)$. In order to make this plan work, we need to care about various implementation details. The first step is to reduce the problem to multiplying in the ring $\mathbb{Z}[x] / (x^{n} + 1)$. To facilitate recursion we will make $n$ be a power of $2$ in the following way:
 function r = multiply_poly(p,q) n = length(p) + length(q) - 1; [p,k] = expand_to_power_of_2(p,n); [q,k] = expand_to_power_of_2(q,n); r = multiply_poly_in_Zxn(p,q,k); r = resize(r, n, 1); endfunction function [p,k] = expand_to_power_of_2(p, n) p = reshape(p, length(p), 1); plen = 1; k = 0; while (plen < n) plen = plen * 2; k = k + 1; endwhile p = resize(p, plen, 1); endfunction
Ok, now we need to provide an implementation for multiply_poly_in_Zxn(p,q,k)( which multiplies two polynomials in [latex]\mathbb{Z}[x] / (x^{n} + 1)$
where $n = 2^k$. The first step is to write each $p$ and $q$ as polynomials with coefficients in $\mathbb{Z}[x] / (x^{2K} + 1)$ for some $K = O(\sqrt{n})$. We will do as follows: let $L = 2^{\lfloor (k-1)/2\rfloor}$ and $K = 2^{\lceil (k-1)/2\rceil}$ and write: $p(x) = \sum_{i=0}^{2L-1} (\sum_{j=0}^{K-1} p_{Ki+j} x^j) x^{Ki}$ and let's think of $later \bar{p}_i = \sum_{j=0}^{K-1} p_{Ki+j} x^j$ as a polynomial in $\mathbb{Z}[x] / (x^{2K} + 1)$. Now, we need an element in $\mathbb{Z}[x] / (x^{2K} + 1)$ that will serve as a $2L$-root of the unit (since this is the degree of the polynomials we are trying to multiply). Notice that $x^{4K} = 1$ If $K = L$, then $\omega = x^2$ is a $2L$-root of the unit. If $K = 2L$, then $\omega = x^4$ is a $2L$-root of the unit. This makes us almost ready to apply FFT. There is only one catch left and this is very subtle.

A brief intermission: given $a \in \mathbb{Z}$, there is the evaluation map $E_a : \mathbb{Z}[x] \rightarrow \mathbb{Z}$ that maps $p(x) \mapsto p(a) \in \mathbb{Z}$, simply by evaluating the polynomial. Now, say we want to extend this map to $\mathbb{Z}[x]/(q(x))$. The elements of $\mathbb{Z}[x]/(q(x))$ are (infinite) sets of polynomial which are equivalent modulo $q(x)$. If they all the polynomials in the equivalence class $\bar{q}$ evaluate to the same, then this defines a very evaluation map. It is simple to see that this happens iff $a$ is a root of $q(x)$, i.e., $q(a) = 0$. In such a case, the evaluation map extends naturally to $\bar{E}_a : \mathbb{Z}[x]/(q(x)) \rightarrow \mathbb{Z}$. If $a$ is not a root, then there is no evaluation map $\bar{E}_a$ such that $E_a = \bar{E}_a \circ \pi$, where $\pi$ is the projection $\pi: \mathbb{Z}[x] \rightarrow \mathbb{Z}[x]/(q(x))$.

We are looking at polynomials as elements in $A[y] / (y^{2K} + 1)$ where $A = \mathbb{Z}[x] / (x^{2K} + 1)$. So each element is in fact a set of polynomials, i.e., all polynomials equivalent modulo $y^{2K} + 1$. If we want to evaluate them at a certain point $\omega$ we must make sure all polynomials evaluate the same. Here is how to do it: define $\psi = \sqrt{\omega}$, i.e., $\psi = x$ if $K = L$ and $\psi = x^2$ if $K = 2L$. Then the map $\bar{p}(y) \mapsto \bar{p}(\psi y)$ maps from $A[y] / (y^{2K} + 1)$ to $A[y] / (y^{2K} - 1)$, in which $\omega$ is a root. So, now we are fine. Notice that we can also map back using the inverse $\bar{p}(y) \mapsto \bar{p}(\psi^{-1} y) = \bar{p}((x^{4K} /\psi) y)$. This point is really subtle and it was the one that took me more time to understand. After we understand why this needs to be done, things get really easy: we just need to do FFT as before, in other words, evaluate in $1, \omega, \hdots, \omega^{2L-1}$ and then reconstruct using the inverse FFT. The code in Octave is below:

function r = multiply_poly_in_Zxn(p,q,k)
if k<=2
r=naive_multiply_poly(p,q);
return;
endif
L = 2^floor((k-1)/2); K = 2^(k-1) / L; psi = 2;
if K==L psi = 1; endif

% map Z[x] --> (Z_{2K})_{2L}
p = [ reshape(p, K, 2*L); zeros(K, 2*L) ];
q = [ reshape(q, K, 2*L); zeros(K, 2*L) ];
for i = 0 : 2*L - 1
% map Z_{2L}[y] / (y^k+1)  -->  Z_{2L}[y] / (y^k - 1)
% implemented via   p(y)  |-->  p(psi*y)
p(:,i+1) = multiply_mono(p(:,i+1), i*psi);
q(:,i+1) = multiply_mono(q(:,i+1), i*psi);
endfor
w_p = fft(p, 2*psi);
w_q = fft(q, 2*psi);
w_r = [];
for i = 1:2*L
w_r(:,i) = multiply_poly_in_Zxn(w_p(:,i), w_q(:,i),
(k-1)-floor((k-1)/2)+1);
endfor
r = fft_inv(w_r, 2*psi);
rr = [];
for i = 0 : 2*L - 1
% back to Z_{2L}[y] / (y^{2L} +1)
rr(:,i+1) = multiply_mono(r(:,i+1), i*(4*K-psi));
endfor
r = rr;
r1 = reshape(r(:,1:2:2*L), 2*K*L, 1);
r2 = reshape(r(:,2:2:2*L), 2*K*L, 1);
r2 = [-r2(2*K*L-K+1:2*K*L); r2(1:2*K*L-K)];
r = r1 + r2;
endfunction


The auxiliary methods are below:

function r = naive_multiply_poly(p,q)
n = length(p); r = zeros(n, 1);
for i=0:n-1 for j =0:n-1
r(1+mod(i+j,n)) = r(1+mod(i+j,n))
+ (-1)^(floor((i+j)/n)) * p(1+i) * q(1+j);
endfor endfor
endfunction

function r = multiply_mono(p, t)
n = length(p);
r = zeros(n, 1);
for j = 0:n-1
r(1+mod(j+t,n)) += p(j+1) * (-1)^(floor((j+t)/n));
endfor
endfunction

function w = fft_aux(p, omega)
n = length(p(1,:));
if (n == 1)
w = p;
return;
endif
p_even = p(:, 1:2:n); w_even = fft_aux(p_even, 2*omega);
p_odd =  p(:, 2:2:n); w_odd  = fft_aux(p_odd,  2*omega);
for j = 0 : n-1
w(:, j+1) = w_even(:, mod(j,n/2)+1)
+ multiply_mono(w_odd(:,mod(j,n/2)+1), j*omega);
endfor
endfunction

function w = fft(p, omega)
w = fft_aux(p, omega);
endfunction

function w = fft_inv(p, omega)
n = length(p(1,:));
w = fft_aux(p, (n-1)*omega) ./ n;
endfunction


Notice however, that the implementation provided above is not super efficient since it does data copies, memory allocation and recursive calls. The implementation above was optimized not for speed but for making the algorithm clear.