The problem is really simple: given two polynomials $p(x) = \sum_{i=0}^{n-1} p_i x^i$ and $q(x) = \sum_{i=0}^{n-1} q_i x^i$ with integer coefficients, how do we compute the product $r = p \cdot q$? The product is a polynomial of degree $2n-2$ whose coefficients are $r_k = \sum_{i+j=k} p_i q_j$. The definition suggests an algorithm of complexity $O(n^2)$, where $n$ is the number of coefficients.
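The quadratic algorithm suggested by the definition can be sketched in a few lines (a Python illustration; the rest of the post uses Octave):

```python
def naive_multiply(p, q):
    """Schoolbook multiplication: r_k = sum of p_i * q_j over i + j = k."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(naive_multiply([1, 2], [3, 4]))  # [3, 10, 8]
```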

Surprisingly, one can do it a lot faster: in fact, one can calculate the coefficients of $p \cdot q$ in $O(n \log n)$ time if we are allowed to use complex numbers. This is one of those “a-ha” algorithms: the idea is very simple but incredibly ingenious, and a proof that surprising algorithms do exist. The algorithm is based on the following idea: if we know the value of a polynomial of degree $d$ on $d+1$ points, then we can reconstruct it exactly. So, if we can evaluate the polynomials $p$ and $q$ on $2n-1$ points, say $x_0, \ldots, x_{2n-2}$, then we can calculate $r(x_t) = p(x_t) \cdot q(x_t)$ and reconstruct $r$. Notice that once we calculate the values $p(x_t)$ and $q(x_t)$, obtaining the values $r(x_t)$ takes only $O(n)$ operations.
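To see the evaluate-then-interpolate idea in action before any FFT machinery, here is a small Python sketch: multiply two polynomials by evaluating them on enough points and reconstructing the product with Lagrange interpolation (exact arithmetic via `fractions`; the evaluation points are an arbitrary choice here):

```python
from fractions import Fraction

def evaluate(p, x):
    return sum(c * x**k for k, c in enumerate(p))

def interpolate(xs, ys):
    """Lagrange interpolation: coefficients of the unique polynomial
    of degree < len(xs) taking value ys[i] at xs[i]."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]       # running product of (x - xs[j]), j != i
        denom = Fraction(1)
        for j in range(n):
            if j == i:
                continue
            nxt = [Fraction(0)] * (len(basis) + 1)
            for t, bt in enumerate(basis):
                nxt[t + 1] += bt            # contribution of x * bt
                nxt[t] -= xs[j] * bt        # contribution of -xs[j] * bt
            basis = nxt
            denom *= xs[i] - xs[j]
        for t, bt in enumerate(basis):
            coeffs[t] += ys[i] * bt / denom
    return coeffs

# multiply (1 + 2x) and (3 + 4x): evaluate both on 3 points,
# multiply pointwise, then interpolate the degree-2 product
xs = [0, 1, 2]
ys = [evaluate([1, 2], x) * evaluate([3, 4], x) for x in xs]
print([int(c) for c in interpolate(xs, ys)])  # [3, 10, 8]
```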

So far, we have only shifted the complexity around: while the pointwise multiplications take $O(n)$, we still need to evaluate the polynomials on $O(n)$ points and then reconstruct $r$ from the values $r(x_t)$. The idea is to choose the points $x_t$ carefully, in such a way that we can evaluate the polynomials on all of them really fast.

Enter complex numbers! A choice that works is to take $x_j = \omega^j$ for $j = 0, \ldots, n-1$, where $\omega = e^{2\pi i / n}$, i.e., an $n$-th root of unity. And here is the reason why we can multiply polynomials so fast:

$$p(\omega^j) = \sum_{k=0}^{n-1} p_k\, \omega^{jk}$$

In other words, if we see $p = (p_0, \ldots, p_{n-1})$ and $w = (p(\omega^0), \ldots, p(\omega^{n-1}))$ as vectors of dimension $n$, then we can write them as $w = A p$, where $A$ is a matrix with entries $A_{jk} = \omega^{jk}$ for $j, k = 0, \ldots, n-1$. Now, we use the power of recursion. Assuming $n$ is a power of $2$ to simplify things, we first note that:

$$p(\omega^j) = p_{even}(\omega^{2j}) + \omega^j\, p_{odd}(\omega^{2j})$$

since $\omega^2$ is an $(n/2)$-th root of unity. So, if we define $p_{even}(x) = \sum_{k=0}^{n/2-1} p_{2k} x^k$ and $p_{odd}(x) = \sum_{k=0}^{n/2-1} p_{2k+1} x^k$, then we see that we can compute in linear time the coefficients of $Ap$ from the coefficients of $A' p_{even}$ and $A' p_{odd}$, where $A'$ is the analogous $(n/2) \times (n/2)$ matrix.
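The even/odd split identity is easy to check numerically; a small Python sketch with an arbitrary example polynomial:

```python
import cmath

def evaluate(p, x):
    return sum(c * x**k for k, c in enumerate(p))

n = 8
p = [1, 2, 3, 4, 5, 6, 7, 8]      # arbitrary example coefficients
p_even, p_odd = p[0::2], p[1::2]
omega = cmath.exp(2j * cmath.pi / n)
for j in range(n):
    direct = evaluate(p, omega**j)
    split = evaluate(p_even, omega**(2*j)) + omega**j * evaluate(p_odd, omega**(2*j))
    assert abs(direct - split) < 1e-9
print("even/odd split checks out")
```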

So, the problem of computing the coefficients of $Ap$ reduces to two problems of half the size plus $O(n)$ extra computation. This gives a total time of $T(n) = 2\,T(n/2) + O(n) = O(n \log n)$. The algorithm works as follows:

% assumes length of p is a power of 2
function w = fft(p)
  n = length(p);
  omega = exp(2 * pi * i / n);
  if (n == 1)
    w = p;
    return;
  endif
  p_even = p(1:2:n);
  w_even = fft(p_even);
  p_odd = p(2:2:n);
  w_odd = fft(p_odd);
  for j = 0 : n-1
    w(j+1) = w_even(mod(j,n/2)+1) + omega^j * w_odd(mod(j,n/2)+1);
  endfor
endfunction

We call the method fft since this transformation is also known as the Fast Fourier Transform. We are almost there: we know how to compute $w_p = A p$ and $w_q = A q$, and therefore $w_r = (r(\omega^0), \ldots, r(\omega^{n-1}))$ by pointwise multiplication. We only need to get the coefficients of $r$ back. Notice that $w_r = A r$, so we only need to solve this linear system.

How to do it? Another “a-ha”: the inverse matrix has entries $(A^{-1})_{jk} = \frac{1}{n}\,\omega^{-jk}$ for $j, k = 0, \ldots, n-1$. The reason for that is: $\sum_{t=0}^{n-1} \omega^{jt}\,\frac{1}{n}\,\omega^{-tk} = \frac{1}{n}\sum_{t=0}^{n-1} \omega^{(j-k)t}$. If $j = k$, then this is clearly $1$. If $j \neq k$, this is zero, since it is the sum of all the powers of the nontrivial root of unity $\omega^{j-k}$, which vanishes as a geometric series. Therefore, we can calculate $A$ and its inverse in a very easy way:

function w = fft_aux(p, sgn)
  n = length(p);
  omega = exp(2 * pi * i * sgn / n);
  if (n == 1)
    w = p;
    return;
  endif
  p_even = p(1:2:n);
  w_even = fft_aux(p_even, sgn);
  p_odd = p(2:2:n);
  w_odd = fft_aux(p_odd, sgn);
  for j = 0 : n-1
    w(j+1) = w_even(mod(j,n/2)+1) + omega^j * w_odd(mod(j,n/2)+1);
  endfor
endfunction

function w = fft(p)
  w = fft_aux(p, +1);
endfunction

function w = fft_inv(p)
  w = fft_aux(p, -1) ./ length(p);
endfunction
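A direct Python transliteration of `fft_aux` / `fft_inv` (a sketch, using recursion and list slices just like the Octave version) makes the forward/inverse round-trip easy to test:

```python
import cmath

def fft(p, sgn=1):
    """Recursive radix-2 FFT; sgn=+1 is the forward transform,
    sgn=-1 the unscaled inverse."""
    n = len(p)
    if n == 1:
        return list(p)
    w_even = fft(p[0::2], sgn)
    w_odd = fft(p[1::2], sgn)
    omega = cmath.exp(2j * cmath.pi * sgn / n)
    return [w_even[j % (n // 2)] + omega**j * w_odd[j % (n // 2)]
            for j in range(n)]

def fft_inv(w):
    n = len(w)
    return [x / n for x in fft(w, -1)]

back = fft_inv(fft([1, 2, 3, 4]))
print([round(x.real) for x in back])  # [1, 2, 3, 4]
```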

Now, we have all the pieces to multiply two polynomials:

function r = multiply_poly(p, q)
  n = length(p) + length(q) - 1;
  p = expand_to_power_of_2(p, n);
  q = expand_to_power_of_2(q, n);
  r = fft_inv(fft(p) .* fft(q));
  r = resize(r, 1, n);
endfunction

% pads p with zeros until it has length equal to
% the first power of 2 greater or equal n
function p = expand_to_power_of_2(p, n)
  plen = 1;
  while (plen < n)
    plen = plen * 2;
  endwhile
  p = resize(p, 1, plen);
endfunction
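The same pipeline in Python, as one self-contained sketch mirroring `multiply_poly` above (with a final rounding step to recover the integer coefficients from the floating-point values):

```python
import cmath

def fft(p, sgn=1):
    n = len(p)
    if n == 1:
        return list(p)
    w_even, w_odd = fft(p[0::2], sgn), fft(p[1::2], sgn)
    omega = cmath.exp(2j * cmath.pi * sgn / n)
    return [w_even[j % (n // 2)] + omega**j * w_odd[j % (n // 2)]
            for j in range(n)]

def multiply_poly(p, q):
    n = len(p) + len(q) - 1
    size = 1
    while size < n:            # pad up to the next power of 2
        size *= 2
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    w_r = [a * b for a, b in zip(fft(p), fft(q))]
    r = [x / size for x in fft(w_r, -1)]   # inverse FFT
    return [round(x.real) for x in r[:n]]

print(multiply_poly([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```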

And now we are done! We multiplied two polynomials in time $O(n \log n)$. Even though our initial objective was to multiply two polynomials with integer coefficients, we had to leave the ring of integers, go to the field of complex numbers, multiply things over there, and then come back to the ring of integers. Such things happen all the time in algebra: the answer to a question involves leaving the space in which the question was asked, solving it in a larger space, and coming back in the end. But here we ask: can we multiply two polynomials doing only operations over the ring of integers?

Schönhage and Strassen give an algorithm that runs in time $O(n \log n \log \log n)$ and does exactly that (this was later improved here and here). The idea is to mimic the FFT algorithm without using complex numbers. What we needed was an element $\omega$ with three properties: (i) $\omega^n = 1$, (ii) $\sum_{j=0}^{n-1} \omega^{jt} = 0$ for $0 < t < n$, and (iii) $1, \omega, \omega^2, \ldots, \omega^{n-1}$ are distinct elements.

The idea of the algorithm is to work with some sort of symbolic root of unity that has the properties listed above. Let's begin by fixing some notation: let $\mathbb{Z}[x]$ denote the ring of polynomials over the integers, which is just another name for the set of polynomials with integer coefficients. Now, fixed a certain polynomial $f \in \mathbb{Z}[x]$, we define the ring of residues modulo $f$, written $\mathbb{Z}[x]/(f)$, as the residues we get from dividing a polynomial in $\mathbb{Z}[x]$ by $f$. More formally, associate with each $p \in \mathbb{Z}[x]$ the set $\bar{p} = \{ p + g f : g \in \mathbb{Z}[x] \}$; then $\mathbb{Z}[x]/(f) = \{ \bar{p} : p \in \mathbb{Z}[x] \}$.

This is just some formalism; here we are only concerned with two such rings: (i) $\mathbb{Z}[x]/(x^n - 1)$. This is the ring in which $x^n = 1$. This means that with every $p = \sum_i p_i x^i$ we associate the polynomial $\sum_{t=0}^{n-1} \big( \sum_{i \equiv t \bmod n} p_i \big) x^t$, obtained by replacing each $x^{n+t}$ with $x^t$. (ii) $\mathbb{Z}[x]/(x^n + 1)$. This is the ring in which $x^n = -1$. This means that with every $p = \sum_i p_i x^i$ we associate the polynomial $\sum_{t=0}^{n-1} \big( \sum_{i \equiv t \bmod n} (-1)^{\lfloor i/n \rfloor} p_i \big) x^t$, since each wrap-around now flips the sign.
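Concretely, reduction in these two rings is just folding the high coefficients back with a sign; a small Python sketch (`sign=+1` for $x^n - 1$, `sign=-1` for $x^n + 1$):

```python
def reduce_mod(p, n, sign):
    """Reduce p modulo x^n - 1 (sign=+1) or x^n + 1 (sign=-1):
    each x^(n+t) wraps around to sign * x^t."""
    r = [0] * n
    for i, c in enumerate(p):
        r[i % n] += c * sign**(i // n)
    return r

# p = 1 + 2x + 3x^2 + 4x^3 + 5x^4, reduced with n = 2
print(reduce_mod([1, 2, 3, 4, 5], 2, +1))  # [9, 6]   since x^2 = 1
print(reduce_mod([1, 2, 3, 4, 5], 2, -1))  # [3, -2]  since x^2 = -1
```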

The great thing is that if instead of $x^n - 1$ we work modulo $x^n + 1$, then we get an element like $\omega$ for free, which is $x$ itself. Notice that $x^{2n} = 1$, the powers $1, x, \ldots, x^{2n-1}$ are distinct, and $\sum_{j=0}^{2n-1} x^{jt} = 0$ for $0 < t < 2n$, since $x^n = -1$. This is essentially the main trick: we will do arithmetic on $\mathbb{Z}[x]/(x^n + 1)$, doing FFT and all, with powers of $x$ playing the role of the roots of unity. Now comes the last piece of the puzzle: notice that we want to compute $r = p \cdot q$ where $p, q \in \mathbb{Z}[x]/(x^n + 1)$ with $n = 2^k$. The first step is to rewrite $p$ and $q$ as polynomials with coefficients in $\mathbb{Z}[y]/(y^{2K} + 1)$ for some $K$. We will do it as follows: let $L = 2^{\lfloor (k-1)/2 \rfloor}$ and $K = 2^{k-1}/L$, and write $p(x) = \sum_{i=0}^{2L-1} \bar{p}_i(x)\, x^{Ki}$, where we think of $\bar{p}_i = \sum_{j=0}^{K-1} p_{Ki+j} x^j$ as a polynomial in $\mathbb{Z}[y]/(y^{2K} + 1)$ (with $y$ in place of $x$). Now, we need an element in $\mathbb{Z}[y]/(y^{2K} + 1)$ that will serve as a $2L$-th root of unity (since this is the number of coefficients of the polynomials, over $\mathbb{Z}[y]/(y^{2K}+1)$, that we are trying to multiply). Notice that $y^{4K} = 1$. If $K = L$, then $y^2$ is a $2L$-th root of unity. If $K = 2L$, then $y^4$ is a $2L$-th root of unity. This makes us almost ready to apply the FFT. There is only one catch left, and it is very subtle.
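The three properties of $x$ in $\mathbb{Z}[x]/(x^n + 1)$ can be checked with a few lines of Python, representing multiplication by $x^t$ as a negacyclic shift (the analogue of `multiply_mono` in the Octave code further down):

```python
def mono_mul(p, t):
    """Multiply p by x^t in Z[x]/(x^n + 1): a negacyclic shift by t."""
    n = len(p)
    r = [0] * n
    for j, c in enumerate(p):
        r[(j + t) % n] += c * (-1)**((j + t) // n)
    return r

n = 4
one = [1, 0, 0, 0]
print(mono_mul(one, n))      # [-1, 0, 0, 0]  since x^n = -1
print(mono_mul(one, 2 * n))  # [1, 0, 0, 0]   since x^(2n) = 1

# the powers x^0, ..., x^(2n-1) sum to zero
total = [0] * n
for j in range(2 * n):
    total = [a + b for a, b in zip(total, mono_mul(one, j))]
print(total)  # [0, 0, 0, 0]
```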

A brief intermission: given $t \in \mathbb{Z}$, there is the evaluation map $\phi_t : \mathbb{Z}[x] \rightarrow \mathbb{Z}$ that maps $p \mapsto p(t)$, simply by evaluating the polynomial. Now, say we want to extend this map to $\mathbb{Z}[x]/(f)$. The elements of $\mathbb{Z}[x]/(f)$ are (infinite) sets of polynomials which are equivalent modulo $f$. If all the polynomials in an equivalence class evaluate to the same value, then this defines a valid evaluation map. It is simple to see that this happens iff $t$ is a root of $f$, i.e., $f(t) = 0$. In such a case, the evaluation map extends naturally to $\bar{\phi}_t : \mathbb{Z}[x]/(f) \rightarrow \mathbb{Z}$. If $t$ is not a root, then there is no evaluation map $\bar{\phi}_t$ such that $\bar{\phi}_t \circ \pi = \phi_t$, where $\pi$ is the projection $\pi : \mathbb{Z}[x] \rightarrow \mathbb{Z}[x]/(f)$.

We are looking at our polynomials as elements of $R[z]/(z^{2L} + 1)$, where $R = \mathbb{Z}[y]/(y^{2K} + 1)$. So each element is in fact a set of polynomials, i.e., all polynomials equivalent modulo $z^{2L} + 1$. If we want to evaluate them at a certain point, we must make sure all polynomials in the class evaluate to the same value. Here is how to do it: define $\psi = y^{K/L}$, i.e., $\psi = y$ if $K = L$ and $\psi = y^2$ if $K = 2L$, so that $\psi^{2L} = y^{2K} = -1$ and $\omega = \psi^2$. Then the map $p(z) \mapsto p(\psi z)$ maps from $R[z]/(z^{2L} + 1)$ to $R[z]/(z^{2L} - 1)$, in which every power $\omega^j$ is a root, so evaluation is well defined. So, now we are fine. Notice that we can also map back using the inverse $p(z) \mapsto p(\psi^{-1} z)$. This point is really subtle, and it was the one that took me more time to understand. Once we understand why this needs to be done, things get really easy: we just need to do the FFT as before, in other words, evaluate at $1, \omega, \omega^2, \ldots, \omega^{2L-1}$ and then reconstruct using the inverse FFT. The code in Octave is below:
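As a sanity check of the $\psi$-twist, independent of the Octave code, here is a small Python sketch with a complex $\psi$ standing in for the symbolic one: weighting the coefficients by powers of $\psi$ turns a negacyclic convolution (multiplication modulo $z^n + 1$) into a cyclic one (modulo $z^n - 1$), and unweighting maps the result back:

```python
import cmath

def cyclic_conv(p, q):
    """Multiplication modulo z^n - 1."""
    n = len(p)
    r = [0] * n
    for i in range(n):
        for j in range(n):
            r[(i + j) % n] += p[i] * q[j]
    return r

def negacyclic_conv(p, q):
    """Multiplication modulo z^n + 1."""
    n = len(p)
    r = [0] * n
    for i in range(n):
        for j in range(n):
            r[(i + j) % n] += p[i] * q[j] * (-1)**((i + j) // n)
    return r

n = 4
psi = cmath.exp(1j * cmath.pi / n)   # psi^n = -1
p, q = [1, 2, 3, 4], [5, 6, 7, 8]

w_p = [p[i] * psi**i for i in range(n)]   # p(z) |--> p(psi * z)
w_q = [q[i] * psi**i for i in range(n)]
w_r = cyclic_conv(w_p, w_q)
r = [w_r[i] / psi**i for i in range(n)]   # map back with psi^(-1)

print([round(x.real) for x in r])     # same as negacyclic_conv(p, q)
print(negacyclic_conv(p, q))          # [-56, -36, 2, 60]
```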

function r = multiply_poly_in_Zxn(p, q, k)
  if (k <= 2)
    r = naive_multiply_poly(p, q);
    return;
  endif
  L = 2^floor((k-1)/2);
  K = 2^(k-1) / L;
  psi = 2;
  if (K == L)
    psi = 1;
  endif
  % map Z[x] --> (Z_{2K})_{2L}
  p = [ reshape(p, K, 2*L); zeros(K, 2*L) ];
  q = [ reshape(q, K, 2*L); zeros(K, 2*L) ];
  for i = 0 : 2*L - 1
    % map Z_{2L}[y] / (y^k+1) --> Z_{2L}[y] / (y^k - 1)
    % implemented via p(y) |--> p(psi*y)
    p(:,i+1) = multiply_mono(p(:,i+1), i*psi);
    q(:,i+1) = multiply_mono(q(:,i+1), i*psi);
  endfor
  w_p = fft(p, 2*psi);
  w_q = fft(q, 2*psi);
  w_r = [];
  for i = 1 : 2*L
    w_r(:,i) = multiply_poly_in_Zxn(w_p(:,i), w_q(:,i), (k-1)-floor((k-1)/2)+1);
  endfor
  r = fft_inv(w_r, 2*psi);
  rr = [];
  for i = 0 : 2*L - 1
    % back to Z_{2L}[y] / (y^{2L} + 1)
    rr(:,i+1) = multiply_mono(r(:,i+1), i*(4*K-psi));
  endfor
  r = rr;
  r1 = reshape(r(:,1:2:2*L), 2*K*L, 1);
  r2 = reshape(r(:,2:2:2*L), 2*K*L, 1);
  r2 = [-r2(2*K*L-K+1:2*K*L); r2(1:2*K*L-K)];
  r = r1 + r2;
endfunction

The auxiliary methods are below:

function r = naive_multiply_poly(p, q)
  n = length(p);
  r = zeros(n, 1);
  for i = 0 : n-1
    for j = 0 : n-1
      r(1+mod(i+j,n)) = r(1+mod(i+j,n)) + (-1)^(floor((i+j)/n)) * p(1+i) * q(1+j);
    endfor
  endfor
endfunction

function r = multiply_mono(p, t)
  n = length(p);
  r = zeros(n, 1);
  for j = 0 : n-1
    r(1+mod(j+t,n)) += p(j+1) * (-1)^(floor((j+t)/n));
  endfor
endfunction

function w = fft_aux(p, omega)
  n = length(p(1,:));
  if (n == 1)
    w = p;
    return;
  endif
  p_even = p(:, 1:2:n);
  w_even = fft_aux(p_even, 2*omega);
  p_odd = p(:, 2:2:n);
  w_odd = fft_aux(p_odd, 2*omega);
  for j = 0 : n-1
    w(:, j+1) = w_even(:, mod(j,n/2)+1) + multiply_mono(w_odd(:, mod(j,n/2)+1), j*omega);
  endfor
endfunction

function w = fft(p, omega)
  w = fft_aux(p, omega);
endfunction

function w = fft_inv(p, omega)
  n = length(p(1,:));
  w = fft_aux(p, (n-1)*omega) ./ n;
endfunction

Notice, however, that the implementation provided above is not particularly efficient, since it does a lot of data copies, memory allocation, and recursive calls. It was optimized not for speed but for making the algorithm clear.