The combinatorial Nullstellensatz

I haven’t done a proper math blog post for awhile, and I’ve been meaning to learn more about the polynomial method, so I thought I’d sit down and get to grips with the “combinatorial Nullstellensatz” (see Alon (1999)). This was prompted when I realised that there were a couple of exercises in of Knuth’s books (namely Volume 2 and Volume 4, Fascicle 5) that concern the Nullstellensatz. (I’m working off a preliminary draft of Mathematical Preliminaries Redux (MPR), which was posted to Knuth’s website before the book came out; it is no longer there because the book is now out! I do not know if the exercises (and numbering) in my draft correspond to the exercises in the actual book.) I must confess to having seen the proof of the Nullstellensatz before in Tao and Vu’s Additive Combinatorics which I read (read: skimmed) about three months ago, but I have forgotten it and so this post will see me trying to reconstruct it, with some hints from Knuth.

Update. I have also made a video on this topic. It follows the outline of this blog post pretty closely, but might be more enjoyable for people who prefer listening over reading.

Proof of the theorem

Here is the statement we’re trying to prove (exercise 114 of MPR):

Theorem (The combinatorial Nullstellensatz). Let $k$ be a field and let $f\in k[x_1,\ldots,x_n]$ be a polynomial such that the coefficient of $x_1^{d_1}\cdots x_n^{d_n}$ is nonzero and each term has degree at most $d_1+\cdots +d_n$. If $S_1, \ldots, S_n$ are subsets of $k$ with $|S_j|>d_j$ for every $1\leq j\leq n$ and we pick $X_1,\ldots,X_n$ uniformly (and independently) at random respectively from $S_1,\ldots,S_n$, then

${\bf P}\big\{ f(X_1,\ldots,X_n) \neq0\big\} \geq {(|S_1|-d_1) + \cdots + (|S_n|-d_n) - n+1\over|S_1|\cdots|S_n|}.$

This probability is nonzero by our assumption on the sizes of the $S_j$ and indeed, most applications of the Nullstellensatz simply use the existence of a single non-root of $f$ in $S_1\times\cdots\times S_n$. As a hint, Knuth says to consult exercise 4.6.1.-16, which is found in Volume 2. Reformulated in probabilistic notation to match the notation used in the statement of the Nullstellensatz above, here is its statement and proof:

Lemma. Let $k$ be a field and fix a nonzero multivariate polynomial $f\in k[x_1,\ldots x_n]$. Suppose that the degree of $f$ in the variable $x_j$ is at most $d_j$ and let $S_1,\ldots,S_n$ be subsets of $k$ with $|S_j|\geq d_j$ for all $j$. Then if we sample $X_j$ uniformly (and independently) at random from each $S_j$,

${\bf P}\big\{ f(X_1, \ldots, X_n) \neq 0\big\} \geq {\prod_{j=1}^n (|S_j| - d_j)\over |S_1|\cdots|S_n|}.$

Proof. We induce on $n$. When $n = 1$, a polynomial of degree $d_1$ can have at most $d_1$ roots, so there are at least $|S_1| - d_1$ non-roots in the set $S_1$. Thus the probability of choosing a non-root is at least $(|S_1| - d_1)/|S_1|$. Now let $n>1$ and suppose the lemma holds for $n-1$. Sample $X_1, \ldots, X_{n-1}$ and note that if we let $g(x) = f(X_1,\ldots,X_{n-1},x)$, then there exist polynomials $f_0,\ldots,f_{d_n}\in k[x_1,\ldots,x_{n-1}]$ such that

$g(x) = f_0(X_1,\ldots,X_{n-1}) + x f_1(X_1,\ldots,X_{n-1}) + \cdots + {x}^{d^n} f_{d_n}(X_1,\ldots,X_{n-1}),$

with some $i$ such that $f_i$ not identically zero. By the induction hypothesis, we have

${\bf P}\big\{ f_i(X_1, \ldots, X_{n-1}) \neq 0\big\}\geq{\prod_{j=1}^{n-1} (|S_j|-d_j)\over|S_1|\cdots|S_{n-1}|},$

meaning that $g(x)$ is a nonzero univariate polynomial with degree at most $d_n$. Thus it has at most $d_n$ roots and sampling $X_n$ uniformly from $S_n$, we conclude that

${\bf P}\big\{g(x_n)\neq 0\big\} \geq {\bf P}\big\{ f_i(X_1, \ldots, X_{n-1})\big\} {|S_n| - d_n\over |S_n|} \geq {\prod_{j=1}^n (|S_j| - d_j)\over |S_1|\cdots|S_n|}.$

This completes the induction and proves the lemma.  ∎

The lemma above is a variant of the Schwartz-Zippel lemma. (Knuth cites DeMillo and Lipton, and indeed, Wikipedia says that this lemma was independently discovered around the same time by Schwartz, Zippel, and DeMillo-Lipton; I have included all three references at the bottom of the post.) It quantifies the intuitive fact that with $f$ fixed, as the sizes of the $S_j$ increase, the probability of finding a root of $f$ when picking vectors uniformly at random from $\prod_j S_j$ goes to 0. With this lemma in hand, we can now prove the Nullstellensatz.

Proof of the Nullstellensatz assuming the lemma. The main trouble is that the degree of $f$ in the variable $x_j$ is not $\leq d_j$; it is only guaranteed to be $\leq d_1 + \cdots + d_n$. However, note that the polynomial $f_j(x) = \prod_{s\in S_j} (x-s)$, is of degree $|S_j|$ and is zero on all $s\in S_j$. So modifying $f$ by replacing ${x_j}^{|S_j|}$ by ${x_j}^{|S_j|} - f_j(x_j)$ everywhere does not change its value on the set $S_j$. After doing this for all $j$, the value of $f$ on $S_1\times\cdots\times S_n$ has not changed, but the degree of $f$ in each variable $j$ is at most $|S_j| - 1$. Applying the lemma gives

${\bf P}\big\{ f(X_1, \ldots, X_n) \neq0\big\} \geq {\prod_{j=1}^n (|S_j| - |S_j|+1)\over |S_1|\cdots|S_n|} = {1\over|S_1|\cdots|S_n|}$

and proves the theorem in the case that $|S_j|= d_j+1$ for all $j$. (It also proves existence of a non-root in the general case.)

We need to increase this probability to match the theorem statement in the general case, where $|S_j| > d_j+1$ for at least some $j$. From what we have already shown, we know that taking subsets $S_j’\subseteq S_j$ of size $|S_j| = d_j+1$ for all $j$, there is a non-root of $f$ in $S_1’\times\cdots\times S_n’$. We construct a set $A$ of non-roots as follows. Pick any $S_1’\times\cdots\times S_n’$ with $|S_j’| = d_j+1$ for every $j$. There is a non-root of $f$ in $S_1’\times\cdots\times S_n’$, call it $(a_1,\ldots,a_n)$, so we add this non-root to $A$ and then replace $a_1$ with a new element of $S_1\setminus S_1’$. We can do this this $|S_1| - d_1 - 1$ times for the first coordinate, and, in general, $|S_j|-d_j - 1$ times for the $j$th coordinate. After running out of new elements, we have placed $(|S_1| - d_1) + \cdots + (|S_n| - d_n)-n$ distinct non-roots in $S_1\times \cdots\times S_n$ into $A$, and then we add one more in the final iteration of the algorithm. The size of $A$ is enough to give us a lower bound for the probability of choosing a non-root.  ∎

Subgraphs of multigraphs

The next two exercises in MPR deal with applications of the Nullstellensatz. The first is an easy problem about covering a discrete grid with lines, which we’ll skip here. The second is a bit more complicated:

Proposition. Let $p$ be a prime. Any multigraph on $n$ vertices with more than $(p-1)n$ edges contains a nonempty subgraph in which the degree of every vertex is a multiple of $p$.

Proof. Let $G$ be a multigraph satisfying the hypotheses. We construct a polynomial $f$ over the finite field on $p$ elements with a variable $x_e$ for every edge $e$ in the multigraph. For each vertex $v$ in the multigraph, form a polynomial $f_v = \sum_{v\in e} x_e$, where $x_e$ is represented twice if $e$ is a loop from $v$ to itself. Note that $f_v$ has degree no more than 1 in each variable. Let $x$ be a vector indexed by $e\in E(G)$.

$f(x) = \prod_{v\in V(G)} (1-{f_v}(x)^{p-1}) - \prod_{e\in E(G)} (1-x_e).$

The first term has degree $(p-1)n < |E(G)|$ and the second term has degree $|E(G)|$, so if we let $d_e =1$ for every edge $e$, the whole polynomial has degree $|E(G)| = \sum_{e\in E} d_e$. The coefficient of $\prod_{e\in E(G)} x_e$ is $(-1)^{|E(G)|}$, which is nonzero, so letting $S_e = \{0,1\}$ for all $e$, by the combinatorial Nullstellensatz there is a subset $E’$ of $E(G)$ such that setting $x_e = 1$ for all $e\in E’$ and $x_e = 0$ otherwise, we have $f(x)\neq 0$. Let $G’$ be the subgraph of $G$ defined by taking the same vertex set and the subset $E’$ of edges.

Note that $f_v(x)$ is the degree of the vertex $v$ in the subgraph $G’$, modulo $p$. Picking none of the edges cannot be a solution, since then we would have $f(x) = 1 - 1 = 0$. Picking even one edge causes the second term to be zero, and by Fermat’s little theorem, the first term is nonzero only when $f_v(x) = 0$ for all $v$, meaning that the degree of each vertex $v$ in the nonempty subgraph is a multiple of $p$.  ∎

This exercise is related to a result of Alon, Friedland, and Kalai (1984). Next we will present two classical theorems that can easily be proven using the Nullstellensatz; in fact, Alon included both as examples in his original 1999 paper.

The Cauchy-Davenport theorem

This theorem about sumsets is rather oddly named because A. Cauchy died in 1857 and H. Davenport was only born in 1907. Given two subsets $A$ and $B$ of the finite field with $p$ elements (for a prime $p$), the sumset $A+B$ is the set of all elements of the form $a+b$, where $a\in A$ and $b\in B$. The Cauchy-Davenport theorem gives a lower bound on the size of $A+B$:

Theorem (Cauchy-Davenport). Let $k={\bf F}_p$ be a finite field of prime order. If $A$ and $B$ are two subsets of $k$, then

$|A+B| \geq \min\{p, |A|+|B|-1\}.$

Proof. If $|A|+|B|>p$, then for every $s\in k$, the size of the set $s-B = \{s - b: b\in B\}$ has the same size as $B$, so it must intersect $A$ in some element. This means that $s$ can be written as $a+b$ for some $a\in A$ and $b\in B$; that is, $A+B = F$ and has size $p$. Now assume $|A|+|B|\leq p$ and let $C$ be any subset of $k$ with size $|C| = |A|+|B|-2$. Let $f$ be the bivariate polynomial

$f(x_1,x_2) = \prod_{c\in C} (x_1+x_2-c).$

Let $d_1 = |A|-1$, $d_2 = |B|-1$, and note that the coefficient of ${x_1}^{d_1}{x_2}^{d_2}$ is, by the binomial theorem, ${|A|+|B|-2\choose |A|-1}$, and this is nonzero modulo $p$, since $|A|+|B|-2 < p$. Taking $S_1 = A$ and $S_2 = B$ in the Nullstellensatz, we find that there must be some $a\in A$ and $b\in B$ such that

$f(a,b) = \prod_{c\in C} (a+b-c) \neq 0,$

implying that $a+b\notin C$. Since $C$ was an arbitrary set of size $|A|+|B|-2$ and we found that $A+B\not\subseteq C$, this means that $|A+B|\geq |A|-|B|-1$.  ∎

Chevalley’s theorem

This theorem was proved by C. Chevalley in 1935, and strengthened the same year by E. Warning. The strong version states that given a set of polynomials $\{f_i\}$ over a finite field $k$, if the number of variables is sufficiently large in comparison to the degrees of the polynomials, the set of points $A$ on which every $f_i$ is zero must be a multiple of ${\rm char}\,k$. Chevalley’s original formulation simply stated that if the set $A$ is nonempty, then it must have at least two elements (which follows from the stronger version, since for a finite field $k$, ${\rm char}\,k\geq 2$). This is the version of the theorem we will state and prove.

Theorem (Chevalley). Let $k = {\bf F}_q$ be the finite field on $q$ elements, where $q$ is a prime power, and let $f_1, \ldots, f_m$ be polynomials in $k[x_1,\ldots,x_n]$. If $n > \deg f_1 + \cdots + \deg f_m$ and there exists a point $a = (a_1,\ldots,a_n) \in k^n$ such that $f_i(a) = 0$ for all $f_i$, then there is another point $a’\neq a$ such that $f_i(a’) = 0$ for all $f_i$.

Proof. Let

$g(x_1, \ldots,x_n) = \prod_{i=1}^m (1-f_i(x_1,\ldots,x_n)^{q-1})\quad\hbox{and}\quad h(x_1,\ldots,x_n) = \prod_{i=1}^n\prod_{s\in k\setminus{a_i}} (x_i - s),$

and let

$f(x_1, \ldots,x_n) = g(x_1,\ldots,x_n) - \lambda h(x_1,\ldots,x_n),$

where $\lambda\in k$ is a scalar chosen such that $f(a) = 0$. Since $g(a) = 1$ and and $h(a)\neq 0$, $\lambda\neq 0$ is exactly $h(a)^{-1}$. The total degree of $f$ is $(q-1)n$, since $h$ has exactly this degree and $g$ has degree

$(q-1)(\deg f_1 + \cdots + \deg f_m) < (q-1)n.$

Let $d_i = q-1$ for all $1\leq i\leq n$ and note that the coefficient of $\prod_{i=1}^n {x_i}^{d_i}$ is $-\lambda\neq 0$. By the Nullstellensatz with $S_i$ set to the entire field for all $i$, we find that there must be a point $a’ = (a_1’,\ldots, a_n’)$ with $f(a’)\neq 0$. Of course, $a’$ cannot equal $a$ since $f(a)=0$. But since there exists $i$ for which $a_i’ \neq a_i$, we must have $h(a’) =0$. So $g(a’)\neq 0$, meaning that $f_i(a’)$ is zero for all $i$, since any nonzero element of $k$ equals 1 when taken to the $(q-1)$st power.  ∎

• Noga Alon, “Combinatorial Nullstellensatz”, Combinatorics, Probability, and Computing 8 (1999), 7-29.
• Noga Alon, Shmuel Friedland, and Gil Kalai, “Regular subgraphs of almost regular graphs,” Journal of Combinatorial Theory, Series B 37 (1984), 79-91.
• Richard A. DeMillo and Richard J. Lipton, “A probabilistic remark on algebraic program testing,” Information Processing Letters 7 (1978), 193-195.
• Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, (Addison-Wesley, 1969).
• Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 5: Mathematical Preliminaries Redux; Introduction to Backtracking; Dancing Links, (Addison-Wesley, 2019).
• Jack Schwartz, “Fast probabilistic algorithms for verification of polynomial identities,” Journal of the ACM (1980), 701-717.
• Terence Tao and Van Ha Vu, Additive Combinatorics (Cambridge University Press, 2006).
• Richard Zippel, “Probabilistic algorithms for sparse polynomials,” Lecture Notes in Computer Science 72 (1979), 216-226.