Extremal rays in families of inequalities (I)

(This is Yan Sheng.)

What does it mean to "properly understand" some mathematical result? For me, it's most satisfying when I can answer the following two questions:

  1. What is the minimal set of special cases that I need to verify to prove it?
  2. How could I have come up with it myself?

In this and the next blog post, I'll describe two different situations in olympiad inequalities that I've tried to understand better recently, by answering the two questions above.


Theorem (Popoviciu 1965): Let $f:[a,b]\to\mathbb R$ be a convex function. Then for any $x,y,z\in[a,b]$, we have
\begin{align*}&f(x)+f(y)+f(z)+3f\left(\frac{x+y+z}3\right)\\&\ge2\left(f\left(\frac{x+y}2\right)+f\left(\frac{y+z}2\right)+f\left(\frac{z+x}2\right)\right).\end{align*}
What an interesting statement! It's not immediately clear how to prove it with Jensen's inequality, and it makes me wonder what other similar inequalities hold for convex functions. Let's try proving it by doing the least amount of work possible: is there a finite set of special cases we can check in order to verify the inequality? (The interested reader is invited to pause here to think about this.)
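Before investing in a proof, it's easy to spot-check the inequality numerically. Here's a quick sketch in Python (the helper name `popoviciu_gap` is mine, and this is a sanity check, not a proof):

```python
import random

def popoviciu_gap(f, x, y, z):
    """LHS minus RHS of Popoviciu's inequality; should be >= 0 for convex f."""
    lhs = f(x) + f(y) + f(z) + 3 * f((x + y + z) / 3)
    rhs = 2 * (f((x + y) / 2) + f((y + z) / 2) + f((z + x) / 2))
    return lhs - rhs

random.seed(0)
for _ in range(1000):
    x, y, z = (random.uniform(-5.0, 5.0) for _ in range(3))
    assert popoviciu_gap(lambda u: u * u, x, y, z) >= -1e-9  # f(u) = u^2 is convex
    assert popoviciu_gap(abs, x, y, z) >= -1e-9              # f(u) = |u| is convex
```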


The key idea here is that instead of varying $x,y,z$, what we should be doing is to fix $x,y,z$, and to vary $f$ instead. In general, given $x_i,a_i\in\mathbb R$, we want a minimal set of conditions which would verify that the inequality
$$\sum_i a_i f(x_i)\ge0\qquad(*)$$
holds for every convex function $f$.

Geometrically, the family of all convex functions $\mathcal C$ is an example of (what is confusingly called) a convex cone, because any nonnegative linear combination of convex functions is also convex; in other words, if the $f_i$ are convex, then for any $a_i\ge0$ the sum $\sum_i a_i f_i$ is also convex. Hence if there is some "basis set" $\mathcal S\subseteq\mathcal C$ whose nonnegative linear combinations generate $\mathcal C$, then it suffices to check $(*)$ for $f\in\mathcal S$.

This suggests that we should be looking at the convex functions which aren't nonnegative linear combinations of other convex functions (except trivially, in terms of multiples of themselves). Geometrically, for any nonzero $f$, we write $\langle f\rangle$ for the ray $\{\lambda f\,:\,\lambda\ge0\}$; a ray $\langle f\rangle$ is called an extremal ray if any line segment in $\mathcal C$ which intersects $\langle f\rangle$ actually lies entirely inside $\langle f\rangle$.

Now we want to take $\mathcal S$ to be one representative from each extremal ray of $\mathcal C$. What are they?

One property that we know about convex functions $f$ is that (essentially) $f''\ge0$; notice that if $f''(x)=0$ at some point $x$, and $f=\sum_i a_i f_i$ is a nonnegative linear combination of convex functions $f_i$, then $f_i''(x)=0$ for all $i$. Hence the extremal convex functions should have $f''(x)=0$ at as many points as possible. If $f''$ is identically 0 then $f$ is an affine linear function; otherwise, if $f''$ is nonzero at a single point then $f$ is piecewise linear in two pieces, for example
$$f_t(x)=(x-t)^+\coloneqq\max(0,x-t).$$
It turns out that these indeed give us enough extremal functions to form $\mathcal S$:

Proposition: If $f:[a,b]\to\mathbb R$ is a convex function, then
$$f(x)=f(a)+f'(a)(x-a)+\int_a^b f''(t)(x-t)^+\,dt.$$
Proof sketch: Since $(x-t)^+$ vanishes for $t>x$, write the integral as
$$\int_a^x\int_t^x f''(t)\,ds\,dt=\int_a^x\int_a^s f''(t)\,dt\,ds=\int_a^x\bigl(f'(s)-f'(a)\bigr)\,ds.\qquad\square$$
Hence every convex function is the "nonnegative linear combination" of functions of the form $(x-t)^+$ and an affine linear function. (The quotation marks are there because we're summing infinitely many of them together, as an integral.)
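Here's a numerical illustration of the proposition (a sketch only; the midpoint-rule integrator and the choice $f=\exp$ on $[0,2]$ are mine):

```python
import math

def reconstruct(f_a, df_a, d2f, a, b, x, n=20000):
    """Approximate f(a) + f'(a)(x - a) + integral over [a,b] of f''(t)(x-t)^+ dt
    by a midpoint Riemann sum."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += d2f(t) * max(0.0, x - t) * h
    return f_a + df_a * (x - a) + total

# f = exp on [0, 2]: f(a) = f'(a) = 1, f'' = exp; the sum should recover f(x).
for x in (0.3, 1.0, 1.7):
    assert abs(reconstruct(1.0, 1.0, math.exp, 0.0, 2.0, x) - math.exp(x)) < 1e-4
```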

Back to the inequality $(*)$, we notice that affine linear functions are both convex and concave, and that $(x-t)^+$ can be written in terms of $\lvert x-t\rvert$. Hence we have the minimal set of conditions as follows:

Theorem: Let $x_i,a_i\in\mathbb R$. Then the inequality
$$\sum_i a_i f(x_i)\ge0\qquad(*)$$
holds for every convex function $f$ if and only if:

  1. $\sum_i a_i=\sum_i a_i x_i=0$; and
  2. $\sum_i a_i\lvert x_i-t\rvert\ge0$ for all $t\in\mathbb R$.

Proof sketch: For any convex function $f$, we have
\begin{align*}f(x)&=f(a)+f'(a)(x-a)+\int_a^b f''(t)(x-t)^+\,dt\\&=f(a)+f'(a)(x-a)+\int_a^b f''(t)\frac{\lvert x-t\rvert+(x-t)}2\,dt\\&=A+Bx+\frac12\int_a^b f''(t)\lvert x-t\rvert\,dt.\end{align*}
Now substitute into $(*)$, and note that every term is nonnegative by conditions 1 and 2. $\square$
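In code, checking the two conditions for the Popoviciu configuration looks like this (a sketch; the helper name is mine, and testing condition 2 only at $t$ equal to the nodes is justified by Exercise 1 below):

```python
import random

def verifies_for_convex(nodes, weights, tol=1e-9):
    """Check conditions 1 and 2 of the theorem. Condition 2 is tested only at
    t equal to each node; since sum(a_i) = 0, the piecewise-linear function
    t -> sum a_i|x_i - t| vanishes for large |t|, so checking its breakpoints suffices."""
    if abs(sum(weights)) > tol:                                   # sum a_i = 0
        return False
    if abs(sum(a * x for a, x in zip(weights, nodes))) > tol:     # sum a_i x_i = 0
        return False
    return all(sum(a * abs(x - t) for a, x in zip(weights, nodes)) >= -tol
               for t in nodes)                                    # condition 2

random.seed(1)
for _ in range(500):
    x, y, z = (random.uniform(-3.0, 3.0) for _ in range(3))
    nodes = [x, y, z, (x + y + z) / 3, (x + y) / 2, (y + z) / 2, (z + x) / 2]
    weights = [1, 1, 1, 3, -2, -2, -2]
    assert verifies_for_convex(nodes, weights)
```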


The astute reader will realise that I've told many lies and half-truths in the previous few paragraphs. Here's a list:

  • In fact, $(x-t)^+$ is not actually an extremal convex function, since you can always add an affine linear function to it; we should really have been working modulo addition by affine linear functions.
  • The fact that convex cones are generated by their extremal rays is false in general; see this math.SE answer. But it worked out in this case, so I guess all's well that ends well...?
  • $f''$ doesn't really exist everywhere, but if you say the magic words "distributional derivative" and wave the magic wand of integration by parts twice (i.e., $\int f''g=\int fg''+\text{boundary terms}$) then you're good to go.

Back to Popoviciu's inequality:
\begin{align*}&f(x)+f(y)+f(z)+3f\left(\frac{x+y+z}3\right)\\&\ge2\left(f\left(\frac{x+y}2\right)+f\left(\frac{y+z}2\right)+f\left(\frac{z+x}2\right)\right).\end{align*}
The first condition of our theorem is easy to check. The second condition, when we absorb $t$ into the variables $x,y,z$, states that
$$\lvert x\rvert+\lvert y\rvert+\lvert z\rvert+\lvert x+y+z\rvert\ge\lvert x+y\rvert+\lvert y+z\rvert+\lvert z+x\rvert,$$
which is known as Hlawka's inequality. (Here are some proofs from Cut The Knot.)
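Hlawka's inequality itself is just as easy to spot-check numerically (again a sanity check, not a proof; `hlawka_gap` is my name):

```python
import random

def hlawka_gap(x, y, z):
    """LHS minus RHS of Hlawka's inequality; should be >= 0 for all reals."""
    return (abs(x) + abs(y) + abs(z) + abs(x + y + z)
            - abs(x + y) - abs(y + z) - abs(z + x))

random.seed(2)
for _ in range(2000):
    x, y, z = (random.uniform(-10.0, 10.0) for _ in range(3))
    assert hlawka_gap(x, y, z) >= -1e-9
```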

So there is a very real sense in which Popoviciu's inequality holds because of Hlawka's inequality, which is just a special case of Popoviciu. More generally, our theorem says roughly that every inequality about absolute values of linear functions in real variables implies an inequality about convex functions.


Exercises:

  1. Prove that in the second condition in the theorem, it suffices to take $t\in\{x_1,\ldots,x_n\}$. (This fulfils the promise of "finitely many checks" at the start of this post, in a sense.)
  2. Let's prove Hlawka's inequality by checking finitely many cases. We may assume that $x\le y\le\frac{x+y+z}3\le z$; putting the variable $t$ back into the inequality, consider the function \begin{align*}\varphi(t)&=\lvert x-t\rvert+\lvert y-t\rvert+\lvert z-t\rvert+3\left\lvert\frac{x+y+z}3-t\right\rvert\\&\quad{}-2\left\lvert\frac{x+y}2-t\right\rvert-2\left\lvert\frac{y+z}2-t\right\rvert-2\left\lvert\frac{z+x}2-t\right\rvert.\end{align*} Which values of $t$ do we need to check to ensure that $\varphi(t)\ge0$ for all $t$ (and in particular $t=0$)? Hint: $\varphi$ is piecewise linear.
  3. Using Hlawka's inequality, prove the weighted version of Popoviciu's inequality: if $\alpha,\beta,\gamma>0$ and $\alpha+\beta+\gamma=1$, then for all convex $f$ we have \begin{align*}&\alpha f(x)+\beta f(y)+\gamma f(z)+f(\alpha x+\beta y+\gamma z)\\&\ge(\alpha+\beta)f\left(\frac{\alpha x+\beta y}{\alpha+\beta}\right)+(\beta+\gamma)f\left(\frac{\beta y+\gamma z}{\beta+\gamma}\right)+(\gamma+\alpha)f\left(\frac{\gamma z+\alpha x}{\gamma+\alpha}\right).\end{align*}
  4. What does the triangle inequality $\lvert x\rvert+\lvert y\rvert\ge\lvert x+y\rvert$ translate to for convex functions? (Also derive the weighted version, like in the previous exercise.)
  5. You might expect that Hlawka's inequality generalises to more variables, where the terms are like inclusion-exclusion: $$\sum_{S\subseteq\{1,\ldots,n\}}(-1)^{\lvert S\rvert-1}\left\lvert\sum_{i\in S}x_i\right\rvert\ge0.$$ Unfortunately, this doesn't work; find a counterexample for $n=4$.
  6. Generalise Hlawka's inequality to $n$ variables (in a different way from the previous exercise, which doesn't work), and deduce the corresponding inequality for convex functions. Check against Popoviciu's original paper.
  7. Given $x_i,a_i\in\mathbb R$, derive a minimal set of conditions which would verify that the inequality $$\sum_i a_i f(x_i)\ge0\qquad(*)$$ holds for every increasing function $f$. Hence prove Schur's inequality: for all $x,y,z\ge0$ and $r>0$, we have $$x^r(x-y)(x-z)+y^r(y-x)(y-z)+z^r(z-x)(z-y)\ge0.$$
  8. In the spirit of proving inequalities from special cases: Suppose that $\lambda_i\in\mathbb R$, and $\ell_i$ are linear polynomials with constant coefficient 0, such that for all $x_1,\ldots,x_n\in\mathbb R$ we have $$\sum_i\lambda_i\lvert\ell_i(x_1,\ldots,x_n)\rvert\ge0.$$ Prove that the same inequality holds for all $x_1,\ldots,x_n\in\mathbb R^d$. (For example, Hlawka's inequality holds in any finite dimensional vector space.)
