Extremal rays in families of inequalities (I)

(This is Yan Sheng.)

What does it mean to "properly understand" some mathematical result? For me, it's most satisfying when I can answer the following two questions:

  1. What is the minimal set of special cases that I need to verify to prove it?
  2. How could I have come up with it myself?

In this and the next blog post, I'll describe two different situations in olympiad inequalities that I've tried to understand better recently, by answering the two questions above.


Theorem (Popoviciu 1965): Let $f:[a,b]\to\mathbb R$ be a convex function. Then for any $x,y,z\in[a,b]$, we have
\begin{align*}&f(x)+f(y)+f(z)+3f\left(\frac{x+y+z}3\right)\\&\ge2\left(f\left(\frac{x+y}2\right)+f\left(\frac{y+z}2\right)+f\left(\frac{z+x}2\right)\right).\end{align*}
What an interesting statement! It's not immediately clear how to prove it with Jensen's inequality, and it makes me wonder what other similar inequalities hold for convex functions. Let's try proving it by doing the least amount of work possible: is there a finite set of special cases we can check in order to verify the inequality? (The interested reader is invited to pause here to think about this.)
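Before investing in a proof, it's easy to spot-check the inequality numerically. Here's a quick sketch in Python (the helper name `popoviciu_gap` is mine, and this is a sanity check, not a proof):

```python
import random

def popoviciu_gap(f, x, y, z):
    """LHS minus RHS of Popoviciu's inequality; should be >= 0 for convex f."""
    lhs = f(x) + f(y) + f(z) + 3 * f((x + y + z) / 3)
    rhs = 2 * (f((x + y) / 2) + f((y + z) / 2) + f((z + x) / 2))
    return lhs - rhs

random.seed(0)
for _ in range(1000):
    x, y, z = (random.uniform(-5.0, 5.0) for _ in range(3))
    assert popoviciu_gap(lambda u: u * u, x, y, z) >= -1e-9  # f(u) = u^2 is convex
    assert popoviciu_gap(abs, x, y, z) >= -1e-9              # f(u) = |u| is convex
```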


The key idea here is that instead of varying $x,y,z$, what we should be doing is to fix $x,y,z$, and to vary $f$ instead. In general, given $x_i,a_i\in\mathbb R$, we want a minimal set of conditions which would verify that the inequality
$$\sum_i a_i f(x_i)\ge0\qquad(*)$$
holds for every convex function $f$.

Geometrically, the family of all convex functions $\mathcal C$ is an example of (what is confusingly called) a convex cone, because any nonnegative linear combination of convex functions is also convex; in other words, if the $f_i$ are convex, then for any $a_i\ge0$ the sum $\sum_i a_i f_i$ is also convex. Hence if there is some "basis set" $\mathcal S\subseteq\mathcal C$ whose nonnegative linear combinations generate $\mathcal C$, then it suffices to check $(*)$ for $f\in\mathcal S$.

This suggests that we should be looking at the convex functions which aren't nonnegative linear combinations of other convex functions (except trivially, in terms of multiples of themselves). Geometrically, for any nonzero $f$, we write $\langle f\rangle$ for the ray $\{\lambda f\,:\,\lambda\ge0\}$; a ray $\langle f\rangle$ is called an extremal ray if any line segment in $\mathcal C$ which intersects $\langle f\rangle$ actually lies entirely inside $\langle f\rangle$.

Now we want to take $\mathcal S$ to be one representative from each extremal ray of $\mathcal C$. What are they?

One property that we know about convex functions $f$ is that (essentially) $f''\ge0$; notice that if $f''(x)=0$ at some point $x$, and $f=\sum_i a_i f_i$ is a nonnegative linear combination of convex functions $f_i$, then $f_i''(x)=0$ for all $i$. Hence the extremal convex functions should have $f''(x)=0$ at as many points as possible. If $f''$ is identically 0 then $f$ is an affine linear function; otherwise, if $f''$ is nonzero at a single point then $f$ is piecewise linear in two pieces, for example
$$f_t(x)=(x-t)^+\coloneqq\max(0,x-t).$$
It turns out that these indeed give us enough extremal functions to form $\mathcal S$:

Proposition: If $f:[a,b]\to\mathbb R$ is a convex function, then
$$f(x)=f(a)+f'(a)(x-a)+\int_a^b f''(t)(x-t)^+\,dt.$$
Proof sketch: Since $(x-t)^+$ vanishes for $t>x$, write the integral as
$$\int_a^x\int_t^x f''(t)\,ds\,dt=\int_a^x\int_a^s f''(t)\,dt\,ds=\int_a^x\bigl(f'(s)-f'(a)\bigr)\,ds.\qquad\square$$
Hence every convex function is the "nonnegative linear combination" of functions of the form $(x-t)^+$ and an affine linear function. (The quotation marks are there because we're summing infinitely many of them together, as an integral.)
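Here's a numerical illustration of the proposition (a sketch only; the midpoint-rule integrator and the choice $f=\exp$ on $[0,2]$ are mine):

```python
import math

def reconstruct(f_a, df_a, d2f, a, b, x, n=20000):
    """Approximate f(a) + f'(a)(x - a) + integral over [a,b] of f''(t)(x-t)^+ dt
    by a midpoint Riemann sum."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += d2f(t) * max(0.0, x - t) * h
    return f_a + df_a * (x - a) + total

# f = exp on [0, 2]: f(a) = f'(a) = 1, f'' = exp; the sum should recover f(x).
for x in (0.3, 1.0, 1.7):
    assert abs(reconstruct(1.0, 1.0, math.exp, 0.0, 2.0, x) - math.exp(x)) < 1e-4
```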

Back to the inequality $(*)$, we notice that affine linear functions are both convex and concave, and that $(x-t)^+$ can be written in terms of $\lvert x-t\rvert$. Hence we have the minimal set of conditions as follows:

Theorem: Let $x_i,a_i\in\mathbb R$. Then the inequality
$$\sum_i a_i f(x_i)\ge0\qquad(*)$$
holds for every convex function $f$ if and only if:

  1. $\sum_i a_i=\sum_i a_i x_i=0$; and
  2. $\sum_i a_i\lvert x_i-t\rvert\ge0$ for all $t\in\mathbb R$.

Proof sketch: For any convex function $f$, we have
\begin{align*}f(x)&=f(a)+f'(a)(x-a)+\int_a^b f''(t)(x-t)^+\,dt\\&=f(a)+f'(a)(x-a)+\int_a^b f''(t)\frac{\lvert x-t\rvert+(x-t)}2\,dt\\&=A+Bx+\frac12\int_a^b f''(t)\lvert x-t\rvert\,dt.\end{align*}
Now substitute into $(*)$, and note that every term is nonnegative by conditions 1 and 2. $\square$
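In code, checking the two conditions for the Popoviciu configuration looks like this (a sketch; the helper name is mine, and testing condition 2 only at $t$ equal to the nodes is justified by Exercise 1 below):

```python
import random

def verifies_for_convex(nodes, weights, tol=1e-9):
    """Check conditions 1 and 2 of the theorem. Condition 2 is tested only at
    t equal to each node; since sum(a_i) = 0, the piecewise-linear function
    t -> sum a_i|x_i - t| vanishes for large |t|, so checking its breakpoints suffices."""
    if abs(sum(weights)) > tol:                                   # sum a_i = 0
        return False
    if abs(sum(a * x for a, x in zip(weights, nodes))) > tol:     # sum a_i x_i = 0
        return False
    return all(sum(a * abs(x - t) for a, x in zip(weights, nodes)) >= -tol
               for t in nodes)                                    # condition 2

random.seed(1)
for _ in range(500):
    x, y, z = (random.uniform(-3.0, 3.0) for _ in range(3))
    nodes = [x, y, z, (x + y + z) / 3, (x + y) / 2, (y + z) / 2, (z + x) / 2]
    weights = [1, 1, 1, 3, -2, -2, -2]
    assert verifies_for_convex(nodes, weights)
```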


The astute reader will realise that I've told many lies and half-truths in the previous few paragraphs. Here's a list:

  • In fact, $(x-t)^+$ is not actually an extremal convex function, since you can always add an affine linear function to it; we should really have been working modulo addition by affine linear functions.
  • The fact that convex cones are generated by their extremal rays is false in general; see this math.SE answer. But it worked out in this case, so I guess all's well that ends well...?
  • $f''$ doesn't really exist everywhere, but if you say the magic words "distributional derivative" and wave the magic wand of integration by parts twice (i.e., $\int f''g=\int fg''+\text{boundary terms}$) then you're good to go.

Back to Popoviciu's inequality:
\begin{align*}&f(x)+f(y)+f(z)+3f\left(\frac{x+y+z}3\right)\\&\ge2\left(f\left(\frac{x+y}2\right)+f\left(\frac{y+z}2\right)+f\left(\frac{z+x}2\right)\right).\end{align*}
The first condition of our theorem is easy to check. The second condition, when we absorb $t$ into the variables $x,y,z$, states that
$$\lvert x\rvert+\lvert y\rvert+\lvert z\rvert+\lvert x+y+z\rvert\ge\lvert x+y\rvert+\lvert y+z\rvert+\lvert z+x\rvert,$$
which is known as Hlawka's inequality. (Here are some proofs from Cut The Knot.)
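Hlawka's inequality itself is just as easy to spot-check numerically (again a sanity check, not a proof; `hlawka_gap` is my name):

```python
import random

def hlawka_gap(x, y, z):
    """LHS minus RHS of Hlawka's inequality; should be >= 0 for all reals."""
    return (abs(x) + abs(y) + abs(z) + abs(x + y + z)
            - abs(x + y) - abs(y + z) - abs(z + x))

random.seed(2)
for _ in range(2000):
    x, y, z = (random.uniform(-10.0, 10.0) for _ in range(3))
    assert hlawka_gap(x, y, z) >= -1e-9
```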

So there is a very real sense in which Popoviciu's inequality holds because of Hlawka's inequality, which is just a special case of Popoviciu. More generally, our theorem says roughly that every inequality about absolute values of linear functions in real variables implies an inequality about convex functions.


Exercises:

  1. Prove that in the second condition in the theorem, it suffices to take $t\in\{x_1,\ldots,x_n\}$. (This fulfils the promise of "finitely many checks" at the start of this post, in a sense.)
  2. Let's prove Hlawka's inequality by checking finitely many cases. We may assume that $x\le y\le\frac{x+y+z}3\le z$; putting the variable $t$ back into the inequality, consider the function \begin{align*}\varphi(t)&=\lvert x-t\rvert+\lvert y-t\rvert+\lvert z-t\rvert+3\left\lvert\frac{x+y+z}3-t\right\rvert\\&\quad{}-2\left\lvert\frac{x+y}2-t\right\rvert-2\left\lvert\frac{y+z}2-t\right\rvert-2\left\lvert\frac{z+x}2-t\right\rvert.\end{align*} Which values of $t$ do we need to check to ensure that $\varphi(t)\ge0$ for all $t$ (and in particular $t=0$)? Hint: $\varphi$ is piecewise linear.
  3. Using Hlawka's inequality, prove the weighted version of Popoviciu's inequality: if $\alpha,\beta,\gamma>0$ and $\alpha+\beta+\gamma=1$, then for all convex $f$ we have \begin{align*}&\alpha f(x)+\beta f(y)+\gamma f(z)+f(\alpha x+\beta y+\gamma z)\\&\ge(\alpha+\beta)f\left(\frac{\alpha x+\beta y}{\alpha+\beta}\right)+(\beta+\gamma)f\left(\frac{\beta y+\gamma z}{\beta+\gamma}\right)+(\gamma+\alpha)f\left(\frac{\gamma z+\alpha x}{\gamma+\alpha}\right).\end{align*}
  4. What does the triangle inequality $\lvert x\rvert+\lvert y\rvert\ge\lvert x+y\rvert$ translate to for convex functions? (Also derive the weighted version, like in the previous exercise.)
  5. You might expect that Hlawka's inequality generalises to more variables, where the terms are like inclusion-exclusion: $$\sum_{S\subseteq\{1,\ldots,n\}}(-1)^{\lvert S\rvert-1}\left\lvert\sum_{i\in S}x_i\right\rvert\ge0.$$ Unfortunately, this doesn't work; find a counterexample for $n=4$.
  6. Generalise Hlawka's inequality to $n$ variables (in a different way from the previous exercise, which doesn't work), and deduce the corresponding inequality for convex functions. Check against Popoviciu's original paper.
  7. Given $x_i,a_i\in\mathbb R$, derive a minimal set of conditions which would verify that the inequality $$\sum_i a_i f(x_i)\ge0\qquad(*)$$ holds for every increasing function $f$. Hence prove Schur's inequality: for all $x,y,z\ge0$ and $r>0$, we have $$x^r(x-y)(x-z)+y^r(y-x)(y-z)+z^r(z-x)(z-y)\ge0.$$
  8. In the spirit of proving inequalities from special cases: Suppose that $\lambda_i\in\mathbb R$, and $\ell_i$ are linear polynomials with constant coefficient 0, such that for all $x_1,\ldots,x_n\in\mathbb R$ we have $$\sum_i\lambda_i\lvert\ell_i(x_1,\ldots,x_n)\rvert\ge0.$$ Prove that the same inequality holds for all $x_1,\ldots,x_n\in\mathbb R^d$. (For example, Hlawka's inequality holds in any finite dimensional vector space.)
