\documentstyle[IJFCS]{article}
\input{amssym.def}
\begin{document}
%% print out the publisher copyright heading
\copyrightheading
%% use symbolic footnote
\symbolfootnote
%% use normal text like skip (13pt)
\textlineskip
\begin{center}
%% print out titles in IJFCS format
\fcstitle{DEPTH AND PATH LENGTH OF HEAP ORDERED TREES}
\vspace{24pt}
{\authorfont HELMUT PRODINGER}
\vspace{2pt}
%% use smaller line skip here
\smalllineskip
{\addressfont
Institut f\"ur Algebra und Diskrete Mathematik,
Technical University of Vienna,\\
Wiedner Hauptstrasse 8--10, A-1040 Vienna, Austria,\\
{\tt Email:\,Helmut.Prodinger@tuwien.ac.at}
}
\vspace{20pt}
%% authors need not care about this
\publisher{17 October 1995}{24 April 1996}{R. L. Graham}
\end{center}
%\alphfootnote
%% abstract environment
\begin{abstract}
A heap ordered tree of size $n$
is a planted plane tree
together with a bijection from the nodes to the set $\{1,\dots,n\}$
which is monotonically increasing when going from the root to the leaves.
In a recent paper by Chen and Ni, the expectation and the variance of the depth of
a random node in a random heap ordered tree of size $n$
was considered. Here, we give
additional results concerning level polynomials. From a historical point
of view, we trace these trees back to an earlier paper by Prodinger
and Urbanek and find formul\ae\ that are proved in the paper by Chen and Ni
by ad hoc computations already
in a classic book by Greene and Knuth.
Also, we mention that a paper by Bergeron, Flajolet and Salvy develops a more general
theory of increasing trees, based on simply generated families of trees.
Furthermore we consider the path length
which is a natural concept when dealing with the depth.
Expectation and variance are obtained, both explicitly and asymptotically.
\keywords{Increasing trees, depth, path length, Catalan numbers.}
\end{abstract}
\textlineskip
\def\({\left(}
\def\){\right)}
\def\ca{{\cal{ C}}}
\def\binom#1#2{{#1\choose#2}}
\def\mult{\binom{n}{j_1,\dots,j_k}}
\def\frac#1#2{{#1\over#2}}
\section{Introduction}
A {\sl heap ordered tree\/} with $n$ nodes (``size $n$'')
might be described as a {\sl planted plane tree\/}
together with a bijection from the nodes to the set $\{1,\dots,n\}$
which is {\sl monotonically increasing\/}
when going from the root to the leaves.
In Ref.~[\refcite{ChenNi94}], the depth of a random node in a random heap ordered tree was
considered. Here, we want to give a streamlined derivation of these
results,
together with new ones about the coefficients of the
{\sl level polynomials}.
Then, we turn to the {\sl path length}, which is the sum of the depths of
all nodes in the heap ordered tree. Path length is the natural concept when
analyzing search costs etc. (Compare Ref.~[\refcite{Knuth73}]).
Finally we would like to mention, that heap ordered trees appear already in Ref.~[\refcite{ProUrb83}]
where monotone labelings of trees where studied in general.
Things that are not explained here in detail are standard and can be found
%%DT for instance in the textbooks\cite{Knuth73} and\cite{Mahmoud92}.
for instance in the textbooks.\cite{Knuth73,Mahmoud92}
\medskip
We would like to emphasize that the present note has a strong tutorial
flavor as it emphasizes how heap ordered trees and related parameters should
be analyzed in a most efficient, direct, and natural way.
Especially we want to draw the reader's attention to the fact that
the important paper\cite{BeFlSa92} develops a general theory of
{\sl increasing trees},
based on {\sl simply generated families of trees\/}, where
planted plane trees are just a special case. Although it emphasizes
the asymptotic point of view, it has all the generating functions of
interest, and getting explicit expressions (involving harmonic
numbers etc.) is a complete routine task.
\medskip
The enumeration of the numbers $a_n$ of heap ordered trees of size $n$
is easy and appears already in Ref.~[\refcite{ProUrb83}]:
The recursion is
$$
a_{n+1}=\sum_{k\ge1}\sum_{j_1+\dots+j_k=n}
\mult \, a_{j_1}\dots a_{j_k}\quad\hbox{for}\ {n\ge1}, \ \ a_1=1~;
$$
hence the exponential generating function $A(z):=\sum_{n\ge0}a_n\frac
{z^n}{n!}$ fulfills the differential equation
$$
A'(z)=\frac1{1-A(z)}\qquad\hbox{with}\quad A(0)=0~,
$$
with the solution
$$
A(z)=1-\sqrt{1-2z\,}~,
$$
so that
$$
a_n=n!\,2^{1-n}\,\ca_n~.
$$
Here and later we use the notion of a (shifted) {\sl Catalan number}
$$
\ca_n=\frac1n\binom{2n-2}{n-1}~.
$$
\def\sp{\pi_{n;j_1,\dots,j_k}}
Considering any variety of trees, the concept of {\sl splitting probabilities}
is a good idea. We define them by
$$
\sp:=\mult\frac{a_{j_1}\dots a_{j_k}}{a_{n+1}}
=\frac{2^k}{\binom{2n}{n}}\, \ca_{j_1}\dots\ca_{j_k}~.
$$
It is a simple exercise to show analytically that they indeed sum to 1, which
is the same as to show that
$$
\sum_{k\ge0}\sum_{j_1+\dots+j_k=n}
2^k \, \ca_{j_1}\dots\ca_{j_k}
=\binom{2n}{n}~.
$$
In terms of generating functions this means
$$
\sum_{k\ge0}2^k\(\frac{1-\sqrt{1-4z\,}}{2}\,\)^k=\frac1{\sqrt{1-4z\,}}~,
$$
which is obvious.
\section{Level Polynomials of Heap Ordered Trees}
When studying statistics on the depth of any variety of trees, it is
customary to consider {\sl level polynomials} $L_n(u)$. The coefficient
of $u^k$ herein should be the expected number of nodes on level $k$, the
root being on level $0$, so that $L_n(1)=n$. Compare
Refs.~[\refcite{Knuth73}] and [\refcite{Mahmoud92}].
Here, the (self-explanatory) recursion is
$$
L_{n+1}(u)=1+u
\sum_{k\ge1}\sum_{j_1+\dots+j_k=n}\sp \Big(L_{j_1}(u)+\dots+L_{j_k}(u)\Big)~.
$$
By symmetry, we rewrite the recursion as
$$
L_{n+1}(u)\binom{2n}n=\binom{2n}n+u
\sum_{k\ge1}\sum_{j_1+\dots+j_k=n}k\,2^k \,
\ca_{j_1}\dots\ca_{j_k}\,L_{j_1}(u)~.
$$
Now we introduce the bivariate generating function
$$
L(z,u):=\sum_{n\ge0}L_n(u)\, \ca_n\, z^n~;
$$
then we get
\begin{eqnarray*}
\frac{\partial L(z,u)}{\partial z}
&\!\!=\!\!&\frac1{\sqrt{1-4z}} +u\sum_{k\ge1}k\,2^k\,L(z,u)
\(\frac{1-\sqrt{1-4z\,}}{2}\,\)^{k-1}
\\ &\!\!=\!\!&\frac1{\sqrt{1-4z}} +\frac{2u\, L(z,u)}{1-4z}~.
\end{eqnarray*}
The solution of this differential equation, subject to the condition
$L(0,u)=0$, is
$$
L(z,u)=\frac1{2(1+u)}\( (1-4z)^{-u/2}-(1-4z)^{1/2}\)~.
$$
Hence
$$
[z^n]L(z,u)=\frac1{2(1+u)}\Bigg( \binom{-u/2}{n}(-4)^n+ 2\, \ca_n \Bigg)~.
$$
Since (with signless Stirling numbers of the first kind, compare
Ref.~[\refcite{GrKnPa89}])
$$
\binom{-u/2}{n}=\frac{1}{n!}\sum_{j\ge0}\left[n\atop j\right]
(-1)^{n-j}\bigl(-\textstyle{\frac u2}\bigr)^j
\qquad\hbox{and}\qquad
\displaystyle{\frac{1}{1+u}=\sum_{j\ge0}(-u)^j,}
$$
we find
$$
[u^kz^n]L(z,u)=\frac{4^n(-1)^k}{2n!}\sum_{j=0}^k\left[n\atop j\right]
\bigl(-\textstyle{\frac 12}\bigr)^j+\ca_n(-1)^k~.
$$
Dividing this by $\ca_n$, we find the following theorem.
\begin{theorem} The expected number of nodes
on level $k$ in random heap ordered trees
of size $n$ is given by
$$
\frac{4^n(-1)^k(n-1)!}{2(2n-2)!}\sum_{j=0}^k\left[n\atop j\right]
\bigl(-\textstyle{\frac 12}\bigr)^j+(-1)^k
~.
$$
\rightline{$\diamondsuit$}
\end{theorem}
\def\wu{\sqrt{1-4z\,}}
Since $L_n(1)=n$, the quantity $L_n(u)/n$ is a probability generating
function, and $L'_n(1)/n$ is just the expected depth of a random
heap ordered tree of size $n$. We can perform the differentation
with respect to $u$, followed
by the substitution $u=1$, on the function $L(z,u)$, with the result
$$
M_1(z):=-\frac z{2\wu}+\frac1{8\wu}\log\frac1{1-4z}~.
$$
In order to get $[z^n]M_1(z)$, we have to know the formula
$$
[z^n]\frac1{\sqrt{1-z}}\log\frac1{1-z}=4^{-n}\binom{2n}n
\big(2H_{2n}-H_n\big)~,
$$
which was proved in Ref.~[\refcite{ChenNi94}] by an {\sl ad hoc method}.
However in Greene/Knuth\cite{GreKnu82}
we find the general result ($m\in {{\Bbb{C}}}$)
$$
[z^n]\frac{1}{{(1-z})^{m+1}}\log\frac{1}{1-z}
=\binom{n+m}{n}\big(H_{n+m}-H_m\big)~;
$$
the interpretation of $H_{n+m}-H_m$ is
$$
H_{n+m}-H_m=\frac{1}{m+1}+\frac{1}{m+2}+\dots+\frac{1}{m+n}~.
$$
The formula follows now by a simple computation;
\begin{eqnarray*}
H_{n-\frac12}-H_{-\frac12}&\!\!=\!\!&
\frac{1}{\frac12}+\frac{1}{\frac32}+\frac{1}{\frac52}
+\dots+\frac{1}{n-\frac12}\\
&\!\!=\!\!&2\Big(1+\frac13+\frac15+\dots+\frac{1}{2n-1}\Big)\\
&\!\!=\!\!&2H_{2n}-2\Big(\frac12+\frac14+\dots+\frac{1}{2n}\Big)\\
&\!\!=\!\!&2H_{2n}-H_n~.
\end{eqnarray*}
Hence
$$
[z^n]M_1(z)=-\frac12\binom{2n-2}{n-1}+\frac18\binom{2n}{n}
\big(2H_{2n}-H_n\big)~.
$$
We must divide this quantity by $n\ca_n=\binom{2n-2}{n-1}$ and get
a result of Ref.~[\refcite{ChenNi94}] as a corollary.
\begin{theorem} The expected depth $d_n$
of a random heap ordered tree of size $n$ is
given by
$$
d_n=\Big(1-\frac1{2n}\Big)\big(H_{2n}-\frac12H_n\big)-\frac12~.
$$\rightline{$\diamondsuit$}
\end{theorem}
In order to compute the variance, we differentiate $L(z,u)$ twice
with respect to $u$, followed by $u=1$, which gives
$$
M_2(z):=\frac z{2\wu}-\frac1{8\wu}\log\frac1{1-4z}
+\frac{1}{16\wu}\log^2\frac1{1-4z}
~.
$$
For that we need another formula from Greene/Knuth.\cite{GreKnu82}
$$
[z^n]\frac{1}{{(1-z})^{m+1}}\log^2\frac{1}{1-z}
=\binom{n+m}{n}\Big(\big(H_{n+m}-H_m\big)^2-
\big(H^{(2)}_{n+m}-H^{(2)}_m\big)
\Big)~.
$$
The special instance $m=-\frac12$ becomes
$$
[z^n]\frac{1}{\sqrt{1-z\, }}\log^2\frac{1}{1-z}
=4^{-n}\binom{2n}{n}\Big(\big(2H_{2n}-H_n\big)^2-
\big(4H^{(2)}_{2n}-H^{(2)}_n\big)
\Big)~.
$$
The variance is obtained via $[z^n]M_2(z)\big/n\ca_n+d_n-d_n^2$,
which yields another result from Ref.~[\refcite{ChenNi94}].
\begin{theorem} The variance $\hbox{{\rm Var}}_n$ of the depth
of a random heap ordered tree of size $n$ is
given by
$$
\hbox{{\rm Var}}_n=\Big(1-\frac1{2n}\Big)\Big[\big(H_{2n}-\frac12H_n\big)^2
-\big(H_{2n}^{(2)}-\frac14H_n^{(2)}\big)\Big]-d_n^2~.
$$
\rightline{$\diamondsuit$}
\end{theorem}
Observe that the formula\cite{ChenNi94} contains an extra square somewhere,
which is erroneous. As a consequence, the asymptotic formula is also in
error; the correct version is
$$
\hbox{Var}_n=
\frac{\log n}{2}+\log2+\frac\gamma2-\frac14-\frac{\pi^2}{8}+{\cal O}\Big(
\frac{\log n}{n}\Big)
~.
$$
\section{Path Length of Heap Ordered Trees}
As indicated earlier, we now engage into the path length, i.e. the sum
of the distances of all nodes to the root. Concerning the expected value,
it must be the expected depth times the number of nodes. However, for
the variance, such a relation does not hold; henceforth independent
computations are necessary.
We can set up
a sequence of probability generating functions for the path length,
viz.
$$
F_{n+1}(u)=u^n
\sum_{k\ge0}\sum_{j_1+\dots+j_k=n}
\sp\, F_{j_1}(u)\dots F_{j_k}(u),\qquad F_0(u)=0~,
$$
or
$$
F_{n+1}(u)\binom{2n}n=u^n
\sum_{k\ge0}\sum_{j_1+\dots+j_k=n}
2^k\,\ca_{j_1}\dots\ca_{j_k}\, F_{j_1}(u)\dots F_{j_k}(u),\qquad F_0(u)=0~.
$$
Again, let us define the bivariate generating function
$$
F(z,u):=\sum_{n\ge0}F_n(u)\, \ca_n\, z^n~;
$$
then
$$
\frac{\partial F(z,u)}{\partial z}=\frac{1}{1-2F(zu,u)}~.
$$
This functional differential equation is harder than the corresponding
equation in the instance of the depth; hence we cannot expect to solve
it explicitly. However, it contains enough information to get all moments,
at least in principle.
The generating function $f(z)$ of the expected values is
given by
$$
f(z)=\frac{\partial}{\partial u}F(z,u)\bigg\vert_{u=1}~,
$$
which leads to the differential equation
$$
f'(z)=\frac{2\,f(z)}{1-4z}+\frac{2z}{(1-4z)^{3/2}},\qquad f(0)=0~.
$$
The solution is
$$
f(z)
=-\frac z{2\wu}+\frac1{8\wu}\log\frac1{1-4z}~,
$$
which is the function $M_1(z)$ from before, whence we obtain
\begin{theorem} The expected path length $l_n$
of a random heap ordered tree of size $n$ is
given by
$$
l_n=\Big(n-\frac1{2}\Big)\big(H_{2n}-\frac12H_n\big)-\frac n2~.
$$\rightline{$\diamondsuit$}
\end{theorem}
\noindent
For the second factorial moment, we have to consider
$$
g(z)=\frac{\partial^2}{\partial u^2}F(z,u)\bigg\vert_{u=1}~.
$$
The differential equation for $g(z)$ is
$$
g'(z)=\frac8{(1-4z)^{3/2}}\Big(\frac{z}{\sqrt{1-4z}}+f(z)\Big)^2
+\frac{2}{1-4z}\bigg(\frac{2z^2}{(1-4z)^{3/2}}+2z\,f'(z)+g(z)\bigg)~.
$$
Maple solves it (subject to $g(0)=0$),
\begin{eqnarray*}
g(z)&\!\!=\!\!&\frac{1}{32}\frac{1}{(1-4z)^{3/2}}\log^2\frac{1}{1-4z}
-\frac{1}{16}\frac{1}{\sqrt{1-4z}}\log^2\frac{1}{1-4z}\\
&\!\!+\!\!&\frac{1}{16}\frac{1}{(1-4z)^{3/2}}\log\frac{1}{1-4z}
-\frac{5}{16}\frac{1}{\sqrt{1-4z}}\log\frac{1}{1-4z}\\
&\!\!+\!\!&\frac{3}{32}\frac{1}{(1-4z)^{3/2}}+\frac1{16\sqrt{1-4z}}
-\frac5{32}\sqrt{1-4z}~.
\end{eqnarray*}
Now we need the formul\ae
$$
[z^n]\frac{1}{{(1-z})^{3/2}}\log\frac{1}{1-z}
=4^{-n}(2n+1)\binom{2n}{n}
\big(H_{n+\frac12}-H_{\frac12}\big)
$$
and
$$
[z^n]\frac{1}{{(1-z})^{3/2}}\log^2\frac{1}{1-z}
=4^{-n}(2n+1)\binom{2n}{n}
\Big(\big(H_{n+\frac12}-H_{\frac12}\big)^2-
\big(H^{(2)}_{n+\frac12}-H^{(2)}_{\frac12}\big)
\Big)~.
$$
We observe
$$
H_{n+\frac12}-H_{\frac12}=\frac{2}{2n+1}-2+2H_{2n}-H_n
$$
and
$$
H^{(2)}_{n+\frac12}-H^{(2)}_{\frac12}=
\frac{4}{(2n+1)^2}-4+4H^{(2)}_{2n}-H^{(2)}_n~.
$$
Now we have enough formul\ae\ at hand to get $[z^n]g(z)$.
The variance is obtained via
$[z^n]g(z)+l_n-l_n^2$. With the help of Maple we find the following
theorem.
\begin{theorem} The variance $V_n$ of the path length
of a random heap ordered tree of size $n$ is
given by
\begin{eqnarray*}
V_n&\!\!=\!\!&
n^2\Big(\frac{1}{4}H_n^{(2)}-H_{2n}^{(2)}+\frac32\Big)\\
&\!\!+\!\!&n\Big(\frac{1}{2}H_n-H_{2n}
-\frac{1}{4}H_n^{(2)}+H_{2n}^{(2)}
-\frac34\Big)\\
&\!\!-\!\!&\frac{1}{4}H_n+\frac12H_{2n}
+\frac{1}{16}H_n^{(2)}-\frac14H_{2n}^{(2)}~.
\end{eqnarray*}
Asymptotically, we get
$$
V_n=n^2\Big(\frac32-\frac{\pi^2}{8}\Big)-\frac{n\log n}{2}
+n\Big(-\frac12-\log2+\frac{\pi^2}{8}-\frac{\gamma}{2}\Big)
+{\cal O}(\log n)~.
$$
The constant $\frac32-\frac{\pi^2}{8}$ is\/ $0.266299449$.
\rightline{$\diamondsuit$}
\end{theorem}
It is interesting to note that the asymptotic orders of depth
(expectation and variance) are resp. $\log n$ and $\log n$, whereas
for the path length we get $n\log n$ and $n^2$.
\nonumsection{References}
\bibliographystyle{plain}
\begin{thebibliography}{1}
\bibitem{BeFlSa92}
F.~Bergeron, P.~Flajolet, and B.~Salvy.
\newblock Varieties of increasing trees.
\newblock {\em Lecture Notes in Computer Science}, 581:24--48, 1992.
\bibitem{ChenNi94}
W.-C. Chen and W.-C. Ni.
\newblock On the average altitude of heap--ordered trees.
\newblock {\em International Journal of Foundations of Computer Science},
15:99--109, 1994.
\bibitem{GrKnPa89}
R.~L. Graham, D.~E. Knuth, and O.~Patashnik.
\newblock {\em Concrete Mathematics}.
\newblock Addison Wesley, 1989.
\bibitem{GreKnu82}
D.~H. Greene and D.~E. Knuth.
\newblock {\em Mathematics for the analysis of algorithms}.
\newblock Birkh\"auser, Boston, second edition, 1982.
\bibitem{Knuth73}
D.~E. Knuth.
\newblock {\em The Art of Computer Programming}, volume 3: Sorting and
Searching.
\newblock Addison-Wesley, 1973.
\bibitem{Mahmoud92}
H.~M. Mahmoud.
\newblock {\em Evolution of Random Search Trees}.
\newblock John Wiley, New York, 1992.
\bibitem{ProUrb83}
H.~Prodinger and F.J. Urbanek.
\newblock On monotone functions of tree structures.
\newblock {\em Discrete Applied Mathematics}, 5:223--239, 1983.
\end{thebibliography}
\end{document}