\input amstex
\documentstyle{amsppt}
%Fibonacci Quarterly,
%geometric distributions and forbidden subwords
\topmatter
\author
Helmut Prodinger
\endauthor
\affil
Department of Algebra and Discrete Mathematics,\\
Technical University of
Vienna
\endaffil
\address
Department of Algebra and Discrete Mathematics\newline
Technical University of
Vienna\newline
Wiedner Hauptstrasse 8--10\newline
A-1040 Vienna\newline
Austria\newline
\phantom{bergume}\newline
AMS classification: 05A15
\endaddress
\title
geometric distributions and forbidden subwords
\endtitle
\abstract We describe in this tutorial note how the geometric distribution
of order $k$ can be seen in context of enumeration problems of strings.
\endabstract
\subjclass 05A15
\endsubjclass
\thanks
This note was written while the author visited the University Paris 6;
he is thankful for the warm hospitality he encountered there
\endthanks
\endtopmatter
\TagsOnRight
\define\flavit{2}
\def\GOA{3}
\def\GOB{4}
\def\GOC{5}
\def\OA{8}
\def\OB{9}
\def\knu{7}
\def\kipro{6}
\def\babe{1}
\def\phil{10}
\document
In a recent paper {\cite\babe} Barry and Lo Bello dealt with the moment generating
function of the geometric distribution of order $k$. I want to draw
the attention of the {\sl Fibonacci Community\/} to several related papers
that were apparently missed by the authors and also to provide a
straightforward derivation of their result.
\medskip
Since the moment generating functing $M(t)$ is related to the probability
generating function $f(z)$ by $M(t)=f(e^t)$, it is sufficient to consider
$f(z)$.
We code a success trial by $\bold 1$ and a failure by $\bold 0$, thereby
obtaining a {\sl word\/} consisting of the letters $\bold0$ and $\bold1$.
A sequence of $n$ trials is thus represented by a {\sl word\/}
of length $n$
over the {\sl alphabet\/} $\{\bold 0, \bold 1\}$. In a natural
way we attach a {\sl weight\/} $\omega$ to each word $x$ by replacing
$\bold 1$ by $p$ and $\bold 0$ by $q$ and then multiplying as usual.
For instance, the word $\bold 0\bold 1\bold 1\bold 0$ has the
weight $p^2q^2$.
We consider {\sl languages\/}
(sets of words) $L$ and their {\sl generating function\/}
$l(z)$. The latter is defined to be
$$
l(z)=\sum_{x\in L}\omega(x)z^{|x|},\tag1
$$
where $|x|$ is the {\sl length\/} (number of letters) of the word $x$.
This generating function can be simply obtained by formally replacing
the letter $\bold 1$ by $pz$ and $\bold 0$ by $qz$ in the language $L$
and replacing the so-called {\sl concatenation\/} of words by the
usual product and the (disjoint) {\sl union\/} by the usual addition, so
that for instance $L=\{\bold 0, \bold 0\bold 1\bold 0,
\bold 0\bold 1\bold 1\bold 0\}$ has the generating function
$l(z)=qz+pq^2z^3+p^2q^2z^4$.
\medskip
Instead of considering $\Bbb P\{X=n\}$ it
is easier to consider $\Bbb P\{X>n\}$, that means the probability that
$n$ trials did not produce $k$ consecutive successes, or the probability
that a random word of $n$ letters does not contain the (contiguous) subword
$\bold1^k$.
We consider the
language of these words. A compact notion of it is
$$
\big(\bold1^{n\}z^n=\frac{1}{1-qz\frac{1-p^kz^k}{1-pz}}
\cdot \frac{1-p^kz^k}{1-pz}=
\frac{1-p^kz^k}{1-z+qp^kz^{k+1}}.
\tag4
$$
From this we also obtain the probability generating function
$$\aligned
f(z)&:=\sum_{n\ge0}\Bbb P\{X=n\}z^n
=\sum_{n\ge0}\Big(\Bbb P\{X>n-1\}-\Bbb P\{X>n\}\Big)z^n \\
&=1+z\sum_{n\ge1}\Bbb P\{X>n-1\}z^{n-1}-
\sum_{n\ge0}\Bbb P\{X>n\}z^n \\
&=1-(1-z)g(z)=\frac{1-z+qp^kz^{k+1}-1+p^kz^k+z-p^kz^{k+1}}{1-z+qp^kz^{k+1}}\\
&=\frac{p^kz^k(1-pz)}{1-z+qp^kz^{k+1}}.
\endaligned
\tag5
$$
This derivation completely avoided unpleasant recursions. For such very
useful combinatorial constructions and their automatic translation into generating
functions we refer to the survey \cite{\flavit} and a few earlier survey papers
of Flajolet cited therein.
The probability generating function \thetag5 appeared first in \cite\phil.
\smallskip
Guibas and Odlyzko in a series of papers (\cite{\GOA}, \cite{\GOB}, \cite{\GOC})
dealt with general forbidden subwords, not just $\bold1^k$.
These
papers were surveyed in \cite{\OA} and \cite{\OB}. Rewriting things accordingly,
formula \thetag{6.44} in \cite{\OB} gives
$$
f(z)=\frac{(pz)^k}{(pz)^k+(1-z)C(z)},
\tag6
$$
where the polynomial $C(z)$ (the ``correlation polynomial'')
depends on the forbidden pattern and is
$$
C(z)=1+(pz)+\dots+(pz)^{k-1}=\frac{1-(pz)^k}{1-pz}
\tag7
$$
in this special instance.
\smallskip
Knuth used similar arguments in \cite\knu. He considered strings of $\bold0,
\bold1, \bold2$, where $\bold0$ and $\bold2$ appear with probability $\frac14$
and $\bold1$ appears with probability $\frac12$ and the string $\bold1^k\bold2$
is forbidden. Also, he considered the zeroes of the ``auxiliary equation''
$$
1-z+qp^kz^{k+1}=0.
\tag8
$$
For example, there is a ``dominant'' solution $\rho=\rho_k$ which can be approximated
by ``bootstrapping'': Starting from
$
z=1+qp^kz^{k+1}
$,
a first approximation is $\rho\approx1$. Inserting this on the
right hand side and expanding,
we find
$
\rho\approx 1+qp^k
$,
and after one more step
$$
\rho\approx 1+qp^k+(k+1)q^2p^{2k},\tag9
$$
etc. Kirschenhofer and Prodinger also used this type of argument in \cite\kipro.
With this dominant singularity it is also easy to find the asymptotics of
$\Bbb P\{X=n\}$ for fixed $k$, as $n\to\infty$. We have
$$
f(z)
=\frac{p^kz^k(1-pz)}{1-z+qp^kz^{k+1}}\sim\frac{A_k}{1-z/\rho}
\qquad\text{as}\quad z\to\rho.
\tag10
$$
This can be explained informally by saying that {\sl locally\/} only
one term of the {\sl partial fraction decomposition\/} of the {\sl rational
function\/} $f(z)$ is needed to describe its behavior in a vicinity of
the dominant singularity $\rho$.
Here, $A_k$ is a constant that can be found by the traditional techniques
to compute the partial fraction decomposition of a rational function.
Thus the coefficient of $z^n$ in $f(z)$ (i.e. $\Bbb P\{X=n\}$)
behaves as $A_k\cdot \rho^{-n}$ (the coefficient of $z^n$ in
$\frac{A_k}{1-z/\rho}$). The constant $A_k$
behaves as $A_k\approx qp^k$ for $k\to\infty$.
Such asymptotic considerations
are to be found in many textbooks and survey articles, notably in
\cite\OB.
\newpage
\Refs
\ref\no\babe\by M.J.J. Barry and A.J. Lo Bello\paper The moment generating
function of the geometric distribution of order $k$
\jour The Fibonacci Quarterly\vol31\yr1993\pages178--180
\endref
\ref\no\flavit\by P.Flajolet and J.Vitter\paper Analysis of algorithms
and data structures
\pages432--524\inbook Handbook of Theoretical Computer Science, Vol.A.\publ
North Holland\yr 1990
\endref
\ref\no\GOA\by L.J.Guibas and A.M.Odlyzko\paper Maximal prefix-synchronized codes
\jour SIAM J. Appl. Math. \vol35\yr1978\pages 401--418
\endref
\ref\no\GOB\by L.J.Guibas and A.M.Odlyzko\paper Long repetitive patterns in random
sequences
\jour Z. Wahrscheinlichkeitstheorie und verwandte Gebiete \vol53
\yr1980\pages 241--262
\endref
\ref\no\GOC\by L.J.Guibas and A.M.Odlyzko\paper String overlaps, pattern matching,
and nontransitive games
\jour J. Comb. Theory A\vol30\yr1981\pages 183--208
\endref
\ref\no\kipro\by P.Kirschenhofer and H.Prodinger\paper A coin tossing
algorithm for counting large
numbers of events
\jour Mathematica Slovaca\vol 42\yr 1992\pages 531--545
\endref
\ref\no\knu\by D.E.Knuth
\paper The average time for carry propagation
\jour Indagationes Mathematic\ae\vol 40\yr 1978\pages 238--242
\endref
\ref\no\OA\by A.M.Odlyzko\paper Enumeration of strings\pages 205--228
\yr1985\inbook Combinatorial Algorithms on Words, A.Apostolico and Z.Galil, eds.
\publ Springer
\endref
\ref\no\OB\by A.M.Odlyzko\paper Asymptotic enumeration methods
\inbook in: Handbook of Combinatorics,
\toappear
\publ North Holland
\endref
\ref\no\phil\by A.N.Phillippou, C.Georghiu, and G.N.Philippou
\paper A Generalized Geometric Distribution and
Some of its Properties\jour Statistics and Probability Letters
\pages 171--175\vol 1.4 \yr 1983
\endref
\endRefs
\enddocument