\documentstyle[11pt]{article}
\newtheorem{Proposition}{Proposition}
\newtheorem{Theorem}{Theorem}
\newtheorem{Lemma}{Lemma}
\newenvironment{pic}[1]%
{\input{#1}\begin{figure}\centerline{\box\graph}\vspace*{\bigskipamount}}%
{\end{figure}}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\renewcommand{\baselinestretch}{1.3}
\newenvironment{comment}{\setbox0=\vbox\bgroup}{\egroup}
\newcommand{\con}{\rightarrow}
\newcommand{\cd}{\mbox{$\stackrel{d}{\rightarrow}$}}
\newcommand{\cp}{\mbox{$\stackrel{p}{\rightarrow}$}}
\newcommand{\ed}{\mbox{$ \ \stackrel{d}{=}$ \ }}
\newcommand{\ep}{\varepsilon}
\newcommand{\FF}{{\cal F}}
\newenvironment{singlespace}{\def\baselinestretch{1.0}\large\normalsize}{\par}
\setlength{\textheight}{8.5in}
\setlength{\textwidth}{6in}
\def\noi{\noindent}
\def\lf{\lfloor}
\def\rf{\rfloor}
\def\eod{\vrule height 6pt width 5pt depth 0pt}
\def\parsec{\par\noindent}
\def\big{\bigskip\parsec}
\def\med{\medskip\parsec}
\def\ll{\ell}
\def\l{\lambda}
\def\eps{\varepsilon}
\def\pr{{\rm \Pr}}
\def\lg{\langle}
\def\rg{\rangle}
\def\n{\eta}
\def\ss{w}
\def\GG{\Gamma}
\def\D{{\bf D}}
\def\X{\{X_k\}}
\def\Xs{\langle X, w\rangle}
\def \XXs{\langle {\bf X} , \sigma\rangle}
\def\L{{\cal L}}
%\def\F{{\cal F}_{\sigma}}
%\def\C{{\cal C}}
\def\W{{\cal W}}
%\def\A{{\cal A}}
\def\F{{\cal F}}
\def\B{{\cal A}}
\def\C{{\cal B}}
\def\A{\Sigma}
\def\P{{\bf P}}
\def\v{V}
\def\S{{\cal S}}
\def\U{{\cal U}}
\def\ts{\widetilde{s}}
\def\Wk{{\W_{k}}}
\def\ol{\overline}
\def\T{{\cal T}}
\def\la{\langle}
\def\ra{\rangle}
\vsize=8.5in
\voffset=-0.7in
\hoffset=-0.5in
\date{}
\begin{document}
\begin{center}
{\bf PROBABILISTIC MODELING OF DATA STRUCTURES ON WORDS:}\\
{\bf A Reply to Professor Andersson's Letter}
\par
\medskip
\parsec
\end{center}
\par
\bigskip\bigskip\bigskip
\begin{singlespace}
\noi
\begin{tabular}{lll}
Peter Kirschenhofer&Helmut Prodinger&Wojciech Szpankowski\footnotemark\\
Dept. of Algebra \& Discrete Math.&Dept. of Algebra \& Discrete Math.&Dept. of Computer Science\\
Technical University of Vienna&Technical University of Vienna&Purdue University\\
A-1040 Vienna &A-1040 Vienna &W. Lafayette, IN 47907\\
Austria&Austria&U.S.A.\\
\end{tabular}
\footnotetext{This research was supported by
NSF Grant INT-8912631, and in part by
AFOSR Grant 90-0107, by NSF Grants
CCR-9201078 and NCR-9206315,
and by Grant R01 LM05118 from the National Library of Medicine.}
\bigskip\bigskip\bigskip
\begin{center}
{\bf Abstract}
\end{center}
\par
\bigskip
Professor Arne Andersson's Letter to the Editor concerning our paper
``On the Balance Property of Patricia Tries: External Path Length
Viewpoint'' ({\it Theor. Comp. Sci.}, 68, 1989) motivated us
to present some thoughts about probabilistic analysis of data
structures on words. The intention of this note
is to discuss potential advantages
and disadvantages of probabilistic analyses, and in particular
to provide a proper interpretation of probabilistic results.
This can only be achieved after building a {\it suitable} probabilistic
model for a {\it given} set of data. We describe a sequence of
probabilistic models of increasing generality that can be applied
to the analysis of algorithms on words. Finally, we discuss a few
theoretical results to indicate that our findings from the above cited paper
hold true under more general probabilistic assumptions.
\end{singlespace}
\renewcommand{\thefootnote}{\arabic{footnote}}
\setcounter{footnote}{0}
\big
{\bf 1. PROBABILISTIC MODELS}
\par
\medskip
It is important to realize that the expected size and shape of
data structures on words depend upon the probabilistic
assumptions imposed on the words (strings).
Primarily, there are three considerations:
\begin{itemize}
\item Characteristics of the alphabet; that is, the distribution
on the alphabet $\Sigma$ that determines how the
symbols are selected to form a string;
\item Statistical dependency between words; that is, whether or not
the words are statistically dependent;
\item Number of words; that is, whether the number $n$ of words is fixed or
a random variable.
\end{itemize}
\par
We now discuss these considerations and present a few basic
probabilistic models for data structures on words,
in particular tries and suffix trees (cf. \cite{ahu}, \cite{axa}).
To recall, a trie is a digital tree built from a set of keys
(strings, sequences) $X_1, \ldots, X_n$. Every key is a sequence
of symbols from an alphabet. A suffix tree is a trie built from
suffixes of a {\it single} string (key, sequence) $X$. To
simplify our further presentation, we write $X$ or $\X_{k=1}^\infty$
as a generic notation for a sequence of symbols from a given alphabet
$\Sigma$.
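To make the two structures concrete, the following minimal sketch (in Python; the function names are ours, not from the cited literature) builds a trie as nested dictionaries, and a suffix tree in its uncompacted form, namely a trie over all suffixes of a single string:

```python
# Illustrative sketch only, not an efficient implementation.  A trie stores
# each key symbol by symbol; a suffix tree is, at its core, a trie built
# from all suffixes of one string.

def trie_insert(root, key):
    """Insert a key (a string over some alphabet) into a nested-dict trie."""
    node = root
    for symbol in key:
        node = node.setdefault(symbol, {})
    node['$'] = {}          # end-of-key marker

def build_trie(keys):
    root = {}
    for key in keys:
        trie_insert(root, key)
    return root

def build_suffix_trie(x):
    """Trie built from all suffixes of a single string x (dependent keys)."""
    return build_trie(x[i:] for i in range(len(x)))

trie = build_trie(["abab", "abba", "baba"])
suffix_trie = build_suffix_trie("ababb")
```

The PATRICIA trie and the PAT tree compress the unary paths of these structures; the sketch omits that step.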
The {\it Bernoulli model} is characterized by the following probabilistic
assumptions:
\begin{itemize}
\item[(A)] {\sc Bernoulli Model}\parsec
The symbols of the alphabet occur independently of one another;
thus, a key $X=x_1 x_2 x_3 \ldots$ can be described as the outcome
of an infinite sequence of
Bernoulli trials in which $\pr\{x_j = {\omega}_i\} = p_i$ and
$\sum_{i=1}^V p_i = 1$, where $\omega_i\in \Sigma$ and $V$ is the
size of $\Sigma$.
If $p_1=p_2= \ldots = p_V = 1/V$, then the model is called
{\it symmetric};
otherwise, it is {\it asymmetric}.
\item[(B)] The number of strings (keys) is fixed at $n$.
\end{itemize}
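Input conforming to assumptions (A) and (B) is easy to simulate; the sketch below (with arbitrary parameter values of our choosing) draws each symbol of a key i.i.d.\ from the alphabet:

```python
import random

# Sketch of the Bernoulli model (A): every symbol of a key is drawn
# independently with Pr{x_j = omega_i} = p_i.  Alphabet, probabilities,
# and key length below are arbitrary illustrations.

def bernoulli_key(alphabet, probs, length, rng):
    """One key: `length` i.i.d. symbols from `alphabet` with weights `probs`."""
    return "".join(rng.choices(alphabet, weights=probs, k=length))

rng = random.Random(1993)
alphabet = ["a", "b"]

# symmetric model: p_1 = ... = p_V = 1/V
sym_keys = [bernoulli_key(alphabet, [0.5, 0.5], 20, rng) for _ in range(5)]
# asymmetric model: unequal symbol probabilities
asym_keys = [bernoulli_key(alphabet, [0.9, 0.1], 20, rng) for _ in range(5)]
```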
If the following assumption is added to (A) and (B), then
the model is called the {\it independent Bernoulli model}; it
is often used to analyze independent tries.
\begin{itemize}
\item[(C)] {\sc Independent Model} \parsec
The words are statistically independent; that is,
knowing a particular
string does not give information about the other strings.
\end{itemize}
\par
In the analysis of tries, assumption (B) leads to quite
complicated functional and/or differential-functional equations
(cf. \cite{flajolet1}, \cite{fr}, \cite{fs}, \cite{gonnet}, \cite{jr},
\cite{KiP}, \cite{knuth1}, \cite{knuth}, \cite{mahmoud},
\cite{spa3}).
They can be drastically simplified if (B) is replaced by the
following assumption.
\begin{itemize}
\item[(B1)] {\sc Poisson Model}\parsec
The number of strings is a random variable $N$ that has a
Poisson distribution.
\end{itemize}
The Bernoulli model is appropriate when keys are transformed into
random streams of bits by a hashing function. In many
other applications, however,
assumption (A) is not very realistic. For instance,
if the strings
are words from the English language, then there certainly
is a dependence among
the symbols of the alphabet. As an example, $h$ is much
more likely to follow an $s$
than a $b$. When this is the case, assumption (A) can
be replaced by
\begin{itemize}
\item[(A1)] {\sc Markovian Model} \parsec
There is a Markovian dependency between consecutive symbols in
a key; that is, the probability $p_{ij} = \pr\{X_{k+1} = {\omega}_j |
X_k = {\omega}_i\}$
describes the conditional probability of sampling symbol
${\omega}_j\in \Sigma$ immediately after symbol ${\omega}_i\in \Sigma$.
\end{itemize}
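Assumption (A1) can likewise be simulated in a few lines; the transition matrix below is an arbitrary illustration (our choice, not data from the paper):

```python
import random

# Sketch of the Markovian model (A1): symbols of a key form a Markov chain
# with transition probabilities p_ij = Pr{next = omega_j | current = omega_i}.

def markov_key(alphabet, p, init, length, rng):
    """Key of `length` symbols; p[i][j] is the transition probability
    from alphabet[i] to alphabet[j]; `init` is the initial distribution."""
    i = rng.choices(range(len(alphabet)), weights=init, k=1)[0]
    key = [alphabet[i]]
    for _ in range(length - 1):
        i = rng.choices(range(len(alphabet)), weights=p[i], k=1)[0]
        key.append(alphabet[i])
    return "".join(key)

rng = random.Random(7)
key = markov_key(["a", "b"], [[0.8, 0.2], [0.3, 0.7]], [0.5, 0.5], 30, rng)
```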
Note that all the models discussed so far assume independent
strings, and hence are useful for
studying tries and PATRICIA tries, but not suffix trees and PAT trees
(i.e., compact suffix trees; cf. \cite{gonnet}). In
the case of suffix trees, the strings involved are substrings
of a single word, so they are clearly dependent.
To model this situation, we replace assumption (C) by the following one.
\begin{itemize}
\item[(C1)] {\sc Dependent Model}\parsec
The strings are dependent; i.e., knowledge of one
key gives information about other strings.
\end{itemize}
There are two further generalizations of the Markovian model, namely
the {\it mixing model} and
the {\it stationary model}.
These models are defined as follows.
\begin{itemize}
\item[(A2)] {\sc Mixing Model}\parsec
The sequence $\X_{k=1}^\infty$ satisfies
the so-called {\it mixing condition}; that is,
there exist two constants
$0