%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentstyle[11pt]{amsart} \input{amssym.def} \input{amssym} \def\E{\operatorname{E}} \def\Var{\operatorname{Var}} \newtheorem{lemma}{Lemma} \newtheorem{theorem}{Theorem} \def \P{{\Bbb P}} \def \epsilon{\varepsilon} \marginparwidth 0pt \oddsidemargin 0pt \evensidemargin 0pt \marginparsep 0pt \topmargin 0pt \textwidth 6.5in \textheight 8.5 in \begin{document} \title[Random subgraphs of the $n$-cycle]{ Short Communication \vskip 20pt A generating function approach to random subgraphs of the $n$-cycle} \author[X. Gourdon]{Xavier Gourdon} \address{Xavier Gourdon \\ Algorithms Project \\ INRIA Rocquencourt \\ F-78150 Le Chesnay \\ France } \email{ Xavier.Gourdon@@inria.fr} \author[H. Prodinger]{Helmut Prodinger} \address{Helmut Prodinger \\ Institut f\"ur Algebra und Diskrete Mathematik \\ Technische Universit\"at Wien \\ Wiedner Hauptstra{\ss}e 8--10 \\ A-1040 Vienna \\ Austria \vskip 1pt {\it E-mail address\normalshape :\ \ proding@@email.tuwien.ac.at}}% \thanks{The work of the second author was done while he visited INRIA. He is thankful for the warm hospitality encountered there.} \date{} \begin{abstract} Given a cycle with $n$ nodes a random subgraph is created by accepting' edges with probability $p$ and rejecting' them with probability $q=1-p$. The parameter of interest is the order of the largest component. There are some partial answers to this question in the literature. Using an appropriate encoding by formal languages, we present here a complete solution. Singularity analysis of generating functions gives good approximations of the probabilities, and the asymptotic evaluation of expectation and variance is performed by the Mellin (integral) transform. For instance, the expected order is like a logarithm of $n$ plus an oscillating function. \end{abstract} % \maketitle \section{Introduction} We denote by $C_n$ the directed cycle of size $n$ and by $M_n(H)$ the order of the largest connected component in a random subgraph $H$ of $C_n$, where $H$ has the same vertex set as $C_n$, and its edge set is defined by independently selecting, with the same constant probability $p$, each of the edges of $C_n$. The random variable $M_n$ appears in various papers: the probability $\Pr(M_n=k)$ has been studied in \cite{PaQu84} for the case $k=1$ and $\lfloor n/2 \rfloor \leq k \leq n$; in \cite{KaQu93}, the case $k=2$ is treated and an asymptotic value for fixed $k$ as $n$ tends to infinity is given. Here, we give an asymptotic value of the value $\Pr(M_n \leq k)$ for all $k$ (varying from $1$ to $n$) when $p$ is fixed, as $n$ tends to infinity. As a consequence, we show that $M_n$ has an expected value asymptotic to $\log_{p^{-1}}(n)$, and that the variance of $M_n$ is asymptotically constant. The precise asymptotic expansion (given below) can in principle be extended to any order of accuracy. %We give also the following terms of the asymptotic value of the mean and the %variance of $M_n$ as $n$ tends to infinity. \section{The generating function model} The generating function approach is classical to study such problems, and as we shall see on this particular case, it is a natural tool to work with. First we describe an equivalent model which translates easily in terms of generating functions. Suppose we construct from $C_n$ what we call a random $n$-$ab$-cycle $C$, that is an $n$-cycle having two types of edges $a$ and $b$, such that every edge has a constant probability $p$ $(0From (\ref{e2}), a short calculation leads to $$G_k(z) = {qz (1-pz) \over 1-z+{q \over p}(pz)^{k+1}} A_k(pz) + B_k(pz),$$ where $$A_k(z)=\sum_{i=0}^{k-1} (i+1) z^i={1-(k+1)z^k + k z^{k+1} \over (1-z)^2} \quad \text{and} \quad B_k(z)=\sum_{i=0}^k z^i={1-z^{k+1} \over 1-z}.$$ \section{Analysis} This section is devoted to the asymptotic analysis of$[z^n]G_k(z)$, which is quite analogous to that of \cite{Knuth78}. For that purpose, we look for the dominant singularity of the function$G_k(z)$. This function being rational, its singularities are poles given by the roots of the polynomial $$P_k(z)=1-z+{q \over p} (pz)^{k+1}.$$ When$k$is large, the value$P_k(1)$is small, thus it is natural to expect that the polynomial$P_k(z)$has one root close to$1$, namely$1+\epsilon_k$. In fact, we have the following result. \begin{lemma} The polynomial$P_k(z)$has only one root in$|z|<1/p$. Calling this root$1+\epsilon_k$, we have $$\epsilon_k = q p^k ( 1 + O(kp^k)) \quad \text{as} \quad k \to \infty.$$ \end{lemma} \begin{pf} Because$|{q \over p} (pz)^{k+1}|$is strictly less than${q \over p}$for$|z|=\rho<1/p$, hence less than$|1-z|$on this circle, we can apply Rouch\'e's theorem (see \cite[p. 247]{Dieudonne68} or \cite[p. 116]{Titchmarsh39}) implying that$P_k(z)$has the same number of roots as the polynomial$1-z=P_k(z)-{q \over p} (pz)^{k+1}$in$|z| \leq \rho$. This being true for all$\rho<1/p$, we have proved the first part of the lemma. Now, noticing that$P_k(1+1/k)$is negative for large$k$, we find that$\epsilon_k=O(1/k)$. Since \begin{equation} \label{e3} \epsilon_k= q p^{k} (1+\epsilon_k)^{k+1} \end{equation} we have$\epsilon_k=O(p^k)$, thus by equality (\ref{e3}) again (bootstrapping''), we find $$\epsilon_k = q p^k ( 1 + O(kp^k))$$ and the lemma is proved. \end{pf} We return to the study of the asymptotic behavior of$\Pr(M_n \leq k)=[z^n] G_k(z)$. We fix$\rho$such that$1 < \rho < 1/p$. By the residue theorem, we have, when$k$is large enough, $$\Pr(M_n \leq k) + { c_k \over (1+\epsilon_k)^{n+1}} = {1\over 2i\pi} \int_{|z|=\rho} G_k(z)\,dz,$$ where$c_k$denotes the residue at$z=z_k=1+\epsilon_k$of the function$G_k(z)$. We have $$c_k= {q z_k ( 1- pz_k) \over -1 + (k+1) {q\over p} (pz_k)^k} A_k(pz_k),$$ and since we have$G_k(z)=O(1)$uniformly for$|z|=\rho$, the Cauchy integral shows that \begin{equation} \label{e4} \left|\Pr(M_n \leq k) + {c_k \over (1+\epsilon_k)^{n+1}}\right| = O( 1/\rho^n) \end{equation} uniformly in$k$. The explicit value we found for$A_k(z)$implies that $$A_k(pz_k)={1 \over (1-pz_k)^2} + O((pz_k)^k)= {1 \over (1-p)^2} + O(p^k),$$ because$z_k=(1+\epsilon_k)$with$\epsilon_k=O(p^k)$. >From this, we derive $$c_k={q (1-p) + O( p^k) \over -1 + O(k p^k)} \cdot \left( {1\over (1-p)^2} + O(p^k) \right) = - {q(1-p) \over (1-p)^2} + O(k p^k)= -1 + O(kp^k).$$ We do not present the following analysis in full detail, because it is quite technical and resembles the one in \cite{Knuth78}. When$k$is not too large'' and not too small'' with respect to$n$(that is something like$n^{-3} \leq p^k \leq \log n/n$) we have $$(1+\epsilon_k)^{-(n+1)}=\exp \Big[-(n+1) ( qp^k + O(kp^{2k}))\Big] = \exp (-nq p^k) \Big( 1 + O( p^k) + O( nkp^{2k} ) \Big)$$ $$= \exp(-nqp^k) \left( 1+ O \left( {\log^3 n\over n} \right) \right),$$ and thus for such values of$k$, we find by means of (\ref{e4}) \begin{equation} \label{e5} \Pr(M_n \leq k) = \exp( - nq p^k) + O \left( {\log^3 n \over n}\right). \end{equation} When$k$is not in this range, it is possible to show that this estimation holds with an error term of$O(n^{-2})$instead of$O(\log^3 n/n)$. \section{Mean and variance} Thanks to a classical Abel transformation (summation by parts), the mean value of the random variable$M_n$can be expressed as $$\E(M_n)=\sum_{k=0}^n \Pr(M_n > k)=\sum_{k=0}^n \Big( 1- \Pr(M_n \leq k) \Big).$$ >From (\ref{e5}) --- which holds for$O(\log n)$values of$k$--- and from our previous discussion, one can show $$\E(M_n)=\sum_{k=0}^n \left(1- e^{-nqp^k} \right) + O \left( {\log^4 n \over n} \right)$$ and since$\sum_{k>n} (1-e^{-nqp^k})$is exponentially small, we have $$\E(M_n)=E_n + O \left( {\log^4 n \over n} \right), \qquad E_n=\sum_{k=0}^{\infty} \left(1-e^{-nqp^k}\right).$$ The asymptotic expansion for$E_n$is a classical application of Mellin transform techniques (see \cite[p. 278, 292]{ViFl90}). The result is $$E_n=\log_{p^{-1}} (qn) - {\gamma \over \log p} + {1\over 2} + \delta (\log_{p^{-1}}(qn)) + O(n^{-1}),$$ where$\delta(x)$is a$1$-periodic function of$x$, $$\delta(x)={1 \over \log p} \sum_{k \not=0} \Gamma\left({2ik\pi \over \log p} \right) e^{ 2ik\pi x}.$$ The function$\delta$has mean value$0$and a small amplitude. Such a calculation can be achieved for the variance of$M_n$, by means of to the relation $$\E(M_n^2)=\sum_{k =0}^n (2k+1) \Big( 1- \Pr(M_n \leq k) \Big).$$ Finally we obtain the following result. \begin{theorem} The average order of the largest component in a random cycle$C_n$is $$\E(M_n)=\log_{p^{-1}} (qn) - {\gamma \over \log p} + {1\over 2} + \delta (\log_{p^{-1}}(qn)) + O \left( {\log^4 n \over n}\right);$$ its variance is given by $${\Var}(M_n)={\pi^2 \over 6 \log^2 p}+{1\over 12} + \delta_1(\log_{p^{-1}} (qn)) + O\left( \log^5 n \over n \right).$$ Here,$\delta(x)$is the function from above and$\delta_1(x)$is a 1-periodic function of$x$, with a small amplitude (namely something like$<10^{-10}$). \end{theorem} This time, the mean of$\delta_1$is {\em not} exactly zero since it contains the square of$\delta$. Roughly speaking, the theorem says that the variance is approximatively a constant. \section{Conclusion} We have seen that a problem on the random circle can be easily analyzed by an appropriate encoding as a formal language and by analysis of generating functions. For paths, this is more obvious and known in the literature (forbidden subwords'', \cite{GuOd81}), but with slight modifications it also works in this instance. This approach is powerful enough to deal with some other parameters of a similar type. For a systematic account we again refer to \cite{ViFl90}. Note that equation~(\ref{e5}) characterizes the shape of the limit distribution of the random variable$M_n$. %As we have seen on this particular example, %the generating function approach is very powerful. It %could be used to study other parameters in a very systematic way (see %\cite{Flajolet88a}). \bibliographystyle{acm} %\bibliography{/home/meursault/algo/gourdon/Texte/algo} \begin{thebibliography}{1} \bibitem{Dieudonne68} {\sc Dieudonn{\'e}, J.} \newblock {\em Calcul Infinit{\'e}simal}. \newblock Hermann, Paris, 1968. \bibitem{GuOd81} {\sc Guibas, L.~J., and Odlyzko, A.~M.} \newblock Periods in strings. \newblock {\em Journal of Combinatorial Theory, {\rm {S}eries {A}} 30\/} (1981), 19--42. \bibitem{KaQu93} {\sc Katona, G. O.~H., and Quintas, L.~V.} \newblock The largest component in a random subgraph of the$n$-cycle. \newblock {\em Discrete Mathematics 121\/} (1993). \bibitem{Knuth78} {\sc Knuth, D.~E.} \newblock The average time for carry propagation. \newblock {\em Indagationes Mathematicae 40\/} (1978), 238--242. \bibitem{PaQu84} {\sc Palka, Z., and Quintas, L.~V.} \newblock Random subgraphs of the$n\$-cycle. \newblock {\em Discrete Mathematics 52\/} (1984), 243--251. \bibitem{Titchmarsh39} {\sc Titchmarsh, E.~C.} \newblock {\em The Theory of Functions}, second~ed. \newblock Oxford University Press, 1939. \bibitem{ViFl90} {\sc Vitter, J.~S., and Flajolet, P.} \newblock Analysis of algorithms and data structures. \newblock In {\em Handbook of Theoretical Computer Science}, J.~van Leeuwen, Ed., vol.~A: Algorithms and Complexity. North Holland, 1990, ch.~9, pp.~431--524. \end{thebibliography} \end{document}