`Beta processes, stick-breaking, and power lawsTamara BroderickUC Berkeley [email protected]Michael I. JordanUC Berkeley [email protected]Jim PitmanUC Berkeley [email protected]8th Workshop on Bayesian NonparametricsIntroductionThe beta-Bernoulli process provides a Bayesian nonparametric prior for models involving collections of binary-valued features. A draw from the beta process provides an infinite collection of probabilities in the unit interval, and a draw from the Bernoulli process turns these into binary-valued features. Recent work has shown how to derive stick-breaking representations for the beta process, by analogy to Sethuraman's derivation of a stick-breaking representation for the Dirichlet process. We show how to derive one such stick-breaking representation directly from the characterization of the beta process as a completely random measure. We show that this approach motivates a three-parameter generalization of the beta process, and we study the power laws that can be obtained from this generalized beta process.Power Laws in Clustering and Featural ModelsClustering. Given a number N of draws from a discrete random probability measure, let (N1, N2, . . .) denote the sequence of counts associated with the unique values obtained among the N draws, where we view these unique values as &quot;clusters.&quot; Let the number of clusters drawn exactly j times be KN,j , and the total number of clusters be KN . q1 q2 q3 q4 q5 qiiBeta Process Power LawsTypes I and II. N an integer makes finding power laws in KN , KN,j difficult. Poissonizing KN and KN,j replaces them with continuous versions K(t) and Kj (t) such that a.s. K(N )  KN and a.s. Kj (N )  KN,j .KN,j :=i=11(Ni = j), KN =i=11(Ni &gt; 0).There are two types of power-law behavior that a clustering model might exhibit. First, there is what we call a Type I power law, reminiscent of Heaps' law:012345tType I Power Law KN  bN a as N   for some constants b &gt; 0, a  (0, 1).Next, there is a law reminiscent of Zipf's law:a(j-a) bN a j!a.s.The first five sets of points show Poisson processes with rates q1, . . . , q5. The bottom set of points collects the points above and is a Poisson process with rate i qi. Kj (t) will be the number of these Poisson processes with j points in the interval [0, t], and K(t) is the sum over j. In this example, K(1) = 2, K(4) = 5, and K2(4) = 1. Lemma 1. For feature frequencies from a Poisson process with infinite rate measure, as N  , |EKN - EK(N )|  0, |EKN,j - EKj (N )|  0. Lemma 2. For feature frequencies from a Poisson process, as N  a.s. a.s. , KN  EKN , k&lt;j KN,k  k&lt;j EKN,j . Lemma 3. For feature frequencies from a Poisson process with mean measure , l a slowly varying function, and a  (0, 1): a 1-a x (dx)  x l(1/x) (x  0) implies 1-a 0 al(t), EK (t)  a(j - a) tal(t) (t  ). EK(t)  (1 - a)t j j! The lemmas above combine to give the following proposition when applied to the three-parameter beta process.1Beta and Bernoulli Processes(a) Beta process rate A completely random measure (CRM) is constructed from a Poisson process  = {(i, Ui)}i as  i=1 Uii . The beta process , B  BP(, B0), is the CRM with rate measure BP(d, du) = u-1(1 - u)-1 du · B0(d), u  [0, 1],    When B1 is discrete with atoms in (0, 1], i.e. B1 =  Uii and i=1 Ui  (0, 1], the Bernoulli process, Y  BeP (B1) is of the form Y =  Wii i=1 with Wi  Bern(Ui). Form a matrix Z  {0, 1}N ×K as follows. Draw B  BP(, B0) and then Yn|N  BeP (B). In n=1 this case, Y places unit masses at some finite subset {k }K of the k=1 atoms of B. Write Zn,k = Yn(k ). Then we say Z  BP-BeP(N, , B0) is drawn from a beta-Bernoulli process.iid indepKN,ja.s.N  . It can be shown that these two power laws imply each other and the constants are the same in both cases . Treating N as constant in the last equation implies what we call a Type II power law; here, c = abN a.Type II Power Law KN,j  cj -1-a as j   for some constants c &gt; 0, a  (0, 1).Featural. In the featural case, we use the same notation, but now we note that each draw can place non-zero mass on more than a single atom, and thus the counts Ni no longer sum to N in general. We can still talk about Type I and II power laws with the same interpretation as above. In the featural case, however, it is also possible to consider a third type of power law. If we let kn denote the number of features present in the nth draw, we say that kn shows Type III power law behavior ifa.s.(b) Beta process draw1Proposition 2 For the beta process in Eq. (2) and C := KN  CNType III.a.s. d a.s. d(j-d) , KN,j  j!(1-d) CN d (1+) d(+d) ,0.5(N  ).0(c) Bernoulli draws50 40 30 20 10 0Type III Power Law P(kn &gt; M )  rM -s as M   for some constants r, s &gt; 0..Proposition 3 For feature frequencies such that qi  0 and i qi &lt; , Zi independent Bernoulli random variables with success probability qi, and large enough N , P[ i Zi &gt; N ] &lt; eN -  N N -N ..(d) Feature matrixBeta Process Stick-breakingIn the clustering case, the Dirichlet process exhibits neither Type I nor Type II power law behavior. However, the Pitman-Yor process yields both kinds of power law. As the one-parameter Dirichlet process generalizes to the two-parameter Pitman-Yor process with an extra parameter in the stick fractions Beta(1 - d,  + id), so too we might consider generalizing the stick-breaking representation of the beta process in  with an extra parameter: B= Ci  Pois(),iid (i) i-1 (l) Ci  i=1 j=1 Vi,j l=1 (1 - Vi,j )i,j , iid 1 (l) indep Vi,j  Beta(1 - d,  + id), i,j   B0.Experimental ResultsTo illustrate our results in the context of a concrete application, we study a discrete factor analysis model previously considered by . The model is of the form X = Z + E, where X  RN ×P is the data and E  RN ×P is an error matrix. The matrix   RK×P is a matrix of factors, and Z  RN ×K is a binary matrix of factor loadings. The K rows of  comprise a potentially infinite collection of factors. The matrix Z is drawn from a beta-Bernoulli process of size N . (a) Number of factorsTwo-parameter model 50 40 50 40 Autocorrelation Three-parameter model(1)Dirichlet Process Stick-breaking1 V1To derive the three-parameter stick-breaking representation, we start by generalizing the beta process to a three-parameter version. (1 + ) BP(d, du) = u-1-d(1 - u)+d-1 du B0(d). (2) (1 - d)( + d)(c) # factors ACFTwo-parameter model 1 K  0.530 K30 K0V2(1 - V1) 1 - V1Proposition 1 B is constructed according to the stick-breaking process in Eq. (1) if and only if B  BP(, d, B0).2020-0.5 10 10 -1 0 500 1000 1500 Iteration number 2000 1 0 50 Lag 10000500 1000 1500 Iteration number20000Three-parameter model K  d 0.5 Autocorrelation0Stick-breaking is the process of recursively breaking off random fractions of the unit interval (far left). First, we break off a random fraction V1; the remaining stick has length 1 - V1 (middle left). Next, we break off a random fraction V2 of the remaining stick, i.e. a fragment of size V2(1 - V1); the remaining stick has length (1 - V1)(1 - V2). This process generates stick fragments V1, V2(1 - V1), . . . , Vi j&lt;i(1 - Vj ), . . .. These fragments form a random partition of the unit interval (far right).  showed that the Dirichlet process (DP) arises from the special case in which Vi are independent draws from the Beta(1, ) distribution. So we can write a draw G  DP(, G0) as:i-1 G = l=1 (1 - Vl ) i iid iid Vi  Beta(1, ), i  G0,  i=1 Vi(b) Stick parameters Proof. As a Poisson number of conditionally independent and identically distributed points,     i-1 i-1   (l)  i,1, V (i) (1 - V (l)) , . . . , i,C , V (i) (1 - Vi,C ) , Pi := i,1 i,1 i,Ci i i  l=1 l=1Two-parameter model 100 Three-parameter model 100 Three-parameter model 10 50 50d 0.5-0.5 0 0 001000 2000 Iteration number01000 2000 Iteration number01000 2000 Iteration number-1050 Lag100Pi is a Poisson point process. So P := process. i=1 Pi is a Poisson point(d) Top featuresTwo-parameter model~ ~ Let  be the rate measure of P . Then both (A × A) = B0(A)µ(A) ~ ~ and BP(A × A) = B0(A)µBP(A) for some measures µ and µBP. So it is enough to show µ = µBP. ~ Taking a size-biased pick U from the vector of stick-breaking fre~ ~ quencies, we can find both P(U  du) = uµ(du) and U  Beta(1 - d,  + d). Therefore, ~ µ(du) = u-1P(U  du) = u-1(1 + ) (1 - d)( + d) u-d(1 - u)+d-1 = µBP(du)Three-parameter modelReferences A. Gnedin, B. Hansen, and J. Pitman. Notes on the occupancy problem with infinitely many boxes: General asymptotics and power laws. Probability Surveys, 4:146­171, 2007.  J. Paisley, A. Zaas, C. W. Woods, G. S. Ginsburg, and L. Carin. A stick-breaking construction of the beta process. In International Conference on Machine Learning, Haifa, Israel, 2010.  J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4(2):639­650, 1994.  R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In International Conference on Artificial Intelligence and Statistics, San Juan, Puerto Rico, 2007.`

1 pages

#### Report File (DMCA)

Our content is added by our users. We aim to remove reported files within 1 working day. Please use this link to notify us:

Report this file as copyright or inappropriate

1318010

Notice: fwrite(): send of 199 bytes failed with errno=104 Connection reset by peer in /home/readbag.com/web/sphinxapi.php on line 531