PRINCIPAL CURVES

by Trevor Hastie and Werner Stuetzle

TECHNICAL REPORT No. 151
Revised October 1988
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA

AUTHORS' FOOTNOTE: Trevor Hastie is a Member of the Technical Staff at AT&T Bell Laboratories, Murray Hill, New Jersey 07974. Werner Stuetzle is Associate Professor, Department of Statistics, University of Washington, Seattle, 98195. This work was developed for the most part at Stanford University, with partial support from the Department of Energy under contracts DE-AC03-76SF and DE-AT03-81-ER10843, the Office of Naval Research contract ONR N00014-81-K-0340, and the U.S. Army Research Office. We thank Andreas Buja, Tom Duchamp, Iain Johnstone, and Larry Shepp for their theoretical support; Robert Tibshirani, Brad Efron and Jerry Friedman for many helpful discussions and suggestions; Horst Friedsam and Will Oren for supplying the SLC example and for their help with the analysis; and both referees for their constructive criticism of earlier drafts.

Abstract

Principal curves are smooth one dimensional curves that pass through the middle of a p dimensional data set, providing a non-linear summary of the data. They are non-parametric, and their shape is suggested by the data. The algorithm for constructing principal curves starts with some prior summary, such as the usual principal component line. The curve in each successive iteration is a smooth or local average of the p dimensional points, where the definition of local is based on the distance in arc length of the projections of the points onto the curve found in the previous iteration. In this paper we define principal curves, give an algorithm for their construction, present some theoretical results, and compare our procedure to other generalizations of principal components.
Two applications illustrate the use of principal curves. In the first application, two different assays for gold content in a number of samples of computer chip waste appear to show some systematic differences which are blurred by measurement error. The classical approach using linear errors in variables regression can detect systematic linear differences, but is not able to account for nonlinearities. When we replace the first linear principal component by a principal curve, a local "bump" is revealed, and we use bootstrapping to verify its presence. As a second example we describe how the principal curve procedure was used to align the magnets of the Stanford Linear Collider. The collider uses around 950 magnets in a roughly circular arrangement to bend electron and positron beams and bring them to collision. After construction it was found that some of the magnets had ended up significantly out of place. As a result the beams had to be bent too sharply and could not be focused. The engineers realized that the magnets did not have to be moved to their originally planned locations, but rather to a sufficiently smooth arc through the "middle" of the existing positions. This arc was found using the principal curve procedure.

Keywords: Principal Components, Smoother, Non-parametric, Non-linear, Errors in Variables, Symmetric, Self-Consistency.

1. Introduction

Consider a data set consisting of n observations on two variables, x and y. We can represent the n points in a scatterplot, as in figure 1a. It is natural to try and summarize the pattern exhibited by the points in the scatterplot. The form of summary we choose depends on the goal of our analysis. A trivial summary is the mean vector, which simply locates the center of the cloud but conveys no information about the joint behavior of the two variables. It is often sensible to treat one of the variables as a response variable, and the other as an explanatory variable.
The aim of the analysis is then to seek a rule for predicting the response using the value of the explanatory variable. Standard linear regression produces a linear prediction rule: the expectation of y is modeled as a linear function of x, and is usually estimated by least squares. This procedure is equivalent to finding the line that minimizes the sum of vertical squared deviations, as depicted in figure 1a. In many situations we do not have a preferred variable that we wish to label response, but would still like to summarize the joint behavior of x and y. The dashed line in figure 1a shows what happens if we used x as the response. So simply assigning the role of response to one of the variables could lead to a poor summary. An obvious alternative is to summarize the data by a straight line that treats the two variables symmetrically. The first principal component line in figure 1b does just this, by minimizing the orthogonal deviations.

Linear regression has been generalized to include nonlinear functions of x. This has been achieved using predefined parametric functions, and, with the reduced cost and increased speed of computing, nonparametric scatterplot smoothers have gained popularity. These include kernel smoothers (Watson 1964), nearest neighbor smoothers (Cleveland 1979), and spline smoothers (Silverman 1985). In general, scatterplot smoothers produce a curve that attempts to minimize the vertical deviations, as depicted in figure 1c, subject to some form of smoothness constraint. The non-parametric versions referred to above allow the data to dictate the form of the non-linear dependency. In this paper we consider similar generalizations for the symmetric situation: instead of summarizing the data with a straight line, we use a smooth curve, and in doing so we treat the two variables symmetrically. Such curves pass through the middle of the data; this situation is depicted in figure 1d. These curves, like linear principal components, focus on the orthogonal or shortest distance to the points.
We formally define principal curves to be those smooth curves that are self consistent for a distribution or data set. This means that if we pick any point on the curve, collect all the data that "project" onto this point, and average them, then this average coincides with the point on the curve. The algorithm for finding principal curves is equally intuitive. Starting with any smooth curve (usually the largest principal component), it checks if this curve is self consistent by projecting and averaging. If it is not, the procedure is repeated, using the new curve obtained by averaging as a starting guess. This is iterated until (hopefully) convergence.

The largest principal component line plays roles other than that of a data summary:

· In errors in variables regression it is assumed that there is randomness in the predictors as well as the response. This can occur in practice when the predictors are measurements of some underlying variables, and there is error in the measurements. It also occurs in observational studies where neither variable is fixed by design. The errors in variables regression technique models the expectation of y as a linear function of the systematic component of x. In the case of a single predictor, the model is estimated by the principal component line. This is also the total least squares method of Golub and van Loan (1979). More details are given in an example in section 8.

· Often we want to replace a number of highly correlated variables by a single variable, such as a normalized linear combination of the original set. The first principal component is the normalized linear combination with the largest variance.

· In factor analysis we model the systematic component of the data by linear functions of a small set of unobservable variables called factors. Often the models are estimated using linear principal components; in the case of one factor, one might use the largest principal component. Many variations of this model have appeared in the literature.
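To fix ideas, here is a small numerical illustration (our own sketch, not from the paper; it assumes NumPy) contrasting the two linear summaries discussed above: the regression of y on x, which minimizes vertical deviations, and the first principal component, which minimizes orthogonal deviations.

```python
import numpy as np

# A 2-d cloud with randomness in both coordinates.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 500)
y = x + rng.normal(0.0, 1.0, 500)
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)

# Direction of the first principal component via the SVD of the
# centered data matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt[0]

# Slope of the principal component line vs. slope of the least
# squares regression of y on x.
pc_slope = v[1] / v[0]
reg_slope = (Xc[:, 0] @ Xc[:, 1]) / (Xc[:, 0] @ Xc[:, 0])
```

With noise in both coordinates the two summaries differ: the regression slope stays near the conditional-expectation slope, while the principal component line tilts toward the direction of largest variance.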
In all the above situations the model can be written as

x_i = u_0 + a λ_i + e_i, (1)

where u_0 + a λ_i is the systematic component and e_i the random component, with E(e) = 0. The least squares estimate of the systematic component in (1) is given by the first principal component line. The natural generalization replaces the linear systematic component by a curve:

x_i = f(λ_i) + e_i. (2)

This might then be a factor analysis or structural model, and, for two variables and some restrictions, an errors in variables regression model. In the same spirit as above, where we used the first linear principal component to estimate (1), the techniques described in this article can be used to estimate the systematic component in (2). This paper focuses on the definition of principal curves, and on an algorithm for finding them. We also present some theoretical results, although lots of open questions remain.

2. The principal curves of a probability distribution

We first give a brief introduction to one dimensional curves, and then define the principal curves of smooth probability distributions in p space. Subsequent sections give algorithms for finding the curves, both for distributions and finite realizations. This is analogous to motivating a scatterplot smoother, such as a moving average or kernel smoother, as an estimator for the conditional expectation of the underlying distribution. We also discuss briefly an alternative approach via regularization using smoothing splines.

2.1. One dimensional curves

A one dimensional curve in p dimensional space is a vector f(λ) of p functions of a single variable λ. These functions are called the coordinate functions, and λ provides an ordering along the curve. If the coordinate functions are smooth, then f is by definition a smooth curve. We can apply any monotone transformation to λ, and by modifying the coordinate functions appropriately the curve remains unchanged. The parametrization, however, is different. There is a natural parametrization for curves in terms of arc-length. The arc-length of a curve f from λ_0 to λ_1 is given by

l = ∫_{λ_0}^{λ_1} ||f'(z)|| dz.

If ||f'(z)|| ≡ 1, then l = λ_1 − λ_0.
This is a rather desirable situation, since if all the coordinate variables are in the same units of measurement, then λ is also in those units. The vector f'(λ) is tangent to the curve at λ and is sometimes called the velocity vector at λ. A curve with ||f'|| ≡ 1 is a unit speed curve. We can always reparametrize any smooth curve with ||f'|| > 0 to make it unit speed. For such curves, smoothness of the coordinate functions translates directly into smooth visual appearance of the point set {f(λ), λ ∈ Λ} (absence of sharp bends).

If v is a unit vector, then f(λ) = u_0 + λv is a unit speed straight line. This parametrization is not unique: f*(λ) = u_0 + av + λv is another unit speed parametrization for the same line. In the following we will always assume ⟨u_0, v⟩ = 0.

The vector f''(λ) is called the acceleration of the curve at λ, and for a unit speed curve it is easy to check that it is orthogonal to the tangent vector. In this case f''/||f''|| is called the principal normal to the curve at λ. The two vectors f'(λ) and f''(λ) span a plane. There is a unique unit speed circle in the plane that goes through f(λ) and has the same velocity and acceleration at f(λ) as the curve itself. The radius r_f(λ) of this circle is called the radius of curvature of the curve f at λ; it is easy to see that r_f(λ) = 1/||f''(λ)||. The center c_f(λ) of the circle is called the center of curvature of f at λ. Thorpe (1979) gives a clear introduction to these and related ideas in differential geometry.

[ Insert figure 3 around here ]

2.2. Definition of principal curves

Denote by X a random vector in R^p with density h and finite second moments. Without loss of generality assume E(X) = 0. Let f denote a smooth (C-infinity) unit speed curve in R^p parametrized over Λ ⊆ R^1, a closed (possibly infinite) interval, which does not intersect itself (λ_1 ≠ λ_2 ⇒ f(λ_1) ≠ f(λ_2)) and has finite length inside any finite ball in R^p.
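The arc-length parametrization introduced in section 2.1 is easy to check numerically on a discretized curve. The following sketch (ours, not from the paper; it assumes NumPy) approximates the arc-length integral by summing segment lengths, and verifies that along a unit speed straight line the accumulated arc length equals λ_1 − λ_0.

```python
import numpy as np

def arc_length(points):
    # points: (n, p) array of curve points f(lam_1), ..., f(lam_n).
    # Returns the cumulative arc length from the first point to each
    # point, approximating the integral of ||f'(z)|| dz by summing
    # the lengths of the polygon segments.
    segs = np.diff(points, axis=0)
    seg_len = np.sqrt((segs ** 2).sum(axis=1))
    return np.concatenate([[0.0], np.cumsum(seg_len)])

# A unit speed straight line f(lam) = lam * v with ||v|| = 1: the arc
# length accumulated from lam = 0 to lam = 3 should be 3.
lam = np.linspace(0.0, 3.0, 301)
v = np.array([0.6, 0.8])          # unit vector
line = np.outer(lam, v)
total = arc_length(line)[-1]      # approximately 3.0
```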
We define the projection index λ_f: R^p → R^1 by

λ_f(x) = sup { λ : ||x − f(λ)|| = inf_μ ||x − f(μ)|| }. (3)

The projection index λ_f(x) of x is the value of λ for which f(λ) is closest to x. If there are several such values, we pick the largest one. We show in the appendix that λ_f(·) is well defined and measurable.

Definition. The curve f is called self-consistent or a principal curve of h if E(X | λ_f(X) = λ) = f(λ) for a.e. λ.

Figure 3 illustrates the intuitive motivation behind our definition of a principal curve. For any particular parameter value λ, we collect all the observations that have f(λ) as their closest point on the curve. If f(λ) is the average of those observations, and if this holds for all λ, then f is called a principal curve. In the algorithms to come, we collect observations that project into a neighborhood on the curve, and use scatterplot smoothing to estimate the conditional expectations.

The definition of principal curves immediately gives rise to a number of interesting questions: for what kinds of distributions do principal curves exist, how many different principal curves are there for a given distribution, and what are their properties? We are unable to answer those questions in general. We can, however, show that the definition is not vacuous, and that there are densities which do have principal curves. It is easy to check that for ellipsoidal distributions the principal components are principal curves. For a spherically symmetric distribution, any line through the mean vector is a principal curve. For any two dimensional spherically symmetric distribution, a circle with center at the origin and radius E||X|| is a principal curve. (Strictly speaking, a circle does not fit our definition, because it does intersect itself. See, however, our note at the beginning of the appendix.)
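Definition (3) can be mimicked for a curve stored as a finite set of points. This sketch (our illustration, assuming NumPy) computes a discrete projection index, taking the largest λ among ties, as the "sup" in (3) requires.

```python
import numpy as np

def projection_index(x, curve, lam):
    # x: (p,) point; curve: (n, p) points f(lam[k]); lam: (n,) increasing.
    # Returns the largest lam achieving the minimum distance, as in (3).
    d = np.sqrt(((curve - x) ** 2).sum(axis=1))
    ties = np.flatnonzero(np.isclose(d, d.min()))
    return lam[ties.max()]

# A discretized circle of radius 1: a point outside projects to the
# nearest point on the circle; the center is equidistant from every
# point, and the sup in (3) then picks the largest lam.
lam = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(lam), np.sin(lam)])
outside = projection_index(np.array([2.0, 0.0]), circle, lam)   # 0.0
center = projection_index(np.array([0.0, 0.0]), circle, lam)    # lam[-1]
```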
We show in the appendix that for compact Λ it is always possible to construct densities with carrier in a thin tube around f, which have f as a principal curve.

What about data generated from the model X = f(λ) + e, with f smooth and E(e) = 0? Is f a principal curve for this distribution? The answer seems to be no in general. We will show in section 7, in the more restrictive setting of data scattered around the arc of a circle, that the mean of the conditional distribution of x, given λ(x) = λ_0, lies outside the circle of curvature at λ_0; this implies that f cannot be a principal curve. So in this situation the principal curve will be biased for the functional model. We have some evidence that this bias is small, and that it decreases to zero as the variance of the errors gets small relative to the radius of curvature. We discuss this bias, as well as estimation bias (which fortunately appears to operate in the opposite direction), in section 7.

3. Connections between principal curves and principal components

In this section we establish some facts that make principal curves appear as a reasonable generalization of linear principal components.

Proposition 1: If a straight line l(λ) = u_0 + λv_0 is self consistent, then it is a principal component.

Proof: The line has to pass through the origin, because

0 = E(X) = E_λ E(X | λ_l(X) = λ) = E_λ (u_0 + λv_0) = u_0 + E_λ(λ) v_0,

and since we assumed ⟨u_0, v_0⟩ = 0, it follows that u_0 = 0. It remains to show that v_0 is an eigenvector of the covariance of X:

Σ v_0 = E(XX^t) v_0 = E_λ E(XX^t v_0 | λ_l(X) = λ) = E_λ E(XX^t v_0 | X^t v_0 = λ) = E_λ E(λX | X^t v_0 = λ) = E_λ(λ^2) v_0.

Principal components need not be self consistent in the sense of the definition; however, they are self consistent with respect to linear regression.

Proposition 2: Suppose l(λ) is a straight line, and we linearly regress the p components X_j of X on the projection λ_l(X), resulting in linear functions f_j(λ). Then f = l iff v_0 is an eigenvector of Σ and u_0 = 0.

The proof of this requires only elementary linear algebra and is omitted.

3.1.
A distance property of principal curves

An important property of principal components is that they are critical points of the distance from the observations. Let d(x, f) denote the usual euclidean distance from a point x to its "projection" on f: d(x, f) = ||x − f(λ_f(x))||, and define

D^2(h, f) = E_h d^2(X, f).

Consider a straight line l(λ) = u + λv. The distance D^2(h, l) in this case may be regarded as a function of u and v: D^2(h, l) ≡ D^2(h, u, v). It is well known that grad_{u,v} D^2(h, u, v) = 0 iff u = 0 and v is an eigenvector of Σ, i.e., the line l is a principal component line. We now restate this fact in a variational setting and extend it to principal curves.

Let G denote a class of curves parametrized over Λ. For g ∈ G define f_t = f + tg. This creates a perturbed version of f.

Definition. The curve f is a critical point of the distance function for variations in the class G iff

dD^2(h, f_t)/dt |_{t=0} = 0 for all g ∈ G.

Proposition 3: Let G_l denote the class of straight lines g(λ) = a + λb. A straight line l_0(λ) = a_0 + λb_0 is a critical point of the distance function for variations in G_l iff b_0 is an eigenvector of Cov(X) and a_0 = 0.

The proof involves straightforward linear algebra and is omitted. A result analogous to proposition 3 holds for principal curves.

Proposition 4: Let G_B denote the class of smooth (C-infinity) curves g parametrized over Λ, with ||g|| ≤ 1 and ||g'|| ≤ 1. Then f is a principal curve of h iff f is a critical point of the distance function for perturbations in G_B.

A proof of proposition 4 is given in the appendix. The condition that ||g|| is bounded guarantees that f_t lies in a thin tube around f, and that the tubes shrink uniformly as t → 0. The boundedness of ||g'|| ensures that, for t small enough, f_t' is well behaved, and in particular that ||f_t'|| is bounded away from 0 for t < 1. Both conditions together guarantee that, for small enough t, the projection index λ_{f_t} is well defined.

4.
An algorithm for finding principal curves

In analogy to linear principal component analysis, we are particularly interested in finding smooth curves corresponding to local minima of the distance function. Our strategy is to start with a smooth curve, such as the largest linear principal component, and check if it is a principal curve. This involves projecting the data onto the curve, and then evaluating their expectation conditional on where they project. Either this conditional expectation coincides with the curve, or we get a new curve as a byproduct. We then check if the new curve is self consistent, and so on. If the self consistency condition is met, we have found a principal curve. It is easy to show that both the operations of projection and conditional expectation reduce the expected distance from the points to the curve.

4.1. The principal curve algorithm

The above discussion motivates the following iterative algorithm.

initialization: Set f^(0)(λ) = x-bar + aλ, where a is the first linear principal component of h. Set λ^(0)(x) = λ_{f^(0)}(x).

repeat: over iteration counter j
1) f^(j)(·) = E(X | λ_{f^(j−1)}(X) = ·)
2) λ^(j)(x) = λ_{f^(j)}(x) for all x

until: the change in D^2(h, f^(j)) is less than some threshold.

There are potential problems with the above algorithm. While principal curves are by definition differentiable, there is no guarantee that the curves produced by the conditional expectation step of the algorithm have this property. Discontinuities can certainly occur at the endpoints of a curve. The problem is illustrated in figure 4, where the expected values of the observations projecting onto f(λ_min) and f(λ_max) are disjoint from the new curve. If this situation occurs, we have to join f(λ_min) and f(λ_max) to the rest of the curve in a differentiable fashion. In light of the above, we cannot prove that the algorithm converges.
All we have is some evidence in its favor:

· By definition, principal curves are fixed points of the algorithm.

· Assuming that each iteration is well defined and produces a differentiable curve, we can show that the expected distance D^2(h, f^(j)) converges.

· If the conditional expectation operation in the principal curve algorithm is replaced by fitting a least squares straight line, then the procedure converges to the largest principal component.

5. Principal curves for data sets

So far, we have considered principal curves of a multivariate probability distribution. In reality, however, we always work with finite multivariate data sets. Suppose then that X is an n×p matrix of n observations on p variables. We regard the data set as a sample from an underlying probability distribution. A curve f(λ) is represented by n tuples (λ_i, f_i), joined up in increasing order of λ to form a polygon. Clearly the geometric shape of the polygon depends only on the order, not on the actual values of the λ_i. We will assume that the tuples are sorted in increasing order of λ, and we use the arc-length parametrization, for which λ_1 = 0 and λ_i is the arc length along the polygon from f_1 to f_i.

As in the distribution case, the algorithm alternates between a projection step and an expectation step. In the absence of prior information, we use the first principal component line as a starting curve. The expected squared distance is estimated by adding up the squared distances of the points in the sample to their closest points on the current curve, and the iterations terminate when the change in this distance is below some threshold. We are unable to prove that the algorithm converges, nor that each step guarantees a decrease in the criterion. In practice we have had no convergence problems with more than 40 real and simulated examples.

5.1. The projection step

For fixed f^(j)(·) we wish to find, for each x_i in the sample, the value λ_i = λ_{f^(j)}(x_i). Define d_ik as the distance between x_i and its closest point on the line segment joining each pair (f^(j)(λ_k), f^(j)(λ_{k+1})). Corresponding to each d_ik is a value λ_ik ∈ [λ_k, λ_{k+1}].
We then set λ_i to the λ_ik corresponding to the smallest value of d_ik:

λ_i = λ_{ik*}, where k* = argmin_k d_ik. (4)

Corresponding to each λ_i is an interpolated point f_i^(j) on the polygon; using these (λ_i, f_i^(j)) to represent the curve, we replace each λ_i by the arc length from f_1^(j) to f_i^(j).

5.2. The conditional expectation step: scatterplot smoothing

The goal of this step is to estimate f^(j+1)(λ) = E(X | λ_{f^(j)} = λ). We restrict ourselves to estimating this quantity at only n values of λ, namely the λ_1, ..., λ_n found in the projection step above. A natural way of estimating E(X | λ_{f^(j)} = λ_i) would be to gather all the observations that project onto f^(j) at λ_i, and find their mean. Unfortunately there will in general be only one such observation, namely x_i. It is at this stage that we introduce the scatterplot smoother, a fundamental building block in the principal curve procedure for finite data sets. We estimate the conditional expectation at λ_i by averaging all the observations x_k in the sample for which λ_k is close to λ_i. As long as these observations are close enough and the underlying conditional expectation is smooth, the bias introduced in approximating the conditional expectation will be small. On the other hand, the variance of the estimate decreases as we include more observations in the neighborhood.

5.2.1 Scatterplot smoothing

Local averaging is not a new idea. In the more common regression context, scatterplot smoothers are used to estimate the regression function by local averaging; commonly used smoothers include kernel smoothers, smoothing splines, and the locally weighted running lines smoother of Cleveland (1979). These smooth a one dimensional response against a single covariate. In our case, the variable to be smoothed is p dimensional, so we smooth each coordinate separately; our current implementation allows any scatterplot smoother to be used. We have experience with all those mentioned above, although all the examples were fitted using locally weighted running lines.
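A minimal sketch of such a locally weighted running-lines fit (our illustration, assuming NumPy, with tricube weights w = (1 − (d/h)^3)^3; Cleveland's full procedure adds robustness iterations that we omit):

```python
import numpy as np

def loclin_smooth(lam, x, span=0.5):
    # At each lam[i]: take the wn nearest points in lam, weight them
    # with the tricube kernel, fit a weighted least squares line, and
    # evaluate it at lam[i].
    n = len(lam)
    k = max(3, int(np.ceil(span * n)))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(lam - lam[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()                # distance to k-th neighbor
        if h == 0.0:                       # all neighbors coincide
            fitted[i] = x[idx].mean()
            continue
        w = (1.0 - (dist[idx] / h) ** 3) ** 3
        A = np.column_stack([np.ones(k), lam[idx]])
        AtW = A.T * w                      # weighted normal equations
        beta, *_ = np.linalg.lstsq(AtW @ A, AtW @ x[idx], rcond=None)
        fitted[i] = beta[0] + beta[1] * lam[i]
    return fitted
```

On data that lie exactly on a straight line, the smoother reproduces the line; on noisy data it returns a smooth compromise controlled by the span w.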
We give a brief description; for details see Cleveland (1979).

5.2.2 Locally weighted running lines smoother

Consider the estimation of E(x | λ), i.e. a single coordinate function, based on a sample of pairs (λ_1, x_1), ..., (λ_n, x_n), and assume the λ_i are ordered. To estimate E(x | λ_i), the smoother fits a straight line to the wn observations x_j closest in λ_j to λ_i. The estimate is taken to be the fitted value of the line at λ_i. The fraction w of points in the neighborhood is called the span. In fitting the line, weighted least squares regression is used. The weights are derived from a symmetric kernel centered at λ_i that dies smoothly to zero within the neighborhood. Specifically, if h_i is the distance to the wn'th nearest neighbor, then the points in the neighborhood get weights w_ij = (1 − (|λ_i − λ_j| / h_i)^3)^3.

5.3. A demonstration of the algorithm

To illustrate the principal curve procedure, we generated a set of 100 data points from a circle in two dimensions, with independent Gaussian errors in both coordinates:

(x_1, x_2) = (5 sin(λ), 5 cos(λ)) + (e_1, e_2), (5)

where λ is uniformly distributed on [0, 2π) and e_1 and e_2 are independent N(0, 1).

[ Insert figures 5a-d about here ]

Figures 5a-d show the data, the circle (dashed line) and the estimated curve (solid line) for selected steps of the iteration. The starting curve is the first principal component, in figure 5a. Any line through the origin is a principal curve for the population model (5), but in general this will not be the case for data. Here the algorithm converges to an estimate for another population principal curve, the circle. The example is admittedly artificial, but it presents the principal curve procedure with a particularly tough job: the starting guess is wholly inappropriate, and the projections of the points onto it bear little relation to their final ordering along the solution curve.
It can be seen that the ordering of the projected points along the new curve can be different to the ordering along the previous curve. This enables the successive curves to bend to shapes that could not be parametrized as a function of the first linear principal component.

5.4. Span selection for the scatterplot smoother

The crucial parameter of any local averaging smoother is the size of the neighborhood over which averaging takes place. We discuss here the choice of the span w for the locally weighted running lines smoother.

A fixed span strategy. In most of the interesting situations, the final curve will not be a function of the arc length of the initial curve (see figure 6). It is reached by successively bending the original curve. We have found that if the initial span of the smoother is too small, the curve may bend too fast, and follow the data too closely. Our most successful strategy has been to initially use a large span, and then to decrease it gradually. In particular, we start with a span of 0.6n observations in each neighborhood, and let the algorithm converge (according to the criterion outlined above). We then drop the span to 0.5n and iterate till convergence. Finally the same is done at 0.4n, by which time the procedure has usually converged. The curves in figure 5 were found using this strategy. Spans of this magnitude have frequently been found appropriate for scatterplot smoothing in the regression context. In some applications, especially the two dimensional ones, we can plot the curve and the points and select a span that seems appropriate for the data. Other applications, such as the collider ring example in section 8, have their own criteria for selecting the span.

Automatic span selection by cross-validation. Suppose the procedure has converged to a curve that is self consistent (with respect to the smoother) for the span last used.
As we reduce the span, the average distance decreases, and the fitted curve follows the data more closely; on the other hand, we do not want the fitted curve to be too wiggly, so a tradeoff between smoothness and fidelity to the data is required. The human eye is skilled at making such tradeoffs; cross-validation is an attempt to make this judgment automatically. A similar situation arises in non-parametric regression, where we smooth a response y against a covariate and want to make the smoothness judgment automatically. Cross-validation proceeds as follows. We predict the response y_i using a smooth estimated from the sample with the ith observation removed, and let y-hat_(i) be this prediction. The cross-validated residual sum of squares is

CVRSS = Σ_{i=1}^n (y_i − y-hat_(i))^2.

CVRSS/n is an approximately unbiased estimate of the expected squared prediction error. If the span is too large, the curve will miss features in the data, and the bias component of the prediction error will dominate. If the span is too small, the curve begins to fit the noise in the data, and the variance component of the prediction error will increase. We pick the span that corresponds to the minimum CVRSS.

In the principal curve algorithm, we can use the same procedure for estimating the spans for each coordinate function separately, as a final smoothing step. Since most smoothers have this feature built in as an option, cross-validation in this manner is trivial to implement.

Figure 7a shows the final curve after one more smoothing step, using cross-validation to select the span; nothing much has changed. Figure 7b, on the other hand, shows what happens if we continue iterating with the cross-validated smoothers. The spans get successively smaller, until the curve almost interpolates the data. In some situations, such as the SLC example in section 8, this may be exactly what we want. It is unlikely, however, that in this event cross-validation would be used to pick the span. A possible explanation for this behavior is that the errors in the coordinate functions are auto-correlated; cross-validation in this situation tends to pick spans that are too small (Hart and Wehrly, 1986).
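Leave-one-out cross-validation over a grid of spans can be sketched as follows (our illustration, assuming NumPy, with a running-mean smoother standing in for the running-lines smoother):

```python
import numpy as np

def loo_cvrss(lam, x, span):
    # CVRSS = sum_i (x_i - xhat_(i))^2, where xhat_(i) is the smooth
    # at lam[i] computed with the i-th observation removed.
    n = len(lam)
    k = max(2, int(span * n))
    rss = 0.0
    for i in range(n):
        dist = np.abs(lam - lam[i])
        dist[i] = np.inf                  # leave observation i out
        idx = np.argsort(dist)[:k]
        rss += (x[i] - x[idx].mean()) ** 2
    return rss

# Pick the span minimizing CVRSS on noisy data from a smooth function.
rng = np.random.default_rng(0)
lam = np.sort(rng.uniform(0.0, 1.0, 200))
x = np.sin(2 * np.pi * lam) + rng.normal(0.0, 0.3, 200)
spans = [0.05, 0.1, 0.2, 0.4, 0.6]
best = min(spans, key=lambda s: loo_cvrss(lam, x, s))
```

Too large a span oversmooths the sine and inflates the bias component of CVRSS; too small a span fits the noise and inflates the variance component.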
5.5. Principal curves and splines

Our algorithm for estimating principal curves from samples is motivated by the algorithm for finding the principal curves of a distribution. This is analogous to the motivation for kernel smoothers and locally weighted running lines smoothers: they estimate a conditional expectation, a population quantity that minimizes a population criterion. They do not minimize a data criterion. Cubic smoothing splines, on the other hand, do minimize a data criterion. For a set of pairs (λ_1, x_1), ..., (λ_n, x_n), a cubic smoothing spline with penalty (smoothing parameter) μ minimizes

D^2(f) = Σ_{i=1}^n (x_i − f(λ_i))^2 + μ ∫ (f''(λ))^2 dλ. (6)

The analogue for principal curves is to minimize

D^2(f, λ) = Σ_{i=1}^n ||x_i − f(λ_i)||^2 + μ ∫_0^1 ||f''(λ)||^2 dλ (7)

over λ_i ∈ [0, 1], i = 1, ..., n, and curves f with coordinate functions f_j ∈ S_2[0, 1]. Notice that we have confined the λ_i to the unit interval, and thus do not use the unit speed parametrization. Intuitively, for a fixed smoothing parameter μ, unit speed functions defined over an arbitrarily large interval can satisfy the second-derivative smoothness criterion and yet visit every point. It is easy to make this argument rigorous. We now apply our alternating algorithm to this criterion:

· Given f, minimizing D^2(f, λ) over the λ_i (rescaled to lie in [0, 1]) only involves the first part of (7), and is our usual projection step.

· Given the λ_i, minimizing D^2(f, λ) breaks up into p expressions of the form (6), one for each coordinate function. These are optimized by smoothing the p coordinates against the λ_i using a cubic spline smoother with parameter μ.

The usual penalized least squares arguments show that if a minimum exists, it must be a cubic spline in each coordinate. We make no claims about its existence, or about global convergence properties of this algorithm. An advantage of the spline smoothing algorithm is that it can be computed in O(n) operations, and thus is a strong competitor for the kernel type smoothers, which take O(n^2) unless approximations are used. Although it is difficult to guess the smoothing parameter μ, alternative methods, such as using the approximate degrees of freedom (see Cleveland, 1979), are available for assessing the amount of smoothing and thus selecting the parameter.
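Putting the pieces of section 5 together, the sample algorithm can be sketched compactly. This is our own illustrative code, not the authors' implementation: it assumes NumPy, substitutes a running-mean smoother for the locally weighted running lines of section 5.2.2, projects onto polygon vertices rather than segments, and uses a single fixed span.

```python
import numpy as np

def smooth_coord(lam, x, span):
    # Stand-in scatterplot smoother: mean of the span*n observations
    # nearest in lam.
    n = len(lam)
    k = max(2, int(span * n))
    out = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.abs(lam - lam[i]))[:k]
        out[i] = x[idx].mean()
    return out

def principal_curve(X, span=0.3, max_iter=20, tol=1e-3):
    # Alternate the expectation (smoothing) and projection steps,
    # stopping when the mean squared distance stabilizes.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[0]                 # start: first principal component
    d_old = np.inf
    for _ in range(max_iter):
        # expectation step: smooth each coordinate against lam
        f = np.column_stack(
            [smooth_coord(lam, X[:, j], span) for j in range(p)])
        order = np.argsort(lam)
        poly = f[order]              # curve as an ordered polygon
        # arc-length parametrization of the polygon vertices
        segs = np.sqrt((np.diff(poly, axis=0) ** 2).sum(axis=1))
        arclen = np.concatenate([[0.0], np.cumsum(segs)])
        # projection step (vertex-level approximation of section 5.1)
        D = ((X[:, None, :] - poly[None, :, :]) ** 2).sum(axis=-1)
        nearest = D.argmin(axis=1)
        d_new = D[np.arange(n), nearest].mean()
        lam = arclen[nearest]
        if abs(d_old - d_new) <= tol * max(d_new, 1e-12):
            break
        d_old = d_new
    return lam, poly, d_new
```

For data lying exactly on a straight line, every smoothing step averages points on the line and therefore returns points on the line, so the fitted polygon never leaves it.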
Our current implementation of the algorithm allows the use of smoothing splines as well; we have found it difficult to distinguish their performance from that of locally weighted running lines in practice.

5.6. Further illustrations and discussion of the algorithm

The procedure worked well, not surprisingly, on the circle example, and on a number of other artificial examples. Consider again data generated from the circle model (5); the circle is a principal curve for this distribution, as is any line through the origin. When the principal curve procedure is started at the circle, it does not move much, except at the endpoints, as depicted in figure 9a. This is a consequence of the smoother's endpoint behavior, in that it is not constrained to be periodic. Figure 9b shows what happens when we use a periodic version of the smoother. However, starting from the linear principal component (where theoretically it should stay), the algorithm iterates to a curve that, apart from the endpoints, appears to be attempting to model the circle (see figure 9c); this behavior occurred repeatedly over a number of simulations of this example. The ends of the curve are "stuck", and further iterations do not free them. The example illustrates the fact that the algorithm will tend to find curves that are minima of the distance function. This is not surprising; after all, the principal curve algorithm is a generalization of the power method for finding eigenvectors. The power method will tend to converge to an eigenvector for the largest eigenvalue, unless special precautions are taken. Interestingly, the algorithm using the periodic smoother and starting from the linear principal component finds the identical circle to that in figure 9b.

6. Bias considerations: model and estimation bias

Model bias occurs when the data are of the form x = f(λ) + e, and we wish to recover f(λ). In general, if f(λ) has curvature, it will not be a principal curve for the distribution it generates; as a consequence, the principal curve procedure can only find a biased version of f(λ), even if it starts at the generating curve.
This bias goes to zero with the ratio of the noise variance to the radius of curvature. Estimation bias occurs because we use scatterplot smoothers to estimate conditional expectations. The bias is introduced by averaging over neighborhoods, which usually has a flattening effect. We demonstrate this bias with a simple example.

6.1. A simple model for investigating bias

Suppose the curve $f$ is an arc of a circle with radius $\rho$ and center at the origin, and the data are generated from bivariate gaussians with mean chosen uniformly on the arc. Consider the points projecting onto a small arc centered at a point of the circle. Intuitively, as the arc shrinks down to the point, the conditional expectation of these points lies on the radius through that point; but there is always more mass outside the circle than inside, which implies that the conditional expectation lies outside the circle. We can prove (Hastie, 1984) that

$$r_E = r^* \, \frac{\sin(\theta/2)}{\theta/2}, \qquad (8)$$

where $\theta$ is the angle spanned by the averaging neighborhood and $r^*$ is the radius of the self-consistent circle; $r^* > \rho$, and $r^* \to \rho$ as $\sigma/\rho \to 0$. Equation (8) nicely separates the two components of bias. Even if we had infinitely many observations, and thus would not need local averaging to estimate conditional expectations, the circle with radius $\rho$ would not be a stationary point of the algorithm; the principal curve is a circle with radius $r^* > \rho$. The factor $\sin(\theta/2)/(\theta/2)$ is due to local averaging. There is clearly an optimal span at which the two bias components cancel exactly. In practice this is not much help, since knowledge of the radius of curvature and the error variance is needed to determine it. Typically these quantities will be changing as we move along the curve. Hastie (1984) gives a demonstration that these bias patterns persist in a situation where the curvature changes along the curve.
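The local-averaging factor in (8) can be checked numerically: averaging noise-free points spread uniformly over an arc of angle $\theta$ pulls the mean inside the circle by exactly $\sin(\theta/2)/(\theta/2)$. A small sketch of our own (noise-free, so only the flattening component of the bias appears):

```python
import numpy as np

# Flattening due to local averaging: the mean of points spread uniformly
# over an arc of angle theta on a circle of radius rho lies at radius
# rho * sin(theta/2) / (theta/2), the second factor in equation (8).
rho, theta = 1.0, 0.5                       # radius, arc angle (radians)
phi = np.linspace(-theta / 2, theta / 2, 100001)
arc = rho * np.column_stack([np.cos(phi), np.sin(phi)])
r_mean = np.linalg.norm(arc.mean(axis=0))   # radius of the averaged point
factor = np.sin(theta / 2) / (theta / 2)
assert abs(r_mean - rho * factor) < 1e-4
```

Making $\theta$ smaller drives the factor to 1, which is the no-averaging limit discussed in the text.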
The Stanford Linear Collider (SLC) project

This application of principal curves was implemented by a group of geodetic engineers at the Stanford Linear Accelerator Center (SLAC), in software developed with the assistance of Jerome Friedman of SLAC. The collider accelerates an electron and a positron beam and brings them to collision in a collision chamber; the collisions are recorded and studied by particle physicists, whose goal is to discover new sub-atomic particles. Two collider arcs guide the beams from the accelerator to the collision chamber. Each of the two collider arcs contains roughly 475 magnets (23 segments of 20, plus some extras), which guide the positron and electron beams. Ideally these magnets lie on a smooth curve with a circumference of about 3km, as depicted in the schematic of figure 11a. The collider has a third dimension, and actually resembles a floppy tennis racket: the tunnel containing the magnets goes underground, whereas the accelerator is above ground.

Measurement errors were inevitable in the procedure used to place the magnets. This resulted in the magnets lying close to the planned curve, but with errors in the range of ±1.0mm. A consequence of these errors was that the beam could not be adequately focussed. The engineers realized that it was not necessary to move the magnets to the ideal curve, but rather to a curve through the existing magnet positions that was smooth enough to allow focused bending of the beam. This strategy would hopefully reduce the amount of magnet movement necessary. The principal curve procedure was used to find this curve. The remainder of this section describes some special features of this simple but important application.

Initial attempts at fitting curves used the data in the measured 3 dimensional geodetic coordinates, but it was found that the magnet displacements were small relative to the bias induced by smoothing. The theoretical arc was then "removed", and subsequent curve fitting was based on the residuals.
This was achieved by replacing the 3 coordinates of each magnet by three new coordinates: 1) the arc length from the beginning of the arc to the point of projection onto the ideal curve (x), 2) the distance from the magnet to this projection in the horizontal plane (y), and 3) the distance in the vertical plane (z). This technique effectively removed the major component of the bias, and is an illustration of how special situations lend themselves to adaptations of the basic procedure. Of course, knowledge of the ideal curve is not available in other applications.

A natural way of choosing the amount of smoothing presented itself in this application. The fitted curve, once transformed back to the original coordinates, can be represented by a set of segments joining the fitted magnet positions. It is vital that the angle at which each segment meets the next is small, since the beam cannot negotiate a sharp bend without hitting the magnet walls; if $\theta_i$ measures the angle at magnet $i$, the engineers specify a threshold that these angles may not exceed. No smoothing at all results in no movement (no work), but with many magnets violating the threshold. As the amount of smoothing (span) is increased, the angles tend to decrease, and the residuals, and thus the amounts of magnet movement, increase. The strategy used was to increase the span until no magnets violated the angle constraint.

Figure 11b gives the fitted vertical and horizontal components of the chosen curve, for a section of the north arc consisting of 149 magnets. This relatively rough curve was then translated back to the original coordinates, and the appropriate adjustments for each magnet were determined. The systematic trend in these coordinate functions represents systematic departures of the magnets from the theoretical curve. It turned out that only 66% of the magnets needed to be moved, since the remaining 34% of the residuals were below 60 μm in length and considered negligible.

There were natural constraints on the system. Some of the magnets were fixed by design, and thus could not be moved. The beam enters the arc parallel to the accelerator, so the initial magnets do no bending.
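The change of coordinates can be sketched as follows. This is our own 2-d analogue (the third coordinate is handled the same way), with the ideal curve represented as a hypothetical dense polyline:

```python
import numpy as np

def tube_coordinates(points, ideal):
    """Replace positions by (arc length of the projection onto the ideal
    curve, signed residual normal to it).  `ideal` is a dense (m, 2)
    polyline approximating the ideal curve; 2-d version for brevity."""
    points = np.asarray(points, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    seg = np.diff(ideal, axis=0)
    seglen = np.linalg.norm(seg, axis=1)
    arclen = np.concatenate([[0.0], np.cumsum(seglen)])
    out = []
    for p in points:
        d = np.linalg.norm(ideal - p, axis=1)
        i = int(d.argmin())                    # nearest polyline vertex
        j = min(i, len(seg) - 1)
        tangent = seg[j] / seglen[j]
        normal = np.array([-tangent[1], tangent[0]])
        out.append((arclen[i], normal @ (p - ideal[i])))
    return np.array(out)
```

Fitting the principal curve to the transformed points and then mapping back undoes most of the curvature-induced bias, because the transformed configuration is nearly straight.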
Similarly, there are junction points at which no bending is allowed. These constraints are accommodated by attaching weights to the points representing the magnets, and using a weighted version of the smoother in the algorithm. By giving the fixed magnets sufficiently large weights, the constraints are met. Figure 11b has the parallel constraints built in at the endpoints. Finally, since some of the magnets were way off target, we used a resistant version of the fitting procedure: points are weighted according to their distance from the fitted curve, and deviations beyond a fixed threshold are given weight 0.

7.2. Gold assay pairs

A California based company collects computer chip waste in order to sell it for its content of gold and other precious metals. Before bidding for a particular cargo, the company takes a sample to estimate the gold content of the lot. The sample is split in two; one sub-sample is assayed by an outside laboratory, the other by their own inhouse laboratory. The company wishes to eventually use only one of the assays. It is in their interest to know which laboratory produces on average lower gold content assays for a given sample.

The data in figure 12a consist of pairs of assays. Each point represents a sample, plotted as the log assay results of the two laboratories. The log transformation produces a more even scatter (there were many more small assays than larger ones (> 10 oz per ton)). Our model for the above data is

$$\begin{pmatrix} x_{1i} \\ x_{2i} \end{pmatrix} = \begin{pmatrix} f(\tau_i) \\ \tau_i \end{pmatrix} + \begin{pmatrix} e_{1i} \\ e_{2i} \end{pmatrix} \qquad (9)$$

where $\tau_i$ is the expected gold content for sample $i$ using the inhouse lab assay, $f(\tau_i)$ is the expected assay result for the outside lab relative to the inhouse lab, and the measurement errors $e_{1i}, e_{2i}$ are assumed i.i.d. with $\mathrm{Var}(e_{1i}) = \mathrm{Var}(e_{2i})$ for all $i$. This is a generalization of the linear errors in variables model: the structural model (if we regard the $\tau_i$ as random variables), or the functional model (if the $\tau_i$ are considered fixed):

$$\begin{pmatrix} x_{1i} \\ x_{2i} \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 \tau_i \\ \tau_i \end{pmatrix} + \begin{pmatrix} e_{1i} \\ e_{2i} \end{pmatrix} \qquad (10)$$

Model (10) essentially looks for deviations from the 45° line, and is estimated by the first principal component. Model (9) is a special case of the principal curve model, where one of the coordinate functions is the identity.
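A weighted smoother of the kind described for the SLC fit can be sketched as a running weighted mean: a very large weight effectively pins the curve to a fixed magnet, and weight 0 discards a wild point. This is our own minimal illustration, not the SLC implementation; it assumes every window contains at least one positive weight.

```python
import numpy as np

def weighted_smooth(lam, y, w, span=0.2):
    """Running weighted mean of y against the parameter lam.
    Huge weights pin the fit to constrained points; zero weights make
    the fit resistant to wild points."""
    order = np.argsort(lam)
    lam, y, w = lam[order], y[order], w[order]
    n = len(lam)
    half = max(1, int(span * n / 2))          # half-width in ranks
    out = np.empty(n, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        ws = w[lo:hi]
        out[i] = (ws * y[lo:hi]).sum() / ws.sum()
    return lam, out
```

Running this inside the alternating algorithm in place of the unweighted smoother gives the constrained and resistant fits described in the text.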
This identifies the systematic component of variable $x_2$ with the arc-length parameter. We estimate (9) using a natural variant of the principal curve algorithm: in the smoothing step we smooth only $x_1$ against the current value of $\tau$, and then update $\tau$ by projecting the data onto the curve defined by $(f(\tau), \tau)$.

The dashed curve in figure 12a is the usual scatterplot smooth of $x_1$ against $x_2$, and is clearly misleading as a scatterplot summary. The principal curve lies above the 45° line in the interval 1.4 to 4; in this range the inhouse assay tends to be lower than that of the outside lab. The difference is reversed at lower levels, but this is of less practical importance, since at these levels the lot is less valuable.

A natural question arising at this point is whether the bend in the curve is real, or whether the linear model (10) would describe the data equally well. If we had access to more data we could simply compare the curves fitted to additional samples. In the absence of additional samples, we use the bootstrap to simulate (9). The residual vectors from the fitted curve are essentially projections onto the curve and thus underestimate the errors; we therefore scale them up by a factor of $\sqrt{2}$. We then sampled with replacement from this pool, and reconstructed a bootstrap replicate by adding a sampled residual vector to each of the fitted values of the original fit. For each of these bootstrapped data sets the full curve fitting procedure was applied, and the fitted curves were saved. This method of bootstrapping is aimed at exposing both bias and variance. Figure 12b shows the errors in variables principal curves obtained for 25 bootstrap samples. The spread of these curves gives an idea of the variance of the fitted curve. The difference between their average and the original fit estimates the bias, which in this case is negligible.

Figure 12c shows the result of a different bootstrap experiment. Our null hypothesis is that the relationship is linear; we proceed in the same way as above, but replacing the principal curve by the linear errors in variables line.
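The residual bootstrap described here amounts to a short recipe per replicate. The sketch below is our own scaffolding around the two ingredients named in the text (resampling residual vectors with replacement and inflating them by √2):

```python
import numpy as np

def bootstrap_replicate(fitted, residuals, rng):
    """One bootstrap data set: add residual vectors, sampled with
    replacement and scaled up by sqrt(2), to the fitted values of the
    original fit.  `fitted` and `residuals` are (n, 2) arrays."""
    n = len(fitted)
    idx = rng.integers(0, n, n)               # sample residuals with replacement
    return fitted + np.sqrt(2.0) * residuals[idx]
```

Refitting the curve to each replicate and overlaying the 25 fits produces bands like those in figures 12b and 12c.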
The observed curve (thick solid curve) lies outside the band of curves fitted to 25 bootstrapped data sets, providing additional evidence that the bend is indeed real.

7.3. One dimensional color data

The data in this example were used by Shepard and Carroll (1966) (who cite the original source as Boynton and Gordon (1965)) to illustrate a version of their parametric data representation techniques called proximity analysis. We give more details of this technique in the discussion. The data set consists of 23 observations on four variables. It was obtained by exposing 100 observers to light of each of 23 different wavelengths. The four variables are the percentages of observers assigning the color descriptions blue, green, yellow and red to each wavelength.

Figure 13a shows the projection of the data and the fitted principal curve onto the plane spanned by the first two linear principal components. The observations lie very close to this plane, so the principal curve provides an adequate summary of the whole data set. (Note that this cannot be deduced from figure 13a alone.) Figure 13b shows the coordinate functions of the curve. Arc length along the curve turns out to be a monotone function of wavelength. An interesting feature of the curve is that the distance between the two extremes of the spectrum is smaller than the distance between either extreme and the middle: the two extremes of the spectrum are judged more similar to each other than to the colors in between.

8. Extension to higher dimensions - principal surfaces

We have had some success in extending the definitions and algorithms for curves to cater for two dimensional (globally parametrized) surfaces. A continuous 2-dimensional globally parametrized surface in $\mathbb{R}^p$ is a function $f: \Lambda \subseteq \mathbb{R}^2 \to \mathbb{R}^p$
of the form

$$f(\lambda) = \begin{pmatrix} f_1(\lambda_1, \lambda_2) \\ \vdots \\ f_p(\lambda_1, \lambda_2) \end{pmatrix}, \qquad (11)$$

where $f$ is a vector of continuous functions.

Insert figure 14 about here

Let $X$ be defined as before, and let $f$ denote a smooth 2-dimensional surface in $\mathbb{R}^p$, parametrized over $\Lambda \subseteq \mathbb{R}^2$. The projection index $\lambda_f(x)$ is defined to be the parameter value corresponding to the point on the surface closest to $x$. The principal surfaces of $h$ are those members of $\mathcal{G}_2$ which are self consistent:

$$E(X \mid \lambda_f(X) = \lambda) = f(\lambda) \quad \text{for a.e. } \lambda.$$

Figure 14 illustrates the situation. We do not yet have a rigorous justification for these definitions, although we have had success in implementing an algorithm similar to the curve algorithm; two dimensional surface smoothers are used instead of one dimensional scatterplot smoothers. We refer the reader to Hastie (1984) for more details of principal surfaces, the algorithm to compute them, and examples.

9. Discussion

Ours is not the first attempt at fitting nonlinear manifolds to multivariate data. In comparing our method to other approaches we will restrict ourselves to one dimensional manifolds, the topic of this paper. The approach closest to ours was suggested by Carroll (1969). He fits the model $x_i = p(\lambda_i) + e_i$, where $p(\lambda)$ is a vector of polynomials $p_j(\lambda)$ of prespecified degrees $K_j$. The goal is to find the coefficients of the polynomials and the $\lambda_i$, $i = 1 \ldots n$, minimizing the loss function $\sum_i \|e_i\|^2$. His algorithm makes use of the fact that for given $\lambda_1 \ldots \lambda_n$ the optimal polynomial coefficients can be found by linear least squares, and the loss function thus can be written as a function of the $\lambda_i$ only. Carroll was able to give an explicit formula for the gradient of the loss function, which is helpful in the $n$ dimensional numerical optimization required to find the optimal $\lambda$'s.

The model of Etezadi-Amoli and McDonald (1983) is the same as Carroll's, but they use different goodness-of-fit measures. Their goal is to minimize the off-diagonal elements of the error covariance matrix $\Sigma = E^t E$, which is in the spirit of classical linear factor analysis.
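For fixed λ's, Carroll's inner problem is ordinary least squares. A minimal sketch of one such step (our own, using a hypothetical common degree $K_j = 2$ for every coordinate):

```python
import numpy as np

def polynomial_step(X, lam, degree=2):
    """Given lambda_1..lambda_n, find the optimal polynomial coefficients
    for every coordinate by linear least squares, and return the value
    of the loss function sum_i ||e_i||^2."""
    B = np.vander(lam, degree + 1)            # columns: lam^2, lam, 1
    coef, *_ = np.linalg.lstsq(B, X, rcond=None)
    resid = X - B @ coef
    return coef, float((resid ** 2).sum())
```

Because this step is closed-form, the loss becomes a function of the λ's alone, which is what makes Carroll's gradient-based outer optimization over the λ's feasible.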
Various measures for the on-diagonal elements are suggested. Their algorithm is similar to ours in that it alternates between improving the $\lambda$'s for given polynomial coefficients, and finding the optimal polynomial coefficients for given $\lambda$'s. The latter is a linear least squares problem, whereas the former constitutes one step of a nonlinear optimization in $n$ parameters.

Shepard and Carroll (1966) proceed from the assumption that the $p$ dimensional observation vectors lie exactly on a smooth one dimensional manifold. In this case it is possible to find parameter values $\lambda_1 \ldots \lambda_n$ such that for each one of the $p$ coordinates, $x_{ij}$ varies smoothly with $\lambda_i$. The basis of their method is a measure for the degree of smoothness of the dependence of $x_{ij}$ on $\lambda_i$. This measure of smoothness, summed over the $p$ coordinates, is optimized with respect to the $\lambda$'s: one finds those values of $\lambda_1 \ldots \lambda_n$ which make the dependence of the coordinates on the $\lambda$'s as smooth as possible. We will not go into the definition and motivation of the smoothness measure; it is quite subtle, and we refer the interested reader to the original source. We just wish to point out that instead of optimizing smoothness, one could optimize a combination of smoothness and fidelity to the data as described in section 5.5, which would lead to modeling the coordinate functions as spline functions and should allow the method to better deal with noise in the data.

In view of this previous work, what do we think is the contribution of the present paper? From the operational point of view it is advantageous that there is no need to specify a parametric form for the coordinate functions, and that finding the optimal $\lambda$'s is easy. Because the curve is represented by its coordinate functions, applying principal curves to data sets is straightforward. On the conceptual side, the definition of principal curves as conditional expectations agrees with the intuitive notion of a summary curve passing through the middle of the data, and makes principal curves a natural generalization of linear principal components.
This close connection is further emphasized by the fact that linear principal curves are principal components, and that the algorithm converges to the largest principal component if conditional expectations are replaced by least squares straight lines.

Appendix - Proofs of Propositions

Throughout the appendix we make the following assumptions:

Assumptions: Let $f$ denote a smooth ($C^\infty$) unit speed curve in $\mathbb{R}^p$, parametrized over a closed, possibly infinite interval $\Lambda \subseteq \mathbb{R}$. We assume that $f$ does not intersect itself ($\lambda_1 \neq \lambda_2 \Rightarrow f(\lambda_1) \neq f(\lambda_2)$) and has finite length inside any finite ball. Denote by $X$ a random vector in $\mathbb{R}^p$ with density $h$ and finite second moments.

Note: Under these conditions, the set $\{f(\lambda), \lambda \in \Lambda\}$ forms a smooth, connected 1-dimensional manifold diffeomorphic to the interval $\Lambda$. Any smooth, connected 1-dimensional manifold is diffeomorphic either to an interval or to a circle (Milnor, 1965). The results and proofs below could be slightly modified to cover the latter case (closed curves).

Existence of the projection index. Existence of the projection index is a consequence of the following two lemmas.

Lemma 5.1: For every $x \in \mathbb{R}^p$ and for any $r > 0$ the set $Q = \{\lambda \mid \|x - f(\lambda)\| \le r\}$ is compact.

Proof: $Q$ is closed, because $\|x - f(\lambda)\|$ is a continuous function of $\lambda$. It remains to show that $Q$ is bounded. Suppose it were not. Then there would exist an unbounded monotone sequence $\lambda_1, \lambda_2, \ldots$ with $\|x - f(\lambda_i)\| \le r$. Let $B$ denote the ball around $x$ with radius $2r$. Consider the segment of the curve between $f(\lambda_i)$ and $f(\lambda_{i+1})$. The segment either leaves and reenters $B$, or it stays entirely inside. This means that it contributes at least $\min(2r, |\lambda_{i+1} - \lambda_i|)$ to the length of the curve inside $B$. As there are infinitely many such segments, the curve would have infinite length in $B$, which is a contradiction. □

Lemma 5.2: For every $x \in \mathbb{R}^p$ there exists $\lambda \in \Lambda$ for which $\|x - f(\lambda)\| = \inf_{\mu \in \Lambda} \|x - f(\mu)\|$.

Proof:
Define $B = \{\lambda \mid \|x - f(\lambda)\| \le d(x, f) + 1\}$, where $d(x, f) = \inf_{\mu \in \Lambda} \|x - f(\mu)\|$. $B$ is non-empty and compact (lemma 5.1), so the continuous function $\|x - f(\lambda)\|$ attains its infimum $d(x, f)$ on $B$. □

It is not hard to show that $\lambda_f$ is measurable; a proof is available upon request.

Stationarity of the distance function. We will first establish some simple facts which are of interest in themselves.

Lemma 6.1: If $f(\lambda_0)$ is a closest point to $x$ and $\lambda_0 \in \Lambda^\circ$, the interior of the parameter interval, then $x$ is in the normal hyperplane to $f$ at $f(\lambda_0)$:

$$\langle x - f(\lambda_0), f'(\lambda_0) \rangle = 0.$$

Proof: The function $\lambda \mapsto \|x - f(\lambda)\|^2$ is differentiable and has a minimum at the interior point $\lambda_0$; setting its derivative there to 0 gives the result. □

A point $x$ is called an ambiguity point for a curve $f$ if it has more than one closest point on the curve: $\mathrm{card}\{\lambda \mid \|x - f(\lambda)\| = d(x, f)\} > 1$. Let $A$ denote the set of ambiguity points. Our next goal will be to show that $A$ is measurable and has measure 0. Define $M_\lambda$, the normal hyperplane to $f$ at $\lambda$, by

$$M_\lambda = \{x \mid \langle x - f(\lambda), f'(\lambda) \rangle = 0\}.$$

It will be useful to define a mapping that maps $\Lambda \times \mathbb{R}^{p-1}$ into $\bigcup_\lambda M_\lambda$. Choose $p - 1$ smooth vector fields $n_1(\lambda), \ldots, n_{p-1}(\lambda)$ such that for every $\lambda$ the vectors $f'(\lambda), n_1(\lambda), \ldots, n_{p-1}(\lambda)$ are orthogonal. It is well known that such vector fields do exist. Define $\chi: \Lambda \times \mathbb{R}^{p-1} \to \mathbb{R}^p$ by

$$\chi(\lambda, v) = f(\lambda) + \sum_{i=1}^{p-1} v_i \, n_i(\lambda),$$

and set $M = \chi(\Lambda, \mathbb{R}^{p-1})$, the set of all points in $\mathbb{R}^p$ lying in some normal hyperplane of the curve. The mapping $\chi$ is smooth, because $f$ and $n_1, \ldots, n_{p-1}$ are assumed to be smooth. We shall now present a few observations which simplify showing $P(A) = 0$.

Lemma 6.2: $A \cap M^c$ has measure 0.

Proof: According to lemma 6.1, a point $x \notin M$ can have closest points only at the endpoints of the parameter interval. An ambiguity point in $M^c$ is therefore equidistant from the two endpoints of the curve; the set of such points forms a hyperplane, which has measure 0. □

Lemma 6.3: Let $E$ be a set of measure 0, and suppose that for every $z \in \mathbb{R}^p \setminus E$ there exists an open neighborhood $N(z)$ with $P(A \cap N(z)) = 0$. Then $P(A) = 0$.

Proof:
The open covering $\{N(z) \mid z \in \mathbb{R}^p \setminus E\}$ of $\mathbb{R}^p \setminus E$ contains a countable covering $\{N_i\}$, because the topology of $\mathbb{R}^p$ has a countable base. □

Lemma 6.4: We can restrict ourselves to the case of compact $\Lambda$.

Proof: Set $\Lambda_n = \Lambda \cap [-n, n]$, $f_n = f|\Lambda_n$, and let $A_n$ be the set of ambiguity points of $f_n$. Suppose $x$ is an ambiguity point of $f$; then $\{\lambda \mid \|x - f(\lambda)\| = d(x, f)\}$ is compact (lemma 5.1). Therefore $x \in A_n$ for some $n$, and $A \subseteq \bigcup_n A_n$. □

We are now ready to prove

Proposition 6: The set of ambiguity points has measure 0.

Proof: We can restrict ourselves to the case of compact $\Lambda$ (lemma 6.4). As $P(A \cap M^c) = 0$ (lemma 6.2), it is sufficient to show that for every $z \in M$, with the possible exception of a set $C$ of measure 0, there exists a neighborhood $N(z)$ with $P(A \cap N(z)) = 0$. We choose $C$ to be the set of critical values of $\chi$. (A point $y \in M$ is called a regular value if $\mathrm{rank}(\chi'(z)) = p$ for all $z \in \chi^{-1}(y)$; otherwise $y$ is called a critical value.) By Sard's theorem (Milnor, 1965), $C$ has measure 0.

Pick $z \in M \cap C^c$. We will first show that $\chi^{-1}(z)$ is a finite set $\{(\lambda_1, v_1), \ldots, (\lambda_k, v_k)\}$. Suppose on the contrary that there was an infinite set $\{(\xi_1, w_1), (\xi_2, w_2), \ldots\}$ with $\chi(\xi_i, w_i) = z$. By compactness of $\Lambda$ and continuity of $\chi$, there would exist a cluster point $\xi_0$ of $\{\xi_1, \xi_2, \ldots\}$ and a corresponding $w_0$ with $\chi(\xi_0, w_0) = z$. On the other hand, $z$ was assumed to be a regular value of $\chi$, and thus $\chi$ would be a diffeomorphism between a neighborhood of $(\xi_0, w_0)$ and a neighborhood of $z$. This is a contradiction.

Because $z$ is a regular value, there are neighborhoods $L_i$ of $(\lambda_i, v_i)$ and a neighborhood $N(z)$ such that $\chi$ is a diffeomorphism between $L_i$ and $N$. Actually, a stronger statement holds: we can find $\tilde N(z) \subseteq N(z)$ for which $\chi^{-1}(\tilde N) \subseteq \bigcup_{i=1}^k L_i$. Suppose this were not the case.
Then there would exist a sequence $z_1, z_2, \ldots \to z$ and corresponding $(\xi_i, w_i) \notin \bigcup_{i=1}^k L_i$ with $\chi(\xi_i, w_i) = z_i$. The set $\{\xi_1, \xi_2, \ldots\}$ has a cluster point $\xi_0 \notin \bigcup L_i$, and by continuity $\chi(\xi_0, w_0) = z$, which is a contradiction.

For every $y \in \tilde N(z)$ there exists exactly one pair $(\lambda_i(y), v_i(y)) \in L_i$ with $\chi(\lambda_i(y), v_i(y)) = y$, and $\lambda_i(y)$ is a smooth function of $y$. Define $\lambda_0(y) = \lambda_{\min}$, $\lambda_{k+1}(y) = \lambda_{\max}$. Set $d_i(y) = \|y - f(\lambda_i(y))\|^2$. A simple calculation using the chain rule and the fact that $\langle y - f(\lambda_i(y)), f'(\lambda_i(y)) \rangle = 0$ (lemma 6.1) shows that $\mathrm{grad}(d_i(y)) = 2(y - f(\lambda_i(y)))$. A point $y \in \tilde N(z)$ can be an ambiguity point only if $y \in A_{ij}$ for some $i \neq j$, where $A_{ij} = \{x \mid d_i(x) = d_j(x)\}$. However, $\mathrm{grad}(d_i - d_j) \neq 0$, because the curve $f(\lambda)$ was assumed not to intersect itself. Therefore each $A_{ij}$ is a smooth, possibly not connected, manifold of dimension $p - 1$, which has measure 0, and so has the finite union of the $A_{ij}$. □

Note: There is a technicality here: Sard's theorem requires $\chi$ to be defined on an open set, whereas the interval $\Lambda$ has boundaries. However, we can extend $f$ in a smooth way beyond the boundaries of the interval.

In the following, let $\mathcal{G}_B$ denote the class of smooth curves $g$ parametrized over $\Lambda$, with $\|g(\lambda)\| \le 1$ and $\|g'(\lambda)\| \le 1$. For $g \in \mathcal{G}_B$, define $f_t(\lambda) = f(\lambda) + t\,g(\lambda)$. It is easy to see that $f_t$ has finite length inside any finite ball, and for $t < 1$, $\lambda_{f_t}$ is well defined. Moreover, we have

Lemma 4.1: If $x$ is not an ambiguity point for $f$, then $\lambda_{f_t}(x) \to \lambda_f(x)$ as $t \to 0$.

Proof: We have to show that for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $t < \delta$, $|\lambda_{f_t}(x) - \lambda_f(x)| < \epsilon$. Set $C_\epsilon = \Lambda \setminus (\lambda_f(x) - \epsilon, \lambda_f(x) + \epsilon)$ and $d_\epsilon = \inf_{\lambda \in C_\epsilon} \|x - f(\lambda)\|$.
The infimum is attained, and $d_\epsilon > \|x - f(\lambda_f(x))\|$, because $x$ is not an ambiguity point. Choose $\delta$ with $2\delta < d_\epsilon - \|x - f(\lambda_f(x))\|$. Then for $t < \delta$,

$$\inf_{\lambda \in C_\epsilon} \left( \|x - f_t(\lambda)\| - \|x - f_t(\lambda_f(x))\| \right) \ge d_\epsilon - \delta - \|x - f(\lambda_f(x))\| - \delta > 0,$$

and therefore $\lambda_{f_t}(x) \notin C_\epsilon$. □

We are now ready to prove:

Proposition 4: The curve $f$ is a principal curve of $h$ iff

$$\frac{d}{dt} D^2(h, f_t) \Big|_{t=0} = 0 \quad \forall \, g \in \mathcal{G}_B.$$

Proof: We use the dominated convergence theorem to show that we can interchange the orders of integration and differentiation in the expression $\frac{d}{dt} E\|X - f_t(\lambda_{f_t}(X))\|^2$. We need to find a pair of integrable random variables which almost surely bound

$$Z_t = \frac{\|X - f_t(\lambda_{f_t}(X))\|^2 - \|X - f(\lambda_f(X))\|^2}{t}$$

for all sufficiently small $t > 0$. As $\lambda_{f_t}(X)$ minimizes $\|X - f_t(\lambda)\|$ over $\lambda$, we have

$$Z_t \le \frac{\|X - f_t(\lambda_f(X))\|^2 - \|X - f(\lambda_f(X))\|^2}{t}.$$

Expanding the first norm, we get

$$Z_t \le -2 \langle X - f(\lambda_f(X)), \, g(\lambda_f(X)) \rangle + t \|g(\lambda_f(X))\|^2. \qquad (13)$$

Using the Cauchy-Schwarz inequality and the assumption that $\|g\| \le 1$,

$$Z_t \le 2\|X - f(\lambda_f(X))\| + 1 \le 2\|X - f(\lambda_0)\| + 1$$

for all $t < 1$ and arbitrary fixed $\lambda_0$. As $\|X\|$ was assumed to be integrable, so is this bound. Similarly, as $\lambda_f(X)$ minimizes $\|X - f(\lambda)\|$,

$$Z_t \ge \frac{\|X - f_t(\lambda_{f_t}(X))\|^2 - \|X - f(\lambda_{f_t}(X))\|^2}{t}.$$

Expanding the first norm, we get

$$Z_t \ge -2 \langle X - f(\lambda_{f_t}(X)), \, g(\lambda_{f_t}(X)) \rangle + t \|g(\lambda_{f_t}(X))\|^2 \qquad (14)$$
$$\ge -2 \|X - f(\lambda_{f_t}(X))\| \ge -2\|X - f(\lambda_0)\| - 4,$$

which is once again integrable. By the dominated convergence theorem, the interchange is justified. Moreover, from (13) and (14), and because $f$ and $g$ are continuous functions, we see that the limit $\lim_{t \to 0} Z_t$ exists whenever $\lambda_{f_t}(X)$ is continuous in $t$ at $t = 0$. We have proved this continuity for a.e. $x$ in lemma 4.1. By (13) and (14) this limit is given by

$$\lim_{t \to 0} Z_t = -2 \langle X - f(\lambda_f(X)), \, g(\lambda_f(X)) \rangle.$$

Denoting the distribution of $\lambda_f(X)$ by $h_\lambda$, we get

$$\frac{d}{dt} D^2(h, f_t) \Big|_{t=0} = -2 \int \langle E(X \mid \lambda_f(X) = \lambda) - f(\lambda), \, g(\lambda) \rangle \, h_\lambda(\lambda) \, d\lambda.$$

If $f(\lambda)$ is a principal curve of $h$, then by definition $E(X \mid \lambda_f(X) = \lambda) = f(\lambda)$ for a.e. $\lambda$, and thus $\frac{d}{dt} D^2(h, f_t)|_{t=0} = 0$. Conversely, suppose that $\frac{d}{dt} D^2(h, f_t)|_{t=0} = 0$ for all $g \in \mathcal{G}_B$. Considering each coordinate separately, and choosing $g$ appropriately, this implies that $E(X \mid \lambda_f(X) = \lambda) = f(\lambda)$ for a.e. $\lambda$. □

Construction of densities with known principal curves. Let $f$ be parametrized over a compact interval $\Lambda$. It is easy to construct densities with carrier in a tube around $f$, for which $f$ is a principal curve. Denote by $B_r$ the ball in $\mathbb{R}^{p-1}$ with radius $r$ and center at the origin.
The construction is based on the following:

Proposition 7: If $\Lambda$ is compact, there exists $r > 0$ such that $\chi | \Lambda \times B_r$ is a diffeomorphism.

Proof: Suppose the result were not true. Pick a sequence $r_i \to 0$. There would exist sequences $(\lambda_i, v_i) \neq (\xi_i, w_i)$ with $\|v_i\| \le r_i$, $\|w_i\| \le r_i$ and $\chi(\lambda_i, v_i) = \chi(\xi_i, w_i)$. The sequences $\lambda_i, \xi_i$ have cluster points $\lambda_0, \xi_0$. We must have $\lambda_0 = \xi_0$, because

$$f(\xi_0) = \chi(\xi_0, 0) = \lim_{i \to \infty} \chi(\xi_i, w_i) = \lim_{i \to \infty} \chi(\lambda_i, v_i) = \chi(\lambda_0, 0) = f(\lambda_0),$$

and by assumption $f$ does not intersect itself. So there would be sequences $(\lambda_i, v_i)$ and $(\xi_i, w_i)$ converging to $(\lambda_0, 0)$ with $\chi(\lambda_i, v_i) = \chi(\xi_i, w_i)$. However, it is easy to see that $(\lambda_0, 0)$ is a regular point of $\chi$, and thus $\chi$ maps a neighborhood of $(\lambda_0, 0)$ diffeomorphically into a neighborhood of $f(\lambda_0)$, which is a contradiction. □

Define $T(f, r) = \chi(\Lambda \times B_r)$. Proposition 7 assures that there are no ambiguity points in $T(f, r)$, and $\lambda_f(x) = \lambda$ for $x \in \chi(\lambda, B_r)$. Pick a density $\phi(\lambda)$ on $\Lambda$ and a density $\psi(v)$ on $B_r$ with $\int_{B_r} v \, \psi(v) \, dv = 0$. The mapping $\chi$ carries the product density $\phi(\lambda) \cdot \psi(v)$ on $\Lambda \times B_r$ into a density $h(x)$ on $T(f, r)$. It is easy to verify that $f$ is a principal curve for $h$.

References

Technical Report 389, Institute for Mathematical Studies in the Social Sciences, Stanford University, California.

Becker, R., Chambers, J. and Wilks, A. (1988), The New S Language, Wadsworth, New York.

Boynton, R.M. and Gordon, J. (1965), "Bezold-Brücke Hue Shift Measured by Color-Naming Technique", Journal of the Optical Society of America, 55.

Carroll, J.D. (1969), "Polynomial Factor Analysis", Proceedings of the 77th Annual Convention, APA, 103-104.

Cleveland, W.S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots", Journal of the American Statistical Association, 74, 829-836.

Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, SIAM-CBMS, 38.

Etezadi-Amoli, J. and McDonald, R.P. (1983), "A Second Generation Nonlinear Factor Analysis", Psychometrika, 48, #3, 315-342.

Golub, G.H. and van Loan, C.
(1979), "Total Least Squares", in Smoothing Techniques for Curve Estimation, Proceedings, Heidelberg, Springer-Verlag.

Hart, J. and Wehrly, T. (1986), "Kernel Regression Estimation Using Repeated Measurement Data", Journal of the American Statistical Association, 81, 1080-1088.

Hastie, T.J. (1984), "Principal Curves and Surfaces", Laboratory for Computational Statistics technical report #11 and Ph.D. thesis, Statistics Department, Stanford University.

Milnor, J.W. (1965), Topology from the Differentiable Viewpoint, University Press of Virginia, Charlottesville.

Shepard, R.N. and Carroll, J.D. (1966), "Parametric Representations of Non-Linear Data Structures", in Multivariate Analysis (Krishnaiah, P.R., ed.), Academic Press, New York.

Silverman, B.W. (1985), "Some Aspects of the Spline Smoothing Approach to Non-Parametric Regression Curve Fitting", Journal of the Royal Statistical Society B, 47, 1-52.

Stone, M. (1974), "Cross-validatory Choice and Assessment of Statistical Predictions (with Discussion)", Journal of the Royal Statistical Society B, 36, 111-147.

Thorpe, J.A. (1979), Elementary Topics in Differential Geometry, Springer-Verlag, New York. Undergraduate Texts in Mathematics.

Wahba, G. and Wold, S. (1975), "A Completely Automatic French Curve: Fitting Spline Functions by Cross-validation", Communications in Statistics, 4, 1-17.

Watson, G.S. (1964), "Smooth Regression Analysis", Sankhyā Series A, 26, 359-372.

FIGURE 1. (a) The regression line minimizes the sum of squared deviations in the response variable. (b) The principal component line minimizes the sum of squared deviations in all the variables. (c) The smooth regression curve minimizes the sum of squared deviations in the response variable, subject to smoothness constraints. (d) The principal curve minimizes the sum of squared deviations in all the variables, subject to smoothness constraints.

FIGURE 2.
The radius of curvature is the radius of the circle tangent to the curve with the same acceleration as the curve.

FIGURE 3. Each point on a principal curve is the average of the points that project there.

FIGURE 4. The mean of the observations projecting onto an endpoint of the curve can be disjoint from the rest of the curve.

FIGURE 5. Selected iterates of the principal curve procedure for the circle data. In all the figures we see the data, the circle from which the data are generated, and the current estimate produced by the algorithm. (a) The starting curve is the principal component line, with average squared distance $D^2(f^{(0)}) = 12.91$. (b) Iteration 2: $D^2(f^{(2)}) = 10.43$. (c) Iteration 4: $D^2(f^{(4)}) = 2.58$. (d) Final iteration 8: $D^2(f^{(8)}) = 1.55$.

FIGURE 6. These schematics emphasize the iterative nature of the algorithm. The curve of the first iteration is a function of $\lambda^{(0)}$, measured along the starting curve (a). The curve of the second iteration is a function of $\lambda^{(1)}$, measured along the curve of the first iteration (b).

FIGURE 7. (a) The final curve in figure 6 with one more smoothing step, using cross-validation separately for each of the co-ordinates: $D^2(f^{(9)}) = 1.28$. (b) The curve obtained by continuing the iterations using cross-validation at every step.

FIGURE 8. The points represent three dimensional data generated from a helix with gaussian errors. The dashed curve is the generating helix, and the solid curve is the curve fitted to the data. All are projected onto a plane chosen to give a reasonable view of the results.

FIGURE 9. Some curves produced by the algorithm applied to data generated about a circle; the location on the circle is selected uniformly, with bivariate spherical gaussian errors. (a) The curve found when the algorithm is started at the generating circle; apart from the endpoints, it does not move much. (b) The curve obtained using a periodic smoother. (c) The curve found when the algorithm is started at the linear principal component.

FIGURE 11. (a) A rough schematic of the Stanford linear accelerator and the linear collider ring.
(b) The fitted coordinate functions for the magnet positions for a section of the SLC. The data represent residuals from the theoretical curve. Some (35%) of the deviations from the fitted curve were small enough that these magnets were not moved.

FIGURE 12. (a) Plot of the log assays for the inhouse and outside labs. The solid curve is the principal curve, the dashed curve the scatterplot smooth. (b) A band of 25 bootstrap curves. Each curve is the principal curve of a bootstrap sample. A bootstrap sample is obtained by randomly assigning residual vectors to the fitted values of the principal curve (solid curve). The spread of the band of curves gives an indication of the variance of the fitted curve. (c) Another band of 25 bootstrap curves. Each curve is the principal curve of a bootstrap sample, based on the linear errors in variables regression line (solid line). This simulation is aimed at testing the null hypothesis of no kink. There is evidence that the kink is real, since the principal curve (solid curve) lies outside this band in the region of the kink.

FIGURE 13. (a) The 4 dimensional color data and the principal curve projected onto the first principal component plane. (b) The estimated co-ordinate functions plotted against the arc length of the principal curve. The arc length is a monotone function of wavelength.

FIGURE 14. Each point on a principal surface is the average of the points that project there.
