
Multivariate Behavioral Research, 39 (4), 565-594 Copyright © 2004, Lawrence Erlbaum Associates, Inc.

A Two Factor ANOVA-like Test for Correlated Correlations: CORANOVA

Warren B. Bilker

Dept. of Biostatistics & Epidemiology & Center for Clinical Epidemiology & Biostatistics University of Pennsylvania

Colleen Brensinger

Dept. of Biostatistics and Epidemiology University of Pennsylvania

Ruben C. Gur

Dept. of Psychiatry, University of Pennsylvania

Testing homogeneity of correlations with Fisher's Z is inappropriate when correlations are themselves correlated. Suppose measurements of brain activation and performance are taken before and during a verbal memory task. Of interest are changes in activity gradients in specific regions, R1, R2, R3, and performance, V. The "correlated correlations" of interest, ρ(V,R1), ρ(V,R2), and ρ(V,R3), have a single variable, V, in common. We wish to compare these correlations between males and females, across regions, and to assess an interaction of the correlations. Fisher's Z can compare pairs of correlations, and Olkin and Finn's (1990) method can test homogeneity of correlated correlations across a single within factor (based on asymptotic normality), but no current procedure can test a region by gender (within by between) interaction of correlations. We propose a nonparametric method for testing this interaction and both main effects. The procedure is analogous to two-way ANOVA, but the hypotheses test homogeneity of correlations, not means. The null distributions are estimated with permutations, avoiding asymptotic distributional assumptions and enhancing applicability to smaller samples and non-normal data. Simulations demonstrated maintenance of correct level (power = alpha level under the null) for normal and non-normal data and small samples. The Olkin-Finn test had inflated level for non-normal data or small samples. Fisher's Z had inflated level for non-normal data, but not for small samples. Our method had better efficiency across contrasts, data types, and sample sizes. Applied to correlations between regional laterality of blood flow and verbal memory performance, the method showed sensitivity to a biologically meaningful sex by region interaction in these correlations. A SAS macro for CORANOVA is available.

We thank Drs. Raymond J. Carroll and Larry Muenz for invaluable discussions in the development process of this method. We also thank Dr. Todd C. Headrick for making his nonnormal simulation code available.


Introduction

Many statistical procedures have been developed for the comparison of pairs of Pearson correlations. These include procedures for comparing correlations on the same two variables from two independent groups, such as the Fisher's z-test based on the Fisher's z-transformation (Fisher, 1921). Pairs of correlations from the same group are not independent, and standard procedures for comparing independent correlations are thus not appropriate. Such correlations are referred to as "correlated correlations". Pairs of correlations from the same group that have one index variable in common are called "overlapping correlated correlations" (e.g., ρ(x1,x2) and ρ(x1,x3)), while pairs having no index variables in common are called "non-overlapping correlated correlations" (e.g., ρ(x1,x2) and ρ(x3,x4)).

Hotelling (1940) proposed a method for comparing two correlated overlapping correlations, which is based on the t-distribution, and Williams (1959) presented a modified version of Hotelling's test. Olkin (1967) presented a large sample test for comparing two correlated overlapping correlations (Olkin's z test), as did Dunn and Clark (1969), who provided several large sample tests that use the Fisher's z-transformation. All of these approaches assume that the data come from a trivariate normal distribution. Dunn and Clark (1971) and Neill and Dunn (1975) each presented comparisons of power for various such tests. Procedures for comparing two correlated nonoverlapping correlations were presented as early as 1898 by Pearson and Filon and more recently by Raghunathan, Rosenthal, and Rubin (1996).

Methods for testing the homogeneity of more than two correlated correlations have also appeared in the literature. Olkin and Finn (1990) developed large sample methods for testing the homogeneity of a set of overlapping or nonoverlapping correlated correlations, both based on the assumption that the data come from a multivariate normal distribution. Choi (1977) gave two methods for testing the equality of three or more overlapping correlated correlations, one a parametric test that requires known variance ratios for the correlated variables, and one that is nonparametric and based on Spearman correlation coefficients. Meng, Rosenthal, and Rubin (1992) extended the work of Dunn and Clark (1969), using the Fisher's z-transformation, to more than two overlapping correlated correlations. They also discussed why the traditional Hotelling's t test for comparing two overlapping correlated correlations is generally not appropriate in practice, since it is correct only when the sample variances of the correlated variables each equal their respective population variances. Cohen (1989) proposed a test for the comparison of three or more


overlapping correlated correlations using the Fisher's z-transformation. His approach uses a bootstrap estimate of the covariance matrix of a specified set of contrasts for testing the homogeneity of the Fisher z-transformed correlated correlations and bases the resulting p-value of the test on the asymptotic distribution of the proposed test statistic. Paul (1989) presented a procedure for comparing more than two overlapping correlated correlations, based on the Fisher's z-transformation. Steiger (1980) also presented a review and comparison of methods for comparing two or more elements of a correlation matrix, including overlapping and nonoverlapping correlated correlations.

Tests for the comparison of two overlapping correlated correlations, ρ(x1,x2) and ρ(x1,x3), depend on the correlation between x2 and x3, ρ(x2,x3) (Olkin, 1967), which is the correlation between pairs of within variables. In particular, the power increases with increasing values of ρ(x2,x3) (May & Hittner, 1997a, 1997b). This also extends to the case of testing the homogeneity of a set of three or more overlapping correlated correlations.

Procedures for testing the homogeneity of a set of overlapping correlated correlations, such as the test presented by Olkin and Finn (1990), can be thought of as testing the homogeneity of a set of correlated correlations across a single within subject factor. For example, if measurements X1, ..., Xp are each taken from a group of subjects, these methods can be used to test the homogeneity of the set of correlations ρ(X1,Xi), (i = 2, ..., p), where X1 was selected arbitrarily here as the common variable in the set of correlations. Suppose that the variables X1, ..., Xp are collected in g independent groups and are denoted by Xij (i = 1, ..., p), (j = 1, ..., g), where i represents the variable and j represents the group. A single correlation can be compared between two groups using the Fisher's z-test, since these are not correlated correlations. The Olkin and Finn method can be applied to each group separately to test the homogeneity of the correlations in each group: ρ(X11,Xi1), (i = 2, ..., p); ρ(X12,Xi2), (i = 2, ..., p); ...; ρ(X1g,Xig), (i = 2, ..., p). This method is based on asymptotic normality results. However, in many areas of application, such as neuroimaging, it is common to have small data sets for which it may not be reasonable to rely on asymptotic normality results. It may also be desirable to test the homogeneity of correlations among the groups, which is a between factor effect. Another feature of the underlying data structure that can be of importance is the interaction of the between and within factors in the correlated correlations. Such an interaction is present when the pattern of correlated correlations across the within factor depends on the between factor. We present a nonparametric method for testing the within, between, and within by between interaction effects of the correlations for this type of data.

The proposed method is based on the same linear contrasts and quadratic forms used in the analysis of variance of means, with vectors of correlations in


the quadratic form rather than means. The method is thus analogous to a two factor analysis of variance on correlated correlations, including an interaction term; hence we refer to the method as CORANOVA. The method developed here uses a bootstrap estimate of the covariance matrix for the correlated correlations and permutation tests for each of the effects being tested. We describe the methodology in the next section; it can accommodate any number of levels of the within and between factors. Simulation studies are presented in the section entitled "Simulations to Examine Efficiency and Power", with a study of the efficiency of using the Fisher's z-transformation in the section entitled "Simulations to Estimate Efficiency for Fisher's Z", a study of the power of the testing procedures of CORANOVA in the section "Simulations to Estimate Power", and a study of efficiency comparing the within hypothesis of CORANOVA to the parametric method of Olkin and Finn (1990) in the section "Efficiency Comparing to Parametric Method". An application of CORANOVA testing hypotheses on sex differences in the correlation of laterality of regional blood flow in the brain and memory performance is presented in the section entitled "Application".

The Method: CORANOVA

Suppose that for each subject in a study, the measurements (X1, ..., Xp) are taken. Suppose also that the subjects represent g independent groups of the between factor, where the numbers of subjects in the respective groups are denoted n1, ..., ng. The measurements will be denoted Xijk, (i = 1, ..., p), (j = 1, ..., g), (k = 1, ..., nj), where the indices i, j, and k represent the variable, group, and individual within group, respectively. Suppose that interest centers on the patterns in the correlations of each of the variables with a single common variable. It will be assumed, without loss of generality, that the common variable of interest in the correlations is X1. Thus, the correlations of interest are ρ(X1j,Xij), (i = 2, ..., p), (j = 1, ..., g). The goal is to test for differences in the correlations of the common variable with the other variables of the within factor, differences in these correlations between the groups (a between factor effect), and differences between the groups in the within patterns of the correlations, a within by between interaction. The test presented here parallels an ANOVA; however, it is an analysis of the correlations and not the means. This distinction is illustrated by the fact that there can be substantial within, between, and even interaction effects for the correlated correlations while the group means are equal. Let Y1 = (X11, ..., Xp1), Y2 = (X12, ..., Xp2), ..., and Yg = (X1g, ..., Xpg). A common assumption would be that Y1, Y2, ..., Yg are each distributed as p-variate normal. The approach being presented here does not require this


assumption.

Also, let R = (ρ(X11,X21), ..., ρ(X11,Xp1), ρ(X12,X22), ..., ρ(X12,Xp2), ..., ρ(X1g,X2g), ..., ρ(X1g,Xpg)) and denote the estimate of R by R̂ = r = (ρ̂(X11,X21), ..., ρ̂(X11,Xp1), ρ̂(X12,X22), ..., ρ̂(X12,Xp2), ..., ρ̂(X1g,X2g), ..., ρ̂(X1g,Xpg)), where the ρs are population Pearson correlations and the ρ̂s are the corresponding estimates. R and R̂ are each [g(p − 1) × 1] vectors. The three hypotheses of interest are as follows. The expression MEAN(ρ(X1j,X2j), ..., ρ(X1j,Xpj)) refers to the arithmetic mean of these population parameters.

Between Effect Hypothesis
H0: MEAN(ρ(X1j,X2j), ..., ρ(X1j,Xpj)) are equal for all j = 1, ..., g
H1: MEAN(ρ(X1j,X2j), ..., ρ(X1j,Xpj)) ≠ MEAN(ρ(X1l,X2l), ..., ρ(X1l,Xpl)) for at least one (j, l) pair, j, l = 1, ..., g

Within Effect Hypothesis
H0: MEAN(ρ(X11,Xi1), ..., ρ(X1g,Xig)) are equal for all i = 2, ..., p
H1: MEAN(ρ(X11,Xi1), ..., ρ(X1g,Xig)) ≠ MEAN(ρ(X11,Xl1), ..., ρ(X1g,Xlg)) for at least one (i, l) pair, i, l = 2, ..., p

Within by Between Interaction Effect Hypothesis
H0: (ρ(X1g1,X2g1) − ρ(X1g1,Xig1)) − (ρ(X1g2,X2g2) − ρ(X1g2,Xig2)) = 0, for all i = 2, ..., p and for all pairs of groups g1, g2 = 1, ..., g, g1 ≠ g2
H1: at least one interaction contrast is nonzero

Consider a motivating application that is presented in detail in the section "Application". Suppose measurements of the blood flow rate are taken in three brain regions in both the left and right hemispheres during resting baseline conditions. Laterality of the blood flow rate, the blood flow rate in the left hemisphere minus the blood flow rate in the right hemisphere, is measured for each subject. Additionally, at a different session, verbal memory performance is measured on a continuous scale. Each patient has one set of three regional lateralized blood flow measurements and one verbal memory score. Interest centers on the correlations between the regional laterality of blood flow rates and the verbal memory scores. There is one correlation for each brain region, with all correlations having the same set of verbal memory scores as the variable in common. This experiment includes both male and female groups. For this example, the null hypothesis for the between effect implies that the average of the three correlations


between the lateralized regional blood flow rates and verbal memory scores does not differ between males and females. The null hypothesis for the within effect implies that the average of the male and female correlations between the lateralized regional blood flow rates and verbal memory scores does not differ across the three brain regions. The null hypothesis for the within by between interaction implies that the pattern of correlations between the lateralized regional blood flow rates and verbal memory scores across regions does not differ between males and females.

Let CB [(g − 1) × g(p − 1)], CW [(p − 2) × g(p − 1)], and CI [(g − 1)(p − 2) × g(p − 1)] be the linear contrast matrices for testing the between, within, and interaction effects, respectively. The contrast matrices are defined as follows, where 1k is a row vector of 1s of length k, Ik is a k × k identity matrix, and ⊗ represents the Kronecker product.

CB = (1′g−1 ⊗ 1p−1 | −Ig−1 ⊗ 1p−1);  CW = 1g ⊗ (1′p−2 | −Ip−2);  CI = (1′g−1 ⊗ (1′p−2 | −Ip−2) | Ig−1 ⊗ (−1′p−2 | Ip−2)),

where ′ denotes transpose, so that 1′k is a column vector of 1s of length k.
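As a concrete check of these definitions, the contrast matrices can be built directly with Kronecker products. The sketch below (Python/NumPy, illustrative only, not part of the paper's SAS macro) verifies the stated dimensions for g = 2 groups and p = 6 variables, and that each contrast annihilates a perfectly homogeneous correlation vector, as each null hypothesis requires.

```python
import numpy as np

def contrast_matrices(g, p):
    """Between, within, and interaction contrast matrices for a
    g(p-1)-long correlation vector ordered group by group."""
    ones = lambda r, c: np.ones((r, c))
    # C_B: compares the summed correlations of group 1 with each other group.
    CB = np.hstack([np.kron(ones(g - 1, 1), ones(1, p - 1)),
                    np.kron(-np.eye(g - 1), ones(1, p - 1))])
    # C_W: compares the first within correlation with each of the others,
    # summed over groups.
    block = np.hstack([ones(p - 2, 1), -np.eye(p - 2)])
    CW = np.kron(ones(1, g), block)
    # C_I: differences of within contrasts between pairs of groups.
    CI = np.hstack([np.kron(ones(g - 1, 1), block),
                    np.kron(np.eye(g - 1), -block)])
    return CB, CW, CI

g, p = 2, 6
CB, CW, CI = contrast_matrices(g, p)
assert CB.shape == (g - 1, g * (p - 1))              # (1, 10)
assert CW.shape == (p - 2, g * (p - 1))              # (4, 10)
assert CI.shape == ((g - 1) * (p - 2), g * (p - 1))  # (4, 10)
# A homogeneous correlation vector satisfies all three null hypotheses.
R = np.full(g * (p - 1), 0.613)
for C in (CB, CW, CI):
    assert np.allclose(C @ R, 0)
```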

Let Y1, Y2, ..., Yg be p-variate normally distributed, with V representing the covariance matrix for R̂, and assume the sample sizes are large within each of the g groups of the between factor. Also, let V̂ represent an estimate of V. Then the hypotheses for the between, within, and interaction effects can be tested using the respective statistics SB^(1), SW^(1), and SI^(1), which are based on classical asymptotic results, where

SB^(1) = (CB R̂)′ (CB V̂ CB′)⁻¹ (CB R̂) / (g − 1);
SW^(1) = (CW R̂)′ (CW V̂ CW′)⁻¹ (CW R̂) / (p − 2);
SI^(1) = (CI R̂)′ (CI V̂ CI′)⁻¹ (CI R̂) / [(p − 2)(g − 1)].
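Each statistic is the same ANOVA-style quadratic form evaluated with a different contrast matrix and degrees-of-freedom divisor. A minimal sketch (Python/NumPy, with a made-up toy contrast and covariance for illustration):

```python
import numpy as np

def coranova_stat(C, r_hat, V_hat, df):
    """Quadratic-form statistic (C r)' (C V C')^{-1} (C r) / df."""
    d = C @ r_hat
    M = C @ V_hat @ C.T
    return float(d @ np.linalg.solve(M, d)) / df

# Toy example: two correlations, contrast = their difference.
C = np.array([[1.0, -1.0]])
r_hat = np.array([0.6, 0.2])
V_hat = 0.01 * np.eye(2)                  # stand-in covariance estimate
S = coranova_stat(C, r_hat, V_hat, df=1)  # (0.4)^2 / 0.02 = 8.0
assert abs(S - 8.0) < 1e-9
```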

It is common for significance tests involving correlations to use the Fisher's z-transformation, and similar asymptotic results hold for the Fisher's z-transformed Pearson correlations. Let Z represent the Fisher's z-transformed vector R, where each element r of the vector is transformed as z = 0.5 ln[(1 + r)/(1 − r)]. Also, denote the Fisher's z-transformed vector of the estimated Pearson correlations as Ẑ = z. Let VZ represent the covariance matrix for Z and V̂Z an estimate of VZ. Making the same large sample size assumptions as above, the hypotheses for the between, within, and interaction effects can be tested using the respective statistics SB^(2), SW^(2), and SI^(2), which are based on classical asymptotic results, where

SB^(2) = (CB Ẑ)′ (CB V̂Z CB′)⁻¹ (CB Ẑ) / (g − 1);
SW^(2) = (CW Ẑ)′ (CW V̂Z CW′)⁻¹ (CW Ẑ) / (p − 2);
SI^(2) = (CI Ẑ)′ (CI V̂Z CI′)⁻¹ (CI Ẑ) / [(p − 2)(g − 1)].
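The element-wise transformation above is the standard Fisher z, which equals the inverse hyperbolic tangent. A quick illustrative check:

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation r, |r| < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))

# The transform is exactly atanh; it stretches large |r| values,
# which stabilizes the variance of sample correlations.
for r in (-0.9, -0.3, 0.0, 0.5, 0.613):
    assert abs(fisher_z(r) - math.atanh(r)) < 1e-12
```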

Sample sizes commonly seen in many areas of application do not allow for the use of such an asymptotic normality result. Additionally, it is often desirable to avoid the assumption of multivariate normality. Therefore, the tests for the hypotheses of concern will be based on permutation estimates of the distributions of the test statistics, SB^(1), SW^(1), and SI^(1), or SB^(2), SW^(2), and SI^(2), and will not assume asymptotic convergence to the chi-square distribution. Examination of the efficiency, comparing approaches with and without the Fisher's z-transformation, indicated that the hypothesis tests should be based on the test statistics SB = SB^(2) for the between hypothesis, SW = SW^(2) for the within hypothesis, and SI = SI^(1) (no Fisher's z-transformation) for the within by between interaction hypothesis. These will be used in the proposed CORANOVA method; justification for using the Fisher's z-transformation for the within and between hypotheses, but not for the within by between interaction hypothesis, is provided in the section entitled "Simulations to Estimate Efficiency for Fisher's Z".

An estimate of the covariance matrix of R, V [g(p − 1) × g(p − 1)], is required. A bootstrap procedure is used to obtain an estimate of V, denoted V̂Boot. Although the bootstrap is based on asymptotics, it yields valid results for relatively small sample sizes. A bootstrap estimate of the covariance matrix of Z, VZ [g(p − 1) × g(p − 1)], is also required; it is obtained by applying the same bootstrap procedure to Z rather than R and is denoted V̂Z Boot.

Bootstrap Procedure for Estimating the Covariance Matrix of R, V:
1. Select a bootstrap sample from the original data set, using the following procedure: Randomly select a bootstrap sample, with replacement, from group 1, consisting of n1 observations. Repeat this process for each group, selecting bootstrap samples from groups 2, ..., g of sizes n2, ..., ng, respectively. The collection of bootstrap samples from the g groups is henceforth called the "bootstrap sample".
2. Compute the correlations in the vector R based on the bootstrap sample.
3. Repeat steps 1 and 2 B times, yielding B estimates of the vector R, where B is a large integer.
4. Compute the covariance matrix of R based on the B bootstrapped Rs, V̂Boot, which is a bootstrap estimate of V.

The test of the between effect requires the distribution of the test statistic, SB, under the null hypothesis of no between effect. Specifically, the right tail probability of this distribution is required to obtain the p-value for the test of significance. The permutation distribution of SB is the distribution of SB over all possible permutations of the assignment of observations to levels of the between factor. The permutation distribution of SB under the null hypothesis of no between effect approximates the underlying distribution of SB under the null hypothesis. When the number of possible permutations is large, which is often true even with small data sets, a large number of randomly selected permutations is used in lieu of evaluating SB for all possible permutations. Hence, this distribution is estimated by selecting random permutations of the between group variable labels. For each random permutation, the bootstrap estimate of V, V̂Boot, is estimated in the process of estimating SB. The p-value for the one-tailed significance test is then estimated by the right tail probability beyond the observed SB on the permutation distribution of SB.

Individual Pearson correlations are not affected by location shifts. However, differences in the means across the g groups of the between factor, or differences in the means across the p variables of the within factor, can result in large changes in the correlations of interest for individual permutations used for the three test statistics. This problem can be avoided by centering the means such that the means across the g groups of the between factor and the means across the p variables of the within factor are all equal, and centering the means of the common variable to be equal for all between groups. Centering the means has no impact on the underlying correlations. This centering is incorporated into the CORANOVA procedure.

Similar to SB, the test of the within effect requires the distribution of the test statistic, SW, under the null hypothesis of no within effect. The permutation distribution of SW under the null hypothesis of no within effect approximates the underlying distribution of SW under the null hypothesis. This distribution is estimated by selecting random permutations of the within variable labels (2, ..., p) separately within each observation. The p-value for the one-tailed significance test is then estimated by the right tail probability beyond the observed SW on the permutation distribution of SW. Also, the test of the interaction of the between and within effects requires the distribution of the test statistic, SI, under the null hypothesis of no within by between interaction. The permutation distribution of SI under the null hypothesis is obtained by first randomly permuting the group labels between all observations and then, separately within each observation, randomly permuting the within variable labels. The means are first centered as described above, and the p-value is computed as described above.

The CORANOVA procedure is delineated below. The values of SB, SW, and SI computed from permuted data are referred to as SB(P), SW(P), and SI(P), respectively.

The CORANOVA Procedure for Testing the Three Hypotheses of Interest
1. Center the means such that the means across all levels of the between factor are equal, the means across all levels of the within factor are equal, and the mean of the common variable is equal for all between groups.
2. Estimate the vector r and the matrix V̂Boot for the within by between interaction hypothesis. Estimate the vector z and the matrix V̂Z Boot for the between and within hypotheses. All of these are estimated from the observed data.
3. Compute SB, SW, and SI for the observed data. Recall that SB and SW are based on the Fisher's z-transformed correlations, while SI is based on untransformed correlations.
4. Use permutations to assess the p-value for the between hypothesis, which is based on the Fisher's z-transformed correlations.
   a. Randomly permute the group labels between all observations
   b. Compute the vector z, the matrix V̂Z Boot, and SB(P) for the permuted data
   c. Repeat steps 4a and 4b L times
   d. Determine the p-value from the permutation distribution, Pr[SB(P) > SB]
5. Use permutations to assess the p-value for the within hypothesis, which is based on the Fisher's z-transformed correlations.
   a. Randomly permute variable labels (2, ..., p) separately within each observation
   b. Compute the vector z, the matrix V̂Z Boot, and SW(P) for the permuted data
   c. Repeat steps 5a and 5b L times


   d. Determine the p-value from the permutation distribution, Pr[SW(P) > SW]
6. Use permutations to assess the p-value for the within by between interaction hypothesis, which is based on the untransformed correlations.
   a. Randomly permute the group labels between all observations
   b. Randomly permute variable labels (2, ..., p) separately within each observation
   c. Compute the vector r, the matrix V̂Boot, and SI(P) for the permuted data
   d. Repeat steps 6a, 6b, and 6c L times
   e. Determine the p-value from the permutation distribution, Pr[SI(P) > SI]

Simulations to Examine Efficiency and Power

Simulations were used to estimate efficiencies and power curves for the proposed method. All simulations were performed on a SUN Microsystems E4000 Enterprise server with 4 SPARC Version 9 64-bit processors, running SunOS 5.8, using SAS software (SAS, 2000). The "uniform" function in SAS was used for random number generation and is used in the available SAS macro. Efficiencies were computed to assess the impact of the Fisher's z-transformation on each of the three hypotheses, to compare the within hypothesis of CORANOVA to the parametric alternative method of Olkin and Finn, and to compare the between hypothesis of CORANOVA to the parametric alternative for each pair of correlations, the Fisher z-test. The simulation procedure for the efficiencies, which encompasses the simulation procedure used for estimating power, is described first.

Simulations to Estimate Efficiency for Fisher's Z

To assess the use of the Fisher's z-transformation in this procedure, separate efficiencies were estimated for each of the three hypotheses, using simulations, to compare the procedure with and without the transformation.
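The bootstrap-plus-permutation machinery in steps 4 through 6 of the CORANOVA procedure above can be sketched compactly. The following Python fragment is a simplified illustration of the between-effect test only; it is not the authors' SAS macro, all function names are ours, and the mean-centering step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def corr_vector(X, labels, g, transform=np.arctanh):
    """Correlations of the common variable X[:, 0] with X[:, 1:], stacked
    group by group (Fisher z-transformed by default, as for SB and SW)."""
    vec = []
    for j in range(g):
        G = X[labels == j]
        vec += [transform(np.corrcoef(G[:, 0], G[:, i])[0, 1])
                for i in range(1, X.shape[1])]
    return np.array(vec)

def boot_cov(X, labels, g, B=60):
    """Bootstrap covariance of the correlation vector, resampling with
    replacement within each group (steps 1-4 of the bootstrap procedure)."""
    reps = []
    for _ in range(B):
        idx = np.concatenate([rng.choice(np.flatnonzero(labels == j),
                                         size=int(np.sum(labels == j)),
                                         replace=True)
                              for j in range(g)])
        reps.append(corr_vector(X[idx], labels[idx], g))
    return np.cov(np.array(reps), rowvar=False)

def s_between(X, labels, g, p, B=60):
    """Quadratic-form between statistic on z-transformed correlations."""
    CB = np.hstack([np.kron(np.ones((g - 1, 1)), np.ones((1, p - 1))),
                    np.kron(-np.eye(g - 1), np.ones((1, p - 1)))])
    d = CB @ corr_vector(X, labels, g)
    M = CB @ boot_cov(X, labels, g, B) @ CB.T
    return float(d @ np.linalg.solve(M, d)) / (g - 1)

def between_pvalue(X, labels, g, p, L=49, B=60):
    """Right-tail permutation p-value for the between effect (step 4)."""
    s_obs = s_between(X, labels, g, p, B)
    exceed = sum(s_between(X, rng.permutation(labels), g, p, B) >= s_obs
                 for _ in range(L))
    return (exceed + 1) / (L + 1)

# Toy data: g = 2 groups, p = 3 variables, no true between effect.
n, g, p = 40, 2, 3
X = rng.standard_normal((2 * n, p))
X[:, 1] += 0.6 * X[:, 0]          # same common correlation in both groups
labels = np.repeat([0, 1], n)
pval = between_pvalue(X, labels, g, p)
assert 0.0 < pval <= 1.0
```

In practice L and B would be much larger (the paper's simulations use 300 of each); they are kept small here only to keep the sketch fast.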
The efficiencies were computed as the ratio of the sample sizes needed to achieve either 80% or 70% power for each hypothesis, at detectable differences in correlations of 0.42 and 0.35, and at three values of the correlation between pairs of the within variables: 0.22, 0.40, and 0.60. The simulations are based on g = 2 independent groups, with an equal number of observations per group, n1 = n2, and p = 6, which includes 1 common variable to be correlated with 5 within subject variables.


For these simulations, the underlying distribution for each group was taken to be 6-variate normal. The simulated data set for each group was generated from the 6-variate normal distribution with the specified set of parameters for that group, where the covariance matrices were set to have a specified correlation structure. A detailed description of the parameters of the 6-variate normal distribution follows. Let Y1 = (X11, X21, ..., X61) be distributed as Normal(μ1, Σ1) and Y2 = (X12, X22, ..., X62) be distributed as Normal(μ2, Σ2). The common variable to be correlated is considered to be in the first position for presentation purposes. The means of both distributions were set to μ1 = μ2 = (50, 75, 75, 75, 75, 75). Note that the means of all of the within variables are identical and that the means of both levels of the between variable are identical. Thus, any differences detected by CORANOVA in any of the three hypotheses tested are not due to differences in the means. The covariance matrices, Σj, j = 1, 2, and the corresponding correlation matrices, Pj, j = 1, 2, were taken to have the following form for all simulations.

Σj =
[ 10   γj   γj   γj   γj   δj
  γj   45   τj   τj   τj   τj
  γj   τj   45   τj   τj   τj
  γj   τj   τj   45   τj   τj
  γj   τj   τj   τj   45   τj
  δj   τj   τj   τj   τj   45 ]

Pj =
[ 1    ρj   ρj   ρj   ρj   θj
  ρj   1    φj   φj   φj   φj
  ρj   φj   1    φj   φj   φj
  ρj   φj   φj   1    φj   φj
  ρj   φj   φj   φj   1    φj
  θj   φj   φj   φj   φj   1 ]

For level j of the between variable, the covariance and correlation between the common variable and levels 1, 2, 3, and 4 of the within variable are γj and ρj, and the covariance and correlation between the common variable and level 5 of the within variable are δj and θj. The covariance and correlation between pairs of the within variables, [cov(Xkj, Xlj); k, l ≠ 1, k ≠ l] and [corr(Xkj, Xlj); k, l ≠ 1, k ≠ l], are τj and φj. The covariance between the common variable and levels 1, 2, 3, and 4 of the within variable for group 1, γ1, was fixed at 13. This corresponds to


fixing the correlation between these variables for group 1 at ρ1 = 0.613. The values of the other covariances, δ1, τ1, γ2, δ2, and τ2, in Σ1 and Σ2 were selected to achieve specified differences in the correlations θ1, φ1, ρ2, θ2, and φ2 in the correlation matrices, P1 and P2. For example, suppose a difference of Δ = 0.42 is desired between the correlations of levels 1 and 5 of the within variable with the common variable: ρ1 − θ1 = 0.613 − θ1 = Δ = 0.42. This is achieved by setting θ1 = 0.193, which corresponds to setting the covariance between the common variable and level 5 of the within variable, δ1, to 4.090.

The three hypotheses being tested in these simulations correspond to the following structures for the correlation vector, R.

1. Within Hypothesis
R = (0.613, 0.613, 0.613, 0.613, 0.613 − Δ,  (Group 1 correlations)
     0.613, 0.613, 0.613, 0.613, 0.613 − Δ)  (Group 2 correlations)

2. Between Hypothesis
R = (0.613, 0.613, 0.613, 0.613, 0.613,
     0.613 − Δ, 0.613 − Δ, 0.613 − Δ, 0.613 − Δ, 0.613 − Δ)

3. Within by Between Interaction Hypothesis
R = (0.613, 0.613, 0.613, 0.613, 0.613 − Δ,
     0.613, 0.613, 0.613, 0.613, 0.613)

The computation of the efficiency, as laid out below, required the computation of power. The power for each of the three hypotheses of CORANOVA was considered separately in terms of the simulations. The power for each of the hypotheses, for a particular set of parameters, was estimated using the following procedure. Based on the specified set of parameters of the 6-variate normal distribution for each of the two groups, 300 complete data sets were simulated, where each complete data set consisted of a randomly generated set of data from each group of the specified sample size, n1 = n2. The CORANOVA procedure for the hypothesis under consideration was then applied to each of the 300 simulated data sets, yielding 300 p-values. The power at the specified set of parameters was then estimated by the percentage of the 300 simulated data sets that yielded p-values less than α = 0.05 (Beran, 1986).
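The covariance-to-correlation arithmetic used above (fixing the covariance at 13 and solving for the level-5 covariance) is easy to verify; a small check in Python, using the variances 10 and 45 from the simulation design (variable names are ours):

```python
import math

var_common, var_within = 10.0, 45.0         # diagonal entries of Sigma_j
scale = math.sqrt(var_common * var_within)  # sqrt(450), about 21.21

rho1 = 13.0 / scale                         # covariance 13 -> correlation
assert abs(rho1 - 0.613) < 1e-3

delta = 0.42                                # desired drop in correlation
theta1 = rho1 - delta                       # about 0.193
cov_level5 = theta1 * scale                 # back to a covariance
assert abs(theta1 - 0.193) < 1e-3
assert abs(cov_level5 - 4.090) < 1e-2
```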
In the evaluation of the p-value for each data set, the CORANOVA procedure used 300 bootstraps to estimate the covariance matrix of the correlations and 300 permutations to estimate the distribution of the test statistic under the null hypothesis. This was repeated for the CORANOVA procedure with and without the use of the Fisher's z-transformation. The efficiencies for each hypothesis and each set of parameters were computed using the following approach, in lieu of a closed form for the


efficiency. Iterative trials were used to determine the two sample sizes between which 80% power is achieved. These sample sizes were obtained for the cases with and without the use of the Fisher's z-transformation. The interpolated floating point value of the sample size per group corresponding to 80% power was then determined. The estimate of the efficiency for each scenario is computed as the ratio of the sample size required for 80% power using the Fisher's z-transformation relative to not using the transformation, where an efficiency < 1 indicates that use of the Fisher's z-transformation is more efficient. The same procedure was used to obtain the efficiencies at 70% power.

The results indicate that using the Fisher's z-transformation provides substantial improvement in efficiency for the within hypothesis relative to not using this transformation. Considering a power of 80%, a detectable difference of Δ = 0.42, and a correlation between pairs of within variables of φ = φ1 = φ2 = 0.22, the efficiency is 0.749, indicating a loss of 25% efficiency by not using the Fisher's z-transformation. The efficiency is 0.925 for φ = 0.40 and 0.717 for φ = 0.60. For φ = 0.22, 0.40, and 0.60, for 70% power with Δ = 0.42, the efficiencies are 0.805, 0.927, and 0.886, respectively. At these values of φ, for 80% power and Δ = 0.35, the efficiencies are 0.881, 0.796, and 0.900. For 70% power and Δ = 0.35, the efficiencies are 0.784, 0.882, and 0.845, respectively.

For the between hypothesis, the efficiencies at 80% power, Δ = 0.42, and φ = 0.22, 0.40, and 0.60, respectively, are 0.926, 0.983, and 1.041. At 70% power and Δ = 0.42, the efficiencies are 0.896, 0.980, and 1.010. For 80% power and Δ = 0.35, the efficiencies are 0.898, 0.954, and 1.017. At 70% power and Δ = 0.35, the efficiencies are 0.898, 0.950, and 1.010.
Thus, there appears to be a slight improvement in efficiency for the between hypothesis when the Fisher z-transformation is used in cases where the correlation between pairs of within variables is low, which disappears for higher values of this correlation. The simulations for the within by between interaction hypothesis tell a different story. The efficiencies at 80% power, δ = 0.42, and ρ = 0.22, 0.40, and 0.60, respectively, are 1.162, 1.047, and 1.193. At 70% power and δ = 0.42, the efficiencies are 1.094, 1.029, and 1.148. For 80% power and δ = 0.35, the efficiencies are 1.063, 1.199, and 0.948. At 70% power and δ = 0.35, the efficiencies are 1.154, 1.167, and 0.987. Thus, with a correlation between pairs of within variables of 0.40, at a power of 80% and at δ = 0.35, there is a 1 − (1/1.199) = 16.6% loss in efficiency when using the Fisher's z-transformation, and a 16.2% loss in efficiency for a correlation of 0.60, at δ = 0.42 and 80% power. Thus, use of the Fisher's z-transformation for the within by between interaction hypothesis would result in a significant efficiency loss. Based on these results, the CORANOVA procedure is defined to use the Fisher's z-transformation for the between and within hypotheses, and to use the untransformed Pearson correlations for the within by between interaction hypothesis.

Simulations to Estimate Power

A series of simulations was performed to estimate power curves (Beran, 1986) for the three hypothesis tests in the CORANOVA procedure. As in the simulations for efficiencies, these simulations are based on g = 2 independent groups, with n1 = n2 observations per group and p = 6, which includes 1 common variable to be correlated with 5 within subject variables. The power simulations estimated the power curves at four sample sizes, n1 = n2 = 15, 30, 50, and 75, and at three values of the correlation between pairs of within variables (the within variable correlation), 0.22, 0.40, and 0.60. Each power curve was estimated at 9 points, representing detectable differences in the correlations of δ = 0.07 × (0, 1, ..., 8), or 0 to 0.56. For each point of each power curve, 300 data sets were simulated, each consisting of a randomly generated data set per group, having the specified sample size and based on the specified set of parameters of the 6-variate normal distributions. The CORANOVA procedure was then applied to each of the 300 simulated data sets, yielding 300 simulated p-values for each of the three effects. The power for each effect, with the specified set of correlations for each group and at the selected sample sizes, was estimated by the percentage of times the 300 simulated data sets yielded p-values less than 0.05 for that effect. In the evaluation of the p-value for each data set, the CORANOVA model used 300 bootstraps to estimate the covariance matrix of the correlations and 300 permutations to estimate each of the distributions of the test statistics under the null hypotheses. Simulating more than 300 data sets was not feasible due to computing constraints.
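The simulation loop just described (generate data sets, collect p-values, count rejections) can be sketched compactly. For simplicity this sketch plugs in the two-sample Fisher Z-test rather than the full CORANOVA machinery with its bootstrap covariance and permutation null; all function names and numbers here are ours.

```python
import math
import random

random.seed(42)

def bivariate_sample(n, r):
    """Draw n pairs from a bivariate normal with correlation r (unit variances)."""
    pairs = []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        pairs.append((z1, r * z1 + math.sqrt(1 - r * r) * z2))
    return pairs

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_p(r1, n1, r2, n2):
    """Two-sided p-value for H0: rho1 = rho2 (independent groups), Fisher's z."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def estimate_power(n_per_group, rho1, rho2, n_sim=300, alpha=0.05):
    """Fraction of simulated data sets whose p-value falls below alpha."""
    hits = 0
    for _ in range(n_sim):
        r1 = pearson(bivariate_sample(n_per_group, rho1))
        r2 = pearson(bivariate_sample(n_per_group, rho2))
        if fisher_z_p(r1, n_per_group, r2, n_per_group) < alpha:
            hits += 1
    return hits / n_sim

p_est = estimate_power(50, 0.613, 0.193)  # a difference of delta = 0.42
print(p_est)
```

Replacing `fisher_z_p` with any test that returns a p-value (including a permutation test) turns this into the harness used throughout the paper's simulations.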
Each CORANOVA model took about 9 minutes on average to complete. The results of the simulations based on underlying multivariate normal data are presented in Tables 1, 2, and 3, which give the power for each scenario considered for the within, between, and within by between interaction hypotheses, respectively. The δ at which 80% and 50% power is achieved under each scenario is provided at the bottom of each table. The specific correlations and parameters used in the simulations are also provided in these tables. Figures 1, 2, and 3 display the estimated power results for a within variable correlation of 0.4, for sample sizes per group of 15, 30, 50, and 75, for the within, between, and within by between interactions, respectively. Additionally, Figures 4, 5, and 6 display the


Table 1
Power for Within Factor Differences

[The original tabular layout is not recoverable from this text version. The table reported estimated power for the within hypothesis at detectable differences δ = 0.00 to 0.56 in steps of 0.07, for within variable correlations ρ1 = ρ2 = 0.22, 0.40, and 0.60, and sample sizes per between level of n = 15, 30, 50, and 75. The simulation correlations stepped one within correlation down from 0.613 to 0.053 in both groups, and the interpolated δ at which 80% and 50% power are achieved was given at the bottom of the table.]


Table 2
Power for Between Factor Differences

[The original tabular layout is not recoverable from this text version. The table reported estimated power for the between hypothesis at detectable differences δ = 0.00 to 0.56 in steps of 0.07, for within variable correlations ρ1 = ρ2 = 0.22, 0.40, and 0.60, and sample sizes per between level of n = 15, 30, 50, and 75. The simulation correlations held group 1 at 0.613 while stepping both correlations of group 2 down from 0.613 to 0.053, and the interpolated δ at which 80% and 50% power are achieved was given at the bottom of the table.]


Table 3
Power for Within by Between Interactions

[The original tabular layout is not recoverable from this text version. The table reported estimated power for the within by between interaction hypothesis at detectable differences δ = 0.00 to 0.56 in steps of 0.07, for within variable correlations ρ1 = ρ2 = 0.22, 0.40, and 0.60, and sample sizes per between level of n = 15, 30, 50, and 75. The simulation correlations held group 1 at 0.613 while stepping one within correlation of group 2 down from 0.613 to 0.053, and the interpolated δ at which 80% and 50% power are achieved was given at the bottom of the table.]


Figure 1

Power Curves for the Within Factor Effect for Between Variable Correlation ρ1 = ρ2 = 0.4, for Sample Sizes of 15, 30, 50, and 75 per Group

Figure 2

Power Curves for the Between Factor Effect for Between Variable Correlation ρ1 = ρ2 = 0.4, for Sample Sizes of 15, 30, 50, and 75 per Group


Figure 3

Power Curves for the Within by Between Interaction for Between Variable Correlation ρ1 = ρ2 = 0.4, for Sample Sizes of 15, 30, 50, and 75 per Group

Figure 4

Power Curves for the Within Factor Effect for a Sample Size of 50 per Group, for Between Variable Correlations ρ1 = ρ2 = 0.22, 0.4, and 0.60


Figure 5

Power Curves for the Between Factor Effect for a Sample Size of 50 per Group, for Between Variable Correlations ρ1 = ρ2 = 0.22, 0.4, and 0.60

Figure 6

Power Curves for the Within by Between Interaction for a Sample Size of 50 per Group, for Between Variable Correlations ρ1 = ρ2 = 0.22, 0.4, and 0.60


estimated power results for a sample size per group of 50, for within variable correlations of 0.22, 0.4, and 0.6, for the within, between, and within by between interactions, respectively. A sample size of 75 per between level, with a between variable correlation of 0.4, achieves a power of 80% (50%) to detect a difference in correlations of δ = 0.205 (0.151) for the within effect, 0.224 (0.147) for the between effect, and 0.458 (0.306) for the within by between interaction. A sample size of 30 per between level, with a between variable correlation of 0.4, achieves a power of 80% (50%) to detect a difference in correlations of δ = 0.345 (0.255) for the within effect and 0.431 (0.263) for the between effect. The within by between interaction does not achieve 80% power for values of δ up to 0.56, but does achieve 50% power at δ = 0.535. It is important to note the differential impact on power of the correlation between pairs of within variables, ρ. For a fixed sample size and a fixed detectable difference in correlations, the power increases with increasing ρ for the within effect, decreases with increasing ρ for the between effect, and increases with increasing ρ for the within by between interaction (illustrated in Tables 1-3 and Figures 4-6). There are a small number of instances where these patterns appear not to hold; however, in each such case the deviation from the expected pattern is within the 95% confidence interval for the power estimate. The conditional tests in CORANOVA are based on average correlations (or average Fisher Z-transformed correlations). In ANOVA, the conditional tests are based on means, which are linear combinations of the data, whereas correlations are not based on similar linear combinations. The simulations described here demonstrate that this distinction is not important in this case, and that we are testing the homogeneity of the correlations with respect to one factor while controlling for the other factor.
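The 95% confidence interval for a simulated power estimate follows from the binomial distribution of the rejection count over the simulated data sets; a minimal sketch (the helper name is ours):

```python
import math

def power_ci(p_hat, n_sim, z=1.96):
    """Normal-approximation 95% confidence interval for a simulated power
    estimate: each of the n_sim data sets gives a reject / keep outcome,
    so the rejection count is binomial."""
    se = math.sqrt(p_hat * (1 - p_hat) / n_sim)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

# with 300 simulated data sets, a power estimate of 0.80 is fairly coarse
lo, hi = power_ci(0.80, 300)
print(f"({lo:.3f}, {hi:.3f})")
```

The resulting half-width of about 0.045 explains why small, isolated reversals of the monotone patterns in Tables 1-3 are consistent with simulation noise.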
When estimating each power curve in the presence of a within (within-subject) effect, the associated power curve for the between (between-subject) effect was also estimated. For each set of parameters considered in these simulations, the power for the between effect remained at the significance level, 0.05, as expected. Figure 7 shows the power curves for the within and between effects for the case where there is a within effect present but no between effect present, with a between variable correlation of ρ1 = ρ2 = 0.4 and a sample size of 75 per group. The power for the between effect while controlling for a within effect remains at approximately the significance level of 0.05 over the full range of differences in the correlations, δ, as expected. Similarly, Figure 8 shows the power curves for the within and between effects for the case where there is a between effect present but no within effect present, with a between


Figure 7

Power Curves for the Within and Between Factor Effects in the Presence of a Within Factor Effect with no Between Factor Effect, with a Between Variable Correlation of ρ1 = ρ2 = 0.4 and a Sample Size of 75 per Group

Figure 8

Power Curves for the Within and Between Factor Effects in the Presence of a Between Factor Effect with no Within Factor Effect, with a Between Variable Correlation of ρ1 = ρ2 = 0.4 and a Sample Size of 75 per Group


variable correlation of ρ1 = ρ2 = 0.4 and a sample size of 75 per group. The power for the within effect while controlling for a between effect remains at approximately the significance level of 0.05 over the full range of differences in the correlations, δ, as expected. Similar information was obtained for between variable correlations of ρ1 = ρ2 = 0.22 and 0.6, as well as for different sample sizes, all with the same results. These data illustrate that the proposed test of the within effect provides a test of within-subject homogeneity controlling for between-subject heterogeneity, while the proposed test of the between effect provides a test of between-subject homogeneity controlling for within-subject heterogeneity. The above results are all based on multivariate normal data. Some assessment of the power of the test under non-normal data was also made. Data were randomly generated from 3-variate non-normal distributions, where each of the three variables is distributed as a chi-square with 2 degrees of freedom and the three variables have a specified correlation matrix. This was accomplished using the method of Headrick and Sawilowsky (1999). There are three variables within each observation, one of them being the common variable. Each estimate of power was based on 1000 simulated non-normal data sets, using the same procedures applied to the multivariate normal data. The power under the null hypothesis of no difference in the correlations being tested, δ = 0, should equal the significance level, 0.05 in these simulations, for a test that has the correct level. When the level is exceeded, the test is not valid. Consider the case of one group. For this case, only the within hypothesis of CORANOVA is computed, and this represents a nonparametric alternative to the Olkin and Finn (1990) test.
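To convey the flavor of a permutation approach on the simplest version of this case, two overlapping correlations in one group, here is a minimal sketch. This is not the CORANOVA statistic (which averages correlations and bootstraps their covariance); it swaps the R1/R2 measurements within subjects, which is valid only under the additional assumption that the two measurements are exchangeable under the null. All names and the toy data are ours.

```python
import math
import random

random.seed(1)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def perm_test_overlapping(v, r1, r2, n_perm=300):
    """Permutation test of H0: corr(V, R1) = corr(V, R2), where both
    correlations share the common variable V.  Each permutation swaps the
    (R1, R2) pair within a random subset of subjects."""
    observed = abs(pearson(v, r1) - pearson(v, r2))
    hits = 0
    for _ in range(n_perm):
        a, b = list(r1), list(r2)
        for i in range(len(v)):
            if random.random() < 0.5:
                a[i], b[i] = b[i], a[i]
        if abs(pearson(v, a) - pearson(v, b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # permutation p-value

# toy data: R1 tracks V closely, R2 is much noisier
v = [random.gauss(0, 1) for _ in range(60)]
r1 = [x + random.gauss(0, 0.5) for x in v]
r2 = [x + random.gauss(0, 2.0) for x in v]
p_perm = perm_test_overlapping(v, r1, r2)
print(p_perm)
```

Because the null distribution is built from the data themselves, no normality or large-sample argument is needed, which is the property the level simulations below probe.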
For the within hypothesis with δ = 0 (both correlations = 0.613), a between variable correlation of 0.6, and a sample size of 15, the estimated power is 0.061 for CORANOVA and 0.118 for Olkin and Finn. For a sample size of 75, these are 0.061 and 0.168, respectively. For a sample size of 1000, these are 0.038 and 0.154, respectively. If the between variable correlation is set to 0.4, then the respective power estimates are 0.065 and 0.107 for a sample size of 15, and 0.057 and 0.145 for a sample size of 75. The results indicate that for the highly skewed non-normal data considered, the level of the CORANOVA within test is approximately correct. The level of the Olkin and Finn test exceeds the 0.05 level, and thus the Olkin and Finn procedure should not be used for this type of non-normal data. Consider the case of two groups, with only two variables per observation. For this case, there is only one correlation for each group, and only the between hypothesis of CORANOVA is computed, which is a nonparametric alternative to the Fisher Z-test. Data were generated from a 2-variate distribution, where each variable was chi-square with 2 degrees of freedom, with a specified correlation between the two variables. For the between hypothesis with δ = 0 (both correlations = 0.613) and a sample size of 75 per group, the estimated power is 0.045 for CORANOVA and 0.131 for the Fisher Z-test. For a sample size of 1000 per group, the estimated power is 0.041 for CORANOVA and 0.148 for the Fisher Z-test. The results indicate that the level of the CORANOVA between test is approximately correct. However, the level of the Fisher Z-test for this non-normal data exceeds the level of 0.05, and thus the Fisher Z-test should not be used for such non-normal data. Another potential concern with parametric asymptotic procedures is the level of the tests for small sample sizes. Consider one group where both the within hypothesis of CORANOVA and the test of Olkin and Finn (1990) are applied to test the hypothesis that the within correlations are all equal, where data are generated under the null hypothesis of no correlation differences. The results show that for a between variable correlation of 0.22, the power for sample sizes of 10 and 15 was 0.185 and 0.154, respectively, for the Olkin and Finn test. For CORANOVA these were 0.053 and 0.051. Considering a between variable correlation of 0.4, the power estimates were 0.104 and 0.093 for Olkin and Finn and 0.049 and 0.052 for CORANOVA. For a between variable correlation of 0.60, the power estimates were 0.060 and 0.053 for Olkin and Finn and 0.056 and 0.050 for CORANOVA. These results show that for small sample sizes, the test of Olkin and Finn is not appropriate when the between variable correlation is low, but it appears to be valid when this correlation is high. Similar simulations were performed to assess the level of the between CORANOVA hypothesis and the Fisher Z-test for the case where there is only one within variable. The results indicate that the Fisher Z-test may well be appropriate for small sample sizes.
For a between variable correlation of 0.22 and sample sizes of 10 and 15, the power was 0.032 and 0.050, respectively, for the Fisher Z-test. The power was 0.044 and 0.049, respectively, for the CORANOVA between hypothesis, indicating that the CORANOVA procedure has the correct level.

Efficiency Compared to a Parametric Method

The method of Olkin and Finn (1990) is a large sample asymptotic method for testing the homogeneity of a set of correlated correlations based on the assumption that the data come from a multivariate normal distribution. In the special case where there is one group, g = 1, and multiple within levels, the within hypothesis of the CORANOVA procedure is a nonparametric alternative to the Olkin and Finn procedure.


The efficiency comparing these two approaches is desired. The simulations were performed similarly to those described above. At each sample size considered, 300 data sets were randomly generated from a 6-variate normal distribution for the single group. The difference between the correlations, δ, for levels 1 and 5 of the within variable with the common variable was set to 0.42, as in the section entitled "Simulations to Estimate Efficiency for Fisher's Z". The values 0.22, 0.40, and 0.60 were considered for the correlation between pairs of within variables. Iterative trials and then interpolation were used to determine the floating point values of the sample size per group that correspond to 80% and 70% power. The efficiency for each between pairs correlation is computed as the ratio of the sample size required for 80% power using the Olkin and Finn (1990) procedure, relative to the required sample size using the within hypothesis test of the CORANOVA procedure, which uses the Fisher's z-transformation. An efficiency < 1 indicates that the Olkin and Finn parametric procedure has greater efficiency than the nonparametric procedure presented here. The results show that there is only a small loss of efficiency when using the CORANOVA nonparametric approach with underlying normal data. The efficiency estimates are 0.933 and 0.887 for correlations between pairs of within variables of 0.22 and 0.40, respectively. The estimate of the efficiency for a correlation of 0.6 is 1.057, which likely indicates no loss of efficiency considering simulation variation. For δ = 0.35, the efficiency estimates were 0.827, 0.931, and 0.945 for correlations between pairs of within variables of 0.22, 0.40, and 0.60, respectively. Similar efficiencies were found at 70% power. Similar estimates of efficiency were also obtained for the Fisher Z-test compared to the between group hypothesis of CORANOVA.
For a between variable correlation of 0.22, the efficiencies for 80% power were 0.972 and 0.946 for δ = 0.42 and 0.35, respectively. The efficiencies were 0.965 and 0.984 for 70% power.

Application

It is known that in humans verbal memory is served primarily by the left temporal-limbic regions of the brain, and it is expected that as a subject performs a verbal memory task there would be increased blood flow to the left hemisphere, relative to the right hemisphere, in these regions. It has also been established that women have better verbal memory than men (e.g., Saykin et al., 1995). While the underlying biological reasons for this are unknown, a leading hypothesis links sex differences in performance to the degree of lateralized cerebral blood flow (CBF) perfusion. To test the laterality hypothesis, the following experiment was conducted in 28 participants, 14 men and 14 women. Under resting baseline conditions (eyes open, ears unoccluded, awake) the blood flow rate was measured in multiple regions of the brain, in both the left and right hemispheres, using PET (positron emission tomography). Laterality of blood flow rate, defined as the blood flow rate in the left hemisphere minus the rate in the right hemisphere, was measured in the temporal (mid-temporal), frontal, and subcortical regions of the brain. Standard memory tasks, such as the California Verbal Learning Test (CVLT) and the Wechsler Memory Scale-Revised (WMS-R), were administered in a separate session within two weeks of the PET study. Thus, for each subject the data included the laterality of blood flow in three brain regions and a verbal memory score. The correlations under consideration are shown in Table 4, where the subscripts T, S, and F represent the laterality of blood flow in the temporal, subcortical, and frontal regions, respectively, and V represents the verbal memory score. Scatterplots of the laterality of cerebral blood flow and verbal memory performance in males (n = 14) and females (n = 14) for the temporal, frontal, and subcortical brain regions are provided in Figure 9. A regression line from verbal memory performance regressed on activation is provided for each scatterplot in Figure 9. A plot of the correlations between laterality (L-R) of cerebral blood flow and verbal memory performance for the temporal, frontal, and subcortical brain regions in males and females is provided in Figure 10. These data have one within factor, brain region, and one between factor, gender, with the verbal memory score as the common variable. Thus, the CORANOVA method is appropriate for examining the patterns in these correlated correlations. The sample sizes are small, as is common in functional neuroimaging, especially when the study protocol uses expensive imaging procedures such as PET.
The correlation between the laterality of cerebral blood flow and verbal memory performance does not differ across gender (p = 0.248); this between hypothesis test compares the average of the three regional correlations for each gender. The correlation between the laterality of cerebral blood flow and verbal memory performance does not differ across brain region (p = 0.096). This is not significant, but since there appears to be no correlation in the frontal region and a strong correlation in the subcortical region, this would be important to explore in future larger studies. If such a difference were present, it would be indicative of better verbal memory performance associated with relatively higher activation of the left hemisphere. An interaction between gender and region is present (p = 0.026), and the differences in the regional correlations vary between gender groups. Females show a high correlation of verbal memory score and laterality of blood flow in the temporal and subcortical regions, while males show a high correlation of verbal memory and laterality of blood flow in the subcortical region but not the temporal region. Laterality of blood flow of a brain region implies that one hemisphere of the brain has a higher blood flow rate for that region. A plausible biological hypothesis supported by these results is that left temporal and subcortical activation play a combined role in increasing verbal memory in females compared to males. This hypothesis should be explored in future neuropsychological research. The above results were all based on using CORANOVA with the Pearson correlation coefficient as the measure of association. When these same analyses are performed using the Spearman correlation coefficient (an option available in the CORANOVA SAS macro), there is little change. The p-values are 0.550 for the between hypothesis, 0.108 for the within hypothesis, and 0.006 for the interaction hypothesis.

Table 4
Gender Differences in Laterality of Blood Flow in Three Brain Regions

                      Blood Flow Rate Laterality
Gender      Temporal             SubCortical          Frontal
Male        ρV,T = -0.340        ρV,S = +0.641        ρV,F = -0.032
Female      ρV,T = +0.812        ρV,S = +0.491        ρV,F = -0.212

Figure 9
Scatterplots of Laterality (L-R) of Cerebral Blood Flow and Verbal Memory Performance in Males (n = 14) and Females (n = 14) for the Temporal, Frontal, and Subcortical Brain Regions

Figure 10
Correlations Between Laterality (L-R) of Cerebral Blood Flow and Verbal Memory Performance in Males (n = 14) and Females (n = 14), for the Temporal, Frontal, and Subcortical Brain Regions

Discussion

Multiple methodologies have been developed for the comparison of correlated correlations.
These include methods for more than two overlapping correlated correlations, both under normality assumptions and without these assumptions. We presented a nonparametric approach to the problem of testing a set of overlapping correlated correlations where there is one between factor and one within factor. The method is akin to a two factor analysis of variance for correlated correlations. It is common to see small sample sizes in medical data, such as in imaging studies (PET, MRI, fMRI, etc.), which are very costly. In such cases, the normality assumption is generally not tenable. Unlike other existing methods for testing the homogeneity of a set of overlapping correlated correlations, this approach does not depend on asymptotic normality assumptions. The method presented also provides the ability to test the interaction of the within and between factors of the correlated correlations. LISREL provides an alternative approach to testing these hypotheses for the case of large sample multivariate normal data, which, unfortunately, is not typical in many areas of research. A simulation study guided the decision to use the Fisher's z-transformation for the within and between tests, but not for the within by between interaction test, since application of this transformation results in a loss of efficiency for that test. Simulations demonstrated that there is a relatively small loss of efficiency when using the within hypothesis test of CORANOVA relative to the Olkin and Finn (1990) procedure. It was also demonstrated that there is a relatively small loss of efficiency when using the between hypothesis test of CORANOVA relative to the two-sample Fisher Z-test for comparing two independent correlations. Simulation studies also showed that the CORANOVA method is valid both for non-normal data and for small sample sizes. The Olkin and Finn (1990) method was shown to have an inflated level for both small sample sizes and non-normal data, indicating that this test is not valid in these cases. The Fisher Z-test was shown to have the correct level for small sample sizes, but not for non-normal data. Thus, the Fisher Z-test should not be applied to non-normal data. CORANOVA permits the testing of correlations across a single within and a single between factor, as well as the interaction of these factors. This approach can easily be adapted to perform these comparisons of correlations based on partial correlations, adjusting all correlations for covariates. The joint distribution of the sample partial correlation coefficients is the same as the joint distribution of the sample (non-partialed) correlation coefficients, but with an effectively reduced sample size (Anderson, 1958; Dunn & Clark, 1969).
Therefore, any method used for testing the homogeneity of the sample overlapping correlation coefficients is also appropriate for testing the homogeneity of the sample overlapping partial correlation coefficients. This result also establishes the validity of the between, within, and interaction hypotheses of CORANOVA adapted to the case of partial correlations. The CORANOVA method has been implemented in a SAS software macro (SAS, 2000) and is available at www.cceb.upenn.edu/main/people/docs/coranova.sas or www.med.upenn.edu/bbl/pubs/publications/coranova.sas. The macro includes options to perform CORANOVA on partial correlations and to use Pearson or Spearman correlations.
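The partial correlation adjustment mentioned above can be illustrated with the standard first-order recursion on pairwise correlations; this sketch and its toy data are ours, not the macro's implementation.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z, via the
    standard recursion on the pairwise Pearson correlations."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# toy data: x and y are both driven by the covariate z, so the raw
# correlation is high while the partial correlation is not
z = [0.5, 1.2, -0.3, 2.0, -1.1, 0.8, 1.5, -0.6]
ex = [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.3, 0.15]
ey = [-0.1, 0.2, 0.3, -0.2, 0.1, -0.15, 0.25, -0.05]
x = [a + e for a, e in zip(z, ex)]
y = [2 * a + e for a, e in zip(z, ey)]
print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

Feeding such partialed correlations into the same permutation machinery, with the effective sample size reduced by the number of covariates, is exactly the adaptation described above.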

MULTIVARIATE BEHAVIORAL RESEARCH


W. Bilker, C. Brensinger, and R. Gur

References

Anderson, T. W. (1958). An introduction to multivariate statistical analysis. New York: John Wiley & Sons.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151-173.
Choi, S. C. (1977). Tests of equality of dependent correlation coefficients. Biometrika, 64, 645-647.
Cohen, A. (1989). Comparison of correlated correlations. Statistics in Medicine, 8, 1485-1495.
Dunn, O. J. & Clark, V. (1969). Correlation coefficients measured on the same individuals. Journal of the American Statistical Association, 64, 366-377.
Dunn, O. J. & Clark, V. (1971). Comparison of tests of the equality of dependent correlation coefficients. Journal of the American Statistical Association, 66, 904-908.
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32.
Headrick, T. C. & Sawilowsky, S. S. (1999). Simulating correlated multivariate nonnormal distributions: Extending the Fleishman power method. Psychometrika, 64, 25-35.
Hotelling, H. (1940). The selection of variates for use in prediction with some comments on the general problem of nuisance parameters. Annals of Mathematical Statistics, 11, 271-283.
May, K. & Hittner, J. B. (1997a). A note on statistics for comparing dependent correlations. Psychological Reports, 80, 475-480.
May, K. & Hittner, J. B. (1997b). Tests for comparing dependent correlations revisited: A Monte Carlo study. The Journal of Experimental Education, 65, 257-269.
Meng, X.-L., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172-175.
Neill, J. J. & Dunn, O. J. (1975). Equality of dependent correlation coefficients. Biometrics, 31, 531-543.
Olkin, I. (1967). Correlations revisited. In J. Stanley (Ed.), Proceedings of the Symposium on Educational Research: Improving experimental design and statistical analysis (pp. 102-156). Chicago: Rand McNally.
Olkin, I. & Finn, J. (1990). Testing correlated correlations. Psychological Bulletin, 108, 330-333.
Paul, S. R. (1989). Testing for the equality of several correlation coefficients. Canadian Journal of Statistics, 17, 217-227.
Pearson, K. & Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philosophical Transactions of the Royal Society of London, Series A, 191, 229-311.
Raghunathan, T. E., Rosenthal, R., & Rubin, D. B. (1996). Comparing correlated but nonoverlapping correlations. Psychological Methods, 1, 178-183.
SAS Institute (2000). SAS software, Version 8.1. Cary, NC: Author.
Saykin, A. J., Gur, R. C., Gur, R. E., Shtasel, D. L., Flannery, K. A., Mozley, L. H., Malamut, B. L., Watson, B., & Mozley, P. D. (1995). Normative neuropsychological test performance: Effects of age, education, gender and ethnicity. Applied Neuropsychology, 2, 79-88.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245-251.
Williams, E. J. (1959). Significance of difference between two nonindependent correlation coefficients. Biometrics, 15, 135-136.

Accepted April, 2004.
