# Standardized regression coefficients as indices of effect sizes in meta-analysis

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

1. CHAPTER ONE: INTRODUCTION

Purpose of the Dissertation

The Organization of the Dissertation

2. CHAPTER TWO: LITERATURE REVIEW

Combining Regression Coefficients

Scaling Issue

Standardized Regression Slopes

Other Indices of Regression Slopes

Summarizing t Statistics

Partial Standardized Mean Difference Effect Size

Applied Meta-analyses Using Regression Coefficients

3. CHAPTER THREE: METHODS

The Case of the Two Predictor Regression Model

Variance-Covariance Matrix for Standardized Slopes

The Case of the Two Predictor Regression Model

The Case of the Three Predictor Regression Model

Alternative Ways of Obtaining the Standard Error

Using t Statistics for Slopes

Using Standard Deviations of Variables and the SE of the Raw Slope

Using the Variance Inflation Factor

Summary of the Relation among Raw and Standardized Slopes, and Semi-partial Correlations

Investigating the Difference between the Standardized Slope and Correlation Coefficient

Comparison of the Standardized Regression Slope and the Semi-partial Correlation

Two Predictor Model

Comparison Results

4. CHAPTER FOUR: EXAMPLE

Data Description

Homogeneity Test

Random-effects Model

5. CHAPTER FIVE: SIMULATION

Simulation Conditions

Data Generation

Two Predictor Model

Five Predictor Model

Ten Predictor Model

Data Evaluation

Bias of the Estimated Standardized Regression Slope

Mean Squared Error of the Estimated Standardized Regression Slope

Simulation Results

Two Predictor Model

Five or Ten Predictor Model

6. CHAPTER SIX: DISCUSSION

Conclusion

Practical Implications

Advantages and Limitations

APPENDIX A: SIMULATION CODE

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table 3.1. Comparison of the Standardized Regression Slope and the Semi-partial Correlation

Table 4.1. Example data

Table 5.1. Summary Statistics for Two Predictor Model

Table 5.2. Summary Statistics for Five Predictor Model

Table 5.3. Summary Statistics for Ten Predictor Model

LIST OF FIGURES

Figure 3.1. The differences between standardized slopes ($b_1^*$) and correlation coefficients ($r_{y1}$) as a function of $r_{y2}$ and $r_{12}$ for the two predictor model

Figure 3.2. Comparison of the Standardized Regression Slope and the Semi-partial Correlation

Figure 4.1. Standardized slopes with 95% confidence intervals

Figure 5.1. Histograms of $b^*$ for Two Predictor Model Varying Intercorrelation and Sample Size

Figure 5.2. Histograms of $b^*$ for Five Predictor Model Varying Intercorrelation and Sample Size

Figure 5.3. Histograms of $b^*$ for Ten Predictor Model Varying Intercorrelation and Sample Size

Figure 5.4. Histograms of SE($b^*$) for Two Predictor Model Varying Intercorrelation and Sample Size

Figure 5.5. Histograms of SE($b^*$) for Five Predictor Model Varying Intercorrelation and Sample Size

Figure 5.6. Histograms of SE($b^*$) for Ten Predictor Model Varying Intercorrelation and Sample Size

ABSTRACT

When conducting a meta-analysis, it is common to find that many of the collected studies report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses have not been studied extensively. Standardized regression coefficients from multiple regression analysis are scale-free estimates of the effect of a predictor on a single outcome. Thus these coefficients can be used as effect-size indices for combining studies of the effect of a focal predictor on a target outcome.

I begin with a discussion of the statistical properties of standardized regression coefficients when used as measures of effect size in meta-analysis. The main purpose of this dissertation is to present methods for obtaining standardized regression coefficients and their standard errors from reported regression results. The method is illustrated with selected studies from a published meta-analysis on teacher verbal ability and school outcomes (Aloe & Becker, 2009). Last, a simulation study examines how multicollinearity (intercorrelation among predictors) and the number of predictors affect the empirical distributions of the estimated standardized regression slopes and their variance estimates under different conditions.

The simulation study shows that when predictors are highly correlated, the estimated standardized regression slopes have larger variances and tend to be close to zero.


CHAPTER ONE

INTRODUCTION

In nearly every field, large numbers of empirical research studies have been conducted over the decades, and it is very common for researchers to pursue similar research questions. Meta-analysis is a technique for combining studies with similar research questions to increase precision and assess the generalizability of results (Cohn & Becker, 2003; Glass, 1976). Glass (1976) defined meta-analysis as “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” (Glass, 1976, p. 3).

Meta-analysis uses effect sizes drawn from individual studies. To investigate associations among variables of interest, r-family effect-size indices, specifically Pearson correlation coefficients, are widely used in meta-analysis research (Borenstein, Hedges, Higgins, & Rothstein, 2009; Hedges & Olkin, 1985; Stankowich & Blumstein, 2005). Pearson correlation coefficients represent the bivariate relationship between two variables without controlling for the effects of other variables.

Many researchers have argued for combining correlation coefficients via various approaches (e.g., Becker, 1992, 1995; Hedges & Olkin, 1985; Hunter & Schmidt, 1990, 1994; Shadish & Haddock, 1994). Hedges and Olkin (1985) introduced synthesis methods for simple correlation coefficients. Becker (1992, 1995) introduced methods for combining correlation matrices using a generalized least squares estimation approach for the case of multivariate meta-analysis.

Multiple regression analysis is widely used in primary studies in education, economics, and the social sciences, and to a lesser extent in medical research (Armitage, Berry, & Matthews, 2002; Cohen, Cohen, West, & Aiken, 2003; Howell, 2010; Kieffer, Reese, & Thompson, 2001). Multiple regression analysis allows researchers to control for the effects of multiple variables, so indices for combining the results of studies that use multiple regression analysis are needed. However, methods for synthesizing regression slopes have rarely been studied (exceptions include Aloe, 2009; Becker & Wu, 2007). In this dissertation, I investigate the combination of regression slopes and propose the standardized regression slope as an effect-size metric in meta-analysis.

It is well known that the standardized regression coefficient in a bivariate regression model is the same as the bivariate correlation coefficient between the independent and dependent variables. In multiple regression analysis, standardized regression coefficients are scale-free estimates and are related to correlation coefficients, but the relationship is much more complex than in bivariate regression.

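For example, consider a regression model with two independent variables,

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i,$$

where $y_i$ is the score on the dependent variable for the $i$th subject, $x_{1i}$ and $x_{2i}$ are the values of the independent variables for the $i$th subject, $\beta_0$, $\beta_1$, and $\beta_2$ are population regression coefficients, and $\varepsilon_i$ is a residual term, often assumed to be normally distributed with mean zero and constant variance. The associated standardized regression model is

$$z_{yi} = \beta_1^* z_{1i} + \beta_2^* z_{2i} + \varepsilon_i^*,$$

where $z_{yi}$, $z_{1i}$, and $z_{2i}$ are the standardized scores and $\beta_1^*$ and $\beta_2^*$ are the standardized regression coefficients in the population. The error term, $\varepsilon_i^*$, is assumed to be normally distributed with mean zero and variance $\sigma_{\varepsilon^*}^2$. The least squares estimates of the standardized regression slopes are

$$b_1^* = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2} \quad \text{and} \quad b_2^* = \frac{r_{y2} - r_{y1}\, r_{12}}{1 - r_{12}^2},$$

where $r_{y1}$ is the simple correlation coefficient between $y$ and $x_1$, $r_{y2}$ is the simple correlation coefficient between $y$ and $x_2$, and $r_{12}$ is the simple correlation coefficient between $x_1$ and $x_2$.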

The standardized regression coefficient for the first independent variable, $b_1^*$, is a function of all the correlation coefficients among the variables. When the intercorrelation between the two independent variables is zero (i.e., $r_{12} = 0$), the standardized regression coefficient $b_1^*$ is equal to the correlation coefficient $r_{y1}$. For multiple regression models with more predictors, these formulas are more complex, but the simplification $b_j^* = r_{yj}$ holds if all the intercorrelations among the predictors are zero.
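The two predictor least squares expressions, $b_1^* = (r_{y1} - r_{y2} r_{12})/(1 - r_{12}^2)$ and the symmetric one for $b_2^*$, can be checked numerically. The Python sketch below computes both slopes from the three correlations; the function name `standardized_slopes` and the correlation values are illustrative assumptions. It shows that $b_1^* = r_{y1}$ only when $r_{12} = 0$.

```python
def standardized_slopes(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Least squares standardized slopes (b1*, b2*) for a two predictor model,
    computed from the three pairwise correlations."""
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors are perfectly collinear (|r_12| = 1)")
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2

# Uncorrelated predictors: each standardized slope equals its simple correlation.
print(standardized_slopes(0.4, 0.3, 0.0))
# Correlated predictors: the slopes no longer equal r_y1 and r_y2.
print(standardized_slopes(0.4, 0.3, 0.5))
```

With $r_{12} = 0.5$, the first slope drops from 0.4 to about 0.33 and the second from 0.3 to about 0.13, because part of each simple correlation is shared with the other predictor.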

Purpose of the Dissertation

The purpose of this study is to illustrate how the standardized slope can serve as an effect-size index in meta-analysis and to discuss its strengths and limitations. I will examine and compare the interpretations of various available indices of regression results. Furthermore, I will focus on the statistical properties of standardized regression coefficients as measures of effect size in meta-analysis. In addition, a method will be presented for obtaining the standard errors of standardized regression slopes.

A practical meta-analysis example is used to illustrate combining standardized slopes drawn from a literature on teacher verbal ability and school outcomes (Aloe & Becker, 2009). Finally, a simulation study is conducted to examine the effect of multicollinearity and the number of predictors on the distributions of the estimated standardized regression slopes and their variance estimates.

The Organization of the Dissertation

I begin with a brief literature review. This is followed by a method for obtaining standardized slope estimates and a derivation of their variances. Next, I illustrate the use of meta-analysis techniques for combining regression slopes. A simulation study is then presented to examine the empirical distributions of the estimated standardized regression slopes and their variance estimates. Finally, I discuss the advantages and limitations of using the standardized regression slope for combining effects.

CHAPTER TWO

LITERATURE REVIEW

In this chapter I summarize methods for, and issues in, combining regression coefficients (Becker & Wu, 2007; Greenland, Maclure, Schlesselman, Poole, & Morgenstern, 1991; Greenland, Schlesselman, & Criqui, 1986; Peterson & Brown, 2005). Next, I discuss the indices that have been proposed for regression slopes in multiple regression models (Aloe, 2009; Greenwald, Hedges, & Laine, 1996; Stavig, 1977). The applied meta-analysis literature on regression coefficients is reviewed (Paul, Lipps, & Madden, 2006; Yin, Schmidt, & Besag, 2006), and a method of combining t statistics from regression slopes is discussed (Stanley, Doucouliagos, & Jarrell, 2008; Stanley & Jarrell, 1989, 2005). Finally, the partial effect size for dummy slopes in regression models is discussed (Keef & Roberts, 2004).

Combining Regression Coefficients

Becker and Wu (2007) addressed the issue of synthesizing regression slopes. They described existing methods for summarizing regression slopes, including summaries of raw slopes or of t statistics for slopes, iterative least squares regression methods, and weighted least squares methods, as well as a multivariate Bayesian approach. In addition, they discussed a new synthesis approach based on generalized least squares (GLS) estimation using raw regression coefficients.


The authors considered combining all regression coefficients from individual studies to take into account the effects of predictors on a target outcome. However, raw regression coefficients depend on the scales of the variables, so this method can be applied only when the variables in the collected studies share the same measure or the same scale. In general, a standardized effect-size measure is required for combining regression coefficients.

The GLS estimator is given by

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{W}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{W}\right)^{-1}\mathbf{W}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{b},$$

where $\mathbf{W}$ represents a design matrix which contains zeros and ones to identify the coefficients from each study, $\boldsymbol{\Sigma}$ is a blockwise diagonal matrix which contains the variance-covariance matrices of the regression coefficients in each study, and $\mathbf{b}$ represents the stack of reported regression coefficients from the collected studies. The proposed GLS method requires the variances and covariances among the regression coefficients in each study, which are often challenging to obtain.
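To make the GLS estimator $\hat{\boldsymbol\beta} = (\mathbf{W}^\top\boldsymbol\Sigma^{-1}\mathbf{W})^{-1}\mathbf{W}^\top\boldsymbol\Sigma^{-1}\mathbf{b}$ concrete, the Python sketch below pools slopes from two hypothetical studies: one reports slopes for predictors A and B, the other reports only A. The numbers and the helper names `block_diagonal` and `gls_pool` are illustrative assumptions, not Becker and Wu's code; the sketch only shows how $\mathbf{W}$ maps each reported coefficient to a pooled parameter and how $\boldsymbol\Sigma$ is assembled from per-study blocks.

```python
import numpy as np

def block_diagonal(blocks):
    """Place square per-study covariance blocks along the diagonal of one matrix."""
    size = sum(block.shape[0] for block in blocks)
    out = np.zeros((size, size))
    start = 0
    for block in blocks:
        k = block.shape[0]
        out[start:start + k, start:start + k] = block
        start += k
    return out

def gls_pool(W, Sigma, b):
    """GLS estimate (W' Sigma^-1 W)^-1 W' Sigma^-1 b and its covariance matrix."""
    Sigma_inv = np.linalg.inv(Sigma)
    cov = np.linalg.inv(W.T @ Sigma_inv @ W)
    return cov @ W.T @ Sigma_inv @ b, cov

# Hypothetical stacked coefficients: study 1 (A, B), then study 2 (A only).
b = np.array([0.30, 0.10, 0.20])
W = np.array([[1, 0],    # study 1, predictor A
              [0, 1],    # study 1, predictor B
              [1, 0]])   # study 2, predictor A
Sigma = block_diagonal([
    np.array([[0.010, 0.002], [0.002, 0.012]]),  # study 1 covariance of (A, B)
    np.array([[0.020]]),                         # study 2 variance of A
])
beta_hat, cov_beta = gls_pool(W, Sigma, b)
print(beta_hat, np.sqrt(np.diag(cov_beta)))
```

When every estimate targets the same parameter and $\boldsymbol\Sigma$ is diagonal, this reduces to the familiar inverse-variance weighted mean. The within-study covariance (0.002 here) is exactly the quantity that is hard to obtain from published reports.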

Becker and Wu (2007) also discussed problems in synthesizing regression slopes that arise from the scaling of variables and from differences in the additional independent variables included across studies. I review these issues here.

Scaling Issue

An essential property of most measures of effect size is that they are scale-free, which means the magnitudes of effect sizes are comparable across studies. The Pearson correlation coefficient is a scale-free index, which is why it is widely used as an effect size to represent associations among variables. By contrast, the magnitudes of estimated unstandardized slopes in multiple regression depend on the scales of the predictor and outcome variables. The standardized slope is interpreted as the estimated number of standard deviations of change in the dependent variable for a one standard deviation change in the independent variable, controlling for the other independent variables. This index can be compared across studies in much the same way that standardized mean difference effect sizes are compared.
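The scale dependence of raw slopes, and the scale invariance of the standardized slope $b_j^* = b_j s_{x_j}/s_y$, can be seen by re-expressing a predictor in different units. In this Python sketch the data are simulated and hypothetical, and the helper names are illustrative. Multiplying the first predictor by 2.54 (as in converting inches to centimeters) divides its raw slope by 2.54 but leaves every standardized slope unchanged.

```python
import numpy as np

def ols_slopes(X, y):
    """Raw OLS slopes for the columns of X, fitted with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:]  # drop the intercept

def standardized_from_raw(X, y):
    """Standardized slopes b_j* = b_j * s_xj / s_y, using sample SDs."""
    return ols_slopes(X, y) * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Hypothetical data: two correlated predictors and a continuous outcome.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], size=200)
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=200)

# Re-express the first predictor in different units (e.g., inches -> centimeters).
X_rescaled = X.copy()
X_rescaled[:, 0] *= 2.54

print(ols_slopes(X, y), ols_slopes(X_rescaled, y))                        # first raw slope changes
print(standardized_from_raw(X, y), standardized_from_raw(X_rescaled, y))  # standardized slopes agree
```

The same standardized values are obtained by regressing the $z$-scored outcome on the $z$-scored predictors, which is why $b_j^*$ can be compared across studies that measured the variables in different units.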

Standardized Regression Slopes

Peterson and Brown (2005) investigated the empirical relationship between simple correlation coefficients and standardized regression slopes. In their study, 1,504 standardized regression coefficients and correlation coefficients were collected from articles published in behavioral journals. The authors estimated the slope of the regression of the standardized regression slope on the simple correlation coefficient and found a strong relation between simple correlation coefficients and standardized regression slopes. The main focus of the study was combining correlation coefficients in meta-analysis, specifically the case in which studies report standardized regression coefficients but not correlation coefficients. The authors proposed a formula for imputing the correlation coefficient from the standardized regression slope when studies report no information on correlation coefficients. However, the paper does not discuss computational methods for effect-size variances.

Greenland et al. (1986) and Greenland et al. (1991) criticized the use of the standardized regression coefficient as a measure of effect size, especially for the results of logistic regression analyses. They argued that the standard deviations of the variables in multiple regression analyses vary widely across studies. Another problem they identified, also discussed by Becker and Wu (2007), is that the covariates in multiple regression models can vary across studies. Their argument thus was that comparisons of standardized slopes are not meaningful because the standard deviations of the variables differ from study to study. Specifically, they claim that the interpretation of standardized slopes in biological or public-health contexts is not meaningful because the population standard deviations of predictors may vary. Their argument applies with particular force to logistic regression, where the outcome variable is dichotomous, so the standard deviation of the outcome is not meaningful for interpreting a standardized regression slope. In this dissertation, the outcome variable is assumed to be continuous, which allows for meaningful interpretation of the standardized regression coefficients.

Other Indices of Regression Slopes

Semi-standardized and half-standardized regression coefficients have also been proposed as effect-size measures. Semi-standardized regression slopes give the effects of standardized predictors on unstandardized outcome variables, or the effects of unstandardized predictors on standardized outcome variables. The half-standardized regression coefficient represents the effect of an unstandardized predictor on a standardized outcome variable, and is one of the semi-standardized regression slopes described by Stavig (1977). The semi-partial correlation index has been proposed as an effect-size index in multiple regression analysis (Aloe, 2009). The semi-partial correlation index represents the unique effect of the predictor on the outcome variable, partialling out the effects of the other predictors in the regression model.

Semi-standardized Partial Regression Coefficient

Stavig (1977) introduced two types of semi-standardized regression coefficients as functions of standardized or unstandardized slopes and the standard deviation of the independent or dependent variable, specifically,

$$b_j^* s_y = b_j s_{x_j} \quad \text{or} \quad \frac{b_j}{s_y} = \frac{b_j^*}{s_{x_j}},$$

where $b_j^*$ is the standardized regression slope for the $j$th predictor, $b_j$ is the raw slope in a simple regression model, and $s_y$ and $s_{x_j}$ are the standard deviations of the dependent and independent variables, respectively. The first semi-standardized slope, $b_j^* s_y$, represents the effect of a standardized predictor on an unstandardized outcome variable, and the second, $b_j / s_y$, represents the effect of an unstandardized predictor on a standardized outcome variable, holding constant the other predictors in the model. Stavig applied these formulas directly to slopes from multiple regression models and compared simple correlations with standardized and unstandardized coefficients using an example from Anderson, Rosh, and McClary (1973). Stavig (1977) did not provide the standard error of the semi-standardized slope.
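Both semi-standardized forms can be computed from a reported raw slope and the two standard deviations. The Python sketch below assumes the usual relation $b_j^* = b_j s_{x_j}/s_y$; the function name and input values are hypothetical. The second returned value, $b_j/s_y$, is the same quantity as the half-standardized coefficient of Greenwald et al. (1996).

```python
def semi_standardized(b_raw, s_x, s_y):
    """Return the two semi-standardized slopes for one predictor.

    Assumes the standardized slope is b* = b_raw * s_x / s_y."""
    b_std = b_raw * s_x / s_y
    std_x_raw_y = b_std * s_y    # equals b_raw * s_x: raw y units per SD of x
    raw_x_std_y = b_raw / s_y    # equals b_std / s_x: SDs of y per raw unit of x
    return std_x_raw_y, raw_x_std_y

# Hypothetical reported values: raw slope 2.0, SD of x 3.0, SD of y 12.0.
print(semi_standardized(2.0, 3.0, 12.0))
```

Neither form is scale-free in both variables, which is why each remains tied to the measurement units of one of them.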

Half - standardized Partial Regression Coefficient

Greenwald et al. (1996) proposed the half-standardized partial regression coefficient as a measure of effect size. The half-standardized partial regression coefficient is computed by dividing the unstandardized regression slope by the standard deviation of the dependent variable, $b_j / s_y$. The half-standardized slope indicates the change in standard-deviation units on the outcome for a one-unit change in the predictor, controlling for the other predictors. It is one of the semi-standardized regression slopes introduced by Stavig (1977), as described earlier. In their study, half-standardized regression slopes were calculated from 60 collected studies. Two meta-analyses, using combined significance testing and half-standardized partial regression coefficients, were conducted, with the median effect size as the final representation of the combined effect magnitude. Thus, the authors did not compute a weighted average of the effects that takes into account the precision of each effect size, which is reflected in its variance. The study did not discuss the precision of the effect sizes in the collected studies.

Semi - partial Correlation Index

Aloe (2009) proposed the semi-partial correlation index for synthesizing slopes in multiple regression models. The index he proposed was

$$r_{sp} = \frac{t\sqrt{1 - R^2}}{\sqrt{n - p - 1}},$$

where $t$ is the $t$ statistic for the focal predictor, $R^2$ is the proportion of variance explained by the model, $n$ is the total sample size, and $p$ is the number of predictors. The semi-partial correlation index represents the unique effect of the focal predictor on the target outcome, partialling out the effects of the other predictors in the model. This index does not include the effect on the outcome that the focal predictor shares with the other predictors. Thus, when the number of predictors increases, the values of the semi-partial correlation index tend to be smaller. A comparison of the semi-partial correlation with the standardized regression slope, varying the number of predictors and the intercorrelations among predictors, is presented in Chapter III.
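The index $r_{sp} = t\sqrt{1-R^2}/\sqrt{n-p-1}$ uses only quantities that primary studies commonly report. The Python sketch below, on simulated hypothetical data with illustrative helper names, computes it from the focal predictor's $t$ statistic and checks it against the defining property of the squared semi-partial correlation: the increase in $R^2$ when the focal predictor is added to a model containing the other predictors.

```python
import numpy as np

def semi_partial_from_t(t, r2, n, p):
    """Semi-partial correlation index t * sqrt(1 - R^2) / sqrt(n - p - 1)."""
    return t * np.sqrt(1.0 - r2) / np.sqrt(n - p - 1)

def fit_r2_and_t(X, y, j):
    """R^2 of an OLS fit of y on X (with intercept) and the t statistic for column j."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    mse = sse / (n - p - 1)
    cov = mse * np.linalg.inv(X1.T @ X1)
    t = coef[j + 1] / np.sqrt(cov[j + 1, j + 1])
    return 1.0 - sse / sst, t

# Hypothetical study: three correlated predictors, focal predictor in column 0.
rng = np.random.default_rng(2)
n = 300
cov_x = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], cov_x, size=n)
y = 0.4 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)

r2_full, t_focal = fit_r2_and_t(X, y, 0)       # full p-predictor model
r2_reduced, _ = fit_r2_and_t(X[:, 1:], y, 0)   # model without the focal predictor
r_sp = semi_partial_from_t(t_focal, r2_full, n, X.shape[1])
print(r_sp, np.sqrt(r2_full - r2_reduced))     # the two values agree
```

The agreement follows from the identity $t^2 = (R^2_p - R^2_{p-1})(n-p-1)/(1-R^2_p)$ for a single added predictor.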

Aloe (2009) derived a formula for the estimated variance of the semi-partial correlation using the delta method. The formula is a function of $R^2_p$, the proportion of variance explained by the $p$-predictor regression model; $R^2_{p-1}$, the proportion of variance explained by the $(p-1)$-predictor regression model (the first model excluding the $p$th predictor), which can be obtained as $R^2_{p-1} = R^2_p - r_{sp}^2$; and $n$, the total sample size. Because this formula is complicated, an alternative way of obtaining the variance of the semi-partial correlation index is presented in the next chapter.

Aloe (2009) also pointed out that the standardized regression slope is one possible index in multiple regression models, but he noted that “one of the major shortcomings of synthesizing standardized slopes is that generally primary researchers report beta weights without standard errors” (Aloe, 2009, p. 10). This weakness is addressed in the next chapter.

Summarizing t Statistics

Stanley and Jarrell (1989, 2005) and Stanley et al. (2008) argued for the synthesis of multiple regression slopes. The model used in their articles was

$$b_j = \beta + \sum_{a} \alpha_a W_{ja} + e_j, \qquad j = 1, \ldots, k,$$

where $b_j$ is the reported estimated slope for the target variable from the $j$th study, $\beta$ is the true value of that slope, $W_{ja}$ are study characteristic variables (typically called meta-independent variables), $\alpha_a$ is the effect of the $a$th study characteristic, $e_j$ is the residual for study $j$, and $k$ is the total number of studies. Stanley and Jarrell also introduced the idea of integrating $t$ statistics for the slopes (obtained by dividing each slope by its standard error) from economics research studies. They argued that meta-regression analysis controls for variation arising from the different predictors in the studies. The meta-regression model is

$$t_j = \frac{b_j}{SE_j} = \frac{\beta}{SE_j} + \sum_{a} \frac{\alpha_a W_{ja}}{SE_j} + \frac{e_j}{SE_j},$$

where $SE_j$ is the standard error of the slope in study $j$.

The authors stated that the “t-statistic is a standardized measure of the critical parameter of interest” (Stanley & Jarrell, 1989, p. 304). However, Becker and Wu (2007) criticized this remark, noting that “they did not say what the parameter of interest is. Clearly t is not an estimator of β” (Becker & Wu, 2007, p. 418). Moreover, combining t statistics would not convey the magnitude of the effect of interest. The value of the t statistic reflects statistical significance for the null hypothesis about the slope parameter, and it grows with the sample size as well as with the size of the effect. In other words, the t statistic cannot tell us the extent of the effect of the focal predictor on the outcome variable.
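The dependence of $t$ on sample size can be made concrete with a simple regression, where the standardized slope equals $r$ and $t = r\sqrt{n-2}/\sqrt{1-r^2}$. In this Python sketch (values hypothetical, function name illustrative) the effect is held fixed at $r = 0.3$ while $n$ grows. The $t$ statistic roughly doubles each time $n$ quadruples, although the effect size does not change.

```python
import math

def t_simple_slope(r, n):
    """t statistic for the slope of a simple regression with sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)

# Same effect (standardized slope = r = 0.3) at different sample sizes.
for n in (25, 100, 400):
    print(n, round(t_simple_slope(0.3, n), 2))
```

An average of such $t$ values therefore mixes effect magnitude with study size, whereas an average of standardized slopes does not.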

Partial Standardized Mean Difference Effect Size

Keef and Roberts (2004) introduced another partial effect size from the multiple regression model. In their work, the slope represents the standardized mean difference between two groups, so the predictor of interest in the primary studies is a dummy variable; their multiple regression model was therefore an analysis-of-covariance (ANCOVA) model. The proposed partial effect size is computed by dividing the dummy slope from the model by the square root of the estimated error variance (the mean square error of the ANCOVA model). The dummy slope represents the adjusted mean difference between two groups (for example, experimental and control groups), controlling for the other variables or covariates in the model. Keef and Roberts focused on this standardized function of the dummy slope to represent a d-type effect size, but did not discuss effect sizes for slopes of continuous variables.
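A minimal Python sketch of this partial effect size, assuming an ANCOVA with one dummy-coded group variable and one covariate fitted by ordinary least squares. The data and the function name `partial_d` are hypothetical; because the simulated adjusted group difference is 0.5 and the error standard deviation is 1, the estimate should be near 0.5.

```python
import numpy as np

def partial_d(group, covariates, y):
    """Dummy-variable slope from an ANCOVA-type regression divided by sqrt(MSE)."""
    X = np.column_stack([np.ones(len(y)), group, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    mse = resid @ resid / (len(y) - X.shape[1])  # estimated error variance
    return coef[1] / np.sqrt(mse)                # coef[1] is the adjusted mean difference

# Hypothetical experiment: treatment dummy, one pretest covariate, unit error SD.
rng = np.random.default_rng(3)
n = 400
group = rng.integers(0, 2, size=n)   # 0 = control, 1 = treatment
pretest = rng.normal(size=n)
y = 0.5 * group + 0.8 * pretest + rng.normal(size=n)
d = partial_d(group, pretest, y)
print(d)
```

Because both the dummy slope and the root mean square error are in outcome units, rescaling the outcome leaves this index unchanged.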

Applied Meta-analyses Using Regression Coefficients

The two research papers described below conducted meta-analyses that combined regression coefficients. Both accessed raw data to obtain regression coefficients for each study. Yin et al. (2006) combined standardized regression coefficients, and Paul et al. (2006) combined intercepts and raw slopes.

Yin et al. (2006) used standardized slopes as effect sizes in synthesizing studies, but it is not clear what standard errors they used. They investigated trends in student achievement tests over a 3- or 4-year period across 17 urban school districts. The achievement score and time interval variables were standardized, and the standardized slopes relating those two variables were calculated for each district. The standardized slopes were computed from the raw data for each district and used to obtain the inverse-variance weighted mean. Yin et al. did not discuss how to calculate the standard error of the standardized regression slope from reported regression results.
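An inverse-variance weighted mean gives each effect a weight equal to the reciprocal of its variance, so more precise estimates count more. A minimal Python sketch of the fixed-effect version, with hypothetical standardized slopes and variances and an illustrative function name; in practice the variances must first be obtained from the reported regression results.

```python
import math

def inverse_variance_mean(estimates, variances):
    """Fixed-effect (inverse-variance weighted) mean and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * b for w, b in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical standardized slopes from three districts and their variances.
b_star = [0.30, 0.20, 0.25]
var_b_star = [0.01, 0.04, 0.02]
print(inverse_variance_mean(b_star, var_b_star))
```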

Paul et al. (2006) conducted a meta-analysis synthesizing regression slopes and intercepts from bivariate regression models. They summarized the unstandardized slopes and intercepts from 126 studies of the relation between the deoxynivalenol content of harvested wheat grain and the Fusarium head blight index. The focal predictor and the target outcome were measured on the same scale across the collected studies. Before conducting the meta-analysis, the authors estimated the intercept and slope of a simple regression model in each of the 126 studies. Next, the overall mean effect sizes, in terms of the average slope and the average intercept, were estimated under the random-effects model. Combining raw slopes was feasible here because the variables shared a common scale; in general, unstandardized regression coefficients depend on the scales of the variables. Last, they conducted moderator analyses with study-characteristic variables under the mixed-effects model.
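Under a random-effects model the weights also include the between-study variance $\tau^2$. The Python sketch below uses the DerSimonian-Laird moment estimator of $\tau^2$, a common choice assumed here for illustration; the text does not state which estimator Paul et al. (2006) used. The function name and inputs are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects mean, DerSimonian-Laird tau^2, and SE of the mean."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * bi for wi, bi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (bi - fixed) ** 2 for wi, bi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * bi for wi, bi in zip(w_star, estimates)) / sum(w_star)
    return mean, tau2, math.sqrt(1.0 / sum(w_star))

# Hypothetical slopes and within-study variances from four studies.
print(dersimonian_laird([0.12, 0.35, 0.22, 0.41], [0.004, 0.006, 0.003, 0.010]))
```

When $\tau^2$ is estimated as zero, the weights and the mean coincide with the fixed-effect inverse-variance results.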


CHAPTER THREE

METHODS

In this chapter, the estimation of standardized regression coefficients in two and three predictor regression models is addressed. The ordinary least squares estimators of the standardized regression parameters are presented. In addition, a method is presented for obtaining the standard errors of standardized regression slopes.
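Two textbook relations give a standard error for a standardized slope from commonly reported results, assuming the sample standard deviations are treated as fixed. First, $SE(b_j^*) = b_j^*/t_j$, because $b_j^*$ is the raw slope rescaled by $s_{x_j}/s_y$ and so shares its $t$ statistic. Second, $SE(b_j^*) = \sqrt{(1-R^2)\,VIF_j/(n-p-1)}$, where $VIF_j$ is the variance inflation factor of predictor $j$. The Python sketch below, on simulated hypothetical data with illustrative function names, checks that both routes agree; it is an illustration under that assumption, not necessarily the derivation developed in this chapter.

```python
import numpy as np

def se_std_slope_from_t(b_std, t):
    """SE(b*) = b* / t (sample SDs treated as fixed, so b* and b share t)."""
    return b_std / t

def se_std_slope_from_vif(r2, n, p, vif):
    """SE(b_j*) = sqrt((1 - R^2) * VIF_j / (n - p - 1))."""
    return np.sqrt((1.0 - r2) * vif / (n - p - 1))

# Hypothetical primary-study data with three correlated predictors.
rng = np.random.default_rng(5)
n, p = 200, 3
cov_x = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]]
X = rng.multivariate_normal(np.zeros(p), cov_x, size=n)
y = X @ np.array([0.5, 0.2, 0.3]) + rng.normal(size=n)

# Quantities a primary study might report: raw slopes, their t statistics, R^2.
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ coef
mse = resid @ resid / (n - p - 1)
se_raw = np.sqrt(np.diag(mse * np.linalg.inv(X1.T @ X1)))[1:]
t = coef[1:] / se_raw
r2 = 1.0 - resid @ resid / ((y - y.mean()) ** 2).sum()

b_std = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))  # VIF_j = [R_xx^-1]_jj

se_t = se_std_slope_from_t(b_std, t)
se_vif = se_std_slope_from_vif(r2, n, p, vif)
print(se_t, se_vif)  # the two routes agree
```

The second route shows directly why multicollinearity inflates the standard error of $b_j^*$: it grows with $\sqrt{VIF_j}$.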