Large N, T asymptotic analysis of panel data models with incidental parameters

ProQuest Dissertations and Theses, 2011
Dissertation
Author: Martin Weidner
Abstract:
This dissertation contributes to the econometrics of panel data models and their application to economic problems. In particular, it considers "large T" panels, where in addition to the cross-sectional dimension N also the number of time periods T is relatively large. Chapter 1 provides an introduction to the field of large T panel data econometrics and explains the contribution of the dissertation to this field. Chapter 2 analyzes linear panel regression models, allowing for unobserved factors (interactive fixed effects) in the error structure of the model. In particular, it is shown that, under appropriate assumptions, the limiting distribution of the Gaussian quasi maximum likelihood estimator for the regression coefficients is independent of the number of factors used in the estimation. The important practical implication of this result is that for inference on the regression coefficients there is no need to estimate the number of factors consistently. Chapter 3 extends the Berry, Levinsohn and Pakes (1995) random coefficients discrete-choice demand model by adding interactive fixed effects to the unobserved product characteristics of this model. The interactive fixed effects can be arbitrarily correlated with the observed product characteristics, which accommodates endogeneity, and they can capture strong persistence in market shares across products and markets. A two-step least squares-minimum distance procedure is proposed to estimate the model, and the asymptotic properties of this estimator are derived. This methodology is then applied to the estimation of US automobile demand. Chapter 4 proposes a new approach for higher-order bias correction in large T non-linear panel data models that is based on inference on the individual effect distribution. Under appropriate assumptions it is shown that the incidental parameter bias for the estimator of the parameters of interest can converge to zero at an arbitrary polynomial rate in T, i.e. that the incidental parameter problem can vanish very rapidly in this approach as T increases. This has important implications, in particular for applications where T is modestly large and N is much larger than T.

Table of Contents
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects
  2.1 Introduction
  2.2 Model, QMLE and Consistency
  2.3 Asymptotic Profile Likelihood Expansion
    2.3.1 When $R = R^0$
    2.3.2 When $R > R^0$
  2.4 Justification of Assumptions 2.4 and 2.5
  2.5 Monte Carlo Simulations
  2.6 Conclusions
Chapter 3: Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects
  3.1 Introduction
  3.2 Model
  3.3 Estimation
    3.3.1 Extension: regressor endogeneity with respect to $e_{jt}$
  3.4 Consistency and Asymptotic Distribution of $\hat\alpha$ and $\hat\beta$
  3.5 Monte Carlo Results
  3.6 Empirical application: demand for new automobiles, 1973-1988
  3.7 Conclusion
Chapter 4: Semiparametric Estimation of Nonlinear Panel Data Models with Generalized Random Effects
  4.1 Introduction
  4.2 Model
  4.3 Description of Estimators and Main Results
    4.3.1 Sampling Issues (Generalized Random Effect Assumption)
    4.3.2 Identification Issues (Smoothness Assumption on $\pi$)
    4.3.3 Main Results
  4.4 Asymptotic Analysis of the Estimators
    4.4.1 Uniform Consistency of $\hat\theta(\pi)$
    4.4.2 Score and Hessian of the Integrated Likelihood
    4.4.3 Joint Maximum Likelihood Estimation of $\theta$ and $\pi$
  4.5 Generalized Random Effects
    4.5.1 Imposing an Appropriate Smoothness Assumption
    4.5.2 Computation
  4.6 Monte Carlo Simulations
  4.7 Conclusions
Bibliography
Appendix A
  A.1 Proof of Consistency
  A.2 Proof of Likelihood Expansion
Appendix B
  B.1 Alternative GMM approach
  B.2 Details for Section 3.4 (Consistency and Asymptotic Distribution)
    B.2.1 Formulas for Asymptotic Bias Terms
    B.2.2 Assumptions for Consistency
    B.2.3 Additional Assumptions for Asymptotic Distribution and Bias Correction
    B.2.4 Bias and Variance Estimators
  B.3 Proofs
    B.3.1 Proof of Consistency
    B.3.2 Proof of Limiting Distribution
    B.3.3 Consistency of Bias and Variance Estimators
Appendix C
  C.1 Assumptions
    C.1.1 Assumptions for Consistency
    C.1.2 Further Regularity Conditions on the Model
  C.2 Proofs
    C.2.1 Proofs for Section 4.4.1
    C.2.2 Proofs for Section 4.4.2
    C.2.3 Proofs for Section 4.4.3
  C.3 Further Discussions for Section 4.5
    C.3.1 Approximating Unknown Distributions
    C.3.2 Approximate Identification of $\pi(\alpha|x)$

List of Tables
2.1 Simulation results for the bias and standard error (std) of the QMLE $\hat\beta_R$
2.2 Simulation results for the quantiles of $\sqrt{NT}\,(\hat\beta_R - \beta^0)$
3.1 Simulation results for specification 1 (no heteroscedasticity)
3.2 Simulation results for specification 2 (heteroscedasticity in $e^0_{jt}$)
3.3 Parameter estimates (and t-values) for automobile demand estimation
3.4 Parameter estimates (and t-values) for model specification A and B
3.5 Summary statistics for the 23 product-aggregates used in estimation
3.6 Estimated price elasticities for specification B in t = 1988
3.7 Estimated price elasticities for specification C (BLP case) in t = 1988
4.1 Monte Carlo Results for T = 12
4.2 Same as Table 4.1, but with T = 24 and only for $\sigma_\pi = 0.7$

List of Figures
4.1 For T = 12 (left) and T = 24 (right) we plot $T^{-1} I^{-1}(\alpha, \theta^0, y_{i0})$
4.2 Examples of "basis functions" for T = 12
4.3 Same as Figure 4.2, but for T = 24
4.4 True and estimated individual effect distributions
4.5 Same as Figure 4.4, but with T = 24 and only for $\sigma_\pi = 0.7$
B.1 Example for multiple local minima in the objective function $L(\beta)$

Chapter 1
Introduction

This thesis is concerned with the statistical analysis of some particular panel data models and their application to economic problems. A panel data set consists of observations on many cross-sectional units (individuals, products, firms, countries, etc.) in multiple time periods. The information contained in such a data set is generally much richer than in a pure cross-sectional data set or in a pure time-series data set. In particular, the availability of panel data makes it possible to distinguish the heterogeneous influences and characteristics that are unique to each cross-sectional unit from the structural relations or trends that are common to all cross-sectional units.

Several textbooks have been written on the subject of panel data (e.g. Hsiao (2003), Arellano (2003), Wooldridge (2010)), and it continues to be a very active research area. The goal of panel data analysis typically is to estimate and do inference on a few parameters of interest (regression coefficients, marginal effects, etc.), while at the same time controlling for unobserved heterogeneity, which is often modeled by unobserved individual effects. There are well-established techniques for handling the unobserved heterogeneity in purely linear panel data models, but this issue is still a serious econometric challenge for many non-linear models. One of the difficulties in non-linear models is that the parameters of interest may not be uniquely identified from the data in the presence of the unobserved individual effects. Based on how this potential identification problem is handled, and on which asymptotic theory is considered, one can broadly classify the existing panel data literature as follows.

Firstly, there is the "classic" panel data literature that considers point identification and point estimation under an asymptotic where the number of time periods T remains constant, while the cross-sectional size N goes to infinity. Obtaining a consistent point estimator for the parameters of interest at fixed T is desirable and can indeed be achieved for some non-linear models (e.g. Rasch (1960), Andersen (1970), Chamberlain (1984), Hausman, Hall and Griliches (1984), Manski (1987), Honoré (1992), Horowitz and Lee (2004), Bonhomme (2010)). However, at fixed T a non-linear panel data model may not be point identified, or may not possess a $\sqrt{N}$-consistent estimator, as discussed by Chamberlain (2010) for the static binary choice model. Furthermore, an incidental parameter problem (Neyman and Scott (1948); see e.g. Lancaster (2000) for a review) usually appears in fixed T estimation of non-linear panel data models, since the number of incidental parameters (individual effects) grows with the sample size. Resolving this problem usually requires a model-specific augmentation of standard estimation procedures like maximum likelihood. We refer e.g. to Chamberlain (1984), Arellano and Honoré (2001), and the above-mentioned textbooks for reviews of this branch of the literature.

Secondly, there is the "large T" panel data literature, which includes e.g. Phillips and Moon (1999), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Hahn and Kuersteiner (2004), Hahn and Newey (2004), Carro (2007), Arellano and Bonhomme (2009), Fernández-Val (2009), Bester and Hansen (2009), Bai (2009b), Dhaene and Jochmans (2010); a review is provided by Arellano and Hahn (2007). This literature considers an asymptotic where both panel dimensions N and T go to infinity. This large T asymptotic guarantees point identification of a very large class of models under weak regularity conditions, and provides an asymptotic solution to the incidental parameter problem. Namely, the (maximum likelihood) estimator for the parameters of interest is shown to have a bias of order 1/T, which thus vanishes asymptotically. Bias correction techniques are also discussed that further increase the rate at which the bias vanishes.
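The order-1/T bias can be made concrete with a short numerical sketch. It uses the classic Neyman and Scott (1948) setting, the linear model $y_{it} = \alpha_i + e_{it}$ with iid $N(0,\sigma^2)$ errors. This is a textbook illustration and not a model analyzed in this thesis. The fixed-effect MLE of $\sigma^2$ estimates each $\alpha_i$ by the unit mean, so its expectation is $\sigma^2 (T-1)/T$: the bias is $-\sigma^2/T$, and it does not shrink as N grows. The sketch also applies a split-panel jackknife correction, one of the bias-correction ideas in this literature. The function names and the simulated design are hypothetical.

```python
import numpy as np


def sigma2_fixed_effects(y):
    """Fixed-effect MLE of sigma^2 in y_it = alpha_i + e_it.

    y has shape (N, T); each alpha_i is estimated by the unit mean,
    which uses up one degree of freedom per cross-sectional unit.
    """
    N, T = y.shape
    resid = y - y.mean(axis=1, keepdims=True)
    return (resid ** 2).sum() / (N * T)


def sigma2_split_panel_jackknife(y):
    """Split-panel jackknife: twice the full-panel estimate minus the average
    of the two half-panel estimates (assumes T is even)."""
    half = y.shape[1] // 2
    half_avg = 0.5 * (sigma2_fixed_effects(y[:, :half])
                      + sigma2_fixed_effects(y[:, half:]))
    return 2.0 * sigma2_fixed_effects(y) - half_avg


# Hypothetical design: many units, few periods, true sigma^2 = 1.
rng = np.random.default_rng(0)
N, T, sigma2 = 50_000, 6, 1.0
alpha = rng.normal(size=(N, 1))  # incidental parameters
y = alpha + rng.normal(scale=np.sqrt(sigma2), size=(N, T))

naive = sigma2_fixed_effects(y)              # expectation sigma2 * (T - 1) / T
corrected = sigma2_split_panel_jackknife(y)  # expectation sigma2
print(naive, corrected)
```

The half-panel estimates have expectation $\sigma^2 (T/2-1)/(T/2)$. The jackknife combination therefore has expectation $2\sigma^2(1 - 1/T) - \sigma^2(1 - 2/T) = \sigma^2$, so it is exactly unbiased in this simple model. In non-linear models such corrections typically remove only the leading $1/T$ term.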
Finally, there are a couple of papers that acknowledge the fact that many non-linear panel data models are not point identified at fixed T. They consequently discuss set identification (bound analysis) for the parameters of interest or for certain policy parameters like marginal effects. These include e.g. Chernozhukov, Hahn and Newey (2005), Honoré and Tamer (2006), Chernozhukov, Fernández-Val, Hahn and Newey (2009a) and Chernozhukov, Fernández-Val and Newey (2009b).

These three estimation approaches for non-linear panel data models should be viewed as complements rather than substitutes. If for fixed T a ($\sqrt{N}$-)consistent estimator is available for the particular model under consideration, then it probably should be used. If this is not the case, then chances are that the model may not be point identified at fixed T, and in particular for small values of T one needs to consider inference using bound analysis. However, the above-cited papers on set identification all point out that the bounds can be very tight and shrink rather rapidly as T grows. Thus, if T is sufficiently large one can safely ignore the fact that the model might only be set identified and simply use the large T estimation methodology.

Each of the following three chapters analyzes a particular class of panel data models with distinct motivations and different potential applications. The common theme of all three chapters is that they consider cases where both panel dimensions N and T are relatively large, i.e. they contribute to the "large T" branch of the panel data literature mentioned above. All three chapters consider an asymptotic where both N and T go to infinity. As noted above, this is theoretically appealing since it overcomes the identification problem and provides an approximate solution to the incidental parameter problem.
In addition, this asymptotic is motivated by the fact that many panel data sets that are available nowadays indeed have relatively large N and T. This applies to microeconomic surveys (e.g. the Panel Study of Income Dynamics or the National Longitudinal Survey of Youth) and to macroeconomic datasets (e.g. OECD, Eurostat and UNESCO provide data on many countries over multiple decades). It also applies to financial datasets (e.g. the Center for Research in Security Prices and COMPUSTAT provide data on prices, earnings, ratings, etc. of thousands of companies over many decades) and to other areas.

Chapter 2 considers a linear panel regression model with interactive fixed effects. While the model has a linear regression specification, it is non-linear in the specification of the unobserved error structure, which contains (multiple) individual effects and time effects that interact multiplicatively. We refer to this multiplicative specification as a factor structure or an interactive effect. Both the individual effects and the time effects are estimated as parameters, i.e. treated as fixed effects. We show how to analyze the fixed effect Gaussian quasi maximum likelihood estimator (QMLE) of this model under the $N, T \to \infty$ asymptotic. In particular, under appropriate assumptions we show that the limiting distribution of the QMLE for the regression coefficients is independent of the number of interactive fixed effects used in the estimation. This holds as long as that number does not fall below the true number of interactive fixed effects present in the data. The important practical implication of this result is that for inference on the regression coefficients one does not need to estimate the number of interactive effects consistently. Instead, one can simply rely on any known upper bound of this number to calculate the QMLE. Further details and references are provided in the introduction of the chapter itself. The research of this part of the thesis also entered into Moon and Weidner (2010b), which is a joint paper with Hyungsik Roger Moon.
In Chapter 3 we extend the Berry, Levinsohn and Pakes (BLP, 1995) random coefficients discrete-choice demand model, which underlies much recent empirical work in industrial organization. In the context of demand estimation the panel structure is given by observing market shares and characteristics of multiple products (the cross-sectional units) in different markets (e.g. over time). Analogous to the linear regression model in Chapter 2, we add interactive fixed effects in the form of a factor structure to the unobserved product characteristics, which are the structural error of the BLP demand model. The interactive fixed effects can be arbitrarily correlated with the observed product characteristics (including price). This accommodates endogeneity and, at the same time, captures strong persistence in market shares across products and markets. We propose a two-step least squares-minimum distance procedure to calculate the estimator. This estimator is easy to compute, Monte Carlo simulations show that it performs well, and we discuss an empirical application to US automobile demand. Further details and references are again provided in the chapter itself. This chapter of the thesis also gave rise to a joint paper with Hyungsik Roger Moon and Matthew Shum (Moon, Shum and Weidner (2010)).

Both Chapter 2 and Chapter 3 introduce interactive fixed effects into the error structure of the respective model, i.e. there are individual effects (factor loadings) and time effects (factors) that are both estimated as parameters. Under the asymptotic where both panel dimensions become large, one therefore faces an incidental parameter problem not only in the cross-sectional dimension, but also in the time dimension. This gives rise to a bias of order 1/N (or 1/J, in the notation of Chapter 3) in the estimator for the parameters of interest. That bias comes in addition to the "standard" cross-sectional incidental parameter bias of order 1/T. The chapters show how to derive the magnitude of these biases asymptotically, which then also allows for bias correction. Regarding bias correction in linear factor regression models we also refer to Bai (2009b) and Moon and Weidner (2010a). Bias correction in other non-linear large dimensional panel data models with individual and time effects is discussed in Fernández-Val and Weidner (2010).

Chapter 4 of the thesis does not consider time effects, but studies relatively general non-linear panel data models with individual effects when both N and T become large.
As mentioned above, it is well known that the fixed effect estimation approach in these models typically results in an incidental parameter bias of order 1/T. We show that under appropriate assumptions this incidental parameter bias can be substantially reduced by a different approach. Instead of using fixed effects, one estimates the distribution of the individual effects jointly with the parameters of interest by maximum likelihood, thereby treating the individual effect distribution non-parametrically. The convergence rate of the incidental parameter bias in this approach is shown to be limited only by the smoothness properties of the true individual effect distribution. To allow inference on this distribution we make a "generalized random effect" assumption. It requires the cross-sectional units to be partitioned into groups and imposes a random effect assumption in each group. In Monte Carlo simulations we consider the dynamic binary choice model, and we find the finite sample properties of our estimator to be in accordance with the asymptotic results. The results on higher-order bias correction in this chapter are particularly important in applications where T is only modestly large while N is much larger than T, which is the case in many microeconomic panel surveys. For very large values of N bias correction becomes particularly important. The standard error of the estimator of the parameters of interest is then very small, so that even a small remaining bias may easily dominate the properties of the estimator. For further details and references we again refer to the introduction of the chapter itself. This chapter was the basis for my job-market paper (Weidner (2011)).

Chapter 2
Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects

2.1 Introduction

Panel data models typically incorporate individual and time effects to control for heterogeneity in cross-section and across time periods. These individual and time effects often enter the model additively, but they can also be interacted multiplicatively, thus giving rise to so-called interactive effects, which we also refer to as a factor structure. The multiplicative form captures the heterogeneity in the data more flexibly. It allows common time-varying shocks (factors) to affect the cross-sectional units with individual-specific sensitivities (factor loadings). It is this flexibility that motivated the discussion of interactive effects in the econometrics literature, e.g. Holtz-Eakin, Newey and Rosen (1988), Ahn, Lee and Schmidt (2001; 2007), Pesaran (2006), Bai (2009b; 2009a), Zaffaroni (2009), Moon and Weidner (2010a).

Analogous to the analysis of individual-specific effects, one can either choose to model the interactive effects as random (random effects/correlated effects) or as fixed (fixed effects). Each option has its specific merits and drawbacks, which have to be weighed in each empirical application separately. In this chapter, we consider the interactive fixed effect specification, i.e. we treat the interactive effects as nuisance parameters, which are estimated jointly with the parameters of interest.[1] The advantages of the fixed effects approach are, for instance, that it is semi-parametric, since no assumption on the distribution of the interactive effects needs to be made. In addition, the regressors can be arbitrarily correlated with the interactive effect parameters.

Let $R^0$ be the true number of interactive effects (number of factors) in the data, and let $R$ be the number of interactive effects used by the econometrician in the data analysis. A key restriction in the existing literature on interactive fixed effects is that $R^0$ is assumed to be known,[2] i.e. $R = R^0$. This is true both for the quasi-differencing analysis in Holtz-Eakin, Newey and Rosen (1988)[3] and for the least squares analysis of Bai (2009b). Assuming $R^0$ to be known can be quite restrictive, since in many empirical applications there is no consensus about the exact number of factors in the data or in the relevant economic model. An estimator which is not robust towards some degree of misspecification of $R^0$ should therefore not be used. The goal of the present chapter is to overcome this problem.
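The claim that the multiplicative form is more flexible can be made concrete. Additive two-way effects $\alpha_i + \gamma_t$ are the special case $\lambda_i' f_t$ with $R = 2$, obtained by setting $\lambda_i = (\alpha_i, 1)'$ and $f_t = (1, \gamma_t)'$. A single factor with heterogeneous loadings, by contrast, is not additive. The numpy sketch below checks both statements on randomly drawn numbers. It uses the standard two-way within (double-demeaning) transformation as the test for additivity; all variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 5
alpha = rng.normal(size=N)  # individual effects
gamma = rng.normal(size=T)  # time effects

# Additive two-way effects alpha_i + gamma_t ...
additive = alpha[:, None] + gamma[None, :]
# ... written as an interactive effect lam @ f.T with R = 2 factors.
lam = np.column_stack([alpha, np.ones(N)])  # N x 2 factor loadings
f = np.column_stack([np.ones(T), gamma])    # T x 2 factors
interactive = lam @ f.T


def double_demean(A):
    """Two-way within transformation: subtract row and column means."""
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()


# A single factor (R = 1) with heterogeneous loadings.
lam1 = rng.normal(size=(N, 1))
f1 = rng.normal(size=(T, 1))
factor = lam1 @ f1.T

print(np.allclose(additive, interactive))             # True
print(np.linalg.matrix_rank(interactive))             # 2
print(np.abs(double_demean(additive)).max() < 1e-12)  # True
print(np.abs(double_demean(factor)).max())            # clearly nonzero
```

The last check is the practical point. The usual two-way fixed effects transformation removes $\alpha_i + \gamma_t$ exactly but leaves $(\lambda_i - \bar\lambda)(f_t - \bar f)'$ behind. A regression on double-demeaned data therefore does not control for a genuine interactive effect.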
For a linear panel regression model with interactive fixed effects we consider the Gaussian quasi maximum likelihood estimator (QMLE).[4] It jointly minimizes the sum of squared residuals over the regression parameters and the interactive fixed effects parameters (see Kiefer (1980), Bai (2009b), and Moon and Weidner (2010a)). We employ an asymptotic where both the cross-sectional dimension N and the time-serial dimension T become large, while the number of interactive effects $R^0$ (and also $R$) is constant.

[1] Note that Ahn, Lee and Schmidt (2001; 2007) take a hybrid approach in that they treat the factors as non-random, but the factor loadings as random. The common correlated effects estimator of Pesaran (2006) was introduced in a context where both the factor loadings and the factors follow certain probability laws, but it also exhibits some properties of a fixed effects estimator. When we refer to interactive fixed effects we mean that both factors and factor loadings are treated as non-random parameters.
[2] In the literature, consistent estimation procedures for $R^0$ are established only for pure factor models, not for the model with regressors.
[3] Holtz-Eakin, Newey and Rosen (1988) assume just one interactive effect, but their approach could easily be generalized to multiple interactive effects, as long as their number is known.
[4] The QMLE is sometimes called the concentrated least squares estimator in the literature.

The main finding of the chapter is that under appropriate assumptions the QMLE of the regression parameters has the same limiting distribution for all $R \geq R^0$. Thus, the QMLE is robust towards the inclusion of extra interactive effects in the model, and within the QMLE framework there is no asymptotic efficiency loss from choosing $R$ larger than $R^0$. This result is surprising because the conjecture in the literature is that the QMLE with $R > R^0$ might be consistent but could be less efficient than the QMLE with $R^0$ (e.g., see Bai (2009b)).[5] The important empirical implication of our result is that, as long as a valid upper bound on the number of factors is known, one can use this upper bound to construct the QMLE. There is then no need to worry about consistent estimation of the number of factors. Since the limiting distribution of the QMLE with $R > R^0$ is identical to the one with $R = R^0$, the results of Bai (2009b) and Moon and Weidner (2010a) regarding inference on the regression parameters become applicable.

In order to derive the asymptotic theory of the QMLE with $R \geq R^0$ we study the properties of the profile likelihood function, which is the quasi likelihood function after integrating out the interactive fixed effect parameters. Concretely, we derive an approximate quadratic expansion of this profile likelihood in the regression parameters.
This expansion is difficult to perform, since integrating out the interactive fixed effects results in an eigenvalue problem in the formulation of the profile likelihood. For $R = R^0$ we show how to overcome this difficulty by performing a joint expansion of the profile likelihood in the regression parameters and in the idiosyncratic error terms. Using the perturbation theory of linear operators, we prove that the profile quasi likelihood function is analytic in a neighborhood of the true parameter. We also obtain explicit formulas for the expansion coefficients, in particular analytic expressions for the approximated score and the approximated Hessian for $R = R^0$.

[5] For $R < R^0$ the QMLE could be inconsistent. In that case there are interactive fixed effects in the residuals of the model which can be correlated with the regressors but are not controlled for in the estimation.

To generalize the result to $R > R^0$ we then show that the difference between the profile likelihood for $R = R^0$ and for $R > R^0$ is just a constant term plus a term whose dependence on the regression parameters is sufficiently small to be irrelevant for the asymptotic distribution of the QMLE. Due to the eigenvalue problem in the likelihood function, the derivation of this last result requires some very specific knowledge about the eigenvectors and eigenvalues of the random covariance matrix of the idiosyncratic error matrix. We provide high-level assumptions under which the results hold, and we show that these high-level assumptions are satisfied when the idiosyncratic errors of the model are independent and identically normally distributed. As we explain in Section 2.4, justifying our high-level assumptions for more general distributions of the idiosyncratic errors requires further progress in the Random Matrix Theory of real random covariance matrices. That progress is needed both regarding the properties of their eigenvalues and of their eigenvectors (see Bai (1999) for a review of this literature).

The chapter is organized as follows. In Section 2.2 we introduce the interactive fixed effect model, its Gaussian quasi likelihood function, and the corresponding QMLE, and also discuss consistency of the QMLE. The asymptotic profile likelihood expansion is derived in Section 2.3. Section 2.4 provides a justification for the high-level assumptions that we impose, and discusses the relation of these assumptions to the random matrix theory literature. Monte Carlo results which illustrate the validity of our conclusions in finite samples are presented in Section 2.5, and the conclusions of the chapter are drawn in Section 2.6.
A few words on notation. The transpose of a matrix $A$ is denoted by $A'$. For a column vector $v$ its Euclidean norm is defined by $\|v\| = \sqrt{v'v}$. For the $n$-th largest eigenvalue (counting multiple eigenvalues multiple times) of a symmetric matrix $B$ we write $\mu_n(B)$. For an $m \times n$ matrix $A$ the Frobenius or Hilbert-Schmidt norm is $\|A\|_{HS} = \sqrt{\operatorname{Tr}(AA')}$. The operator or spectral norm is $\|A\| = \max_{0 \neq v \in \mathbb{R}^n} \frac{\|Av\|}{\|v\|}$, or equivalently $\|A\| = \sqrt{\mu_1(A'A)}$. Furthermore, we use $P_A = A(A'A)^{-1}A'$ and $M_A = \mathbb{1} - A(A'A)^{-1}A'$, where $\mathbb{1}$ is the $m \times m$ identity matrix, and $(A'A)^{-1}$ denotes some generalized inverse if $A$ is not of full column rank. For square matrices $B$, $C$, we use $B > C$ (or $B \geq C$) to indicate that $B - C$ is positive (semi-)definite. We use "wpa1" for "with probability approaching one", and $A =_d B$ to indicate that the random variables $A$ and $B$ have the same probability distribution.

2.2 Model, QMLE and Consistency

A linear panel regression model with cross-sectional dimension $N$, time-serial dimension $T$, and interactive fixed effects of dimension $R^0$ is given by

$$Y = \sum_{k=1}^{K} \beta^0_k X_k + \varepsilon, \qquad \varepsilon = \lambda^0 f^{0\prime} + e, \tag{2.1}$$

Here $Y$, $X_k$, $\varepsilon$ and $e$ are $N \times T$ matrices, $\lambda^0$ is an $N \times R^0$ matrix, $f^0$ is a $T \times R^0$ matrix, and the regression parameters $\beta^0_k$ are scalars; the superscript zero indicates the true value of the parameters. We write $\beta$ for the $K$-vector of regression parameters, and introduce the notation $\beta \cdot X \equiv \sum_{k=1}^{K} \beta_k X_k$. All matrices, vectors and scalars in this chapter are real valued. A choice for the number of interactive effects $R$ used in the estimation needs to be made, and we may have $R \neq R^0$, since the true number of factors $R^0$ may not be known accurately. Given the choice $R$, the quasi maximum likelihood estimator (QMLE) for the parameters $\beta^0$, $\lambda^0$ and $f^0$ is given by[6]

$$\left(\hat\beta_R,\ \hat\Lambda_R,\ \hat F_R\right) = \operatorname*{argmin}_{\{\beta\in\mathbb{R}^K,\ \Lambda\in\mathbb{R}^{N\times R},\ F\in\mathbb{R}^{T\times R}\}} \left\| Y - \beta\cdot X - \Lambda F' \right\|_{HS}^2 . \tag{2.2}$$

The square of the Hilbert-Schmidt norm is simply the sum of the squared elements of the argument matrix, i.e. the QMLE is defined by minimizing the sum of squared residuals, which is equivalent to maximizing the likelihood function for iid normal idiosyncratic errors. The estimator is the quasi MLE since the idiosyncratic errors need not be iid normal and since $R$ might not equal $R^0$. The QMLE for $\beta^0$ can equivalently be defined by minimizing the profile quasi likelihood function, namely

$$\hat\beta_R = \operatorname*{argmin}_{\beta\in\mathbb{R}^K} L_{NT}^R(\beta), \tag{2.3}$$

where

$$L_{NT}^R(\beta) = \min_{\{\Lambda\in\mathbb{R}^{N\times R},\ F\in\mathbb{R}^{T\times R}\}} \frac{1}{NT}\left\| Y - \beta\cdot X - \Lambda F' \right\|_{HS}^2 = \min_{F\in\mathbb{R}^{T\times R}} \frac{1}{NT}\operatorname{Tr}\left[(Y-\beta\cdot X)\, M_F\, (Y-\beta\cdot X)'\right] = \frac{1}{NT}\sum_{t=R+1}^{T} \mu_t\left[(Y-\beta\cdot X)'(Y-\beta\cdot X)\right]. \tag{2.4}$$

Here, we first concentrated out $\Lambda$ by use of its own first order condition. The resulting optimization problem for $F$ is a principal components problem, so that the optimal $F$ is given by the $R$ largest principal components of the $T\times T$ matrix $(Y-\beta\cdot X)'(Y-\beta\cdot X)$. At the optimum the projector $M_F$ therefore exactly projects out the $R$ largest eigenvalues of this matrix, which gives rise to the final formulation of the profile likelihood function as the sum over its $T-R$ smallest eigenvalues.⁷ This last formulation of $L_{NT}^R(\beta)$ is very convenient since it does not involve any explicit optimization over nuisance parameters. Numerical calculation of eigenvalues is very fast, so that the numerical evaluation of $L_{NT}^R(\beta)$ is unproblematic for moderately large values of $T$. The function $L_{NT}^R(\beta)$ is not convex in $\beta$ and might have multiple local minima, which have to be accounted for in the numerical calculation of $\hat\beta_R$. We write $L_{NT}^0(\beta)$ for $L_{NT}^{R^0}(\beta)$, which is the profile likelihood obtained from the true number of factors. In order to show consistency of $\hat\beta_R$ we impose the following assumptions.

⁶ The optimal $\hat\Lambda_R$ and $\hat F_R$ in (2.2) are not unique, since the objective function is invariant under right-multiplication of $\Lambda$ with a non-degenerate $R\times R$ matrix $S$, and simultaneous right-multiplication of $F$ with $(S^{-1})'$. However, the column spaces of $\hat\Lambda_R$ and $\hat F_R$ are uniquely determined.

⁷ Since the model is symmetric under $N\leftrightarrow T$, $\Lambda\leftrightarrow F$, $Y\leftrightarrow Y'$, $X_k\leftrightarrow X_k'$, there also exists a dual formulation of $L_{NT}^R(\beta)$ that involves solving an eigenvalue problem for an $N\times N$ matrix.
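The chain of equalities in (2.4), and the dual formulation mentioned in footnote 7, can be checked numerically. The following is a minimal NumPy sketch, not code from the dissertation: the function names, the storage of the regressors as one array of shape (K, N, T), and the simulated data are illustrative assumptions. It evaluates the eigenvalue form of $L_{NT}^R(\beta)$, recomputes the same value by plugging the $R$ leading principal components into $\operatorname{Tr}[(Y-\beta\cdot X) M_F (Y-\beta\cdot X)']/NT$, and compares both with the $N\times N$ dual.

```python
import numpy as np

def beta_dot_X(beta, X):
    """beta . X = sum_k beta_k X_k, with X stacked as an array of shape (K, N, T)."""
    return np.tensordot(beta, X, axes=1)

def profile_likelihood(beta, Y, X, R):
    """L_NT^R(beta) as in (2.4): (1/NT) times the sum of the T - R smallest
    eigenvalues of the T x T matrix (Y - beta.X)'(Y - beta.X)."""
    N, T = Y.shape
    resid = Y - beta_dot_X(beta, X)
    eig = np.linalg.eigvalsh(resid.T @ resid)  # eigenvalues in ascending order
    return eig[: T - R].sum() / (N * T)

def profile_likelihood_pc(beta, Y, X, R):
    """Middle expression of (2.4): take F as the R leading principal components,
    then evaluate Tr[(Y - beta.X) M_F (Y - beta.X)'] / NT."""
    N, T = Y.shape
    resid = Y - beta_dot_X(beta, X)
    _, vecs = np.linalg.eigh(resid.T @ resid)
    F = vecs[:, T - R:]            # eigenvectors of the R largest eigenvalues
    M_F = np.eye(T) - F @ F.T      # F has orthonormal columns, so F'F = I
    return np.trace(resid @ M_F @ resid.T) / (N * T)

def profile_likelihood_dual(beta, Y, X, R):
    """Dual formulation (footnote 7): sum of the N - R smallest eigenvalues
    of the N x N matrix (Y - beta.X)(Y - beta.X)', divided by NT."""
    N, T = Y.shape
    resid = Y - beta_dot_X(beta, X)
    eig = np.linalg.eigvalsh(resid @ resid.T)
    return eig[: N - R].sum() / (N * T)

# Hypothetical simulated panel: N = 50, T = 20, K = 2 regressors, R0 = 2 factors.
rng = np.random.default_rng(0)
N, T, K, R0 = 50, 20, 2, 2
beta0 = np.array([1.0, -0.5])
X = rng.normal(size=(K, N, T))
lam0 = rng.normal(size=(N, R0))
f0 = rng.normal(size=(T, R0))
Y = beta_dot_X(beta0, X) + lam0 @ f0.T + rng.normal(size=(N, T))

b = np.array([0.8, -0.3])  # an arbitrary trial value of beta
values = [f(b, Y, X, R0) for f in
          (profile_likelihood, profile_likelihood_pc, profile_likelihood_dual)]
print(values)  # the three numbers agree up to rounding error
```

Setting $R = 0$ turns the eigenvalue sum into the full trace, i.e. into $\|Y-\beta\cdot X\|_{HS}^2/NT$, which is a quick sanity check on any implementation.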
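Because $L_{NT}^R(\beta)$ is not convex and may have several local minima, one hedged way to compute $\hat\beta_R$ in (2.3) is to run a local minimizer from several starting values and keep the candidate with the smallest profile likelihood. The sketch below uses, as its local minimizer, an alternating least-squares scheme (principal components for $F$ given $\beta$, OLS for $\beta$ given $F$); this scheme is swapped in for illustration and is not the procedure the text prescribes. The function names, starting values, iteration limits, simulated data, and the choice $R = 3$ with $R^0 = 2$ (to mimic $R \neq R^0$) are likewise assumptions.

```python
import numpy as np

def profile_likelihood(beta, Y, X, R):
    """L_NT^R(beta) from (2.4); X has shape (K, N, T)."""
    N, T = Y.shape
    resid = Y - np.tensordot(beta, X, axes=1)
    return np.linalg.eigvalsh(resid.T @ resid)[: T - R].sum() / (N * T)

def local_fit(beta_start, Y, X, R, max_iter=500, tol=1e-10):
    """Swapped-in alternating least squares from one starting value:
    F <- R leading principal components given beta; beta <- OLS given F.
    Neither step can increase the objective in (2.2)."""
    K, T = X.shape[0], Y.shape[1]
    beta = np.asarray(beta_start, dtype=float)
    for _ in range(max_iter):
        resid = Y - np.tensordot(beta, X, axes=1)
        _, vecs = np.linalg.eigh(resid.T @ resid)
        F = vecs[:, T - R:]
        M_F = np.eye(T) - F @ F.T
        # Regress vec(Y M_F) on vec(X_k M_F), k = 1, ..., K.
        Z = np.column_stack([(X[k] @ M_F).ravel() for k in range(K)])
        beta_new = np.linalg.lstsq(Z, (Y @ M_F).ravel(), rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def qmle_multistart(Y, X, R, starts):
    """Keep the local solution with the smallest profile likelihood."""
    fits = [local_fit(b0, Y, X, R) for b0 in starts]
    vals = [profile_likelihood(b, Y, X, R) for b in fits]
    i = int(np.argmin(vals))
    return fits[i], vals[i]

# Hypothetical simulated panel with R0 = 2 factors, estimated with R = 3.
rng = np.random.default_rng(1)
N, T, K, R0 = 100, 50, 2, 2
beta0 = np.array([1.0, -0.5])
X = rng.normal(size=(K, N, T))
lam0 = rng.normal(size=(N, R0))
f0 = rng.normal(size=(T, R0))
Y = np.tensordot(beta0, X, axes=1) + lam0 @ f0.T + rng.normal(size=(N, T))

starts = [np.zeros(K), np.array([2.0, 2.0]), np.array([-2.0, 1.0])]
beta_hat, L_hat = qmle_multistart(Y, X, R=3, starts=starts)
print(beta_hat, L_hat)
```

The multistart wrapper only protects against poor local minima among the starting values actually tried; it gives no guarantee of locating the global minimum of $L_{NT}^R(\beta)$.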

Full document contains 186 pages
Abstract: This dissertation contributes to the econometrics of panel data models and their application to economic problems. In particular, it considers "large T" panels, where in addition to the cross-sectional dimension N also the number of time periods T is relatively large. Chapter 1 provides an introduction to the field of large T panel data econometrics and explains the contribution of the dissertation to this field. Chapter 2 analyzes linear panel regression models, allowing for unobserved factors (interactive fixed effects) in the error structure of the model. In particular, it is shown that, under appropriate assumptions, the limiting distribution of the Gaussian quasi maximum likelihood estimator for the regression coefficients is independent of the number of factors used in the estimation. The important practical implication of this result is that for inference on the regression coefficients there is no need to estimate the number of factors consistently. Chapter 3 extends the Berry, Levinsohn and Pakes (1995) random coefficients discrete-choice demand model by adding interactive fixed effects to the unobserved product characteristics of this model. The interactive fixed effects can be arbitrarily correlated with the observed product characteristics, which accommodates endogeneity, and they can capture strong persistence in market shares across products and markets. A two step least squares-minimum distance procedure is proposed to estimate the model, and the asymptotic properties of this estimator are derived. This methodology is then applied to the estimation of US automobile demand. Chapter 4 proposes a new approach for higher order bias correction in large T non-linear panel data models that is based on inference on the individual effect distribution. Under appropriate assumptions it is shown that the incidental parameter bias for the estimator of the parameters of interest can converge to zero at an arbitrary polynomial rate in T, i.e. that the incidental parameter problem can vanish very rapidly in this approach as T increases. This has important implications in particular for applications where T is modestly large and N is much larger than T.