# Large N, T asymptotic analysis of panel data models with incidental parameters

## Table of Contents

- Acknowledgments
- List of Tables
- List of Figures
- Abstract
- Chapter 1: Introduction
- Chapter 2: Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects
  - 2.1 Introduction
  - 2.2 Model, QMLE and Consistency
  - 2.3 Asymptotic Profile Likelihood Expansion
    - 2.3.1 When $R = R^0$
    - 2.3.2 When $R > R^0$
  - 2.4 Justification of Assumptions 2.4 and 2.5
  - 2.5 Monte Carlo Simulations
  - 2.6 Conclusions
- Chapter 3: Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects
  - 3.1 Introduction
  - 3.2 Model
  - 3.3 Estimation
    - 3.3.1 Extension: regressor endogeneity with respect to $e_{jt}$
  - 3.4 Consistency and Asymptotic Distribution of $\hat\alpha$ and $\hat\beta$
  - 3.5 Monte Carlo Results
  - 3.6 Empirical application: demand for new automobiles, 1973-1988
  - 3.7 Conclusion
- Chapter 4: Semiparametric Estimation of Nonlinear Panel Data Models with Generalized Random Effects
  - 4.1 Introduction
  - 4.2 Model
  - 4.3 Description of Estimators and Main Results
    - 4.3.1 Sampling Issues (Generalized Random Effect Assumption)
    - 4.3.2 Identification Issues (Smoothness Assumption on $\pi$)
    - 4.3.3 Main Results
  - 4.4 Asymptotic Analysis of the Estimators
    - 4.4.1 Uniform Consistency of $\hat\theta(\pi)$
    - 4.4.2 Score and Hessian of the Integrated Likelihood
    - 4.4.3 Joint Maximum Likelihood Estimation of $\theta$ and $\pi$
  - 4.5 Generalized Random Effects
    - 4.5.1 Imposing an Appropriate Smoothness Assumption
    - 4.5.2 Computation
  - 4.6 Monte Carlo Simulations
  - 4.7 Conclusions
- Bibliography
- Appendix A
  - A.1 Proof of Consistency
  - A.2 Proof of Likelihood Expansion
- Appendix B
  - B.1 Alternative GMM approach
  - B.2 Details for Section 3.4 (Consistency and Asymptotic Distribution)
    - B.2.1 Formulas for Asymptotic Bias Terms
    - B.2.2 Assumptions for Consistency
    - B.2.3 Additional Assumptions for Asymptotic Distribution and Bias Correction
    - B.2.4 Bias and Variance Estimators
  - B.3 Proofs
    - B.3.1 Proof of Consistency
    - B.3.2 Proof of Limiting Distribution
    - B.3.3 Consistency of Bias and Variance Estimators
- Appendix C
  - C.1 Assumptions
    - C.1.1 Assumptions for Consistency
    - C.1.2 Further Regularity Conditions on the Model
  - C.2 Proofs
    - C.2.1 Proofs for Section 4.4.1
    - C.2.2 Proofs for Section 4.4.2
    - C.2.3 Proofs for Section 4.4.3
  - C.3 Further Discussions for Section 4.5
    - C.3.1 Approximating Unknown Distributions
    - C.3.2 Approximate Identification of $\pi(\alpha|x)$

## List of Tables

- 2.1 Simulation results for the bias and standard error (std) of the QMLE $\hat\beta_R$
- 2.2 Simulation results for the quantiles of $\sqrt{NT}\,(\hat\beta_R - \beta^0)$
- 3.1 Simulation results for specification 1 (no heteroscedasticity)
- 3.2 Simulation results for specification 2 (heteroscedasticity in $e^0_{jt}$)
- 3.3 Parameter estimates (and t-values) for automobile demand estimation
- 3.4 Parameter estimates (and t-values) for model specification A and B
- 3.5 Summary statistics for the 23 product-aggregates used in estimation
- 3.6 Estimated price elasticities for specification B in $t = 1988$
- 3.7 Estimated price elasticities for specification C (BLP case) in $t = 1988$
- 4.1 Monte Carlo Results for $T = 12$
- 4.2 Same as Table 4.1, but with $T = 24$ and only for $\sigma_\pi = 0.7$

## List of Figures

- 4.1 For $T = 12$ (left) and $T = 24$ (right) we plot $T^{-1} I^{-1}(\alpha, \theta_0, y_{i0})$
- 4.2 Examples of "basis functions" for $T = 12$
- 4.3 Same as Figure 4.2, but for $T = 24$
- 4.4 True and estimated individual effect distributions
- 4.5 Same as Figure 4.4, but with $T = 24$ and only for $\sigma_\pi = 0.7$
- B.1 Example for multiple local minima in the objective function $L(\beta)$

## Abstract

This dissertation contributes to the econometrics of panel data models and their application to economic problems. In particular, it considers "large T" panels, where in addition to the cross-sectional dimension N also the number of time periods T is relatively large.

Chapter 1 provides an introduction to the field of large T panel data econometrics and explains the contribution of the dissertation to this field.

Chapter 2 analyzes linear panel regression models, allowing for unobserved factors (interactive fixed effects) in the error structure of the model. In particular, it is shown that, under appropriate assumptions, the limiting distribution of the Gaussian quasi maximum likelihood estimator for the regression coefficients is independent of the number of factors used in the estimation. The important practical implication of this result is that for inference on the regression coefficients there is no need to estimate the number of factors consistently.

Chapter 3 extends the Berry, Levinsohn and Pakes (1995) random coefficients discrete-choice demand model by adding interactive fixed effects to the unobserved product characteristics of this model. The interactive fixed effects can be arbitrarily correlated with the observed product characteristics, which accommodates endogeneity, and they can capture strong persistence in market shares across products and markets. A two step least squares-minimum distance procedure is proposed to estimate the model, and the asymptotic properties of this estimator are derived. This methodology is then applied to the estimation of US automobile demand.

Chapter 4 proposes a new approach for higher order bias correction in large T non-linear panel data models that is based on inference on the individual effect distribution. Under appropriate assumptions it is shown that the incidental parameter bias for the estimator of the parameters of interest can converge to zero at an arbitrary polynomial rate in T, i.e. that the incidental parameter problem can vanish very rapidly in this approach as T increases. This has important implications in particular for applications where T is modestly large and N is much larger than T.

## Chapter 1: Introduction

This thesis is concerned with the statistical analysis of some particular panel data models and their application to economic problems. A panel data set consists of observations on many cross-sectional units (individuals, products, firms, countries, etc.) in multiple time periods. The information contained in such a data set is generally much richer than in a pure cross-sectional data set or in a pure time-series data set. In particular, the availability of panel data makes it possible to differentiate the heterogeneous influences and characteristics that are unique to each cross-sectional unit from the structural relations or trends that are common to all cross-sectional units.

Several textbooks have been written on the subject of panel data (e.g. Hsiao (2003), Arellano (2003), Wooldridge (2010)), and it continues to be a very active research area. The goal of panel data analysis typically is to estimate and do inference on a few parameters of interest (regression coefficients, marginal effects, etc.), while at the same time controlling for unobserved heterogeneity, which is often modeled by unobserved individual effects. There are well-established techniques for handling the unobserved heterogeneity in purely linear panel data models, but this issue is still a serious econometric challenge for many non-linear models. One of the difficulties in non-linear models is that the parameters of interest may not be uniquely identified from the data in the presence of the unobserved individual effects. Depending on how this potential identification problem is handled, and on which asymptotic theory is considered, one can broadly classify the existing panel data literature as follows.

Firstly, there is the "classic" panel data literature that considers point identification and point estimation under an asymptotic where the number of time periods T remains constant, while the cross-sectional size N goes to infinity. Obtaining a consistent point estimator for the parameters of interest at fixed T is desirable and can indeed be achieved for some non-linear models (e.g. Rasch (1960), Andersen (1970), Chamberlain (1984), Hausman, Hall and Griliches (1984), Manski (1987), Honoré (1992), Horowitz and Lee (2004), Bonhomme (2010)). However, at fixed T a non-linear panel data model may not be point identified, or may not possess a $\sqrt{N}$-consistent estimator, as discussed by Chamberlain (2010) for the static binary choice model. Furthermore, an incidental parameter problem (Neyman and Scott (1948), see e.g. Lancaster (2000) for a review) usually appears in fixed T estimation of non-linear panel data models, since the number of incidental parameters (individual effects) grows with the sample size. Resolving this problem usually requires a model specific augmentation of standard estimation procedures like maximum likelihood. We refer e.g. to Chamberlain (1984), Arellano and Honoré (2001), and the above mentioned textbooks for reviews of this branch of the literature.

Secondly, there is the "large T" panel data literature, which includes e.g. Phillips and Moon (1999), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Hahn and Kuersteiner (2004), Hahn and Newey (2004), Carro (2007), Arellano and Bonhomme (2009), Fernández-Val (2009), Bester and Hansen (2009), Bai (2009b), Dhaene and Jochmans (2010); a review is provided by Arellano and Hahn (2007). This literature considers an asymptotic where both panel dimensions N and T go to infinity. This large T asymptotic guarantees point identification of a very large class of models under weak regularity conditions, and provides an asymptotic solution to the incidental parameter problem. Namely, the (maximum likelihood) estimator for the parameters of interest is shown to have a bias of order 1/T, which thus vanishes asymptotically, and bias correction techniques are discussed that further improve the rate at which the bias converges to zero.
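The order 1/T bias can be made concrete with the textbook example of Neyman and Scott (1948), in which $y_{it} = \alpha_i + \varepsilon_{it}$ with $\varepsilon_{it} \sim N(0, \sigma^2)$. The fixed effect maximum likelihood estimator of $\sigma^2$ has expectation $\sigma^2 (T-1)/T$, i.e. a bias of $-\sigma^2/T$ that does not shrink as N grows. The following minimal sketch is an illustration that is not taken from the thesis; the function name and the simulation design are assumptions made for this example.

```python
import numpy as np


def neyman_scott_mle_variance(N, T, sigma2=1.0, seed=0):
    """Fixed effect MLE of sigma^2 in y_it = alpha_i + eps_it (illustrative design)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(N, 1))                    # individual effects (incidental parameters)
    y = alpha + np.sqrt(sigma2) * rng.normal(size=(N, T))
    resid = y - y.mean(axis=1, keepdims=True)          # alpha_i is estimated by the unit mean
    return (resid ** 2).sum() / (N * T)                # MLE divides by NT, not by N(T - 1)


# Large N does not remove the bias; only larger T does.
for T in (5, 20):
    estimate = neyman_scott_mle_variance(N=20000, T=T)
    print(T, estimate, "theoretical expectation:", (T - 1) / T)
```

With N very large the sampling noise is negligible, so the estimate sits close to $(T-1)/T$ rather than to the true value 1, which is the situation described above: the bias vanishes only as T grows.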
Finally, there are a number of papers that acknowledge the fact that many non-linear panel data models are not point identified at fixed T and consequently discuss set identification (bound analysis) for the parameters of interest or for certain policy parameters like marginal effects. These include e.g. Chernozhukov, Hahn and Newey (2005), Honoré and Tamer (2006), Chernozhukov, Fernández-Val, Hahn and Newey (2009a) and Chernozhukov, Fernández-Val and Newey (2009b).

These three estimation approaches for non-linear panel data models should be viewed as complements rather than substitutes. If for fixed T a ($\sqrt{N}$-) consistent estimator is available for the particular model under consideration, then it probably should be used. If this is not the case, then the model may well not be point identified at fixed T, and in particular for small values of T one needs to consider inference using bound analysis. However, the above cited papers on set identification all point out that the bounds can be very tight and shrink rather rapidly as T grows. Thus, if T is sufficiently large one can safely ignore the fact that the model might only be set-identified and simply use the large T estimation methodology.

Each of the following three chapters analyzes a particular class of panel data models with distinct motivations and different potential applications. The common theme of all three chapters is that they consider cases where both panel dimensions N and T are relatively large, i.e. they contribute to the branch of "large T" panel data literature mentioned above. All three chapters consider an asymptotic where both N and T go to infinity, which, as noted above, is theoretically appealing since it overcomes the identification problem and provides an approximate solution to the incidental parameter problem.
In addition, this asymptotic is motivated by the fact that many panel data sets that are available nowadays indeed have relatively large N and T. This applies to microeconomic surveys (e.g. the Panel Study of Income Dynamics or the National Longitudinal Survey of Youth), to macroeconomic datasets (e.g. OECD, Eurostat and UNESCO provide data on many countries over multiple decades), to financial datasets (e.g. the Center for Research in Security Prices and COMPUSTAT provide data on prices, earnings, ratings, etc. of thousands of companies over many decades), and to other areas.

Chapter 2 considers a linear panel regression model with interactive fixed effects. While the model has a linear regression specification, it is non-linear in the specification of the unobserved error structure, which contains (multiple) individual effects and time effects that interact multiplicatively. We refer to this multiplicative specification as a factor structure or an interactive effect, and both the individual effects and the time effects are estimated as parameters, i.e. treated as fixed effects. We show how to analyze the fixed effect Gaussian quasi maximum likelihood estimator (QMLE) of this model under the $N, T \to \infty$ asymptotic. In particular, under appropriate assumptions we show that the limiting distribution of the QMLE for the regression coefficients is independent of the number of interactive fixed effects used in the estimation, as long as this number does not fall below the true number of interactive fixed effects present in the data. The important practical implication of this result is that for inference on the regression coefficients one does not need to estimate the number of interactive effects consistently, but can simply rely on any known upper bound of this number to calculate the QMLE. Further details and references are provided in the introduction of the chapter itself. The research of this part of the thesis also entered into Moon and Weidner (2010b), which is a joint paper with Hyungsik Roger Moon.
In Chapter 3 we extend the Berry, Levinsohn and Pakes (BLP, 1995) random coefficients discrete-choice demand model, which underlies much recent empirical work in industrial organization. In the context of demand estimation the panel structure is given by observing market shares and characteristics of multiple products (the cross-sectional units) in different markets (e.g. over time). Analogous to the linear regression model in Chapter 2 we add interactive fixed effects in the form of a factor structure to the unobserved product characteristics, which is the structural error of the BLP demand model. The interactive fixed effects can be arbitrarily correlated with the observed product characteristics (including price), which accommodates endogeneity and, at the same time, captures strong persistence in market shares across products and markets. We propose a two step least squares-minimum distance procedure to calculate the estimator. This estimator is easy to compute, Monte Carlo simulations show that it performs well, and we discuss an empirical application to US automobile demand. Further details and references are again provided in the chapter itself. This chapter of the thesis also gave rise to a joint paper with Hyungsik Roger Moon and Matthew Shum (Moon, Shum and Weidner (2010)).

Both Chapter 2 and Chapter 3 introduce interactive fixed effects into the error structure of the respective model, i.e. there are individual effects (factor loadings) and time effects (factors) that are both estimated as parameters. Under the asymptotic where both panel dimensions become large one therefore faces an incidental parameter problem not only in the cross-sectional dimension, but also in the time dimension. This turns out to give rise to a bias of order 1/N (or 1/J, in the notation of Chapter 3) in the estimator for the parameters of interest, in addition to the "standard" cross-sectional incidental parameter bias of order 1/T. The chapters show how to derive the magnitude of these biases asymptotically, which then also allows for bias correction. Regarding bias correction in linear factor regression models we also refer to Bai (2009b) and Moon and Weidner (2010a). Bias correction in other non-linear large dimensional panel data models with individual and time effects is discussed in Fernández-Val and Weidner (2010).

Chapter 4 of the thesis does not consider time effects, but studies relatively general non-linear panel data models with individual effects when both N and T become large.
As mentioned above, it is well-known that the fixed effect estimation approach in these models typically results in an incidental parameter bias of order 1/T. We show that under appropriate assumptions this incidental parameter bias can be substantially reduced if, instead of using a fixed effect approach, one estimates the distribution of the individual effects jointly with the parameter of interest by maximum likelihood, thereby treating the individual effect distribution non-parametrically. The convergence rate of the incidental parameter bias in this approach is shown to be limited only by the smoothness properties of the true individual effect distribution. To allow inference on this distribution we make a "generalized random effect" assumption, which requires the cross-sectional units to be partitioned into groups and imposes a random effect assumption in each group. In Monte Carlo simulations we consider the dynamic binary choice model, and we find the finite sample properties of our estimator to be in accordance with the asymptotic results.

The results on higher order bias correction in this chapter are particularly important in applications where T is only modestly large, while N is much larger than T, which is the case in many microeconomic panel surveys. For very large values of N bias correction becomes particularly important, since the standard error of the estimator of the parameters of interest is very small, so that even a small remaining bias may easily dominate the properties of the estimator. For further details and references we again refer to the introduction of the chapter itself. This chapter was the basis for my job-market paper (Weidner (2011)).

## Chapter 2: Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects

### 2.1 Introduction

Panel data models typically incorporate individual and time effects to control for heterogeneity in cross-section and across time-periods. While often these individual and time effects enter the model additively, they can also be interacted multiplicatively, thus giving rise to so called interactive effects, which we also refer to as a factor structure. The multiplicative form captures the heterogeneity in the data more flexibly, since it allows for common time-varying shocks (factors) to affect the cross-sectional units with individual specific sensitivities (factor loadings). It is this flexibility that motivated the discussion of interactive effects in the econometrics literature, e.g. Holtz-Eakin, Newey and Rosen (1988), Ahn, Lee and Schmidt (2001; 2007), Pesaran (2006), Bai (2009b; 2009a), Zaffaroni (2009), Moon and Weidner (2010a).

Analogous to the analysis of individual specific effects, one can either choose to model the interactive effects as random (random effects/correlated effects) or as fixed (fixed effects), with each option having its specific merits and drawbacks that have to be weighed in each empirical application separately. In this chapter, we consider the interactive fixed effect specification, i.e. we treat the interactive effects as nuisance parameters, which are estimated jointly with the parameters of interest.[^1] The advantages of the fixed effects approach are for instance that it is semi-parametric, since no assumption on the distribution of the interactive effects needs to be made, and that the regressors can be arbitrarily correlated with the interactive effect parameters.

Let $R^0$ be the true number of interactive effects (number of factors) in the data, and let $R$ be the number of interactive effects used by the econometrician in the data analysis. A key restriction in the existing literature on interactive fixed effects is that $R^0$ is assumed to be known,[^2] i.e. $R = R^0$. This is true both for the quasi-differencing analysis in Holtz-Eakin, Newey and Rosen (1988)[^3] and for the least squares analysis of Bai (2009b). Assuming $R^0$ to be known could be quite restrictive, since in many empirical applications there is no consensus about the exact number of factors in the data or in the relevant economic model, so that an estimator which is not robust towards some degree of mis-specification of $R^0$ should not be used. The goal of the present chapter is to overcome this problem.
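To see how the factor structure described at the start of this section nests the familiar additive specification, consider the following small worked illustration. It is not part of the original text, and the symbols $\alpha_i$ (additive individual effect) and $\gamma_t$ (additive time effect) are notation assumed only for this example. Writing $\lambda_i$ and $f_t$ for the loadings of unit $i$ and the factors in period $t$, two interactive effects suffice:

```latex
\lambda_i = \begin{pmatrix} \alpha_i \\ 1 \end{pmatrix}, \qquad
f_t = \begin{pmatrix} 1 \\ \gamma_t \end{pmatrix}
\quad\Longrightarrow\quad
\lambda_i' f_t = \alpha_i + \gamma_t .
```

Additive individual and time effects therefore correspond to a special case with two factors in which one factor and one loading are restricted to be constant.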
For a linear panel regression model with interactive fixed effects we consider the Gaussian quasi maximum likelihood estimator (QMLE),[^4] which jointly minimizes the sum of squared residuals over the regression parameters and the interactive fixed effects parameters (see Kiefer (1980), Bai (2009b), and Moon and Weidner (2010a)). We employ an asymptotic where both the cross-sectional and the time-serial dimension become large, while the number of interactive effects $R^0$ (and also $R$) is constant.

[^1]: Note that Ahn, Lee and Schmidt (2001; 2007) take a hybrid approach in that they treat the factors as non-random, but the factor loadings as random. The common correlated effects estimator of Pesaran (2006) was introduced in a context where both the factor loadings and the factors follow certain probability laws, but it also exhibits some properties of a fixed effects estimator. When we refer to interactive fixed effects we mean that both factors and factor loadings are treated as non-random parameters.

[^2]: In the literature, consistent estimation procedures for $R^0$ are established only for pure factor models, not for the model with regressors.

[^3]: Holtz-Eakin, Newey and Rosen (1988) assume just one interactive effect, but their approach could easily be generalized to multiple interactive effects, as long as their number is known.

[^4]: The QMLE is sometimes called concentrated least squares estimator in the literature.

The main finding of the chapter is that under appropriate assumptions the QMLE of the regression parameters has the same limiting distribution for all $R \geq R^0$. Thus, the QMLE is robust towards inclusion of extra interactive effects in the model, and within the QMLE framework there is no asymptotic efficiency loss from choosing $R$ larger than $R^0$. This result is surprising because the conjecture in the literature is that the QMLE with $R > R^0$ might be consistent but could be less efficient than the QMLE with $R^0$ (e.g., see Bai (2009b)).[^5] The important empirical implication of our result is that, as long as a valid upper bound on the number of factors is known, one can use this upper bound to construct the QMLE, and need not worry about consistent estimation of the number of factors. Since the limiting distribution of the QMLE with $R > R^0$ is identical to the one with $R = R^0$, the results of Bai (2009b) and Moon and Weidner (2010a) regarding inference on the regression parameters become applicable.

In order to derive the asymptotic theory of the QMLE with $R \geq R^0$ we study the properties of the profile likelihood function, which is the quasi likelihood function after concentrating out the interactive fixed effect parameters. Concretely, we derive an approximate quadratic expansion of this profile likelihood in the regression parameters.
This expansion is difficult to perform, since concentrating out the interactive fixed effects results in an eigenvalue problem in the formulation of the profile likelihood. For $R = R^0$ we show how to overcome this difficulty by performing a joint expansion of the profile likelihood in the regression parameters and in the idiosyncratic error terms. Using the perturbation theory of linear operators we prove that the profile quasi likelihood function is analytic in a neighborhood of the true parameter, and we obtain explicit formulas for the expansion coefficients, in particular analytic expressions for the approximated score and the approximated Hessian for $R = R^0$.

[^5]: For $R < R^0$ the QMLE could be inconsistent, since then there are interactive fixed effects in the residuals of the model which can be correlated with the regressors but are not controlled for in the estimation.

To generalize the result to $R > R^0$ we then show that the difference between the profile likelihood for $R = R^0$ and for $R > R^0$ is just a constant term plus a term whose dependence on the regression parameters is sufficiently small to be irrelevant for the asymptotic distribution of the QMLE. Due to the eigenvalue problem in the likelihood function, the derivation of this last result requires some very specific knowledge about the eigenvectors and eigenvalues of the random covariance matrix of the idiosyncratic error matrix. We provide high-level assumptions under which the results hold, and we show that these high-level assumptions are satisfied when the idiosyncratic errors of the model are independent and identically normally distributed. As we explain in Section 2.4, the justification of our high-level assumptions for more general distributions of the idiosyncratic errors requires some further progress in the Random Matrix Theory of real random covariance matrices, both regarding the properties of their eigenvalues and of their eigenvectors (see Bai (1999) for a review of this literature).

The chapter is organized as follows. In Section 2.2 we introduce the interactive fixed effect model, its Gaussian quasi likelihood function, and the corresponding QMLE, and also discuss consistency of the QMLE. The asymptotic profile likelihood expansion is derived in Section 2.3. Section 2.4 provides a justification for the high-level assumptions that we impose, and discusses the relation of these assumptions to the random matrix theory literature. Monte Carlo results which illustrate the validity of our conclusions in finite samples are presented in Section 2.5, and the conclusions of the chapter are drawn in Section 2.6.
A few words on notation. The transpose of a matrix $A$ is denoted by $A'$. For a column vector $v$ its Euclidean norm is defined by $\|v\| = \sqrt{v'v}$. For the $n$-th largest eigenvalue (counting multiple eigenvalues multiple times) of a symmetric matrix $B$ we write $\mu_n(B)$. For an $m \times n$ matrix $A$ the Frobenius or Hilbert-Schmidt norm is $\|A\|_{HS} = \sqrt{\operatorname{Tr}(AA')}$, and the operator or spectral norm is $\|A\| = \max_{0 \neq v \in \mathbb{R}^n} \frac{\|Av\|}{\|v\|}$, or equivalently $\|A\| = \sqrt{\mu_1(A'A)}$. Furthermore, we use $P_A = A(A'A)^{-1}A'$ and $M_A = \mathbb{1} - A(A'A)^{-1}A'$, where $\mathbb{1}$ is the $m \times m$ identity matrix, and $(A'A)^{-1}$ denotes some generalized inverse if $A$ is not of full column rank. For square matrices $B$, $C$, we use $B > C$ (or $B \geq C$) to indicate that $B - C$ is positive (semi) definite. We use "wpa1" for "with probability approaching one", and $A =_d B$ to indicate that the random variables $A$ and $B$ have the same probability distribution.

### 2.2 Model, QMLE and Consistency

A linear panel regression model with cross-sectional dimension $N$, time-serial dimension $T$, and interactive fixed effects of dimension $R^0$, is given by

$$
Y = \sum_{k=1}^{K} \beta^0_k X_k + \varepsilon, \qquad \varepsilon = \lambda^0 f^{0\prime} + e, \tag{2.1}
$$

where $Y$, $X_k$, $\varepsilon$ and $e$ are $N \times T$ matrices, $\lambda^0$ is an $N \times R^0$ matrix, $f^0$ is a $T \times R^0$ matrix, and the regression parameters $\beta^0_k$ are scalars; the superscript zero indicates the true value of the parameters. We write $\beta$ for the $K$-vector of regression parameters, and introduce the notation $\beta \cdot X \equiv \sum_{k=1}^{K} \beta_k X_k$. All matrices, vectors and scalars in this chapter are real valued. A choice for the number of interactive effects $R$ used in the estimation needs to be made, and we may have $R \neq R^0$ since the true number of factors $R^0$ may not be known accurately. Given the choice $R$, the quasi maximum likelihood estimator (QMLE) for the parameters $\beta^0$, $\lambda^0$ and $f^0$ is given by[^6]

$$
\left( \hat\beta_R, \hat\Lambda_R, \hat F_R \right) = \operatorname*{argmin}_{\{\beta \in \mathbb{R}^K,\, \Lambda \in \mathbb{R}^{N \times R},\, F \in \mathbb{R}^{T \times R}\}} \left\| Y - \beta \cdot X - \Lambda F' \right\|_{HS}^2 . \tag{2.2}
$$

The square of the Hilbert-Schmidt norm is simply the sum of the squared elements of the argument matrix, i.e. the QMLE is defined by minimizing the sum of squared residuals, which is equivalent to maximizing the likelihood function for iid normal idiosyncratic errors. The estimator is the quasi MLE since the idiosyncratic errors need not be iid normal and since $R$ might not equal $R^0$.

[^6]: The optimal $\hat\Lambda_R$ and $\hat F_R$ in (2.2) are not unique, since the objective function is invariant under right-multiplication of $\Lambda$ with a non-degenerate $R \times R$ matrix $S$, and simultaneous right-multiplication of $F$ with $(S^{-1})'$. However, the column spaces of $\hat\Lambda_R$ and $\hat F_R$ are uniquely determined.

The QMLE for $\beta^0$ can equivalently be defined by minimizing the profile quasi likelihood function (with the sign convention used here, the concentrated sum of squared residuals divided by $NT$), namely

$$
\hat\beta_R = \operatorname*{argmin}_{\beta \in \mathbb{R}^K} L^R_{NT}(\beta), \tag{2.3}
$$

where

$$
\begin{aligned}
L^R_{NT}(\beta) &= \min_{\{\Lambda \in \mathbb{R}^{N \times R},\, F \in \mathbb{R}^{T \times R}\}} \frac{1}{NT} \left\| Y - \beta \cdot X - \Lambda F' \right\|_{HS}^2 \\
&= \min_{F \in \mathbb{R}^{T \times R}} \frac{1}{NT} \operatorname{Tr}\left[ (Y - \beta \cdot X)\, M_F\, (Y - \beta \cdot X)' \right] \\
&= \frac{1}{NT} \sum_{t=R+1}^{T} \mu_t\left[ (Y - \beta \cdot X)' (Y - \beta \cdot X) \right].
\end{aligned}
\tag{2.4}
$$

Here, we first concentrated out $\Lambda$ by use of its own first order condition. The resulting optimization problem for $F$ is a principal components problem, so that the optimal $F$ is given by the $R$ largest principal components of the $T \times T$ matrix $(Y - \beta \cdot X)'(Y - \beta \cdot X)$. At the optimum the projector $M_F$ therefore exactly projects out the eigenvectors corresponding to the $R$ largest eigenvalues of this matrix, which gives rise to the final formulation of the profile likelihood function as the sum over its $T - R$ smallest eigenvalues.[^7] This last formulation of $L^R_{NT}(\beta)$ is very convenient since it does not involve any explicit optimization over nuisance parameters. Numerical calculation of eigenvalues is very fast, so that the numerical evaluation of $L^R_{NT}(\beta)$ is unproblematic for moderately large values of $T$. The function $L^R_{NT}(\beta)$ is not convex in $\beta$ and might have multiple local minima, which have to be accounted for in the numerical calculation of $\hat\beta_R$. We write $L^0_{NT}(\beta)$ for $L^{R^0}_{NT}(\beta)$, which is the profile likelihood obtained for the true number of factors.

[^7]: Since the model is symmetric under $N \leftrightarrow T$, $\Lambda \leftrightarrow F$, $Y \leftrightarrow Y'$, $X_k \leftrightarrow X_k'$ there also exists a dual formulation of $L^R_{NT}(\beta)$ that involves solving an eigenvalue problem for an $N \times N$ matrix.

In order to show consistency of $\hat\beta_R$ we impose the following assumptions.
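As a numerical aside on the eigenvalue formulation in (2.4), the profile objective can be evaluated with a single symmetric eigenvalue decomposition. The sketch below is only an illustration under stated assumptions: the function names, the array layout (regressors stacked as a `(K, N, T)` array), the simulated design with one regressor correlated with the factor structure, and the grid search over a scalar $\beta$ are all choices made for this example and are not taken from the thesis. The grid search stands in for a proper optimizer because the objective is not convex in $\beta$.

```python
import numpy as np


def profile_objective(beta, Y, X, R):
    """Profile objective from (2.4): Y is (N, T), X is (K, N, T), beta is (K,)."""
    N, T = Y.shape
    resid = Y - np.tensordot(beta, X, axes=1)        # Y - beta . X, shape (N, T)
    eigvals = np.linalg.eigvalsh(resid.T @ resid)    # eigenvalues in ascending order
    return eigvals[: T - R].sum() / (N * T)          # sum of the T - R smallest eigenvalues


def simulate_panel(N, T, beta0, R0, seed=0):
    """Hypothetical design: one regressor that is correlated with the interactive effects."""
    rng = np.random.default_rng(seed)
    lam = rng.normal(size=(N, R0))                   # factor loadings
    f = rng.normal(size=(T, R0))                     # factors
    common = lam @ f.T                               # interactive effects lambda f'
    X = (rng.normal(size=(N, T)) + common)[None]     # shape (1, N, T)
    Y = beta0 * X[0] + common + rng.normal(size=(N, T))
    return Y, X


def grid_qmle(Y, X, R, grid):
    """Minimize the profile objective over a grid of scalar beta values (illustration only)."""
    values = [profile_objective(np.array([b]), Y, X, R) for b in grid]
    return grid[int(np.argmin(values))]


Y, X = simulate_panel(N=60, T=60, beta0=1.0, R0=1)
grid = np.linspace(0.0, 2.0, 201)
estimates = {R: grid_qmle(Y, X, R, grid) for R in (0, 1, 2)}
print(estimates)
```

In this design the regressor is correlated with the factor structure, so the choice `R = 0` (plain least squares) should be visibly biased, while `R = 1` and `R = 2` both control for the single true factor. By the Eckart-Young theorem the value returned by `profile_objective` also equals the sum of the squared singular values of the residual matrix beyond the first `R`, divided by `NT`, which gives a simple cross-check.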