# Parameter Estimation in Social Network Models

Contents List of Tables vii List of Figures viii 1 Introduction 1 1.1 Motivation:social network models....................1 1.1.1 Why model networks?......................7 1.2 Exponential random graph models...................7 1.2.1 Intractable normalizing constant.................11 1.2.2 Covariate data..........................11 1.3 Examples of ERGMs...........................12 1.3.1 Erd˝os-R´enyi-Gilbert model....................13 1.3.2 The p 1 model...........................14 1.3.3 Markov graph model.......................16 1.4 ERGM parameter estimation......................17 1.4.1 Maximum pseudolikelihood method...............17 1.4.2 Newton-Raphson.........................19 1.4.3 Markov chain Monte Carlo maximum likelihood........20 1.4.4 Steplength MCMC-MLE.....................25 1.4.5 Stochastic approximation.....................25 1.4.6 Summary of parameter estimation methods..........27 iii

Contents iv 1.5 Non-existent MLEs in exponential families...............28 1.6 Research overview.............................32 1.6.1 Long range search algorithm overview..............33 2 Background Theory 37 2.1 Exponential family theory........................38 2.2 Convex analysis..............................39 2.3 Mean value parameterization.......................41 2.4 MLE existence in exponential families..................42 2.4.1 Approach of Barndorﬀ-Nielsen (1978)..............42 2.4.2 Approach of Geyer (2009a)....................42 2.4.3 Limiting conditional model....................49 2.4.4 Generic direction of recession..................55 2.5 One-sided conﬁdence interval......................59 3 Algorithm 61 3.1 Long range search algorithm.......................61 3.2 Reﬁnements of the algorithm.......................71 3.3 Search directions.............................71 3.4 Step size..................................74 3.5 MCMC approximations..........................75 3.6 Combining with other algorithms....................77 4 Computational Geometry 79 4.1 Algorithm pseudocode extension.....................80 4.2 Convex polytope representation.....................83 4.2.1 V-representation.........................83 4.2.2 H-representation.........................84

Contents v 4.2.3 V-rep to H-rep and back.....................85 4.2.4 Finding the convex hull.....................85 4.3 Point in exterior,interior,and boundary of a convex hull.......91 4.3.1 Using H-representation of convex hull..............92 4.3.2 Strongly separating hyperplane.................93 4.4 Linearity..................................95 4.4.1 The linearity function.....................96 4.4.2 Finding an empirical face.....................99 4.5 Normal and tangent cones........................99 4.5.1 Boundary point,one-dimensional face..............100 4.5.2 Boundary point,zero-dimensional face.............102 4.6 Calculating a GDOR...........................102 4.6.1 A GDOR when g(y obs ) on one-dimensional face........103 4.6.2 A GDOR when g(y obs ) on zero-dimensional face........104 4.6.3 Empirical GDORs........................104 4.7 Subsampling................................105 5 Examples 107 5.1 Example:logistic regression.......................108 5.2 Example:Ising model...........................111 5.3 Example:Sampson’s monastery data..................117 5.4 Example:Faux Magnolia High School..................118 5.5 Example:9-node network........................122 5.5.1 Two-dimensional suﬃcient statistic...............125 5.5.2 Case:MLE exists.........................126 5.5.3 Case:MLE does not exist;observed statistic in one-dimensional face................................126

Contents vi 5.5.4 Case:MLEdoes not exist;observed statistic in zero-dimensional face................................132 5.5.5 Case:MLE exists but observed statistic is very near boundary 133 6 Discussion 139 6.1 Areas for further research........................141 6.1.1 Convergence............................141 6.1.2 Step size search..........................141 6.1.3 Switch criteria...........................141 6.1.4 Monte Carlo standard errors...................142 References 143 A Brute Force Network Statistic Counting 152

List of Tables 1.1 Sample space size for undirected networks with diﬀerent numbers of actors....................................12 1.2 Comparison of exact parameter estimation methods..........27 1.3 Comparison of MCMC parameter estimation methods.........28 5.1 Comparison of MLEs for β in logistic regression example.......110 5.2 Comparison of step sizes for SAand our algorithmfor logistic regression example..................................111 5.3 Comparison of MLEs for η in Ising model example...........114 5.4 Comparisons of step sizes for SA and our algorithm for Ising model example..................................115 5.5 Long range algorithm applied to Sampson’s monastery data.....118 5.6 Comparison of MLEs for η for MCMC-MLE and our algorithmin Faux Magnolia High example..........................120 5.7 Comparison of probabilities assigned by

LCMand original MLE model to an empirical face of a 9-node network................137 vii

List of Figures 1.1 Padgett’s (1994) Florentine marriage network.............2 1.2 Sampson’s (1968) monastery aﬃnity network..............3 1.3 E.coli transcriptional regulation network of Shen-Orr et al.(2002).6 1.4 Log likelihood ratio approximations in Sampson’s monastery example 24 1.5 Sample space of two-dimensional statistic for 9-node network model.30 1.6 Acceptable region for step size α along search direction p according to modiﬁed curvature condition.......................35 2.1 Log likelihood convergence to LCM...................54 3.1 Acceptable region for α according to curvature condition (3.3) when restricting to direction p k .........................64 3.2 Contour plots of ERGM log likelihood when MLE does not exist...73 4.1 Convex hulls for two separate samples..................91 5.1 A realized sample from an Ising model on a 32 ×32 lattice with η at phase transition value...........................112 5.2 Histogram of the η2 component for 1000 MLEs for an Ising model on a 32 ×32 lattice with η at phase transition value............116 5.3 Faux Magnolia High friendship network.................119 5.4 Network statistics for 10,000 Monte Carlo samples when MLE exists.127 viii

List of Figures ix 5.5 Network statistics for 10,000 Monte Carlo samples when MLE does not exist....................................130 5.6 Network statistics for 10,000 MC samples when observed statistic is very close to a boundary.........................136

Chapter 1 Introduction 1.1 Motivation:social network models Is it possible to build a statistical model that captures the behavioral tendencies of individuals in how they form relationships? This is the question that led us to study parameter estimation methods for expo- nential families,with a particular interest in models used to describe social network data.Formally,a social network is the collection of actors and the relations,or ties, between each pair of actors.Social scientists have studied social networks as a disci- pline since the 1930s when Jacob L.Moreno introduced the sociogram,a diagramthat corresponds to the mathematical graph with individuals in a group represented by nodes and the presence of a relation between pairs of individuals by an edge (Wasser- man and Faust,1994,Chapter 3).A sociogram depicting the marriage network data among sixteen important families in Renaissance Florence (Padgett,1994) is depicted in Figure 1.1 and another depicting the aﬃnity,or “liking” relation,among 18 monks in a monastery in New England in the late 1960s (Sampson,1968) is depicted in Figure 1.2.In such settings,a social scientist is often interested in understanding whether relations arise out of friendliness or a strategy for alliance building,that is, 1

1.1.Motivation:social network models 2 driven by actor-speciﬁc attributes or by the structure of the relations themselves. Acciaiuoli Albizzi Barbadori Bischeri Castellani Ginori Guadagni Lamberteschi Medici Pazzi Peruzzi Pucci Ridolfi Salviati Strozzi Tornabuoni Figure 1.1:Padgett’s (1994) Florentine marriage network.Padgett recorded the mar- riage network among 16 Florentine families around 1430.At the time,two factions, one revolving around the Medicis and the other around the Strozzis,vied for political control of the city.Data is available through and plotted using the ergm package (Handcock,Hunter,Butts,Goodreau,and Morris,2010) in R (R Development Core Team,2010). Stochastic network models were developed as early as 1959 in the seminal works of Gilbert (1959) and Erd¨os and R´enyi (1959),resulting in a simple probabilistic model that is now referred to as the Bernoulli model or the Erd¨os-R´enyi-Gilbert

1.1.Motivation:social network models 3 Figure 1.2:Sampson’s (1968) monastery aﬃnity network.Sampson collected data to deﬁne a “liking” network among 18 monks in a New England monastery in the late 1960s.Since “liking” is a directional relation,the presence of a relation is depicted by an arrow rather than a line.Data is available through and plotted using the ergm package (Handcock et al.,2010) in R. model.Over the last forty years,more sophisticated stochastic network models have ﬂourished;notable landmarks include the p 1 model of Holland and Leinhardt (1981) which captures the reciprocal tendency of relationships in directed networks,and the p ∗ model of Frank and Strauss (1986) which describes the transitive tendency of relationships in undirected networks.These models laid the foundation for the more general exponential random graph models (ERGM),a class of exponential family models that are now routinely used to model network data and are presented in detail

1.1.Motivation:social network models 4 in Section 1.3. 1 At their core,stochastic network models attempt to describe whether or not a tie forms between each pair of actors in a group.It has long been observed that relations, especially those involving humans,do not form in isolation;rather,they form in an interdependent manner.Whether or not a relation forms between individuals A and B may very well depend on whether or not relations form between B and C and C and A.This has profound implications for the social scientist,necessitating a new “network” perspective that recognizes the relational structures,such as a triangle of relations between actors A,B,and C,as factors in the analysis (Wasserman and Faust,1994,Chapter 1).This perspective has garnered increasing attention across diﬀerent social and behavioral science disciplines over the last forty years,as evidenced by the rapid growth in publication of social network related papers (Knoke and Yang, 2008,Chapter 1). For the statistician,the network perspective means that the model should not break the network down into independent components,each containing two actors. Instead,a good model will consider important relational structures in the observed network and clarify their contribution in shaping the global outcome.Accompanying the model must also be a computational algorithm to calibrate the model parameters to the network data.In fact,writing down the expression for an uncalibrated ERGM turns out to be quite straightforward;it is ﬁtting the model to data that is an open research problem and the focus of this dissertation. Typically,parameter estimation for statistical models is done through the method of maximum likelihood in which values for the parameters called maximum likelihood estimators (MLE) that index the model to make the observed data most likely are calculated.Many methodologies have been developed over the years to address the dif- 1 To be precise,they should be called “exponential family random graph models”,but apparently this is too cumbersome.

1.1.Motivation:social network models 5 ﬁculties in this process for exponential families including Besag’s pseudolikelihood ap- proach (Besag,1974;Strauss and Ikeda,1990),Geyer and Thompson’s (1992) Markov chain Monte Carlo Maximum Likelihood (MCMC-ML) approach,and stochastic ap- proximation. 2 In fact,it was the absence of usable methodologies in parameter estima- tion that restricted earlier models like p 1 and p ∗ to their simpler modeling capabilities. Once model parameters are properly estimated,the ﬁtted model can be used to simulate new random networks whose distribution retains essential characteristics of the observed network.Researchers can then use this distribution to test hypotheses about the process of relationship formation.In this dissertation,we consider the ability to do statistical inference as the end goal of parameter estimation.Although we focus on an algorithm to ﬁnd MLEs,our end goal is the maximum likelihood distribution,which can then be used to construct conﬁdence intervals and evaluate hypotheses. Finally,it should be noted that although problems fromthe social sciences provide much of the motivation for social network analysis and our research here,network models can in fact be applied to problems in a broad range of disciplines including po- litical science,biology,epidemiology,and computer science and may be of interest to business practitioners in the utility or transportation sectors.Actors often represent individuals but they may also represent entities like nation-states,protein molecules, airports,computers,or corporations.The relation that connects actors is often friend- ship or actor A “liking” actor B,as in Sampson’s monk data depicted in Figure 1.2, but the relation can be any kind,such as a business transaction or the Internet connectedness between computers.Figure 1.3 depicts the Escherichia coli gene tran- scription network among 423 mRNA molecules (Shen-Orr,Milo,Mangan and Alon, 2002;Salgado,Santos-Zavaleta,Gama-Castro,Mill´an-Z´arate,D´ıaz-Peredo,S´anchez- 2 Stochastic approximation has more general applications in root ﬁnding,but is frequently used for parameter estimation.

1.1.Motivation:social network models 6 Solano,P´erez-Rueda,Bonavides-Mart´ınez and Collado-Vides,2001).This data was modeled with an ERGM by Saul and Filkov (2007) and again by Hummel,Hunter and Handcock (2010). Figure 1.3:E.coli transcriptional regulation network of Shen-Orr et al.(2002).Each node is an operon (a group of genes transcribed into a single mRNA molecule) and a directed edge from one node to another indicates that the ﬁrst encodes the transcrip- tion factor that regulates the second.Data is available through and plotted using the ergm package (Handcock et al.,2010) in R.

1.2.Exponential random graph models 7 1.1.1 Why model networks? Before we step into a lengthy of discussion of how we model network data,it is worth reﬂecting upon the question of why network models are desirable or interesting.We have already given one answer to this question in the context of the social scientist trying to diﬀerentiate between competing underlying forces that can shape the global structure of relationships:a network model will allow the researcher to identify if one or more actor speciﬁc variables and network structures contribute to the characteristic of the overall network.That is,a good statistical model can see past noise in the data to provide clarity of the underlying forces that shape the structure of an observed network (Goodreau,Kitts and Morris,2009). In the context of epidemiology,a model for a contact network,which is a network of the potential disease-causing ties between individuals,may be useful in under- standing the general mechanism by which an epidemic can spread.This may in turn help formulate an intervention strategy (Welch,Bansal and Hunter,to appear).In computational biology,network models may be used to better understand the pro- cesses by which proteins interact (Goldenberg,Zheng,Fienberg and Airoldi,2009). Some other applications of networks are the links between sites on the Internet or the international relations between countries.At its essence,a network is a conduit for ﬂow,whether it be the ﬂow of diseases,commodities,data,capital,or ideas,and network models can help us understand the mechanism of this ﬂow (Kolaczyk,2010). 1.2 Exponential random graph models A network can be modeled as a random n ×n matrix Y,where n is the number of actors.Each entry Y ij in the randommatrix Y is itself a randomvariable representing

1.2.Exponential random graph models 8 a relation from actor i to actor j,such that: Y ij = ⎧ ⎪ ⎨ ⎪ ⎩ 1 if a relation exists from actor i to actor j (notation:i →j) 0 otherwise where i and j take values in 1,...,n,i = j.Note that Y ij take only values of 0 or 1, reﬂecting our restriction on networks to those with dichotomous relations,that is,the relation between a pair of actors is either present or absent.In addition,we do not allow i →i and always have Y ii = 0.In the special case that Y ij = Y ji and thus the matrix Y is symmetric,the network is referred to as an undirected network or graph, such as in the case of the Florentine marriage network depicted in Figure 1.1.A network is directed if it is not undirected,as in the case of monastery aﬃnity network depicted in Figure 1.2. The exponential family random graph model (ERGM) commonly used in the network literature for Y has probability mass function of the following form: f η (y) = P η (Y = y) = 1 κ(η) e η,g(y) y ∈ Y,(1.1) where g(y) is a d-vector of natural statistics,η is a d-vector of natural parameters, ·,· denotes the bilinear form η,g = d

i=1 g i η i , and Y is the sample space of all possible networks with n actors.So that (1.1) integrates to 1,κ(η) is a normalizing constant such that κ(η) =

e η,g(x) dμ(x) (1.2)

1.2.Exponential random graph models 9 where μ is a measure on Y.In fact,(1.1) is exactly the form of an exponential family distribution in canonical form (Lehmann,1983,Chapter 1.4).The only distinction that makes this an ERGM is that Y is speciﬁed to be the discrete state space of all possible network conﬁgurations. We rely on many properties of exponential families.Deﬁne the natural parameter space Ξ as the set of points η = (η 1 ,...,η d ) that are parameter values indexing distributions in the model.An exponential family is full if the natural parameter space is Ξ = {η ∈ R d :κ(η) < ∞},(1.3) and regular if,in addition,Ξ is an open set.We say an exponential family is min- imal if neither the natural parameter nor the natural statistic is concentrated on a hyperplane.Minimality guarantees that if an MLE,denoted ˆη MLE ,exists,it is unique (Geyer,2009a). Deﬁne the cumulant function as c(η) = log κ(η) and denote the observed data as y obs .We can then express the log likelihood of (1.1) as (η) = η,g(y obs ) −c(η).(1.4) The appeal of exponential families in the setting of complex dependence phenom- ena such as networks stems from their simplicity and maximum entropy property (Jaynes,1978;Geyer and Thompson,1992).By choosing statistics of interest on the data,one fully speciﬁes a model that,from a maximum entropy perspective,gives the most reasonable inference possible derived solely from those statistics.Further- more,exponential families have been well-studied (Barndorﬀ-Nielsen,1978;Brown, 1986) and utilized over the decades and have desirable properties such as the MLE uniqueness noted above.

1.2.Exponential random graph models 10 Ideally then,a network researcher need only specify relational structures of in- terest to deﬁne an ERGM.Wasserman and Pattison (1996);Pattison and Wasser- man (1999);Anderson,Wasserman and Crouch (1999);Robins,Pattison,Kalish and Lusher (2007a) describe many of the classical network statistics that one might in- clude in the natural statistic vector g(y).In these works,the researchers’ primary consideration in deﬁning a network statistic is a relational structure’s scientiﬁc inter- pretability.For example,in a directed aﬃnity network,a sociologist may be interested in the propensity for individuals to form reciprocal relations,where ties exist i → j and j →i,or transitive relations,where ties exist i →j,j →k,i →k.The statistics vector g(y) that captures these can be deﬁned by g(y) =

i

i =j =k y ij y jk y ik

where its components count the number of reciprocal and transitive relational struc- tures in the network y.Such a model,with parameters appropriately calibrated to the aﬃnity network,should then generate networks that exhibit a frequency of recip- rocal and transitive relations similar to that of the observed data.We will not discuss the merits of all the diﬀerent network statistics here;in fact,there is essentially an unlimited number of potential network statistics.What is important is that these statistics can be transparently calculated for a given network and the inclusion of them in g(y),paired with their parameter components in η,allows the model (1.1) to calculate probabilities of diﬀerent global network outcomes y.Hunter,Goodreau and Handcock (2011) even provide an interface for one to customize and create her own network statistic of interest and model it in the ergm package in R. More recently,Snijders,Pattison,Robins and Handcock (2006);Hunter and Hand- cock (2006);Robins,Snijders,Wang,Handcock and Pattison (2007b) developed new network statistics with particular emphasis on the properties of the distributions as

1.2.Exponential random graph models 11 opposed to the scientiﬁc interpretability of the statistics themselves.It had repeat- edly been discovered that for some ﬁtted models,the random networks generated from them did not at all resemble the observed network.These new statistics are intended to reduce the occurrence of such cases.A fairly complete description of these more recent network statistics is in Morris,Handcock and Hunter (2008).We make use of one of these network statistics in the example in Section 5.4. 1.2.1 Intractable normalizing constant We now return to the issue of why parameter estimation for ERGMs and similar exponential family models on discrete state spaces can be challenging.Despite the tremendous appeal of the exponential family framework,one immediate problem is that the summation in (1.2) for the normalizing constant κ(η) is over all possible networks in the sample space Y and can be prohibitively expensive to evaluate for networks of even moderate size.For an undirected network with n actors,there are 2 ( n 2 ) diﬀerent possible networks in Y.Table 1.1 shows how rapidly this number grows; unless dealing with networks of 9 actors or less,the likelihood function should not be directly evaluated.See Appendix A for our approach for counting simple network statistics in the 9-node network relying on numerous tricks in data representation and computation. 1.2.2 Covariate data Information about a particular actor,say gender,can be incorporated as covariate data into the model with little diﬃculty.Often,a social scientist looks to include such information because she is interested in whether or not there is an assortative mixing eﬀect,that is,whether individuals of the same “type” tend to make more relations with others of that type (Goodreau et al.,2009).Suppose we wish to

1.3.Examples of ERGMs 12 Table 1.1:Sample space size for undirected networks with diﬀerent numbers of actors. Nodes Possible Edges Total Graphs 5

5 2

= 10 2 10 = 1024 6

6 2

= 15 2 15 = 32,768 7

7 2

= 21 2 21 = 2,097,152 8

8 2

= 28 2 28 = 268,435,456 9

9 2

= 36 2 36 = 68,719,476,736 10

10 2

= 45 2 45 = 3.518437 ×10 13 incorporate p such exogenous attributes for our n-actor network.This information can be represented by an n×n×p matrix X,whose ijkth element is the value of the kth attribute in the potential relation from actor i to j (Fienberg and Wasserman, 1981;Hunter,Handcock,Butts,Goodreau and Morris,2008b).We include such factors in our model of an adolescent friendship data set in Section 5.4. Like the other network statistics discussed,the statistics that comprise X can be transparently calculated from data and hence included in the canonical statistics vector as g(y,X),with the canonical parameter vector η lengthened accordingly.The parameter estimation methodology is unchanged from before (so long as p is not excessively large),and while this greatly expands the usefulness of ERGMs to the researcher,there are no new issues for us to consider from a statistical modeling perspective.Thus we will continue to use g(y) instead of g(y,X) for simplicity. 1.3 Examples of ERGMs We now review the classical Erd˝os-R´enyi-Gilbert,p 1 ,and p ∗ models to give both a sense of commonly used network structures as well as parameter estimation method- ologies.

1.3.Examples of ERGMs 13 1.3.1 Erd˝os-R´enyi-Gilbert model The simplest example of an exponential random graph is the Erd˝os-R´enyi-Gilbert model (Erd¨os and R´enyi,1959;Gilbert,1959),also referred to as a Bernoulli network model,which assumes that each actor forms a relation to every other actor inde- pendently with the same probability p (Hunter et al.,2008b).The ERGM can be expressed as P η (Y = y) = 1 κ(η) e ηg(y) y ∈ Y, where the only network statistic is a count of the number of edges for the directed network g(y) =

i =j y ij and the probability of a tie formation between any pair of actors is the same, p = P η (Y ij = 1) = e η 1 +e η . Then the log likelihood can be expressed as (η) = η

i =j y ij −n(n −1) log(1 +e η ). The MLE of η,ˆη MLE ,can be found analytically to be the logit of the fraction of ties that are present in the observed data set, ˆη MLE = logit

i =j y obs,ij n(n −1)

where n(n − 1) is the number of possible ties in a directed network with n actors.

1.3.Examples of ERGMs 14 The MLE for η is thus easily calculated from the observed data.The independence assumption,however,is too unrealistic for all but the simplest of cases;usually,a researcher is interested in modeling diﬀerent probabilities of tie formations between actors.Thus the Erd˝os-R´enyi-Gilbert model may be most useful as a “null” model, though it is arguably too simple even for this. 1.3.2 The p 1 model Holland and Leinhardt (1981) made advances in relaxing this independence assump- tion with their p 1 model.They focused on two empirical observations fromsociometric studies: • Reciprocation:there tend to be a “surplus” of mutual relationships in network data sets compared to a uniform distribution of directed relationships. • Stars:some individuals attract a surplus of choices compared to a uniform distribution of directed relationships. Holland and Leinhardt then constructed a family of distributions with parameters to control the probability of observing diﬀerent numbers of mutual relationships and stars.Focusing on the dyad,the set of a pair of actors and the possible relations between them,as the basic building block,they proposed the following model: P(Y = y) = 1 K(ρ,θ,{α i },{β j }) exp

ρm(y) +θy ++ +

i α i y i+ +

j β j y +j

1.3.Examples of ERGMs 15 subject to

i α i =

j β j = 0,where ρ = “force of reciprocation” or mutuality parameter m(y) =

i =j y ij y ji ,total number of mutual relationships in y θ = “density” or overall choice eﬀect parameter y ++ =

i =j y ij ,total number of relations in y α i = “productivity” or “expansiveness” eﬀect parameter for node i y i+ =

j y ij ,“out-degree” for node i in y β j = “attractiveness” or “popularity” eﬀect parameter for node j y +j =

i y ij ,“in-degree” for node j in y K = normalizing constant By deﬁning new dyad random variables,D ij = (Y ij ,Y ji ),Holland and Leinhardt showed that with some algebraic manipulation,the form of the model above can be viewed as a log-linear model with independent dyad randomvariables D ij .This makes it possible to use a standard logistic regression to calculate MLEs of the parameters. The statistical independence at the dyad level,however,means that this model will not capture triangular tie conﬁgurations (or anything more complicated) in which dyads are dependent.Also,to reduce the number of parameters,the model assumes ρ ij = ρ and θ ij = θ,meaning that the tendency towards reciprocity and forming relations is assumed to be the same across all actors.This example illustrates how the parameter estimation methodology limits the scope of the model and what types of behavior it can capture.

1.3.Examples of ERGMs 16 1.3.3 Markov graph model Frank and Strauss (1986) relaxed the independence assumption further with the im- plementation of Markov dependence in which two dyads are independent,conditional on the rest of the graph,when they do not share a node.The model uses only three conﬁgurations in an undirected network,expressed as: P(Y = y) = 1 K(θ,σ,τ) exp{θL +σS +τT} where θ = edge parameter L =

i

i

i

1.4.ERGM parameter estimation 17 rameter to determine which ﬁt the data best.The authors also obtained maximum pseudolikelihood estimators (MPLE) from a standard logistic regression (we discuss the pseudolikelihood approach in Section 1.4.1) and observed that the MPLEs are close to those they arrived at from their rigorous simulations.Frank and Strauss seem to conclude that MPLEs are generally quite acceptable. 1.4 ERGM parameter estimation Exponential random graph models have a deceptively simple form (1.1) that belies the challenges involved in model parameter estimation.As discussed in Section 1.2.1, the diﬃculties arise froma normalizing constant (1.2) which may involve a summation over an astronomical number of terms.In this section,we discuss commonly used parameter estimation methods used for exponential families with complex dependence like ERGMs.All approaches avoid evaluating the likelihood function directly but have properties that we ﬁnd undesirable.The methods are aided by a strictly concave log likelihood function (see Section 2.1) and thus there is no worry about local maxima. There can be at most one local maximum,which (if it exists) is the global maximum. 1.4.1 Maximum pseudolikelihood method As mentioned in Section 1.3.2,Frank and Strauss (1986) successfully applied the maximum pseudolikelihood method to social network models,a method ﬁrst used in lattice systems in plant biology (Besag,1974,1975).Strauss and Ikeda (1990) fur- ther justiﬁed the use of maximum pseudolikelihood estimators (MPLE) as reasonable approximations for MLEs in social network models.Wasserman and Pattison (1996); Pattison and Wasserman (1999);Anderson,Wasserman and Crouch (1999) leaned on this result to broaden the scope of network models,allowing for any combination of network structures to be considered.The result is (1.1),the more general form of the

1.4.ERGM parameter estimation 18 ERGM that is currently used. The method of maximum pseudolikelihood ﬁnds the values for the parameters that maximize the pseudolikelihood function for the observed data set,which can be constructed from the densities of Y ij conditional on the rest of network,denoted by f Y ij (y ij | rest).The pseudolikelihood function PL(η) is deﬁned to be the product of these densities, PL(η) =

i =j f Y ij (y ij | rest),(1.5) and in general is diﬀerent than the likelihood function. Deﬁne the vector of change statistics δ g (y) ij to be δ g (y) ij = g(y + ij ) −g(y − ij ) where y + ij and y − ij represent networks with y ij = 1 and y ij = 0,respectively,while leaving the rest of the network as y.Thus δ g (y) ij is the change in g(y) when y ij changes from 0 to 1.The conditional distribution of Y ij | rest for an ERGM is then a Bernoulli distribution with log odds log

P(Y ij = 1 | rest) P(Y ij = 0 | rest)