
Factors predicting the use of passive voice in newspaper headlines

ProQuest Dissertations and Theses, 2011
Dissertation
Author: Linnea Margaret Micciulla
Abstract:
Information packaging researchers have found that certain factors influence active/passive voice alternations: Animacy, Definiteness and Weight influence argument order and thus choice of voice. Researchers in Critical Discourse Analysis (CDA) and psycholinguistics claim that voice is influenced by social factors, e.g. gender, social standing, or political bias. This dissertation draws from these distinct perspectives to perform probabilistic analysis of factors predicting voice in newspaper headlines, a novel research area for information packaging, and a rich source of data relevant to CDA. In the first study to examine the relative contributions of these two types of constraints, this dissertation explores the predictive values of Animacy, Definiteness and Weight, as well as four social constraints: Gender, Nationality, Age and "Badness." It also investigates using combined human and automated methods for quick and accurate data annotation. The corpus consists of US newspaper headlines published between 2002 and 2007 containing one of twelve selected verbs: accuse, aid, anger, create, encourage, frustrate, hit, hurt, injure, inspire, kill and shoot. The Animacy, Definiteness and Weight hierarchies predict that animate arguments tend to precede inanimate arguments, definite arguments tend to precede less definite arguments, and shorter arguments tend to precede longer arguments, respectively (Quirk et al. 1972, Ransom 1979, inter alia). The present findings support these hierarchies. Of the linguistic factors, Animacy has the strongest effect. Of the social factors, Nationality and Age are not significant predictors of voice, while Badness is a significant predictor. A "Bad" argument has an increased likelihood of occurring post-verbally relative to other arguments, so that a "Bad" Actor predicts passive, while a "Bad" Undergoer predicts active voice.
Gender has a marginally significant effect which differs by verb; overall, arguments with a Female Actor are likely to occur with active voice relative to Male Actors; when the verb is kill, Female Undergoers are relatively more likely to occur with active voice. The findings indicate that both social factors and traditional linguistic constraints predict voice. The results show that including social factors improves probabilistic models of grammar, and that analyses which include both linguistic and social factors provide better support for empirical claims.
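
As a rough illustration of what the Animacy, Definiteness and Weight hierarchies predict, the sketch below compares an Actor and an Undergoer on each dimension and reports which voice that hierarchy would favour (Actor first corresponds to active voice, Undergoer first to passive). Everything in it is an assumption made for illustration: the names `Argument`, `favoured_voice` and `hierarchy_predictions`, the two-level animacy value, the numeric definiteness scale, the use of word count as a stand-in for weight, and the invented example arguments. It is not the coding scheme or the statistical model used in the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    text: str
    animate: bool       # two-level animacy; a simplification for illustration
    definiteness: int   # higher = more definite (hypothetical scale)

    @property
    def length(self) -> int:
        # Weight approximated as the number of words (an assumption)
        return len(self.text.split())


def favoured_voice(actor_rank: int, undergoer_rank: int) -> str:
    """Higher rank = stronger tendency to come first.

    Actor first corresponds to active voice, Undergoer first to passive.
    """
    if actor_rank > undergoer_rank:
        return "active"
    if actor_rank < undergoer_rank:
        return "passive"
    return "no preference"


def hierarchy_predictions(actor: Argument, undergoer: Argument) -> dict:
    """Return the voice favoured by each hierarchy, taken separately."""
    return {
        "animacy": favoured_voice(int(actor.animate), int(undergoer.animate)),
        "definiteness": favoured_voice(actor.definiteness, undergoer.definiteness),
        # Shorter arguments tend to come first, so rank by negated length
        "weight": favoured_voice(-actor.length, -undergoer.length),
    }


# Hypothetical coding of an invented pair of arguments (not corpus data)
actor = Argument("a runaway delivery truck", animate=False, definiteness=1)
undergoer = Argument("the mayor", animate=True, definiteness=2)
print(hierarchy_predictions(actor, undergoer))
```

For this invented pair, all three hierarchies favour placing the Undergoer first, that is, the passive. The sketch treats each hierarchy as a separate yes/no preference, whereas the dissertation weighs the factors probabilistically, so the output should be read only as a picture of the direction each hierarchy pushes in.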

TABLE OF CONTENTS

CHAPTER 1 - INTRODUCTION
1.1 Rationale
1.2 Factors included
1.3 Types of passives included
1.4 Research goals
1.5 Coding approach
1.6 Organization of thesis
CHAPTER 2 - PREVIOUS ANALYSES
2.1 Types and functions of passives
2.1.1 Psychological functions of voice
2.2 Passives in the news
2.3 Headlines
2.4 Non-hierarchical approaches to passive alternations
2.5 Feature hierarchies
2.5.1 Formal hierarchies
2.5.2 Familiarity hierarchies
2.5.3 Dominance hierarchies
2.5.4 Complex hierarchies
CHAPTER 3 - HIERARCHIES
3.1 Non-Membership Hierarchies
3.1.1 Animacy
3.1.2 Definiteness
3.1.3 Weight
3.2 Membership Hierarchies
3.2.1 Nationality
3.2.2 Gender
3.2.3 Age
3.2.4 Badness
3.2.5 Argument structure
3.2.6 Semantic domains
CHAPTER 4 - CORPUS AND METHODOLOGY
4.1 Publication selection
4.2 Verb selection
4.3 Corpus creation and coding methodology
4.3.1 What counts as a headline?
4.3.2 What counts as a target verb?
4.3.3 Coding methodology
4.3.4 Data retrieval
4.3.5 Data extraction
4.3.6 Headline identification
4.3.7 Excluded data
4.3.8 Manual checking of long strings
4.3.9 Voice tagging
4.3.10 Preparing sub-corpora for argument tagging
4.3.11 Argument identification
4.3.12 Determining categories
4.3.13 Consistency check
4.4 Data analysis
4.4.1 Verb and source as random or fixed factors
4.4.2 Statistical methods
CHAPTER 5 - OVERVIEW OF SOURCE AND VERB
5.1 Overview of Sources
5.1.1 Effect of source
5.1.2 Discussion
5.2 Overview of verbs
5.2.1 Effect of Verb
5.2.2 Verb frames and semantic roles
5.2.3 Verb profiles
5.3 Discussion
CHAPTER 6 - EXPLORATORY MONOFACTORIAL ANALYSES
6.1 Definiteness
6.1.1 Actor Definiteness (ACDEF)
6.1.2 Undergoer Definiteness (UNDEF)
6.1.3 Relative Definiteness (RELDEF)
6.1.4 Discussion
6.2 Animacy
6.2.1 Actor Animacy (ACANI)
6.2.2 Undergoer Animacy (UNANI)
6.2.3 Relative Animacy (RELANI)
6.2.4 Discussion
6.3 Weight
6.3.1 Actor Length (ACLEN), general corpus
6.3.2 Actor Length (ACLEN), two argument corpus
6.3.3 Actor Length (ACLEN), simple NP corpus
6.3.4 Undergoer Length (UNLEN), general corpus
6.3.5 Undergoer Length (UNLEN), two argument corpus
6.3.6 Undergoer Length (UNLEN), simple NP corpus
6.3.7 Relative Length (RELLEN)
6.3.8 Discussion
6.4 Gender
6.4.1 Actor Gender (ACGEN)
6.4.2 Undergoer Gender (UNGEN)
6.4.3 Discussion
6.5 Age
6.5.1 Undergoer Age (UNAGE)
6.5.2 Discussion
6.6 Nationality
6.6.1 Actor Nationality (ACNAT)
6.6.2 Undergoer Nationality (UNNAT)
6.6.3 Discussion
6.7 Badness
6.7.1 Actor Badness (ACBAD)
6.7.2 Undergoer Badness (UNBAD)
6.7.3 Alternative lexeme: Police or Cop
6.7.4 Discussion
6.8 Verb Frame
6.8.1 Actor Theta Role (ACTHT)
6.8.2 Undergoer Theta Role (UNTHT)
6.8.3 Verb frame
6.9 Summary of monofactorial results
CHAPTER 7 - MULTIFACTORIAL ANALYSIS
7.1 Linguistic factors
7.2 Membership factors
7.2.1 Gender
7.2.2 Badness
7.2.3 Lexeme choice
7.3 Single Verbs
7.4 Verb Classes
7.5 Summary and Discussion
CHAPTER 8 - CONCLUSIONS
8.1 Overview of major findings
8.1.1 Verb
8.1.2 Source
8.1.3 Linguistic factors
8.1.4 Membership factors
8.1.5 Methodology
8.2 Applications
8.3 Future directions
APPENDIX I. Circulation figures
APPENDIX II. Precision and recall for automated voice tagging
APPENDIX III. Guidelines for argument tagging
REFERENCES
VITA

LIST OF TABLES

Table 1. Constraints
Table 2. Attested animacy arguments for the verb hit
Table 3. Attested definiteness arguments for the verb kill
Table 4. PP attachment ambiguity by preposition for kill
Table 5. Verb features
Table 6. Included and excluded headlines
Table 7. Verb sub-corpora
Table 8. Oblique passives
Table 9. Factors included in the study
Table 10. Distribution of accuse, by source
Table 11. Counts, general corpus
Table 12. Source contributions to chi-square, general corpus
Table 13. Fixed and random effects, general corpus
Table 14. Counts, two argument corpus
Table 15. Source chi-square contributions, two argument corpus
Table 16. Fixed and random effects, two argument corpus
Table 17. Sub-corpora overview, general corpus
Table 18. Sub-corpora overview, two argument corpus
Table 19. Pearson residuals, verb, general corpus
Table 20. Pearson residuals, verb, two argument corpus
Table 21. Comparative ranking from active to passive
Table 22. Examples of six levels of definiteness
Table 23. Counts and percents of definite types
Table 24. ACDEF distribution
Table 25. Pearson residuals, ACDEF
Table 26. Marascuilo results, ACDEF, two argument corpus
Table 27. UNDEF distribution
Table 28. Pearson residuals, UNDEF
Table 29. Marascuilo results, UNDEF, Four Levels
Table 30. RELDEF distribution
Table 31. Pearson residuals, RELDEF
Table 32. Headlines predicted by RELDEF
Table 33. Examples of four levels of animacy
Table 34. ACANI distribution
Table 35. Pearson residuals, ACANI
Table 36. Marascuilo results, ACANI
Table 37. UNANI distribution
Table 38. Pearson residuals, UNANI
Table 39. Marascuilo results, UNANI
Table 40. RELANI distribution
Table 41. Pearson residuals, RELANI
Table 42. Headlines predicted by RELANI
Table 43. Animacy preferences by verb
Table 44. ACLEN distribution, general corpus
Table 45. ACLEN distribution, two argument corpus
Table 46. ACLEN distribution, simple NP corpus
Table 47. UNLEN distribution, general corpus
Table 48. UNLEN distribution, two argument corpus
Table 49. UNLEN distribution, simple NP corpus
Table 50. RELLEN distribution, two argument corpus
Table 51. RELLEN distribution, simple NP corpus
Table 52. Distribution of gender by verb
Table 53. ACGEN distribution
Table 54. UNGEN distribution
Table 55. UNAGE for accuse, kill, shoot and injure, general corpus
Table 56. UNAGE distribution
Table 57. Pearson residuals, UNAGE, general corpus
Table 58. Marascuilo results, UNAGE, general corpus
Table 59. UNAGE, reduced factor levels
Table 60. Thirty most frequent nationalities
Table 61. ACNAT by region
Table 62. ACNAT distribution, ten most frequent
Table 63. Distribution of UNNAT by region
Table 64. UNNAT, ten most frequent
Table 65. Pearson residuals, UNDEF, general corpus
Table 66. ACBAD distribution
Table 67. UNBAD distribution
Table 68. ACLEX distribution
Table 69. UNLEX distribution
Table 70. ACTHT distribution
Table 71. UNTHT distribution, general corpus
Table 72. UNTHT distribution, two argument corpus
Table 73. Distribution of verb frames
Table 74. Correlational strength
Table 75. Two argument corpus model, N=12088
Table 76. Random effects, two argument
Table 77. Simple NP corpus model, N=7527
Table 78. Random effects, simple NP corpus
Table 79. ACGEN corpus model, N=260
Table 80. Random effects, ACGEN corpus
Table 81. ACGEN2 corpus model, N=590
Table 82. ACGEN2, random effects
Table 83. ACBAD corpus model, N=3009
Table 84. Random effects, ACBAD corpus
Table 85. ACLEX corpus model, N=233
Table 86. ACLEX distribution by SOURCE
Table 87. Aid corpus model, N=1814
Table 88. Random effects, aid corpus
Table 89. Anger corpus model, N=694
Table 90. Create corpus model, N=1896
Table 91. Random effects, create corpus
Table 92. Encourage corpus model, N=427
Table 93. Random effects, encourage corpus
Table 94. Frustrate corpus model, N=375
Table 95. Random effects, frustrate corpus
Table 96. Oblique frustrate corpus model, N=419
Table 97. Random effects, oblique frustrate corpus
Table 98. Hit corpus model, N=1495
Table 99. Random effects, hit corpus
Table 100. Oblique hit corpus model, N=1556
Table 101. Random effects, oblique hit corpus
Table 102. Hurt corpus model, N=1059
Table 103. Random effects, hurt corpus
Table 104. Oblique hurt corpus model, N=1593
Table 105. Random effects, oblique hurt corpus
Table 106. Injure corpus model, N=630
Table 107. Random effects, injure corpus model
Table 108. Oblique injure corpus model, N=1486
Table 109. Random effects in oblique injure corpus model
Table 110. Inspire corpus model, N=1700
Table 111. Inspire, random effects
Table 112. Kill corpus model, N=1228
Table 113. Random effects, kill corpus
Table 114. Oblique kill corpus model, N=1675
Table 115. Random effects, Oblique kill corpus
Table 116. Kill corpus model, UNGEN, N=196
Table 117. Random effects, UNGEN kill corpus
Table 118. Shoot corpus model, N=452
Table 119. Random effects, shoot corpus model
Table 120. ACBAD shoot model, N=435
Table 121. Random effects, ACBAD shoot corpus
Table 122. Killing verbs corpus model, N=1682
Table 123. Random effects, killing verbs corpus
Table 124. ACLEX killing verbs corpus model, N=206
Table 125. Random effects from the Killing ACLEX corpus model
Table 126. Psych verbs corpus model, N=3214
Table 127. Random effects, psych verbs
Table 128. Body verbs corpus model, N=1687
Table 129. Theme corpus model, N=6418
Table 130. Random effects, theme corpus
Table 131. ACLEX theme corpus model, N=221
Table 132. ACBAD force-theme corpus model, N=1894
Table 133. Random effects, ACBAD
Table 134. Force-Experiencer corpus model, N=1838
Table 135. Random effects, Force-Experiencer corpus
Table 136. Actor: Linguistic factors predicting argument order
Table 137. Undergoer: Linguistic factors predicting argument order
Table 138. Summary: Membership factors
Table 139. Accuracy measurements for voice tagging

LIST OF FIGURES

Figure 1. Hierarchy groups (Siewierska 1988)
Figure 2. Radial Gradient animacy (Yamamoto 1999)
Figure 3. Lexis Nexis examples
Figure 4. Lexis Nexis examples (2)
Figure 5. Total headlines by SOURCE
Figure 6. Voice distribution by SOURCE, all headlines
Figure 7. Voice distribution by SOURCE, two arguments
Figure 8. Pearson residuals, SOURCE, general corpus
Figure 9. Pearson residuals, SOURCE, two argument corpus
Figure 10. Pearson residuals, VERB, general corpus
Figure 11. Pearson residuals, VERB, two argument corpus
Figure 12. Percent passive, aid
Figure 13. Percent passive, create
Figure 14. Percent passive, anger
Figure 15. Percent passive, inspire
Figure 16. Percent passive, encourage
Figure 17. Percent passive, frustrate
Figure 18. Percent passive, hit
Figure 19. Percent passive, kill
Figure 20. Percent passive, hurt
Figure 21. Percent passive, injure
Figure 22. Percent passive, shoot
Figure 23. Percent passive, accuse
Figure 24. Headlines with actors and oblique actors
Figure 25. Pearson residuals, ACDEF, two argument corpus
Figure 26. Pearson residuals, ACDEF, reduced to Four levels
Figure 27. Pearson residuals, UNDEF, two argument corpus
Figure 28. Pearson residuals, UNDEF, general corpus
Figure 29. Pearson residuals, ACANI, two argument corpus
Figure 30. Pearson residuals, UNANI, two argument
Figure 31. Actor length in three corpora
Figure 32. Undergoer length in three corpora
Figure 33. ACGEN, two argument corpus
Figure 34. UNGEN, general corpus
Figure 35. UNGEN, two argument corpus
Figure 36. Pearson residuals, UNAGE, general corpus
Figure 37. Pearson residuals, UNNAT, general corpus
Figure 38. Pearson residuals, UNNAT, reduced to three levels
Figure 39. ACBAD, two argument corpus
Figure 40. UNBAD, general corpus
Figure 41. ACLEX, two argument corpus
Figure 42. Pearson residuals, UNTHT, general corpus
Figure 43. Pearson residuals, UNTHT, two argument corpus
Figure 44. Pearson residuals, UNTHT, reduced to three levels
Figure 45. Pearson residuals, verb frame
Figure 46. ACLEN, aid corpus
Figure 47. ACDEF and ACLEN, anger corpus
Figure 48. UNDEF, UNLEN, ACANI, encourage corpus
Figure 49. UNANI, ACDEF, UNDEF frustrate corpus
Figure 50. ACANI, ACLEN, UNANI, UNDEF, hit corpus
Figure 51. ACANI, UNDEF, ACDEF, UNANI, hurt corpus
Figure 52. UNDEF, ACANI and ACLEN, injure corpus
Figure 53. ACLEN, UNLEN, UNANI, UNDEF, inspire corpus
Figure 54. UNDEF, ACANI AND UNLEN kill corpus
Figure 55. ACLEX, killing verb class
Figure 56. Force-theme, ACLEX, ACBAD

LIST OF ABBREVIATIONS

AJC   Atlanta Journal Constitution
BG    Boston Globe
CD    The Columbus Dispatch
DN    New York Daily News
DP    Denver Post
HC    Houston Chronicle
MJS   Milwaukee Journal Sentinel
NYP   New York Post
NYT   New York Times
PD    The Plain Dealer
PI    Philadelphia Inquirer
PPG   Pittsburgh Post-Gazette
SDUT  The San Diego Union Tribune
SFC   San Francisco Chronicle
SLPD  St. Louis Post-Dispatch
SPT   St. Petersburg Times
ST    Seattle Times
USAT  USA Today
WP    The Washington Post
WSJ   The Wall Street Journal

CHAPTER 1 - INTRODUCTION

1.1 Rationale

Newspaper headlines represent a unique and important area of study for discourse analysis, theoretical linguistics, and natural language understanding. Headlines are distinct from other types of text, both linguistically and socially, such that findings based on the analysis of other types of texts may not be applicable to headlines. Since headlines are discourse-initial, there is no previously occurring text to serve as an influencing factor, although the larger social context is certainly relevant. Headlines are potentially discourse-final as well as discourse-initial - that is, they may be "discourse complete", since in some contexts they function as an index rather than as an introduction to a text. From a social perspective, they are designed to capture the reader's attention, and may be the only part of the text that is read (Mardh 1980, van Dijk 1991). Headlines are therefore an ideal subject for a study of the interaction of diverse syntactic and socially-based constraints.

There have been few linguistic analyses of English-language newspaper headlines, and only Simon-Vandenbergen (1981) analyzes voice in a large headline corpus. To my knowledge, no large-scale investigation of voice alternation in newspaper headlines has been published at the time of this writing, and only a few studies (Aissen 1999, 2003, Biber et al. 1999, Dingare 2001) have analyzed the active-passive alternation in a newspaper corpus. Despite the lack of statistical evidence, passivization, since it performs the function of postposing or removing the agent of an event, has in some cases intuitively been assumed to be evidence of a biased, or ideologically slanted, representation of real world events (Trew 1979, Fowler 1991, and others). This assumption of bias is generally made without taking into consideration other factors shown to affect the order of constituents, such as givenness (Chafe 1976, Birner 1996, Birner and Ward 1998, Birner and Ward 2006), definiteness (Ransom 1979, Aissen 1999, Dingare 2001), animacy (Kato 1979, Ransom 1979, Aissen 1999, Dingare 2001) and weight (Erdmann 1988, Birner 1996, Birner and Ward 1998, Wasow 1997, Arnold et al. 2000, McDonald et al. 1993). The current study examines the roles of these linguistic information structuring factors in conjunction with the roles of other, more socially relevant factors that could be interpreted as systematic bias, such as the gender or nationality of the arguments. I will refer to these social factors as membership constraints.

The passive voice has long suffered the disdain of prescriptivists, and the field of journalism is no exception. The New York Times Manual of Style and Usage (Siegal and Connolly, 1999) says little about the use of voice in headlines, simply stating that, "Short, active verbs work best." (Siegal and Connolly, 1999:155). The Washington Post Deskbook says with reference to headlines, "Use the active voice as much as possible in preference to the passive voice." (Lippman, 1989:125). The Associated Press Guide to Newswriting (Cappon, 2000) echoes the above sentiments, and goes on to elaborate:

    Police arrested John Smith is shorter and crisper than John Smith was arrested by police. On many occasions, of course, news values dictate the passive form, in leads especially. If you were writing about your town's leading citizen, you'd make it Mayor John Smith was arrested by Smithtown police today. In most cases, though, the passive is flabby, dropping the doer of a deed out of the picture. (Cappon, 2000:20).

It has been well-documented that the active voice is unmarked relative to the passive, and that for most verbs the active is used far more frequently than the passive across genres. Still, given the prescriptive recommendations to avoid the passive voice in headlines and elsewhere, the passive occurs in headlines with surprising frequency. In the case of passives without a by-phrase, also known as "short" or "agentless" passives, it may be that the need for a short headline overrules the prescriptive guidelines. For headlines containing a "long passive," defined as including a by-phrase, an explanation other than headline length is required.

Interestingly, the Associated Press guide's directives for the use of passive voice rely on membership, rather than linguistic categories, as they suggest that the passive voice could justifiably be used to highlight "your town's leading citizen." This recommendation to use the passive voice when the direct object is socially important, regardless of linguistic factors, suggests a conscious awareness on the part of the language user of the role of membership in voice alternations.

The choice of passive or active voice takes on a special significance in the news genre, and particularly in headlines. Passivization has frequently been cited by discourse analysts focusing on the media as a means of 1) obscuring agency, and 2) creating distance between the event and the reader (Fowler et al. 1979, van Dijk 1993, 2000). The claim is that active sentences serve to emphasize the responsible agency of the subject, whereas passive sentences about the same action either background the agency or eliminate the agent altogether (van Dijk, 2000). Trew's (1979) oft-cited analyses of (1) and (2) exemplify this issue well:

(1) The Times
    a. RIOTING BLACKS SHOT DEAD BY POLICE AS ANC LEADERS MEET
    b. Eleven Africans were shot dead and 15 wounded when Rhodesian police opened fire on a rioting crowd... (The Times, June 2, 1975)

(2) The Guardian
    a. POLICE SHOOT 11 DEAD IN SALISBURY RIOT
    b. Riot police shot and killed 11 African demonstrators and wounded 15 others... (The Guardian, June 2, 1975)

Trew notes about (1) that, "not only is it in the passive, but the syntactic agent is deleted ('Eleven Africans were shot dead by...') and is identified only weakly by implication..." (Trew 1979:94). Trew goes on to claim that the movement of the agent to the by-phrase slot in the headline (1a) and the subsequent removal of the agent in the first line of the text (1b) show an editorial bias against the rioters and in favor of the police. Trew contrasts this text with a report of the same event in the Guardian (2), where the syntactic agent remains in pre-verbal position.

While it seems plausible at first glance that the syntactic choices made in the excerpt from The Times are symptomatic of bias, particularly given the counter-example from The Guardian, this conclusion is reached without presenting evidence that the passive structure is predominant in representing violence committed by "the establishment", either in The Times or in the news genre as a whole. Taken in isolation, the usage of the active voice in the individual examples given above is merely anecdotal. Furthermore, there is no analysis of linguistic factors constraining the use of the passive. By providing evidence for whether shoot is canonically used in the passive or the active, or by performing an analysis of the discourse-status of the police, the researcher could provide some background within which to support the claim that (1) represents non-typical, or biased, usage.

1.2 Factors included

The present study is both corpus-based and corpus-driven, as these terms are defined by Tognini-Bonelli (2001). The study is corpus-based in that it begins with a set of hypotheses about voice alternations and seeks to evaluate their validity in a particular corpus. It is corpus-driven in its incorporation of data that emerges from the corpus, leading to the proposal of new types of passives and previously untested factors, based on what is attested in the corpus rather than primarily on researcher intuition or expectation. Many of the constraints included in the present study have been given a variety of names in previous literature. For the purpose of this study, I have grouped them into the types shown in Table 1.

TABLE 1. CONSTRAINTS

Constraint          Constraint type         Based on
Animacy             Information structure   Argument
Weight              Information structure   Argument
Definiteness        Information structure   Argument
Gender              Social                  Argument
Nationality         Social                  Argument
Age                 Social                  Argument
Badness             Social                  Argument
Lexeme              Social/Lexical          Argument
Semantic domain     Lexical                 Verb
Argument structure  Lexical                 Verb

The first three constraints, together with a fourth information structure constraint, Person, have been widely shown to influence information structuring in general, and the passive in particular. Since arguments in the headline corpus were overwhelmingly third person, Person is not included in the present study.

Previous research has emphasized relative rather than absolute constraints; that is, constraints have compared the relative values of two arguments rather than analyzing the usage patterns of a single argument. This has resulted in a focus on passives with overt agent arguments (3), and a lack of research into the more frequently occurring agentless passives, which lack an overt by-phrase (4).

(3) Passive with by-phrase: Man Killed by Truck (New York Times, 9/25/2003)
(4) Passive without by-phrase: Man Killed Near Times Square (New York Times, 11/4/2004)

The effect of weight has only been measured in previous studies by comparing the promoted patient with an agent present in an oblique by-phrase (Birner 1996, Birner and Ward 1998), and indeed, without a basis for comparison, the boundary between a "light" and a "heavy" argument seems arbitrary. Although previous analyses comparing passive subjects with by-phrase agents have certainly informed our understanding of information structuring, the majority of passives do not have oblique by-phrases, and thus the most common type of passive, the agentless passive, has been left unanalyzed for the most part. Additionally, previous studies focusing on passive-active alternations have not answered questions regarding how diverse constraints work in combination with each other, nor have they considered the roles of the social or lexical factors listed in Table 1.

In addition to traditional information-structuring constraints, the semantic domain and the frame of the verb are potential factors influencing the choice of voice. For example, theme-experiencer verbs, such as alarm or frustrate, have been found to occur more frequently with passive voice than other verbs (Ferreira, 1994). According to Biber et al. (1999), "The availability of the passive option is subject to a number of constraints, chiefly connected with the nature of the verb." (1999:935). This observation suggests that constraints may vary by frame, by semantic domain, or even by individual lexeme.

1.3 Types of passives included

In the Longman Grammar of Spoken and Written English, Biber et al. (1999) identify a number of types of passives, including both finite and non-finite constructions. Since the goal of the present study is to analyze the passive-active voice alternation, it excludes passive constructions involving nominal postmodifiers, complements of verbs, and other non-finite constructions which do not lend themselves to truth-value active equivalents. For both passive and active voice, only those headlines in which the target verb was the only finite verb in the main clause were included. As an example, in (5), the target verb create does not meet the criterion of being the only finite main verb in the headline; therefore, these active and passive headlines were excluded. Example (5a) is excluded because of the presence of the finite verb help. Example (5b) is excluded because created is in a reduced relative clause, and (5c) is excluded because create occurs in an infinitive construction. Although there is a verb, found, which is in a reduced relative clause in (6a), the target verb, hit, meets the criterion of being the only finite verb in the main clause; therefore, this headline is included.

(5) Excluded non-finite structures (target verb: create)
    a. 3 science grants help create programs at Southwestern (The San Diego Union-Tribune, 12/8/2005)
    b. McFarland aims to fill the hole created by Sapp's exit (St. Louis Post-Dispatch, 6/20/2004)
    c. P&G deal to create behemoth (The Atlanta Journal-Constitution, 1/29/2005)

(6) Included headlines (target verb: hit)
    a. Multiple vehicles hit body found in street (The Atlanta Journal-Constitution, 7/31/2007)
    b. Man fatally hit by car while crossing street (The Houston Chronicle, 3/1/2004)

The two major classes of finite passives are the long passive, which contains a by-phrase, and the short passive, also called the agentless passive. Of the studies that propose relative constraints predicting the choice of active and passive voice, the long passive has been used to rank the agent against the patient, while agentless passives have been largely ignored. Although the presence of both an agent and a patient in the long passive lends itself to the study of information structure, the majority of passives are short. Biber et al. (1999) found that short passives outnumbered long passives in the news genre by about four-to-one. The current study considers both long and short passives, although the focus is primarily on long passives. Long passives are arguably more informative with regard to voice alternations, since both arguments that participate in the alternation are present.

In addition to traditional long passives, this study explores a previously uncharted area of passive structure by including attested alternations without by-phrases, as in (7). Depending on the verb under analysis, an agent-like argument may be introduced by a preposition other than by. To my knowledge, there have been no studies published analyzing those passives in which the argument corresponding to the active subject is in a

Full document contains 275 pages