Comparison of rationally-derived and empirically-derived methods for predicting failure in residential treatment
Contents

Approval Page
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Abstract

Chapter
1. Statement of the Problem
2. Literature Review
   Efficacy Research
   Effectiveness Research
   Patient-Focused Research
   Clinical Versus Statistical Prediction
   Outcome Measurement and Quality Management
   The COMPASS Outpatient Treatment Assessment System
   The Stuttgart-Heidelberg model
   The Clinical Outcomes in Routine Evaluation System
   The Outcome Questionnaire-45
   Use of the OQ-45 to Predict Outcome
      Rationally-derived method
      Empirically-derived method
      Accuracy of methods
   The Youth Outcome Questionnaire
      YOQ rationally-derived method
      YOQ empirically-derived method
      Comparison of YOQ methods
   Outcome Measurement in Residential Treatment Centers
   Research Questions and Hypotheses
3. Methods
   Participants
   Measure
      Reliability
      Validity
      Sensitivity and specificity
   Procedures
   Analyses
4. Results
   Predicted Outcome
   Actual Outcome
   Accuracy of Treatment Failure Predictions
      Positive predictive value
      Statistical false positives
      Hit and miss rates
      Sensitivity
      Specificity
      Negative predictive value
      Summary of method accuracy
   Summary of accuracy by respondent group
      Parent-report respondent group
      All But Self respondent group
   Agreement Between Methods' Predictions of Treatment Failure
   Use of Both Methods
   Speed of Methods in Initial Signal-Alarm Identification
   Use of Final YOQ Score in Prediction of Outcome
   Original Versus Conservative Respondent Groups
5. Discussion
   Summary of Results
   Interpretation of Results
   Comparison with Results of Previous Studies
   Implications of Findings
   Limitations
   Directions for Future Research
   Concluding Summary

References
Figures

1. Example of OQ-45 rationally-derived algorithm for sessions 2-4
2. Example of ERC generated by OQ-45 empirically-derived method
3. Original vs. conservative overall accuracy for the All Reports group
4. Original vs. conservative overall accuracy for the Self-Report group
5. Original vs. conservative overall accuracy for the Parent-Report group
6. Original vs. conservative overall accuracy for the House Counselor group
7. Original vs. conservative sensitivity and specificity, All Reports group
8. Original vs. conservative sensitivity and specificity, Self-Report group
9. Original vs. conservative sensitivity and specificity, Parent-Report group
10. Original vs. conservative sensitivity and specificity, House Counselors
11. Original vs. conservative PPV and NPV, All Reports group
12. Original vs. conservative PPV and NPV, Self-Report group
13. Original vs. conservative PPV and NPV, Parent-Report group
14. Original vs. conservative PPV and NPV, House Counselor group
Tables

1. Steps taken in preparation of data for analysis
2. Demographics for dataset and subgroups
3. Comparison between original database and groups retained for analysis
4. Definition of test result terms for this study
5. Outcome predicted for each respondent group by RDM vs. EDM
6. Actual outcome for each respondent group
7. Accuracy of rationally- and empirically-derived methods
8. PPV of RDM and EDM and chance probability of accurate prediction
9. Hit rate for rationally- and empirically-derived methods
10. Sensitivity and specificity of RDM and EDM
11. NPV for rationally-derived and empirically-derived methods
12. Overall accuracy of RDM and EDM
13. Accuracy of predictions from single versus mixed respondent data
14. Test characteristics for All But Self respondent group
15. Signal-alarms generated by each method alone and jointly
16. Comparison of speed of identification of signal-alarms
17. Use of Final YOQ Score, All Reports 3+ YOQ respondent group
18. Use of Final YOQ Score, Self-Report 3+ YOQ respondent group
19. Use of Final YOQ Score, Parent-Report 3+ YOQ respondent group
20. Use of Final YOQ Score, House Counselor 3+ YOQ respondent group
21. Rationally-derived method values for conservative approach
22. Empirically-derived method values for conservative approach
Abbreviations

RTC          Residential Treatment Center
HMO          Health Maintenance Organization
OQ-45        Outcome Questionnaire-45
HLM          Hierarchical Linear Modeling
ERC          Expected Recovery Curve
YOQ          Youth Outcome Questionnaire
CBCL         Child Behavior Checklist
CPRS         Conners' Parent Rating Scale
Ohio Scales  Ohio Youth Problems, Functioning, and Satisfaction Scales
ROC curve    Receiver Operating Characteristic curve
AUC          Area Under (a ROC) Curve
BYU          Brigham Young University
PPV          Positive Predictive Value
NPV          Negative Predictive Value
RDM          Rationally-Derived Method
EDM          Empirically-Derived Method
SA           Signal-Alarm
TF           Treatment Failure
TP           True Positive
FP           False Positive
TN           True Negative
FN           False Negative
Abstract of the Dissertation

Comparison of Rationally-Derived and Empirically-Derived Methods for Predicting Failure in Residential Treatment

by

Jennifer Pester Grattan

Doctor of Philosophy in Clinical Psychology
Loma Linda University, September 2009
Dr. David A. Vermeersch, Chairperson

Patient-focused research methods have been used in adult mental health treatment to improve outcomes by tracking individual treatment response and comparing it with expected recovery patterns. One such approach has used rationally- and empirically-derived methods to analyze data from the OQ-45 and identify patients who are not responding as expected to treatment. Treatment is then adjusted, improving outcomes and lowering overall costs. Similar but less extensive research has shown analogous methods can be used with children and adolescents. This would be particularly useful in residential treatment, which is an expensive and inadequately researched approach. This study used archival data gathered according to routine clinical procedures to compare the accuracy of a rationally-derived method (RDM) and an empirically-derived method (EDM) in predicting treatment failure on the basis of YOQ scores for 812 children and adolescents in residential treatment. Both methods were found to predict treatment failure more accurately than would be expected by chance. Performance of the methods was roughly similar to the observations of previous OQ-45 and YOQ studies. The RDM generated more indiscriminate predictions of treatment failure, earlier in treatment, while the EDM was
more selective in its predictions, which typically occurred slightly later in treatment. Overall, the EDM was most accurate, but it is recommended that joint use of the methods as a two-stage warning system be considered, as this would maximize the strengths and minimize the weaknesses of each method. The use of data from YOQs completed by various respondents was also examined. Combined data from multiple respondents, such as parents, clinicians, and house counselors, generated outcome predictions that were just as accurate as those made using single-respondent data. For this sample, it was found that self-report YOQ data evaluated independently of other-report data also generated usefully accurate outcome predictions. Finally, inclusion of participants with only 2 YOQs available for analysis was found to significantly inflate method accuracy. It is recommended that future studies use only participants with 3 or more YOQs. Omission of the final YOQ from the prediction pool may also be advisable, although this did not significantly alter method accuracy in the current study.
Statement of the Problem

Ethical and financial considerations are placing increasing pressure on mental health professionals to provide concrete evidence for the usefulness of their services. This move toward evidence-based practice has created increased research interest in the identification of empirically supported treatments. These attempts to evaluate certain psychotherapeutic methods or theoretical orientations typically use research designs that are structured to evaluate the efficacy or effectiveness of the treatment approach in question. Efficacy and effectiveness research methods measure how well mental health services can and do work in general, but do not provide information about individual treatment response. Quantification of treatment progress and outcome at the level of the individual patient is essential information for clinicians, the patients they serve, and the insurance companies and HMOs that are increasingly relied upon for reimbursement for services. Patient-focused research methods have thus evolved to provide this information by tracking individual treatment response in a way that is easily monitored by clinicians, case managers, and researchers. Quality management programs that utilize patient-focused research methodology compare individual responses to treatment with expected recovery patterns. This allows for the identification of patients who are responding better than expected and who may be appropriate for termination, as well as the identification of patients who are not responding as expected to treatment and who may deteriorate or prematurely terminate therapy. This is particularly useful, as clinicians have been found to be poor at identifying patients who are likely to have negative treatment outcomes. The use of
questionnaires and statistical methods of prediction to identify idiosyncratic response patterns creates the opportunity for adjustment of treatment, which is likely to result in improved outcomes and lower costs. This is especially desirable in the case of residential treatment centers (RTCs) for children and adolescents, which often receive government funding to provide services for troubled youth. In order to optimize their use of this funding, RTCs would likely benefit from a system that identifies the residents at greatest risk of not responding to the program, worsening, or leaving prematurely, so that those residents can be given extra support or more intensive treatment, and that, conversely, identifies as early as possible residents who have improved adequately and no longer need services. While several quality management programs have been developed in the United States and Europe, application of these methods to this population is lagging, in part due to the historically somewhat anemic body of research on this patient population. Research is needed to evaluate the accuracy and utility of the adaptation of successful quality management programs for adult outpatient therapy for use with children and adolescents in residential treatment. The purpose of this study is to evaluate and compare the accuracy of two methods for using the information gathered by one such program in an RTC population to identify those patients who are at risk for negative treatment outcomes. Specifically, the study will address the following main questions: 1.) How accurately do these methods predict treatment failure? 2.) Do the two methods differ in which residents are identified as at-risk? 3.) Does combination of the methods yield more accurate predictions than their individual use? 4.) Do the two methods differ in how early in treatment at-risk residents are identified?
Literature Review

The professional climate surrounding the provision of mental health care services has become increasingly dependent upon third-party reimbursement over the past few decades. Whereas clinicians once functioned primarily independently, being paid directly by their patients, many clinicians now work within the context of organizations, HMOs, or insurance companies (Inglehart, 1996). As Lambert (2001a) observed in his introduction to a special patient-focused research section of the Journal of Consulting and Clinical Psychology, "the power of economic forces to change practice patterns, theories of change, and even research agendas cannot be underestimated" (p. 148). This shift in payment structure has brought with it corporate demands for quality management and evidence-based practice, which have made it increasingly necessary for clinicians to both use empirically supported treatments and be able to make tangible (and therefore billable) the subjective change experienced by their patients (Barlow, 2000; Kelly, 1996; Lambert, Strupp, & Horowitz, 1997; Leibert, 2006; Whiston, 1996). The official American Psychological Association (APA) policy statement on evidence-based practice defines it as "the integration of the best available research with clinical expertise in the context of patient characteristics, culture, and preferences" (APA, 2006, p. 284). While identification of empirically supported treatments by traditional research methods such as efficacy and effectiveness research is primary within this definition, the inclusion of the role of clinical expertise in the definition of evidence-based practice acknowledges the reservations researchers have voiced regarding the gap between the growing list of empirically supported treatments and the degree to which these approaches are actually used by clinicians (e.g., Addis, Wade, & Hatgis, 1999;
Barlow, Levitt, & Bufka, 1999; Mussell et al., 2000; Persons, 1995; Ruscio & Holohan, 2006; Wolfe, 2006). Goldfried and Wolfe (1996) observed that, as the methodological rigor of psychotherapy research has increased to an extreme of mimicking a medical model with discrete disorders and treatments, it has become progressively more detached from the clinical reality of individual patients who, throughout the course of their treatment, present clinicians with very idiosyncratic dilemmas. Ruscio and Holohan (2006) suggest another reason for the gap between research and practice is the dissimilarity between the patients selected for inclusion in the randomized controlled trials common to efficacy research and the more complex patients seen in the typical clinical practice. This view is supported by a multidimensional meta-analysis of research on empirically supported treatments (ESTs), conducted by Westen and Morrison (2001), who found that approximately two-thirds of the individuals who sought to participate in the 34 EST studies they analyzed were excluded on the basis of such factors as comorbid anxiety, mood, or personality disorders, substance abuse, suicide risk, physical problems, or past psychotherapy. The more stringent the exclusion criteria of a study were, the more efficacious the treatment was shown to be.

Efficacy Research

Efficacy research is considered the "gold standard" for treatment outcome research and the identification of empirically supported treatments (Kendall, 1998; K. B. Wells, 1998) because it maximizes internal control of variables through methodology such as stringent inclusion/exclusion criteria, random assignment of participants to treatment and control groups, and use of manualized treatment protocols. It is thus
successful at isolating and demonstrating treatment effects; however, this results in what is known as the interpretability/generalizability dilemma (Goldfried & Wolfe, 1996), as maximization of internal control and thus interpretability (which is necessary for the isolation of treatment effects) comes at the cost of external validity, making the results less generalizable to other, less controlled conditions, such as those encountered in routine clinical practice (Nathan, Stuart, & Dolan, 2000).

Effectiveness Research

While efficacy research seeks to determine the degree to which a proposed treatment can work under ideal conditions, the goal of effectiveness research is to determine the degree to which a proposed treatment does work in routine clinical practice (Howard, Moras, Brill, Martinovich, & Lutz, 1996). In order to address this question, effectiveness research is conducted in real-world clinical settings, which often precludes use of the experimental controls efficacy research uses to increase internal validity, but allows for assessment of the ecological validity and generalizability of the conclusions of efficacy research. Effectiveness research, like efficacy research, focuses on treatments and the average group response to them (Howard et al.). Some researchers, such as Westen and Morrison (2001), have theorized that the addition of effectiveness research may fill the gaps created by the shortcomings of efficacy research. Lambert (2001b) has countered "in practice it is difficult to distinguish efficacy and effectiveness studies. In addition, there is little evidence to suggest that these designs produce a different picture of outcome" (p. 911). Goldfried and Wolfe (1996) expressed concern that "the conceptual and methodological constraints associated with outcome research may become clinical constraints for the practicing therapist" (p.
1007), and in response to this concern encouraged the development of "a new outcome research paradigm that involves an active collaboration between researcher and practicing clinician" (p. 1007). Lambert (2001b) further asserted neither efficacy nor effectiveness research "maximizes treatment effects to their full extent" (p. 912), and proposed that clinical outcomes would be most improved by the use of outcome management procedures in which each patient's response to treatment is measured on an ongoing basis, in order to provide clinicians with real-time feedback so they can make modifications in treatment as needed.

Patient-Focused Research

In a landmark article addressing the limitations of efficacy and effectiveness research, Howard et al. (1996) introduced a new research approach: patient-focused research. They observed that efficacy and effectiveness research leaves unanswered the most salient question facing clinicians, which is whether or not the individual patient is responding to treatment. Drawing from previous research on the dose-response and phase models of psychotherapeutic effectiveness (Howard, Kopta, Krause, & Orlinsky, 1986; and Howard, Lueger, Maling, & Martinovich, 1993, respectively), they suggested a method of patient profiling where data is continuously gathered regarding the actual progress of patients, then compared with a graph of the pattern of progress predicted for each patient, which is generated from existing data on clinical characteristics and expected treatment response. They described this method as making possible the early identification of patients who are not responding as expected, which could then inform decision-making regarding "the appropriateness of the current treatment and the need for further treatment" (p. 1063).
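The logic of patient profiling can be sketched in a few lines: observed scores are compared session by session against an expected recovery curve, and a score worse than expected by more than some tolerance band flags the patient as not on track. The curve shape, recovery rate, and tolerance below are invented for illustration; real systems of the kind reviewed in this chapter derive their expected curves and cutoffs from large clinical datasets.

```python
# Illustrative sketch of patient profiling: flag sessions where the observed
# symptom score is worse than the expected recovery curve by more than a
# tolerance band. All parameters here are invented for illustration.
import math

def expected_score(session: int, intake: float, recovery_rate: float = 0.35) -> float:
    """Predicted symptom score at a session: a decelerating, log-shaped
    recovery curve anchored at the intake score (higher score = worse)."""
    return intake - recovery_rate * intake * math.log(session) / math.log(20)

def signal_alarm(intake: float, observed: list[float], tolerance: float = 10.0) -> list[int]:
    """Return the session numbers at which the patient scores worse than
    expected by more than the tolerance band (a 'not on track' signal)."""
    alarms = []
    for session, score in enumerate(observed, start=1):
        if score > expected_score(session, intake) + tolerance:
            alarms.append(session)
    return alarms

# A patient whose symptoms spike mid-treatment triggers alarms at those sessions:
print(signal_alarm(intake=80.0, observed=[80, 78, 95, 97, 70]))
```

The alarm is deliberately one-sided here: only deviation in the worse-than-expected direction is flagged, mirroring the focus on predicting treatment failure.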
The dose-response model of psychotherapeutic effectiveness developed by Howard and colleagues (Howard et al., 1986) demonstrated what they described as a lawful linear relationship "between the log of the number of sessions and the normalized probability of patient improvement" (Howard et al., 1996, p. 1060). This log-normal relationship, known alternately as the dose-effect or dose-response relationship, "reflects that more and more efforts (e.g., sessions, trials, or milligrams of medication) are needed to produce incremental changes in the desired response" (Howard et al., 1996, p. 1060). Reflections on potential reasons for this dose-response relationship led Howard, Lueger, et al. (1993) to the development of the phase model of psychotherapy, which proposed three distinct phases of change during therapy: remoralization (improvement in subjective well-being), remediation (improvement of symptoms), and rehabilitation (improvement in life functioning through change of maladaptive behaviors). These phases were demonstrated by Howard, Lueger et al. (1993) to be "probabilistically, sequentially, and causally dependent" (Howard et al., 1996, p. 1061), meaning that remoralization leads to and is necessary for remediation, which leads to and is necessary for rehabilitation, which is the fundamental goal of treatment. Progression through these phases underlies the "increasing difficulty of achieving treatment goals over the course of psychotherapy" (Lutz, 2003, p. 747), which is shown in the decelerating curve of improvement predicted by the dosage model of psychotherapeutic effectiveness. The development of these models of treatment response, as well as concurrent advances in statistical techniques (e.g., hierarchical linear modeling; Lutz, Martinovich, & Howard, 1999), have made possible the monitoring of individual patients and the
discernment of clinically significant and reliable change which currently characterize patient-focused research. In more general terms, patient-focused research has come to be defined by several qualities. According to Sapyta, Riemer, and Bickman (2005), patient-focused research "attempts to improve practice by providing systematic information regarding individual patients' mental health status to clinicians" (p. 147). Lambert, Hansen, and Finch (2001) state that patient-focused research 1.) monitors the progress of individual patients over the course of therapy, 2.) provides feedback to interested parties such as therapists, supervisors, or case managers, and 3.) "attempts to answer the question, Is this particular treatment working for this patient?" (p. 159). Efficacy, effectiveness, and patient-focused research are complementary approaches and are similar in that they all require careful planning and have the ultimate goal of improving patient care and facilitating efficient use of therapy. They differ in the directions from which they approach this goal. Efficacy and effectiveness research, the mainstays of the evidence-based practice movement, characterize a more top-down approach in which stringently controlled studies are used to develop a body of more standardized and research-based treatments with the hope that this will make treatment more efficient or effective in general. Patient-focused research takes a more bottom-up approach to addressing quality management, by using patient-specific information, gathered by efficient means, to enhance the treatment outcome of individual patients. While patient-focused research can be carried out on a smaller scale to simply enhance psychotherapy, the majority of the patient-focused research reported on in journal articles is conducted as a part of outcome measurement or quality management
systems that have been developed and implemented in various settings. These systems represent "a win-win proposition for researchers, clinicians, health care organizations and patients who share the goal of optimal patient improvement as well as cost effective services" (Lambert, 2001a, p. 148).

Clinical Versus Statistical Prediction

It could be argued that the primary question targeted by patient-focused research, which is whether or not the current treatment is working for the individual patient, would be best answered by the clinician working with that patient. The comparative accuracy of clinical versus statistical prediction has been debated in the psychological literature since the 1920s (Grove & Lloyd, 2006). The subject of the debate is not the source of information, but rather the best method for drawing conclusions from a given set of data, regardless of the type or source of the data, a distinction first clarified by Meehl (1954/1996) in his seminal work on the topic and further elaborated on by Dawes, Faust, and Meehl (1989). Clinical prediction in this sense refers to the informal, flexible, subjective process by which expert judges combine data to arrive at specific conclusions or predictions. Statistical prediction, also sometimes called mechanical, actuarial, or algorithmic prediction, refers to a formal process by which a set equation or formula or other objective method is used to combine data and arrive at one of a set of predetermined conclusions or predictions based on the nature of that data (Grove & Meehl, 1996). According to two frequently cited reviews on the topic, the research is clear that statistical prediction achieves better overall rates of accuracy than clinical prediction (Grove, Zald, Lebow, Snitz, & Nelson, 2000), and has been clear since the mid-1950s
when Meehl (1954/1996) first published his book dissecting the intricacies of the issue. Even so, the topic is still the subject of research today. Meehl (1986) has postulated "there is no controversy in social science that shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one" (pp. 373-374). Given the weight of the evidence in favor of statistical prediction, surprisingly little effort has been made by the field of clinical psychology to incorporate statistical prediction into the routine decisions of clinical practice, as clinicians still intuitively favor their own predictions over those generated by an algorithm (Dawes et al., 1989; Hannan et al., 2005; Meehl, 1986). One of the main arguments in favor of clinical prediction has been labeled the "broken leg case" by Meehl (as cited in Grove & Lloyd, 2006, p. 193), in reference to an example of a man who goes to the movies once a week. Statistical methods would predict that the likelihood of his going to the movies during any given week would be quite high. The accuracy of this prediction depends on all factors not accounted for in the statistical formula remaining the same. If the man were to break his leg and be unable to sit comfortably in the theater, statistical methods would not take this new, idiographic information into account. A clinician assessing the case would be able to take this new data into consideration and thus would provide a more accurate prediction that, despite his previous habits, the man would likely not go to the movies for a few weeks. The researchers who build the algorithms which underlie statistical predictions cannot foresee all circumstances, and data cannot be gathered on everything, so these algorithms are unable to take extraneous circumstances, such as broken legs, into account (Grove & Lloyd, 2006). Meehl (1954/1996) has countered that while the predictions of
clinicians might be more accurate in these cases, they are by definition the exception rather than the rule, and statistical prediction is still more accurate overall. While clinicians can override statistical predictions in making real-world case decisions, Meehl cautioned against this unless the extraneous evidence is very strong, due to the higher accuracy rate of statistical methods. As explained by Grove and Lloyd (2006), some clinicians and even researchers view the distinction between clinical and statistical prediction as artificial, their reasoning being that the methods can be used to complement each other (e.g., Holt, 1970). Meehl (1986) has asserted this is an untenable position, as the two approaches can disagree with each other, sometimes quite often. In these situations it is not possible to act on the basis of both, and Meehl writes that it would be absurd to not act in accordance with the method that has been repeatedly shown to be more accurate. Not all researchers agree with the conclusions and views of Meehl (1954/1996) and Grove and Lloyd (2006). Holt (1970) takes issue with the way in which studies on this topic have been dichotomized in reviews as involving either clinical or statistical prediction, and views the differentiation between methods of data collection versus methods of data combination for prediction as artificial. Holt outlined six stages at which clinical judgment could be involved in the predictive process, and observed that in reviews such as that of Meehl (1954/1996), if the majority of the stages of a study each required "a high level of trained judgment" (Holt, 1970, p. 340) but in the final stage "the processed data are combined according to any uniform procedure, even when the rules are not actuarially derived" (p. 340), the study was classified as a statistical prediction study.
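The distinction at issue can be made concrete with a toy example. A statistical (mechanical) prediction, in the sense defined above, is a fixed rule: the same inputs always map to the same member of a predetermined set of conclusions, with no subjective weighing at the point of combination. The variables, weights, and cutoffs below are invented purely to illustrate the form such a rule takes; actual actuarial formulas are fit to outcome data.

```python
# Toy actuarial rule: a fixed formula combines patient data and maps the
# result onto one of a predetermined set of conclusions. All weights and
# cutoffs are invented for illustration only.
def actuarial_predict(intake_severity: float, prior_episodes: int,
                      social_support: float) -> str:
    risk = 0.04 * intake_severity + 0.30 * prior_episodes - 0.50 * social_support
    if risk >= 2.0:
        return "signal-alarm: high risk of treatment failure"
    if risk >= 1.0:
        return "moderate risk: monitor"
    return "on track"

# Mechanical prediction: identical inputs always yield the identical conclusion.
print(actuarial_predict(intake_severity=60, prior_episodes=2, social_support=1.5))
```

Meehl's "broken leg" objection corresponds to information the rule simply has no input for; the clinician can override the rule's output, but, as noted above, Meehl cautions that doing so should be rare given the rule's higher overall accuracy.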
Holt saw this classification method as artificially decreasing the number of good studies on clinical prediction, and artificially inflating the successes of statistical prediction. Holt (1970) also discussed common flaws in research on this topic, including the use of un-cross-validated formulas resulting in criterion contamination, the use of inadequate or premature measures of criteria, failure to distinguish between the predictions of unqualified judges and experienced clinicians, too few subjects to draw conclusions or support generalizations, and use of quantitative data for which clinical judgment is relatively inconsequential (such as making predictions about historical events or personal characteristics or future behaviors on the basis of MMPI profile data only). Holt observed that no studies on this topic had yet been conducted to which he would not have objections. Despite this dissatisfaction with the research methods of these studies, Holt also stated "I should not be at all surprised if statistical predictions [when studied correctly] were as good as the clinical ones, or possibly even better" (p. 341). Holt (1970) concluded his article by reaffirming the value of clinical psychology. His argument appears not to be that statistical prediction is necessarily any less accurate than portrayed by Meehl (1954/1996), but rather that the role of the clinician in statistical prediction was underemphasized. This view has been more recently asserted by Westen and Weinberger (2004), who stated "there is no substitute for clinical experience in generating hypotheses and devising clinically relevant items for use in research" (p. 599). Not all comparisons between clinical and statistical prediction are fair. Studies have examined the prediction of all sorts of things, some of which clinicians could be expected to have some experience with and expertise at predicting, such as suicide (e.g., Lemerond, 1977) or psychiatric diagnosis (e.g., Goldberg, 1965).
Other studies have
compared predictions in areas in which clinicians have little training or experience, such as successful completion of military training (e.g., Bobbitt & Newman, 1944) or occupational choice (e.g., Webb, Hultgen, & Craddick, 1975). If what is being predicted is outside the realm of the clinician's expertise (or the clinical predictor is not even a clinician but rather a layperson or college student), then it can hardly be said it is a fair test of clinical prediction. As pointed out by Westen and Weinberger (2004), clinicians could be expected to do best at using available data to answer questions they routinely face, such as aspects of client personality or symptoms. While the question of treatment outcome might initially seem like an issue clinicians would deal with regularly, and therefore could be expected to have expertise in predicting, this may actually not be the case. Hannan et al. (2005) found that in a sample of 550 patients, only 3 were predicted by their clinicians to deteriorate over the course of treatment, and fewer than 50 were predicted to have no change. For their sample, 40 patients deteriorated and nearly 300 experienced no change. More than half of the patients predicted by their therapists to have positive treatment outcomes ended therapy with no significant change. Hannan et al. concluded "therapists tend to over-predict improvement and fail to recognize clients who worsen during therapy" (p. 161). It may be that in order to ethically provide treatment, clinicians must honestly believe the treatment they provide is either helpful to the patient or at least very likely to be helpful. This necessary bias would prevent clinicians from being able to clearly and accurately predict treatment failure, as evidenced by the work of Hannan et al., who concluded that therapists "simply did not anticipate negative treatment outcomes" (p. 161).
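The Hannan et al. (2005) figures quoted above place a hard ceiling on clinician sensitivity to deterioration, even under the most charitable assumption that every one of the three deterioration predictions was correct:

```python
# Best-case clinician sensitivity implied by the Hannan et al. (2005) sample:
# 3 patients predicted to deteriorate, 40 who actually did. Even if every
# prediction was a true positive, at most 3 of the 40 deteriorators were caught.
predicted_deteriorate = 3
actual_deteriorated = 40
best_case_sensitivity = predicted_deteriorate / actual_deteriorated
print(f"best-case sensitivity: {best_case_sensitivity:.1%}")  # 7.5%
```

A statistical method need only flag a modest fraction of true deteriorators to outperform this ceiling, which is part of the motivation for the algorithmic signal-alarm methods this study compares.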