On December 10, the PEER Committee released a report titled “The Early Learning Collaborative Act of 2013: Evaluation of the Operations and Effectiveness of the Program” about Mississippi’s new state-funded preK collaboratives. This report purported to evaluate several aspects of the implementation of the preK law by the Mississippi Department of Education and to provide a performance evaluation of the effectiveness of the first year of the program.
Mississippi First is a strong advocate for preK and was heavily involved in the passage of the Early Learning Collaborative Act of 2013, which established Mississippi’s state-funded preK program. Since that time, we have been providing assistance to the Mississippi Department of Education and the 11 funded collaboratives to support quality implementation. For these reasons, the PEER report was of great interest to us. Mississippi First supports research-based decision-making and performance evaluation. We believe in being honest and transparent about data and letting it drive a cycle of continuous improvement. Research tells us that preK, when done right, works. Our goal with our work on Mississippi’s preK program is to ensure the highest-quality implementation of the highest-quality standards.
For over two decades, PEER has built a reputation for fairness in the highly politicized environment of state government. As a result, we take the findings of any PEER report seriously and carefully analyze them to inform our work. Upon reading the preK report, however, our interest quickly turned to dismay and then bafflement as to how a report with analyses so lacking in scientific validity could have been released under PEER’s name. Contrary to PEER’s reputation, this report makes so little sense that no useful or reliable information can be gleaned from it about either program effectiveness or implementation quality.
The problems underlying both the methodology and the conclusions of this report are myriad. In the interests of brevity, we will focus only on PEER’s analysis purporting to show that collaboratives underperformed other public school preK classrooms by 6% and that this underperformance was statistically significant—large enough statistically to be a true finding, rather than the result of random variation.
PEER’s conclusion that collaboratives underperformed non-collaboratives is based on a statistical analysis (although calling it an analysis is generous) PEER performed using scores from the 2014-2015 Mississippi K-3 Assessment Support System (MKAS2). MKAS2 is an assessment that serves as both a Kindergarten-entry and a third-grade-exit measure of literacy skills. Four-year-olds in all public school preK classrooms and collaboratives took the MKAS2 in Fall 2014 and in Spring 2015. If a child only took the test once, PEER excluded that child from the analysis. PEER then established two groups for comparison: 1) all the students in collaboratives and 2) all the students in non-collaborative public school preK classrooms. To compare these groups, PEER devised what they call an “adjusted-pass-rate” for each group (see page 24 of the report as well as the Technical Appendix beginning on page 37). Four-year-olds who reach a benchmark of 498 on the MKAS2 in the spring of their four-year-old year are considered “on-track” to be Kindergarten ready. PEER’s “adjusted-pass-rate” is the difference between the percentage of children who met the 498 preK benchmark in Spring 2015 and the percentage who had already met that benchmark when they entered preK. In other words, this number is the percentage of preK students who were below the benchmark at the beginning of preK but were able to meet or exceed the benchmark at the end of preK, and it excludes all the children in both groups who met or exceeded the benchmark both times. PEER compared these two “adjusted-pass-rates” (percentages of children who changed from not met to met) and concluded that 6% more children in non-collaborative classes changed from not met to met than children in collaborative classes. It is important to note that this 6% difference is not a 6-percentage point difference; PEER does not tell us what the actual percentage point difference was. Finally, PEER states that this 6% finding is statistically significant.
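To make the distinction between a 6% difference and a 6-percentage point difference concrete, here is a minimal sketch in Python. Every count below is invented for illustration, since PEER published no underlying numbers. The function name `adjusted_pass_rate` is ours, and we assume the 6% is measured relative to the non-collaborative rate, which the report does not confirm.

```python
def adjusted_pass_rate(n_students, met_fall, met_spring):
    """Share meeting the benchmark in spring minus the share meeting it in fall.

    This equals the share who moved from "not met" to "met" only if no child
    slipped from "met" back to "not met" (an assumption of this sketch).
    """
    return (met_spring - met_fall) / n_students

# Hypothetical groups of 1,000 children each; NOT PEER's data.
collab = adjusted_pass_rate(n_students=1000, met_fall=200, met_spring=670)
non_collab = adjusted_pass_rate(n_students=1000, met_fall=200, met_spring=700)

# Absolute gap, in percentage points.
point_gap = (non_collab - collab) * 100
# Relative gap, in percent of the non-collaborative rate (assumed baseline).
relative_gap = (non_collab - collab) / non_collab * 100

print(f"collaborative adjusted pass rate:     {collab:.0%}")
print(f"non-collaborative adjusted pass rate: {non_collab:.0%}")
print(f"gap in percentage points: {point_gap:.1f}")
print(f"gap in relative percent:  {relative_gap:.1f}")
```

With these made-up counts, a relative gap of 6% corresponds to a gap of only 3 percentage points. A different pair of starting rates would produce a different point gap for the same 6%, which is why the missing summary tables matter.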
PEER did not provide any summary data tables to support these claims and stated that the federal Family Educational Rights and Privacy Act (FERPA) prevented it from providing even basic information about the data used in its analysis (“Regarding this appendix, much of the detail that would normally be present in a technical appendix to a report of this nature has been omitted in order to comply with requirements of the Family Educational Rights and Privacy Act regarding personally identifiable information” (page 37)).
Critique of PEER’s Analysis
From a layman’s perspective, this methodology might not sound that bad, but to the trained eye, it essentially amounts to fun with numbers. We highlight below the two most fundamental problems we see in the methodology.
- Not an Apples-to-Apples Comparison—One of the most fundamental rules in statistical analysis is that researchers must ensure that two groups being compared are statistically indistinguishable in any aspect that might affect the outcome of what is being studied. Researchers often go to great lengths to ensure this is the case, whether through randomization or by using statistical controls. A credible study reports how researchers ensured that two comparison groups are equivalent and provides summary statistics of the two groups to prove this point. In this case, PEER needed to ensure that the four-year-olds in both groups—those in the collaboratives and those in the comparison non-collaboratives—were similar in any respect that might influence the outcome. At the very least, PEER needed to provide a detailed defense of why their methodology eliminates the need for the two groups to be similar by providing summary statistics about the two groups and then showing that the results are not impacted by any variations in the student population. They also needed to show that excluding children with only one test score did not bias the results for either group. Not only does PEER not do any of this, they state that any preparation or analysis of the data set beyond eliminating students with only one test score was unnecessary (page 39). Frankly, this stance is without scientific justification.
- In footnote 13, PEER does say that they tried to estimate what the poverty rates would be for the two groups by looking at Census data in order to assure that the two programs did not “serve socioeconomically distinct populations” (footnote 13 on page 39). This method of ensuring that the two groups were the same in terms of poverty is almost comically bad. If you investigate the Census data that PEER says it relies on, you find that the most specific data PEER could have used are the Small Area Income and Poverty Estimates, which provide an estimate of poverty for children ages 5-17 in a school district (this would include all children living in the confines of the district, not just ones who actually attend public schools). The Census describes in detail that the room for error in these estimates for small school districts is extremely large: the error for a school district with a population of less than 2,500 (which describes many districts in Mississippi) is greater than 100%, meaning that the true poverty percentage could be 100% greater or smaller than the estimate. Take Coahoma County School District, for example, which the Census estimates has a poverty rate of 49.7% for children ages 5-17. An error rate of greater than 100% means that the actual poverty rate for children in Coahoma County may be 100% or may be 0%. For school districts with a child population between 2,500 and 5,000, the error rate is 69%. PEER took these extremely imprecise estimates and then created a weighted average for the collaboratives since many of them span multiple school districts. Keep in mind that preK serves four-year-old children, who are not even included in these estimates, and only a fraction of four-year-olds are served by any program.
Even if we could be far more confident that these Census estimates were precise, there is still the issue of bias surrounding which children actually make it into the limited number of public preK spots in any given district, not to mention the fact that the children served in collaborative classrooms (which include Head Start classrooms and licensed childcare classrooms) could very well be more impoverished than students in public school district classrooms. The long and the short of it is that there is so much room for error in this method that we wonder why PEER attempted it at all. The only credible way to assess whether the two populations are truly similar in terms of poverty is to request student-level data from the Mississippi Department of Education and then to calculate the average poverty rate of each group.
- Method of Comparison Obscures True Effectiveness—PEER’s use of the percent of students who changed from not met to met obscures how much each group of students had to improve in order to meet the benchmark. Imagine for example that 50% of students in Program A changed from not met to met while only 10% of students in Program B did. While Program A might look more effective on the surface, if we also knew that students in Program A started much, much closer to the benchmark and then barely stepped over it, on average, whereas students in Program B started extremely far from the benchmark and just barely missed it, on average, our view of the effectiveness of the two programs would change. Since PEER does not provide any summary statistics about the initial or final average scale scores of both groups, we have no way of assessing whether this method distorts the reality of the situation. PEER argues that it did not use gain scores (the amount each group grew, on average, in terms of scale scores on MKAS2) because they could not be confident in the “interval data properties” of the MKAS2 (page 26). This means, for example, that PEER claims it did not know whether a 10-point change on MKAS2 from a 400 to a 410 is the same change as a 490 to a 500. We do not know why they did not ask either the Mississippi Department of Education or Renaissance Learning, the test vendor, for confirmation of a consistent data scale in order to use growth scores or even what their “theoretical doubts” about the assessment are.
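The Program A and Program B scenario can be sketched with a few invented scores. Nothing below comes from PEER’s data; the scale scores, the group sizes, and the helper names `crossed_rate` and `mean_gain` are all hypothetical. Only the 498 benchmark is taken from the report. The average gain is shown purely for contrast and presumes the scale behaves like an interval scale, which is exactly the property PEER says it doubts.

```python
BENCHMARK = 498  # spring preK benchmark cited in the report

# Hypothetical (fall, spring) scale scores for four children per program.
program_a = [(495, 499), (496, 500), (490, 493), (494, 498)]
program_b = [(400, 480), (410, 490), (420, 497), (430, 499)]

def crossed_rate(scores):
    """Share of children below the benchmark in fall who met it in spring."""
    crossed = sum(1 for fall, spring in scores if fall < BENCHMARK <= spring)
    return crossed / len(scores)

def mean_gain(scores):
    """Average spring-minus-fall change in scale score."""
    return sum(spring - fall for fall, spring in scores) / len(scores)

for name, scores in (("Program A", program_a), ("Program B", program_b)):
    print(f"{name}: crossed benchmark {crossed_rate(scores):.0%}, "
          f"average gain {mean_gain(scores):.2f} points")
```

In this toy data, Program A moves 75% of its children across the benchmark with an average gain under 4 points, while Program B moves only 25% across with an average gain above 76 points. A comparison based solely on the share who crossed the line would rank the programs in the opposite order from a comparison based on growth.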
These flaws make it impossible to ascertain whether students in collaboratives performed as well, better, or worse than similar students in non-collaborative public classrooms or to ascertain to what any differences in outcomes might be attributed. While PEER acknowledges that a better research design is needed, the acknowledgment does not prevent them from concluding, “The average performance of students in non-collaborative publicly funded prekindergarten programs was better than the average performance of students in collaborative prekindergarten programs by a statistically significant amount” (page 30).
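The size of the uncertainty in the Census poverty estimates discussed under the first critique can also be worked through in a few lines. This sketch treats the stated error as a relative half-width around the estimate, which is our reading for illustration rather than the Census Bureau’s formal definition. The function name `plausible_range` is ours. The 49.7% estimate and the 100% and 69% error figures are the ones cited above; applying 69% to the same estimate is purely hypothetical.

```python
def plausible_range(estimate_pct, relative_error_pct):
    """Range implied by a relative error, clipped to the valid 0-100% span."""
    half_width = estimate_pct * relative_error_pct / 100
    low = max(0.0, estimate_pct - half_width)
    high = min(100.0, estimate_pct + half_width)
    return low, high

# Coahoma County estimate cited above, with a 100% relative error.
low_small, high_small = plausible_range(49.7, 100)
# The same estimate with the 69% error cited for mid-sized districts
# (a hypothetical pairing, for comparison only).
low_mid, high_mid = plausible_range(49.7, 69)

print(f"100% error: {low_small:.1f}% to {high_small:.1f}%")
print(f" 69% error: {low_mid:.1f}% to {high_mid:.1f}%")
```

Under this reading, a 49.7% estimate with a 100% error is compatible with nearly any poverty rate at all, and even the 69% case leaves a range roughly 70 points wide. Averaging several such estimates across districts does not make them informative about the four-year-olds actually enrolled.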
Dubious FERPA Claims
In addition to these deep flaws, PEER’s use of FERPA as a shield against any outside review of its analysis is bizarre and absurd. FERPA protects against the disclosure of individual student data to the public. For PEER to have completed this analysis, it was not necessary to have student names or other personally identifiable information. We doubt PEER itself had access to any personally identifiable information; certainly PEER’s methodology did not even attempt to control for any student characteristics using student-level data. It is dangerous and disingenuous for any actor of state government to use FERPA to conceal the details of a questionable analysis of thousands of data points. One of the deepest held values in science is the belief in peer review and replicability: science is not considered credible if outside experts cannot review or replicate the results of a study or analysis using the same methodology, subjects, or data set. PEER’s reluctance to share the details of their work and data sets leaves us to wonder whether PEER already knows that it would not withstand scrutiny by other professionals.
National Experts Join Critique
Despite a complete lack of transparency in its work, PEER states in its response to MDE that “Regarding MDE’s concerns about ‘technical issues,’ PEER is confident that its findings would stand up in the court of expert judgment without additional defense” (page 54 of the report). We decided to test that assertion by sending the report to several national researchers from across the ideological spectrum. These researchers confirmed our concerns about the methodology, and the National Institute for Early Education Research (NIEER) was kind enough to write a two-page summary of the major flaws of the report, even though we asked for a very quick turnaround. (Other researchers told us that they would need more time to craft a formal response before they could comment on the record, but we encourage reporters to reach out to credible, trained statisticians and ask them to analyze the validity of the methodology.)
Report is Premature by Program Evaluation Standards and by Law
We have spent quite some time in this office pondering the “why” of this report. Why risk PEER’s reputation with such an objectively bad analysis? Why do it with preK? Why now? PEER writes in its report that it was required to create this performance evaluation report, despite the passage of only a year of program implementation. What the law actually states is that the Mississippi Department of Education must provide an annual report to the Governor and the Legislature and that “the PEER Committee shall review those reports and other program data and submit an independent evaluation of program operation and effectiveness to the Legislature and the Governor on or before October 1 of the calendar year before the beginning of the next phased-in period of funding” [emphasis added] (Miss. Code Ann. § 37-21-51(3)(g)). The law is very clear as to what a funding “phase” is: “Each phase shall last for at least three (3) years but no more than five (5) years. The State Department of Education shall determine when to move to a new phase of the program, within the timeline provided herein” (Miss. Code Ann. § 37-21-51(3)(h)(3)(ii)). (We know that PEER knows this because they discuss it on pages 7 and 8 of their report.) In a report that expends pages of ink on what the Mississippi Department of Education’s duty is under the law, we have no idea why PEER so gravely misinterpreted its own duty in the law. Programs need at least three years, if not five, to find their footing before formal performance evaluation will yield useful information. PEER’s evaluation of this program after only one year is not only bad practice but also in no way legally required.
Again, we have to ask why PEER felt that this ill-conceived report was worthy of publication. We have no definitive answers to this question and hesitate to speculate publicly. We hope that whatever gave rise to this embarrassing effort is a momentary lapse and not the start of a new trend at PEER. Mississippians need and deserve an impartial government watchdog staffed with professionals competent in performance evaluation. While we are sure that such professionals exist at PEER, we are sad that they did not prevent this report from being published in this form or at this time.
Note that the Census specifically states that you cannot use these data to determine a true “rate” for poverty because the data sets that are needed for the numerator and denominator refer to slightly different sets of children. We can only assume that PEER disregarded this note, so we have done so here for illustrative purposes.