Failing the Test

by Fred Smith, retired administrative analyst with the New York City public school system, with Robin Jacobowitz, Director of Education Projects at the Benjamin Center

It’s that time of year again.

This week, approximately 1.2 million children in grades 3-8 sat for the annual New York State tests in English Language Arts (ELA). Math exams will be given in early May.

The State Education Department (SED) has been testing students in reading and math for decades. In 2011, NCS Pearson, Inc. was awarded a five-year contract to develop exams aligned with the Common Core; Pearson received $38.8 million for its work. SED began administering the Common Core-aligned tests in 2013.

From the outset, some parents and educators questioned the value and impact of Common Core-based testing. Parents and teaching professionals were concerned about the ambiguity and inappropriateness of the questions, the length of the assessment, the frustrating experiences English Language Learners and students with disabilities had with the exams, and the lack of transparency that thwarted scrutiny of the testing program. There was particular concern about the developmental appropriateness of the reading passages and items used to assess eight- and nine-year-old students in grades 3 and 4.

Initially, officials dismissed these complaints as unfounded, the scattered griping of overprotective parents or a sign of low expectations for children. Eventually, though, the Education Department made some adjustments to its program: it shortened tests by one or two questions and removed time limits, and this year testing will take place over four days instead of six.

Still, after several years of implementation, it is fair to investigate the quality of this ongoing program, which targets more than one million students each year and costs taxpayers millions of dollars. Student performance on these instruments is widely reported and commented on. We need to flip the accountability question and now ask, “How did the tests perform?”

In a report produced in partnership with the Benjamin Center, Fred Smith, retired administrative analyst with the New York City public school system, explores the efficacy of the testing program by examining the constructed response questions (CRQs) on the ELA tests, with a focus on zero scores. See the entire series, which delves into multiple aspects of the report, ranging from how the tests fail minority students to how they harm the youngest children.

A BenCen Series: New York State’s Testing Failure

Your taxes are paying for a deeply flawed testing system. This series looks at those flaws — and at fixes.

Constructed response questions require students to write answers that can earn from zero to two (0-2) or zero to four (0-4) points, as judged by trained scorers.

Why zeroes? Because a zero on a CRQ reflects a student’s complete inability to cope with the test material. According to the test scoring rubrics, a zero is given to an answer that is “totally inaccurate,” “unintelligible,” or “indecipherable.” Keep in mind that a student can get a score of one (1) for partial answers and even incomplete sentences. A zero (0), then, represents an irrelevant or incomprehensible answer. We studied how the percentage of zero scores changed with the advent of Common Core-aligned assessments.

A snapshot of our key findings shows:

A steep increase in the percentage of students receiving zeroes on the CRQs in 2013, when the Common Core (CC)-aligned tests debuted

Particularly sharp increases, sustained over time, in the percentage of zeroes for students in Grades 3 and 4 and for English Language Learners and students with disabilities

Higher zero scores for Black and Hispanic students compared to White and Asian students

Let’s look at data for grades 3-8 from 2012 to 2016, which we obtained from SED after a protracted Freedom of Information Law (FOIL) process and, more readily, from the New York City Department of Education.

Figure 1 shows the average percentage of zero scores New York State students received on the CRQs.

Sharp increases in zeroes occur in all grades from 2012 (pre-Common Core alignment) to 2013 (Common Core alignment). In grade 3, zeroes jump from 11 percent in 2012 to 18 percent in 2013, and the increase is sustained in 2014-2016, reaching 22 percent in 2016. The steepest rise occurs in grade 4, where zeroes go from 5 percent in 2012 to 16 percent in 2015; this increase, too, is generally sustained through 2016.

The percentage of zeroes fluctuates for grades 5-7 and declines gradually for grade 8 from the initial 2013 surge. But for all grades in all years, the percentage of zero scores post-Common Core alignment (2013-2016) remains well above the 2012 percentage.
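The grade 3 and grade 4 percentages quoted above can be compared in two ways: as percentage-point changes and as ratios to the 2012 rate. The short Python sketch below does both. It uses only the figures given in the text; the `describe` helper is a hypothetical name introduced for illustration.

```python
# Statewide percentages of CRQ zero scores quoted in the text (Figure 1).
CHANGES = {
    "Grade 3, 2012 to 2013": (11, 18),
    "Grade 3, 2012 to 2016": (11, 22),
    "Grade 4, 2012 to 2015": (5, 16),
}


def describe(before, after):
    """Return (percentage-point change, ratio of after to before, to one decimal)."""
    return after - before, round(after / before, 1)


for label, (before, after) in CHANGES.items():
    points, ratio = describe(before, after)
    print(f"{label}: +{points} percentage points, {ratio}x the 2012 rate")
```

Measured in percentage points, grade 3 (2012 to 2016) and grade 4 (2012 to 2015) both rise by 11 points; grade 4's rise is the steepest in relative terms, since its rate more than tripled while grade 3's doubled.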

In order to understand the magnitude of these percentages it is helpful to see them in terms of the number of children they represent, not just as statistical abstractions. In 2013, the number of students receiving at least one zero score jumped to 168,802. That’s 168,802 children that year, aged 8 to 13, who were so befuddled by a question that their responses were “inaccurate, unintelligible, incoherent” — more than enough children to fill the seats in Madison Square Garden eight times!

New York City’s test population includes approximately 440,000 students. Citywide data allowed us to break down the results by subgroup, English Language Learners (ELLs) and students with disabilities (SWDs), and by race. Table 2 shows the average number of zero scores that students in each group received, by grade level. For every child, we tallied the number of CRQs that received a zero, then took the mean across the group.
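The tally described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors’ actual procedure: the record layout, the function names, and the scores are all hypothetical, invented only to show the computation. Following the authors’ distinction between zeroes and blanks (omissions), a blank is represented here as `None` and is not counted as a zero.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student_id, group, CRQ scores on one exam).
# Invented for illustration only; these are NOT data from the report.
# None marks a blank (omitted) response, which is not a scored zero.
RECORDS = [
    ("s1", "ELL", [0, 1, 0, 2, 0, 1, 0, 2, 1, 0]),
    ("s2", "ELL", [1, 2, 0, 2, 1, 0, 2, 2, 1, 1]),
    ("s3", "SWD", [0, 0, 1, 0, 1, 0, 2, 1, 0, 1]),
    ("s4", "SWD", [2, 1, None, 2, 0, 1, 2, 2, 1, 1]),
]


def zero_count(scores):
    """Number of CRQs on which the student's response was scored zero."""
    return sum(1 for score in scores if score == 0)


def mean_zeroes_by_group(records):
    """Average number of zero scores per student, for each group."""
    counts = defaultdict(list)
    for _student_id, group, scores in records:
        counts[group].append(zero_count(scores))
    return {group: mean(group_counts) for group, group_counts in counts.items()}


print(mean_zeroes_by_group(RECORDS))
```

Counting per child first and then averaging keeps each student’s result on the same scale as Table 2: an average number of zeroes per exam, not a share of all responses.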

As New York City public school students make up 37 percent of the state’s test population, it is not surprising that these results parallel those of the state. In Table 2, we see a sweeping increase in the average number of zeroes from 2012 to 2013. But let’s take a closer look at the subgroups. In grade 4, for example, the average rose from 1.0 to 3.3 zeroes for ELLs and from 1.1 to 3.4 for students with disabilities. The averages for Black and Hispanic students went from 0.4 to 1.9 and from 0.5 to 2.0, respectively, an increase of 1.5 in both groups. The averages for their White and Asian counterparts rose from 0.2 and 0.3, respectively, to 1.0.

These averages may seem small, but remember — there were only nine or ten CRQs per exam. In that context, three zeroes means students could not respond intelligibly to at least 30 percent of the questions.
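The “at least 30 percent” figure follows from dividing three zeroes by the longest exam length, ten CRQs; on a nine-question exam, the same three zeroes are a third of the questions. A quick Python check, using a hypothetical helper name:

```python
def zero_share(zeroes, crqs):
    """Fraction of an exam's CRQs that received a zero."""
    return zeroes / crqs


for crqs in (9, 10):
    print(f"3 zeroes on a {crqs}-CRQ exam: {zero_share(3, crqs):.1%} of the questions")

# The smallest share across the possible exam lengths is the floor quoted in the text.
floor = min(zero_share(3, crqs) for crqs in (9, 10))
```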

Let’s take a closer look at the achievement gap, particularly for students in grades 3 and 4, who struggled the most with these exams.

Grade 3: In 2012, there was a difference of .5 zeroes (1.2 minus .7) between Black and White 3rd graders and a difference of .6 (1.3 minus .7) between Hispanic and White children. After the 2015 ELA, the differences were 1.0 (2.2 minus 1.2) and 1.1 (2.3 minus 1.2), respectively. So, after the third administration of the CC-aligned ELA exam, the gap in the average number of zero scores had grown by .5 zeroes for both groups.

Grade 4: Similarly, in 2012, the Black/White gap was .2 zeroes and the Hispanic/White gap was .3. By 2015, the gap between each group and White students was .9, having widened by .7 and .6 zeroes, respectively.
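The gap arithmetic in the grade 3 and grade 4 paragraphs can be reproduced with a short Python sketch. It uses only the averages and gaps quoted in the text; the function names are hypothetical, and results are rounded to two decimals to suppress floating-point noise.

```python
# Grade 3 averages (mean zero scores per student) quoted in the text.
GRADE3 = {
    2012: {"Black": 1.2, "Hispanic": 1.3, "White": 0.7},
    2015: {"Black": 2.2, "Hispanic": 2.3, "White": 1.2},
}

# Grade 4 gaps relative to White students, as (2012 gap, 2015 gap), quoted in the text.
GRADE4_GAPS = {"Black": (0.2, 0.9), "Hispanic": (0.3, 0.9)}


def gap(averages, group, reference="White"):
    """Difference in mean zero scores between a group and the reference group."""
    return round(averages[group] - averages[reference], 2)


def gap_change(by_year, group, start, end, reference="White"):
    """Change in the gap between two years (positive means the gap widened)."""
    return round(gap(by_year[end], group, reference) - gap(by_year[start], group, reference), 2)


for group in ("Black", "Hispanic"):
    print(f"Grade 3, {group}: {gap(GRADE3[2012], group)} -> {gap(GRADE3[2015], group)}, "
          f"widened by {gap_change(GRADE3, group, 2012, 2015)}")

for group, (gap_2012, gap_2015) in GRADE4_GAPS.items():
    print(f"Grade 4, {group}: {gap_2012} -> {gap_2015}, widened by {round(gap_2015 - gap_2012, 2)}")
```

Both grade 3 gaps widen by .5 zeroes, and the grade 4 gaps by .7 and .6, matching the figures above.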

Where does this leave us?

The data show an increase in the percentage of zero scores since the introduction of exams aligned with the Common Core. We anticipate that officials will attribute this outcome to tougher standards reflected in more rigorous exams.

We argue that such assertions do not sufficiently explain what we found. Recall that a zero score indicates an unintelligible or incoherent answer. Certainly, some zeroes are to be expected. But the percentage of zero scores, particularly for students in grades 3 and 4, is unreasonable in our view. With so many answers deemed “incomprehensible, incoherent, or irrelevant,” we must ask whether such a program yielded any valuable information at all about our youngest students, as the testing was purported to do. The failure here much more likely lies in the questions themselves and in the belief that it was acceptable to ask eight- and nine-year-olds to sit for long exams over several days. That the data also indicate a widening achievement gap cannot be ignored.

Further evidence of flawed testing can be seen in the decline in zeroes for most grades in 2016, when SED removed time limits, relative to the 2013 surge. After three years of CC-aligned testing, SED acknowledged that the time constraints imposed by the tests were an issue. This, in itself, is an after-the-fact admission that the tests were poorly developed, since test administration procedures, including timing, should be resolved during test development, before tests become operational.

In taking stock of the testing program we must return to the fears and doubts that were expressed by a small number of people early on. Were New York State’s CC-aligned tests appropriate measures? Would they have a negative impact on students, especially the most vulnerable?

The analyses and findings in this report vindicate these early concerns and give empirical grounding to the opt-out movement that grew to an astounding 20 percent of the test population between 2013 and 2015. Specifically, our findings raise questions about the efficacy of this kind of testing, particularly for our youngest students. They also open a needed discussion about the quality of Pearson’s work, the worth of its product, and SED’s judgment in managing the program.

The federal Every Student Succeeds Act (ESSA) dictates that we test our young students in math and ELA each year. We have a responsibility to determine how to do that in a way that serves children and educational goals. New York State’s testing program did neither.

The complete report, which will be released in the coming weeks, will present additional analyses that reinforce the patterns we observed. We also found:

The number of omissions, where students left out or did not reach a question, increased on the Common Core-aligned tests

A high number and percentage of students received zeroes on five or more of the CRQs (roughly half of each exam); these results follow the demographic patterns described above

Specific items that stood out as particularly problematic based on the evidence we uncovered

4 Comments

readdoctor
April 14, 2018 at 9:45 pm

A zero score also indicates this is at the frustration level, that level in which no useful information can be discovered. If the assessment had some diagnostic value, the assessor would stop it there, and go down until they found a level at which the student could handle the material, and thus gather useful information.
However, there is no diagnostic use for these assessments, and if there is no diagnostic use for your assessment then you should not be giving it.

I would point my finger at our policy makers, the administrative leadership, and the legislators for the lack of efficacy. These are the people who should have protected our children. These are the people who should have asked the tough questions. These are the people who have also turned their heads every time Pearson made a campaign contribution or gave away some perk.
I also say our silence and apathy empower this lack of efficacy. We, too, hold some blame.
Opt Out,
Walk Out,
Stand up,
Speak up,
Do something!

Richard P. Phelps (@RichardPPhelps)
April 22, 2018 at 1:43 pm

“Because a zero on a CRQ reflects a student’s complete inability to cope with the test material. ”
Inability or, I would guess just as likely, unwillingness. A blank answer gets a zero, too, no?

- Robin Jacobowitz (Post author)
  April 23, 2018 at 4:12 pm
  
  Hi Richard. Thank you for your interest in our work. A zero is an actual score given to a response that is recorded in a test booklet. Scorers trained in the exam’s rubrics judge the character of the response and assign it a value, which can be zero. Zero scores are given to responses that are deemed to be “totally inaccurate, unintelligible, or indecipherable.” We believe that zero scores are a reflection of a student’s complete inability to cope with the test material.
  
  A blank is indeterminate. It shows up in the data as an empty cell. We don’t know whether the student was unwilling to answer, never reached the question, or inadvertently left out a response under duress. Blanks are a residual category that occurs far less frequently than zeroes. Like zeroes, they add no points to a student’s test results.
  
  We recognize blanks (we call them omissions) as a separate phenomenon from zero scores and attend to them in our forthcoming report. We would love to put you on our mailing list; is there a way that we can contact you?
  
  Thanks again for your interest in our work.
  
  - Richard Phelps
    July 26, 2018 at 6:36 pm
    
    Sorry, just now saw your response. Sure, I’d appreciate being on your mailing list. richard (at) nonpartisaneducation (dot) org
    
    Thanks, RP

The BenCen Blog

Informing Public Discourse in the Hudson Valley and Across the State
