
Understanding Cut Scores: How Educators Can Use a Cut-Score to Predict Student Success

  • Writer: ConnectedMTSS
  • Sep 25
  • 7 min read

It’s been an exciting start to the year. Since last year, literacy practices and science of reading frameworks have continued to develop in many districts. States continue to establish laws to guarantee that high-quality reading instruction is in place now or very soon. When it comes to student performance, however, there has also been some not-so-great news: NAEP scores were lower in math, reading, and science for older students (the eighth- and twelfth-grade students tested). We will have to see how the next round of NAEP scores turns out.


Many of us are also working with newly normed assessments released this year. The previous norms were collected before COVID, and some of us were dealing with comparisons that seemed a bit steep for our students. Now I hear more about how every student looks average or above average, a concern we raised last year. Many more students scored in the average range this fall, a sharp increase from years past. Why the difference in the new screening results?


Cut Scores

With many assessments administered in schools today, one of the most important tools for guiding instruction and intervention is the cut score—a statistical point on a test scale that helps predict whether a student is likely to meet proficiency standards. These scores are essential for identifying students who may need additional support and for making informed decisions about curriculum and instruction. But how are these cut scores determined?


Researchers and assessment designers use several statistical methods to develop cut scores that are both accurate and meaningful. Four of the most widely recognized approaches are Discriminant Analysis (DA), Logistic Regression (LR), Receiver Operating Characteristic Curve Analysis (ROC), and the Equipercentile Method (EM). Each method provides a distinct approach to interpreting student data and forecasting performance, usually on a validation measure such as an end-of-year state test. 


Discriminant Analysis (DA) works by classifying students—such as proficient or not proficient—based on probabilities. It aims to maximize correct classifications (true positives and true negatives), though some misclassification is inevitable. This method helps educators identify which students may benefit from more intensive instruction.
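
If it helps to see the mechanics, here is a minimal sketch in Python (scikit-learn) of deriving a DA-based cut score. The scores, sample size, and the 460-point proficiency line are made up for illustration; this is not the procedure any particular vendor uses.

```python
# Illustrative only: hypothetical fall screener scores and spring outcomes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
fall_score = rng.normal(450, 50, 500)                     # fall screener scale
proficient = (fall_score + rng.normal(0, 40, 500)) > 460  # spring state-test outcome

lda = LinearDiscriminantAnalysis().fit(fall_score.reshape(-1, 1), proficient)

# The cut score is (roughly) the screening score where the predicted class flips.
grid = np.arange(fall_score.min(), fall_score.max(), 0.5)
first_proficient = np.argmax(lda.predict(grid.reshape(-1, 1)))
print(f"DA-derived cut score: {grid[first_proficient]:.0f}")
```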

Logistic Regression (LR) is used when the outcome is binary (e.g., pass/fail). It predicts the likelihood that a student belongs to a particular group, focusing on identifying those at risk of scoring below proficiency. While effective, it can sometimes over-identify students as at-risk, leading to unnecessary interventions.
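
A comparable sketch with logistic regression, again on hypothetical data: solve for the score where the predicted probability of being at risk crosses 0.50. A lower probability threshold would flag more students, which is how over-identification creeps in.

```python
# Illustrative only: hypothetical data; the 0.50 probability threshold is a choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
fall_score = rng.normal(450, 50, 500)
at_risk = (fall_score + rng.normal(0, 40, 500)) < 460   # below proficiency in spring

lr = LogisticRegression().fit(fall_score.reshape(-1, 1), at_risk)

# logit(p) = b0 + b1 * score; p = 0.50 where the logit equals zero.
b0, b1 = lr.intercept_[0], lr.coef_[0][0]
print(f"LR-derived cut score: {-b0 / b1:.0f}")
```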

Receiver Operating Characteristic Curve Analysis (ROC) evaluates the accuracy of different cut scores by plotting sensitivity (the true positive rate) against 1 minus specificity (the false positive rate). This method allows educators to choose a score that offers the best balance between identifying at-risk students and minimizing false positives, often using a sensitivity threshold of 0.90.
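
Here is how the 0.90 sensitivity rule might look in code, again with hypothetical data. Because lower screening scores mean higher risk, the scores are sign-flipped before computing the curve.

```python
# Illustrative only: choosing the cut score that first reaches 0.90 sensitivity.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
fall_score = rng.normal(450, 50, 500)
at_risk = (fall_score + rng.normal(0, 40, 500)) < 460

fpr, tpr, thresholds = roc_curve(at_risk, -fall_score)  # flip: lower score = riskier
idx = np.argmax(tpr >= 0.90)                            # first threshold at 0.90 sensitivity
cut = -thresholds[idx]                                  # undo the sign flip
print(f"ROC cut score: {cut:.0f} "
      f"(sensitivity {tpr[idx]:.2f}, specificity {1 - fpr[idx]:.2f})")
```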

The Equipercentile Method (EM) compares scores from different assessments by aligning percentiles. This approach is especially useful in “linking studies,” where results from a fall screening test are matched with spring state assessments. EM enables educators to interpret scores across different tests and make consistent predictions about student performance.
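
And a sketch of equipercentile linking: find the percentile rank of the state proficiency score, then take the fall screener score at that same percentile. The two scales and the 700-point proficiency score are invented for the example.

```python
# Illustrative only: hypothetical fall screener and state-test scales.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
fall_score = rng.normal(450, 50, 500)        # fall screener scale
state_score = rng.normal(700, 30, 500)       # spring state-test scale
state_proficiency_score = 700                # hypothetical proficiency score

pct = stats.percentileofscore(state_score, state_proficiency_score)
linked_cut = np.percentile(fall_score, pct)
print(f"State cut sits at the {pct:.0f}th percentile -> fall cut of about {linked_cut:.0f}")
```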


Each of these methods plays a critical role in helping schools use assessment data to support student learning. However, considerable data and expertise are needed to develop cut scores that educators can use with confidence, accuracy, and efficiency. Each method relies on a validation measure, such as a state test, so that researchers can look back retrospectively at how well fall screeners predicted spring state-test performance.


Recently, states and districts have taken up efforts to improve literacy instruction, including evidence-based instruction and intervention. Concurrently, some test publishers have completed renorming studies of screening assessments that districts have used for several years. The new curricula often pose challenges for educators as they learn the content and routines. New norms have also provided challenges: students who are performing similarly to the previous year now appear to be performing closer to the average range. When districts use a percentile cut score, students who, for example, score below the 25th percentile are now few and far between. However, students continue to exhibit difficulties with decoding, fluency, or even comprehension. The previous cut scores are no longer sufficient to identify students in whom teachers previously detected reading difficulties.


Local vs. Publisher-Created Cut Scores: Why Context Matters

While test publishers often provide cut scores based on research and national norms, educators must consider whether these scores accurately reflect their local student population. Using vendor-provided scores can be convenient, but they may not always predict student outcomes effectively in every context. Experts recommend cross-validating these scores to ensure they truly identify students who are on track for proficiency or in need of intervention.
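
Cross-validating a vendor cut score does not have to be elaborate; a first pass is simply checking its sensitivity and specificity against your own spring results. The cut value and data below are hypothetical placeholders.

```python
# Illustrative only: checking a publisher's cut score against local outcomes.
import numpy as np

rng = np.random.default_rng(5)
fall_score = rng.normal(450, 50, 500)
below_proficient = (fall_score + rng.normal(0, 40, 500)) < 460   # spring state test

vendor_cut = 435                              # hypothetical publisher cut score
flagged = fall_score <= vendor_cut

sensitivity = (flagged & below_proficient).sum() / below_proficient.sum()
specificity = (~flagged & ~below_proficient).sum() / (~below_proficient).sum()
print(f"Vendor cut {vendor_cut}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```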


Some districts rely on normative scores, such as the 25th percentile to flag students at risk or the 50th percentile to indicate proficiency. These scores offer consistency and simplicity, especially when resources for local analysis are limited. However, locally developed cut scores—created using research-based methods—can offer greater precision. There is evidence that locally derived scores may better predict student performance on year-end state tests and maintain higher sensitivity and specificity over time (Nelson et al., 2017).


It’s also important to recognize that cut scores reduce continuous data into binary categories (e.g., proficient vs. not proficient), which can oversimplify student performance. A student scoring just above or below the threshold may be grouped with others whose needs differ significantly. Therefore, while cut scores are useful tools, educators should interpret them thoughtfully and consider the broader context of each student’s learning needs.


What would I do if I ruled the educational world?

How would I respond if asked about the problem where the 25th percentile was previously a threshold for identifying students who may need more intensive instruction (e.g., intervention), but the new norms identify far fewer students? Keep in mind that no cut score is ever perfect.

  1. Change from Percentile Cut Scores. Move from percentile-based cut scores to actual scores. Percentile ranks will change with new norms, and an arbitrary percentile rank already leads to a wide range of performance between grades: the 25th percentile in 1st grade will be different from the 25th in 3rd, and so on. (Not my idea; VanDerHeyden, pick a year.)

  2. Examine your data (how does screening compare to state test outcomes?): How many students typically receive intervention each year? Does your system have the capacity to continue serving that number of students? Do those students show improvement? Look at where the distribution in each grade shows a need for more instruction.

    • (CAUTION: Do not set a threshold below the state requirements for reading intervention. This is my “this is hot” label, as on fast-food coffee containers.)

    • What is the approximate score that students who need intervention earn, and what percentage of the grade is that? 

    • How did your students perform on the state test? How many were proficient? 

    • Select cut scores that net you a similar percentage of students above and below the cut as were above and below proficiency on last spring’s state test. Go back and analyze prior years so you have a fuller picture; often, this is relatively stable. (See the sketch after this list.)

  3. Select Cut Scores for EACH grade level. Use actual scores for each grade level. If you have a data system that can efficiently color-code the scores or bands, adopt something like this:

    • Above the cut: green

    • Below the cut: orange (not every student below the cut requires intervention)

      • Review patterns of performance or previous interventions (one score is not enough) 

    • Some districts align their performance bands to the state test performance bands using scores (advanced, accelerated, proficient, basic, below basic); color-code those bands if you do

    • I did not recommend running the four cut-score development methods yourselves: partner with a university for that. I’m not a researcher, but I contend you can get pretty close to meaningful cut scores with a slightly-better-than-back-of-the-envelope calculation.

  4. Road Test and ask for feedback: Try out the cut scores for a year, calculate the number of students identified, and check progress for students above and below the cut score.

    • Color Coding: If you color-code your data, you can more easily refine/change cut-scores, and people will not lose their minds. 

    • Ask for feedback! Ask your Lit Coaches, your School Psychologists, and your Gen Ed Teachers: How is this working?
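
Here is a sketch of the slightly-better-than-back-of-the-envelope calculation from steps 2 and 3: pick the fall score that flags about the same percentage of a grade as was below proficient last spring, then band the scores for color-coding. The column names and data are hypothetical; in practice you would load one grade level of your own screening and state-test data.

```python
# Illustrative only: one grade level of hypothetical screening/state-test data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "fall_score": rng.normal(450, 50, 400).round(),
    "proficient_spring": rng.random(400) > 0.35,     # ~35% below proficient last spring
})

pct_below = (~df["proficient_spring"]).mean() * 100  # % below proficient on state test
cut = df["fall_score"].quantile(pct_below / 100)     # fall score at that percentile

# Bands for a color-coded data system: green above the cut, orange at or below.
df["band"] = np.where(df["fall_score"] > cut, "green", "orange")

flagged = (df["band"] == "orange").mean() * 100
print(f"Cut score: {cut:.0f} | flags ~{flagged:.0f}% "
      f"(vs. {pct_below:.0f}% below proficient last spring)")
```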


As districts refine literacy instruction and adopt newly normed assessments, educators face challenges in accurately identifying students in need of support. Research-based methods for creating cut scores are highlighted above; if a school has the expertise and know-how, give one of them a shot.


For the rest of us mortals, consider not relying solely on percentile ranks, especially with shifting norms, and start to look for locally derived cut scores tailored to district-specific data and needs. By examining actual student performance and intervention outcomes, educators can set meaningful thresholds that better reflect their students’ needs. Ultimately, thoughtful use of cut scores—grounded in context and data—can enhance decision-making and improve educational outcomes.


Much of this was written for a lit review in a dissertation that maybe three people read, and one or two read well. With a bit of review and AI, establishing data-based decision rules continues to make sense and would help educators. A few years ago, I thought I had completed a fairly useless project and that there would be no need to apply any of the methods, since the world was embracing SoR and states were providing cut scores for districts. However, change is a constant, and now using vendor or state cut scores without applying district or school context leads to unreliable decision-making. I continue to say that if I ruled the educational world, I’d make it my goal to apply research-based methods to look at what “should” work and then, using those methods, look at how well what was done did “work.” There is much more to establishing data-based decision-making frameworks, but starting with a well-developed cut score for each grade could help schools or districts make reliable decisions about which students are on track or possibly off track.


 
 
 
