The Algebra Buster


May 24th










The Myth of Objectivity in Mathematics Assessment

THE ADVANCED PLACEMENT CALCULUS TEST


Advanced Placement calculus tests have been taken
by high school students for four decades. These tests
include multiple-choice items, the staple of standardized
tests, and a set of free-response questions for
which students must supply answers, show their
work, and explain their reasoning. This respected
measure of students’ knowledge of elementary calculus
is thus, in part, an alternative assessment.

The 1998 Advanced Placement Calculus AB test
contained the free-response question shown in figure 2.
Students’ solutions to free-response questions
such as this one are scored by at least two readers,
who follow an explicit set of guidelines for
assigning points and must agree on the score
assigned to each paper. The rubric used to score
this problem is shown in figure 3.

Consider the curve defined by $2y^3 + 6x^2y - 12x^2 + 6y = 1$.

a) Show that $\dfrac{dy}{dx} = \dfrac{4x - 2xy}{x^2 + y^2 + 1}$.

b) Write an equation of each horizontal tangent
line to the curve.

c) The line through the origin with slope –1 is
tangent to the curve at point P. Find the x- and
y-coordinates of point P.

Fig. 2
1998 Advanced Placement Calculus AB
free-response question 6
(Source: College Board)

In this scoring rubric, the nine points allocated
for this problem are assigned as follows: two points
for finding the derivative implicitly and verifying it,
four points for finding where the derivative has the
value of zero and verifying that the tangent lines
are horizontal there, and three points for using one
of two different specified approaches to find the
point of tangency of the line y = –x. Use this rubric
to score the work shown in figure 4.
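
As a worked check of part (a), the curve can be differentiated implicitly term by term, treating y as a function of x. The steps below are a sketch of one standard route and are not taken from the College Board's published solution:

```latex
\begin{align*}
\frac{d}{dx}\bigl(2y^3 + 6x^2y - 12x^2 + 6y\bigr) &= \frac{d}{dx}(1) \\
6y^2\frac{dy}{dx} + 12xy + 6x^2\frac{dy}{dx} - 24x + 6\frac{dy}{dx} &= 0 \\
\bigl(6x^2 + 6y^2 + 6\bigr)\frac{dy}{dx} &= 24x - 12xy \\
\frac{dy}{dx} &= \frac{4x - 2xy}{x^2 + y^2 + 1}
\end{align*}
```

The denominator is never zero, so the derivative is defined at every point of the curve.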

On part (a), the student’s correct implicit differentiation
would earn two points. Setting the derivative
equal to 0 and, after a false start, solving for x
and y would earn two more points for part (b).
Finally, in part (c), setting the derivative equal to
–1 is worth an additional point. The score for this
student would be five points out of a possible nine
points.
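
The tally just described can be written out as a small sketch. The dictionary names and the per-part encoding are illustrative assumptions; the point values themselves come from the rubric and from the scoring described above.

```python
# Maximum points available in each part of the question, per the rubric.
RUBRIC_POINTS = {"a": 2, "b": 4, "c": 3}

# Points the text awards to the sample student (hypothetical encoding).
STUDENT_POINTS = {
    "a": 2,  # implicit differentiation; verifies expression for dy/dx
    "b": 2,  # sets dy/dx = 0; solves dy/dx = 0
    "c": 1,  # sets dy/dx = -1
}


def total_score(earned, rubric):
    """Sum earned points, rejecting any part that exceeds the rubric maximum."""
    for part, points in earned.items():
        if points > rubric[part]:
            raise ValueError(f"part {part}: {points} exceeds maximum {rubric[part]}")
    return sum(earned.values())


score = total_score(STUDENT_POINTS, RUBRIC_POINTS)
maximum = sum(RUBRIC_POINTS.values())
print(f"{score} out of {maximum}")  # prints "5 out of 9"
```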

This example shows a consistent assessment
method. Unlike the previously discussed quadratic-equation
task, for which arguments could be made
for a wide range of scores, the Advanced Placement
calculus task itself, for which predictable routes to
the solution exist, combines with the rubric that
specifies the routes and assigns points, thereby
facilitating agreement on a single score.

How useful is this score? What does five points
out of nine mean on this task? How much of the calculus
that this task is meant to assess does this
student know? Will everyone who obtains a five-point
score on this problem know the same amount?
This student is clearly able to differentiate implicitly.
The student also seems to know that the derivative
is related to the slope of the tangent to the
curve at a point. Given the difficulty that this student
had in completing parts (b) and (c), any other
inferences about the student’s mathematical knowledge
would be difficult.

a) Show that $\dfrac{dy}{dx} = \dfrac{4x - 2xy}{x^2 + y^2 + 1}$.

   1: implicit differentiation
   1: verifies expression for dy/dx

b) Write an equation of each horizontal tangent line to the curve.

   1: sets dy/dx = 0
   1: solves dy/dx = 0
   1: uses solutions for x to find equations of horizontal tangent lines
   1: verifies which solutions for y yield equations of horizontal tangent lines

c) The line through the origin with slope –1 is tangent to the curve at point P. Find the x- and y-coordinates of point P.

   1: y = –x
   1: substitutes y = –x into equation of curve
   1: solves for x and y
   or
   1: sets dy/dx = –1
   1: substitutes y = –x into dy/dx
   1: solves for x and y

Fig. 3
Scoring rubric for Advanced Placement calculus free-response question
(Source: College Board)

Another student who earned the same score for
parts (a) and (b) could have earned three points for
part (c) by successfully completing the first of the
two solution strategies outlined in the rubric. However,
that strategy makes no use of calculus. Therefore,
a score of seven points out of nine could be
earned without furnishing any additional evidence
of understanding of calculus. To put these scores in
context, the average score of all 1998 Advanced
Placement Calculus AB test-takers on this item
was 2.86, and 80 percent of those test-takers scored
4 or lower (College Board).
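
The claim that the first strategy for part (c) needs no calculus can be checked directly. Substituting y = -x into the curve leaves a cubic in x alone, which factors as (2x + 1)^3 = 0. The sketch below, whose function names are illustrative and not from the source, confirms the resulting point with exact rational arithmetic. It also confirms separately that the derivative from part (a) equals -1 there.

```python
from fractions import Fraction


def curve(x, y):
    """Left side minus right side of 2y^3 + 6x^2*y - 12x^2 + 6y = 1."""
    return 2 * y**3 + 6 * x**2 * y - 12 * x**2 + 6 * y - 1


def slope(x, y):
    """dy/dx from part (a): (4x - 2xy) / (x^2 + y^2 + 1)."""
    return (4 * x - 2 * x * y) / (x**2 + y**2 + 1)


def on_line_residual(x):
    """Curve equation after substituting y = -x; equals -(2x + 1)^3."""
    return curve(x, -x)


# Strategy 1 (algebra only): the cubic (2x + 1)^3 = 0 has the single root x = -1/2.
x_p = Fraction(-1, 2)
y_p = -x_p

print(on_line_residual(x_p))  # 0, so P = (-1/2, 1/2) lies on both line and curve
print(slope(x_p, y_p))        # -1, matching the slope of the line y = -x
```

Only the final line uses the derivative; the point itself is found without it.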

As this example illustrates, the specificity required
for consistent scoring can reduce the usefulness of
the scores themselves. Taken together, these two
assessment examples show that, although consistency
is necessary, it is not sufficient to ensure that
assessment information is useful.

THE SAT-I MATHEMATICS TEST


The Scholastic Assessment Test (SAT) is a widely
used example of a standardized, norm-referenced
test. The test is administered under standardized
conditions, including the amount of time allotted
and the directions and resources provided for the
test-takers. The scores are norm-referenced: the
student is told how his or her performance compared
with that of a comparison group of students
who already took the test instead of being told how
many questions he or she answered correctly and
incorrectly.

Fig. 4
Sample student work

The mean score on the SAT-I Mathematics test is
500, the standard deviation of scores is 100, and
the test items are chosen so that the scores of the
comparison group are approximately normally distributed,
as shown in figure 5 (Crocker and Algina
1986). A student who receives a score of 600 on this
test actually earned a raw score that placed him or
her one standard deviation above the mean raw
score of the comparison group. That student scored
higher than about 84 percent of the students with
whom he or she is being compared.
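
The 84 percent figure follows from the normal model. A minimal sketch, assuming scores are exactly normal with mean 500 and standard deviation 100 (the text says only "approximately"), using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Scaled-score distribution described in the text: mean 500, SD 100.
sat_math = NormalDist(mu=500, sigma=100)


def z_score(score):
    """Number of standard deviations a score lies above the mean."""
    return (score - sat_math.mean) / sat_math.stdev


def percentile_below(score):
    """Fraction of the comparison group expected to score below `score`."""
    return sat_math.cdf(score)


print(z_score(600))                     # 1.0
print(round(percentile_below(600), 3))  # 0.841
```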

Fig. 5
SAT-I Mathematics test-score distribution
(Source: Crocker and Algina 1986)

Suppose that student x scores 470 on the SAT-I
Mathematics test, whereas student y scores 530 on
the same test. What can you conclude about the
mathematical knowledge of these two students?
Most consumers of these scores—the students
themselves, their parents or guardians, teachers
and administrators, college admissions officers, and
newspaper reporters—would be confident that student
y knows more. What is the meaning of these
two scores? To answer this question, understanding
how these tests are designed is important.

The creators of such tests as the SAT base their
work on the assumption that students x and y each
possess a certain amount of knowledge, ability, or
(for the SAT) potential to succeed in the first year of
college. If they could ask students all possible questions,
the resulting “true scores” on this complete
test would accurately measure their knowledge,
ability, or potential. However, constructing and
administering such a test is impossible. Instead,
the designers create a test that consists of questions
that are, in effect, a random sample drawn
from the universe of all possible questions.

Like the results of any survey that is based on a
sample drawn from some population, the actual
scores that students earn on this test are only
approximations of their true scores. Each actual
score has some measurement error associated with
it. A full report of a student’s performance on this
test would use the actual score and the measurement
error to build an interval estimate.

For the SAT-I Mathematics test, the standard
error of measurement is about thirty points. Student
x’s actual score of 470, combined with this
measurement error, tells us that we can be 95 percent
sure that her or his true score is somewhere
between 410 and 530, an interval that extends
sixty points, that is, two “standard errors,” on
either side of the actual score. Similarly, student y’s
true score is, with 95 percent certainty, between
470 and 590. See figure 6.
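
The two interval estimates can be reproduced with a few lines of arithmetic. This is a sketch under the text's own convention of two standard errors on either side of the actual score; the function names are illustrative.

```python
SEM = 30  # standard error of measurement stated in the text, in score points


def interval_estimate(actual_score, sem=SEM, n_errors=2):
    """Return (low, high): the actual score plus or minus n_errors * sem."""
    margin = n_errors * sem
    return actual_score - margin, actual_score + margin


def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


x_interval = interval_estimate(470)
y_interval = interval_estimate(530)
print(x_interval, y_interval)                     # (410, 530) (470, 590)
print(intervals_overlap(x_interval, y_interval))  # True
```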

Fig. 6
Interval estimates of two SAT scores

These confidence intervals overlap; students x
and y would need actual scores that differed by at
least eighty-four points for us to be 95 percent sure
that their true scores were different. Because their
actual scores differ by only sixty points, we do not
have enough evidence to conclude that their knowledge
differs at all. See the appendix for a derivation
of these statistics.
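
The appendix is not reproduced here, so the following is only one plausible reading of where a threshold of roughly eighty-four points comes from, not the authors' derivation. It assumes the two students' measurement errors are independent, which makes the standard error of the difference between two scores equal to the single-score standard error times the square root of 2.

```python
import math

SEM = 30  # standard error of measurement for one score, from the text


def min_significant_difference(sem=SEM, z=1.96):
    """Smallest score gap treated as a real difference at the given z value.

    Assumes independent measurement errors, so the standard error of the
    difference between two scores is sem * sqrt(2).
    """
    return z * math.sqrt(2) * sem


print(round(min_significant_difference(), 1))     # 83.2 (z = 1.96)
print(round(min_significant_difference(z=2), 1))  # 84.9 (z = 2, as in the intervals)
```

Under either multiplier, the sixty-point gap between students x and y falls short of the threshold.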

The consistency of the assessment information
furnished by the SAT is reduced by the seldom-reported
variability introduced by measurement
error. What do the scores mean? What mathematical
ideas are being assessed by this test? How much
mathematics is known by students x and y, whose
scores are statistically the same? The norm-referenced
score reported for each student—a score that
simply describes how that student did relative to
students in the comparison group—carries little
information about how much that student understands
of the arithmetic, elementary algebra, and
geometry content of the test.

In the eyes of parents, administrators, and other
consumers of assessment information, standardized,
norm-referenced tests are the “gold standard”
of objective assessment. However, objectivity—even
in these tests—does not exist. Human judgment
about mental constructs is introduced when test
designers and consumers decide “what items to
include on the test, the wording and content of the
items, the determination of the ‘correct’ answer, . . .
how the test is administered, and the uses of the
results” (FairTest: The National Center for Fair
and Open Testing), as well as when designers
assume that at any given time, each student possesses
a certain amount of knowledge, ability, or
potential that can be measured, with some measurement
error, by a single instrument. Such a test
is only one way to conceptualize knowledge, ability,
or potential. If knowledge is multifaceted, complex,
individually constructed, and inextricably tied to
the context in which the learning occurs—as more
than two decades of research on learning indicate
(Davis, Maher, and Noddings 1990; Battista
1999)—then no single instrument is likely to “measure”
that knowledge in any consistent and meaningful
way.
 

 

Copyright © 2009, algebra-online.com. All rights reserved.