BASED ON SCORE INTERPRETATION
1. Norm-Referenced Testing
A test that compares a person's score against the scores of a group of people who have already taken the same exam.
A norm-referenced test is a standardized test that compares a student's test performance with that of a sample of similar students who have taken the same test. After constructing a test, the test developers administer it to a standardization sample of students using the same administration and scoring procedures for all students; this is what makes the administration and scoring "standardized." The scores obtained by the standardization sample are called norms. Norms include a variety of types of scores, and they are the reference against which students' scores are compared when the test is later administered.
Once test developers
standardize a norm-referenced test, examiners can administer it to students
with similar characteristics to the norm group and can compare the scores of
these students with those of the norm group. Norm-referenced standardized tests
can use local, state, or national norms as a base. Because of the comparison of
scores between a norm group and other groups of students, a norm-referenced
test provides information on the relative standing of students.
When assessing students with disabilities, evaluators should employ caution before making comparisons or interpretations based on established norms. Typical norms may be used for interpretations that compare the relative performance of students with disabilities against the general population of students. However, when comparisons or interpretations involve the level or degree of disability, normative data should come from the population to which the comparisons are made.
Test manuals should provide
sufficient details about the normative group so that test users can make
informed judgments about the appropriateness of the norm sample (American
Educational Research Association et al., 1999).
Purpose:
The major reason for using a norm-referenced test (NRT) is to classify students. NRTs are designed to
highlight achievement differences between and among students to produce a dependable
rank order of students across a continuum of achievement from high achievers to
low achievers (Stiggins, 1994). School systems might want to classify students
in this way so that they can be properly placed in remedial or gifted programs.
These types of tests are also used to help teachers select students for
different ability level reading or mathematics instructional groups.
With norm-referenced tests,
a representative group of students is given the test prior to its availability
to the public. The scores of the students who take the test after publication
are then compared to those of the norm group. Tests such as the California
Achievement Test (CTB/McGraw-Hill), the Iowa Test of Basic Skills (Riverside),
and the Metropolitan Achievement Test (Psychological Corporation) are normed
using a national sample of students. Because norming a test is such an
elaborate and expensive process, the norms are typically used by test
publishers for seven years. All students who take the test during that seven-year period have their scores compared to the original norm group.
Content: The content of an NRT is selected according to how well it ranks students from high achievers to low, that is, how well it discriminates among students.
The normal curve represents the norm or
average performance of a population and the scores that are above and below the
average within that population. The norms for a test include percentile ranks,
standard scores, and other statistics for the norm group on which the test was
standardized. A certain percentage of the norm group falls within various
ranges along the normal curve. Depending on the range within which test scores
fall, scores correspond to various descriptors ranging from deficient to
superior.
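To make the arithmetic concrete, here is a minimal Python sketch, assuming a normally distributed norm group; the function names are ours, and the mean of 100 with a standard deviation of 15 is simply the convention used by many standard-score scales:

```python
import math

def z_score(raw, mean, sd):
    """Standard score: how many SDs a raw score sits above the norm-group mean."""
    return (raw - mean) / sd

def percentile_rank(z):
    """Percentile rank under the normal curve, via the normal CDF."""
    return 50 * (1 + math.erf(z / math.sqrt(2)))

# Illustrative numbers only: a raw score of 115 on a scale normed to
# mean 100 and SD 15.
z = z_score(115, 100, 15)      # 1.0
print(percentile_rank(z))      # ~84.1: above about 84% of the norm group
```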
The Graduate Record Exam (GRE)
The GRE is taken by college students wishing to enter graduate schools. The
test items are included in an actual exam after they are analyzed and
determined to discriminate appropriately. The following quote describes the "test development process" for the GRE:
The General Test is composed of questions
formulated by specialists in various fields. New questions are pretested in
actual tests under standard testing conditions. Questions appearing in a test
for the first time are analyzed for usefulness and potential weaknesses; they
are not used in computing scores. Questions that perform satisfactorily become
part of a pool from which new editions of the General Test are assembled at a
future date.
Other examples: IQ tests, the TOEFL, the CAT, the CTBS, and the SAT.
2. Criterion-Referenced Testing
A criterion-referenced
test is a test that provides a basis for determining a candidate's level of
knowledge and skills in relation to a well-defined domain of content. Often one
or more performance standards are set on the test score scale to aid in test
score interpretation. Criterion-referenced tests, a type of test introduced by
Glaser (1962) and Popham and Husek (1969), are also known as domain-referenced
tests, competency tests, basic skills tests, mastery tests, performance tests
or assessments, authentic assessments, objective-referenced tests,
standards-based tests, credentialing exams, and more. What all of these tests
have in common is that they attempt to determine a candidate's level of
performance in relation to a well-defined domain of content. This can be
contrasted with norm-referenced tests, which determine a candidate's
level of the construct measured by a test in relation to a well-defined
reference group of candidates, referred to as the norm group. So it might be
said that criterion-referenced tests permit a candidate's score to be
interpreted in relation to a domain of content, and norm-referenced tests
permit a candidate's score to be interpreted in relation to a group of examinees.
The first interpretation is content-centered, and the second interpretation is
examinee-centered.
Purpose:
While norm-referenced tests ascertain the rank of students, criterion-referenced tests (CRTs) determine "...what test takers can do and what they know, not how they compare to others" (Anastasi, 1988, p. 102). CRTs report how well students are doing
relative to a pre-determined performance level on a specified set of
educational goals or outcomes included in the school, district, or state curriculum.
Educators or policy makers
may choose to use a CRT when they wish to see how well students have learned
the knowledge and skills which they are expected to have mastered. This
information may be used as one piece of information to determine how well the
student is learning the desired curriculum and how well the school is teaching
that curriculum.
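As a minimal illustration of this content-centered interpretation, the sketch below labels each objective against a fixed cut score; the 80% threshold and the objective names are invented for illustration, not drawn from any real curriculum:

```python
# Hypothetical sketch: in a CRT, each examinee is judged against a fixed
# cut score per objective, never against other examinees.
CUT_SCORE = 0.80  # assumed mastery threshold: 80% of items correct

def mastery_report(proportion_correct_by_objective):
    """Label each objective 'mastered' or 'not yet' against the cut score."""
    return {objective: ("mastered" if p >= CUT_SCORE else "not yet")
            for objective, p in proportion_correct_by_objective.items()}

print(mastery_report({"fractions": 0.90, "decimals": 0.65}))
# -> {'fractions': 'mastered', 'decimals': 'not yet'}
```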
Content: The content of a CRT is determined by how well it matches the learning outcomes deemed most
important. Although no test can measure everything of importance, the content
selected for the CRT is selected on the basis of its significance in the
curriculum.
A Sample CRM (Criterion-Referenced Measure): The Performance Assessment
Performance assessment is most appropriate for determining the progress of smaller numbers of students on higher-order learning tasks. For performance assessments, students are tasked with creating or presenting a unique product or solution (a paper, a design, an oral presentation, a hands-on experiment). They are given standards or expected criteria prior to their performance. The standards are used to create rubrics or scales for use by instructors or raters in assessing student products or presentations.
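A rubric of this kind can be pictured as a small data structure. The sketch below assumes an invented three-criterion rubric scored on a 1-to-4 scale; a rater's scores are checked against the scale and summed:

```python
# Invented rubric: each criterion is rated on a fixed scale (low, high).
RUBRIC_SCALE = {
    "content accuracy": (1, 4),
    "organization": (1, 4),
    "delivery": (1, 4),
}

def score_product(ratings):
    """Validate a rater's per-criterion scores, then total them."""
    for criterion, value in ratings.items():
        low, high = RUBRIC_SCALE[criterion]
        if not low <= value <= high:
            raise ValueError(f"{criterion} score out of range")
    return sum(ratings.values())

print(score_product({"content accuracy": 4, "organization": 3, "delivery": 3}))  # 10
```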
Classroom quizzes and exams that are based on course objectives are other examples of criterion-referenced measures. Quizzes and exams can be norm-referenced, however, if the instructor purposely selects items that discriminate among students.
Another example is the PIAT (Peabody Individual Achievement Test).
BASED ON TESTING MODE
1. Direct Test
A test is said to be direct
when the test actually requires the candidate to demonstrate ability in the
skill being sampled. It is a performance test. For example, if we wanted to
find out if someone could drive a vehicle, we would test this most effectively
by actually asking him to drive the vehicle. In language terms, if we wanted to
test whether someone could write an academic essay, we would ask him to do just
that. In terms of spoken interaction, we would require candidates to
participate in oral activities that replicated as closely as possible [and this
is the problem] all aspects of real-life language use, including time
constraints, dealing with multiple interlocutors, and ambient noise. Attempts
to reproduce aspects of real life within tests have led to some interesting scenarios.
Such tests include:
· Role-playing
· Information-gap tasks
· Reading authentic texts, listening to authentic texts
· Writing letters, reports, form filling, and note-taking
· Summarising
Direct tests are task-oriented rather than test-oriented; they require the ability to use language in real situations, and they should therefore have a good formative effect on your future teaching methods and help you with curriculum writing. However, they do call for skill and judgment on the part of the teacher.
2. Indirect Test
An indirect
test measures the ability or knowledge that underlies the skill we are trying
to sample in our test. So, for example, you might test someone on the Highway
Code in order to determine whether he is a safe and law-abiding driver [as is
now done as part of the UK driving test]. An example from language learning
might be to test the learners’ pronunciation ability by asking them to match
words that rhymed with each other.
One of these words sounds different from the others. Underline it.
Door, law, though, pore
This is essentially knowledge
about the target language [or recognition of target language items] rather than
actual performance in the language. Indirect testing is controversial,
and views on it vary, but it is clear that many of the claims made for it in
the past cannot be readily substantiated. It does not give any direct
indication of the candidates’ oral proficiency, accuracy, or appropriateness of
pronunciation. In many instances, an indirect approach involves the testing of
enabling skills at a micro-level. Thus, in terms of spoken interaction, we
might seek to test learners by asking them to write down what they would
actually say in a given situation.
BASED ON CONCEPT OF LANGUAGE ABILITY
1. Discrete-Point Testing
Discrete-point tests are based on an analytical view of language: language is divided up so that components of it may be tested separately. Discrete-point tests aim to achieve high reliability by testing a large number of discrete items. From these separated parts, you can form an opinion which is then applied to language as a whole. You may recognise some of the following discrete-point tests:
1. Phoneme recognition
2. Yes/No, True/False answers
3. Spelling
4. Word completion
5. Grammar items
6. Most multiple-choice tests
Such tests have a downside in that they take language out of context and usually bear no relationship to the concept or use of whole language.
2. Integrative Testing
In order to overcome the above defect, you should consider integrative tests. Such tests usually require the testees to demonstrate simultaneous control over several aspects of language, just as they would in real language-use situations. Examples of integrative tests that you may be familiar with include:
1. Cloze tests (a cloze passage can be generated mechanically; see the sketch after this list)
2. Dictation
3. Translation
4. Essays and other coherent writing tasks
5. Oral interviews and conversation
6. Reading, or other extended samples of real text
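As noted in the first item above, a fixed-ratio cloze passage can be produced mechanically, commonly by deleting every fifth to seventh word. The toy sketch below (function name and details are ours) gaps every seventh word and keeps an answer key:

```python
import re

def make_cloze(text, n=7):
    """Replace every nth word with a numbered blank.
    Returns the gapped text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(re.sub(r"\W", "", words[i]))  # bare word for the key
        words[i] = f"__({len(answers)})__"
    return " ".join(words), answers

passage = ("Integrative tests require learners to draw on several aspects "
           "of language at once, much as they would in genuine communication.")
gapped, key = make_cloze(passage, n=7)
print(gapped)
print(key)
```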
3. Communicative Testing
Since the late 1970s and early 1980s, the communicative approach to language teaching has gained dominance. What is actually meant by ‘communicative ability’ is still a matter of academic interest and research. Broadly speaking, communicative ability should encompass the following skills:
· Grammatical competence: how grammar rules are actually applied in written and oral real-life language situations.
· Sociolinguistic competence: knowing the rules of language use, such as turn-taking during conversational discourse, or using appropriate language for a given situation.
· Strategic competence: being able to use appropriate verbal and non-verbal communication strategies.
Communicative tests are concerned not only with these different aspects of knowledge but also with the testees' ability to demonstrate them in actual situations. So, how should you go about setting a communicative test?
Firstly, you should attempt to replicate real-life situations within which communicative ability can be tested as representatively as possible. There is a strong emphasis on the purpose of the test, and the importance of context is recognised. There should be both authenticity of task and genuineness of texts. Tasks ought to be as direct as possible. When engaged in oral assessment, you should attempt to reflect the interactive nature of normal speech and also assess the pragmatic skills being used.
Communicative tests are both direct and integrative. They attempt to focus on the expression and understanding of the functional use of language rather than on the more limited mastery of language form found in discrete-point tests.
The theoretical status of communicative testing is still subject to criticism in some quarters, yet as language teachers see the positive benefits accruing from such testing, communicative tests are becoming more and more widely accepted. They will not only help you to develop communicative classroom competence but also bridge the gap between teaching, testing, and real life. They are useful tools in curriculum development and in the assessment of future needs, as they aim to reflect real-life situations. For participating teachers and students this can only be beneficial.
4. Performance Testing
Performance test or assessment
is a term that is commonly used in place of, or with, authentic assessment.
Performance assessment requires students to demonstrate their knowledge,
skills, and strategies by creating a response or a product (Rudner &
Boston, 1994; Wiggins, 1989). Rather than choosing from several multiple-choice
options, students might demonstrate their literacy abilities by conducting
research and writing a report, developing a character analysis, debating a
character's motives, creating a mobile of important information they learned,
dramatizing a favorite story, drawing and writing about a story, or reading
aloud a personally meaningful section of a story. For example, after completing
a first-grade theme on families in which students learned about being part of a
family and about the structure and sequence of stories, students might
illustrate and write their own flap stories with several parts, telling a story
about how a family member or friend helped them when they were feeling sad.
The formats for performance
assessments range from relatively short answers to long-term projects that
require students to present or demonstrate their work. These performances often
require students to engage in higher-order thinking and to integrate many
language arts skills. Consequently, some performance assessments are longer and
more complex than more traditional assessments. Within a complete assessment
system, however, there should be a balance of longer performance assessments
and shorter ones.
BASED ON COVERAGE OF THE MATERIAL AND TIME ALLOTMENT
1. Power Test
On a power test, the student is given
sufficient time to finish the test. Some students may not answer all the
questions, but this is because they are unable to do so, not because they were
rushed. Most classroom tests are power tests: the length has been set to permit
all students to complete the test.
2. Speed Test
On a speed
test, the student works against time. A typical speed test is the typing test
in which the student tries to improve his or her rate of words per minute. A
language test that is so long that students are unable to finish within the time
allotted and that contains items of more or less equal difficulty throughout
the test would be considered a speed test. For instance, the reading and
translation test given for doctoral candidates is frequently a speed test: the
candidates must finish the translation within a specific time limit.
BASED ON ITEM PRESENTATION
1. Computer-Adaptive Test
Computer-adaptive
testing (CAT) is a technologically advanced method of assessment in which the
computer selects and presents test items to examinees according to the
estimated level of the examinee's language ability. The basic notion of an
adaptive test is to mimic automatically what a wise examiner would normally do.
Specifically, if an examiner asked a question that turned out to be too
difficult for the examinee, the next question asked would be considerably
easier. This approach stems from the realization that we learn little about an
individual's ability if we persist in asking questions that are far too
difficult or far too easy for that person. We learn the most about an
examinee's ability when we accurately direct our questions at the current level
of the examinee's ability (Wainer, 1990, p. 10).
Thus, in a CAT, the
first item is usually of a medium-difficulty level for the test population. An
examinee who responds correctly will then receive a more difficult item. An
examinee who misses the first item will be given an easier question. And so it
goes, with the computer algorithm adjusting the selection of the items
interactively to the successful or failed responses of the test taker.
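The loop below is a deliberately simplified sketch of that up-and-down logic. An operational CAT re-estimates ability after every answer using an item response theory model; here the level merely steps by a fixed amount, and all names are invented:

```python
def run_cat(item_bank, answer, n_items=10, start=0.0, step=0.5):
    """item_bank: list of (difficulty, item) pairs; answer(item) -> bool.
    Returns the final difficulty level as a rough ability estimate."""
    bank = list(item_bank)
    level = start                     # first item is of medium difficulty
    for _ in range(min(n_items, len(bank))):
        # pick the unused item whose difficulty is closest to the current level
        i = min(range(len(bank)), key=lambda j: abs(bank[j][0] - level))
        _difficulty, item = bank.pop(i)
        if answer(item):
            level += step             # correct -> a harder item next
        else:
            level -= step             # wrong -> an easier item next
    return level

# Toy usage: an examinee who can answer everything except the hardest item.
print(run_cat([(-1.0, "easy"), (0.0, "medium"), (1.0, "hard")],
              answer=lambda item: item != "hard", n_items=3))
```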
Advantages:
In a CAT, each examinee takes a
unique test that is tailored to his or her ability level. Questions that have low information value about the test taker's proficiency are avoided. The result
of this approach is higher precision across a wider range of ability levels
(Carlson, 1994, p. 218). In fact, CAT was developed to eliminate the
time-consuming and inefficient (and traditional) test that presents easy
questions to high-ability persons and excessively difficult questions to
low-ability testees. Other advantages of CAT include the following:
- Self-Pacing. CAT allows test takers to work
at their own pace. The speed of examinee responses could be used as
additional information in assessing proficiency, if desired and warranted.
- Challenge. Test takers are challenged by
test items at an appropriate level; they are not discouraged or annoyed by
items that are far above or below their ability level.
- Immediate Feedback. The test can be scored
immediately, providing instantaneous feedback for the examinees.
- Improved Test Security. The computer contains the
entire item pool, rather than merely those specific items that will make
up the examinee's test. As a result, it is more difficult to artificially
boost one's scores by merely learning a few items or even types of items
(Wainer, 1990). However, in order to achieve improved security, the item
pool must be sufficiently large to ensure that test items do not reappear
with a frequency sufficient to allow examinees to memorize them.
- Multimedia Presentation. Tests can include text,
graphics, photographs, and even full-motion video clips, although
multimedia CAT development is still in its infancy.
2. Paper-Based Test
Paper-based testing uses traditional printed tests to assess students' abilities. It is commonly used in class during teaching and learning activities, for example in daily tests and quizzes.
BASED ON TEST-MAKER
1. Teacher-Made Test
Teacher-made tests are prepared by a teacher for use with particular groups of students, with regard to the curriculum being taught. A teacher-made test may reveal specific areas of instruction in which students need remedial help.
2. Standardized Test
Standardized tests take the form of a
series of questions with multiple choice answers which can be filled out by
thousands of test takers at once and quickly
graded using scanning machines. The test is
designed to measure test takers against each
other and against a standard. Standardized tests are used to assess progress in schools, eligibility for institutions of higher education, and placement in programs suited to students' abilities. Many parents and educators have criticized standardized testing, arguing that it is not a fair
measure of the abilities of the test taker, and
that standardized testing, especially high-stakes testing, should be minimized or abolished
altogether.
Standardized tests can either be on
paper or on a computer. The test taker is
provided with a question, statement, or problem, and expected to select one of
the choices below it as an answer. Sometimes the answer is straightforward;
when asked what two plus two is, a student would select “four” from the list of
available answers. The answer is not always so clear, as many tests include
more theoretical questions, like those involving a short passage that the test taker is asked to read. The student is instructed
to pick the best available answer, and at the end of a set time period, answer
sheets are collected and scored.
There are some advantages
to standardized tests. They are cheap, very quick
to grade, and they allow analysts to look at a wide sample of individuals. For
this reason, they are often used to measure the progress of a school, by
comparing standardized test
results with students from other schools. However, standardized
tests are ultimately not a very good measure of individual student performance
and intelligence, because the system is extremely simplistic. A standardized test can
measure whether or not a student knows when the Magna Carta was written, for
example, but it cannot determine whether or not the student has absorbed and
thought about the larger issues surrounding the historical document.
Studies on the format of standardized tests have suggested that many of them
contain embedded cultural biases which make them inherently more difficult for
children outside the culture of the test writers.
Although most tests are analyzed for obvious bias and offensive terms,
subconscious bias can never be fully eliminated. Furthermore, critics have
argued that standardized tests do not allow a
student to demonstrate his or her skills of reasoning, deductive logic, critical thinking, and creativity. For this reason,
some tests integrate short essays. These essays often receive only brief attention from graders, whose opinions on how an essay should be scored frequently vary widely.
Finally, many concerned
parents and educators disapprove of the practice of high-stakes testing. When a
standardized test
is used alone to determine whether or not a student should advance a grade,
graduate, or be admitted to school, this is known as high-stakes testing.
Often, school accreditation or teacher promotion rests on the outcome of standardized tests alone, an issue of serious concern
to many people. Critics of high-stakes testing believe that other factors, such as classroom performance, interviews, class work, and observations, should also be taken into account in these decisions.