Using Computer-Adaptive Tests to Assess Student Performance of

English as a Second Language

Yuhong Sun

Abstract

This paper discusses how a computerized assessment can help teachers of English in assessment. It focuses on the characteristics of computer-adaptive testing, the advantages this new approach presents and the significance of using computer-adaptive testing to assess student performance in English as a second language. The paper points out that CAT not only reduces teachers’ workload but also provides authentic, real-world, problem-based challenges to students. The capability to set constructed-response tasks and to measure both student performance and student learning processes makes it a powerful method for assessment in English as a second language. Finally the paper also points out some of the concerns about CAT and the factors that interfere students’ performance in assessment.

Introduction

As China opens her door to the outside world, more and more students realize the importance of learning English. The increased number of students in the higher education of English as a second language and the corresponding increase in time spent by teachers of English on assessment has greatly encouraged English teachers’ interest into how technology can assist in this area.

In China, English is a core course for every university student regardless of his major. Teachers of English in universities are always heavily loaded with teaching as well as assessment. Producing, conducting and marking tests is an extremely time consuming activity for the teachers of English who have to teach large classes of students, and especially who have to conduct exams very frequently in order to assess students’ performance and progress. Before, one method of reducing the workload is to use paper based Multiple Choice Questionnaires (MCQs) if the subject material happens to match this format. The students’ answer sheets are checked by hand or scanned by using an Optical Mark Reader (OMR) to get raw data and then teachers calculate the exam scores by hand. No doubt, the paper based MCQs reduced English teachers’ effort to some extent. However, not all of the questions can match MCQs format. If teachers want to assess students’ thinking skills or problem solving skills, they have to construct short-answer questions or essays and those questions can not be checked by OMR.

In 1998 the TOEFL test was introduced as a computer-based test in many parts of the world, including China. The advantage of administering and scoring large-scale tests so quickly and efficiently and the capability to measure complex learning immediately called the English teachers’ and researchers’ attention. Computer-based tests not only reduce teachers’ workload but also provide an authentic, real-world, problem-based challenges to students. Among other things, one of the most outstanding aspects is the capability to set constructed-response tasks and to measure both student performance and student learning processes. Many educators predicted that sooner or later computers would become powerful tools for assessment.

Today computers are indeed playing an important role in the types of tests to assess the second language learning in Chinese universities. In fact, computerized testing is increasingly being considered as a practical alternative to paper-and-pencil testing (Kingsbury & Houser, 1993).

This paper describes what a computer-adaptive test (CAT) is, what advantages this new approach has and how it can be applied to the formative tests to assess students’ performance in English as a second language. Finally the paper points out some of the factors that should be taken into consideration to ensure that the computer-based assessment methods adopted reflect both the aims and objectives of learning.

Characteristics and Advantages of Computer-based Assessment

Tests administered on computers are known as computerized tests, generally known as computer-based tests. Actually computerized tests consist of two main different types: Computer Adaptive Testing (CAT) and Computer Based Testing (CBT). Adaptive testing is a dynamic form of test where a student’s answers to any combination of questions will have an impact on the progress of the exam; depending on how well the student is doing, the questions he or she subsequently will see change over the testing period. In contrast, CBT refers to any other kind of test that was once in conventional format and that is transliterated as exactly as possible to the computer (Niemeyer, 1999).

In the 1960s and 1970s, the U.S. Department of Defense perceived the potential benefits of adaptive testing and supported extensive theoretical research in CAT (Wainer, 1990). It was Frederick Lord who first introduced Computer-adaptive test in 1970s. He succeeded in working out the theoretical structure and the practical technicalities of creating mass-administered tailored tests using the computer (Lord, 1980). However, until 1990s this standardized adaptive test has been available. Since then, computer-based testing (CBT) in general, and computer-adaptive testing (CAT) in particular has more and more proved to be a positive development in assessment.

The computer-adaptive Test works differently as paper-and-pencil test. In a computer-adaptive test, each student takes a test that is unique to him. A CAT is tailored to a student’s performance level and provides precise information about his abilities using fewer test questions than traditional paper-based tests. At the start of the test, the student is presented with test questions of average difficulty. As he answers each question, the computer scores that question and then uses that information, as well as his responses to previous questions and information about the test design, to determine which question is presented next. If he replies correctly, the next item received is more difficult. However, if he response incorrectly, he will be presented an easier question in turn. His next question will be the one that best reflects both his previous performance and the test design.

Computer-adaptive tests are also scored differently than most paper-and-pencil tests. A student's score on a CAT test depends on a combination of the following factors as: (1) The number of questions he answered within the allotted time; (2) His performance on questions answered throughout the test; and (3) The statistical characteristics of questions answered throughout the test (including difficulty level).

The result of this approach is “the higher precision across a wider range of students’ ability levels” (Carlson, 1994). Time-consuming and inefficient test is avoided, which presents either extremely easy questions to high-ability students or extremely difficult questions to low-ability students. Other advantages of CAT also include the following:

Confidence. Students are challenged but not discouraged by the questions that are presented because the questions are neither too difficult nor too easy.
Immediate Feedback. The test can be scored as soon as the student finishes his test. It can provide immediate feedback for the students.
Multimedia Presentation. Tests can include text, graphics, photographs, and even full-motion video clips. For example, in Listening part in TOEFL, while students are listening, pictures of the speakers or other information are presented on the computer screen, which “make it possible for students to interact with a computer during language testing”, see Brown (1997).
Self-Pacing. CAT allows students to answer the test questions at their own pace. But the speed of student responses could be used as additional information in assessing proficiency.6
Convenience. Computer based assessment (CBA) can be delivered in many ways: floppy disk, CD ROM, online or on a designated computer hard drive.
Comfortable environment. The computer test lab is quiet and comfortable. There is neither distraction nor interference from outside.

Using Computer-adaptive Tests to Assess Student Achievement in ESL

Because of the advantages mentioned above, more and more second language researchers, practitioners, and experts in testing organizations are enthusiastically involved in designing, developing, and using computerized placement and achievement tests. They aim to provide models that can be used by individual students, departments and schools to implement and evaluate CAT in second language. They not only attempt to replace the summative paper-and-pencil tests in assessing students’ general proficiency of English as a second language but also want to design some more specific formative computer-based tests to assess students’ learning that is closely related to the curriculum. For example, University of Luton has several years experience of delivering summative computer-based examinations, as well as formative computer-based tests. Strathclyde University has developed and implemented a web-based assessment engine. (Bull, 1999 )

Software development companies also assist language researchers, experts and developers with their own institutional second language CATs. For example, Assessment Systems Corporation (St. Paul, Minnesota) and Computer-Adaptive Technologies (Chicago, Illinois) offered Commercial CAT programs and those programs make it easier for developers to create second language CATs by using the software templates (Dunkel, 1999). It is predicated that in the near future, more and more commercial companies and academic institutions will be engaged in this area and sooner or later computer-adaptive testing will become a practical alternative to traditional paper-and-pencil testing.

Today the question no longer seems to be, "Should we use or create a CBT or a CAT?" but rather, "What do we need to know about computer-based or computer-adaptive testing to design such tests?" Or “What should we do to make good use of such tests to assess students’ learning in ESL?”

At present, there are so many computer-based tests available at the market for teachers to choose. However, not all of the tests are very practical to use. Some only assess students’ general proficiency of English but not appropriate for assessing particular skills such as reading and listening comprehension. Some are very fancy at forms but poor at contents. Some don’t meet the aims and goals of the second language learning. Therefore, it becomes a new challenge for the teachers of English to design an appropriate CAT or to make full use of it to assess student performance. Their new tasks are (1) to design or help assessment designers or publishers design CAT tests that are related to the curriculum, meaning a student's performance on a test should indicate the student's level of competence in the school curriculum; (2) to choose the CAT tests that best meet the goals and aims of the learning; (3) to choose the CAT tests that are integrated with students’ current learning, not an add-on extra. In order to ensure that the computer-based assessment adopted reflect both the aims and objectives of second language learning, teachers should strictly select the CAT tests according the following rules.

The computer-based tests (CAT) can be delivered in many ways: floppy disk, CD ROM, online or on a designated computer hard drive.
The tests should be based on the premise that assessment is curriculum referenced, meaning that they must be (1) tied to a student’s curricula, (2) capable of having many multiple forms, and (3) sensitive to the improvement of students' achievement over time (Fuchs, Deno, & Mirkin, 1983).
The tests should be easy to use. Trained teachers don’t have to spend a lot of time administering, scoring, and graphing a reading passage.
The tests should be reusable, meaning they can be administered frequently and repeatedly which allows the teacher to view the student’s progress over several days or several months rather than in one testing session.

The tests should be safe to use. The computer must contain a substantially large item pool to ensure that test items do not reappear frequently for students to memorize them.

Once a teacher has the most suitable CAT tests, he can use them to serve his own purposes. There are many things that a teacher can do with CAT tests. For instance, he can conduct exams frequently without much effort, viewing student’s progress over several days or several months as he chooses. Furthermore, he can reference the student’s performance in different ways (Deno, 1985b): (1) compare how the same student has done recently on other similar tasks; (2) see how the student is progressing toward a long term goal; (3) compare how the student is doing before or after he made some adjustments in instruction; and (4) he can also compare the student with other students in another school (for online testing). Each student’s performance can be recorded, stored and compared in a variety of ways at a future time.

The frequent viewing of a student’s progress can increase the teacher’s sensitivity to his instruction such as when he needs to modify his methods. This results in a greater achievement for the student since the teacher concerns about students’ progress and modifies his instruction from time to time to suit the students’ current needs.

Another example is that a teacher can choose different tests for the learning needs of individuals. Computer adaptive assessment is designed to allow learners to select assessment items they wish to use for their own needs. For example, a student who needs to improve his reading may enter keywords or subjects into a database, searching for questions on a particular topic that he needs help with.

A teacher can also use CAT to provide remedial activities depending on the student’s results in a test item. In this case the educational designer will have included a structure that branches the student to additional resources or activities based on their performance.

There are so many potentials of CAT that only a few examples are mentioned here. In general, CAT serves the purpose of "assisting learners in monitoring their understanding, leading students to re-study or seek help they need.

Concerns

However, along with the advantages there are also many concerns and disadvantages. Some are concerned about the degree to computer anxiety might be injected into the assessment process to impact student performance in negative ways. Some observe that most students show higher scores for paper-and-pencil exams, but a few have found advantages for those who take computerized tests. Students are unable to underline text, scratch out eliminated choices—all commonly used strategies. Computer screens take longer to read than printed materials, and that it is more difficult to detect errors on computer screens. Most computerized tests show only one item on the screen at a time, preventing test-takers from easily checking previous items and the pattern of their responses. Scrolling through multiple screens does not allow side-by-side comparisons. Computers may worsen test bias. The performance gap which already exists on multiple-choice tests between men and women, ethnic groups, and persons from different socioeconomic backgrounds could widen as a result of computerized testing.

Conclusion

Compared with the advantages of CAT, the disadvantages seem trivial. However, they can not be neglected. As computer technology has become a practical and powerful tool for assessment, more and more traditional paper-and-pencil tests will be switched to CAT. Therefore, it is important to pay enough attention to the concerns and solve some of the CAT technique problems.

As the paper pointed out earlier, Computer-based testing in general and Computer-adaptive testing in particular not only reduce teachers’ workload but also provide an authentic, real-world, problem-based challenges to students. The outstanding capability to set constructed-response tasks and to measure both student performance and student learning processes will sooner or later become a potentially powerful tool for replacing traditional paper and pencil tests. The computer-adaptive assessment methods addressed in this paper are especially useful methods for meeting the goals and aims of curriculum and students' needs of learning. Therefore, It is English teachers' responsibility to pay special attention to testing the use of computer-adaptive testing tools, designing new CAT tools for assessing student performance in ESL and helping students overcoming computer anxiety.

References

Bull, J. (1999) The implementation and evaluation of computer assisted assessment project TLTP phases 3. Infocus [Online] vailable:

http://www.lboro.ac.uk/departments/dils/cti/infocus8/caa.htm [1999, February]

Bernhardt, E. (1996). If reading is reader-based, can there be a computer adaptive reading test? Symposium conducted at the Center for Advanced Research on Language Acquisition of the University of Minnesota, Bloomington, MN.

Carslon, R. (1994). Computer-adaptive testing: A shift in the evaluation paradigm. Journal of Educational Technology Systems, 22, 213-224.

Chung, Gregory K. W. K., Baker, Eva L. (1997). Year 1 Technology Studies: Implications for Technology in Assessment. National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA.

Deno, S. (1985b). The nature and development of curriculum-based measurement. Preventing School Failure,

DiGangi, S. A., Jannasch-Pennel, A., Chong H., & Mudiam S. V. (1999). Curriculum-Based Measurement and Computer-Based Assessment: Constructing an intelligent, web-based evaluation tool. Society for Computers in Psychology [Online].

Dunkel, Patricia. (1999). Considerations in Developing or Using Second/Foreign Language Proficiency Computer-Adaptive Tests. Language Learning & Technology, 2, 77-93.

Federico, P. (1996) A Measuring Recognition Performance Using Computer-based and Paper-based Methods. Behavior Research Methods, Instruments, & Computers, 23, 341-347.

Fuchs, L., Wesson, C., Tindal, G., Mirkin, P., & Deno, S. (1981). Teacher efficiency in continuous evaluation of IEP goals (Research Report No. 53). Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.

Kingsbury, G., & Houser, R. (1993). Assessing the utility of item response models: Computer adaptive testing. Educational Measurement: Issues and Practice. 12, 21-27.

Kumar, D. D. (1994) Computer Based Science Assessment: Implications for Students with Learning Disabilities. [Online] Available:

Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Mogey, N., & Watt, H. (1999) The use of computers in the assessment of student learning. Implementing Learning Technology [Online] Available:

http://www.icbl.hw.ac.uk/ltdi/implementing-it/using.htm [1999, July].

Niemeyer, Chris. (1999) A computerized final exam for a library skills course. Reference Services Review, 27, 91 –92.

Wainer, H. (1990). Computer adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum Associates.

Zakrzewski, Stan. & Bull, Joanna. (1998 ) The Mass Implementation and Evaluation of Computer-based Assessments. Assessment & Evaluation in Higher Evaluation, 23, 141-52.