
ACCUPLACER

OnLine

TECHNICAL MANUAL

January 2003

TABLE OF CONTENTS

Chapter 1 - Overview of ACCUPLACER™ OnLine
  Introduction
  Purpose of ACCUPLACER OnLine Tests
  Overview and Description of ACCUPLACER OnLine
  Unique Features of ACCUPLACER OnLine
  Organization of this Manual
Chapter 2: Development of ACCUPLACER Tests
  ACCUPLACER Test Development Steps
  Historical Overview of ACCUPLACER Test Forms Development
  Test Maintenance
Chapter 3: Understanding and Interpreting ACCUPLACER Scores
  ACCUPLACER Score Scale: Total Right Score
  The Percentile Rank
  Information for Making Placement Decisions
Chapter 4: Reading Comprehension
  Proficiency Statements for the Reading Comprehension Test
Chapter 5: Sentence Skills Test
  Proficiency Statements for the Sentence Skills Test
Chapter 6: Arithmetic Test
  Proficiency Statements for the Arithmetic Test
Chapter 7: Elementary Algebra Test
  Proficiency Statements for the Elementary Algebra Test
Chapter 8: College-Level Mathematics Test
  Proficiency Statements for the College-Level Math Test
Chapter 9: Levels of English Proficiency Tests
  Content of LOEP Reading Skills Test
  Proficiency Statements for the LOEP Reading Skills Test
  Content of LOEP Language Use Test
  Proficiency Statements for the LOEP Language Use Test
  Content of LOEP Sentence Meaning Test
  Proficiency Statements for the LOEP Sentence Meaning Test


Chapter 10: Statistical Characteristics of ACCUPLACER Item Banks
  Statistical Characteristics of Reading Comprehension Item Pool
  Statistical Characteristics of Sentence Skills Item Pool
  Statistical Characteristics of Arithmetic Item Pool
  Statistical Characteristics of Elementary Algebra Item Pool
  Statistical Characteristics of College-Level Math Item Pool
  Statistical Characteristics of the LOEP Item Pools
Chapter 11: Reliability, Information, and Measurement Error
  Test-Retest Reliability
  Internal Consistency Reliability
  The Standard Error of Measurement and Conditional Standard Error
  IRT Test Information
  Classification Consistency
  Reliability, Standard Errors, and Test Information Data for ACCUPLACER Scores
  Test-Retest Reliability
  Accuracy of Classification
  Information Functions for ACCUPLACER Item Pools
Chapter 12: Validity Evidence for ACCUPLACER
  A Brief Overview of Test Validity and Validation
  Test Validation
  Gathering Validity Evidence
  Content Validity Evidence for ACCUPLACER
  Predictive Validity
  Predictive Validity Across 50 Institutions
  Correlation Results
  Reading Comprehension
  Sentence Skills
  Arithmetic
  Elementary Algebra
  College-Level Mathematics
  Validity Study for the Levels of English Proficiency (LOEP) Test
  LOEP Correlational Results
  Differential Predictive Validity
  Differential Predictive Validity Results


  Other ACCUPLACER Validity Studies
  Differential Item Functioning
  ACCUPLACER DIF Studies
  Evaluating DIF on the Reading Comprehension, Sentence Skills, Arithmetic, Elementary Algebra, and College-Level Mathematics Tests
  LOEP DIF Study
  LOEP DIF Results
  Summary of ACCUPLACER Validity Evidence
Chapter 13: WritePlacer Plus™
  Development of WritePlacer Plus Prompts
  Equivalence of WritePlacer Plus Prompts
  WritePlacer Plus Scoring
  Summary of Research on WritePlacer Plus
References
Appendix A: National Reference Group Data
Appendix B: Test Development Committees
Appendix C: Institutions Participating in Validity Studies
Appendix D: Institutions Participating in LOEP Pretest and Validity Studies
Appendix E: Surveys of Cut-Scores


Chapter 1 - Overview of ACCUPLACER™ OnLine

Introduction

ACCUPLACER OnLine is a computerized-adaptive placement testing system, delivered over the Internet, that is used to assess the knowledge and skills of incoming college students. Tests within the ACCUPLACER system are designed to diagnose students' strengths and weaknesses and to help colleges and universities make appropriate course placement decisions for students. The ACCUPLACER system is designed to provide placement, advisement, and guidance information for students entering two- or four-year institutions of higher education.

The purpose of this technical manual is to provide college and university personnel, and others who use or are interested in using ACCUPLACER, with as much information as possible regarding the technical qualities of ACCUPLACER tests. We define technical quality broadly, so this manual includes information pertaining to the purpose of these tests, the content of the tests and how they were developed, how to interpret ACCUPLACER scores, the accuracy of the scores from a measurement perspective, and evidence that bears on the validity of interpretations made on the basis of ACCUPLACER scores. In addition to this manual, other documents pertaining to ACCUPLACER include the ACCUPLACER OnLine User's Manual, the ACCUPLACER OnLine Student Guide, the ACCUPLACER OnLine Frequently Asked Questions, and [list to be completed].

The Standards for Educational and Psychological Testing, developed by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), is a widely endorsed document for evaluating and ensuring the quality of educational tests (AERA, APA, & NCME, 1999). With respect to providing technical documentation to test users, the Standards state:

    The provision of supporting documents for tests is the primary means by which test developers, publishers, and distributors communicate with test users. These documents are evaluated on the basis of their completeness, accuracy, currency, and clarity and should be available to qualified individuals as appropriate. (p. 67)

These suggestions are echoed by the Guidelines for Computer-Based Testing developed by the Association of Test Publishers (2002). We acknowledge that a clear understanding of the purpose of a test and of its technical qualities is necessary for test consumers to make informed decisions about the use of a particular test. Therefore, this manual is comprehensive with respect to providing information about ACCUPLACER. The Standards were used both to decide which information to include in the manual and to educate readers about the information that is provided. Thus, readers will find several excerpts from the Standards interspersed throughout the manual. With respect to information that should be included in a technical manual, the Standards stipulate:

    A test's documentation typically specifies the nature of the test; its intended use; the process involved in the test's development; technical information related to scoring, interpretation, and evidence of validity and reliability; scaling and norming if appropriate to the instrument; and guidelines for test administration and interpretation. (p. 67)


Each of these important areas is covered in this manual. Our goal is to provide sufficient information about ACCUPLACER so that interested parties can make "sound judgments about the nature and quality of the test, the resulting scores, and the interpretations based on the test scores" (AERA et al., 1999, p. 67). An attractive feature of this manual is that it is available electronically on The College Board's web site complete with hyperlinks so that users can easily move from section to section to find the information most relevant to their needs. Since the quality of a testing program is not static, The College Board continually evaluates the quality of its tests and conducts ongoing research as necessary. Therefore, additional updates to this technical manual will be made as new information becomes available.

Purpose of ACCUPLACER OnLine Tests

The purpose of ACCUPLACER tests is to determine which course placements are appropriate for students and whether or not remedial work is needed. ACCUPLACER tests can also be used to monitor student course progress and to suggest whether remediation is still needed or if a change in course assignment is recommended. This information may be supplied to the student or to the academic adviser or faculty member to help monitor progress. Because of the "adaptive" nature of the tests, the questions presented on successive tests will vary, thereby greatly reducing the effects of repeated practice on the tests. Scores from ACCUPLACER Tests are intended for use in making placement decisions. In no case should they be used for admissions. To assure fairness, placement decisions made with the aid of ACCUPLACER scores should be reviewed periodically, and if classroom performance indicates that students are capable of more advanced work or need further preparation, placement assignments should be changed. Also, it should be noted that placement decisions are most accurate when multiple measures are used. When possible, ACCUPLACER scores should be used in conjunction with other available data on student performance.

Overview and Description of ACCUPLACER OnLine

ACCUPLACER OnLine is a comprehensive battery of computerized placement tests for incoming college students that has several important features for helping colleges and universities make course placement decisions. Tests within the ACCUPLACER battery are delivered over the Internet to provide fast and accurate determination of whether a student has the skills to take a freshman course or would benefit most from developmental work. Community colleges, four-year colleges, and technical schools around the world use ACCUPLACER tests extensively. There are nine tests within the ACCUPLACER battery. These tests differ by subject matter, and some are specifically designed for students whose primary language is not English. A list of the tests in the ACCUPLACER battery is presented in Table 1.


Table 1: ACCUPLACER OnLine Tests

General Assessments
  Reading Comprehension
  Sentence Skills
  Arithmetic
  Elementary Algebra
  College-Level Mathematics
  WritePlacer Plus®

Assessments of English Proficiency
  LOEP Reading Skills
  LOEP Sentence Meaning
  LOEP Language Usage
  LOEP Listening
  WritePlacer® ESL

In addition to the standard ACCUPLACER OnLine tests, alternative-format assessments are also available (i.e., COMPANION Paper; COMPANION CD-ROM; and COMPANION Special Formats: Braille, Audio Cassette, and Large Print).

Unique Features of ACCUPLACER OnLine

ACCUPLACER OnLine uses sophisticated technology to provide accurate and efficient measurement of students' knowledge and skills. It uses computerized-adaptive testing technology to select the specific test questions that are best suited to each particular test taker. This "tailoring" of the test for each student allows for accurate diagnosis of students' knowledge and skills using fewer items than are typically required in traditional "paper-and-pencil" tests.

The computerized nature of the assessment also allows for instantaneous score reporting. As soon as a student finishes a test, her or his score is available and is immediately exportable into existing campus information systems. Instantaneous score reporting is even available for the ACCUPLACER writing test, WritePlacer Plus, which uses complex computerized scoring procedures to provide scores for students (see Chapter 13). An extremely convenient feature of ACCUPLACER OnLine is that it is administered over the Internet, so schools can access the testing program whenever it is convenient for them, without the attendant difficulties of installing and upgrading software.

Computerized Adaptive Testing

Computerized adaptive testing is a test administration system that uses the computer to select and deliver test items to examinees (Patelis, 2000). These tests are called adaptive because the computer selects the items to be administered to a specific examinee based, in part, on the proficiency of the examinee. Unlike many traditional tests, where all examinees take a single form of an exam, the computer adapts or "tailors" the exam to each examinee. This tailoring is done by keeping track of an examinee's performance on each test item and then using this information to select the next item to be administered. The criteria for selecting the next item to be administered to an examinee are complex; however, the primary criterion is to match the difficulty of the item to the examinee's current estimated proficiency.

All ACCUPLACER tests, with the exception of WritePlacer Plus, are computerized-adaptive. Adaptive testing means that the sequence of test questions, and the questions themselves, will vary from student to student. The next question administered to an examinee is automatically chosen to yield the most information about the examinee, based on the skill level indicated by answers to all prior questions.


ACCUPLACER tailors the test to each student using an item-selection algorithm. This algorithm initially administers an item of middle difficulty to each student, randomly selected from a set of about five very similar items. If the response is wrong, the algorithm branches to a randomly chosen one of three extremely easy items; if the response is right, it branches to a randomly chosen one of three extremely difficult items. Items presented stay very easy or very difficult until the examinee has given at least one right and one wrong answer, whereupon item selection aims for maximum information, subject to constraints that provide for content balance. An example of how the ACCUPLACER computerized-adaptive testing system works is presented below in Figure 1-1.

Figure 1-1

As illustrated in Figure 1-1, a student's performance on one ACCUPLACER test question determines the difficulty level of the next question that will be delivered. Two things determine a student's score on an ACCUPLACER test: how many questions were answered correctly, and the difficulty level of the questions that were answered correctly. Because students are tested at their individual ability levels, each student is likely to encounter a different test. This greatly reduces the problem of students exchanging information about answers either before or during the test. Adaptive testing provides very accurate measurement over the complete range of a particular skill. Students in institutions that offer multiple levels of developmental courses will benefit most from the accuracy of adaptive testing. These tests, by personalizing the choice of successive test questions, achieve their accuracy with substantially fewer questions than conventional tests.
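As a rough illustration of this adaptive branching, the following Python sketch selects each successive question by matching item difficulty to a running proficiency estimate that moves up after a correct answer and down after an incorrect one. The item bank, the step sizes, and the simple update rule are invented for illustration only; the operational ACCUPLACER algorithm uses IRT-based estimation and content-balancing constraints (see Chapter 2).

import random

def next_item(bank, administered, estimate):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [item for item in bank if item["id"] not in administered]
    return min(candidates, key=lambda item: abs(item["difficulty"] - estimate))

def adaptive_session(bank, answer_fn, test_length=12, step=1.0):
    estimate = 0.0                          # begin at middle difficulty
    administered, responses = set(), []
    for _ in range(test_length):
        item = next_item(bank, administered, estimate)
        administered.add(item["id"])
        correct = answer_fn(item)           # True/False supplied by the examinee
        responses.append((item["id"], correct))
        # Move the working estimate up after a right answer and down after a
        # wrong one, with smaller moves as more responses accumulate.
        estimate += step if correct else -step
        step = max(step * 0.7, 0.1)
    return estimate, responses

# Example: a made-up 30-item bank and an examinee who answers every item
# easier than difficulty 0.5 correctly.
bank = [{"id": k, "difficulty": random.uniform(-3.0, 3.0)} for k in range(30)]
final_estimate, responses = adaptive_session(bank, lambda item: item["difficulty"] < 0.5)
print(round(final_estimate, 2), responses[:3])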


Although few questions (12 to 20) are presented for each ACCUPLACER test, great accuracy is maintained. This process achieves several positive results. Students are tested quickly and are not frustrated or bored by questions that are too easy or too hard. The difficulty of the questions is quickly and automatically adapted to the capability of the individual student. Thus, challenging tests corresponding to each student's skill level are always provided. Because of the untimed nature of the tests, students may work at their own pace. Both students and administrators benefit because test results may be displayed immediately.

Seamless Serial Testing

Because ACCUPLACER tests are untimed, an examinee taking a battery of reading, math, and language tests typically needs less than 75 minutes to complete the battery, including time for completing background information and an orientation to how the tests are administered. This is appreciably less time than a comparable paper-based battery would require, because the tests are shorter than conventional tests. In mathematics, the computer provides sophisticated branching between modules: it starts the examinee with the module likely to be appropriate, and then uses performance on the first module to determine whether to administer a second. This branching application is known as seamless serial testing. Through its use, many examinees will need to take only one mathematics module, and few if any will need all three modules. The Levels of English Proficiency tests permit similar starting and branching logic in the assessment of verbal skills; the existing ACCUPLACER verbal tests in Reading Comprehension and Sentence Skills provide one level of this branching.
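The module-to-module branching can be pictured with a small sketch. The starting module and the branch-up and branch-down cut-offs below are hypothetical placement-policy choices, not values published by the College Board.

ARITHMETIC = "Arithmetic"
ELEMENTARY_ALGEBRA = "Elementary Algebra"
COLLEGE_LEVEL_MATH = "College-Level Mathematics"

def route_math_modules(start_module, administer):
    """Administer the starting module, then branch up or down if the score warrants it."""
    scores = {start_module: administer(start_module)}
    if start_module == ELEMENTARY_ALGEBRA:
        score = scores[start_module]
        if score >= 108:                     # hypothetical branch-up threshold
            scores[COLLEGE_LEVEL_MATH] = administer(COLLEGE_LEVEL_MATH)
        elif score <= 40:                    # hypothetical branch-down threshold
            scores[ARITHMETIC] = administer(ARITHMETIC)
    return scores

# Example: an examinee who starts in Elementary Algebra and scores 112 is
# routed on to the College-Level Mathematics module; most examinees stop
# after a single module.
print(route_math_modules(ELEMENTARY_ALGEBRA, lambda module: 112))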

Organization of this Manual

To best understand ACCUPLACER tests, it is important to understand the content of each test. The next chapter describes the development of ACCUPLACER tests, followed by a chapter explaining how to interpret ACCUPLACER scores. The following six chapters each describe a particular ACCUPLACER test. Chapters presenting reliability and validity information for ACCUPLACER come next, followed by a chapter on WritePlacer Plus.



Chapter 2: Development of ACCUPLACER Tests

The quality of an examination depends on the quality of the test development process and the expertise of the developers. The development of ACCUPLACER item pools, the selection of items for examinees, score reporting, and other aspects of the testing program undergo numerous quality control checks, each designed to make the decisions made on the basis of ACCUPLACER scores as valid as possible. In this section, we briefly describe the major steps involved in creating ACCUPLACER tests and highlight the procedures designed to ensure fairness to all examinees.

ACCUPLACER Test Development Steps

Several important steps were fundamental to the development of ACCUPLACER tests. These steps include (a) developing test specifications, (b) writing items, (c) reviewing items for technical accuracy, psychometric quality, and equity issues, (d) field-testing items, (e) calibrating items and selecting those to be used in the operational item pools, (f) developing algorithms for selecting items tailored to examinees (the item selection algorithm) and providing accurate scores (scaling), and (g) monitoring the functioning of ACCUPLACER tests and the quality of the item pools. Each of these steps is briefly described in this section. Within these steps, numerous reviews and other quality assurance procedures are carried out to make the test as good as it can be. Overall, experts may perform more than 25 reviews and assurance steps by the time questions are administered in a test that will be scored and reported.

Developing Test Specifications

After stating the purpose of a test, clear delineation of the content domain to be measured is necessary. Clearly defining the content domain to be tested is a critical validity issue for all educational tests (Ebel, 1977; Sireci, 1998b). As Millman and Greene (1989) described, "It is in the specification of test content that the test developer translates a general test label, such as 'math computation...' into a specific blueprint of actual test content" (p. 342). For all ACCUPLACER tests, committees of "subject matter experts," consisting primarily of college and university faculty, were convened to create detailed content specifications. The content specifications for all ACCUPLACER tests are presented in a subsequent section of this manual.

Item Writing

The quality of a test is only as good as its items. Many of the items in the original ACCUPLACER item pools came from the New Jersey College Basic Skills Placement Tests (NJCBSPT). These items were written by curriculum experts and underwent several rounds of revision and review. Statistical information was available for these items, and only the best items from content and statistical perspectives were selected for ACCUPLACER item pools. The other items in the original pools, and the items for the newer ACCUPLACER tests such as the LOEP tests, were written by curriculum experts specifically for ACCUPLACER. Separate groups of item writers were contracted for each subject area. These experts were familiarized with the purpose of ACCUPLACER, with its test specifications, and with the populations of students who take these tests. They were also instructed in the process of item construction and provided with guidelines for constructing quality test items (e.g., Haladyna & Downing, 1989).


Item Review

Internal content experts at ETS (the contractor for item development) reviewed all test items at least once, with most items being reviewed twice. Subsequent to the internal review, external committees of content specialists reviewed the items that survived the internal reviews. These reviews focused on the technical accuracy of the items and the fit of the items to their intended content specifications. The clarity of the items and the definitiveness of the correct answer were also reviewed by these specialists.

Following the content reviews, all items underwent sensitivity reviews to ensure that they did not contain material that could be construed as offensive, derogatory, or unnecessarily controversial. Each item was reviewed for cultural and linguistic bias to ensure that no inappropriate or offensive material was used. (Each item was also statistically analyzed for potential bias, as described in a subsequent section.) The fundamental tenet underlying sensitivity review is that test material should not (a) be offensive to any group of test takers, (b) advantage any group of test takers, or (c) disadvantage any group of test takers (Ramsay, 1993; Sireci & Mullane, 1994). Thus, material that is not central to the content domain tested but that may be more familiar to some groups than to others (e.g., items about baseball on a math test) is typically flagged in a sensitivity review. The sensitivity reviews were coordinated by ETS, which has developed and refined sensitivity review criteria for over two decades.

Field-Testing Items

After content and sensitivity reviews, items are considered appropriate for administering to examinees. However, expert review cannot catch all potential problems in all items, so items are "field-tested" to evaluate their statistical properties before they become operational. The statistics gathered on items include the proportion of examinees who answered the item correctly (called the difficulty index), the proportions of examinees who responded to each incorrect response option, and an index of how well the item discriminates between low-scoring and high-scoring examinees (the discrimination index). Field-testing is a prerequisite for items on a computerized-adaptive test like ACCUPLACER because estimates of item difficulty and discrimination are required for the item selection algorithm. At the field-test stage, the field-test items are administered to examinees, but they do not count toward examinees' scores. Instead, item statistics are calculated to see whether the items are functioning as intended. Only those items whose difficulty and discrimination statistics are considered reasonable are selected for inclusion in an ACCUPLACER item pool. For example, if a large proportion of high-scoring examinees answered a particular field-test item incorrectly, the item would be either revised or discarded.

Item Calibration and Selection

The psychometric model that underlies computerized-adaptive testing is item response theory (IRT). IRT posits several mathematical models that characterize items and examinees on a common scale. There are several attractive features of IRT, including the ability to provide scores on a common scale for examinees who take different items. The IRT model used for ACCUPLACER is the three-parameter logistic (3PL) IRT model (Lord & Novick, 1968), which expresses the probability of a correct response to an item as a function of an examinee's latent


proficiency (denoted θ) and three characteristics of an item called item parameters: the item discrimination parameter, the item difficulty parameter, and the pseudo-chance or lower asymptote parameter. Obtaining estimates of these three parameters for an ACCUPLACER item is known as item calibration. After field-test data are collected, all items are calibrated onto the operational ACCUPLACER scale. The parameters for each field-test item are then evaluated so that the best items are selected for use in an ACCUPLACER item pool. For an item to be selected for an ACCUPLACER item pool, it must demonstrate the ability to distinguish between high-proficiency and low-proficiency examinees. A brief description of the IRT model used in calibrating ACCUPLACER items is provided in the Scaling section below.

Item Selection Algorithm

The adaptive nature of a computerized-adaptive test stems from the procedure used to select the items to be administered to an examinee. This procedure is referred to as the item selection algorithm. A key goal of the algorithm is to match item difficulty to examinee proficiency (see Figure 1-1). The ACCUPLACER item selection algorithm was developed by Martha Stocking at ETS, a leader in the field of computerized adaptive testing. At the beginning of an ACCUPLACER test, an item of "moderate" difficulty is administered to the examinee. After each response to an item, the proficiency estimate for the examinee is updated. In addition to matching item difficulty to examinee proficiency, the ACCUPLACER item selection algorithm controls for content representation, which ensures that the content specifications of the test are adhered to for each examinee (Swanson & Stocking, 1993). Thus, although different examinees will get different questions, the proportion of items from each content area on the test will be roughly the same for all examinees.

Scaling

As is typical in computerized-adaptive testing, different examinees take different sets of items. IRT is used to put the scores from these different test administrations on a common scale. There are many IRT models available. ACCUPLACER uses the three-parameter logistic (3PL) IRT model (Lord & Novick, 1968), which was explicitly designed for multiple-choice items like those used on ACCUPLACER tests. The 3PL model expresses the probability of a correct response to an item as a function of an examinee's latent ability (denoted θ) and three characteristics of the item, called item parameters: the item discrimination parameter, the item difficulty parameter, and the pseudo-chance or lower asymptote parameter. The formula for the 3PL model is

P_i(\theta) = c_i + (1 - c_i) \frac{e^{1.7 a_i (\theta - b_i)}}{1 + e^{1.7 a_i (\theta - b_i)}}          (2-1)

where P_i(θ) is the probability that a randomly chosen examinee with ability θ will answer item i correctly, b_i is the item difficulty parameter, a_i is the item discrimination parameter, and c_i is the pseudo-chance parameter (Hambleton, Swaminathan, & Rogers, 1991). The item difficulty parameter indicates the location on the latent proficiency scale at which an examinee has a .50 probability of answering the item correctly. The larger the value of b_i, the more difficult the item. The discrimination parameter is proportional to the slope of the monotonically increasing function that relates the probability of a correct response on the item to θ. The larger the a-parameter for an item, the better the item distinguishes between examinees of low


and high ability. The pseudo-chance parameter represents the probability that examinees of very low proficiency will answer the item correctly, perhaps by guessing, which is why this parameter is sometimes called the "guessing" parameter. The larger the c-parameter, the more likely it is that an examinee of low proficiency can answer the item correctly. The details of IRT are beyond the scope of this manual, but several excellent textbooks on IRT are available. Interested readers are referred to Hambleton and Swaminathan (1985); Hambleton, Swaminathan, and Rogers (1991); Lord (1980); and Lord and Novick (1968). Excellent discussions of IRT within the context of computerized-adaptive testing can be found in Wainer (2000).

After completing an ACCUPLACER test, an estimate of θ is provided for each examinee. The metric for this latent proficiency scale is arbitrary, and so it is typically taken to have a mean of zero and a standard deviation of one. To make ACCUPLACER scores as useful as possible, a 0-to-120 scale was created. This scale represents a transformation from the θ scale to a scale that describes the number of test items the examinee probably would have answered correctly if he or she had responded to all 120 original ACCUPLACER items. Further information about the scores reported on ACCUPLACER tests can be found in the chapter Understanding and Interpreting ACCUPLACER Scores.
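For readers who prefer to see Equation 2-1 in executable form, the following Python sketch evaluates the 3PL probability for a few proficiency values. The parameter values are illustrative only; operational item parameters are estimated during calibration and are not published.

import math

def p_correct(theta, a, b, c):
    """3PL probability (Equation 2-1) of a correct response at proficiency theta."""
    exponent = 1.7 * a * (theta - b)
    return c + (1.0 - c) * math.exp(exponent) / (1.0 + math.exp(exponent))

# An illustrative item of average difficulty (b = 0.0), moderate discrimination
# (a = 1.0), and a pseudo-chance level of .20 (roughly one correct guess in five).
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, a=1.0, b=0.0, c=0.20), 3))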

Historical Overview of ACCUPLACER Test Forms Development

ACCUPLACER Core Tests

When ACCUPLACER tests were first introduced, the battery consisted of only four tests: Reading Comprehension, Sentence Skills, Arithmetic, and Elementary Algebra. The College-Level Mathematics Test, the LOEP tests, and WritePlacer Plus were introduced later. This core set of examinations was first introduced in 1985 to assist in placing entering college students in English and mathematics courses.

The scaling of the original ACCUPLACER tests began with items from five forms (Forms A-E) of the New Jersey College Basic Skills Placement Tests (NJCBSPT). There was one version of each of Forms A, B, C, and D; for Form E, there were eight variants differing only in the field-test section of five items. For each test, all of the items from the five forms, including field-test items, were calibrated together in a single LOGIST calibration run by making use of common items between the forms. Spaced samples from past administrations of the NJCBSPT were used for the calibration run. Each test was calibrated separately. The algebra calibration included only individuals who had taken an algebra course.

The specifications developed for the Reading Comprehension Test included reading comprehension items drawn from the NJCBSPT reading comprehension test plus sentence relationship items drawn from another test then part of the NJCBSPT, called Logical Relationships. Relevant items from both NJCBSPT tests were included in the IRT calibration done for reading comprehension. The total numbers of items and total sample sizes for each calibration are presented in Table 2-1.


Table 2-1: Item Calibration

Test                      Number of Items    Number of Examinees
Reading Comprehension     127                12,522
Sentence Sense            127                29,974
Math Computation          116                29,974
Elementary Algebra        116                27,977

It was determined that additional items were needed in the computer-adaptive testing pools to meet test specifications. As a result, 20 NJCBSPT items that were already calibrated were selected as common items for each test. These items were administered along with newly written items in paper-and-pencil pretesting. There were three pretest forms, each including the 20 common items, for each of three of the tests; for Reading Comprehension there were eight field-test forms. The numbers of new items pretested were: Reading Comprehension, 200; Sentence Skills, 135; Arithmetic Skills, 105; Elementary Algebra, 78. For each test, forms were assigned to institutions that had specified which tests they would pretest.

Participation in the pretesting was solicited in the spring of 1983 by mailing to selected high schools chosen, by spaced sampling, from a national list used by the ETS pretest office. The mailing included both accredited and non-accredited high schools. Of 233 high schools that agreed to participate, 199 actually provided data. From the high schools, usable test booklets (based on analysis of answers to background questions) were received as follows: Reading Comprehension, 10,686; Sentence Skills, 10,797; Arithmetic Skills, 12,173; Elementary Algebra, 10,176.

Colleges were also invited to participate in pretesting. Participation was solicited by mail, with the targeted group being students entering in the fall of 1983. About 600 colleges were contacted initially; 86 actually provided data (this list is also available upon request). For the college data, the numbers of students included in the calibrations are reported in Table 2-2.

Table 2-2: Numbers of Postsecondary Students Participating in Initial Pretest

Test                      Two-year    Four-year
Reading Comprehension     780         751
Sentence Skills           1,119       1,099
Arithmetic Skills         1,022       728
Elementary Algebra        1,010       633

Pretest forms were initially subjected to standard item analysis procedures; some items were discarded at this point. All the high school and college students were included in the samples for the calibrations. New items were calibrated using LOGIST and placed on the scale of original LOGIST runs by a characteristic curve transformation method applied to the common items. Final pools for the four modules of 120 questions each were then selected based on information from the IRT calibrations and the need to meet test specifications.


College-Level Mathematics

The College-Level Mathematics Test (CLM) was introduced in 1990. The CLM test was first introduced to assist in the placement of students who demonstrated ability above what was measured by the Elementary Algebra exam. The scaling of this test began with an item pool from three existing College Board mathematics tests: College Algebra, College Trigonometry, and College Algebra/Trigonometry. Over 350 items were available from these three tests. An initial screening of the items by a faculty committee reduced the number of acceptable items to 305. Items were then identified as being appropriate for three different sub-areas: Intermediate Algebra, College Algebra, and Precalculus. The Test Development Committee specified the number of items to be administered from each sub-area. Many items were appropriate for more than one sub-area, allowing administration of one item to cover two or three sub-areas. Out of the assumed 20-item adaptive test length, a minimum of 10 to 15 items had to be drawn from the Intermediate Algebra sub-area, a minimum of 11 to 14 items from the College Algebra sub-area, and a minimum of 14 to 16 items from the Precalculus sub-area. Following these specifications, the final 120-item pool was selected. Items were selected based both on their statistical properties and on how well they met the test specifications.

Levels of English Proficiency (LOEP) Tests

The initial impetus for LOEP came from a 1988 nationwide survey conducted by the College Board and Educational Testing Service in which ESL coordinators expressed the need for an ESL placement test for two-year colleges. Discussions with ACCUPLACER users in both two-year and four-year institutions indicated that such a test could also assist in placing a sizable number of students who are native speakers of English but whose skills are below the level for which the other ACCUPLACER verbal tests are intended.

Each part of LOEP is an untimed, computer-delivered adaptive test. Any combination of the parts may be administered alone or along with other ACCUPLACER tests. The seamless serial testing branching capabilities of ACCUPLACER tests may be used to direct students to LOEP, based either on the student's score on the ACCUPLACER Reading Comprehension test or on the answer to a background question indicating that English is a second language. LOEP consists of three components: Sentence Meaning, Language Use, and Reading Skills. Courses into which students could be placed on the basis of LOEP performance include ESL and developmental courses in reading, language arts, and English. In most cases, placement into college-level courses will be based on the ACCUPLACER core verbal tests rather than on LOEP, and students for whom English is the best language will be tested first on the core tests before determining that LOEP is appropriate in difficulty for them.

Development of LOEP began with the formation of the test development committee, whose nine members represented two-year colleges and secondary schools from seven states, Puerto Rico, and Canada. The committee met first in February 1992 to determine the number and level of tests needed, to select item formats, and to set general content specifications. Committee members subsequently participated in writing and reviewing items, and met again in February 1993 to review field-test results and set the final test specifications.



Over 200 questions were written for each part of LOEP. Test committee members, ETS staff, and others with backgrounds in ESL and developmental education wrote and reviewed test questions. ETS staff conducted final reviews of items and pretest forms following guidelines used in evaluating all ETS tests, which include reviews to assure that the tests are sensitive to the concerns of, and balanced with respect to, gender and cultural and ethnic groups.

Items were assembled into field-test blocks and the blocks, in turn, into field-test forms. Fifty-eight pretest forms were required for each of the three parts of LOEP, with each block of items appearing in two or three of the forms. The forms varied in length from 20 questions for Reading Skills to 33-36 for each of the other two parts of the test. This length was intended to make it possible to introduce and administer a pretest, including collection of background information, within one class period. Test forms were spiral-packaged so that each student in a class would receive a different form from that given to any other student in that class, to assure that there would not be differences across forms in the proficiency levels of the students tested.

Requests for field-test participation were sent to the English and ESL department chairs of all U.S. community colleges, to the English department chairs of about 400 less selective U.S. four-year colleges and universities, to contacts at selected non-U.S. North American colleges and universities, to the principals of about 200 U.S. high schools, and to all ACCUPLACER test coordinators. Institutions were asked to test motivated students who would normally be tested for placement into English and reading courses. They made their own decisions about who would be tested and when, in relation to the beginning of instruction, testing would take place. Pretest materials were distributed beginning in August 1992 and continuing through the fall; all pretesting was completed and data were returned by the end of December.

Appendix D presents a list of the 159 institutions that provided LOEP pretest data. U.S. participants included 9 high schools, 113 community colleges, 25 four-year institutions, and 2 non-degree institutions. Ten non-U.S. North American colleges and universities also provided data. Usable data were obtained from approximately 16,000 students.

Answers to background questions obtained during the field-test sessions indicated that, for about half the sample, English was the language they knew best. About 17% of those providing the information indicated that their best language was Spanish; 15% knew an Asian language best, with Vietnamese, Chinese, Japanese, Korean, and other Asian languages each being represented by at least several percent of the sample. The sample included smaller numbers of students whose best language was Arabic; Hindi; Urdu; an African language; a Native American language; or French or another European language. Slightly more than half (54%) of the examinees who indicated their gender were female. Ethnic group membership included 40% describing themselves as White or Caucasian; 18% Hispanic (about evenly divided among Mexican, Mexican American, or Chicano; Puerto Rican; and other Hispanic groups); 18% Asian; 15% African, African American, or Black; and 3% American Indian, Native American, or Alaskan Native.

Most students had either most or very little of their education in the United States: somewhat more than 50% indicated having had 10 or more years of education in the U.S., while about 40% had 3 years or fewer in the U.S. Ninety-six percent of the examinees responded "yes" to the question, "Can you read and write your best language well?"


Standard item analysis procedures were carried out to identify any questions that, on the basis of the level of performance or their relations with other questions, appeared to be inappropriate for inclusion in the test. About a dozen items were removed based on these analyses. IRT calibrations were then performed. For each part of LOEP, all students for whom English was a second language, plus the lowest-scoring 5-10% of those for whom it was the best language, were included in the calibrations. This sampling was done so that the results would be representative of the population the test is intended to serve, eliminating a sizable number of native English speakers whose skills were above the range for which the test yields discriminating measurement. The IRT item parameters resulting from these calibrations, taken in conjunction with the test specifications, formed the basis for constructing an adaptive test for each of the three parts of LOEP. Computerized simulation studies were carried out to investigate the characteristics of tests with different lengths and to explore variations where the test specifications provided flexibility; for example, to determine for each part what number of female-reference and male-reference items should be administered to best achieve the specification that approximately equal numbers of female-relevant and male-relevant items should be given.

WritePlacer Plus

WritePlacer Plus is a direct measure of student writing skills, first introduced in 2000. Examinees are asked to provide a writing sample in response to a specific prompt. Essays are scored using an automated scoring mechanism that measures writing skills at the level expected of an entering college-level student. The writing skills considered important for entry-level college students were defined by a committee of college faculty and other educators. These skills were then reviewed and further validated by several hundred faculty members. Writing prompts were developed to measure the writing skills identified. These writing prompts were field-tested with a sample of entry-level students at several colleges and universities. The prompts were evaluated based on the field-test results, and a final set of prompts was selected for use in the WritePlacer Plus program.

Test Maintenance

The College Board has a program that continually monitors the technical quality of the ACCUPLACER examinations. This program includes: 1) a validity studies service for schools interested in investigating the validity of placement decisions made using the ACCUPLACER exams, 2) an external audit of the program designed to identify policies and procedures that should be implemented, 3) continual feedback from test center supervisors and examinees, and 4) a program of item pool maintenance.

Item pool maintenance is a two-step program. First, items are continually monitored to ensure their technical quality. Items are reviewed through feedback from test supervisors and through monitoring of the statistical performance of the items. If either of these steps identifies a problem, items are removed from the item pool. The second component of the item maintenance plan is occasional "refreshment" of the item pool. "Refreshing" item pools refers to the process of adding new, carefully field-tested items to the existing operational item pool. The refreshing of ACCUPLACER item pools


began in 1994 with a review of all items for each of the existing tests. Two test development committees met to review the test specifications and create new items for the Reading Comprehension, Sentence Skills, Arithmetic, and Elementary Algebra tests. Test specifications for each of the tests were revised to align with national standards, such as those of the National Council of Teachers of Mathematics (NCTM), and with present practices of instruction. Committee members and test development specialists at Educational Testing Service (ETS) wrote over 300 items for each of the four content areas. At subsequent meetings, the Test Development Committee reviewed all of the proposed items. In addition to reviewing the items for content validity and technical accuracy, the items were reviewed to assure they continued to meet The College Board's standards for currency, sensitivity, and bias-free language. The Sensitivity Review Guidelines developed by ETS were applied to all items.

After items have been written, they are pretested to ensure they meet the technical quality requirements for ACCUPLACER items. After enough candidates have taken the pretest items, item statistics are compiled and items are calibrated. Calibration refers to the process of obtaining item statistics for every item. Once items are calibrated, they can be placed into the examinations as new items. "Refresher" items for the Arithmetic and Elementary Algebra exams have been calibrated and are scheduled for placement into the exams in the spring of 2003. "Refresher" items for the Sentence Skills and Reading Comprehension exams are currently being pretested and are scheduled for placement into the examinations in the fall of 2003.



Chapter 3: Understanding and Interpreting ACCUPLACER Scores

To make ACCUPLACER results as useful as possible, three different measures of student performance are provided: the ACCUPLACER scaled score, which is called the Total Right Score; the Range; and the Percentile Rank. The Total Right Score is the score that should be used for making placement decisions because this score provides a criterion-referenced interpretation of students' performance. As explained below, this score represents a student's performance with respect to the entire pool of original ACCUPLACER items. The Range uses an estimate of test score error to put a lower and upper bound around the Total Right Score. The Percentile Rank score can be used to compare a student's score to a national group of students who took the test. Although the Percentile Rank should not be used to make placement decisions, in some situations it may be helpful for gauging a student's performance against a national comparison group. The Total Right Score and Range are described in this chapter. The Percentile Rank is briefly described here, and tables based on the Percentile Rank scores can be found in Appendix A.

This chapter also provides a brief description of the proficiency statements derived for each ACCUPLACER test. These statements were developed to describe specific levels of knowledge and skills that are likely to be possessed by students at specific points along the Total Right score scale. These statements should be useful to institutions for determining the specific cut scores to be used for placement decisions. The proficiency statements for each subject test are provided in Chapters 4 through 9.

ACCUPLACER Score Scale: Total Right Score

Scores for ACCUPLACER tests are reported on a scale ranging from 0 to 120. The 120-point scale was chosen to reflect the 120 items that were in the original item pool for each of the tests. Although items have since been added to and deleted from the pools, making their totals between 117 and 300 items per pool, the original 120-point scale was retained for reasons of consistency. This scaled score is reported as the Total Right Score. This score thus provides an absolute measure (i.e., a criterion-referenced measure) of the student's skills, independent of the distribution of skills among all test takers. It is recommended that schools use this score for making student placement decisions, in computing summary statistics, in correlating test performance with other information in a student's records, and in other statistical treatments of the test data.

The Total Right Score is calculated directly from the IRT model used to calibrate items and compute scores for students. Given the proficiency estimate (θ) for an examinee, the probability that the examinee will correctly answer each of the original 120 items can be estimated, since the item parameters are known (Lord & Novick, 1968). That is, given an examinee's θ and the 3PL parameters for all 120 items in the original pool (i through n), the scaled score can be computed by

TotalRightScore = \sum_{i=1}^{n} P_i(x = 1 \mid \theta)          (3-1)
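A minimal sketch of Equation 3-1 is shown below. The pool of item parameters is randomly generated for illustration only; the operational Total Right Score uses the calibrated parameters of the original 120-item pools.

import math
import random

def p_correct(theta, a, b, c):
    """3PL probability (Equation 2-1) of a correct response at proficiency theta."""
    exponent = 1.7 * a * (theta - b)
    return c + (1.0 - c) * math.exp(exponent) / (1.0 + math.exp(exponent))

def total_right_score(theta, pool):
    """Equation 3-1: expected number right over the original item pool."""
    return sum(p_correct(theta, a, b, c) for a, b, c in pool)

# A made-up 120-item pool of (discrimination, difficulty, pseudo-chance) triples.
pool = [(random.uniform(0.5, 2.0), random.uniform(-2.5, 2.5), random.uniform(0.10, 0.25))
        for _ in range(120)]

for theta in (-1.0, 0.0, 1.5):
    print(theta, round(total_right_score(theta, pool), 1))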



The Percentile Rank

The Percentile Rank indicates student performance in relation to a population of test takers in a specific year. Thus, it describes a student's test performance in relation to the performance of a particular group of students that serves as a convenient reference point for future students who take the test. The Percentile Rank score indicates the percentage of students in the reference group who scored equal to or below the score attained by a student. For example, if Pablo earned a Percentile Rank of 68, he scored as well as or better than 68% of the students in the reference group. Total Right Score to Percentile Rank Score conversion tables for ACCUPLACER tests are presented in Appendix A. The Percentile Rank scores may be useful to some colleges for tracking their students' performance (as a group) over time. However, for the purposes of placing individual students, the Total Right Score should be used.
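The calculation behind a Percentile Rank can be illustrated with a few lines of Python. The reference scores below are invented; operational conversions use the national reference group tables in Appendix A.

from bisect import bisect_right

def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring at or below the given score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)

reference = [42, 55, 61, 68, 68, 74, 80, 86, 93, 101]   # invented reference scores
print(percentile_rank(74, reference))                    # 60.0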

Information for Making Placement Decisions

Proficiency Statements

ACCUPLACER tests are designed to assist institutions in placing students into appropriate courses. Given that institutions differ greatly with respect to the composition of the student body, faculty, and course content, it is not possible to stipulate specific test cut scores that should be used for placement decisions. Instead, each institution should establish its own cut scores to facilitate placement decisions based on factors and data unique to that institution. To help institutions establish these cut scores, the College Board developed "proficiency statements" that describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements were derived by convening a panel of experts in each subject area to review items anchored at specific points along the Total Right score scale and to describe the knowledge and skills that are required to answer these items correctly. The Proficiency Statements for each test are provided in Chapters 4 through 9. These statements provide useful information for understanding students' skill levels. Wherever possible, actual placement decisions should include other variables that may contribute to an accurate assessment of a student's ability, such as high school grades, background information, etc.

Cut Score Surveys

Given that each institution is encouraged to set its own cut scores on ACCUPLACER, it may be helpful for institutions to know the cut scores used by other institutions. The College Board surveyed ACCUPLACER users to obtain information about the cut scores used at their institutions. These surveys are shown in Appendix E.


Chapter 4: Reading Comprehension

The Reading Comprehension test measures a student's ability to understand what he or she has read. There are five content areas on this test: (a) Identifying Main Ideas, (b) Direct Statements/Secondary Ideas, (c) Inferences, (d) Applications, and (e) Sentence Relationships. The approximate distribution of these items is listed in Table 4-1. Examinees are presented with a series of 20 questions of two primary types. The first type consists of a reading passage followed by a question based on the text. Both short and long narratives are provided. The reading passages can also be classified according to the kind of information processing required, including explicit statements related to the main idea, explicit statements related to a secondary idea, application, and inference. The second type of question, sentence relationships, presents two sentences followed by a question about the relationship between these two sentences. The question may ask, for example, if the statement in the second sentence supports that in the first, if it contradicts it, or if it repeats the same information.

Table 4-1: Content Area Specifications for Reading Comprehension Test (approximate percentage of test)
  Identifying Main Ideas: 12-25
  Direct Statements/Secondary Ideas: 12-40
  Inferences: 12-40
  Applications: 12-25
  Sentence Relationships: 24-29

Examples of the two types of questions that appear on the Reading Comprehension test appear below. Sample Reading Comprehension Test Passage-Based Item There are two types of pottery that I do. There is production pottery---mugs, tableware--- the kinds of things that sell easily. These pay for my time to do the other work, which is more creative and satisfies my needs as an artist. The author of the passage implies that (a) artists have a tendency to waste valuable time *(b) creativity and mass-production are incompatible (c) most people do not appreciate good art (d) pottery is not produced by creative artists


Sample Reading Comprehension Sentence Relationship Item
The Midwest is experiencing its worst drought in fifteen years. Corn and soybean prices are expected to be very high this year. What does the second sentence do? (a) It restates the idea found in the first. *(b) It states an effect. (c) It gives an example. (d) It analyzes the statement made in the first.

Both reading passages and sentence relationship questions are varied according to content categories to help prevent bias because of a student's particular knowledge. These categories include social sciences, natural and physical sciences, human relations and practical affairs, and the arts. In the Reading Comprehension test, for example, each student will receive four long reading passages, eight to nine questions based on short passages, and four to five questions involving sentence relationships.

Proficiency Statements for the Reading Comprehension Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the Reading Comprehension Test are:

Total Right Score of about 51: Students at this level are able to comprehend short passages that are characterized by uncomplicated ideas, straightforward presentation, and, for the most part, subject matter that reflects everyday experience. These students are able to:
· recognize the main idea and less central ideas
· recognize the tone of the passage when questions do not require fine distinctions
· recognize relationships between sentences, such as the use of one sentence to illustrate another

Total Right Score of about 80: Students at this level are able to comprehend short passages that are characterized by moderately uncomplicated ideas and organization. These students are able to:
· answer questions that require them to synthesize information, including gauging point of view and intended audience
· recognize organizing principles in a paragraph or passage
· identify contradictory or contrasting statements

Total Right Score of about 103 or higher: Students at this level are able to comprehend passages that, although short, are somewhat complex in terms of the ideas conveyed, and that deal with academic subject matter, often in a theoretical framework. These students are able to:
· extract points that are merely implied
· follow moderately complex arguments or speculations
· recognize tone
· analyze the logic employed by the author in making an argument


Chapter 5: Sentence Skills Test

The Sentence Skills Test measures students' understanding of sentence structure -- how sentences are put together and what makes a sentence complete and clear. There are three content areas measured on this test: (a) Recognizing Complete Sentences, (b) Coordination/Subordination, and (c) Clear Sentence Logic. The approximate distribution of these items is listed in Table 5-1. Each student receives 20 Sentence Skills items of two types. The first type is sentence correction questions, which require an understanding of sentence structure. These questions ask students to choose the most appropriate word or phrase to substitute for the underlined portion of the sentence. The second type is construction shift questions. These ask that a sentence be rewritten according to the criteria shown while maintaining essentially the same meaning as the original sentence. Within these two primary categories, the questions are also classified according to the skills being tested. Some questions deal with the logic of the sentence, others with whether or not the answer is a complete sentence, and still others with the relationship between coordination and subordination. In a manner similar to the Reading Comprehension questions, these questions are varied according to content categories to prevent bias because of a student's particular knowledge. These categories include social sciences, natural and physical sciences, human relations and practical affairs, and the arts.

Table 5-1: Content Area Specifications for Sentence Skills Test (approximate percentage of test)
  Recognizing Complete Sentences: 30-40
  Coordination/Subordination: 30-40
  Clear Sentence Logic: 30-40

An example of a Sentence Skills item appears below. The correct answer is denoted with an asterisk. Sample Sentence Skills Item Ms. Rose planning to teach a course in biology next summer. (a) planning (b) are planning (c) with a plan *(d) plans


Proficiency Statements for the Sentence Skills Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the Sentence Skills Test are:

Total Right Score of about 53: Students at this level can:
· solve problems in simple subordination and coordination
· correct sentence fragments

Total Right Score of about 86: Students at this level can:
· solve problems of faulty coordination and subordination in a sentence with one or two clauses
· manipulate complex verb tenses
· correct misplaced modifiers
· solve problems that combine grammar and logic

Total Right Score of about 110 or above: Students at this level can:
· manipulate complex sentences with two or more subordinate clauses
· correct problems of syntax and repetitive diction
· recognize correct and incorrect linkages of clauses, including problems involving semicolons


Chapter 6: Arithmetic Test

The Arithmetic test measures students' ability to perform basic arithmetic operations and to solve problems that involve fundamental arithmetic concepts. There are three content areas measured on this test: (a) Whole Numbers and Fractions, (b) Decimals and Percents, and (c) Applications. The approximate distribution of these items is listed in Table 6-1. Each examinee is administered 17 items. The Whole Numbers and Fractions area includes addition, subtraction, multiplication, division, recognizing equivalent fractions and mixed numbers, and estimating. The Decimals and Percents area includes addition, subtraction, multiplication, and division with decimals. Percent problems, recognition of decimal, fraction, and percent equivalencies, and estimating problems are also given. The Applications and Problem Solving area includes rate, percent, and measurement problems, simple geometry problems, and distribution of a quantity into its fractional parts. Questions from all three categories are always presented to the student, although the number of questions from each category varies with the student's skill level. For example, if the student's responses show minimal arithmetic skills, presenting too many applications problems is pointless. On the other hand, a student exhibiting good skills with whole numbers and fractions will be presented with more of these types of problems. Thus, the proportion of questions in the various categories will automatically vary according to the student's responses.

Table 6-1: Content Area Specifications for Arithmetic Test (approximate percentage of test)
  Whole Numbers and Fractions: 31-44
  Decimals and Percents: 31-38
  Applications: 25-31

An example of an Arithmetic item appears below. The correct answer is denoted with an asterisk. Sample Arithmetic Test Item Solve the following problem. You may use the paper you have been given for scratch work. 10 + 3 = (a) 7 *(b) 13 (c) 30 (d) 40


Proficiency Statements for the Arithmetic Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the Arithmetic Test are:

Total Right Score of about 31: Students at this level have minimal arithmetic skills. These students can:
· perform simple operations with whole numbers and decimals (addition, subtraction, and multiplication)
· calculate an average, given integer values
· solve simple word problems
· identify data represented by simple graphs

Total Right Score of about 57: Students at this level have basic arithmetic skills. These students can:
· perform the basic arithmetic operations of addition, subtraction, multiplication, and division using whole numbers, fractions, decimals, and mixed numbers
· make conversions among fractions, decimals, and percents

Total Right Score of about 90: Students at this level have adequate arithmetic skills. These students can:
· estimate products and squares of decimals and square roots of whole numbers and decimals
· solve simple percent problems of the form p% of q = ? and ?% of q = r
· divide whole numbers by decimals and fractions
· solve simple word problems involving fractions, ratio, percent increase and decrease, and area

Total Right Score of about 112: Students at this level have substantial arithmetic skills. These students can:
· find equivalent forms of fractions
· estimate computations involving fractions
· solve simple percent problems of the form p% of ? = r
· solve word problems involving the manipulation of units of measurement
· solve complex word problems involving percent, average, and proportional reasoning
· find the square root of decimal numbers
· solve simple number sentences involving a variable


Chapter 7: Elementary Algebra Test

The Elementary Algebra test measures students' ability to perform basic algebraic operations and to solve problems that involve elementary algebraic concepts. Three content areas are measured on this test: (a) Integers and Rationals, (b) Algebraic Expressions, and (c) Equations, Inequalities, and Word Problems. The approximate distribution of these items is listed in Table 7-1. Students are administered 12 questions on this test. The Integers and Rational Numbers area includes computation with integers and negative rationals, the use of absolute values, and ordering. These questions test minimal skill levels of the student. The Algebraic Expressions content area tests minimal skill levels using evaluation of simple formulas and expressions, and adding and subtracting monomials and polynomials. At all levels of skill, questions are provided involving multiplying and dividing monomials and polynomials, the evaluation of positive rational roots and exponents, simplifying algebraic fractions, and factoring. The Equations content area involves the solution of equations, inequalities, and word problems. As with the Arithmetic Test, few questions from this category are presented to the student unless he or she shows a high enough skill level. When a high degree of competence is indicated, questions from this category include solving linear equations and inequalities, the solution of quadratic equations by factoring, solving verbal problems presented in an algebraic context, including geometric reasoning and graphing, and the translation of written phrases into algebraic expressions.

Table 7-1: Content Area Specifications for Elementary Algebra Test (approximate percentage of test)
  Integers and Rationals: 8-17
  Algebraic Expressions: 42-67
  Equations, Inequalities, and Word Problems: 17-50

An example of an Elementary Algebra item appears below. The correct answer is denoted with an asterisk.

Sample Elementary Algebra Test Item
Solve the following problem. You may use the paper you have been given for scratch work.
2x + 3x + y =
(a) 6xy
*(b) 5x + y
(c) 5(x + y)
(d) 6x + y


Proficiency Statements for the Elementary Algebra Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the Elementary Algebra Test are:

Total Right Score of about 25: Students at this level have minimal pre-algebra skills. These students demonstrate:
· a sense of order relationships and the relative size of signed numbers
· the ability to multiply a whole number by a binomial

Total Right Score of about 57: Students scoring at this level have minimal elementary algebra skills. These students can:
· perform operations with signed numbers
· combine like terms
· multiply binomials
· evaluate algebraic expressions

Total Right Score of about 76: Students at this level have sufficient elementary algebra skills. At this level, the skills that were beginning to emerge at a Total Right Score of 57 have been developed. Students at this level can:
· add radicals, add algebraic fractions, and evaluate algebraic expressions
· factor quadratic expressions of the form ax² + bx + c, where a = 1
· factor the difference of squares
· square binomials
· solve linear equations with integer coefficients

Total Right Score of about 108: Students at this level have substantial elementary algebra skills. These students can:
· simplify algebraic expressions
· factor quadratic expressions where a = 1
· solve quadratic equations
· solve linear equations with fractional and literal coefficients and linear inequalities with integer coefficients
· solve systems of equations
· identify graphical properties of equations and inequalities


Chapter 8: College-Level Mathematics Test

The College-Level Mathematics test measures students' ability to solve problems that involve college-level mathematics concepts. There are six content areas measured on this test: (a) Algebraic Operations, (b) Solutions of Equations and Inequalities, (c) Coordinate Geometry, (d) Functions, (e) Trigonometry, and (f) Applications and other Topics. The approximate distribution of these items is listed in Table 8-1. The Algebraic Operations content area includes simplification of rational algebraic expressions, factoring and expanding polynomials, and manipulating roots and exponents. The Solutions of Equations and Inequalities content area includes the solution of linear and quadratic equations and inequalities, systems of equations, and other algebraic equations. The Coordinate Geometry area presents questions involving plane geometry, the coordinate plane, straight lines, conics, sets of points in the plane, and graphs of algebraic functions. The Functions content area includes questions involving polynomial, algebraic, exponential, and logarithmic functions. The Trigonometry area includes trigonometric functions. The Applications and other Algebra Topics area contains complex numbers, series and sequences, determinants, permutations and combinations, factorials, and word problems. A total of 20 questions are administered on this test. This test assesses proficiency in intermediate algebra through precalculus. Therefore, it enables institutions to place students into intermediate algebra, college algebra, precalculus, and introductory calculus courses.

Table 8-1: Content Area Specifications for College-Level Math Test (approximate percentage of test)
  Algebraic Operations: 20
  Solutions of Equations and Inequalities: 15
  Coordinate Geometry: 15
  Functions: 20
  Trigonometry: 20
  Applications and other Topics: 10

An example of a College-Level Math test item appears below. The correct answer is denoted with an asterisk.

Sample College-Level Math Test Item
Solve the following problem. You may use the paper you have been given for scratch work.
If the 1st and 3rd terms of a geometric sequence are 3 and 27, respectively, then the 2nd term could be
(a) 6
*(b) 9
(c) 12
(d) 15
(e) 18


Proficiency Statements for College-Level Math Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the College-Level Math Test are:

Total Right Score of about 40 or less: These students should take the Elementary Algebra test before any placement decisions are finalized.

Total Right Score of about 40: Students scoring at this level can:
· identify common factors
· factor binomials and trinomials
· manipulate factors to simplify complex fractions
These students should be considered for placement into intermediate algebra. For further guidance in placement, have these students take the Elementary Algebra test.

Total Right Score of about 63: Students scoring at this level can demonstrate the following additional skills:
· work with algebraic expressions involving real number exponents
· factor polynomial expressions
· simplify and perform arithmetic operations with rational expressions, including complex fractions
· solve and graph linear equations and inequalities
· solve absolute value equations
· solve quadratic equations by factoring
· graph simple parabolas
· understand function notation, such as determining the value of a function for a specific number in the domain
· a limited understanding of the concept of function on a more sophisticated level, such as determining the value of the composition of two functions
· a rudimentary understanding of coordinate geometry and trigonometry
These students should be considered for placement into college algebra or a credit-bearing course immediately preceding calculus.

Total Right Score of about 86: Students scoring at this level can demonstrate the following additional skills:
· understand polynomial functions
· evaluate and simplify expressions involving functional notation, including composition of functions
· solve simple equations involving trigonometric functions, logarithmic functions, and exponential functions
These students can be considered for a precalculus course or a non-rigorous course in beginning calculus.


Total Right Score of about 103 or above: Students scoring at this level can demonstrate the following additional skills:
· perform algebraic operations and solve equations with complex numbers
· understand the relationship between exponents and logarithms and the rules that govern the manipulation of logarithms and exponents
· understand trigonometric functions and their inverses
· solve trigonometric equations
· manipulate trigonometric identities
· solve right-triangle problems
· recognize graphic properties of functions such as absolute value, quadratic, and logarithmic
These students should be considered for placement into calculus.


Chapter 9: Levels of English Proficiency Tests

The Levels of English Proficiency (LOEP) assessment is a computerized-adaptive battery intended for use in placing students of low English proficiency in the appropriate language courses. LOEP was developed primarily in response to an expressed need for assessment of the skills of English as a Second Language (ESL) students who would not perform well on the ACCUPLACER Sentence Skills and Reading Comprehension tests. It is also appropriate for students for whom English is the best language but whose skills are at a level for which these core verbal tests are too difficult. LOEP consists of three components: Sentence Meaning, Language Use, and Reading Skills. Each component requires students to answer 20 items. Courses into which students could be placed on the basis of LOEP performance include ESL and developmental courses in Reading, Language Arts, and English. In most cases, placement into college-level courses will be based on the ACCUPLACER core verbal tests rather than on LOEP, and students for whom English is the best language will be tested first on the core tests before determining that LOEP is appropriate in difficulty for them. ACCUPLACER's seamless serial testing branching capabilities may be used to direct students to LOEP, based on the student's score on the Reading Comprehension test or on the answer to a background question indicating that English is a second language. The content of each LOEP component is described below.

Content of LOEP Reading Skills Test

The LOEP Reading Skills test measures students' ability to read English. Specifically, it assesses students' comprehension of short passages. It contains brief passages of 50 words or less and moderate-length passages of 50 to 90 words. Reading passages are drawn from five content areas: (a) Arts/Humanities, (b) History/Social Science, (c) Practical Situations Narrative, (d) Psychology/Human Relations, and (e) Science. The approximate distribution of these items is listed in Table 9-1. Half of this subtest contains straightforward comprehension items (paraphrase, locating information, vocabulary on a phrase level, and pronoun reference). The other half assesses inference skills (main idea, fact vs. opinion, cause/effect logic, identifying irrelevant information, the author's point of view, and applying the author's logic to another situation).

Table 9-1: Content Area Specifications for LOEP Reading Skills Test (approximate percentage of test)
  Arts/Humanities: 10-15
  History/Social Science: 10-15
  Practical Situations Narrative: 10-15
  Psychology/Human Relations: 10-15
  Science: 10-15

An example of a LOEP Reading Skills test item appears below. The correct answer is denoted with an asterisk.


Sample LOEP Reading Skills Test Item
Read the passage, and then choose the best answer to the question.
My cousin lived in the city from 1984 to 1989.
When did my cousin live in the city?
(a) After 1989
(b) Before 1984
*(c) Between 1984 and 1989
(d) Neither in 1984 nor in 1989

Proficiency Statements for the LOEP Reading Skills Test

As described in Chapter 3, Proficiency Statements describe the knowledge and skills associated with specific ACCUPLACER Total Right scores. These statements provide useful information for understanding students' skill levels. The Proficiency Statements for the LOEP Reading Skills Test are:

Total Right Score of about 57: Students at this level can demonstrate the following skills:
· locate information in a passage by answering literal comprehension questions on even the longest passages, if the question posed and the answer to that question are in the same sentence or in close proximity to each other
· answer questions in which the wording in the answer is very similar to the wording in the passage or uses minimal paraphrasing
· answer some questions requiring small inferences (including questions asking for the main idea of the passage) if the options do not require fine distinctions
· answer questions based on maps and charts

Total Right Score of about 82: Students at this level can demonstrate the following additional skills:
· answer questions that require drawing conclusions on the basis of the information presented in the passage or making inferences from the information presented
· recognize the main idea of a passage even when presented with wrong answer choices mentioned in the passage as supporting information

Total Right Score of about 102: Students at this level can demonstrate the following additional skills:
· answer questions that require dealing with a passage as a whole or manipulating the information presented in the passage
· make generalizations on the basis of the information in the passage, recognize what was implied, and answer questions about the author's tone and purpose


Content of LOEP Language Use Test

The LOEP Language Use test measures students' proficiency in using correct grammar in English sentences. There are five content areas measured on this test: (a) Nouns, Pronouns, Pronoun Case Structure; (b) Subject-verb Agreement; (c) Comparatives, Adverbs, Adjectives; (d) Verbs; and (e) Subordination/Coordination. The approximate distribution of these items is listed in Table 9-2.

Table 9-2: Content Area Specifications for LOEP Language Use Test (approximate percentage of test)
  Nouns, Pronouns, Pronoun Case Structure: 10-15
  Subject-verb Agreement: 20-25
  Comparatives, Adverbs, Adjectives: 10-15
  Verbs: 10-15
  Subordination/Coordination: 20-25

Items on the LOEP Language Use test come in two formats: completing a sentence by filling in a blank with the word or phrase from the choices given, and choosing a sentence that best combines two discrete sentences that are given. Examples of each item format are provided below. The skills covered are subject-verb agreement, verb tenses, forms of irregular verbs, appropriate verb forms in structures, noun-noun agreement, noun forms, pronouns, modifiers, comparatives, prepositions, connectives, parallelism, and sentence fragments/run-ons.

Sample LOEP Language Use Test Sentence Completion Item
The sentence below has a blank space. There are four words or phrases under the sentence. Choose the word or phrase that makes a grammatically correct sentence.
They __________ the newspaper every day.
(a) reads
(b) is reading
*(c) read
(d) does read


Sample LOEP Language Use Test Sentence Construction Item
Two sentences will be followed by four possible answers. Choose the answer that best combines the two sentences. The answer must be a grammatically correct sentence. It must also express the same thought as the first two sentences.
I sang in the choir. I enjoyed it.
(a) In the choir singing, I enjoyed.
(b) In the choir to sing, I enjoyed.
(c) I enjoyed to sing in the choir.
*(d) I enjoyed singing in the choir.

Proficiency Statements for the LOEP Language Use Test

Total Right Score of about 55: Students scoring at this level can choose correct grammatical forms when they are controlled by the basic rules of grammar. For example, in simple sentences, they can recognize basic grammatical structures such as subject-verb agreement, pronoun case and form, noun forms (including recognizing subject, case, and number), and verb forms. They can handle questions involving word order, prepositional phrases, and simple clauses.

Total Right Score of about 82: Students scoring at this level can handle a variety of complex structures, such as comparatives at the phrase level (for example, "so tall that"), relative clauses, structures at the clause level (for example, "not only...but also"), simple subordination, and structures that function at the whole-sentence level.

Total Right Score of about 100: Students scoring at this level can demonstrate the following additional skills:
· recognize irregular verb forms such as "draw/drawn," fairly unusual idioms such as "couldn't get over it," and indirect object structures such as "gave her one"
· handle questions involving transformations of declarative sentences into questions, the conditional, and mood parallelism
· choose appropriate structures to state complex ideas, often in complex sentences using subordination or coordination

Content of LOEP Sentence Meaning Test

The LOEP Sentence Meaning test measures how well students understand the meaning of sentences in English. It assesses the understanding of word meanings in one- or two-sentence contexts. The sentences are drawn from the content areas of natural science, history/social studies, arts/humanities, psychology/human relations, and practical situations. There are four content areas measured: (a) Particle, Phrasal Verbs, Prepositions of Direction; (b) Adverbs, Adjectives, Connectives Sequence; (c) Basic Nouns and Verbs; and (d) Basic and Important Idioms. The approximate distribution of these items is listed in Table 9-3.


Table 9-3: Content Area Specifications for LOEP Sentence Meaning Test (approximate percentage of test)
  Particle, Phrasal Verbs, Prepositions of Direction: 10-15
  Adverbs, Adjectives, Connectives Sequence: 15-25
  Basic Nouns, Verbs: 25-35
  Basic Idioms: 5-10

Examples of LOEP Sentence Meaning test items appear below. The correct answers are denoted with an asterisk.

Sample LOEP Sentence Meaning Test Items
The sentence below has a blank space. Choose the word or phrase that makes the sentence meaningful and correct.
Mr. Swoboda grows roses, tomatoes, and carrots. That is why his friends say that he is a good __________.
*(a) gardener
(b) builder
(c) painter
(d) florist

One or two sentences will be followed by a question. Choose the correct answer to the question.
The teacher called on Joe in class.
What did the teacher do?
(a) Shouted to Joe
*(b) Asked Joe a question
(c) Telephoned Joe
(d) Visited Joe


Proficiency Statements for the LOEP Sentence Meaning Test

Total Right Score of about 61: Students at this level can demonstrate the following skills:
· handle sentences with simple structures characterized by everyday subjects and simple vocabulary, including common nouns, adjectives, and verbs
· select the appropriate vocabulary in sentences that provide multiple contextual clues

Total Right Score of about 88: Students at this level can demonstrate the following additional skills:
· handle vocabulary in sentences that have compound or complex structures, or present more complex situations than the sentences at the 20th percentile level
· handle the following kinds of vocabulary: two-word verbs, adverbs of comparison, more extended idiomatic expressions, and longer descriptions
· select appropriate vocabulary in sentences that provide a single contextual clue

Total Right Score of about 106: Students at this level can demonstrate the following additional skills:
· handle vocabulary in sentences with complex structures that are characterized by abstract statements or idiomatic expressions
· demonstrate knowledge of idioms that are two-word verbs or the use of idioms to express the appropriate meaning
· deduce the appropriate vocabulary from an entire sentence rather than from specific contextual clues, often in situations where grammar and vocabulary intersect


Chapter 10: Statistical Characteristics of ACCUPLACER Item Banks

The quality of a computerized-adaptive testing system is derived from the quality of its items. After the content quality of the items has been ensured, their statistical characteristics must also be considered. In this chapter, we provide descriptive information regarding the pools of items from which ACCUPLACER tests are drawn. As stated in the Guidelines for Computerized-Adaptive Test Development and Use in Education (American Council on Education, 1995), "The quality of the pool of items available during a computerized-adaptive test has strong implications for the utility of the resulting scores. The pool should be of sufficient breadth and depth to support the purposes of the test" (p. 4). Presented in this chapter are summary statistics for the item parameters in each ACCUPLACER item pool, as well as histograms of these parameters. The tables and figures presented in this chapter illustrate that the ACCUPLACER item pools have appropriate breadth of item difficulty and adequate levels of item discrimination.

Statistical Characteristics of Reading Comprehension Item Pool

The descriptive statistics for the Reading Comprehension item pool are presented in Table 10-1. The distributions of these item parameters are presented in Figure 10-1. The item difficulty parameters (b-parameters) exhibit a wide spread of difficulty, with the majority of the items being in the easy-to-moderate range. Although there are some items with low discrimination parameters (a-parameters), the majority of these items display good discrimination. The pseudo-chance parameters (c-parameters) also exhibit variability, although most of the items are centered on the chance level. For Tables 10-1 to 10-6, the mean of each item parameter is represented by µ and the standard deviation by σ.

Table 10-1: Summary Statistics for Reading Comprehension Item Pool Parameters (236 items)
  b-parameter (item difficulty): range -3.34 to 2.22, µ = -0.54, σ = 1.00
  a-parameter (item discrimination): range 0.17 to 1.64, µ = 0.99, σ = 0.34
  c-parameter (pseudo-chance): range 0.00 to 0.50, µ = 0.22, σ = 0.09
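The range, µ, and σ values reported in Tables 10-1 through 10-6 are ordinary descriptive statistics of the calibrated item parameters. A minimal sketch of that calculation, using placeholder parameter values rather than the actual pools, is shown below.

```python
import numpy as np

def pool_summary(params):
    """Return (min, max, mean, sd) for one set of item parameters."""
    arr = np.asarray(params, dtype=float)
    return arr.min(), arr.max(), arr.mean(), arr.std(ddof=0)

# Placeholder arrays standing in for calibrated b-, a-, and c-parameters.
b = [-1.2, -0.5, 0.3, 1.1, -2.0]
a = [0.8, 1.1, 0.6, 1.3, 0.9]
c = [0.18, 0.22, 0.20, 0.25, 0.15]

for name, values in (("b", b), ("a", a), ("c", c)):
    lo, hi, mu, sigma = pool_summary(values)
    print(f"{name}: range {lo:.2f} to {hi:.2f}, mu = {mu:.2f}, sigma = {sigma:.2f}")
```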


Figure 10-1: Item Parameter Histograms for Reading Comprehension Item Pool
[Histograms of the b-, a-, and c-parameter distributions for the 236 Reading Comprehension items; the means and standard deviations shown in the panels correspond to the values in Table 10-1.]


Statistical Characteristics of Sentence Skills Item Pool

The descriptive statistics for the Sentence Skills item pool are presented in Table 10-2 and Figure 10-2. The item difficulty parameters exhibit a wide spread of difficulty. Although there are some items with low discrimination parameters, the majority of these items display good discrimination. The pseudo-chance parameters also exhibit variability, although most of the items are centered on the chance level.

Table 10-2: Summary Statistics for Sentence Skills Test Item Pool Parameters (230 items)
  b-parameter (item difficulty): range -4.18 to 2.27, µ = -0.71, σ = 1.10
  a-parameter (item discrimination): range 0.22 to 1.46, µ = 0.89, σ = 0.30
  c-parameter (pseudo-chance): range 0.04 to 0.49, µ = 0.23, σ = 0.08

Figure 10-2: Item Parameter Histograms for Sentence Skills Test Item Pool

[Histograms of the b-, a-, and c-parameter distributions for the 230 Sentence Skills items; the means and standard deviations shown in the panels correspond to the values in Table 10-2.]


Statistical Characteristics of Arithmetic Item Pool

The descriptive statistics for the Arithmetic item pool are presented in Table 10-3 and Figure 10-3. The item difficulty parameters are tightly grouped in the middle of the distribution, but a few extreme values are present. The majority of these items display good discrimination. The pseudo-chance parameters have a positively skewed distribution, although most of the items are centered on the chance level.

Table 10-3: Summary Statistics for Arithmetic Test Item Pool Parameters (188 items)
  b-parameter (item difficulty): range -6.84 to 8.96, µ = -0.21, σ = 1.38
  a-parameter (item discrimination): range 0.18 to 1.63, µ = 0.98, σ = 0.35
  c-parameter (pseudo-chance): range 0.00 to 0.50, µ = 0.18, σ = 0.10

Figure 10-3: Item Parameter Histograms for Arithmetic Test Item Pool

[Histograms of the b-, a-, and c-parameter distributions for the 188 Arithmetic items; the means and standard deviations shown in the panels correspond to the values in Table 10-3.]


Statistical Characteristics of Elementary Algebra Item Pool

The descriptive statistics for the Elementary Algebra item pool are presented in Table 10-4 and Figure 10-4. The item difficulty parameters are negatively skewed, with most centered in the middle of the distribution. The vast majority of the discrimination parameters display good discrimination. The pseudo-chance parameters are positively skewed, with most centered around the chance level.

Table 10-4: Summary Statistics for Elementary Algebra Test Item Pool Parameters (173 items)
  b-parameter (item difficulty): range -4.80 to 2.43, µ = 0.03, σ = 1.15
  a-parameter (item discrimination): range 0.29 to 2.11, µ = 1.28, σ = 0.41
  c-parameter (pseudo-chance): range 0.00 to 0.50, µ = 0.18, σ = 0.10

Figure 10-4: Item Parameter Histograms for Elementary Algebra Test Item Pool

[Histograms of the b-, a-, and c-parameter distributions for the 173 Elementary Algebra items; the means and standard deviations shown in the panels correspond to the values in Table 10-4.]


Statistical Characteristics of College-Level Math Item Pool

The descriptive statistics for the College-Level Math item pool are presented in Table 10-5 and Figure 10-5. The item difficulty parameters are roughly normally distributed, and the majority of the discrimination parameters display good discrimination. The pseudo-chance parameters are positively skewed, with most being at the chance level or below.

Table 10-5: Summary Statistics for College-Level Math Test Item Pool Parameters (302 items)
  b-parameter (item difficulty): range -2.97 to 3.81, µ = 0.35, σ = 1.07
  a-parameter (item discrimination): range 0.09 to 1.75, µ = 0.88, σ = 0.31
  c-parameter (pseudo-chance): range 0.00 to 0.48, µ = 0.14, σ = 0.18

Figure 10-5: Item Parameter Histograms for College Level Math Test Item Pool

[Histograms of the b-, a-, and c-parameter distributions for the 302 College-Level Math items; the means and standard deviations shown in the panels correspond to the values in Table 10-5.]


Statistical Characteristics of the LOEP Item Pools

The descriptive statistics for the LOEP item pools are presented in Table 10-6 and Figures 10-6 to 10-8. For the Reading Skills test, the item difficulty parameters are tightly grouped around the mean, and a large number of items exhibit poor discrimination. The c-parameters were fixed at zero for about one-third of these items and were fixed at .25 for the remaining items. The difficulty parameters for the Sentence Meaning test are roughly normally distributed, and the majority of these items display adequate discrimination. The c-parameters for the Sentence Meaning test appear to be fixed at .20. The distributions of parameters for the Language Usage test are similar, with the c-parameters fixed at .19.

Table 10-6: Summary Statistics for Levels of English Proficiency Tests Item Pool Parameters
  Reading Skills (361 items): b range -4.26 to 2.85, µ = -0.23, σ = 0.82; a range 0.00 to 2.00, µ = 0.63, σ = 0.62; c range 0.00 to 0.20, µ = 0.12, σ = 0.10
  Sentence Meaning (193 items): b range -3.83 to 2.23, µ = -0.54, σ = 1.02; a range 0.30 to 2.00, µ = 1.13, σ = 0.35; c fixed at 0.20 (µ = 0.20, σ = 0.00)
  Language Usage (194 items): b range -4.72 to 4.42, µ = -0.41, σ = 1.07; a range 0.07 to 2.00, µ = 0.96, σ = 0.43; c fixed at 0.19 (µ = 0.19, σ = 0.00)


Figure 10-6: Item Parameter Histograms for LOEP Reading Skills Test Item Pool
[Histograms of the b-, a-, and c-parameter distributions for the 361 LOEP Reading Skills items; the means and standard deviations shown in the panels correspond to the values in Table 10-6.]


Figure 10-7: Item Parameter Histograms for LOEP Sentence Meaning Test Item Pool
[Histograms of the b-, a-, and c-parameter distributions for the 193 LOEP Sentence Meaning items; the means and standard deviations shown in the panels correspond to the values in Table 10-6.]


Figure 10-8: Item Parameter Histograms for LOEP Language Usage Test Item Pool
[Histograms of the b-, a-, and c-parameter distributions for the 194 LOEP Language Usage items; the means and standard deviations shown in the panels correspond to the values in Table 10-6.]


Chapter 11: Reliability, Information, and Measurement Error

The reliability of a test score is a measure of the consistency or stability of the score. If a given test yields widely discrepant scores for the same individual on separate testing occasions, and the individual did not change on the proficiency measured, then the test scores are not reliable. As described by Anastasi (1988): Reliability refers to the consistency of scores obtained by the same persons when reexamined with the same test on different occasions or with different sets of equivalent items, or under other variable examining conditions. This concept of reliability underlies the computation of the error of measurement of a single score, whereby we can predict the range of fluctuation likely to occur in a single individual's score as a result of irrelevant, chance factors. (p. 109) Reliability is inversely related to the amount of measurement error associated with test scores, and so it is a crucial index of test quality. Clearly, if a person's test score on different occasions changes dramatically according to unintended variations in the testing process, little faith can be put into a particular score obtained on a particular day. For this reason, reliability has been described as a "...necessary, but not sufficient condition for validity" (Nunnally, 1978, p. 192). A great deal of statistical theory has been developed to provide indices of the reliability of test scores as well as measures of measurement error throughout the test score scale. Classical test theory defines reliability as the squared correlation between observed test scores and their unbiased values ("true scores"). Reliability indices typically range from zero to one, with values of .80 or higher signifying test scores that are likely to be consistent from one test administration to the next. Item response theory (IRT) describes measurement error in terms of test information and magnitude of error at specific score points. For ACCUPLACER tests, both classical and IRT estimates of reliability and measurement error are evaluated. In this chapter, we present (a) estimates of test-retest reliability, (b) internal consistency reliability estimates for all ACCUPLACER test scores, (c) standard errors of measurement at each point along the ACCUPLACER score scales (i.e., conditional standard errors of measurement), (d) test information functions, and (e) estimates of placement classification consistency. Each of these measures is described in turn. Following these definitions, reliability and measurement error data for each ACCUPLACER test is presented. This information should be useful to ACCUPLACER users for evaluating the likelihood that a student's score may change upon retesting.

Test-Retest Reliability

Test-retest reliability refers to the correlation between scores that result from two separate administrations of the same test to the same group of examinees. Since a test-retest reliability coefficient is simply the correlation between two sets of scores on a single exam taken at two different points in time, it is perhaps the easiest reliability coefficient to understand. It is also an important reliability estimate because it is affected by random error associated with a given testing occasion. Obviously, if we are interested in evaluating the consistency of test scores, measuring a set of people with the same test repeatedly will give us information regarding consistency. Although test-retest reliability estimates are desirable, in practice, they are difficult to obtain because the same group of examinees must be tested twice.


Nevertheless, some test-retest reliability estimates have been computed for some ACCUPLACER tests, as described below. Although these estimates are based on rather small samples, they do provide an important source for evaluating the consistency of ACCUPLACER test scores.

Internal Consistency Reliability

Internal consistency reliability estimates are easier to obtain than test-retest reliability coefficients because they can be computed from a single administration of a test. Rather than test examinees twice, examinees' tests can be "split" in various ways and then the correlation between the "part-tests" can be computed (and adjusted for test length). The most popular method for computing internal consistency reliability is coefficient alpha (Cronbach, 1951). Coefficient alpha and other measures of internal consistency provide an estimate of how well the items "hang together." If examinees score similarly on different parts of a test, then their performance on these parts is "internally consistent." For this reason, internal consistency reliability is sometimes described as item homogeneity. Because ACCUPLACER tests are computerized adaptive, internal consistency reliability estimates are computed from the conditional standard errors of measurement, as described below.

The Standard Error of Measurement and Conditional Standard Error

The standard error of measurement (SEM) is an important index of measurement error because it provides an estimate of the average amount of error associated with a test score. Unlike a reliability coefficient that describes measurement precision on a zero-to-one metric, the SEM is reported on the test score scale. The SEM is conceptualized as the standard deviation of errors of measurement associated with a set of test scores. It is often used to describe how far an examinee's observed test score may be from her or his "true" score (i.e., a score that is measured without error). The SEM can be used to form a confidence interval around an observed test score to suggest a score interval within which an examinee's true score may lie. Because it is the standard deviation of the distribution of measurement errors (which is typically assumed to be normally distributed), it is expected that an examinee's observed score will lie within one SEM of her or his true score about 68% of the time. The SEM is computed using the standard deviation of a group of test scores and an estimate of the reliability of the test:

$$\mathrm{SEM} = \sigma\sqrt{1 - r_{tt}}$$   (11-1)

where σ = the standard deviation of test scores and r_tt = the reliability estimate. A significant limitation of the SEM is that it is an index of average error rather than an index of the error associated with a specific test score. It is well known that measurement error may differ at various points along the test score scale. For this reason, the AERA et al. (1999) Standards state that "conditional standard errors of measurement should be reported at several score levels if constancy cannot be assumed" (p. 35). The Standards also state that "where cut scores are specified for selection or classification the standard errors of measurement should be reported in the vicinity of each cut score" (p. 35). Given that colleges and universities use various ACCUPLACER "cut scores" for making placement decisions, it is important to provide conditional standard errors of measurement throughout the score scale. Below, we report conditional standard errors for all ACCUPLACER tests.
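A small numeric sketch of equation 11-1 and of the 68% interval interpretation follows; the standard deviation, reliability, and observed score used here are illustrative values, not figures taken from this manual.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement (equation 11-1): sigma * sqrt(1 - r_tt)."""
    return sd * math.sqrt(1.0 - reliability)

# Illustrative values only.
sd, r_tt = 10.0, 0.91
error = sem(sd, r_tt)            # 3.0
observed = 75.0
print(f"SEM = {error:.1f}")
print(f"~68% interval: {observed - error:.1f} to {observed + error:.1f}")
```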


IRT Test Information

A measurement concept that is similar to internal consistency reliability is test information. In IRT, measurement precision is gauged throughout the test score scale (proficiency scale) by test information functions. Test information is the reciprocal of measurement error and so test information functions portray the measurement precision of the test across the entire test score scale. By viewing a test information function, information regarding all possible conditional standard errors of measurement is provided. The relationship between conditional standard error and test information is expressed as

$$SE(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}$$   (11-2)

where SE(θ̂) is the conditional standard error of measurement within an IRT context (i.e., the standard error of estimation), and I(θ̂) is the test information provided at θ̂ (Hambleton et al., 1991). Test information at θ̂ is simply the sum of the item information functions at that same point. For the 3PL IRT model used for ACCUPLACER, item information, which represents the contribution an item makes to measurement precision at various points along the proficiency scale, is calculated using

$$I_i(\hat{\theta}) = \frac{2.89\, a_i^2 (1 - c_i)}{\left(c_i + e^{1.7 a_i (\hat{\theta} - b_i)}\right)\left(1 + e^{-1.7 a_i (\hat{\theta} - b_i)}\right)^2}$$   (11-3)

and test information is calculated as

$$I(\hat{\theta}) = \sum_{i=1}^{n} I_i(\hat{\theta})$$   (11-4)

Information functions for each of the ACCUPLACER item pools are presented later in this chapter.
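Equations 11-2 through 11-4 can also be sketched directly in code. The parameterization below assumes the same 1.7-scaled 3PL form as equation 11-3 (2.89 = 1.7²); the item parameters are hypothetical.

```python
import numpy as np

def item_information_3pl(theta, a, b, c):
    """3PL item information (equation 11-3), using the 1.7 scaling constant."""
    num = 2.89 * a**2 * (1.0 - c)
    den = (c + np.exp(1.7 * a * (theta - b))) * (1.0 + np.exp(-1.7 * a * (theta - b)))**2
    return num / den

def test_information(theta, a, b, c):
    """Test information (equation 11-4): the sum of item information."""
    return float(np.sum(item_information_3pl(theta, a, b, c)))

def conditional_se(theta, a, b, c):
    """Conditional standard error of estimation (equation 11-2)."""
    return 1.0 / np.sqrt(test_information(theta, a, b, c))

# Hypothetical 20-item parameter set.
rng = np.random.default_rng(1)
a = rng.uniform(0.6, 1.4, 20)
b = rng.normal(0.0, 1.0, 20)
c = np.full(20, 0.2)
print(conditional_se(0.0, a, b, c))
```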

Classification Consistency

Classification consistency refers to the reliability of a classification made on the basis of an assessment. With respect to ACCUPLACER, classification consistency refers to the degree to which ACCUPLACER test scores would result in consistent placement decisions for the same student over repeated testing. Simulation studies were conducted to estimate the classification consistency at selected points on the ACCUPLACER score scale. These results are also reported in this chapter.
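One simplified way to think about such a simulation is sketched below: each simulated examinee receives two independently error-perturbed scores, and consistency is the proportion of examinees placed on the same side of a cut score both times. This is an illustration of the general idea only, not the College Board's actual simulation design; the true scores, CSEM, and cut score below are made up.

```python
import numpy as np

def classification_consistency(true_scores, csem, cut_score, rng):
    """Proportion of simulated examinees placed the same way on two error-perturbed scores."""
    noise1 = rng.normal(0.0, csem, size=len(true_scores))
    noise2 = rng.normal(0.0, csem, size=len(true_scores))
    first = (true_scores + noise1) >= cut_score
    second = (true_scores + noise2) >= cut_score
    return float(np.mean(first == second))

rng = np.random.default_rng(2)
true_scores = rng.uniform(20, 120, size=5000)   # hypothetical true Total Right Scores
print(classification_consistency(true_scores, csem=8.0, cut_score=80.0, rng=rng))
```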


Reliability, Standard Errors and Test Information Data for ACCUPLACER Scores

A series of simulation studies was conducted to estimate the conditional standard errors of measurement (CSEMs) for selected score intervals along the ACCUPLACER score scale. A total of 1,800 hypothetical examinees were administered each test. Two hundred of these students had skill levels corresponding to a total right score between 20 and 30, 200 to a score between 31 and 40, and so on through a total right score of 120. The conditional standard error of measurement was then obtained at each score level. Table 11-1 displays these CSEMs at these selected score levels. It is important to note that these CSEMs are based on the total 120-item pool for each test. Because of that, they may appear large when compared with a test with only 30 or 40 items. These CSEMs indicate a high degree of score accuracy for ACCUPLACER tests as do the high internal consistency reliability coefficients calculated from these values. The internal consistency reliabilities were computed by working backwards from equation 11-1. The SEM was calculated by first adding CSEMs at each score level, where each CSEM was weighted by the score frequency at that level, and then dividing the total by the total N. (The score frequencies were obtained from a sample of 20,000 students from the database for each test.) This average CSEM was then squared, divided by the total test variance and then subtracted from one to obtain the reliability. Table 11-1: Conditional Standard Errors of Measurement at Selected Ten-point Scale Intervals and Estimated Coefficient Alpha Reliability

(CSEM at each Total Right Score level; the final column is the estimated internal consistency reliability)

  Test                     110   100    90    80    70    60    50    40    30   Reliability
  Reading Comprehension    3.8   6.3   7.8   8.6   8.9   8.6   8.2   7.5   6.7   0.87
  Sentence Skills          4.0   6.8   8.2   8.8   8.9   8.5   7.8   6.3   4.1   0.91
  Arithmetic               3.6   6.4   8.2   9.4  10.0   9.8   8.9   7.7   6.4   0.92
  Elementary Algebra       4.7   7.7   9.5  10.4  10.5  10.2   9.5   8.0   6.0   0.92
  College-Level Math       4.0   6.2   7.3   7.9   8.2   8.1   7.8   7.4   7.0   0.86
  LOEP Reading Skills      3.7   6.0   7.3   8.0   8.3   8.2   7.6   6.3   4.0   0.88
  LOEP Sentence Meaning    3.9   6.1   7.1   7.4   7.3   6.9   6.3   5.3   3.9   0.92
  LOEP Language Use        4.7   6.3   7.5   8.2   8.7   8.9   8.7   7.7   5.3   0.87

Table 11-1 provides a succinct summary of measurement error at selected score intervals. To derive CSEMs for each ACCUPLACER Total Right Score, two additional studies were also conducted. Since data for the first five ACCUPLACER tests were based on the NJCBST and CLEP tests, data from those examinations were used to simulate ACCUPLACER examinees for the purposes of estimating CSEMs. These conditional standard errors are presented in Table 11-2.
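The backwards calculation described above can be sketched as follows. The CSEM values are those reported for Reading Comprehension in Table 11-1, but the score frequencies and total score variance are invented for illustration, so the resulting coefficient is only an example of the arithmetic, not a reported statistic.

```python
import numpy as np

def reliability_from_csems(csems, frequencies, total_score_variance):
    """Estimate internal consistency by working backwards from equation 11-1."""
    csems = np.asarray(csems, dtype=float)
    freqs = np.asarray(frequencies, dtype=float)
    avg_sem = np.sum(csems * freqs) / np.sum(freqs)   # frequency-weighted average CSEM
    return 1.0 - (avg_sem**2) / total_score_variance

# Reading Comprehension CSEMs at the nine score levels in Table 11-1,
# with made-up score frequencies summing to 20,000 and an assumed score variance.
csems = [3.8, 6.3, 7.8, 8.6, 8.9, 8.6, 8.2, 7.5, 6.7]
freqs = [400, 900, 1800, 3000, 3600, 3400, 3000, 2300, 1600]
print(round(reliability_from_csems(csems, freqs, total_score_variance=480.0), 2))
```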


Table 11-2: Total Right Scores and Their Conditional Standard Errors of Measurement

Total Right Score 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 79 78 77 76 75 74 73 72 71 70 69 68 67 66 Reading Comp* .0 .4 .8 1.2 1.6 2.0 2.4 2.7 3.1 3.4 3.8 4.1 4.4 4.7 4.9 5.2 5.4 5.7 5.9 6.1 6.3 6.5 6.7 6.9 7.0 7.2 7.3 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4 8.5 8.5 8.6 8.6 8.7 8.7 8.8 8.8 8.8 8.9 8.9 8.9 8.9 8.8 8.8 8.8 8.8 Sentence Skills* .0 .4 .9 1.3 1.7 2.1 2.5 2.9 3.3 3.7 4.0 4.4 4.7 5.0 5.3 5.6 5.8 6.1 6.3 6.5 6.8 6.9 7.1 7.3 7.5 7.6 7.7 7.9 8.0 8.1 8.2 8.3 8.4 8.5 8.5 8.6 8.6 8.7 8.7 8.8 8.8 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.8 8.8 Arithmetic* 0.0 .4 .8 1.1 1.5 1.9 2.3 2.6 3.0 3.3 3.6 4.0 4.3 4.6 4.9 5.1 5.4 5.7 5.9 6.1 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.7 7.9 8.1 8.2 8.4 8.5 8.7 8.8 8.9 9.0 9.1 9.3 9.4 9.5 9.6 9.7 9.8 9.9 9.9 9.9 9.9 10.0 10.0 10.0 10.0 10.0 10.0 Elementary Algebra* .0 .5 1.0 1.5 2.0 2.5 2.9 3.4 3.8 4.3 4.7 5.1 5.4 5.8 6.1 6.4 6.7 7.0 7.3 7.5 7.7 8.0 8.2 8.4 8.6 8.8 8.9 9.1 9.3 9.4 9.5 9.7 9.8 9.9 10.0 10.1 10.1 10.2 10.3 10.3 10.4 10.5 10.5 10.5 10.5 10.6 10.6 10.6 10.6 10.5 10.5 10.5 10.5 10.5 College-Level Math** .0 .4 .9 1.3 1.7 2.2 2.6 3.0 3.3 3.7 4.0 4.3 4.6 4.9 5.1 5.3 5.5 5.7 5.9 6.1 6.2 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.4 7.5 7.5 7.6 7.7 7.7 7.8 7.8 7.9 8.0 8.0 8.1 8.1 8.1 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2


Table 11-2 (Continued): Total Right Scores and Their Conditional Standard Errors of Measurement

Total Right Score 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 Reading Comp* 8.8 8.7 8.7 8.7 8.6 8.6 8.6 8.5 8.5 8.5 8.4 8.4 8.3 8.3 8.2 8.2 8.1 8.0 8.0 7.9 7.8 7.8 7.7 7.6 7.5 7.5 7.4 7.3 7.2 7.2 7.1 7.0 6.9 6.8 6.8 6.7 6.6 6.5 6.4 6.3 Sentence Skills* 8.8 8.7 8.7 8.6 8.6 8.5 8.5 8.4 8.4 8.3 8.2 8.2 8.1 8.0 7.9 7.8 7.7 7.6 7.4 7.3 7.2 7.0 6.8 6.7 6.5 6.3 6.1 5.9 5.7 5.5 5.2 5.0 4.8 4.5 4.3 4.1 3.8 3.6 Arithmetic* 10.0 9.9 9.9 9.9 9.8 9.8 9.7 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0 8.9 8.8 8.7 8.6 8.5 8.4 8.2 8.1 8.0 7.9 7.7 7.6 7.5 7.3 7.2 7.0 6.9 6.8 6.6 6.5 6.4 6.2 6.1 5.9 5.8 5.7 5.5 5.4 5.3 5.1 5.0 Elementary Algebra* 10.4 10.4 10.4 10.3 10.3 10.2 10.2 10.1 10.1 10.0 9.9 9.8 9.8 9.7 9.6 9.5 9.4 9.2 9.1 9.0 8.8 8.7 8.5 8.4 8.2 8.0 7.8 7.6 7.4 7.2 7.0 6.8 6.6 6.4 6.2 6.0 5.7 5.5 5.3 5.1 4.9 4.7 4.6 4.3 4.1 3.9 College-Level Math** 8.2 8.2 8.2 8.1 8.1 8.1 8.1 8.0 8.0 8.0 8.0 8.0 7.9 7.9 7.9 7.8 7.8 7.8 7.7 7.7 7.6 7.6 7.6 7.5 7.5 7.4 7.4 7.3 7.3 7.2 7.2 7.2 7.1 7.1 7.1 7.0 7.0 6.9 6.9 6.9 6.8 6.8 6.8 6.7 6.7 6.6

*The conditional standard errors of measurement (SEM) were generated in a simulation study based on a sample of 3,000 college students who took the New Jersey College Basic Skills Placement Test.
**The conditional standard errors of measurement (SEM) were generated in a simulation study based on a sample of 3,000 college students who took a College-Level Examination Program (CLEP) College Algebra/Trigonometry test for college credit.


Table 11-3: LOEP Total Right Scores and Their Conditional Standard Errors of Measurement

ACCUPLACER Score 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 Language Use 0.00 0.51 1.00 1.46 1.90 2.31 2.71 3.08 3.43 3.76 4.07 4.36 4.64 4.90 5.14 5.37 5.59 5.79 5.99 6.17 6.33 6.49 6.64 6.78 6.91 7.04 7.15 7.26 7.37 7.46 7.56 7.64 7.72 7.80 7.88 7.95 8.01 8.08 8.14 8.20 8.25 8.30 8.35 8.40 8.45 8.49 8.54 8.58 8.62 8.66 8.70 8.73 8.76 8.79 Sentence Meaning 0.00 0.50 0.95 1.41 1.83 2.23 2.61 2.97 3.31 3.63 3.94 4.22 4.49 4.75 4.99 5.21 5.42 5.61 5.80 5.97 6.13 6.27 6.41 6.53 6.64 6.75 6.85 6.93 7.01 7.09 7.15 7.20 7.25 7.30 7.33 7.36 7.38 7.41 7.42 7.43 7.44 7.44 7.43 7.43 7.42 7.40 7.3 7.37 7.35 7.33 7.30 7.27 7.24 7.20 Reading Skills 0.00 0.45 0.88 1.29 1.67 2.06 2.43 2.77 3.10 3.41 3.71 3.99 4.26 4.52 4.77 5.00 5.22 5.43 5.63 5.82 6.00 6.17 6.33 6.49 6.63 6.77 6.89 7.01 7.13 7.23 7.34 7.43 7.52 7.61 7.68 7.76 7.83 7.89 7.95 8.00 8.05 8.10 8.14 8.18 8.21 8.24 8.27 8.29 8.30 8.32 8.33 8.34 8.34 8.34


Table 11-3 (Continued): LOEP Total Right Scores and Their Conditional Standard Errors of Measurement

ACCUPLACER Score 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 Language Use 8.82 8.84 8.87 8.90 8.91 8.92 8.93 8.94 8.95 8.95 8.94 8.93 8.91 8.89 8.86 8.82 8.78 8.72 8.66 8.57 8.52 8.43 8.32 8.21 8.09 7.94 7.79 7.62 7.44 7.24 7.02 6.79 6.53 6.26 5.96 5.64 7.79 7.62 7.44 7.24 7.02 6.79 6.53 6.26 5.96 5.64 5.31 4.94 4.55 4.14 3.70 3.23 2.73 Sentence Meaning 7.17 7.13 7.10 7.05 7.00 6.96 6.91 6.86 6.80 6.75 6.70 6.63 6.57 6.51 6.44 6.38 6.30 6.23 6.15 6.07 5.98 5.90 5.80 5.70 5.60 5.50 5.39 5.27 5.15 5.02 4.88 4.75 4.59 4.44 4.28 4.11 5.39 5.27 5.15 5.02 4.88 4.75 4.59 4.44 4.28 4.11 3.93 3.74 3.55 3.34 3.12 2.90 2.65 Reading Skills 8.34 8.33 8.37 8.30 8.28 8.25 8.22 8.19 8.15 8.10 8.06 8.00 7.94 7.88 7.80 7.72 7.64 7.55 7.45 7.35 7.23 7.11 6.98 6.84 6.69 6.53 6.36 6.18 5.99 5.79 5.58 5.35 5.11 4.86 4.59 4.31 6.36 6.18 5.99 5.79 5.58 5.35 5.11 4.86 4.59 4.31 4.01 3.70 3.38 3.03 2.67 2.30 1.90


Test-Retest Reliability

In Table 11-4, test-retest reliability coefficients are displayed for the four original ACCUPLACER Tests. There are three sources for these data. The first source is test-retest results from a study completed at a community college in central California that serves a rural-suburban student population. The student body is approximately 40% White, 10% American Indian, and 40% Hispanic, Black, and Asian American. During registration for the fall semester, students were recruited with a monetary incentive to take the tests twice. Approximately 115 students initially signed up for the study. The testing occurred outside regular classes and was monitored by college personnel. The students took the second test two weeks after the first testing session. The second source of test-retest data is from a study where students were administered the tests twice as part of the placement procedure. These students' scores were selected from an archive database compiled from data disks returned to the College Board. About 40% of the students in the database were minority students (5% American Indian students, 5% Asian-American students, 15% Black students, and 15% Hispanic students). The database was searched for students who took ACCUPLACER tests twice within an intervening time period of two weeks to two months. These students took ACCUPLACER twice under actual placement conditions where their scores affected their recommended courses. Because the testing occurred as part of the placement process, college personnel monitored the sessions in a controlled testing setting. These students were most likely unhappy with their first test scores and elected to retest to achieve a higher score. As a result, they probably do not represent the full range of student proficiency, and the resulting correlations are probably attenuated because of this restriction in range. The third source of test-retest evidence is a study conducted when the tests were first introduced in 1985. The test-retest correlations reported in Table 11-4 range from .73 to .96. Although this range is generally acceptable, coefficients below .80 are somewhat disconcerting, and the small sample sizes from which these correlations were computed require that these results be interpreted cautiously.

Table 11-4: Test-Retest Reliability Estimates

Test Reading Data Source California N 43 First Mean 74.76 66.42 78.80 79.37 74.82 83.20 41.13 53.43 69.80 48.00 54.17 57.20 S.D. 23.69 20.88 22.00 23.85 24.28 23.80 24.08 24.73 26.60 27.68 28.79 30.80 Second Mean S.D. 72.06 24.22 67.21 18.80 79.90 21.90 80.40 78.63 87.70 44.10 60.22 71.70 37.69 61.46 59.10 26.74 23.91 20.20 22.52 23.87 28.60 30.86 30.13 31.80 Correlation .81 .76 .90 .73 .77 .83 .96 .78 .91 .89 .79 .96

Sentence Skills

California Database 1985 Study California Database 1985 Study California Database 1985 Study

45 100 40 10 47 39 16 49 39

Arithmetic

Elementary Algebra
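The coefficients in Table 11-4 are ordinary product-moment correlations between the first and second administrations of the same test. The following minimal sketch shows the computation for a handful of invented paired scores; the numbers are not taken from any of the studies above.

import numpy as np

# Hypothetical first and second Total Right scores for the same ten students.
first_scores  = np.array([74, 66, 79, 81, 90, 55, 62, 70, 85, 48])
second_scores = np.array([72, 67, 80, 78, 93, 58, 60, 73, 88, 45])

# Test-retest reliability is the Pearson correlation between administrations.
r = np.corrcoef(first_scores, second_scores)[0, 1]
print(round(r, 2))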


Accuracy of Classification

Because the primary purpose of the ACCUPLACER Tests is to identify and classify students in need of remediation, it is important to estimate the accuracy of these classifications. The reliability of classification shows the extent to which the classifications made on the basis of the examinees' Total Right scores are the same as those that would be made on the basis of their true scores, were the true scores known. True scores here are defined as the average of examinees' scores over all possible forms of the test, assuming forms of equal difficulty. The estimated true score can be thought of as the score that would be assigned to an examinee if there were no measurement error. Table 11-5 displays reliability of classification indices for selected cut-scores. For example, for Reading Comprehension and a cut-score of 64, 91% of the examinees were classified the same way using Total Right scores as they would have been if classification decisions were based on examinee true scores. It is estimated that 9% of the classifications based on Total Right scores would have been different if based on examinee true scores. Because institutions set their own cut-scores for any given test, the cut-scores in the table below are for illustrative purposes only. Given the cut-scores studied here, all of the reliability of classification indices are at or above .90, indicating substantial agreement between classifications based on ACCUPLACER Total Right scores and classifications that would be based on true scores, were they known.

Table 11-5: Reliability of Classification Estimates for Selected ACCUPLACER Cut-scores

Test                          Cut-score    Classification Agreement
Reading Comprehension         64           .91
Sentence Skills               70           .94
Arithmetic                    58           .96
Elementary Algebra            80           .96
College-Level Mathematics     60           .93
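The following sketch conveys the idea behind a reliability of classification index by simulation; it is not the operational procedure used for Table 11-5. The true-score distribution and error size below are invented placeholders of roughly the magnitude reported earlier in this chapter.

import numpy as np

# Simulate true scores, add measurement error, and count how often the
# observed-score classification about a cut-score matches the true-score
# classification.
rng = np.random.default_rng(0)
n = 100_000
cut = 64                                                  # illustrative Reading Comprehension cut-score
true_scores = np.clip(rng.normal(77, 21, n), 20, 120)     # hypothetical true-score distribution
observed = np.clip(true_scores + rng.normal(0, 8.0, n), 20, 120)  # error of about 8 score points

same_side = (true_scores >= cut) == (observed >= cut)
print(round(same_side.mean(), 2))                         # proportion of consistent classifications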

Information Functions for ACCUPLACER Item Pools

As described earlier, the analogue of reliability in a test developed using item response theory is test information. Test score precision in IRT can be displayed graphically by plotting the test information function, which illustrates the amount of information (i.e., the reciprocal of the squared standard error) at all points along the test score scale. The information for a specific ACCUPLACER test depends on the specific items taken by the examinee. However, item pool information functions can be calculated by computing the test information function for all items comprising the pool. These pool information functions are illuminating regarding the potential measurement precision throughout the ACCUPLACER score scale. The pool information functions for the ACCUPLACER Tests are presented below. These functions indicate that the items in the pool can provide high levels of measurement precision throughout the entire score scale. Hambleton (personal communication) explained that an information value of 10 is analogous to an internal consistency reliability estimate of .90.¹ An inspection of the ACCUPLACER item pool information functions reveals that all pools exhibit extremely high measurement precision in the middle of the distribution and all pools have information of 10 or greater within a range of almost two standard deviations from the mean. These data indicate that the ACCUPLACER item pools have sufficient depth for accurate measurement precision throughout a wide range of proficiency.

¹ This statement can be proven by substituting a reliability of .90 and a standard deviation of 1 into equation 11-1 and then plugging these values into equation 11-2 and solving for the information value.
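The sketch below illustrates, under assumed item parameters, how a pool information function of the kind plotted in the following figures can be computed for a three-parameter logistic (3PL) pool, and it reproduces the footnote's equivalence between an information value of 10 and a reliability of .90. The 120 invented item parameters are placeholders, not ACCUPLACER item parameters.

import numpy as np

D = 1.7   # common scaling constant for the logistic model

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Standard 3PL item information function."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

rng = np.random.default_rng(1)
a = rng.uniform(0.6, 1.6, 120)     # discriminations for a hypothetical 120-item pool
b = rng.uniform(-2.5, 2.5, 120)    # difficulties spread across the proficiency scale
c = rng.uniform(0.1, 0.25, 120)    # lower asymptotes

theta = np.linspace(-4, 4, 81)
pool_information = item_information(theta[:, None], a, b, c).sum(axis=1)
print(round(pool_information[theta == 0][0], 1))   # pool information at theta = 0

# Footnote 1 in numbers: with reliability .90 and S.D. 1, SEM = sqrt(1 - .90),
# and information = 1 / SEM**2, which is about 10.
sem = np.sqrt(1 - 0.90) * 1.0
print(round(1 / sem ** 2, 1))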

[Figure: Reading Comprehension Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: Sentence Skills Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: Arithmetic Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: Elementary Algebra Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: College-Level Math Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: LOEP Reading Skills Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: LOEP Language Usage Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]

[Figure: LOEP Sentence Meaning Item Pool Information Function; pool information I(θ) plotted against proficiency from -4 to +4]


Chapter 12: Validity Evidence for ACCUPLACER

According to the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999), "validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (p. 9). Validity is the extent to which the inferences (interpretations) derived from test scores are justifiable from both scientific and equity perspectives. For decisions based on test scores to be valid, the use of a test for a particular purpose must be supported by theory and empirical evidence, and biases in the measurement process must be ruled out. With respect to ACCUPLACER, test validity must be evaluated with respect to the degree to which students' ACCUPLACER scores are useful for making appropriate placement decisions. Factors that might threaten the validity of the use of ACCUPLACER scores for placement decisions include poor quality items, inability to provide accurate placement information, and measurement of knowledge or skills that are extraneous to the subject matter tested. In this chapter, research that evaluated the validity of ACCUPLACER scores is described and summarized. Before presenting these descriptions and summaries, some basic terms in test validity and validation are described.

A Brief Overview of Test Validity and Validation

From the outset, it is important to bear in mind that validity is not an intrinsic property of a test. As many psychometricians have pointed out (e.g., Cronbach, 1971; Messick, 1989; Shepard, 1993), in judging the worth of a test, it is the inferences derived from the test scores that must be validated, not the test itself. Therefore, the specific purpose(s) for which test scores are being used must be considered when evaluating validity. For example, a test may be useful for one purpose, such as college admissions, but not for another, such as course placement. Contemporary definitions of validity in testing borrow largely from Messick (1989), who stated "validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (p. 13). From this definition, it is clear that validity is not something that can be established by a single study and that tests cannot be labeled 'valid' or 'invalid.' Given that (a) validity is the most important consideration in evaluating the use of a test for a particular purpose, and (b) such utility can never be unequivocally established, establishing that a test is appropriate for a particular purpose is an arduous task. Thus, the following facts about validity should be clear: (a) tests must be evaluated with respect to a particular purpose, (b) what needs to be validated are the inferences derived from test scores, not the test itself, (c) evaluating inferences made from test scores involves several different types of qualitative and quantitative evidence, and (d) evaluating the validity of inferences derived from test scores is not a one-time event; it is a continuous process. In addition, it should be noted that although test developers must provide evidence to support the validity of the interpretations that are likely to be made from test scores, ultimately, it is the responsibility of the users of a test to evaluate this evidence to ensure the test is appropriate for the purpose(s) for which it is being used.


Test Validation

To make the task of validating inferences derived from test scores both scientifically sound and manageable, Kane (1992) proposed an "argument-based approach to validity." In this approach, the validator builds an argument based on empirical evidence to support the use of a test for a particular purpose. Although this validation framework acknowledges that validity can never be established absolutely, it requires evidence that (a) the test measures what it claims to measure, (b) the test scores display adequate reliability, and (c) test scores display relationships with other variables in a manner congruent with its predicted properties. Kane's practical perspective is congruent with the Standards for Educational and Psychological Testing (AERA et al., 1999), which provide detailed guidance regarding the types of evidence that should be brought forward to support the use of a test for a particular purpose. For example, the Standards state: "A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses...Ultimately, the validity of an intended interpretation...relies on all the available evidence relevant to the technical quality of a testing system. This includes evidence of careful test construction; adequate score reliability; appropriate test administration and scoring; accurate score scaling, equating, and standard setting; and careful attention to fairness for all examinees..." (p. 17)

Gathering Validity Evidence

To build a validity argument for a test, there are several types of evidence that can be brought forward. Traditionally, the major forms of validity evidence are content validity, criterion-related validity, and construct validity. Content validity evidence involves gathering data from content experts regarding the degree to which the behaviors sampled on the test represent the behaviors the test is designed to measure. Criterion-related validity evidence involves evaluating correlations among test scores and other variables related to the construct measured. Predictive and concurrent validity are special cases of criterion-related validity that involve correlating test scores with future or current criterion performance. With respect to ACCUPLACER, many criterion-related validity studies looked at the correlation between ACCUPLACER scores and final course grades. Construct validity involves gathering data that show test scores are indicative of the construct measured. Many test theorists (e.g., AERA et al., 1999; Messick, 1989) consider content and criterion validity to be subcomponents of construct validity because such evidence assists in evaluating test-construct congruence. For ACCUPLACER scores, evidence of content and predictive validity is particularly important. For a test to be used to identify subject area deficiencies that require placement in developmental courses, the test needs to contain content relevant to that subject area. In addition, the placement test scores should be predictive of students' performance in the courses in which they are placed.


Content Validity Evidence for ACCUPLACER

Sireci (1998a, 1998b) describes four critical aspects of content validity: (a) domain definition, (b) domain representation, (c) domain relevance, and (d) appropriate test construction procedures. For the content of a test to be considered valid, the subject domain tested should be clearly defined, and external content specialists should verify that the items represent the intended domain and are relevant to that domain. These important test construction steps were described in Chapter 2. As illustrated in Chapter 2, the multiple rounds of quality control checks on ACCUPLACER items facilitated content validity. In addition, all items are coded according to their content specifications within the computerized item selection algorithm, which ensures that all examinees get the appropriate breadth and depth of test content as delineated in the test specifications. Furthermore, ACCUPLACER items undergo comprehensive sensitivity reviews to ensure no offensive or derogatory material is present (see Chapter 2). Thus, the degree to which ACCUPLACER tests represent their intended domains is high.

Predictive Validity

Numerous studies of the degree to which ACCUPLACER test scores are related to students' subsequent course grades have been conducted. Some of these studies have cut across institutions and were coordinated by the College Board. Many other studies were conducted by specific institutions to help evaluate the utility of ACCUPLACER for making placement decisions or to help determine the most appropriate ACCUPLACER cut scores for their school. In this section, the results of a predictive validity study involving 50 institutions are reported. These results are followed by a summary of selected other studies conducted at the state or institution level. Before summarizing these studies, the limitations of predictive validity studies must be reviewed. The primary limitation of ACCUPLACER predictive validity studies is that there is no perfect measure of appropriate course placement. The most common validity criterion used is students' overall grades or grades in specific courses. It makes sense that ACCUPLACER scores should correlate positively with students' grades in one or more courses. However, student-level factors such as attendance, motivation, and perseverance have great effects on course grades, and these factors are not measured by ACCUPLACER. Thus, grades in a subsequent course are imperfect validity criteria. Statistical factors also play a role in predictive validity studies. Small sample sizes, unreliability of students' grades, and restriction of range all tend to reduce the magnitude of predictive validity coefficients. Thus, it is very likely that the predictive validity correlations noted for ACCUPLACER scores underestimate the utility of ACCUPLACER scores as a placement measure. The underestimation of validity coefficients due to restriction of score range on the predictor and/or criterion measure is a well-known problem in predictive validity research. Validity coefficients, like any correlation coefficient, can be underestimated due to a decrease in the variability of the test scores used to predict course grades. The effect of this range restriction can be illustrated through the use of a range restriction correction formula (Gulliksen, 1950).
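The correction formula itself is not printed in this manual; the sketch below uses the standard correction for direct range restriction on the predictor, which is the form commonly attributed to Gulliksen (1950). As a check, the numbers are those reported in the LOEP summary later in this chapter (r = .18, total-group S.D. 26.18, placed-group S.D. 15.01), which the summary states correct to about .30.

import math

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Standard univariate correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1.0 - r_restricted ** 2 + (r_restricted ** 2) * u ** 2)

print(round(correct_for_range_restriction(0.18, 26.18, 15.01), 2))   # about .30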


Predictive Validity Across 50 Institutions

A large-scale study of the predictive validity of the ACCUPLACER tests began in January 1990 and continued through early 1992. The colleges made their own decisions about such issues as when testing would take place (in relation to the beginning of instruction); which test would be administered to an examinee; and what criteria would be used in determining placement in a course. Thus, the results reported here represent the experience of a number of test users under the variety of conditions found in actual practice, rather than the outcome of a well-controlled experimental study. Fifty colleges and universities took part in the study--38 two-year colleges and 12 four-year institutions. All institutions are listed in Appendix C. Each student had a score on at least one module of ACCUPLACER and a placement and grade in one course. About one-third of the records included the student's self-reported gender and ethnic group membership. Frequency distributions, means, and standard deviations of test scores are presented in Tables 12-1 and 12-2. Course categories represented by fewer than 100 students were excluded from the analysis. Table 12-2 shows the sex and ethnic group breakdown of the sample based on answers to background questions; many individuals chose to "omit" rather than report this information, and many institutions did not choose to ask these questions of their students. Table 12-1: Frequency Distributions of ACCUPLACER Scores Used in Validity Studies

Scores          College-Level   Elementary                              Reading    Sentence
                Mathematics     Algebra       Arithmetic                Comp.      Skills
110.001-120     11              321           255                       274        976
100.001-110     31              359           408                       1002       1364
90.001-100      57              413           356                       1397       1273
80.001-90       101             555           468                       1675       1510
70.001-80       169             516           527                       1605       1200
60.001-70       254             518           531                       1169       736
50.001-60       349             682           629                       792        773
40.001-50       449             846           754                       616        543
30.001-40       799             1675          993                       422        417
20.001-30       1857            2243          1166                      116        95
0-20            1145            8             27                        13         2
N               5222            8136          6114                      9081       8889
Mean            34.68           52.06         57.67                     76.88      81.79
S.D.            19.15           27.74         27.80                     20.95      23.12


Table 12-2: Frequency Distributions of Gender and Ethnic Group Members in Validity Studies

Sex        N
Female     3143
Male       3180
Total      6323

Ethnicity                                         N
Native American, American Indian, or Alaskan      239
Black or African American                         737
Hispanic(a)                                       1021
Asian or Pacific American                         302
White (non-Hispanic) or Caucasian                 3813
Other                                             98
Total                                             6210

(a) Combining three groups: Mexican American or Chicano; Puerto Rican; Other Hispanic, Latino, Central American, or South American

Correlation results

Tables 12-3 through 12-13 present analyses of the relations of test scores with grades for the Reading Comprehension, Sentence Skills, Arithmetic, Elementary Algebra, and College-Level Mathematics tests, in that order. The same organization is used in each table. For each course level examined, the correlation (merging across colleges the data from all students), the number of colleges (n) whose students provided data, the number of students (N) on whom the coefficient is based, and the sample means and standard deviations of the scores and grades are presented. Each within-discipline combination of scores and grades for which at least 30 cases were available is included in the tables. Next, the regression coefficients for predicting the grade from the test score are given, again based on data merged across all colleges. (The coefficients a and b are to be entered into the regression equation, Y = a + bX, where X is the test score and Y is the predicted grade.) Below this are given the median correlation obtained from analyses within individual colleges, utilizing data from each institution for which at least 30 cases were available for the test-course combination, and then the institution-by-institution correlations. Note that the columns of individual institution results are independent of one another; for example, the first entry in one column might or might not represent data from the college that provided the first entry in the next column. The analyses for individual institutions generally include the majority of the available cases, but an appreciable number of students come from institutions providing smaller numbers of cases. The overall coefficients obtained by merging data across institutions are similar to the median results obtained in the institution-by-institution analyses; the magnitude of the difference between comparable coefficients is typically small, and neither set shows consistently higher values than the other. It should be noted that these coefficients are based on situations in which the test scores were used in placing students into courses. Thus, there is generally some restriction in the range of scores--sometimes rather severe restriction--as compared to that for all students who took one of the tests, and the coefficients underestimate the magnitude of the relations that would be found if the scores were not used in placement.
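As a simple numeric illustration of the regression layout described above, the sketch below applies Y = a + bX with the Developmental Reading coefficients reported in Table 12-3 (a = 2.3960, b = 0.0342). The score of 56 is merely an example value near the sample mean, not a recommended cut-score.

# Predicted course grade from a Reading Comprehension score, using the
# coefficients printed in Table 12-3 for Developmental Reading.
a, b = 2.3960, 0.0342
reading_score = 56
predicted_grade = a + b * reading_score     # on the 0 (F) to 12 (A+) grade scale
print(round(predicted_grade, 2))            # about 4.31, roughly a C-/C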


Reading Comprehension

The correlation across institutions of Reading Comprehension test scores with grades in Developmental Reading is .18 (Tables 12-3 and 12-4). The median correlation within institutions of Reading Comprehension test scores with grades in Developmental Reading is .19. The highest correlation between the test scores and course grades in Developmental Reading at a particular institution was .38. Because most colleges do not offer college-level reading courses, analyses are not presented for those courses.

Sentence Skills

The correlation of Sentence Skills test scores with grades in Developmental English across institutions was .15 (Tables 12-5 and 12-6). The median correlation within colleges of Sentence Skills test scores with grades in Developmental English was .20. The highest correlation at an institution with final course grades in Developmental English was .34. The correlation of Sentence Skills test scores with grades in College English across institutions was .20. The median correlation within colleges of Sentence Skills test scores with grades in College English was .22. The highest correlation of Sentence Skills test scores with final course grades in College English at a particular college was .34.

Arithmetic

Arithmetic test scores have overall correlations between .31 and .38 with grades in General Mathematics, Arithmetic, Elementary Algebra, and Intermediate Algebra courses (Tables 12-7 and 12-8). The median correlations within colleges between Arithmetic test scores and grades in these same courses range from .25 to .39.

Elementary Algebra

The Elementary Algebra test scores have an overall correlation of .19 with grades in Elementary Algebra courses across institutions (Tables 12-9 and 12-10). This coefficient reflects substantial restriction in range due to the use of the test in placement in these courses; the standard deviation of the test scores contributing to this coefficient is about 14.8, compared with one of 27.7 for all students taking the test. Those taking this test and placing in more advanced courses constitute more proficient and less restricted samples; the mean scores range from about 60 for Intermediate Algebra, 86 for College Algebra, and 87 for Pre-calculus, to about 103 for Calculus. Overall correlations of test scores with grades in these courses range from .19 to .38.

College-Level Mathematics

The College-Level Mathematics (CLM) test is intended to place students in courses in Intermediate Algebra, College Algebra, Pre-calculus, and Calculus. The overall correlation of CLM test scores with grades falls in the range from .32 to .49 for these courses. The median within-college CLM test score-course grade correlation for these same courses ranged from .25 to .53.


Table 12-3: Correlations of Reading Comprehension Scores with Grades in Reading Courses

                    Developmental Reading
Correlation         .18
N of Colleges       15
N of Students       940
Score Mean          55.77
Score S.D.          15.96
Grade Mean          5.05
Grade S.D.          3.88
Regression a        2.3960
Regression b        0.0342

Table 12-4: Results for Individual Colleges²

             Developmental Reading
College      N        r
I            264      .03
II           152      .38
III          134      .04
IV           47       .25
V            160      .20
VI           46       -.15
VII          63       .19
Median                .19

Table 12-5: Correlations of Sentence Skills Scores with Grades in English Courses

                    Developmental English    College English
Correlation         .15                      .20
N of Colleges       14                       17
N of Students       1108                     1522
Score Mean          65.04                    92.50
Score S.D.          18.69                    18.19
Grade Mean          4.84                     5.74
Grade S.D.          4.02                     3.88
Regression a        2.5265                   3.0371
Regression b        0.0261                   0.0333

Table 12-6: Results for Individual Colleges

Median College I II III IV V VI VII VIII IX .20 N 96 130 365 62 223 151 R .23 .17 .11 .34 .08 .29 N 444 262 38 220 30 162 61 50 162 .20 r .32 .23 .22 .08 .34 .13 .31 -.03 .21

² Columns of individual institution results are independent of one another; for example, the first entry in one column does not necessarily represent data from the college that provided the first entry in the next column.

Table 12-7: Correlations of Arithmetic Scores with Grades in Mathematics Courses

                    General          Arithmetic    Elementary    Intermediate
                    Mathematics                    Algebra       Algebra
Correlation         .38              .31           .33           .38
N of Colleges       18               20            19            21
N of Students       263              1118          890           464
Score Mean          64.27            40.05         62.12         79.36
Score S.D.          25.88            16.61         23.01         22.76
Grade Mean          5.31             4.51          5.03          4.64
Grade S.D.          4.16             4.28          4.40          4.55
Regression a        2.5544           2.2593        1.5162        1.5902
Regression b        0.0548           0.0477        0.0512        0.0605

Table 12-8: Results for Individuals Colleges Median College I II III IV V VI VII VIII IX X .25 N 39 49 33 33 r .26 -.05 .25 .54 N 33 74 121 230 81 308 39 55 32 104 .31 r .56 .18 .37 .30 .47 .32 .53 .15 .11 .23 N 73 54 85 72 125 146 37 141 .27 r .35 .17 .32 .55 .28 .08 .02 .26 N 76 229 52 .39 r .61 .21 .39 N 66 66 156 65 .19 r .31 .25 .13 .14


Table 12-9: Correlations of Elementary Algebra Scores with Grades in Mathematics Courses

                    Elementary   Intermediate   College     Pre-         Calculus
                    Algebra      Algebra        Algebra     calculus
Correlation         .19          .33            .26         .38          .31
N of Colleges       21           25             25          24           20
N of Students       1360         1040           866         238          168
Score Mean          39.39        60.19          86.17       86.82        103.29
Score S.D.          14.80        23.37          18.80       24.67        16.56
Grade Mean          4.71         4.71           5.02        5.24         5.70
Grade S.D.          4.44         4.36           4.33        4.40         4.00
Regression a        2.9328       2.3843         3.0474      2.8463       2.8764
Regression b        0.0295       0.0518         0.0402      0.0569       0.0441

Table 12-10: Results for Individual Colleges

Median College I II III IV V VI VII VIII IX X XI

.15 N 90 69 30 179 128 229 243 47 188 r .09 -.44 .36 .15 .51 .10 .12 .19 .04 N 88 34 118 81 102 315 31 50 80

.34 r .41 .34 .15 .43 .35 .04 .31 .75 .17 N 73 43 65 40 59 154 31 64 42 74 80

.17 r .34 .43 .14 .17 .43 .16 .42 .12 .12 .12 .47 N 81

r .01 N 47

r .47


Table 12-11: Correlations of College-Level Mathematics Scores with Grades in Mathematics Courses

                    Elementary   Intermediate   College     Pre-         Calculus
                    Algebra      Algebra        Algebra     calculus
Correlation         .34          .34            .32         .33          .49
N of Colleges       20           27             30          26           25
N of Students       413          711            863         250          747
Score Mean          21.22        29.13          36.07       49.28        61.19
Score S.D.          4.36         10.66          13.62       19.22        21.08
Grade Mean          4.16         5.67           5.09        5.91         4.98
Grade S.D.          4.10         4.28           4.33        4.25         3.99
Regression a        1.7663       2.3527         2.6734      3.3004       1.6092
Regression b        0.0779       0.0786         0.0711      0.0724       0.1027

Table 12-13: Results for Individual Colleges Median College I II III IV V VI VII VIII IX X XI .25 N 38 30 66 91 77 r .05 .20 .25 .56 .25 N 96 34 86 66 56 76 114 60 .34 r .33 .47 .40 .37 .11 .35 .32 .32 N 91 37 64 32 37 62 54 67 33 151 71 .35 r .51 .25 .19 .43 .35 .42 .22 .12 .18 .42 .55 N r N 37 241 260 48 .53 r .37 .60 .57 .49

Validity Study for the Levels of English Proficiency (LOEP) Test

The LOEP validity study was part of the LOEP field trial that began in August 1992. The LOEP paper-and-pencil pretests were distributed to colleges and high schools that volunteered to take part in the field trial. Participating institutions were asked to test motivated students who would normally be tested for placement into English and reading courses. The 58 LOEP pretest forms were spiraled, so that in any given class few students were given the same form. Because of this, and because no scores were provided to the participating institutions, LOEP scores could not be used in placing students. Some of the colleges agreed to return test scores and information on students' course placements and grades for the validity study. Participants in the validity study were asked to provide placement and grade information for all the relevant English and reading courses in which their students were placed for the fall 1992 semester. Seventeen colleges and universities took part in the validity study. The institutions are listed in Appendix D. Sample sizes, means, and standard deviations of LOEP scores and course grades from the separate LOEP validation study are presented in Tables 12-14 and 12-15. Table 12-14 provides this information for the total group as well as for men and women. Table 12-15 displays the same information for the English Best Language and English as Second Language groups. A group is included in the table if at least 25 cases were available for analysis. These and the following analyses are based on examinee answers to sex and language background questions administered as part of the tests. The sex group results show very similar mean test scores for men and women, with the largest difference much less than one-tenth of a standard deviation. However, women obtained higher grades in each course level for which a comparison could be made, though the differences were generally small. ESL students obtained mean test scores close to the middle of the 20 to 120 point scale, indicating a good match between the proficiencies of these students and the difficulty of the tests. The English Best Language students in this sample scored much higher on average; for many of them a more difficult test would be appropriate as the basis for placement decisions. The comparisons of grades within a course level that could be made for these groups did not suggest a consistent direction of difference.

Preparation of Data

The major steps in preparing the data for analysis included matching the test scores with the grades and placements, coding the course and grade information, and generating comparable LOEP scores for all examinees. The different types of records were matched on the basis of the institution ID and the student name. Many colleges offer multiple courses covering content similar in level; for example, there may be two or three (and occasionally many more) developmental or ESL courses available to entering students, distinguished not by the difficulty of the material but by its tailoring to the students' field of study. To assure adequate numbers of cases for analysis within colleges and to permit combining data across colleges, courses were coded into four levels (developmental, lower level ESL, upper level ESL, and college level) for Reading, Writing, Vocabulary, Reading and Writing combined, and Language Arts courses. (See Table 12-16.) The testing department at each college provided course classifications, with additional information obtained when necessary from the college catalog. All analyses within and across colleges were performed using the course level, rather than the individual course title, as the unit for analysis.


Table 12-14: Descriptive Statistics for LOEP Scores and Course Grades by Gender and Total Group

Variable TEST SCORES Reading Skills Sentence Meaning Language Use COURSE GRADES Developmental Reading Low ESL Reading Upper ESL Reading Developmental Writing Low ESL Writing Upper ESL Writing College Writing Developmental Vocabulary Low ESL Vocabulary Develop Read & Writing Low ESL Read & Writing Upper ESL Read & Writing Low ESL Language Arts N 470 349 375 Male Mean 85.21 91.61 84.67 SD 26.08 27.48 26.02 N 655 501 516 Female Mean 85.13 91.04 82.85 SD 25.96 27.74 29.28 N 1138 869 898 Total Mean 84.87 90.21 83.28 SD 26.18 28.58 28.16

85 48 69 96 66 254 90 29 26 260

6.44 7.50 7.52 6.90 6.45 4.08 4.62 4.93 2.77 4.53

3.97 3.82 3.35 3.53 3.81 3.17 3.39 3.08 2.34 3.27

184 67 87 130 84 337 129 34 360

8.98 7.91 8.01 7.07 7.51 4.87 5.88 6.06 5.17

2.66 3.42 3.23 3.65 3.61 3.62 3.53 3.17 3.43

270 116 35 158 228 150 594 33 30 220 64 49 622

8.17 7.75 8.11 7.77 6.99 7.05 4.53 7.42 7.03 5.34 5.53 3.67 4.90

3.34 3.58 3.28 3.34 3.59 3.74 3.46 3.14 4.37 3.54 3.16 2.98 3.38

Table 12-15: Descriptive Statistics for LOEP Scores and Course Grades by Language Group

Variable TEST SCORES Reading skills Sentence Meaning Language Use COURSE GRADES Developmental Reading Low ESL Reading Upper ESL Reading Developmental Writing Low ESL Writing Upper ESL Writing College Writing Developmental Vocabulary Low ESL Vocabulary Develop Reading & Writing Low ESL Reading & Writing Upper ESL Read & Writing Low ESL Language Arts English Best Language N Mean SD 480 376 325 100.57 111.97 105.69 16.54 11.05 15.21 N 658 493 573 ESL Mean 73.42 73.61 70.57 SD 25.97 26.65 25.79

228 119 576 25 172

8.41 7.46 4.55 7.44 5.37

3.16 3.33 3.45 3.09 3.57

42 114 35 39 224 147 30 48 60 44 609

6.88 7.76 8.11 8.69 7.02 7.01 7.03 5.23 5.42 3.77 4.90

3.95 3.60 3.28 3.20 3.61 3.76 4.37 3.43 3.21 3.06 3.38


Table 12-16: Coding of Courses and Levels for LOEP Validity Study

Levels    Description
1         Developmental
2         Lower Level English as a Second Language
3         Upper Level English as a Second Language
4         College-Level Courses

Levels    Description
1         Reading
2         Writing
3         Vocabulary
4         Reading and Writing Combined
5         Language Arts

Students' grades were converted to a numerical scale ranging from 0 (F) to 12 (A+), as illustrated in Table 12-16. Grades of satisfactory/unsatisfactory or pass/fail were not included in the analyses. Such grades provide a poor criterion against which to evaluate the predictive validity of a test score, both because of restriction of range in the criterion and because their distribution in a course is often highly unbalanced, with the great majority of the students receiving the higher of the available grades. Withdrawals and incompletes were included as failing grades. While some students do not complete a course for reasons unrelated to the adequacy of their academic performance, eliminating students with these codes would also remove many, if not most, who experienced academic difficulty.

Table 12-16: Numerical Coding of Grades Used in the LOEP Validity Study

Grade      Code
A+         12
A          11
A-         10
B+         9
B          8
B-         7
C+         6
C          5
C-         4
D+         3
D          2
D-         1
F, W, I    0
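The coding scheme above can be expressed as a small lookup table. The snippet below is only an illustration of that scheme; the function name is ours and is not part of any ACCUPLACER software.

# Grade coding from Table 12-16; withdrawals (W) and incompletes (I) are
# scored as failing, and pass/fail grades are excluded (return None).
GRADE_CODES = {
    "A+": 12, "A": 11, "A-": 10,
    "B+": 9,  "B": 8,  "B-": 7,
    "C+": 6,  "C": 5,  "C-": 4,
    "D+": 3,  "D": 2,  "D-": 1,
    "F": 0,   "W": 0,  "I": 0,
}

def code_grade(grade):
    """Return the numeric code, or None for grades excluded from the analyses."""
    return GRADE_CODES.get(grade.strip().upper())

print(code_grade("B+"), code_grade("w"), code_grade("P"))   # 9 0 None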

The sample for the validity study consisted of records for about 3,000 students. Each of these individuals had at least a score³ on one test of the LOEP and a placement and grade in one course. Three institutions contributed grading distributions in some courses very unlike the other colleges. In these institutions, a very high percentage of students failed the developmental courses. For example, in one case 47 of 53 students failed a developmental course. The results from these three colleges were not included in the across-colleges analyses; they are, however, included in the within-institution results.

³ In the LOEP pretest design with 58 spiraled forms, individual students took different sets of items. As a result, the total number-right score was not comparable across students. However, a meaningful LOEP score was needed as the predictor variable in the validity study. To arrive at such a score, IRT ability parameters, which are comparable across students, were obtained. These ability parameters were used along with IRT item parameters from an item calibration run. The three-parameter logistic IRT model was used to calculate the probability of each examinee correctly answering each item. For each student, these probabilities were summed across items in each test pool (see equation 3-1). The summed probabilities were then put on the ACCUPLACER 20 to 120-point scale through a linear transformation. As a result, each student who participated in the validation study received an estimated LOEP scaled score.
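The sketch below illustrates the scoring idea described in the footnote: compute each examinee's 3PL probability of answering every pool item correctly, sum those probabilities, and rescale. The item parameters and the particular linear mapping onto the 20-to-120 scale are assumptions made for illustration; they are not the calibration values or the transformation actually used.

import numpy as np

D = 1.7

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def loep_scaled_score(theta, a, b, c, lo=20, hi=120):
    expected_right = p3pl(theta, a, b, c).sum()           # summed probabilities (cf. equation 3-1)
    return lo + (hi - lo) * expected_right / len(a)       # illustrative linear transformation

rng = np.random.default_rng(2)
a = rng.uniform(0.6, 1.6, 90)      # hypothetical pool discriminations
b = rng.uniform(-2.5, 2.5, 90)     # hypothetical pool difficulties
c = rng.uniform(0.1, 0.25, 90)     # hypothetical lower asymptotes
print(round(loep_scaled_score(0.5, a, b, c), 1))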

LOEP Correlational Results

Tables 12-17 through 12-28 present analyses of the relations of LOEP test scores in Reading Skills, Sentence Meaning, and Language Use with course grades. The same organization is used in each pair of tables. In the first table in each pair, for each course level examined, the correlation (merging across colleges the data from all students), the number of colleges (n) whose students provided data, the number of students (N) on whom the coefficient is based, and the sample means and standard deviations of the grades and test scores are presented. Each within-discipline combination of grades and test scores for which at least 25 cases were available is included in the tables. Next, the regression coefficients for predicting the grade from the test score are given, again based on data merged across all colleges. In the second table in each pair, the institution-by-institution correlations obtained from analyses within individual colleges, utilizing data from each institution for which at least 25 cases were available for the test-course combination, are given, followed by the median correlations, where appropriate. Note that in the first table in each pair the number of colleges may exceed the number displayed in the second table. This is primarily due to the colleges that contributed fewer than 25 students to the overall analysis. Occasionally, the difference is caused by the presentation of within-college results for three colleges whose results were not used in the overall analysis. The analyses for individual institutions generally include the majority of the available cases, but an appreciable number of students come from institutions providing smaller numbers of cases.

Sentence Meaning: For ESL students, the correlations of Sentence Meaning test scores (Table 12-17) with grades across institutions in Low ESL Writing, Upper ESL Writing, and Low ESL Language Arts courses were .16, .11, and .22, respectively. The median correlation within colleges of Sentence Meaning test scores with grades in Low ESL Language Arts was .31. The highest correlation between the Sentence Meaning test scores and course grades at a particular institution was .53 in a Low ESL Language Arts course. For English Best Language students, the correlations of Sentence Meaning test scores with grades across institutions in Developmental Writing, Developmental Reading, College Writing, and Developmental Reading and Writing courses were .34, .11, .29, and .09, respectively (Tables 12-19 and 12-20). The median correlation within colleges of Sentence Meaning test scores with grades in Developmental Writing was .24. The highest correlation between the Sentence Meaning test scores and course grades at a particular institution was .34 in a Developmental Writing course.

Table 12-17: Correlations for ESL Students of LOEP Sentence Meaning Scores with Grades in English Courses

                    Low ESL      Upper ESL    Low ESL
                    Writing      Writing      Language Arts
Correlation         .16          .11          .22
N of Colleges       2            2            3
N of Students       79           29           184
Grade Mean          6.17         7.90         5.08
Grade S.D.          4.25         3.42         3.24
Score Mean          81.88        87.34        59.23
Score S.D.          20.10        16.01        24.56
Regression a        4.36         5.88         3.37
Regression b        .0335        .0231        .0289

Table 12-18 Low ESL Writing College 4 5 8 10 Median N 57 r .13 Results for Individual Colleges Upper ESL Writing N r Low ESL Language Arts N r 45 131 .08 .53 .31


Appendix A Table 12-19: Correlations for English Best Language Students of LOEP Sentence Meaning Scores with Grades in English Courses Developmental Writing .34 1 51 3.05 3.58 111.49 9.43 .1701 -11.52 Course Developmental College Reading Writing .29 .11 2 2 39 199 8.72 4.37 3.52 3.45 112.95 113.51 15.41 25.86 .0260 5.78 Table 12-20 Results for Individual Colleges Developmental College Reading Writing N r N r 184 .27 .1736 -15.34 Developmental Reading & Writing .09 2 60 5.53 3.28 108.13 17.52 .0160 3.80

Correlation N of Colleges N of Students Grade Mean Grade S.D, Score Mean Score S.D Regression a b

College 1 2 6 9 10 11 Median

Developmental Writing N r 75 .24 68 51 147 .24 .23 .34 .19

Developmental Reading & Writing N r

56

.07

Reading Skills: For ESL students, the correlations across institutions of Reading Skills test scores (Tables 12-21 and 12-22) with grades in Developmental Reading, Low ESL Reading, Low ESL Vocabulary, and Low ESL Language Arts courses were .10, .41, .41, and .19, respectively. The median correlation within institutions of Reading Skills test scores with grades in Low ESL Language Arts (the one course for which there is more than one within-institution sample large enough for separate analysis) is .24. The highest correlation between the Reading Skills test scores and course grades at a particular institution was .41 in a Low ESL Vocabulary course. For English Best Language students, the correlations across institutions of Reading Skills test scores with grades in Developmental Reading, College Writing, and Developmental Reading and Writing courses were .01, .37, and .18, respectively (Tables 12-23 and 12-24). The median correlation within institutions of Reading Skills test scores with grades in Developmental Reading and Writing is .20. The highest correlation between the Reading Skills test scores and course grades at a particular institution was .37 in a College Writing course.


Appendix A Table 12-21: Correlations for ESL Students of LOEP Reading Skills Scores with Grades in Reading-Related Courses Course Developmental Writing .10 3 30 6.80 4.09 85.77 24.32 .0174 5.31 Low ESL Reading .41 3 89 7.69 3.63 73.31 21.21 .0697 2.57 Table 12-22 Results for Individual Colleges Low ESL Low ESL Reading Vocabulary N r N r 25 .41 .32 42 Low ESL Vocabulary .41 1 25 7.12 4.22 67.92 23.82 .0718 2.24 Low ESL Language Arts .19 3 215 4.53 3.38 60.37 24.07 .0274 2.87

Correlation n of Colleges N of Students Grade Mean Grade SD Score Mean Score SD Regression a b

College 3 4 5 8 Median

Developmental Writing N r

Low ESL Language Arts N r 60 136 .24 .19 .29

Table 12-23: Correlations for English Best Language Students of LOEP Reading Skills Scores with Grades in Reading-Related Courses

Correlation N of Colleges N of Students Grade Mean Grade SD Score Mean Score SD Regression a b

ACCUPLACER

Developmental Writing .01 3 149 8.28 3.09 100.28 17.60

.00024 8.04

Course College Writing .37 1 209 4.79 3.44 102.45 15.92

.0807 -3.47

Developmental Reading & Writing .18 2 71 5.32 4.01 99.01 15.01

.0471 .66

A-77

THE COLLEGE BOARD

Appendix A Table 12-24 Results for Individual Colleges Developmental College Writing Developmental Reading Reading & Writing r N r N r N 209 .37 27 .28 127 .11 .11 44 .20

College 2 7 9 11 Median

Language Use: For ESL students, the correlations of Language Use test scores (Table 12-25) with grades in Low ESL Writing, Upper ESL Writing, Low ESL Reading and Writing, and Low ESL Language Arts courses were .26, .48, .16, and .28, respectively. The median correlations within colleges of Language Use test scores with grades in Low ESL Language Arts and Low ESL Writing courses were .43 and .26, respectively. The highest correlation between the Language Use test scores and course grades at a particular institution was .53 in a Low ESL Language Arts course. For English Best Language students, the correlations of Language Use test scores (Table 12-27) with grades in Developmental Writing, College Writing, and Developmental Reading and Writing were .05, .23, and .12, respectively. The median correlation within colleges of Language Use test scores with grades in Developmental Writing was .16. The highest correlation between the Language Use test scores and course grades at a particular institution was .25 in a Developmental Writing course.

Table 12-25: Correlations for ESL Students of LOEP Language Use Scores with Grades in English Courses

                    Low ESL      Upper ESL    Low ESL Reading   Low ESL
                    Writing      Writing      & Writing         Language Arts
Correlation         .26          .48          .16               .28
N of Colleges       3            3            2                 3
N of Students       105          41           30                210
Grade Mean          5.13         6.78         5.77              5.13
Grade S.D.          4.35         3.89         3.39              3.48
Score Mean          76.74        88.17        62.67             63.50
Score S.D.          21.79        13.80        28.59             26.58
Regression a        5.00         -5.16        4.61              2.84
Regression b        .0366        .1354        .0185             .0361


Appendix A Table 12-26 Low ESL Writing N r 63 .21 55 .26 .31 Results for Individual Colleges Upper ESL Low ESL Writing Reading & Writing Low ESL Language Arts N r 31 -.22 55 .53 124 .43 .43

College 4 5 8 10 Median

Table 12-27: Correlations for English Best Language Students of LOEP Language Use Scores with Grades in English Courses Course Developmental Writing .05 1 46 7.74 3.00 105.25 13.57 .00087 6.85 College Writing .23 2 168 4.46 3.47 108.33 10.35 .0755 -3.72 Developmental Reading & Writing .12 1 41 5.22 3.25 105.49 8.96 .0450 0.47

Correlation N of Colleges N of Students Grade Mean Grade SD Score Mean Score SD Regression a b

Table 12-28 Results for Individual Colleges Developmental Reading N r 78 .13 78 46 81 .16 .25 .05 .19 College Writing N 153 r .21 Developmental Reading & Writing N r

College 1 2 6 9 10 11 Median

41

.12


Differential Predictive Validity

Bias in testing can occur on the item or test level. In this section, evidence for freedom from test bias is presented. The data for these analyses are from the aforementioned study of the predictive validity of ACCUPLACER Tests across 50 institutions that occurred from January 1990 to early 1992. To assure adequate numbers of cases for analysis within colleges and to permit combining data across colleges, courses were coded into levels--10 levels for mathematics, 5 for reading, and 4 for English. All analyses within and across colleges were performed using the course level, rather than the individual course title, as the unit for analysis. As described earlier, students' grades were converted to a numerical scale ranging from 0 (F) to 12 (A+). Grades of satisfactory/unsatisfactory or pass/fail were excluded from the coding. Withdrawals and incompletes were included as failing grades. The sample of these differential analyses consisted of records for 6,323 students. Each of these individuals had a score on at least one ACCUPLACER module, a grade in one course, and self-reported sex and/or ethnic group membership.

Differential Predictive Validity Results

Correlation and regression analyses for each test-by-course combination were performed separately for men and women and by ethnic group membership. An overall analysis was run for each group having at least 30 cases in the sample combined across colleges; correlational analyses were done within an institution whenever both sex groups or two ethnic groups provided at least 25 cases each for a given test-course combination. In the overall results for sex, there were 22 test score-course combinations for which a comparison could be made. (CLM as a predictor of General Mathematics grades was one combination, Elementary Algebra predicting Arithmetic grades was a second, and so on.) Combined sample sizes for these analyses range from a low of 67 to a high of 615. Correlations with grades were higher for women in 13 instances and for men in 9. Three of the comparisons showed a group difference significant at the .05 level, with higher coefficients for females in two of these comparisons and for males in the third. Twenty-five within-institution comparisons were available for sex. In 15 of these, females showed a higher correlation with grades, while the correlation was higher for males in the other 10. Five of the differences, each representing a different test-course pair, reached statistical significance, two of them showing higher correlations for females and three for males.

The results across institutions for ethnic group included 14 instances in which samples of Hispanic and White students could be compared: the correlations were higher in 10 for the Hispanic students, significantly so in three cases. Seven comparisons of Black and White students could be made, with five of them showing higher correlations for Black students, but none reaching statistical significance. Nine within-institution comparisons could be made, generally limited to contrasting the results for Hispanic and White students. The coefficient was higher for Hispanic students in seven of eight comparisons; no differences reached statistical significance. Ethnic group correlational analyses were also carried out contrasting those who identified themselves as White with those who indicated any other ethnic group membership. In 20 comparisons for the samples combined across colleges, 13 showed higher correlations for Nonwhites, 6 for Whites, and 1 showed no difference, with none of the differences reaching significance. In 16 within-institution comparisons, 9 showed higher correlations for Nonwhites, with one of these significant, while 7 showed higher correlations for Whites, with one reaching statistical significance. The results of these analyses suggest that there are no systematic differences associated with sex or ethnic group membership in the relation of the ACCUPLACER test scores to grades. Thus, the predictive validity results noted earlier appear to hold across the sex and ethnicity groups studied.

LOEP: Differential Predictive Validity

For the LOEP tests, the possibility of differential validity in the prediction of course grades from test scores was checked by comparing predictive validity coefficients across men and women. Correlations were only calculated when the group had at least 25 members with the relevant test score-course grade combination. For each test score, the correlations were calculated across institutions for each gender group by course. For differential prediction, within each course, least squares regression analyses were used to predict course grades from test scores. The analyses were conducted with data from the total group in the course and from each of the gender groups separately. As a result, for each course separate regression equations were produced for the total group and each gender group. Regression analyses were only conducted when the gender group had more than 25 members with the relevant test score-course grade combination. Because significance tests of slopes and intercepts do not directly correspond to important prediction differences, analyses of differential prediction were conducted. Predictions were made by using the total group mean test score in the regression equations for the total group and for each gender group. That is, the regression equation based on the data from the total group was used to predict a course grade using the total group test score mean. This prediction was compared to the prediction with the same total group mean score from the regression equation based on data from each gender group. Differential prediction was defined as the difference between the course grade predicted from the total group equation and the course grade predicted from the gender group prediction equation. These differences were weighted by the number of students in the course and summed across institutions for each test score-course grade combination. Use of the total group mean in both regression equations controlled for any ability differences between the different gender groups and the total group. Use of regression equations rather than individual data points to determine over- or under-prediction should produce better generalizability of the results.

Table 12-29 indicates that the correlation coefficients across institutions are higher for the female group than the male group in 9 of the 13 test score-course grade combinations. Note that these results combine across ESL and English Best Language groups. The validity coefficients were higher for the male group using any of the three test scores to predict grades in the Low ESL Language Arts course. The validity coefficients were higher for the females in most of the other courses.


Table 12-29: Validity Coefficients for Men and Women for LOEP Test Scores

                                      Male            Female
Test / Course                        N      r        N      r
Reading Skills
  Developmental Reading             61    .02      118    .18
  Develop Read & Writing            31    .19       61    .00
  Low ESL Reading                   37    .32       52    .45
  Low ESL Language Arts             95    .23      125    .19
Sentence Meaning
  Developmental Writing             26    .08       43    .12
  College Writing                   88    .16      120    .35
  Low ESL Writing                   31   -.02       49    .18
  Low ESL Language Arts             74    .26      114    .24
  Develop Read & Writing            39   -.19       34    .45
Language Use
  Developmental Writing             28   -.15       26    .37
  College Writing                   67    .10      105    .24
  Low ESL Writing                   46    .22       61    .27
  Low ESL Language Arts             91    .31      121    .28

Table 12-30 displays the results of the analyses to determine over- or under-prediction of course grades by test scores for the different gender groups. Positive differences indicate over-prediction of course grades for a gender group and negative differences indicate under-prediction of course grades. Table 12-30 shows that for male students most test scores over-predict course grades when using the prediction equation based on the total group versus the prediction equation based on the male group. The results indicate that this over-prediction seldom exceeds one point. (Grades were on a 0 to 12 point scale.) Under-prediction occurred in only one test score-course combination for male students, namely when the Language Use test scores were used to predict course grades in Low ESL Writing. The reverse was true for female students, where course grades were consistently under-predicted by all three test scores. The under-prediction was always below .70 of a point and was usually around the .30 range. On a 12-point scale, this under-prediction is very small.


Table 12-30: Over- or Under-Prediction(a) of Course Grades for Gender Subgroups with Total Versus Group Regression Equations on LOEP Test Scores

Test / Course                        Male       Female
Reading Skills
  Developmental Reading             1.187      -0.538
  Low ESL Reading                   0.509      -0.345
  Develop Read & Writing            1.352      -0.658
  Low ESL Language Arts             0.459      -0.314
Sentence Meaning
  Developmental Writing             0.373      -0.327
  Low ESL Writing                   1.101      -0.653
  College Writing                   0.535      -0.397
  Develop Read & Writing            0.296      -0.681
  Low ESL Language Arts             0.346      -0.202
Language Use
  Developmental Writing             0.119      -0.090
  Low ESL Writing                  -0.421       0.311
  College Writing                   0.319      -0.171
  Low ESL Language Arts             0.510      -0.330

(a) Over- and under-prediction is calculated as (Total Group Course Grade Prediction) - (Gender Group Course Grade Prediction). Positive numbers indicate over-prediction of course grades for the gender group and negative numbers indicate under-prediction for the gender group.

Summary for LOEP Validity Study

The validity study of the Levels of English Proficiency (LOEP) Test, carried out in 17 two- and four-year colleges, used records for about 3,000 students, each of whom had a score on at least one LOEP test and a placement and grade in at least one relevant English or reading course. Courses were coded by level; grades were converted to a numerical scale ranging from 0 (Fail, including Withdraw and Incomplete) to 12 (A+). Most correlations between LOEP test scores and relevant course grades were positive and moderate. Average results for the total group and for individual colleges were similar. For ESL students, the total group test score-course grade correlations fell in the range around .20, somewhat higher for Reading Skills and Language Use, and lower for Sentence Meaning. Three of the 11 correlations were above .40. For English Best Language students, the test score-course grade correlations typically were in the range from .10 to .20. When one considers the poor reliability of course grades and the relatively large restriction in the range of the test scores that exists in developmental courses, these correlations are acceptable.

If these correlations were corrected for range restriction, they would be higher. For example, the correlation between Reading Skills test scores and course grades for English Best Language students in Developmental Reading and Writing was .18. The Reading Skills standard deviation for the total group was 26.18, and for the group in the course, 15.01. When corrected for this range restriction, the correlation increased to .30.

Median correlations across relevant courses for Reading Skills and Language Use test scores indicate they predict course grades somewhat better for ESL than English Best Language students. The opposite was true for the Sentence Meaning test scores. However, comparisons between ESL and English Best Language students were clouded by a number of factors. For example, relatively large groups of both kinds of students were found only in the Developmental Writing course. The students in the other courses were predominantly either ESL or English Best Language students. Moreover, the tests were very appropriate in difficulty for the ESL students, but easier than would be ideal for the English Best Language students in the sample. The differential validity and prediction results indicated relatively small differences in the direction of higher correlations and under-prediction of course grades for females. However, generally the results gave no indication of practically important differences between females and males in the relation of the test scores to grades.
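The range-restriction adjustment reported above is consistent with the standard correction for direct restriction of range. The following sketch is an illustration under that assumption, not the study's code, and it reproduces the .18 to .30 example from the figures given above.

# Illustrative sketch: correction of a correlation for direct range restriction.
import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Inflate an observed correlation to the value expected in the unrestricted group."""
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r**2 + (r**2) * (k**2))

# Figures reported above: r = .18, total-group SD = 26.18, within-course SD = 15.01
print(round(correct_for_range_restriction(0.18, 26.18, 15.01), 2))  # approximately 0.30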

Other ACCUPLACER Validity Studies

In addition to the multi-institution research conducted by The College Board, numerous validity studies have been conducted on ACCUPLACER at specific institutions or at groups of institutions, such as at the state or county level. The College Board encourages institutions to conduct such studies and offers assistance in carrying them out. The exact number of ACCUPLACER validity studies conducted at institutions is not known, because not all institutions report their results to the College Board. However, several institutions obtain assistance from the College Board in conducting their studies or send a report to the Board when a study is completed. In this section, we summarize some of these studies. Table 12-31 lists seven ACCUPLACER validity studies conducted since the College Board's 50-institution study. For each study, a citation, the specific subtests studied, overall sample sizes, and abbreviated conclusions are presented. Four of the studies were conducted by one or more institutions; the other three were conducted by the College Board in cooperation with one or more institutions. The studies are best described as concurrent validity studies (2), predictive validity studies (1), or both (4). An inspection of the results in Table 12-31 indicates that when ACCUPLACER scores are correlated with scores from similar tests, the concurrent validity coefficients tend to be high (i.e., above .60). The correlations of ACCUPLACER scores with overall GPA were also high (.41 to .84). Three studies gathered data on placement accuracy (using either teachers' ratings or grades as the validation criterion). In these studies, placement decisions based on ACCUPLACER scores "agreed" with placements made using the validity criterion 69-90% of the time.


Table 12-31: Summary of Selected ACCUPLACER Validity Studies

Napoli & Wortman (1995). Predictive validity. Suffolk CC (NY). Tests studied: RC. Overall sample size: 16,000. Results and conclusions: RC & GPA r = .41; RC & Psych r = .52; placement agreement range = 69-77%.

Brookdale CC (1996, February). Concurrent and predictive validity. Lincroft, New Jersey. Tests studied: AR, EA, RC, SS. Overall sample size: 976. Results and conclusions: Concurrent r's with NJCBST range = .74-.90; placement accuracies range from 74%-93%.

Cole, Muenz, & Bates (1998). Predictive validity, DPV. 2 Midwest CCs. Tests studied: RC. Overall sample size: 4,298. Results and conclusions: RC & GPA r = .84; magnitude of predictive validity increased with age of cohort.

Napoli (1998). Concurrent validity. Suffolk CC (NY). Tests studied: AR, EA. Overall sample size: 642. Results and conclusions: AR & EA r with local math test = .33-.45.

College Board (1999, May). Concurrent validity. Tennessee. Tests studied: AR, EA, RC. Overall sample size: 3,800. Results and conclusions: Concurrent r's range = .68-.71 (.74-.80 after correction for range restriction); average placement agreement = 64%.

College Board (1999, November). Predictive, consequential validity. California. Tests studied: AR, EA, CLM, RC, SS, LOEP. Overall sample size: 29,000. Results and conclusions: Average placement accuracies: RC = 79%; RC with SS = 86%; AR with EA = 80%; CLM = 90%.

College Board (2000, June). Concurrent validity; standard setting. National Louis University. Tests studied: AR, EA, RC, WP. Overall sample size: 1,450. Results and conclusions: RC & DRP r = .80; WP & DRP r = .41; AR r range = .18-.35; EA r range = .25-.40.

Table Notes: AR=Arithmetic, EA=Elementary Algebra, CLM=College Level Math, LOEP=Levels of English Proficiency, RC=Reading Comprehension, SS=Sentence Skills, WP= WritePlacer Plus; CC=Community College, DPV=Differential Predictive Validity.

In general, the results of the validity studies conducted on ACCUPLACER since the College Board's 1992 multi-institution study provide evidence that ACCUPLACER scores are correlated with other assessments measuring similar subject areas and that the decisions made on the basis of ACCUPLACER scores are congruent with placement decisions that were made using other criteria. Clearly, more research on the validity of inferences made on the basis of ACCUPLACER scores is warranted. However, the studies that have been conducted tend to support the validity of the use of ACCUPLACER for making course placement decisions.



Differential Item Functioning

In addition to studies conducted at the ACCUPLACER test score level, two major empirical studies have been conducted to investigate the degree to which specific ACCUPLACER items may be problematic or cause an unfair advantage or disadvantage to particular groups of examinees. These studies focused on detection of differential item functioning (DIF), which refers to the situation when an item is more difficult for one group of examinees relative to another group of equally proficient examinees. Although the sensitivity review process described in Chapter 2 goes a long way toward eliminating potentially unfair test material, such reviews cannot catch all potential problems. DIF analyses provide a complementary, statistical means for evaluating item fairness. Specifically, DIF studies evaluate the statistical characteristics of items across different groups of examinees. Rather than focus on "overall" item statistics, DIF techniques are conditional. As Dorans and Holland (1993) pointed out, "In contrast to impact, which often can be explained by stable consistent differences in examinee ability distributions across groups, DIF refers to differences in item functioning after groups have been matched with respect to the ability or attribute that the item purportedly measures" (p. 37). In DIF analyses, test-takers from different groups are matched on the psychological attribute measured, and the probability of differential responses across matched test-takers is evaluated. Items are considered to be functioning differentially across groups if the probability of a particular response differs significantly across test-takers who are equivalent (i.e., matched) on proficiency (see Clauser & Mazor, 1998, or Holland & Wainer, 1993 for more complete descriptions of DIF theory and methodology). It should be pointed out that if items are flagged for DIF, it does not mean that the items are biased. Item bias is present when the statistical occurrence of DIF can be explained after a thorough inspection of the item within the context of understanding the characteristics of the examinee groups involved in the analysis. Thus, DIF is a necessary, but insufficient condition for item bias. Typically content specialists inspected items flagged for DIF to determine if they may contribute to unintended bias in test scores.

ACCUPLACER DIF Studies

There have been two major evaluations of DIF on ACCUPLACER exams. The first set of studies, reported in College Board (1993), evaluated DIF on the Reading Comprehension, Sentence Skills, Arithmetic, Elementary Algebra, and College-Level Mathematics tests. The examinee group comparisons studied were Male/Female, Black/White, Hispanic-Latino/White, Asian-Pacific Islander/White, and American Indian/White. The second set of studies (Sireci, 2001; Sireci, 2002) focused on the three LOEP exams and evaluated DIF across Male/Female, Hispanic-Latino/Non-Hispanic-Latino, and Asian-Pacific Islander/Non-Asian-Pacific Islander examinee groups. The different groups compared in these studies were a function of the characteristics of examinees that took the exams and numbers of examinees in each group in the databases. These studies are summarized below.



Evaluating DIF on the Reading Comprehension, Sentence Skills, Arithmetic, Elementary Algebra, and College-Level Mathematics Tests

The College Board (1993) study used the Mantel-Haenszel DIF (MH-DIF) detection method to identify potentially problematic items. This procedure was first applied in the DIF context by Holland (1985) to detect items that function differently in two groups of examinees and was formally introduced in Holland and Thayer (1988). In the MH-DIF detection method, the performance of a focal group is compared with that of a reference group after matching the two groups on a relevant criterion variable (usually the total test score) for each item on the test. The criterion for matching examinees across the two relevant groups was the ACCUPLACER Total Right score (see Chapter 3). The Mantel-Haenszel DIF statistics (MH D-DIF) produced in the analyses are on the ETS delta scale(4), with a value of 1.0 meaning the item is one delta point more difficult for the reference group. Using the MH D-DIF statistics and their related significance test, all items were classified into three categories, A, B, or C, defined as follows:

A DIF Items: MH D-DIF not significantly different from zero OR absolute value less than 1.0;
B DIF Items: MH D-DIF significantly different from zero and absolute value of at least 1.0, AND EITHER less than 1.5 OR not significantly greater than 1.0;
C DIF Items: MH D-DIF significantly greater than 1.0 AND absolute value 1.5 or more.

The B and C items were further classified into positive or negative categories, depending on the sign of the MH D-DIF statistics. Positive MH D-DIF statistics favor the focal group (e.g., Females, Blacks, Hispanic-Latinos) and negative statistics favor the reference group (e.g., Males, Whites). The use of these categories reduces the likelihood that non-DIF items will be flagged due to chance (i.e., Type I error). Only examinees who indicated that English was their first language were included in these DIF analyses. Minimum sample sizes were set at 200 examinees in the focal group and 600 examinees in the combined focal and reference groups. Because of the nature of the adaptive test, not all items were administered to all examinees; the items that were analyzed for DIF were those administered to sufficient numbers of examinees. After consideration of minimum sample sizes and whether the items were actually administered, all of the ACCUPLACER tests had some items analyzed for all of the comparisons except the White/Asian American comparison. For the Male/Female comparison, approximately 50 percent of the items administered on each test were analyzed. For the White/Black and White/Hispanic comparisons, approximately 31 percent of the items on each test were analyzed. Only about 7 percent of the items on each test were analyzed for the White/American Indian comparison. Because of sample size requirements, no items were analyzed for the White/Asian American comparison. However, all the items for the CLM were analyzed for DIF for all comparisons. The results of the DIF analyses on these tests are summarized in Tables 12-32 to 12-36 below. The majority of the items exhibited A DIF (i.e., negligible DIF) and very few C DIF items were discovered. All C DIF items were removed from the pool.

(4) The delta scale has a mean of 13 and a standard deviation of 4. Delta values are transformations of the proportion of examinees answering an item correctly: delta is equal to 13 - 4z, where z is the unit normal deviate corresponding to the proportion of examinees answering an item correctly.
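The statistics described above can be illustrated with a compact sketch. This is illustrative only, not ETS production code; the significance flags are assumed to come from the accompanying Mantel-Haenszel chi-square test, which is not shown here.

# Illustrative sketch of the ETS delta transform, the MH D-DIF statistic, and the
# A/B/C classification rules given in the text. Each stratum is one matching-score
# level with counts (reference correct, reference wrong, focal correct, focal wrong).
import math
from statistics import NormalDist

def delta_from_p(p_correct):
    """ETS delta scale: delta = 13 - 4z, where z is the unit normal deviate for p."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)

def mh_d_dif(strata):
    """MH D-DIF = -2.35 * ln(alpha_MH), the Mantel-Haenszel common odds ratio on the delta scale."""
    num = sum(rc * fw / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    den = sum(rw * fc / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    return -2.35 * math.log(num / den)

def dif_category(d_dif, sig_vs_zero, sig_vs_one):
    """Classify an item as A, B, or C DIF from its MH D-DIF value and two significance flags."""
    size = abs(d_dif)
    if (not sig_vs_zero) or size < 1.0:
        return "A"
    if size >= 1.5 and sig_vs_one:
        return "C"
    return "B"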


Table 12-32: Differential Item Functioning (MH D-DIF) Summary - Reading Comprehension Test

                    Maximum Absolute DIF Value      Number of Items by DIF Category (3)
                    for All Comparisons (2)
DIF Category (1)    Number      % of Items          Male/Female   White/Black   White/Hispanic   White/Am. Ind.
+C                     3           3.7                   0             3              0                0
+B                     5           6.1                   2             3              0                0
A                     67          81.7                  78            56             59               17
-B                     6           7.3                   2             4              1                0
-C                     1           1.2                   0             1              0                0
Total                 82         100.0                  82            67             60               17

(1) A positive (negative) DIF category indicates that the item is easier (more difficult) for the focal (minority) group than for the reference (non-minority) group after matching.
(2) Each item is identified in only one DIF category. If the item was flagged for more than one comparison analysis, then the largest absolute DIF value indicates its category across all comparisons.
(3) Due to the nature of an adaptive test, the total number of items eligible for analysis differs for each comparison.

Table 12-33: Differential Item Functioning (MH D-DIF) Summary - Sentence Skills Test

                    Maximum Absolute DIF Value      Number of Items by DIF Category (3)
                    for All Comparisons (2)
DIF Category (1)    Number      % of Items          Male/Female   White/Black   White/Hispanic   White/Am. Ind.
+C                     3           3.9                   1             2              0                0
+B                     7           9.2                   1             4              4                1
A                     58          76.3                  72            57             50               19
-B                     6           7.9                   1             4              3                0
-C                     2           2.6                   0             2              0                0
Total                 76         100.0                  75            69             57               20

(1) A positive (negative) DIF category indicates that the item is easier (more difficult) for the focal (minority) group than for the reference (non-minority) group after matching.
(2) Each item is identified in only one DIF category. If the item was flagged for more than one comparison analysis, then the largest absolute DIF value indicates its category across all comparisons.
(3) Due to the nature of an adaptive test, the total number of items eligible for analysis differs for each comparison.


Table 12-34: Differential Item Functioning (MH D-DIF) Summary - Arithmetic Skills Test

                    Maximum Absolute DIF Value      Number of Items by DIF Category (3)
                    for All Comparisons (2)
DIF Category (1)    Number      % of Items          Male/Female   White/Black   White/Hispanic   White/Am. Ind.
+C                     8          10.5                   3             6              0                0
+B                    11          14.5                   6             4              3                1
A                     46          60.5                  64            41             52                6
-B                     6           7.9                   1             7              3                1
-C                     5           6.6                   2             3              0                0
Total                 76         100.0                  76            61             58                8

(1) A positive (negative) DIF category indicates that the item is easier (more difficult) for the focal (minority) group than for the reference (non-minority) group after matching.
(2) Each item is identified in only one DIF category. If the item was flagged for more than one comparison analysis, then the largest absolute DIF value indicates its category across all comparisons.
(3) Due to the nature of an adaptive test, the total number of items eligible for analysis differs for each comparison.

Table 12-35: Differential Item Functioning (MH D-DIF) Summary - Elementary Algebra Test

                    Maximum Absolute DIF Value      Number of Items by DIF Category (3)
                    for All Comparisons (2)
DIF Category (1)    Number      % of Items          Male/Female   White/Black   White/Hispanic   White/Am. Ind.
+C                     1           1.7                   0             1              0                0
+B                     8          13.3                   3             5              1                1
A                     45          75.0                  53            27             31                7
-B                     4           6.7                   3             2              1                0
-C                     2           3.3                   1             2              0                0
Total                 60         100.0                  60            37             33                8

(1) A positive (negative) DIF category indicates that the item is easier (more difficult) for the focal (minority) group than for the reference (non-minority) group after matching.
(2) Each item is identified in only one DIF category. If the item was flagged for more than one comparison analysis, then the largest absolute DIF value indicates its category across all comparisons.
(3) Due to the nature of an adaptive test, the total number of items eligible for analysis differs for each comparison.


Appendix A Table 12-36: Differential Item Functioning (MH D-IF) Summary College-Level Mathematics Test Category of Maximum Absolute DIF Value For all Comparisons IF Category +C +B A -B -C Total

1

Male/ Female Number 14 20 73 9 4 120

2

Comparisons White/ White/ Black Hispanic

White/ Am. Ind.

3

% of Items 11.7 16.7 60.8 7.5 3.3 100.0

Number of items by DIF Category 0 4 113 2 1 120 7 11 49 7 1 75 0 5 29 1 0 35

10 15 41 5 4 75

2 3

A positive (negative) DIF category indicates that the item is easier (more difficult) for the focal (minority) group than for the reference (non-minority) group after matching. Each item is identified in only one DIF category. If the item was flagged for more than one comparison analysis, then the largest absolute DIF value indicates its category across all comparisons. Due to the nature of an adaptive test, the total number of items eligible for analysis differs for each comparison.

LOEP DIF Study

Sireci (2001) investigated DIF across females and males on the LOEP tests, and Sireci (2002) investigated DIF across examinee groups defined by ethnicity (i.e., Asian-Pacific Islander and Hispanic-Latino). Given that these tests measure English proficiency, samples of examinees were of sufficient size only for the Asian-Pacific Islander and Hispanic-Latino examinee groups. In these studies, the logistic regression DIF (LR-DIF) detection method (Swaminathan & Rogers, 1990) was used. This method was selected because it can detect both uniform and non-uniform DIF(5) and because simulation studies have shown that LR-DIF has acceptable power and control over Type I error, particularly when used in conjunction with an effect size measure (Jodoin, 1999; Jodoin & Gierl, 2001; Rogers & Swaminathan, 1993; Swaminathan & Rogers, 1990; Zumbo, 1999; Zumbo & Thomas, 1996). Also, since the initial DIF studies conducted on ACCUPLACER tests, improvements have been made in LR-DIF effect size measures, and they appear to be comparable to those used in most applied DIF investigations (Jodoin & Gierl, 2001). Thus, LR-DIF was an appropriate method for evaluating DIF on the LOEP tests.

To flag LOEP items for DIF, an omnibus analysis that tested for both uniform and non-uniform DIF was conducted (Jodoin & Gierl, 2001; Swaminathan & Rogers, 1990; Zumbo, 1999). The R-squared delta (RSD) effect size measure (Jodoin & Gierl, 2001) was used to classify items flagged as DIF into one of the three categories described earlier (A DIF, B DIF, or C DIF). The LOEP data analyzed came from LOEP exams administered throughout the United States from approximately June 2000 through April 2001. Approximately 50,000 examinees took at least one of the three LOEP subtests: Sentence Meaning, Language Usage, or Reading Skills. However, examinees were administered only 20 of the approximately 200 items in each subtest item bank, and not all examinees provided data regarding their sex or ethnicity. Thus, the sample sizes for some of the analyses were small. To ensure the analyses were not biased by the vagaries of small sample sizes, any items that were not taken by at least 50 focal and reference group examinees were omitted from the analyses.

(5) Uniform DIF occurs when the probability of getting an item correct is higher for one group across the entire proficiency continuum. Non-uniform DIF occurs when there is a difference in the probability of getting the item correct across the groups, but the direction or magnitude of the difference is not consistent across the entire continuum. Many studies have shown that both types of DIF can occur on educational tests.
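The omnibus LR-DIF test can be illustrated with a brief sketch. This is an assumed reconstruction, not the code used in the Sireci studies: it compares a score-only logistic model with a model that adds group membership and a group-by-score interaction, and it reports a two-degree-of-freedom likelihood-ratio chi-square together with a Nagelkerke pseudo-R-squared difference as one plausible form of the R-squared delta effect size.

# Illustrative sketch of an omnibus logistic-regression DIF test (uniform + non-uniform).
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_dif_omnibus(item_correct, total_score, group):
    """Return (LR chi-square with 2 df, p-value, pseudo R-squared difference)."""
    n = len(item_correct)
    base = sm.Logit(item_correct, sm.add_constant(total_score)).fit(disp=0)
    full_x = np.column_stack([total_score, group, total_score * group])
    full = sm.Logit(item_correct, sm.add_constant(full_x)).fit(disp=0)
    lr = 2.0 * (full.llf - base.llf)
    p = stats.chi2.sf(lr, df=2)

    null_llf = sm.Logit(item_correct, np.ones((n, 1))).fit(disp=0).llf
    def nagelkerke(llf):
        cox_snell = 1.0 - np.exp(2.0 * (null_llf - llf) / n)
        return cox_snell / (1.0 - np.exp(2.0 * null_llf / n))
    return lr, p, nagelkerke(full.llf) - nagelkerke(base.llf)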

LOEP DIF Results

Tables 12-37 to 12-39 summarize the results of the LOEP DIF analyses. The data in these tables report the number of B and C DIF items for each comparison, stratified by uniform and non-uniform DIF. Where uniform DIF was observed, the direction of DIF is also provided. The results indicate that very few LOEP items were flagged for DIF. Across all 1,710 comparisons (570 items x 3 group comparisons), only 16 items (0.9%) were flagged for large DIF and only nine of these exhibited uniform DIF. No patterns of DIF against any particular group of examinees emerged. Although substantive explanations for why some of these items functioned differentially across groups could not be made, all 16 items were dropped from the operational item pools.

Table 12-37: Summary of LR-DIF Results on LOEP Sentence Meaning

Comparison                             C DIF (U)   C DIF (N)   B DIF (U)   B DIF (N)   Total (%)
Male/Female                              1M          0           0           4           5 (3)
Hispanic-Latino/Non-Hispanic-Latino      2H          1           3H, 1R      3          10 (6)
Asian-Pacific/Non-Asian-Pacific          1R          1           2A, 2R      1           7 (4)
Total (%)                               4 (2)       2 (1)       8 (5)       8 (5)       22 (13)

Notes: U=Uniform DIF, N=Non-uniform DIF. Letters for Uniform DIF indicate DIF direction (e.g., 3H, 3R indicates 3 items favoring Hispanic-Latino group and 3 items favoring Reference group).


Table 12-38: Summary of LR-DIF Results on LOEP Language Usage

Comparison                             C DIF (U)   C DIF (N)   B DIF (U)   B DIF (N)   Total (%)
Male/Female                              0           1           0           2           3 (2)
Hispanic-Latino/Non-Hispanic-Latino      1R          1           1H, 2R      6          11 (6)
Asian-Pacific/Non-Asian-Pacific          1R          1           1R          2           5 (3)
Total (%)                               2 (1)       3 (2)       4 (2)      10 (5)       19 (10)

Notes: U=Uniform DIF, N=Non-uniform DIF. Letters for Uniform DIF indicate DIF direction (e.g., 1H, 2R indicates 1 items favoring Hispanic-Latino group and 2 items favoring Reference group).

Table 12-39: Summary of LR-DIF Results on LOEP Reading Skills

Comparison                             C DIF (U)   C DIF (N)   B DIF (U)   B DIF (N)   Total (%)
Male/Female                              1M, 1F      2           1F          4           9 (4)
Hispanic-Latino/Non-Hispanic-Latino      1H          0           2H, 2R      6          11 (5)
Asian-Pacific/Non-Asian-Pacific          0           0           1R          3           4 (2)
Total (%)                               3 (1)       2 (1)       6 (3)      13 (6)       24 (11)

Notes: U=Uniform DIF, N=Non-uniform DIF. Letters for Uniform DIF indicate DIF direction (e.g., 2H, 3R indicates 2 items favoring Hispanic-Latino group and 3 items favoring Reference group).

Summary of ACCUPLACER Validity Evidence

The vast and comprehensive analyses conducted on ACCUPLACER provide criterion-related and construct validity evidence that ACCUPLACER scores are valid for making placement decisions. Criterion-related studies illustrate that ACCUPLACER scores are correlated with other placement test criteria and the DIF studies indicate that ACCUPLACER items function appropriately for important sub-groups of the examinee population. In addition, the quality item and test development processes used to develop ACCUPLACER tests, including content and sensitivity review, provide important evidence of content validity. Although test validation is an ongoing process and validity studies on ACCUPLACER will continue, at this juncture, it appears that the validity argument for the use of ACCUPLACER scores for making placement decisions is strong.



Chapter 13: WritePlacer PlusTM

WritePlacer Plus is a direct measure of student writing skills offered as part of the College Board's ACCUPLACER Program. Examinees are asked to provide a writing sample in response to a specific prompt. This assessment measures writing skill at the level expected of an entering-level college student. It is used to assess the writing skill of entering college students as one basis for determining whether or not the student requires developmental instruction prior to taking college-level coursework. WritePlacer Plus differs from the other ACCUPLACER subtests in several ways. First, the test is not adaptive. Second, rather than responding to multiple-choice items, examinees are required to write an essay. The writing skill areas that are evaluated on students' WritePlacer Plus essays are (a) Focus, (b) Organization, (c) Development and Support, and (d) Mechanical Conventions. In this chapter, we describe the development of WritePlacer Plus prompts and summarize the research conducted on this test.

Development of WritePlacer Plus Prompts

Approximately 12 WritePlacer Plus prompts were developed in 1998 and 1999. The prompts were reviewed by content experts and reviewed for sensitivity before being field tested in 1998 and 1999 with approximately 800 students. About 50 to 100 students responded to each prompt. The responses were scored according to the rubric established for the WritePlacer Plus program, which is on a scale ranging from 1-4. Two scorers rated each response. If the two scorers disagreed by more than one point, a third scorer reviewed the response. If two of the three scorers "matched," this was accepted as the correct score. If all three scorers disagreed, a chief reader reviewed the essay and determined the correct score. Scorers were also asked to judge the effectiveness of the prompts for eliciting the writing required of an entry-level college student, as well as issues of bias and accessibility. The results of the field test were analyzed. Any prompts judged by scorers to be problematic on qualitative grounds were rejected. Similarly, any prompts that differed significantly in difficulty or that failed to yield a range of scores were rejected. Two prompts of the set of approximately twelve were rejected for operational use.

Equivalence of WritePlacer Plus Prompts

The equivalence of WritePlacer Plus prompts is established in three ways. First, all prompts are written to a common set of specifications and follow the same basic format for presentation. All prompts are targeted to measure academic writing required of entry-level college students. Second, the responses are scored on a common rubric across prompt topics. The rubric is designed to measure academic writing at the college entry level regardless of topic. Third, the distribution of scores following the field-testing is examined. Any prompt that is significantly different in difficulty, or that deviates significantly from the expected distribution, is not accepted for operational use.

WritePlacer Plus Scoring

Writing samples for WritePlacer Plus are scored using modified holistic scoring, which is a procedure used to evaluate the overall quality of writing based on important features of writing. Holistic scoring is used to evaluate the overall effectiveness of the writing sample as evidenced by how well a piece of writing communicates a whole message. Each writing sample is evaluated based on its overall impression, not on the basis of the individual writing characteristics in isolation. The specific features of writing on which each response is evaluated are: focus; organization; development and support; and mechanical conventions. WritePlacer Plus scores are reported on a scale from 2-12, reflecting the sum of two readers' scores or the IntelliMetric model of the two readers' scores.(6) If the two readers disagree by more than one point, a third reader evaluates the writing. A score of zero indicates that the response was off topic, in a language other than English, too short to score, or in some other way unscorable. A description of the qualities and characteristics of essays at each score point is provided in Table 13-1.

Several studies have investigated the accuracy of WritePlacer Plus scores. Since almost all WritePlacer Plus essays are scored by the computer (IntelliMetric, Vantage Learning, 2000a), most of this research has focused on the automated scoring of WritePlacer Plus essays. Vantage Learning, the contractor for automated scoring of WritePlacer Plus essays, used a committee of expert scorers to validate IntelliMetric scoring of essays from a statewide high school assessment program (Vantage Learning, 2000a). In this study, committees of 8-10 writing experts scored 600 essays (about 200 essays from three different prompts) and the mean score of the committee was taken as the "true" score for the essay. Although this study did not involve WritePlacer Plus essays, the automated scoring program employed by WritePlacer Plus was the same. The program was able to reproduce the mean score for these essays between 53-82% of the time, depending on the scoring dimension and writing prompt. It should be noted, however, that these essays were scored on a four-point scale, which facilitates greater inter-scorer agreement than the six-point scoring scale upon which WritePlacer Plus is based.

Vantage Learning (2000b) investigated the accuracy of IntelliMetric for scoring WritePlacer Plus essays. In this study, approximately 250 of 300 essays were used to calibrate the automated scoring system and the remaining essays were used to evaluate how well the system could reproduce the mean scores of the essays computed from the average of six human scorers. The results indicated that IntelliMetric provided scores within one point of the mean human scores 100% of the time and reproduced the same score for about 72% of the essays. The correlation between the scores provided by the computer and the mean score of the human graders was .71.

In an independent study, Rizavi and Sireci (1999) compared IntelliMetric scoring of WritePlacer Plus essays with the scores provided by two human readers. They reported IntelliMetric/human correlations ranging from .64 to .69 and intraclass correlations of .74 to .76. They also reported the percentage of essays that were scored the same by IntelliMetric and each reader and corrected these statistics for chance agreement (i.e., kappa coefficients). These chance-corrected agreements ranged from .44 to .52. They concluded that IntelliMetric was scoring the majority of these essays in a manner congruent with human scores, and they called for additional research in this area.
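The chance-corrected agreement statistic mentioned above (Cohen's kappa) can be computed as in the following illustrative sketch; the function and variable names are not from the original study.

# Illustrative sketch: Cohen's kappa for two raters scoring the same essays.
from collections import Counter

def cohens_kappa(scores_a, scores_b):
    """Chance-corrected agreement between two sets of discrete essay scores."""
    n = len(scores_a)
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(scores_a) | set(scores_b)) / (n * n)
    return (observed - expected) / (1.0 - expected)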

(6) In many cases (e.g., WritePlacer on-line), WritePlacer essays are scored by the IntelliMetric computerized essay-scoring program.

Summary Of Research On WritePlacer Plus

WritePlacer Plus is a relatively new test and so placement validity studies are lacking. As reported in Chapter 12 (see Table 12-31), only one study has documented the concurrent validity of WritePlacer Plus, and that study used a reading test (the Degrees of Reading Power) as the criterion measure (r = .41). The majority of research on WritePlacer Plus has focused on how well its computerized scoring system reproduces scores assigned by human graders. This research is encouraging and tends to show high levels of scoring congruence. As WritePlacer Plus is used more frequently, predictive validity data should become available.


Table 13-1: Description of WritePlacer Plus Score Scale

Score 12: An outstanding writing sample that is very effective at communicating a whole message to a specified audience. The response is well organized and maintains a clear central focus with a clearly stated purpose. The writer exhibits superior control in the development and support of ideas. The writer demonstrates superior facility with mechanical conventions such as sentence structure, usage, spelling and punctuation.

Score 11: An excellent writing sample that is very effective at communicating a whole message to a specified audience. The writer establishes a clear purpose and focus is effectively maintained throughout the writing sample. Ideas are well developed and well supported. The writer clearly demonstrates mastery of sentence structure, usage, spelling and punctuation.

Score 10: A strong writing sample that effectively communicates a whole message to a specified audience. The writer establishes a purpose and maintains focus throughout the writing sample. The writer exhibits strong control in the development of ideas and clearly specifies the supporting detail. There is evidence of mastery of mechanical conventions such as sentence structure, usage, spelling and punctuation.

Score 9: A very good writing sample that substantially communicates a whole message to a specified audience. A purpose and focus is established, but only partially developed. An organizational pattern is evident, but is only partially fulfilled. The writer competently handles mechanical conventions such as sentence structure, usage, spelling and punctuation, though very minor errors in the use of conventions may be present.

Score 8: An adequate writing sample that competently communicates a message to a specified audience. Though the purpose of the writing sample may be clear, the development of supporting details may not be fully realized. The writer's organization of ideas is evident but may lack specificity, be incomplete or not developed in effective sequence. There is evidence of control in the use of mechanical conventions such as sentence structure, usage, spelling and punctuation, though minor errors in the use of conventions may be present.

Score 7: A restricted writing sample that only partially communicates a message to the specified audience. The purpose may be evident but only partially formed. Focus on the main idea is only partially evident. The main idea is only partially developed with limited supporting details. While there is some evidence of control in the use of mechanical conventions such as sentence structure, usage, spelling and punctuation, some distracting errors may be present.

Score 6: A limited writing sample in which the characteristics of effective written communication are only partially formed. Statement of purpose is not totally clear and although a main idea or point of view may be stated, continued focus on the main idea is not evident. Development of ideas by the use of specific supporting detail and sequencing of ideas may be present, but is incomplete or unclear. The response may exhibit distracting errors or poor precision in the use of grammatical conventions including poor sentence structure, poor word choice, poor usage, poor spelling and punctuation.

Score 5: This writing sample addresses the topic with little success. There is some evidence of a main idea or point of view, but there is difficulty in articulation. An attempt at organization is made, but meets with limited success. There are significant errors in mechanical conventions of usage, sentence structure, grammar, spelling, and punctuation.

Score 4: This writing sample attempts to address the topic, but is only partially successful. There is often no clear statement of a main idea or point of view and there is confusion found in the writer's efforts in presenting supporting detail. Any organization that is present fails to present an effective sequence of ideas. There are many errors in mechanical conventions of usage, sentence structure, grammar, spelling, and punctuation.

Score 3: This writing sample is largely unsuccessful at communicating a main idea or point of view, and there is little evidence of an organizational structure. Ideas lack focus and development and there are many errors in mechanical conventions of usage, sentence structure, grammar, spelling, and punctuation.

Score 2: This writing sample shows little evidence of mastery of organization, development, focus, sentence structure, usage, and conventions.



References

American Council on Education (1995). Guidelines for computerized-adaptive test development and use in education. Washington, DC: Author.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Anastasi, A. (1988). Psychological testing (6th edition). New York: Macmillan.

Association of Test Publishers (2002, February). Guidelines for computer-based testing.

Brookdale Community College (1996, February). Basic skills placement pilot study summary report. Lincroft, NJ: Office of Institutional Research, Brookdale Community College.

Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44.

Cole, J. C., Muenz, T. A., & Bates, H. G. (1998). Age in correlations between ACCUPLACER's Reading Comprehension Subtest and GPA. Perceptual and Motor Skills, 86, 1251-1256.

College Board (1993). Accuplacer Computerized Placement Test technical data supplement. New York, NY: The College Board.

College Board (1999, May). Comparison of ACCUPLACER and the Academic Assessment and Placement Program prepared for the Tennessee Board of Regents. New York, NY: The College Board.

College Board (1999, November). Submission for approval of the ACCUPLACER system for placement use in California community colleges. New York, NY: The College Board.

College Board (2000, June). Analysis of recommended ACCUPLACER cut scores prepared for National Louis University. New York, NY: The College Board.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443-507). Washington, DC: American Council on Education.

Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, New Jersey: Lawrence Erlbaum.


Ebel, R. L. (1977). Comments on some problems of employment testing. Personnel Psychology, 30, 55-63.

Gulliksen, H. (1950b). Theory of mental tests. New York: Wiley.

Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2, 37-50.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

Holland, P. W. (1985). "On the study of differential item performance without IRT." Paper presented at the annual meeting of the Military Testing Association, San Diego, CA.

Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.

Jodoin, M. G. (1999). Reducing Type I error rates using an effect size measure with the logistic regression procedure for DIF detection. Unpublished master's thesis, University of Alberta, Edmonton, AB, Canada.

Jodoin, M. G., & Gierl, M. J. (2001). Evaluating power and Type I error rates using an effect size with the logistic regression procedure for DIF. Applied Measurement in Education, 14, 329-349.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527-535.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-100). Washington, DC: American Council on Education.

Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and abilities. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 335-366). Washington, DC: American Council on Education.

Napoli, A. (1998). A psychometric analysis of the Tri-Campus Mathematics Tests. Suffolk, NY: Office of Institutional Research and Assessment, Suffolk Community College.

Napoli, A., & Wortman, P. M. (1995). Validating college-level reading placement test standards. Journal of Applied Research in the Community College, 1(2), 143-151.

Patelis, T. (2000, April). An overview of computer-based testing. Research Notes (RN-09). New York, NY: The College Board, Office of Research and Development.



Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 367-388). Hillsdale, NJ: Erlbaum.

Rizavi, S., & Sireci, S. G. (1999). Comparing computerized and human scoring of students' essays. Laboratory of Psychometric and Evaluative Research Report No. 354. Amherst, MA: University of Massachusetts.

Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105-116.

Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19, 405-450.

Sireci, S. G. (1998a). Gathering and analyzing content validity data. Educational Assessment, 5, 299-321.

Sireci, S.G. (1998b). The construct of content validity. Social Indicators Research, 45, 83-117. Sireci, S. G. (2001, November). Analysis of differential item functioning across males and females on the Levels of English Proficiency Tests: Report prepared for the College Board. Northampton, MA: Sireci Psychometric Services. Sireci, S. G. (2002, January). Analysis of differential item functioning across selected ethnicity groups on the Levels of English Proficiency Tests. Northampton, MA: Sireci Psychometric Services. Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure testing: the sensitivity review process. CLEAR Exam Review, 5 (2) 22-28. Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370. Swanson, L & Stocking, M.L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17, 177-186. Vantage Learning (2000a). A study of expert scoring and IntelliMetricTM scoring accuracy for dimensional scoring of grade 11 student writing responses (Report No. RB-397), Yardley, PA: Author. Vantage Learning (2000b). IntelliMetricTM scoring for WritePlacer Plus: A study of two additional prompts (Report No. RB-724), Yardley, PA: Author. Wainer, H. (2000). Computerized-adaptive testing: A primer (2nd ed). Mahwah, NJ: Erlbaum.


Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of Defense.

Zumbo, B. D., & Thomas, D. R. (1996). A measure of DIF effect size using logistic regression procedures. Paper presented at the National Board of Medical Examiners, Philadelphia, PA.



Appendix A: National Reference Group Data


National Reference Group Data for ACCUPLACER Percentile Rank Scores

This Appendix provides reference data for the ACCUPLACER tests based on a national group of students. Since the introduction of the ACCUPLACER System in 1986, user institutions have been encouraged to return their test data. The returned data were used to create ACCUPLACER test normative data. In 1994 the ACCUPLACER item pools were refreshed. The refreshment process involved a review of the Reading Comprehension, Sentence Skills, Arithmetic, Elementary Algebra, and College-Level Mathematics test items. The removal of outdated items and the addition of new items to the pool, combined with the use of a new item selection algorithm, required the generation of new normative data. The Total Right Score scale remains the same 0 to 120 scale. New normative data were calculated and made available in December 1994. The norm tables are shown below. The characteristics of students in the norm group may be briefly summarized as follows:

· Minority students made up 38% of the national group: 5% Asian-American, 15% African American, 14% Hispanic, and 4% other minority groups.
· Female students comprised 56% of the national group.
· More than 37 percent of the students were 25 years of age or over.

Selected characteristics of the colleges that contributed reference data are presented in the table below. Most of the colleges are public comprehensive community colleges and four-year colleges, located in every region of the country.

Table A-1: Number of Colleges Represented in the Reference Data by Geographic Region and Type of College

Region       Public 2-Year College   Vocational Technical Institute   Public 4-Year College   Private 4-Year College   Total
Midwest              10                            -                            3                       2               15
Northeast            14                            2                            9                       3               28
South                16                            3                            3                       3               25
Southwest             1                            -                            1                       1                3
West                  8                            -                            5                       3               16
Total                49                            5                           21                      12               87


Appendix A Table A-2: ACCUPLACER

ACCUPLACER Scores 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 Reading Comprehension 99 99 99 98 98 97 96 95 95 94 93 92 91 90 89 88 87 85 84 83 81 80 78 77 76 74 73 71 69 68 66 65 63 61 59 58 56 55 53 52 50 49 47 46 44 43 41 39 38 36 35 34 99 99 98 96 95 94 92 90 89 87 86 84 83 82 80 78 77 76 74 73 71 70 68 67 66 64 63 61 60 58 56 55 53 52 50 49 47 46 45 43 42 41 39 38 37 36 35 34 32 31 30 29

Sentence Skills

Language Tests Percentile Scores

ACCUPLACER Scores 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 Mean SD N Reading Comprehension 32 31 30 28 27 26 25 24 23 22 21 20 19 18 17 17 16 15 15 14 13 13 12 12 11 11 10 10 9 9 8 8 7 6 6 5 5 4 3 2 1 <1 <1 <1 <1 <1 <1 <1 <1 77.10 23.40 5,000

Sentence Skills 28 27 26 25 24 23 23 22 21 20 19 18 18 17 16 15 15 14 13 13 12 12 11 10 9 9 8 7 6 6 5 4 4 3 2 2 1 1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 81.91 24.19 5,000


Appendix A Table A-3: ACCUPLACER

Arithmetic 99 99 99 98 98 97 96 96 95 94 94 93 93 92 91 90 90 89 89 88 87 86 86 85 85 84 83 82 82 81 80 80 79 78 77 75 75 74 73 72 71 70 69 68 68 67 66 65 63 62 61 Elementary Algebra >99 >99 99 99 98 98 98 97 97 96 96 96 95 95 94 94 94 93 93 93 92 92 91 91 90 90 89 89 88 88 87 87 87 86 86 85 85 84 84 83 83 82 82 81 80 80 79 79 78 77 76

Total Right Scores 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 71 70 69

College-Level Math >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 >99 99 99 99 99 99 99 99 99 98 98 98 98 98 98 97 97 97 97 97 97 96 96 96 96 96 95 95 95 95 95 94 94 94 93 93

Mathematics Tests Percentile Scores

Total Right Scores 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 Mean SD N Arithmetic 61 60 60 59 58 57 57 56 55 54 53 52 52 50 50 49 47 46 45 44 43 42 41 40 39 38 37 35 34 33 32 31 30 29 27 26 25 24 22 21 19 17 15 13 11 9 7 5 60.25 28.9 5,000 Elementary Algebra 76 76 76 75 75 74 74 73 72 71 71 70 69 68 68 67 66 65 65 64 63 63 61 61 60 59 58 57 56 54 53 52 50 49 47 45 43 41 38 36 34 31 27 23 19 16 11 7 48.83 27.38 5,000

College-Level Math 93 93 93 92 92 92 92 91 91 90 90 89 88 88 87 87 86 86 85 84 84 83 83 82 81 80 80 79 77 76 74 73 71 70 68 66 63 61 59 55 52 48 44 39 35 30 25 20 35.17 18.02 5,000


Appendix A Reference Data for LOEP Tests Reference data for performance on the LOEP test is presented in Table 9. The population used to develop the normative reference tables for the LOEP tests includes a sample of two- and four-year colleges (See pilot study participant list) that tested students in their ESL and developmental courses. Of the students in the sample, 90% reported that English was not their Best Language, while approximately 10% reported English as their Best Language but scored low on the Reading Skills test. Table A-4: Sample From Which LOEP Percentile Norms Were Derived Best Language English Spanish Asian* European (except Spanish) Other** 51% 16% 14% 6% 6% Percentage of Sample in Norms 9% 35% 30% 13% 13%

*Asian best language speakers are divided roughly evenly among Chinese, Japanese, Korean, Vietnamese, and others. **Includes small numbers of people identifying as their best language a Native American language, a Pacific Island language, Filipino, an African language, Urdu, Hindi, and Arabic.


Appendix A ACCUPLACER Levels of English Proficiency Tests Percentile Scores

Language Use

>99 >99 >99 99 99 99 98 97 97 96 94 93 91 90 88 87 85 84 82 81 79 78 76 75 73 71 69 67 66 65 63 61 60 58 56 55 53 51 48 47 46 44 42 41 40 39 38 36 36 35 33 32 31 30 29

Total Right Score

120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65

Reading Skills

99 99 99 99 98 97 96 95 94 93 92 90 89 88 86 85 83 82 80 78 76 75 73 72 70 69 68 66 65 64 62 61 59 58 56 54 53 51 48 47 46 45 43 42 41 40 38 37 36 35 33 32 31 30 29

Sentence Meaning

99 99 99 98 96 95 93 91 90 88 87 85 83 81 80 78 77 75 73 72 70 69 67 66 64 62 60 59 57 56 54 52 50 49 47 46 44 43 40 39 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23



Total Right Score

64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 N= Mean S.D.

Language Use

28 27 26 25 24 23 22 22 21 20 19 18 17 16 16 15 14 13 12 12 11 10 9 8 8 7 6 6 5 4 4 3 3 2 2 2 1 1 1 <1 <1 <1 3,000 77.85 23.39

Reading Skills

28 27 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 7 6 6 5 5 4 3 3 3 2 2 1 1 1 1 1 1 <1 <1 <1 <1 3,000 79.19 22.91

Sentence Meaning

22 22 21 20 19 18 17 16 15 14 14 13 13 12 11 11 10 10 9 9 8 7 7 6 6 5 5 4 4 4 3 3 3 2 2 1 1 1 1 <1 <1 1 3,000 83.08 23.71


Appendix A User Institutions Contributing Reference Data


Two-Year Institutions Anchorage Community College Bergen Community College Brevard Community College Bunker Hill Community College Cape Cod Community College Central Piedmont Community College Chipola Junior College College of the Desert College of Dupage Community College Allegheny County Cuyahoga College De Anza College Delaware Tech. Community College Dover Dundalk Community College Edison Community College El Camino College Evergreen Valley College Florida Community College at Jacksonville Glendale Community College Gore Albert Junior College Harrisburg Area Community College Horry-Georgetown Technical College Illinois Central College Parham Jefferson Community College John Tyler Community College Kirkwood Community College Lethbridge Community College Lewis and Clark Community College Linn-Benton Community College Los Angeles Harbor College Manchester Community College Mercer County Community College Miami-Dade Community College North Miami-Dade Community College South Mid-Michigan Community College Montgomery College Mount Wachusett Community College Mt. San Antonio College North Greenville College Oakton Community College Oklahoma City Community College Owensboro Community College Palm Beach Community College Prince George's Community College Raritan Valley Community College Richmond Community College Salt Lake Community College Santa Fe Community College Springfield Community College

Appendix B St. Louis Community College Suffolk County Community College SUNY Institute of Technology at Utica/Rome Tallahassee Community College Trident Tech. College North Victor Valley College Vincennes University Wright State University, Lake Campus Yuba College Four-Year Institutions Augustana College Brigham Young University California State University Northridge Catawba College Central Washington University Chicago City-Wide College George Mason University Glassboro State College Grand Rapids Baptist College Harding University Hunter College Mansfield University Pa. Metropolitan State College New York University Northeast Missouri State University Northwestern State University, Louisiana Peru State College Ramapo College of New Jersey Rider College Shorter College St. Bonaventure University University of Minnesota, Crookston University of Lowell University of New Mexico University of Virginia University Nevada Las Vegas US Merchant Marine Academy West Virginia Institute of Technology West Chester University Westfield State College



Appendix B: Test Development Committees


Appendix B Test Development Committee For Reading Comprehension, Sentence Skills, Arithmetic And Elementary Algebra The Academic Officers Association Basic Skills Committee (Community Colleges) James McGinty, Chairperson OA Basic Skills Committee Dean of Instruction Ocean County College College Drive, CN 2001 New Jersey Association For Developmental Education Dorothy Minkoff, President NJADE Coordinator of Basic Skills Trenton State College Pennington Road, CN 4700 Trenton, NJ 08650-4700 The New Jersey State College Basic Skills Advisor Council Dorothy Minkoff, Chairperson NJ State College Basic Skills Advisory Council Albert Porter, Chair Mercer County Community College Jose Adames Kean College of New Jersey Madan Capoor Middlesex County College Walter Cmielewski Mt. Olive High School Mary Cross Fairleigh Dickinson University Lewis Hirsch Rutgers University, New Brunswick Robert Kearney St. John Vianney High School Glenn Lang Department of Higher Education Robert Lynch New Jersey Institute of Technology Dorothy Minkoff Trenton State College Susan Mulligan Essex County College Raymond T. Smith Rutgers University, Newark Kurt Spellmeyer Rutgers University Manya Ungar National Congress of Parents & Teachers Nina Jemmott Ex Officio New Jersey Institute of Technology l


ASSESSMENT ADVISORY COMMITTEE
Walter Cmielewski, Chair, Mt. Olive High School
Philip Beardsley, Department of Higher Education Commission
Madan Capoor, Middlesex County College
Robert Cirasa, Kean College of New Jersey
Anthony J. Evangelisto, Trenton State College
Donald Generals, Passaic County Community College
Patricia Kalata, Burlington County College
Margaret Kilduff, New Jersey Institute of Technology
Ansley LaMar, Jersey City State College
Judy Marsailiano, Warren County Community College
James McGinty, Ocean County College
Margaret McFadden, Glassboro State College
Susan Mulligan, Essex County College
Gerald Sircus, Bergen County Community College
Jeffrey Slovak, Rutgers University
Sybil Smith, Montclair State College

READING AND WRITING ADVISORY COMMITTEE
Dorothy Minkoff, Chair, Trenton State College
Jose Adams, Kean College of New Jersey
Mary Ellen Byrne, Ocean County College
Dennis Donahue, New Jersey Institute of Technology
Delbert Earisman, Upsala College
Pamela Farrell, Red Bank Regional
Carol Gavin, Burlington County College
Terrence Healey, Cinnaminson Township Public Schools
Charlee Harris, North Plainfield High School
Robert Lynch, New Jersey Institute of Technology
RoseAnn Morgan, Middlesex County College
Marianne Reynolds, Mercer County Community College
Joy Stone, Montclair State College
Charles Thomas, Bergen Community College
Ron Topham, Brookdale Community College


MATHEMATICS ADVISORY COMMITTEE
Lewis Hirsh, Chair, Rutgers University
David E. Boliver, Trenton State College
Frank Cereto, Richard Stockton State College
Elizabeth Collins, Glassboro State College
Angel Eguaras, Jr., Atlantic Community College
Helen Kuruc, Essex County College
Virginia Licata, Camden County College
Paul Lawrence, Franklin Township Public Schools
Ruth D. O'Dell, County College of Morris
Freda Robbins, Jersey City State College
John Rollino, Upsala College
Robert Urbanski, Middlesex County College

Participants in LOEP Test Development Committee Meetings
Barbara Baxter, Writing Director, State Technical Institute at Memphis, Memphis, TN
Barbara Echord, Dean for Academic Affairs, Miami-Dade Community College, Miami, FL
Richard Delgado, Director of Assessment, Albuquerque Technical Vocational Institute, Albuquerque, NM
Laurie Ojeda, Foreign Language and ESL, Los Medanos College, Lancaster, TX
Ralph Radell, Chair, ESL Department, Bunker Hill Community College, Boston, MA
Iris Ramer, Professor, ESL Department, Middlesex County College, Edison, NJ
Barbara Ritchie, Professor, Communications, Humber College, Toronto, Ontario, Canada
Nilda Suarez-Gago, Professor, Colegio San Antonio, Rio Piedras, Puerto Rico
Agnes A. Yamada, Professor and Chair, Department of English, California State University, Dominguez Hills, Carson, CA
Rodney W. Young, Program Evaluator, Federal Programs Dept., Clark County School District, Las Vegas, NV


Appendix C: Institutions Participating In Validity Studies


Institutions Participating in the Validity Studies
Two-Year Colleges and Universities
Brevard Community College Bunker Hill Community College Cape Cod Community College Central Piedmont Community College College of DuPage College of the Desert Cuyahoga College De Anza College Dundalk Community College Edison Community College El Camino College Florida Community College at Jacksonville Glendale Community College Harrisburg Area Community College Horry-Georgetown Technical College Jefferson Community College John Tyler Community College Lethbridge Community College Lewis and Clark Community College Los Angeles Harbor College Manchester Community College Miami-Dade Community College Montgomery College Mount Wachusett Community College North Greenville College Oklahoma City Community College Owensboro Community College Palm Beach Community College Richmond Community College Salt Lake Community College Santa Fe Community College Springfield Community College SUNY Institute of Technology at Utica/Rome Tallahassee Community College Trident Technical College Victor Valley College Vincennes University Wright State University
Four-Year Colleges and Universities
Catawba College Central Washington University Grand Rapids Baptist College Harding University Northwestern State University Peru State College Rider College Shorter College, GA University of Lowell University of Minnesota, Crookston US Merchant Marine Academy Westfield State College


Appendix D: Institutions Participating In LOEP Pretest And Validity Studies

U.S. Participating Institutions
Aims Community College CO Albany State College GA Andrew College GA Antelope Valley College CA Arizona Western College AZ Art Institute of Fort Lauderdale FL Atlanta Metropolitan College GA Bassist College OR Bishop State Community College AL Brazosport College TX Brevard Community College FL Briarwood College CT Bristol Community College MA Brookhaven College TX Broward Community College FL Bunker Hill Community College MA Burlington County College NJ Caldwell Community College NC Camden County College NJ Carteret Community College NC Casco Bay College ME Cedar Valley College TX Central Florida Community College FL Central Piedmont Community College NC Central Washington University WA Central Wyoming College WY Centralia College WA Chattahoochee Technical College AL Chowan Community College Cocoa High School FL College of the Desert FL College of Lake County IL Colorado Technical College CO Community Colleges of Spokane WA Contra Costa College CA Corning Community College NY Cumberland County College NJ Dallas Christian College TX Davis & Elkins College WV Delgado Community College LA Divine Word College IA Dixie College UT Donnelly College KS Eastern High School MI Eastern Michigan University MI Eastfield High School FL El Centro College TX Essex County College NJ Farmington High School NM Fayetteville State University NC Frederick Community College MD George Walton High School GA Hagerstown Junior College MD Harford Community College MD Harrisburg Area Community College PA Hellenic College MA Henry Ford Community College MI Howard Community College MD International Bible College AL Iowa Lakes Community College IA Iowa Western Community College IA John Marshal High School MN Josh M. Lofton Senior High School NY Lawless High School LA Keystone Junior College PA Kilgore College TX Laboure College MA Lake-Sumter Community College FL Lansing Community College MI Lewis College of Business WI Los Medanos College CA Lynn University VT Manatee Community College FL Manor Junior College PA McHenry County College IL Medaille College NY Mendocino Community College CA Merced College CA Miami-Dade Community College FL Middlesex Community College MA Milwaukee Area Technical College WI Mississippi Delta Community College MS Mohegan Community College CT Monroe Community College NY Monterey Peninsula College CA Montgomery College MD Morton College IL Mount Aloysius College PA Mount San Jacinto College CA Mount Wachusett Community College MA Mountain Empire Community College VA Muscatine Community College IA Muskingum Area Technical College OH New River Community College VA North Carolina Central University NC North Lake College TX North Shore Community College MA Northeastern Oklahoma A&M College OK Northwest Mississippi Community College MS Oak Hills Bible College MN


Ocean County College NJ Paris Junior College TX Quinsigamond Community College MA Richland College TX Rio Hondo College CA Rochester Community College MN Sacramento City College MN Salt Lake Community College UT San Jose Christian College CA Santa Fe Community College FL Selma University AL Sheridan College NY Solano Community College CA Southeast Missouri University MO South Suburban College IL Southwestern Community College NC State Technical Institute, Memphis TN Strayer College DC Suomi College MI Texas State Technical College TX Trinity Valley Community College TX Tulsa Junior College OK Union County College NJ University of Alaska AK University of Arkansas at Pine Bluff AR University of Minnesota, Crookston MN University of Nevada, Las Vegas NV University of Rio Grande OH Valley Forge Military Academy PA Vennard College IA Victor Valley Community College CA Vincennes University IN Virginia Community College - Loudoun VA Wallace Community College AL Weatherford College TX Wesley College MS Westark Community College AR Western Iowa Technical College IA Youngstown State University OH
Puerto Rico Institutions
Colegio Universitario del Este Universidad Adventista University of Puerto Rico, Arecibo College University of Puerto Rico, Bayamon
Other U.S. Institutions
Community College of Micronesia Northern Marianas College
International Participating Institutions
Bermuda: Bermuda College
Canada: Champlain Regional College QUE Dawson College QUE Grant MacEwan Community College Humber College ONT Kane Senior Elementary School ONT Red Deer College AB Selkirk College BC Seneca College of Applied Arts and Technology ONT York University ONT

Appendix E: Surveys of Cut-Scores


Surveys of Cut-Scores
During the process of setting cut-scores, it is often helpful to know what scores other institutions are using to place students into college-level and developmental courses. The information in these surveys was gathered from two- and four-year colleges that use ACCUPLACER. Each line represents the scores used at a single institution, and the column headings show the courses/levels. The last two pages show retest policies in use at various institutions.
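For readers who want to apply a cut-score table programmatically, the following minimal Python sketch shows one way a single institution's ranges might be encoded and used. The course names and score ranges below are purely hypothetical and are not taken from any surveyed institution.

# Hypothetical cut-score table for one institution: (low, high, course) for one test.
CUT_SCORES = {
    "Sentence Skills": [
        (86, 120, "College Level English"),
        (71, 85, "Developmental English I"),
        (40, 70, "Developmental English II"),
        (0, 39, "Adult Basic Education"),
    ],
}

def place(test_name, score):
    """Return the course whose cut-score range contains the score."""
    for low, high, course in CUT_SCORES[test_name]:
        if low <= score <= high:
            return course
    raise ValueError(f"No placement rule covers {test_name} score {score}")

print(place("Sentence Skills", 78))  # prints: Developmental English I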


Appendix E Survey of ACCUPLACER Sentence Skills, Reading Comprehension and WritePlacer Plus Scores Used For Placement into English Classes College Level English SS RC WPP Score 81 - 120 80-120 90-120 83-120 98-120 80-120 82-110 88-120 101-120 95-120 71-94 95-104 71-120 92-101 82-120 58-120 69-120 69-120 26-68 7-8 7-8 Developmental English I SS RC WPP Score 60 - 80 0-79 35-57 79-82 20-97 0-92.9 48-87 101-120 57-100 28-55 56-70 56-70 71-94 71-74 95-104 60-82 71-82 74-86 0-97 98-105 98-105 0-67 Developmental English II SS RC WPP SS 40 - 59 58-89 Placement based on essay; 0-34 placement ABE Other Course Name

RC

20-78

111-120 102-120 Honors English 42-81 10-57 54-120 26-53 54-120 26-68 69-120 26-68 4-6 4-6 4-6 4-6 0-59 51-70 52-74 0-84 85-120 20-50 0-52 Dev Engl III Adult Basic Language 10-56 28-55 28-55 56-70 26-53 54-120 26-53 1-3 1-3

83-120 83-120 86-120 106-120 98-105 68-120

80-120 85-120 8-12

Skills

0-6


Appendix E College Level English SS RC WP Score 100-120 80-120 65-120 97-120 84-120 77-83 79-120 86-120 71-120 86-120 100-120 96-120 87-105 86-120 80-120 85-120 86-120 85-120 81-85 75-120 93-120* 67-120 9-12 Developmental English I SS RC WP Score 71-99 79-70 55-64 71-82 33-76 77-83 0-78 75-85.9 51-70 0-85 84-99 74-95 30-53 50-85 73-77 0-84 57-80 71-84 52-66 6-8 Developmental English II SS RC SS 54-70 20-69 36-54 51-70 36-51 4-5 0-53 0-35 26-50 0-32 0-35 Other Course Name Individualized Center ABE/ESL Basic English ABE Learning

RC

68-120

0-67

80-120 70-120 80-120 84-120 90-120 78 - 92 80-120 5-12 8-12 86-120 In House Essay

57-69 0-79 70-120 70-89

0-65.9 0-50 66-84 50-73 54-86 0-49 60-72

0-56 65-120 50-69

51-120 0-65 0-49 0-49 105-118 1-5 46-59 1-5

Communications Skills Eng 030 Honors English

1-5 7 0-6 66-85

0-56 58-70

54-65

0-58

0-54

93-120*

81-92*

81-92*

66-80*

66-80*

0-65*

0-65*

Basic English (2 courses)

*Placement is based on the weighted composite ((2*RC) + SS) / 3, i.e., approximately (0.67*RC) + (0.33*SS).
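As an illustration of how such a weighted composite could be computed, here is a minimal Python sketch. The pairing of composite ranges with specific courses follows the starred rows above only loosely and should be treated as hypothetical.

def english_composite(rc_score, ss_score):
    """Weighted composite in which Reading Comprehension (RC) counts twice as much as Sentence Skills (SS)."""
    return (2 * rc_score + ss_score) / 3

def place_by_composite(composite):
    """Map a composite score to a course using illustrative cut ranges."""
    if composite >= 93:
        return "College Level English"
    elif composite >= 81:
        return "Developmental English I"
    elif composite >= 66:
        return "Developmental English II"
    return "Basic English"

# Example: RC = 88, SS = 75 gives composite (176 + 75) / 3 = 83.67, which places into Developmental English I.
print(place_by_composite(english_composite(88, 75)))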


Appendix E College Level English SS RC WP Score 80-120 70-120 95-120 9-12 80-120 96-120 88-120 86 - 120 80 -120 Developmental English I SS RC WP Score 61-94

65 + 68-120 6-8

Developmental English II SS RC WP SS Score 35-60 0-65 59-74 0-73 52-74 0-34

RC

Other Course Name Basic Language Skills

66-79 75-95 87-74 74-86

52-58 0-51 0 - 51 0-4

Dev Engl III Dev Engl IV Advise Adult Basic Ed through Continuing Ed

98 - 120 98+ 85+

9-12 8-12 9-12 20 - 69 8 - 12 70-97 No essay required 8+ 6 - 8* 81 - 120

5-8 2-8 2-7 1-7 5*

0-80

5*

WPP 4 - 2*

*This is based on the Texas WritePlacer 8-point scale



Survey of ACCUPLACER Sentence Skills, Reading Comprehension

Scores Used For Placement into Reading Classes Developmental Reading II RC SS 55-70 55-75 20-69 20-47 Developmental Reading III RC SS 0-54 28-54 No Reading Required RC 81-120 78-120 90-120 83-120 79-120 71-120 78-120 82-120 82-120 69-120 86-120 83-120 80-120 79-120 80-120 80-120 67-120 83-120 88-120 79-120 75-120 80-120 84-120 SS Developmental Reading I (Highest) RC SS 71-80 0-77 76-89 20-82 70-78 48-70 0-77 0-81.9 58-81 54-68 61-76 57-60 71-82 70-80 41-78 60-69 70-89 66-79 52-66 82-71 68-87 0-78 65-74 0-79 84-66 60-120 Other RC 0-27 SS Course ABE

39-57 26-53 0-56 57-60 51-70 51-70 50-59 20-65 36-51 51-70 43-67 0-64 0-59

10-38 77-85 20-50 0-51 0-40 0-49 College Reading & Study Skills (recommended)

86-120

26-50 28-42

0-27 64-74

ABE Decision Zone (Rapid Reading)

86-120

0-85 0-65


Appendix E No Reading Required RC 80-120 78-120 78-120 75-120 72-120 85-120 85-120 82-120 80-120 96-120 81-120 68-120 81-120 93-120 87-120 SS Developmental Reading I (Highest) RC SS 65-79 70-77 56-77 0-71 84-46 68-84 71-81 50-79 75-95 50-80 67-46 70-80 78-92 87-120 65-70 0-49 59-74 30-49 0-45 51-70 51-77 44-55 48-58 <30 0-43 0-47 Developmental Reading II RC SS 50-64 58-69 46-55 Developmental Reading III RC SS 0-49 43-57 0-45 Other RC 20-40 SS Course College Option

Associate Degree Basic Reading (2 courses) Retest with TABE

0-50 28-50

Advise ABE through Continuing Ed


Survey of ACCUPLACER Mathematics Tests Scores Used For Placement into Developmental Mathematics Classes

Elementary Algebra AR EA CLM 75-120 35-53 87-120 51-105 20-70 50-66 0-63 38-52 73-120 0-59 69-79 80-120 65-120 30-57 81-120 0-79 51-75 0-75 0-71 0-57 0-34 35-55 32-52 0-49 25-75 69-120 69-120 79-89 90-120 72-89 90-120 57-76 56-120 40-66 Intermediate Algebra AR EA CLM 54-89 53-72 106-120 0-40 72-94 71-120 64-81 53-72 0-60 67-120 0-45 With RC 54-83 Immersion Math AR EA

AR 0-34 0-27 0-36 0-40

Basic Math

EA

CLM

AR 35-74 28-86

Pre Algebra EA CLM 0-52 26-50 37-71 20-49

Other CLM

Course

0-25

0-52 0-36 0-72 0-59 0-51 51-79 0-29 30-57 0-80

0-37

53-120 75-120

75-120

0-50 0-71

0-45 0-39 0-30 36-44 56-70 Adult Basic Studies Math 71-120 Integrated Algebra & Geometry I & II

30-64 57-120

20-71 0-57

57-120

0-49 57-89 0-112 0-51 19-29 0-57 0-59

50-120

0-31 90-111 113-120 52-120 48-120 57-120 75-120

53-79 50-81 112-120 76-107 77-120 52-81 72-80 70-120 78-120

40-62

0-57

0-24

Math Lab Admin AR test

0-51 19-47 57-120 60-74

20-71 0-38

0-51 20-71 38-70

57-120

78-120 Math for Teachers



0-65 AR 0-73 20-68 40-64 31-54 31-54 51-90 33-75 0-40 0-44 42-53 0-49 31-60 Basic Math EA CLM 0-47 0-65 AR Pre Algebra EA CLM 66-120 40-79 Elementary Algebra AR EA CLM 74-120 48-67 94-120 20-66 65-120 0-41/50 55-120 44-68 91-120 0-40 75-99 41-120 54-73 61-120 0-48 0-34 45-64 73-120 50-120 100-120 40-90 41-120 65-89 48-64 0-50 35-59 50-120 65-120 50-81 60-84 0-41 75-120 0-30 Number Sense Operations with Whole numbers, Decimals & Percents Depends on HS Algebra background Retest with TABE 80-120 Intermediate Algebra AR EA CLM 68-120

AR 39-93

EA

Other CLM

Course Pre-Tech Mathreview of math fundamentals

65-120

0-43 44-68

55-120 55-84

0-43 44-68

51-75 69-120 63-120 90-120 40-65 41-63 0-50 0-33

63-120

BASK 0306 (lowest level) See advisor

0-59 40-75 76-120 0-63 <65 1-51 0-65.9 0-27 0-24 28-35 0-37 0-31

60-120 76-120 64-120 0-44 52-120 66-120 36-116

0-71 38-78 32-63 65 - 74 0-51 25-60 48-85

60-120 76-120

0-71 79-120 64-73 45-120 75 - 84 52-82 61-103 86-118 0-34 0-42 74-120 0-39

25-43 40-62


Appendix E Survey of ACCUPLACER Mathematics Tests Scores Used For Placement into College Level Mathematics Classes AR Statistics EA CLM 43-69 106-120 71-120 41-78 College Level Math AR EA CLM 43-69 73-120 106-120 41-78 95-120 71-120 41-120 67-120 82-120 Pre-Calculus AR EA CLM 106-120 79-100 95-120 61-120 71-120 Trigonometry AR EA CLM 70-90 AR Calculus EA CLM 91-120 106120 101-120 71-120

72-120 53-85 0-81 53-85 0-81 46-58 40-65

72-120 86-120 92-120 95-120

60-120 80-120

60-120 80-120 69-120 90-120 90-120 80-120

60-120 80-120 69-120 90-120

82-91 59-85

69-120 90-120

60-120 80-120 86-120 69-120 90-120

90-120 80-120 56-120

40-62 0-45 45-75

82-120 63-85

0-45 45-65.0 56-120 45-75 80-104 0-69 82-120 108-120 63-92 81-89 0-33 41-120 0-51 90-120 34-71 63-85 74-120 43-120 44-69.9

90-120 63-120 w/ dept approval 56-120 80-104 108-120 65-80 76-95 0-69 63-92 72-90 52-101 86-120 74-120

90-120 63-120 w/ dept approval 56-120 105-120 69-120 108-120 93-120 91-120 102-120 103-120 43-120 80-120 96-120

43-120 44-69.9 70-120

74-120


Appendix E 63-102 AR Statistics EA CLM 67-120 76-120 69-120 College Level Math AR EA CLM 67-120 65-120 76-120 69-120 51-120 66-120 50-120 25-120 64-120 90-120 62-75 82-120 60-120 72-120 85-100 60-120 72-120 76-120 79-120 101-120 57-120 70-120 90-120 35-79 0-51 76-88 0-49 60-120 82-120 76-120 79-120 80-120 88-103 50-74 60-120 82-120 104-120 63-102 Pre-Calculus AR EA CLM 74-120 69-120 78-120 Trigonometry AR EA CLM 103-120 AR Calculus EA CLM 69-120

51-85 51-120 86-120 66-120

66-120

66-120

66-120

57-120 70-120 90-120

90-120

90-120


Survey of ACCUPLACER LOEP Tests Reading Skills, Language Use, and Sentence Meaning Scores

Level 1 (highest) RS LU SM Average of three tests 84-120 Average of three tests 85-102 86-105 with essay 116-120 116-120 116-120 Writing Sample required prior to exemption 100-120 91-120 100-120 composite 311-360 Average of RS and LU 94 + Level 2 RS LU SM Average of three tests 72 - 83 Average of three tests 66-84 66-85 with essay 101-115 101-105 101-115 55-99 50-90 60-99 composite 250-310 Average of RS and LU Level 3 LU

Used For Placement into English-As-A-Second Language Classes Level 4 LU Other SM RS LU SM Course

RS

SM

RS

Average of three tests 45-65 46-65 with essay 81-100 81-100 81-100

31-45 with essay 55-80

Average of 103-120 branch to Sentence Skills test 0-30 with Non essay Credit 55-80 55-80 0-54 0-54 0-54

89-97

81-93

71-95

78-88

composite 0-249 Average of RS and LU

68-80

46-70

45-77

< = 44 refer to ABE 55-67

25-45 Avg 44 - 72 25-40

Average of RS and LU 105 + Take Reading Comprehension Avg 36 - 43 0-24 0-24 See Advisor

Avg 103 - 120 96-120 91-120

Avg 93 - 102 66-90

Avg 73 - 92 41-65
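Several of the entries above place ESL students by averaging the Reading Skills (RS), Language Use (LU), and Sentence Meaning (SM) scores, or by summing them into a composite. The following minimal Python sketch makes the averaging rule concrete; the level boundaries shown are hypothetical and are not taken from any surveyed institution.

def loep_level(rs, lu, sm):
    """Place an ESL student by the average of the three LOEP test scores (hypothetical boundaries)."""
    avg = (rs + lu + sm) / 3.0
    if avg >= 103:
        return "Branch to Reading Comprehension test"
    elif avg >= 93:
        return "ESL Level 1 (highest)"
    elif avg >= 73:
        return "ESL Level 2"
    elif avg >= 44:
        return "ESL Level 3"
    return "Refer to ABE / see advisor"

# Example: scores of 98, 90, and 95 average 94.3, which falls in the illustrative Level 1 range.
print(loep_level(98, 90, 95))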



Retest Policies from the Survey of Cut-Scores

FTE 3,500 1,548 520 Head 8,000 4,848 1,300 3,800 2,010 2,392 4,172 3,300 Type Public/ Private 2 Public 2 Public 2 Public 2 Public 2 Public 2 2 Public Public Fee $15 None None None $15 None Retake Policy

May retest once within 90 days. Must wait 24 hours.
After 3 years or with permission of the Director of Counseling.
Students may retest twice within a calendar year on all or part of the placement tests.
With approval.
Students may retest if the initial score is within 10 points of the cut score; otherwise they need to take developmental courses.
Students can retest only one time without taking the developmental class. In the developmental course the final exam is the ACCUPLACER placement test again.
Students must meet with the Testing Center Coordinator and receive some study help; they can retake after 1 week.
Without a fee, after each semester.
We use both the ACCUPLACER and the ASSET tests. Students who test once and are not happy with their scores can retest once using the other instrument. If after that they are still not happy, they have to go through an "Acknowledgement of Academic Liability" process with the testing director.
After 5 years or more without taking classes, and/or with permission of the Dean of Student Services.
We do not retest.
Students (not testing for ATB purposes) may pay the test fee and retest the next day. Even though it is not recommended that students attempt the test more than twice, there is no limit to how many times they may attempt it. ATB students who score within two points of the minimum passing scores on their first attempt may petition through the Intervention Program staff to retest again (no waiting period). All other ATB students must wait a minimum of three months and complete a minimum of 30 hours of documented academic intervention before they will be allowed to retest for ATB purposes. The cycle of three months and 30 hours of intervention may be repeated until the student has had a maximum of 4 ATB test attempts within a year. After the fourth attempt, a student must wait one year before taking the tests again.
EA 72-81 may retest; EA 40-49 may retest.
Advising office determines need for retesting on an individual basis.
Once.
May retest once prior to enrolling in and attending classes. We encourage people to do more practice.
One retest in a two-year period.
One re-test per student per semester.
Retest once prior to enrollment; after classes start, with department chair referral; after two years.
With advisor's request, or if the math score is over 3 years old.

1,800

18,850

4,800 8,000

25,617

2 2

2

Public Public

Public

None None

$5

19,000 5,500 7,953 29,583 1,406

32,000

19,103 7,780 69,557 2,362

4 2 2 2 2 2

Public Public Public Public Public Public Public Public

None None None None None None


FTE

3,921 5,620 3,048 2,500 4,000 3,593 12,000 3,441 2,800 870 4,654 7,953 7,177 7,058

Head

9,110 9,000 5,203 3,500 6,000 4,868 23,000 6,539 3,500 6,000 10,214 19,103 8,712 9,758

Type Public/ Private

2 2 2 4 2 4 4 2 2 2 4 2 2 4 4 Public Public Public Public Public Public Public Public Public Public Public Public Public Public Public

Fee

None None None None None Eng. $4, Math $5

Retake Policy

Once a term, no more than a total of three tests in twelve months.
Once with ACCUPLACER, once with timed paper-and-pencil tests.
No re-tests, but we use ACCUPLACER as a post test for Algebra and English.
Unlimited.
Students may re-test in writing and math, but not reading.
Once before classes start.
English: 1 retest, then must wait 1 year. Math: 1 retest per semester.
Reading/Math: one time with permission. A writing sample is required for retest in English. Cannot be done on the same day as the original test.
Once after one week for the term of application, or after evidence of tutoring/study.
No retests.
Students are allowed to retake the test after 3 weeks.
Unlimited.
No test.
University policy is not to re-test.

$10 None

