Construct Comparison Between the Language Proficiency Interview (LPI) and the PhonePass™ Test



During July and August 1998, Ordinate Corporation conducted a correlational study of PhonePass test scores in relation to TOEIC and LPI test scores. The correlational study measured the degree of linear relationship between the sets of score pairs observed for the LPI and PhonePass candidates, and determined the proportion of variance in the LPI scores that could be accounted for by the PhonePass scores. These measures need to be understood in the context of the design of the two tests and the intended meaning (and use) of their respective scores. This memorandum reviews what a construct is, analyzes the intended construct of the LPI and PhonePass tests, and then compares the intended constructs measured by the two tests. The comparison should yield insight into the expected range and the limits of correlation values observed for the two tests.

Psychological tests (including language tests) are measures of non-physical human attributes. The construct of a test is a hypothesis about the attribute (or trait) that the test is designed to measure. Psychological constructs can represent many different domains of human character and ability. For example, various psychological tests have constructs such as "mechanical aptitude" or "knowledge of chemistry" or "language proficiency". Psychological tests can also measure different specific aspects of a single domain. For example, within the domain of language proficiency, various tests focus on specific constructs such as "vocabulary" or "grammatical knowledge" or "academic writing skills" or "basic reading skills in children" or "listening comprehension" or on combinations of these more specific constructs. The "construct" of a test is important because it indicates what the test scores should mean; that is, what inference(s) can be drawn from the test scores. The validation of a test is the compilation of evidence that the test scores do, in fact, reflect the intended construct and not construct-irrelevant characteristics of the candidates or of the test itself.

Test constructs can be defined in two ways: operational and theoretical. An operational definition specifies the procedures (operations) that will elicit behavior, and the specific observations that will define the presence or absence of the attribute. Theoretically, a construct should also fit within a more general framework of related psychological concepts and measures as generally understood in the relevant scientific discipline. For example, for a test of "mechanical ability" to be valid, there should be evidence that mechanical ability is an individual trait that is different from other abilities and skills, but fits into an accepted general framework of cognitive abilities. The construct is further supported if there are other accepted measures of this construct and if the other measures correlate reasonably with this measure and with each other. Measures of one construct should not, in principle, correlate as closely with measures of another construct as they do with measures of the same or closely related constructs. Measures of a specific construct should not be completely subsumable under some more general measure, and the correlation between presumably related measures should not be an artifact of the testing method or the population sample.
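The two quantities named in the opening paragraph, the degree of linear relationship and the proportion of variance accounted for, can be illustrated with a short sketch. It assumes the Pearson product-moment coefficient is the correlation in question, which the memorandum does not state explicitly. The function names and the score pairs are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of variance in one score accounted for by the other (r squared)."""
    return r * r


# Hypothetical score pairs, invented for illustration only.
phonepass_scores = [3.1, 4.0, 4.8, 5.5, 6.2, 7.0]
lpi_scores = [1.0, 1.5, 2.0, 2.0, 3.0, 3.5]

r = pearson_r(phonepass_scores, lpi_scores)
print(f"r = {r:.2f}, proportion of variance = {variance_explained(r):.2f}")
```

Because the proportion of variance is the square of the correlation, a correlation that looks high can still leave a large share of the variance unexplained, which is why the construct comparison matters when interpreting the study.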

LPI Construct: Functional Trisection of Oral Proficiency

The LPI measures a candidate's overall speaking ability as demonstrated during a structured conversation with a trained interviewer. The LPI elicits a sample of the candidate's performance in spoken English during a continuous dialogue that is designed to put the candidate at ease. The interviewer collaborates with the candidate to find the candidate's maximum level of speaking performance.

The LPI performance is scored according to a Functional Trisection of Oral Proficiency Levels. The trisection is defined in a 6x3 grid of performance descriptors. The trisection grid is published on page 7 of the TOEIC newsletter No. 58 of May, 1997. Further description of eleven integrated summary levels is given on page 3 of that newsletter. The three elements of proficiency are:

Functions: what tasks can be done with the language.
Context: what topics can be addressed in the language.
Accuracy: how well are the language structures used.

Each of these three elements has six levels of proficiency associated with it, ranging from the 0-level (no tasks, no topics, unintelligible) up to the 5-level which represents function and performance at the level of an educated native speaker on all topics. Trained human raters assign portions of a candidate's performance to one of the defined levels on each of the three elements.

The LPI may be considered a direct test that serves as an operational definition of the "oral proficiency" construct, within the limits of the test-retest reliability of the procedure. The test-retest reliability of the LPI will be imperfect, in principle, because candidate performance may vary from one interview to another, the interviewer's collaboration may vary, or raters may vary somewhat in assigning a candidate's speaking performance to the cells of the 6x3 grid. In summary, the construct of the LPI is operationally defined by the LPI procedure and the proper use of the 6x3 grid of performance level descriptors.
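Imperfect reliability also limits how strongly LPI scores can correlate with any other measure. The sketch below applies the classical correction for attenuation, which the memorandum itself does not invoke. The function names are hypothetical. The example values are the reliability and correlation figures quoted later in this memorandum for the PhonePass test and the ILR interview, and treating the ILR inter-rater reliability as a stand-in for the reliability of an interview score is an assumption made only for illustration.

```python
from math import sqrt


def max_observable_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation between two measures,
    given their reliabilities (classical test theory)."""
    for rel in (rel_x, rel_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in the interval (0, 1]")
    return sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores, correcting the observed
    correlation for the unreliability of both measures."""
    return r_xy / max_observable_correlation(rel_x, rel_y)


# Illustrative values only (assumption: figures quoted later in the memo).
phonepass_reliability = 0.94   # PhonePass score reliability
interview_reliability = 0.76   # ILR inter-rater reliability, used as a stand-in
observed_r = 0.74              # ILR speaking vs. PhonePass correlation

ceiling = max_observable_correlation(phonepass_reliability, interview_reliability)
corrected = disattenuated_correlation(observed_r, phonepass_reliability,
                                      interview_reliability)
print(f"ceiling = {ceiling:.3f}, disattenuated r = {corrected:.3f}")
```

Under these assumptions, an observed correlation should be read against the ceiling set by the two reliabilities rather than against 1.0.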

PhonePass Construct: Facility with Conversational English

The PhonePass test measures a candidate's facility in grasping utterances in real time and producing relevant, intelligible responses at a conversational pace. The PhonePass test presents a series of discrete recorded prompts to the candidate, each of which elicits a spoken response from the candidate. The candidate has the entire structure of the test available in a written form, including example prompts and examples of correct responses. The PhonePass computer system presents items as prompts that are spoken at a native conversational pace. The prompts use conversational vocabulary in discrete tasks that range in linguistic difficulty from basic through advanced. The PhonePass computer system measures candidate performance based on the exact words used in the spoken responses, as well as the pace, fluency, and pronunciation of those words in phrases and sentences.

The overall score is calculated as a weighted average of the sub-scores. The sub-scores are defined as follows:

Listening Vocabulary is familiarity with everyday English words as used in spoken phrases and sentences.
Repeat Accuracy is the ability to repeat utterances verbatim.
Reciting represents facility in reading and repeating sentences aloud, including pronunciation.
Fluency measures rhythm, phrasing, and pausing in sentence production.

All scores are reported in the range from 2 through 8 on a logistic scale. The Listening Vocabulary and Repeat Accuracy sub-scores have been scaled such that the median non-native score is 5.0, and the native score at the 25th percentile is 7.5. Scores for Reciting and Fluency are optimally mapped to criterion human judgements that are described in the score reports.

The PhonePass test is a direct test of facility with conversational English materials. Performance on the PhonePass tasks provides evidence of ability to participate in English conversation to the extent that the PhonePass tasks require many of the same skills that are used in natural conversation. Following the theoretical framework of Levelt (1989), the corresponding skills include:

Listening/understanding elements      Sampled in PhonePass test items
1. hear utterance                     all items
2. recognize lexical forms            repeats, opposites, questions
3. extract linguistic structure       long repeats, questions
4. decode propositions                questions, opposites
5. contextualize in discourse         ?
6. infer demand                       all items

Speaking elements                     Sampled in PhonePass test items
a. conceive message                   all items
b. select register, plan discourse    ?
c. build phrase structure             long repeats, questions
d. select lexical items               opposites, questions
e. encode response                    repeats, opposites, questions
f. articulate response                all items

The reliability of the PhonePass scores (? 0.94) seems adequate for many purposes and is discussed in the memorandum "Validity of the PhonePass Test in Employee Selection" (Burns, 1998). The predictive validity of the PhonePass Test as a measure of "oral proficiency" has been studied with a population of 51 technical visitors to a U.S. Government training program in Texas. Candidates took ILR oral proficiency interviews (conducted by government testers) and took PhonePass tests. The correlation between the ILR speaking scores and the PhonePass scores was 0.74. Compared with the ILR's inter-rater reliability of about 0.76, a correlation of 0.74 is evidence that the PhonePass Overall scores' predictive ability with respect to oral proficiency scores is comparable to that obtained from individual expert human raters. In summary, the PhonePass test is not a direct test of conversational proficiency, but it shares some key construct elements with such a test and may account for much of the true variance measured by oral proficiency tests.

Common and Divergent Elements

The discussion above suggests that some elements of the target constructs of the LPI and the PhonePass tests are closely related, but different. The LPI is a direct test that provides an operational definition of speaking proficiency, whereas the PhonePass test is a more indirect test from which practical inferences about conversational proficiency may be made.

The most important difference is that the LPI offers candidates an opportunity to perform high-level functions with the language. LPI candidates are given an opportunity to show their skills in complex tasks, while discussing abstract, technical, or professional topics in a range of challenging hypothetical social settings. The interview can provide dialogue settings for the exercise of language functions including detail description, problem resolution, persuasive explanation, advice, and even negotiation. Among the LPI functions, the PhonePass test only exercises certain discrete elements of the LPI level 1 and 2 tasks using single prompt-response pairs. Note also that the LPI is a test of speaking only; the 6x3 grid does not cover receptive competence. Some of the eleven integrated summary levels imply an interactive competence, but listening comprehension, as such, is not directly measured or reported. This seems appropriate in a context where the LPI is administered in conjunction with the TOEIC test, which does have a strong listening component. The PhonePass test measures both listening and speaking skills, emphasizing the candidate's facility (ease, fluency, latency) in responding to unpredictable material. The LPI scoring does not focus on the interviewer's accommodations to the candidate or on the candidate's quickness in accommodating changes in conversational content or aspect.

The PhonePass test focuses on core linguistic structures and basic psycholinguistic abilities, while the LPI construct extends to include some social, cognitive and world knowledge. Some of these differences may best be seen in figures, although the figures must be taken as merely suggestive because the relations between the tests are quite complex. Each test is presented as a grid of skill elements against skill levels. The shaded portion of the grid is approximately that portion of the grid that the other test seems to cover.

[Figure: two grids of skill elements against skill levels. Left grid, "LPI Construct": columns Functions, Context, Accuracy; levels 5 down to 1; shaded region marked "Covered by PhonePass". Right grid, "PhonePass Construct": columns Listening, Speaking; levels from High Facility down to Low Facility; shaded region marked "Covered by LPI".]

The figure above is intended to suggest that PhonePass scores relate most closely to the accuracy section of the LPI scoring. Further, the left figure suggests that PhonePass scores only cover the more basic levels of the function and context sections of the LPI scoring. The right-hand figure suggests that the LPI scores cover most of the speaking facility measures of the PhonePass scoring, but probably do not cover the range of listening facility that PhonePass measures.

The LPI test measures a wider and deeper set of cognitive, social, and speaking skills than does the PhonePass test. LPI scores may be more appropriate in predicting how well a candidate will be able to explain complex concepts in a dialogue or when presenting a lecture.

The PhonePass test measures a more basic set of linguistic skills, with emphasis on ease and immediacy of comprehension and production. PhonePass scores may be more appropriate in predicting how fully a candidate will be able to participate in a free-form discussion among high-proficiency speakers.

Burns, W. (1998): "Validity of the PhonePass Test in Employee Selection". Los Altos, California, William C. Burns & Associates.

Levelt, W. J. M. (1989): Speaking: From Intention to Articulation. Cambridge, Massachusetts, MIT Press.

