Tuesday, January 14, 2014

Measuring Bilingualism

As you can probably infer from the discussion of the complex nature of defining bilingualism, the measurement of bilingual language skill is far from simple. We have seen that the range of what can constitute a definition of bilingualism falls anywhere along a continuum from full competence in one language plus limited knowledge of a second, to apparent competence in all skills in two or more languages. In addition to this, as we saw, we need also to be aware of the importance of identifying the range of variables which impact on bilingual language use, such as the age of acquisition, the various domains of language use (which may be different for both languages), and the socio-cultural contexts in which the languages are used.

Whom are we testing?

Because of the very varied nature of bilingualism and the myriad of different ways that people acquire some degree of bilingualism, it is helpful to delineate the circumstances under which we assess the language of people who are bilingual. Very roughly, we consider classifying bilingual speakers into three different groups:

  • Developing bilinguals
  • Stable bilinguals
  • Attriting bilinguals

Stable bilinguals are generally people for whom bilingualism is a way of life, and for whom the daily use of two languages is the norm rather than the exception. They may be either elective bilinguals or circumstantial bilinguals, although they are probably more likely to come from the latter group. They may use both languages in different contexts.

Attriting bilinguals consist of those individuals who are, for some reason, suffering from some aspect of language attrition. This may result from lack of contact and lack of use of one of their languages (and this is often the case with elective bilinguals), or it may be the result of pathological factors either associated with age or as a consequence of some kind of accident.

This discussion will focus on developing bilinguals.

Valdes and Figueroa (1994) divide developing bilinguals into elective and circumstantial bilinguals.

Table 1: Characteristics and examples of elective and circumstantial bilinguals

Elective Bilinguals

  • Characteristic of individuals
  • Choose to learn another language
  • Communicative opportunities usually sought artificially (e.g. in the classroom)
  • First language will usually remain the dominant language

Circumstantial Bilinguals

  • Characteristic of groups
  • Second language required to meet needs of new circumstances
  • Communicative needs may relate to survival, or success; communicative needs will vary across individuals
  • Two languages will play a complementary role and the stronger language may vary depending on the domain

Examples of Elective Bilinguals

  • A child raised with a French-speaking mother and Italian-speaking father in an English-speaking environment
  • A Japanese student who has learned English in order to study for a Master of Arts degree in Australia
  • An American man who learns Russian because he has married a Russian woman and moved to Russia
  • A diplomat who learns Mandarin Chinese for her job

Examples of Circumstantial Bilinguals

  • Children raised in families where two languages are spoken both inside and outside the home
  • Immigrant groups who have moved to a country where another language is spoken
  • Indigenous groups living in countries which have been colonized
  • Groups whose first language is different from the prestige language of the surrounding community. (A prestige language is one which has higher status in the community than other languages spoken, usually because it is the language of education and government.)


Assessing Bilingual Proficiency

The assessment of bilingual language proficiency is difficult in part because we are immediately confronted with the question of “what is a bilingual,” or, as Bialystok (2001a:10) puts it, “When is enough enough?”

Accepting the standard assumption that no bilingual is ever equally competent in both languages, how much language is needed before we agree that a person is bilingual?

The answer depends on how we define language proficiency. We talk about language as though it had concrete existence and could be measured by scientific instruments.

What constitutes language proficiency is a difficult question. When we want to measure the language of bilinguals, we are often (though not always) concerned not only with measuring one language but with measuring both languages, and with measuring the interrelationship between these two languages.

We will argue that for elective bilinguals we almost never assess both languages. Rather, we assume the speaker’s proficiency in their first language, and assess only their proficiency in their second. For circumstantial bilinguals, however, the situation is rather different. This is because often we should be concerned with assessing both languages because circumstantial bilinguals use language in different contexts and to assess only one may not give us a full picture of the individual’s overall language ability.

Assessing the Bilingual Proficiency of Elective Bilinguals

When we are assessing the language of an elective bilingual, we are almost always concerned with assessing their second language: the role of their first language (the one of which they are a native speaker) does not generally come into consideration except inasmuch as it might impact on their acquisition of the second language. This is because we tend to accept native speakers as proficient speakers of the language even though, as native speakers, they may have greater or lesser vocabularies, and may each have slightly different grammars (Davies 1991b). Not all native speakers speak identically, but when we talk about native speakers we tend to think about an idealized native speaker who demonstrates full competence in language.

Measuring Language

The measurement of any phenomenon requires that initially we have a clear definition of the bounds of the phenomenon that we want to measure. With physical objects in the world this is considerably less problematic than with psychological constructs because physical entities generally are relatively clearly defined.

Language, like intelligence, cognitive ability and personality, is a psychological construct, and its measurement can involve the assessment of many different aspects of language (e.g. broadly, the four macro skills). In order to be able to begin to measure language knowledge, then, we need to know what language is; we need to define the language construct.

Defining the Language Construct

What do we mean when we say that someone knows how to use a language (Spolsky 1985)? This is something to which language testers have given considerable thought because before measures can be developed, we must know what the construct is.

Language is a complex phenomenon, and there are many different aspects which can be assessed. These can include the four skills (speaking, reading, writing, and listening) and, more specifically, particular aspects of language such as pronunciation, intelligibility, syntax, vocabulary, and discourse skill, to name but a few.

Another important aspect of testing that needs to be considered is that, when we are testing a person’s language, we can only elicit samples of their language performance. It is never possible to do more than this, since we cannot hope to be able to sample anyone’s language in all possible circumstances.

In evaluating a person’s linguistic performance, we have to make inferences about their ability to use the language in other contexts from those which we are able to sample in the test. Bachman and Palmer (1996: 66) argue:

If we are to make inferences about language ability on the basis of performance on language tests, we need to define this ability in sufficiently precise terms to distinguish it from other individual characteristics that can affect test performance. We also need to define language ability in a way that is appropriate for each particular testing situation, that is, for a specific purpose, group of test takers, and TLU (target language use) domain.

The most widely cited model in language testing has been that of Bachman (1990), further developed in Bachman and Palmer (1996), who have proposed models of communicative language ability that take into account both language knowledge and strategic competence.

Language knowledge includes the knowledge which is required to use language appropriately in particular contexts, and is made up of several components (Bachman and Palmer 1996). Organizational knowledge refers to the formal properties of language required for understanding and producing language, as well as organizing language into longer stretches. It consists of grammatical knowledge and textual knowledge:

Grammatical knowledge is involved in producing or comprehending formally accurate utterances or sentences. This includes knowledge of vocabulary, syntax, phonology, and graphology.

Textual knowledge is involved in producing and comprehending texts . . . spoken or written—that consist of two or more utterances or sentences. There are two areas of textual knowledge: knowledge of cohesion and knowledge of rhetorical or conversational organization.
(Bachman and Palmer 1996: 68)

Pragmatic knowledge relates utterances, sentences or texts to the communicative goals of the people using the language and to the setting in which it is used. It can be divided into functional knowledge and sociolinguistic knowledge. Functional knowledge is made up of:

Knowledge of ideational functions enables us to express or interpret meaning in terms of our experience of the real world. These functions include the use of language to express or exchange information about ideas, knowledge, or feelings. Descriptions, classifications, explanations, and expressions of sorrow or anger are examples of utterances that perform ideational functions.

Knowledge of manipulative functions enables us to use language to affect the world around us. This includes knowledge of the following:

  • Instrumental functions, which are performed to get other people to do things for us (examples include requests, suggestions, commands, and warnings);
  • Regulatory functions, which are used to control what other people do (examples include rules, regulations, and laws); and
  • Interpersonal functions, which are used to establish, maintain and change interpersonal relationships (examples include greetings and leave takings, compliments, insults, and apologies).

Knowledge of heuristic functions enables us to use language to extend our knowledge of the world around us, such as when we use language for teaching and learning, for problem-solving, and for the retention of information.

Knowledge of imaginative functions enables us to use language to create an imaginary world or extend the world around us for humorous or esthetic purposes; examples include jokes and the use of figurative language and poetry.
(Bachman and Palmer 1996: 69-70)

Sociolinguistic knowledge allows us to use language appropriate to the context in which it is being used. This may include knowledge of the most appropriate dialect to use in a particular situation, or of when to use different registers, such as when to use a formal register and when to use more informal registers.

Understanding of language in theoretical terms is critically important in testing because it is from this model of language that we develop our test construct—the abstract theoretical concept that is reflected in test performance.

In order to test language, therefore, we need to (1) know how to describe language, and (2) know how to ‘operationalize’ it in terms which are precise enough to allow it to be measured.

In any assessment, there are two central concepts which are crucial to interpreting the outcome of the measurement. The first is validity. Validity refers to the extent to which the measurement instrument (e.g. the language test) is an appropriate measure of the phenomenon itself, in this case language. The second is reliability. Reliability refers to the extent to which something is measured consistently. There are various ways of measuring reliability: for example, a test may be given twice to the same group of people to see the extent to which the scores are consistent across the two occasions.
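The idea of giving a test twice to the same group can be made concrete with a small calculation. The sketch below is one common way of summarizing test-retest consistency, namely the Pearson correlation between the two sets of scores; the source does not prescribe this statistic, and the function name and the scores are invented purely for illustration.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for five candidates on two sittings of the same test.
sitting_1 = [52, 61, 70, 45, 88]
sitting_2 = [55, 60, 72, 47, 85]

test_retest_r = pearson_r(sitting_1, sitting_2)
print(f"test-retest correlation: {test_retest_r:.3f}")
```

A value close to 1 suggests that candidates are ranked in much the same way on both occasions; it says nothing about validity, since a test can be consistent without measuring the construct we are interested in.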

Types of Test 

When we are measuring second-language proficiency, there are various purposes for which tests can be used. Henning (1987) discusses a range of types of test, of which the following are probably the most commonly used:

Proficiency tests: designed to measure a person’s language ability irrespective of the type of language experiences the person may have had. These are tests of general language proficiency where the required language proficiency is specified by a set of expectations of the types of activities the candidate would be expected to be able to do with language in order to be considered a proficient speaker of that language.  Two well-known proficiency tests are IELTS (International English Language Testing System) and TOEFL (Test of English as a Foreign Language).

Achievement tests: designed to evaluate the language learned in a specific language instruction program; such tests relate specifically to the curriculum of the course and will be designed to evaluate either progress or the final achievement of the learners.

Diagnostic tests: designed to identify areas of language strength and weakness, generally for the purposes of providing additional and appropriate assistance at a later time.

Placement tests: designed to identify the most appropriate placement for learners where classes of varying proficiency levels are available.

Both in language learning situations and in the testing and assessment of languages, we often think in terms of the four macro skills, speaking and writing (the productive skills) and reading and listening (the receptive skills), and traditionally, tests have often been designed to specifically evaluate each of these skills. Recently, however, there has been an increasing focus on the development of more integrated tasks which go beyond the notion of assessing each skill independently.

Tests can vary on a number of different dimensions, and these dimensions have implications for both the validity and the reliability of the tests. Tests may be:

  • Direct or indirect
  • Scored objectively or subjectively
  • Criterion- or norm-referenced

Direct tests test the actual skill under investigation, while indirect tests measure the abilities which underlie the skill in which we are interested (Hughes 1989). To illustrate this difference, Hughes uses the example of writing an essay. In a direct test, we would ask the candidate to write an essay, which would then be scored; in an indirect test, we might devise a set of grammar correction items in which the candidate was required to identify the incorrect item and revise it. While this would not be a direct test of the candidate’s ability to write an essay, such skills are strongly correlated with the ability to write an essay.

Tests can be scored either objectively or subjectively. Objective scoring does not require any judgment on the part of the assessor. Subjective scoring refers to the type of assessment where judgments need to be made by a rater, which usually calls for some measure of expertise.

Rating scales may either be holistic—where the mark is assigned on the basis of a holistic evaluation of the candidate’s speech or writing—or analytic—where several scores are assigned on the basis of different aspects of the speech or writing.

Reliability is always of concern in subjective scoring since raters tend to vary in their harshness (Upshur and Turner 1999) and in the way they interpret the rating scale (Lumley 2002). Rater training attempts to address problems with rater consistency and to ensure that raters assess as fairly as possible. In some cases, statistical analyses can be used to contribute to identifying rater bias, but, because raters do tend to vary, best practice in testing would require each text to be scored by more than a single rater.
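To see why a second rater matters, consider the following minimal sketch. It assumes a hypothetical 0-9 band scale and two invented raters; comparing each rater's mean score is only a crude indication of relative harshness, not one of the formal statistical analyses mentioned above, and averaging the ratings is just one simple way of combining them.

```python
from statistics import mean

# Hypothetical ratings (0-9 band scale) given by two raters to the same five scripts.
ratings = {
    "rater_a": [6, 7, 5, 8, 6],
    "rater_b": [5, 6, 4, 7, 6],
}


def rater_means(ratings):
    """Mean score awarded by each rater; a lower mean hints at a harsher rater."""
    return {rater: mean(scores) for rater, scores in ratings.items()}


def combined_scores(ratings):
    """Average the raters' scores script by script."""
    return [mean(per_script) for per_script in zip(*ratings.values())]


print(rater_means(ratings))
print(combined_scores(ratings))
```

Here the second rater is consistently about one band lower on most scripts, so a candidate's result would depend on which rater happened to mark the script; averaging the two ratings reduces that dependence.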

Any kind of measurement implies a comparison: this may be with other members of the group or cohort, where measurements are made and compared across the group, often in relation to a standardized set of results, or the comparison may be with some definition of ability, or criterion. The former is known as norm referencing, and the latter as criterion referencing.
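The difference between the two kinds of comparison can be illustrated with a short sketch. The cohort scores, the candidate's score and the cut score below are all invented, and the percentile rank is only one of several ways in which a norm-referenced result might be reported.

```python
def percentile_rank(score, cohort):
    """Norm referencing: percentage of the cohort scoring strictly below `score`."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)


def meets_criterion(score, cut_score):
    """Criterion referencing: compare the score with a fixed standard."""
    return score >= cut_score


# Hypothetical cohort results and a hypothetical cut score.
cohort = [48, 55, 61, 67, 72, 80, 85, 91]
candidate = 72
CUT_SCORE = 75

print(percentile_rank(candidate, cohort))
print(meets_criterion(candidate, CUT_SCORE))
```

The same score is interpreted differently under each approach: the candidate sits in the middle of this particular cohort, yet falls short of the fixed criterion regardless of how anyone else performed.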


Performance-based Testing

A performance-based test is one in which the ability of candidates to perform particular tasks, usually associated with job or study requirements, is assessed (Davies et al. 1999: 14). These kinds of test are designed to measure a candidate’s productive language skills through performances on tasks which allow the candidate to demonstrate the type of language skills which it is expected they will be required to use at some later stage.

Performance tests appear to enhance validity by eliciting samples of the type of language that will be required in future situations, but Darling-Hammond (1994) offers a cautionary note about such tests, pointing out that performance-based assessments do not inherently mean that testing will be more equitable.

Alternative Types of Assessment

There are also alternative ways of making assessments about language knowledge, and these include portfolio assessments where individuals collect a number of samples of their work which demonstrate their language ability.

There are also competency-based assessments where language knowledge is evaluated in relation to the learner’s ability to perform particular task competencies under various conditions.

While more communicative models of language testing are currently prominent, there is also a move towards more holistic, qualitative assessment. These alternative modes of assessment offer more qualitative assessments of the learner’s abilities on the whole, but also involve a more time-consuming commitment on the part of the assessor and the learner, and do not provide comparable data in the way that more traditional types of test do.

On the whole, language tests and language assessments are designed for elective bilinguals: that is, they evaluate the language of second-language learners against the norm of a native speaker of that language (Valdes and Figueroa 1994). Circumstantial bilinguals fall into a rather different category, and the question of whether they should be evaluated against native-speaker norms is one which has been widely discussed. Grosjean (1989) in a seminal article argued strongly for the view that a bilingual should not be seen as the sum of two monolinguals. Taking the point further, De Groot and Kroll (1997: 2) argue that bilingualism cannot be viewed as simply the sum of two monolingual minds but that we need to take account of the interaction between the two languages. Further, they argue that this interaction is a complex one, the investigation of which will require detailed understanding of monolingual knowledge of language as well as of the bilingual knowledge of languages.

Assessing the Language of Circumstantial Bilinguals

When we are considering assessing circumstantial bilingual language, we need to think in terms of domains of language use. While these domains will not be linguistically differentiated for the monolingual speaker, it is frequently the case that they are differentiated for the circumstantial bilingual. With adult circumstantial bilinguals, we might want to consider whether they have been educated in one or both of their languages, and, related to that, what level of educational attainment has been achieved, and whether this was equivalent in both languages.

It may be that one of the languages is more dominant in certain domains, while the other language is dominant in a different set of domains. Language functions occur in different domains, and the skills required for one are not necessarily instantaneously transferable to the other.

The issue of the domain of language use is clearly one which needs to be taken into consideration since a circumstantial bilingual may not have equal competence in the performance of both their languages in all domains available.

Assessing the Language of Circumstantially Bilingual Children

The need to assess the language of circumstantially bilingual children comes from two main sources. One is the assessment of children in the educational system, where there is often quite substantial standardized testing of content material. The other is the assessment of children who may have specific language impairment, which should be diagnosed as early as possible so that they may be provided with appropriate treatment.

The use of norm-referenced tests is widely problematic for assessing children being raised in situations of circumstantial bilingualism because such tests have generally been normed on populations which do not include bilingual children.

The importance of appropriately assessing circumstantially bilingual children has been widely recognized because of the potential for language problems to result in poor school performance, and subsequently to limit life choices. Cummins (1979, 1984) makes a distinction between basic interpersonal communicative skills and cognitive academic language proficiency (more recently termed conversational language proficiency and academic language proficiency respectively in Cummins 2000).

The model remains a useful one given the fact that as children go through the formal educational system, they need to acquire very different language skills from those that they are using routinely at home. The distinction between basic conversational proficiency and academic language proficiency is a very important one because, while children may reach basic conversational fluency within two years, it takes between five and seven years to attain academic language proficiency (Strand and Demie 2005).

Assessing Bilingual Children for Specific Language Impairment

Ten percent of the population of normal children entering the school system will be affected by some sort of speech disorder; from this it follows that a similar proportion of bilingual children will be affected (Holm et al. 1999). The likelihood is that when children or their parents are advised that the child should see a speech pathologist they may encounter a number of difficulties. These include the high chance that the speech pathologist will not be familiar with at least one of the child’s languages and that standardized assessment instruments will not be available in at least one of their languages.

Assessments used with children also need to be age-appropriate, and should not be confronting for the child. Hasselgreen (2005) points out that there is now a general consensus on the features that assessment activities for young children should encapsulate. These are:

  • Tasks should be appealing to the age group, interesting and captivating, preferably with elements of game and fun
  • Many types of assessment should be used, with the pupil’s, the parents’ and the teachers’ perspective involved
  • Both the tasks and the forms of feedback should be designed so that the pupil’s strengths are highlighted
  • The pupil should, at least under some circumstances, be given support in carrying out the tasks
  • The teacher should be given access to and support in understanding basic criteria and methods for assessing language ability
  • The activities used in an assessment should be good learning activities in themselves

Self-assessment of Bilingual Proficiency

There are two ways in which we can obtain information about a bilingual’s language from self-assessments.
One approach is through the use of language background scales in which bilinguals are asked to provide information about their language use with a series of questions which investigate to whom they speak each of their languages, and how often they do so. Baker (2006: 33-34) points out that these kinds of scales necessarily have some limitations because they are generally not exhaustive of targets (people) or of domains (contexts).

While language background scales have been used extensively in bilingual research, they do not purport to measure the proficiency of speakers, but rather their patterns of language use across a variety of domains. Proficiency can, however, be measured through self-assessment questionnaires, and although these have limitations they can serve a useful purpose. Little (2005: 321-322) argues that there are three main reasons for using self-assessments: firstly, for learners to be able to assess their progress in terms of the curriculum they are learning; secondly, to encourage learners to regard assessment as a shared responsibility; and thirdly, to allow learners to identify occasions in which the target language can be used for additional explicit language learning.


Ng Bee Chin and Gillian Wigglesworth. Bilingualism: An Advanced Resource Book. Great Britain: Cromwell Press, 2007. Print.
 
