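English Teachers’ Writing Proficiency and Writing Assessment Decision-Making: An Exploratory Study in Mainland China (Foreign Languages and Literature Academic Series), by Liu Li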

3.5 Exploring rating processes: Think-aloud protocols

As reviewed in Section 3.2.2, indirect evidence of raters’ thought processes, whether extracted from textual features highly correlated with final scores or from the elicited perceptions of raters, indicates what raters perceive as more important for their decision-making and helps reveal part of the picture of what goes through raters’ minds during rating. However, much remains unclear about how raters acquire and process information from the target performance and what strategies they use to make decisions and monitor their mental processes. Commonly used in cognitive and educational psychology, think-aloud protocols (TAPs for short) have proved effective in exploring subjects’ text comprehension processes, learning strategies, problem-solving behaviors, and decision-making processes (Ericsson & Simon, 1984/1993).

In the language testing field, TAPs have gained increasing popularity as a viable research methodology in response to frequent calls for a process-oriented approach to test validation (Messick, 1995). Researchers have widely adopted this technique as a means of validating assessment instruments (Cohen, 1987) and of establishing more process-oriented definitions of constructs (Sasaki, 2002) by asking candidates to report their thoughts during or immediately after working through test items or tasks. In the context of performance assessment, with increasing awareness of the crucial role of raters in score interpretation, researchers have also begun to seek more in-depth and direct evidence of what underlies raters’ judgments.

A number of studies have employed TAPs to explore what raters focus on during rating and the procedures or strategies they follow to arrive at a final judgment (see Table 3-3 for a brief summary of studies). The fundamental assumption underlying TAP analysis is that information heeded as a task is being carried out is represented in a limited-capacity short-term memory and can be reported following an instruction to think aloud (Ericsson & Simon, 1993). Studies of rater judgments and perceptions that employ this approach show that it is capable of producing rich information about rater behavior and other factors related to the scoring of essays (Cumming, 1990; Cumming et al., 2001; Vaughan, 1991; Weigle, 1994; Wolfe et al., 1998).

The principal advantage of TAPs is that they can provide evidence of cognitive processes that is not colored by introspection (Ericsson & Simon, 1993). Although TAPs do not report directly on cognitive processes, they do provide expressions of thoughts from which such processes may be inferred (Green, 1998). Such evidence is particularly useful in exploring whether pre-service EFL teachers of different writing proficiency attend to a similar range of textual features in assessing student writing and whether they have similar constructs in mind when assessing writing proficiency. Furthermore, the coded TAP data can be analyzed both qualitatively and quantitatively, a combination favored by many researchers in the context of performance assessment (e.g., Connor & Carrell, 1993; Cumming, 1990; Huot, 1993; Pula & Huot, 1993; Vaughan, 1991).

Compared with TAPs, other forms of verbal report, such as retrospective reports, may suffer from some limitations. First, important information may be omitted because of memory loss during the delay between task completion and reporting. Compounding this problem is the possible filtering or “tidying up” of information in a retrospective report (Green, 1998). Some subjects may consciously adapt their verbal reports to make the thoughts they report appear more coherent and reasonable, which may jeopardize the validity of the research. Second, the retrospective approach may also be of limited use in revealing raters’ mental processes, particularly the rating procedures within such processes.

Despite their methodological merits, TAPs are not free of challenges. Some researchers have questioned the reliability and validity of TAPs in terms of their veridicality and reactivity (Barkaoui, 2010b). Veridicality is a concern in that TAPs may not accurately reflect participants’ thought processes, since participants simply may not recall what they were thinking as they completed the scoring task. Reactivity refers to the possibility that thinking aloud “alters the process being observed and/or its outcomes.” Despite these limitations, TAPs can offer plausible explanations and provide more detailed information about cognitive processes than we might obtain by simple speculation (Cohen, 1987; Green, 1998). As Lumley noted, thinking aloud is probably the only methodology presently available that can provide researchers with any insight, however indirect, into the thinking involved in the rating process.

Results of studies using TAPs have provided direct evidence of rating criteria that confirms what earlier studies reported. There seems to be general agreement, in broad terms, as to what goes into a good essay or a poor one; however, little is known about how raters distinguish between essays at different proficiency levels. It also appears that raters do not judge essay quality strictly on the basis of the defined levels of a scoring guide (Vaughan, 1991; Wolfe & Kao, 1996). Owing to the lack of a commonly agreed-on framework for describing raters’ mental processes, the majority of such studies are exploratory in nature. The cumulative body of information from these studies, however, may well help us to form an ever-clearer picture of this complex cognitive activity. In the context of the current research, two relevant studies have thus been designed to address the relationship between raters’ writing proficiency and their assessment of writing scripts.