Thursday, April 11, 2019
The Past, Present, and Future of Automated Scoring
"No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be." Isaac Asimov (5)

Introduction

Although some realities of the classroom remain constant (they wouldn't exist without the presence, whether actual or virtual, of learners and teachers), the technology age is changing not only the way that we teach, but also how learners learn. While the implications of this affect all disciplines, it is acutely evident in the teaching of writing. In the last twenty years, we have seen a rapid change in how we read, write, and share text. Compositionist Carl Whithaus maintains that writing is becoming an increasingly multimodal and multimedia activity (xxvi). It is no surprise, then, that there are currently 100 million blogs in existence worldwide and 171 billion email messages sent daily (Olson 23), and the trend toward digitally-based writing is also moving into the classroom. The typical student today writes almost wholly on a computer, typically one equipped with automated tools to help them spell, check grammar, and even choose the right words (Cavanaugh 10). Furthermore, CCC notes that increasingly, classes and programs in writing require that students compose digitally (785). Given the effect of technology on writing and the current culture of high-stakes testing ushered in by the mandates of the No Child Left Behind Act of 2001, a seemingly natural product of the combination of the two is computer-based assessment of writing.
An idea still in its infancy, the process of technological change in combination with federal testing mandates has resulted in several states incorporating computer-based testing into their writing assessments, not only because of students' widespread familiarity with computers, but also because of the demands of college and the workplace, where word-processing skills are a necessity (Cavanaugh 10).

Although it makes sense to have students accustomed to composing on computer write in the same mode for high-stakes tests, does it make sense to score their writing by computer as well? This is a controversial question that has both supporters and detractors. Supporters like Stan Jones, Indiana's Commissioner of Higher Education, believe that computerized essay grading is needed (Hurwitz n.p.), while detractors, mostly educators, assert that such assessment defies what we know about writing and its assessment, because "all writing is social; accordingly, response to and evaluation of writing are human activities" (CCC 786).

Even so, the reality is that the law requires testing nationwide, and in all likelihood that mandate is not going to change anytime soon. With NCLB up for revision this year, even politicians like Sen. Edward Kennedy of Massachusetts agree that standards are a good idea and that testing is one way to ensure that they are met. At some point, we need to pull away from all-or-none polarization and create a new paradigm. The sooner we accept that "computer technology will subsume assessment technology in some way" (Penrod 157), the sooner we will be able to address how we, as teachers of writing, can use technology effectively for assessment. In the past, Brian Huot notes, teachers' responses have been reactionary, "cobbled together at the last minute in response to an outside call" (150).
Teachers need to be proactive in addressing technological overlap in the composition classroom, because if we don't, others can and will impose certain technologies on our teaching (Penrod 156). Instead of passively leaving the development of assessment software solely to programmers, teachers need to be actively involved with the process in order to ensure the application of sound pedagogy in its creation and application.

This essay will argue that automated essay scoring (AES) is an inevitability that offers many more positive possibilities than negative ones. While the research presented here spans K-16 education, this essay will primarily address its application in secondary environments, focusing on high school juniors, a group currently consisting of approximately 4 million students in the United States, because this group represents the targeted population for secondary school high-stakes testing in this country (U.S. Census Bureau). It will first present a brief history of AES, then explore the current state of AES, and finally consider the implications of AES for writing instruction and assessment in the future.

A Brief History of Computers and Assessment

The first standardized objective testing in writing occurred in 1916 at the University of Missouri as part of a Carnegie Foundation sponsored study (Savage 284). As the 20th century continued, these tests began to grow in popularity because of their efficiency and perceived reliability, and they are the cornerstone of what Kathleen Blake Yancey describes as the first wave of writing assessment (484). To articulate the progression of composition assessment, Yancey identifies three distinct, yet overlapping, waves (483).
The first wave, occurring approximately from 1950-1970, primarily focused on using objective (multiple choice) tests to assess writing simply because, as she quotes Michael Williams, they were "the best response that could be tied to testing theory, to institutional need, to cost, and ultimately to efficiency" (Yancey 489).

During Yancey's first wave of composition assessment, another wave was beginning in the parallel universe of computer software design, where developers began to address the possibilities of not only programming computers to mimic the process of human reading, but to emulate the value judgments that human readers make when they read student writing in the context of large-scale assessment (Herrington and Moran 482). Herrington and Moran identify The Analysis of Essays by Computer, a 1968 book by Ellis Page and Dieter Paulus, as one of the first composition studies books to address AES. Their goal was to appraise student writing as reliably as human readers, and they attempted to identify computer-measurable text features that would correspond with the kinds of intrinsic features that are the basis for human judgments, settling on thirty quantifiable features, which included essay length in words, average word length, amount and kind of punctuation, number of common words, and number of spelling errors (Herrington and Moran 482). In their study, they found a high enough statistical correlation, .71, to support the use of the computer to score student writing. The authors note that the response of the composition community in 1968 to Page and Paulus's book was one of indignation and uproar.

In 2007, not much has changed in terms of the composition community's position regarding computer-based assessment of student writing. To many, it is an unknown, mysterious Orwellian entity waiting in the shadows for the perfect moment to jump out and usurp teachers' autonomy in the classroom.
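Page and Paulus's approach can be illustrated with a short sketch. The feature definitions below are my own illustrative assumptions, not their actual 1968 formulas; the Pearson correlation, however, is the statistic behind the reported .71 figure.

```python
import re
from statistics import mean

# Illustrative versions of a few of Page and Paulus's thirty
# computer-measurable features; the exact definitions here are
# assumptions made for the sake of the sketch.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "it"}

def extract_features(essay):
    words = re.findall(r"[A-Za-z']+", essay)
    return {
        "length_in_words": len(words),
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        "punctuation_count": sum(essay.count(c) for c in ".,;:!?"),
        "common_word_count": sum(1 for w in words if w.lower() in COMMON_WORDS),
    }

def pearson(xs, ys):
    # Pearson product-moment correlation: the statistic used to compare
    # machine-derived scores against human judgments.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

In practice, one would extract such features from a corpus of essays already scored by humans, then check how strongly the machine's feature-based scores correlate with the human scores; .71 was judged high enough in 1968 to be promising.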
Nancy Patterson describes computerized writing assessment as a horror story that may come sooner than we realize (56). Furthermore, P.L. Thomas offers the following question and response: "How can a computer determine accuracy, originality, valuable elaboration, clear language, language maturity, and a long list of similar qualities that are central to assessing writing? Computers can't. WE must ensure that the human element remains the dominant factor in the assessing of student writing" (29). Herrington and Moran make the issue a central one in the teaching of writing and have serious concerns about the potential effects of machine reading of student writing "on our teaching, on our students' learning, and therefore on the profession of English" (495). Finally, CCC definitively writes, "We oppose the use of machine-scored writing in the assessment of writing" (789). While the argument against AES is clear here, the responses appear to be based on a lack of understanding of the technology and an unwillingness to change. Instead of taking a reactionary position, it might be more constructive for teachers to accept the inevitability of computerized assessment technology (it is not going away) and to use that assumption as the basis for taking a proactive role in its implementation.

The Current Culture of High-Stakes Testing

At any given time in the United States, there are approximately 16 million 15-18 year-olds, the majority of whom receive a high school education (U.S. Census). Even when factoring in a maximum of 10 percent (1.6 million) who may drop out or otherwise not receive a diploma, there is a significant number of students, 14-15 million, attending high school.
The majority of these students are members of the public school system and as such must be tested annually according to NCLB, though the most significant focus group for high-stakes testing is 11th grade students. Currently in Michigan, 95% of any given public high school's junior population must sit for the MME, the Michigan Merit Exam, in order for the school to qualify for AYP, Adequate Yearly Progress. Interestingly, those students do not all have to pass currently, though by 2014 the government mandates a 100% passing rate, a number that most admit is an impossibility and will probably be addressed as the NCLB Act comes up for review this year. In the past, as part of the previous 11th grade examination, the MEAP (Michigan Educational Assessment Program), students were required to complete an essay response, which was assessed by a variety of people, mostly college students and retired teachers, for a minimum amount of money, usually in the $7.50-$10.00 per hour range. As a side note, neighboring Ohio sends its writing test to North Carolina to be scored by workers receiving $9.50 per hour (Patterson 57), a wage that fast food employees make in some states. Because of this, it was consistently difficult for the state to assess these papers in a short period of time, causing huge delays in returning the results of the exams to the school districts. This posed a huge problem, as schools could not use the testing information to address educational shortfalls of their students or programs in a timely manner, one of the purposes behind getting prompt feedback.

This year (2007), as a result of increased graduation requirements and testing mandates driven by NCLB, the Michigan Department of Education began administering a new examination to 11th graders, the MME, an ACT-based assessment, as ACT was awarded the testing contract. The MME is comprised of several sections and required most high schools to administer it over a period of 2-3 days.
Day one consists of the ACT + Writing, a 3.5-hour test that includes an argumentative essay. Days two and three (depending on district implementation) consist of the ACT WorkKeys, a basic work skills test of math and English; further mathematics testing (to address curricular content not covered by the ACT + Writing); and a social studies test, which incorporates another essay that the state combines with the argumentative essay in the ACT + Writing in order to determine an overall writing score. Miraculously, under the auspices of ACT, students received their ACT + Writing scores in the mail approximately three weeks after testing, unlike the MEAP, where some schools did not receive test scores for six months. In 2005, a MEAP official admitted that the cost of scoring the writing assessment was forcing the state to go another route (Patterson 57), and now it has.

So how is this related to automated essay scoring? My hypothesis is that as states are required to test writing as part of NCLB, there is going to be a lack of qualified people able to read and assess student essays and determine results within a reasonable amount of time to purposefully inform necessary curricular and instructional change, which is supposed to be the point of testing in the first place. Four million plus essays to evaluate each year (sometimes more, as when a state like Michigan requires two essays) is a huge amount on a national level. Michigan Virtual University's Jamey Fitzpatrick says, "Let's face it. It's a very labor-intensive project to sit down and read essays" (Stover n.p.). Furthermore, it only makes sense that instead of states managing their own testing, they will contract state-wide testing to larger testing agencies, as Michigan and Illinois have with ACT, to reduce costs and improve efficiency.
Because of the move to contract with ACT, my guess is that we are moving in the direction of having all of these writings scored by computer. In email correspondence that I had in early 2007 with Harry Barfoot of Vantage Learning, a company that creates and markets AES software, he stated, "Ed Roeber has been to visit us and he is the high stakes assessment guru in Michigan, and who was part of the MEAP 11th grade becoming an ACT test, which Vantage will end up being part of under the covers of ACT." This indicates the inevitability of AES as part of high-stakes testing. Even though few states rely on computer assessment of writing yet, state education officials are looking at the potential of this technology to limit the need for costly human scorers and reduce the time required to grade tests and get them back in the hands of classroom teachers (Stover n.p.). Because we live in an age where the budget axe frequently cuts funding to public education, it is in the interest of states to save money any way they can, and states stand to save millions of dollars by adopting computerized writing assessment (Patterson 56).

Although AES is not yet widespread, every indication is that we are moving toward it as a solution to the cost and efficiency problems of standardized testing. Herrington and Moran observe that pressures for common assessments across state public K-12 systems and higher education, both for placement and for proficiency testing, "make attractive a machine that promises to assess the writing of large numbers of students in a fast and reliable way" (481). To date, one of the two readers (the other is still human) for the GMAT is e-Rater, an AES software program, and some universities are using Vantage's WritePlacerPlus software in order to place first-year university students (Herrington and Moran 480). However, one of the largest obstacles in bringing AES to K-12 is one of access.
In order for students' writing to be assessed electronically, it must be submitted electronically, meaning that every student will have to compose their essays on a computer. Sean Cavanagh's article of two months ago maintains that ACT has already suggested delivering computers to districts that do not have sufficient technology, in order to accommodate technology differences (10). As of last month, March 2007, Indiana is the only state that relies on computer scoring of 11th grade essays for the state-mandated English examination (Stover n.p.), covering 80 percent of its 60,000 11th graders (Associated Press), though the state's Assistant Superintendent for Assessment, Research, and Information, West Bruce, says that the state's computer software assigns a confidence rating to each essay, where low-confidence essays are referred to a human scorer (Stover n.p.). In addition, in 2005 West Virginia began using an AES program to grade 44,000 middle and high school writing samples from the state's writing assessment (Stover n.p.). At present, only ten percent of states incorporate computers into their writing assessments, and two more are piloting such exams (Cavanagh 10). As technology becomes more accessible for all public education students, the possibilities for not only computer-based assessment but also AES become very real.

Automated Essay Scoring

Weighing the technological possibilities against logistical considerations, however, when might we expect to see full-scale implementation of AES? Semire Dikli, a Ph.D. candidate from Florida State University, writes that for practical reasons "the transition of large-scale writing assessment from paper to computer delivery will be a gradual one" (2).
Similarly, Russell and Haney suspect that it will be some years before schools generally develop the capacity to administer large-scale assessments via computer (16 of 20). The natural extension of this, then, is that AES cannot happen on a large scale until we are able to provide conditions that allow each student to compose essays on a computer with Internet access to upload files. At issue as well is the reliability of the company contracted to do the assessing. A March 24, 2007 article by Steven Carter in The Oregonian reports that access issues resulted in the state of Oregon canceling its contract with Vantage and signing a long-term contract with American Institutes for Research, the long-standing company that does NAEP testing. Even though the state tests only reading, science, and math this way (not writing), this indicates that reliable access is an ongoing issue that must be resolved.

Presently, there are four commercially available AES systems: Project Essay Grade (Measurement, Inc.), Intelligent Essay Assessor (Pearson), IntelliMetric (Vantage), and e-Rater (ETS) (Dikli 5). All of these incorporate a similar process in the software: "First, the developers identify relevant text features that can be extracted by computer (e.g., the similarity of the words used in an essay to the words used in high-scoring essays, the average word length, the absolute frequency of grammatical errors, the number of words in the response). Next, they create a program to extract those features. Third, they combine the extracted features to form a score. And finally, they evaluate the machine scores empirically" (Dikli 5). At issue with the programming, however, is that the weighting of text features derived by an automated scoring system may not be the same as the one that would result from the judgments of writing experts (Dikli 6).
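The scoring and evaluation steps of the four-step process Dikli describes, together with the kind of confidence-based referral to human scorers that Indiana uses, can be sketched as follows. The feature weights, the 0-6 holistic scale, and the confidence heuristic are all illustrative assumptions, not any vendor's actual model; a real system fits its weights empirically against human-scored essays.

```python
# Illustrative feature weights: a real system would derive these from
# regression against a corpus of human-scored essays.
FEATURE_WEIGHTS = {
    "length_in_words": 0.01,
    "avg_word_length": 0.6,
    "spelling_errors": -0.25,
}

def machine_score(features):
    # Combine the extracted features into a single score,
    # clamped to a hypothetical 0-6 holistic scale.
    raw = sum(FEATURE_WEIGHTS.get(name, 0.0) * value
              for name, value in features.items())
    return max(0.0, min(6.0, raw))

def score_with_confidence(features, low=1.0, high=5.0):
    # Toy confidence rule: scores at the extremes of the scale are
    # flagged as low-confidence and referred to a human scorer, echoing
    # the referral scheme Indiana's assessment office describes.
    score = machine_score(features)
    if score < low or score > high:
        return {"score": None, "scored_by": "human"}
    return {"score": score, "scored_by": "machine"}
```

The last step of the pipeline, empirical evaluation, would then compare the machine's scores against human ratings of the same essays, exactly as Page and Paulus did with their .71 correlation.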
There is still a significant difference between statistically optimal approaches to measurement and scientific or educational approaches to measurement, where the aspects of writing that students need to focus on to improve their scores are not the ones that writing experts most value (Dikli 6). This is the tension that Diane Penrod addresses in Composition in Convergence, mentioned earlier, in which she recommends that teachers and compositionists become proactive by getting involved in the creation of the software instead of leaving it exclusively to programmers.

And this makes sense. Currently, there are 50-60 features of writing that can be extracted from text, but current programs only use about 8-12 of the most predictive features of writing to determine scores (Powers et al. 413). Moreover, Thomas writes that composition experts must determine what students learn about writing; if that is left to the programmers and the testing experts, we have failed (29). If compositionists and teachers can involve themselves in the creation of software, working with programmers, then the product would likely be one that is more palatable and suitable, based on what we know good writing is. While the aura of mystery behind the creation of AES software is of concern to educators, it could easily be addressed by education and involvement. CCC reasons that since we cannot know the criteria by which the computer scores the writing, we cannot know whether particular kinds of bias may have been built into the scoring (489).
It stands to reason, then, that if we take an active role in the development of the software, we will have more control over issues such as bias.

Another point of contention with moving toward computer-based writing and assessment is the concern that high-stakes testing will result in students having a narrow view of good writing, particularly those moving to the college level, where writing skill is expected to be more comprehensive than a prompt-based five-paragraph essay written in 30 minutes. Grand Valley State University's Nancy Patterson opposes computer scoring of high-stakes testing, saying that no computer can evaluate subtle or creative styles of writing, nor can it judge the tone of an essay's intellectual content (Stover n.p.). She also writes that standardized writing assessment is already having an unfavorable effect on the teaching of writing, "luring many teachers into more formulaic approaches and an over-emphasis on surface features" (Patterson 57).

Again, education is key here, specifically teacher education. Yes, we live in a culture of high-stakes testing, and students must be prepared to write successfully for this genre. But test-writing is just that, a genre, and should be taught as such, just not to the detriment of the rest of a writing program, something that the authors of Writing on Demand assert when they write, "We believe it is possible to meld writing on demand into a plan for teaching based on best practices" (5). AES is not an attack on best practices, but a tool for cost-effective and efficient scoring.
Even though Thomas warns against the demands of standards and high-stakes testing becoming the entire writing program, we must still realize that computers for composition and assessment can have positive results, and many of the roadblocks to more effective writing instruction (the paper load, the time involved in writing instruction and assessment, the need to address surface features individually) can be lessened by using computer programs (29).

In addition to pedagogical concerns, skeptics of AES are suspicious of the companies themselves, particularly the aggressive marketing tactics that are used, especially those that teachers perceive as threats not only to their autonomy, but to their jobs. To begin, companies market aggressively because we live in a capitalist society and they are out to make money. But, to cite Penrod, both computers and assessment are by-products of capitalist thinking applied to education, in that the two reflect speed and efficiency in textual production (157). This is no different than the first standardized testing experiments by the Carnegie Foundation at the beginning of the 20th century, and it is definitely nothing new. Furthermore, Herrington and Moran admit that computer power has increased exponentially, text- and content-analysis programs have become more plausible as replacements for human readers, and "our administrators are now the targets of heavy marketing from companies that offer to read and evaluate student writing quickly and inexpensively" (480). In addition, they see a threat in companies marketing programs that define the task of reading, evaluating, and responding to student writing "not as a complex, demanding, and rewarding aspect of our teaching, but as a burden that should be lifted from our shoulders" (480). In response to their first concern, teachers becoming involved in the process of creating assessment software will help to define the task the computers perform.
Also, teachers will always read, evaluate, and respond, but probably differently; not all writing is for high-stakes testing. Secondly, and maybe I'm alone in this (but I think not), I'd love to have the tedious task of assessing student writing lifted from my plate, especially on sunny weekends when I'm stuck inside for most of the daylight hours assessing student work. To be a dedicated writing teacher does not necessarily involve martyrdom, and if some of the tedious work is removed, it can give us more time to actually teach writing. Imagine that!

The Future of Automated Essay Scoring

On March 14, 2007, an article appeared in Education Week reporting that beginning in 2011, the National Assessment of Educational Progress will begin conducting its testing of writing for 8th and 12th grade students by having the students compose on computers, a decision unanimously approved as part of its new writing assessment framework. This new assessment will require students to write two 30-minute essays and will evaluate students' ability to write to persuade, to explain, and to convey experience, tasks typically deemed necessary both in school and in the workplace (Olson 23).

Currently, NAEP testing is assessed by AIR (mentioned above), which will no doubt incorporate AES for assessing these writings. In response, Kathleen Blake Yancey, Florida State University professor and president-elect of NCTE, said the framework "provides for a more rhetorical view of writing, where purpose and audience are at the center of writing tasks, while also requiring students to write at the keyboard, providing a direct link to the kind of writing writers do in college and in the workplace, thus bringing assessment in line with lifelong composing practices" (Olson 23). We are on the cusp of a new era.

With the excitement of new possibilities, though, we must remember, as P.L. Thomas reminds us, that while technology can be a wonderful thing, "it has never been and never will be a panacea" (29).
At the same time, we must also discard our tendency to avoid change and embrace the overwhelming possibilities of incorporating computers and technology into writing instruction. Thomas also says that writing teachers need to see the inevitability of computer-assisted writing instruction and assessment as a great opportunity: "We should work to see that this influx of technology can help increase the time students spend actually composing in our classrooms and increase the amount of writing students produce" (29). Moreover, we must consider that the methods used to program AES software are not very different from the rubrics that classroom teachers use in holistic scoring, something Penrod identifies as having "numerous subsets and criteria that do indeed divide the student's work into pieces" (93). I argue that our time is better spent working within the system to ensure that its inevitable changes reflect sound pedagogy, because the trend that we're seeing is not substantially different from previous ones. The issue is in how we choose to address it. Instead of eschewing change, we should embrace it and make the most of its possibilities.