Hedge-O-Matic
Ryan Omizo, University of Rhode Island
Bill Hart-Davidson, Michigan State University
(Published May 12, 2016)
How the Hedge-O-Matic Works
The Hedge-O-Matic tokenizes raw text at the sentence level. Each sentence is then classified as either a hedge or non-hedge by a Support Vector Machine trained on a corpus of hedgey and non-hedgey sentences culled from academic science journals.
The Hedge-O-Matic outputs results as a table of tagged sentences and a scatter plot that denotes the linear distribution of hedgey and non-hedgey sentences in the submitted text.
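The flow described above (split into sentences, label each one, report the labels in order) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular-expression splitter stands in for the NLTK sentence tokenizer, the `classify` function is a keyword rule standing in for the trained Support Vector Machine, and the names `HEDGE_CUES`, `split_sentences`, `classify`, and `tag_text` are hypothetical rather than taken from the application.

```python
import re

# Hypothetical cue list; the real tool uses a trained SVM, not a word list.
HEDGE_CUES = {"may", "might", "could", "suggest", "suggests",
              "appear", "appears", "seem", "seems", "likely"}

def split_sentences(text):
    """Naive splitter: break after ., ?, or ! when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

def classify(sentence):
    """Stand-in classifier: 'hedge' if any cue word occurs in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return "hedge" if words & HEDGE_CUES else "non-hedge"

def tag_text(text):
    """Return (position, label, sentence) rows, one per sentence."""
    return [(i, classify(s), s)
            for i, s in enumerate(split_sentences(text), start=1)]

sample = ("We sampled 40 hens. "
          "These results suggest that heat stress may reduce yield. "
          "Eggs were stored at 13 degrees Celsius.")
rows = tag_text(sample)
for position, label, sentence in rows:
    print(position, label, sentence, sep="\t")
```

The rows correspond to the table of tagged sentences, and the position column is the kind of value that could be plotted against the label to show where hedges fall in a text.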
The Hedge-O-Matic currently has an accuracy rate of 80-84% when tested on academic science writing. Results are less robust when applied to other genres of writing.
The Hedge-O-Matic relies upon libraries from the Natural Language Toolkit (Bird) and scikit-learn (Pedregosa et al.) for its text processing and classification. Some of the modules used in the Hedge-O-Matic have also been adapted from Perkins (32-34). This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The Hedge-O-Matic is, above all, a proof of concept. The human coding used for the machine learning processes has not been validated for interrater reliability. Results should be taken as provocative, not as gold standards.
We do not store user-submitted data in our application. Any issues or problems you experience can be reported to the email address in the C|R|G section of the site.
ABOUT
What do you mean hedging?
Hedging is a rhetorical strategy employed to make statements less definitive (Hübler 233), qualify the scope of a position, limit the rhetor's commitment to a position (Lukka and Markkanen; Prince, Fader, and Bosk), or minimize potential controversy in order to render a statement more appealing or appropriate to audiences (Fetzer; see also Clemens; Fraser).
Serious theoretical interest in the phenomenon of hedging has been traced to Lakoff. In "Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts," Lakoff explores how certain words and phrases make some definitional statements "fuzzy" by marking degrees of truthfulness and category membership or implying the existence of additional properties that shape the meaning of an expression. One example Lakoff deploys frequently in his essay is "very tall man," where the category "tall" contains a fuzzy truth value that denotes a range of values. The modifier "very" addresses this range by suggesting a degree of fidelity to the concept of "tall." We might contrast this categorical hedge with "not so tall woman," which implies that within the category of "tall women," the woman in question grades lower than others (but she likely grades high in the category of "short women"). Lakoff's incorporation of fuzzy logic in the analysis of formal semantics builds off of earlier work by Zadeh (5), which applied fuzzy set theory to the problem of natural language. For Zadeh, hedgey words such as "very," "somewhat," "quite," and "essentially" function as operators on a theoretical universal set of discourse, organizing set membership according to varying levels of fixity (see also Huynh, Ho, and Nakamori 203-223).
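The idea of hedge words as operators on fuzzy sets can be made concrete with a small sketch. In the standard textbook reading of Zadeh's proposal, "very" concentrates a membership grade by squaring it and a weakening hedge dilates it by taking the square root. The membership function for "tall" below is invented purely for illustration, and treating "somewhat" as a dilation is a simplifying assumption rather than a claim about how Zadeh or Lakoff model that particular word.

```python
import math

def tall(height_cm):
    """Hypothetical membership grade for 'tall': 0 at 160 cm, 1 at 190 cm."""
    return min(1.0, max(0.0, (height_cm - 160) / 30))

def very(mu):
    """Concentration: squaring lowers intermediate membership grades."""
    return mu ** 2

def somewhat(mu):
    """Dilation (assumed reading): the square root raises intermediate grades."""
    return math.sqrt(mu)

for height in (165, 175, 185):
    grade = tall(height)
    print(height, round(grade, 2), round(very(grade), 2), round(somewhat(grade), 2))
```

A person of 175 cm is "tall" to degree 0.5 under this toy function, "very tall" to a lesser degree, and "somewhat tall" to a greater one, which is the graded category membership the passage describes.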
In contrast to the work on the truth-value of propositional hedging by Lakoff and Zadeh, Brown and Levinson focus on hedging as a politeness strategy that facilitates social agreement by managing the illocutionary force of a statement. Illocutionary force, as theorized by Austin (99-100), signifies the intentionality of a communicator in issuing an utterance in the performance of an illocutionary act. For example, the question "Did you take out the trash?" on its face solicits information about an event in the world. However, the illocutionary force of this question might also function as a directive for the listener to dispose of the trash (and the perlocutionary force would relate to whether or not the listener did remove the trash, although an account of speech act theory is beyond the scope of this article and this app). For Brown and Levinson (145-146), hedging as a means to foster cooperation involves the weakening, strengthening, and blurring of the intent of such utterances (what Searle [348] might call the "illocutionary strength" of an utterance). Observe the difference between "Did you take out the trash?" and "Did anyone take out the trash?" In both cases, the illocutionary force of the question might contribute to a directive to remove the trash. However, the "you" of the first question serves as a more pointed targeting device to compel action from a specific person. The effect would likely be more pronounced if in both cases there is only one recipient of the question. The "anyone" might suggest that trash removal is a general responsibility for those in a household, which contrasts with trash disposal as the responsibility of a particular individual. Consequently, failure to fulfill the former obligation is a generalized failing. However, failure to fulfill the latter obligation resolves to personal negligence.
Because of Brown and Levinson's reliance on speech act theory to anchor their understanding of hedging in politeness strategies, Fraser has glossed this approach as "speech act hedging" (18).
More recent work has situated hedging within a computational rhetoric problem-space. Di Marco, Kroon, and Mercer use linguistic markers of hedging to classify citations in scientific articles with an automation algorithm (247-263). They find that hedging sentences often co-occur with citations and represent the ways in which authors frame their research in relation to existing scholarship. They consequently argue that understanding when and if authors of scientific research articles are hedging in conjunction with their citation practices can lead to a more fine-grained evaluation of how citations are being used rhetorically, which would then enable a higher degree of nuance when tracking articles for relevancy and influence in information retrieval tasks in the form of a smarter indexing tool (247-248). While the goals of Di Marco, Kroon, and Mercer diverge from our own (and we have no knowledge of the specific algorithms they employ in their research), we share with them a common computational understanding of hedging as an important and high-fidelity linguistic signal from which we can extract a generalizable feature set (see Di Marco, Kroon, and Mercer 252).
Guided by the above literature, we delineate hedges into two categories for the sake of coding hedge moves using the Hedge-O-Matic: propositional and interpersonal. Although the hedge types we coded for would no doubt be seen in a more nuanced way than these two categories by scholars in linguistics, our approach distinguishes between two broad rhetorical purposes. One, propositional, refers to qualifying a claim based on the strength of evidence; the other, interpersonal, refers to making a statement less direct so as to show deference or politeness, or to otherwise preserve a social relationship.
Propositional hedges qualify claims of knowledge or truth in order to draw attention to the limitation or contingency of that knowledge or the communicator's lack of certainty about that knowledge. For example, "These conclusions are based on a relatively small sample size and should be treated carefully." Other common markers for propositional hedges are modal verbs such as "can," "could," "might," and "may" and/or verbs such as "indicate," "suggest," or "imply." For example, one type of sentence that we have often encountered in the Discussion/Conclusion sections of scientific articles is "These results suggest that X is a likely outcome of increased carbon emissions."
Our category of propositional hedges hews closest to Prince et al.'s definition of propositional hedges (93), with one notable departure. In their taxonomy of hedges, Prince et al. divide hedges into "approximators" and "shields" (86-88). For Prince et al., approximators govern propositional hedges and function by marking the level of typicality a knowledge claim demonstrates. In a sense, these are the types of fuzzy hedges studied by Zadeh and Lakoff. For example, a hedge that describes an object as "kind of spicy" suggests to audiences that the object is part of a class of foods that is spicy but also some gradations away from prototypical spiciness. Prince et al.'s version of propositional hedges also includes "rounders," which include moves to collapse fine-grained details (e.g., "Let's meet at around noon" as opposed to "Let's meet at 12 p.m."). Shield hedges are attempts by a communicator to limit his/her commitment to a position, and function as defense actions that allow communicators to insulate their status from criticism. Prince et al. further divide shields into "plausibility shields" and "attribution shields." The former denotes hedges in which communicators make conspicuous their uncertainty about a knowledge claim because hard evidence is lacking or because the claim and conclusions derive from a process of inference that relies on conceptual leaps made by the communicator. One example of plausibility hedging might be a sentence such as "Based on these figures, we can plausibly conclude that Americans still favor owning larger cars." For Prince et al., the key distinction between a plausibility shield and an approximator is that the epistemic content offered in the claim is not framed as uncertain by the communicator. What is more ambivalent is the communicator's adhesion to the claim.
Attribution shields work by laying the responsibility of a knowledge claim on another, which fosters distance between the communicator and his/her source of information. An example of an attribution shield might be "Roger Ebert argues that video games cannot be considered art." The reader will notice that the type of statements encompassed by Prince et al.'s shields are included in our definition of propositional hedges. We are uncomfortable with the configuration of knowledge implied by Prince et al.'s typology. This configuration suggests that the epistemic value of a knowledge claim is not modulated by the author's positioning of it in relation to his/her own position. This configuration further suggests that the epistemic value of a knowledge claim inheres in the ways that the claim represents the world/facts/situations and not how it might be attempting to influence reception. As rhetoricians, we are attentive to how knowledge claims reflect and influence the world and are also inescapably drawn to the concept of ethos, which would hold that the belief the audience accords a claim is partly vitiated by the character, expertise, and reputation of the author. Thus, how closely the author aligns him/herself with the conclusions or attributions of a claim will affect its propositional content.
Interpersonal hedges intend to mitigate potential insult to a listener or reader and retain goodwill. This formulation draws on Brown and Levinson's imbrication of hedging in politeness strategies aimed at facilitating cooperation among interactants through positive assertions of sociality (e.g., "It would be so great if I could get your response by this afternoon") and/or redressive moves that offset violations of etiquette or other norms (e.g. "Kale is disgusting, but I'm really only speaking for myself" or "I hate to brag, but . . .") (145-171). That said, while we are indebted to Brown and Levinson and the work done in speech act theory (145-147), we want to make clear that we cite these approaches as a means to both highlight our theoretical trajectory and signal lines of flight away from the orthodox scholarship on interpersonal hedging.
One key departure is that interpersonal hedges involve a rich complexity of illocutionary forces and acts, as documented in the linguistics literature. Not all interpersonal hedging cues--for example, phatic connectives such as "er..."--are easily analyzed by the technology used in the Hedge-O-Matic. That said, we are interested in what Searle and Vanderveken call "illocutionary force indicating devices" (110), which linguistically and syntactically convey the illocutionary force of a statement. One example we have seen when working with comments that reviewers offer to writers is the use of "I think" before a specific suggestion for a revision. For example, a reviewer might say, "I think if you added another reference to this sentence it would strengthen your claim." The use of "I think" acts to soften what otherwise might sound like a direct order to make a change. Because these devices rely on the surface features found within a sentence, they can be screened by the Hedge-O-Matic and added to a feature set that will determine the hedgey-ness and non-hedgey-ness of a sentence.
Why use hedging in scientific articles to train the Hedge-O-Matic?
Hedging is an interesting rhetorical move in many types of discourse, but in science propositional hedging in particular is essential. We looked to scientific journal articles to provide training material for the Hedge-O-Matic for several reasons that follow from the importance of hedging to this genre. First, we knew that hedges would be there and that they issue detectable rhetorical signals through a delimited set of linguistic particles. Second, we could expect that the occurrence of hedging in a scientific article is governed by regular genre conventions, most notably the Introduction, Methods, Results, and Discussion (IMRAD) structure of research publications. This told us where to look for hedge moves. Thus, the choice of hedging in scientific articles as our object of study has given us a testable problem. The targeting of hedging in science articles also made locating training material much easier: not only are there a multitude of electronic sources, but human raters can easily zoom in on the Introduction and Discussion sub-headings to find usable hedging sentences.
In addition, there is a rich literature to draw from to understand propositional hedging in the rhetoric of science. The centrality of hedging as a signature move in scientific articles is a well-established topic of inquiry. Hyland has written extensively about how authors of scientific articles deploy hedging strategies to secure the acceptance of their interpretations and circumvent objections from readers (Hyland, Hedging in Scientific Research Articles, "Talking to the Academy," "Writing without Conviction?"). Hyland (1996) incorporates the fuzzy set theories of Zadeh and Lakoff, but emphasizes that in the genre of the scientific research article, hedging functions to demonstrate the care writers are taking to make precise, measured claims that are bidding for a consensus adoption. Thus, hedging in scientific articles represents a method by which writers bolster their claims by acknowledging the provisional nature of those claims in relation to existing disciplinary norms, institutions, and scholarship. Myers goes further to argue that hedging in scientific articles indexes the posing of new knowledge and is co-extensive with the act of claim-making ("The Pragmatics"). Correspondingly, doctrinal statements (non-claims) usually operate without hedging because the writer does not require the reader to suspend judgment or entertain novelty. Myers illustrates this in a comparative study of rhetoric of science textbooks and scientific articles ("Textbooks"). Myers finds that, unlike scientific journal articles, science textbooks rarely display hedging because, Myers argues, textbook production falls at the end of the scientific inquiry process, at which time research results and theories have gained ratification from the scientific community and hold a factive status.
This paucity of hedging contributes to the goal of the science textbook as a vehicle for fundamental principles as opposed to the more argument-based scientific journal article, the claims of which are still under probation (Myers, "Textbooks"). Whereas Myers and Hyland emphasize the sociological and rhetorical functioning and categorization of hedging in scientific articles, Salager-Meyer's examination of biomedical English prose focuses on the frequency distribution of hedges within the IMRAD model of the scientific research paper (149-170). Salager-Meyer's study reveals that hedging in scientific articles prevails in the Introduction and Discussion sections and is minimized in the Methods and Results sections. The logic for this is easy enough to infer. Introductory sections of scientific articles (and numerous other genres of academic writing) endeavor to create an exigency for the research program that follows. As a result, writers must justify their claims, problems, and methodology by engaging in the complex scholarly diplomacy discussed above--acknowledging the work of fellow researchers, demonstrating connections to existing literature, mitigating impositions on the reader, and qualifying the propositional content of claims while also justifying the relevance of the present study. The Discussion section of scientific articles obliges a similar rigmarole, though this one is attuned to different priorities. In the Discussion section, writers often revisit justification moves as they reconcile their interpretations with their methods and results and gesture toward anticipated but largely provisional contributions. Conversely, the Methods and Results sections feature direct descriptions of procedures and the empirical outcomes of these procedures and show a lower incidence of hedging.
Given that replicability is a cornerstone value of scientific research, it is not hard to see why vagueness in details and/or circumlocutions would have limited use in these rhetorical zones.
It is important to note that, once trained, the Hedge-O-Matic can be used more generally on prose writing of all genres. The Hedge-O-Matic produces descriptive results to augment other kinds of readings. The results will always require interpretation. We had an entertaining afternoon, for instance, running chapters of Jane Austen's Mansfield Park to see if her famously indirect style, befitting a novel of manners, was indeed full of hedge moves. We were not disappointed.
How have we classified hedges for the Hedge-O-Matic?
The Hedge-O-Matic relies on a training set of hedgey and non-hedgey sentences culled from online science journals in two stages. For the first stage, Omizo hand-coded hedgey and non-hedgey sentences from 63 articles from the following journals:
- Plant Molecular Biology
- Poultry Science
- Applied Poultry Research
- Climate and Development
- Evolution and Human Behavior
- Biomedical and Environmental Sciences
- Geology
- Angewandte Chemie International Edition
- Chemical Reviews
From this initial training set we created a classifier with an 80% accuracy rate. We then used this classifier to automate the classification of 398 additional articles from the following Springer Open Access journals:
- Agricultural and Food Economics
- Carbon Balance and Management
- Crime Science
- Critical Ultrasound Journal
- Ecological Processes
- Energy Sustainability and Society
- Environmental Sciences Europe
- EPJ Data Science
- Fire Science Reviews
- Geoscience Letters
- Health Economics Review
- Intensive Care Medicine Experimental
- International Aquatic Research
- International Journal of Advanced Structural Engineering
- International Journal of Emergency Medicine
- Journal of Applied Volcanology
- Journal of Economic Structures
- Journal of Mathematics in Industry
- Journal of Modern Transportation
- Life Sciences, Society and Policy
- Nano Convergence
- Optical Nanoscopy
- Planetary Science
- Progress in Orthodontics
- Rice
- Translational Respiratory Medicine
- Visualization in Engineering
- Zoological Studies
Human raters were not involved in the creation of the above training set, save for inputting URLs into the screen-scraping script. To filter out ambiguous results, we only retained those hedge sentences that possessed a confidence rating of .80 or higher. Non-hedges were not extracted during this stage. These sentences were converted into computational vector arrays and randomized. The end result is 2,121 hedge sentences and 2,121 non-hedge sentences. 20% of these sentences are reserved for testing. Thus, our training set includes 1,696 hedge and 1,696 non-hedge training sentences and 425 hedge and 425 non-hedge testing sentences.
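The filtering and splitting steps above can be checked with a short sketch. It assumes that the 20% hold-out is taken separately within each class and that the training share is rounded down, which is one way to arrive at the 1,696 / 425 figures; the function names and the placeholder identifiers are hypothetical, and no real sentences or confidence scores are reproduced here.

```python
import random

def keep_confident(scored, threshold=0.80):
    """Keep sentences whose classifier confidence meets the threshold."""
    return [sentence for sentence, confidence in scored if confidence >= threshold]

def split_class(items, train_percent=80, seed=0):
    """Shuffle one class and split it into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_percent // 100  # integer arithmetic, rounds down
    return items[:n_train], items[n_train:]

# Placeholder identifiers stand in for the 2,121 sentences in each class.
hedges = [f"hedge-{i}" for i in range(2121)]
non_hedges = [f"non-hedge-{i}" for i in range(2121)]

hedge_train, hedge_test = split_class(hedges)
non_train, non_test = split_class(non_hedges)

print(len(hedge_train), len(hedge_test))  # 1696 425
print(len(non_train), len(non_test))      # 1696 425
print(keep_confident([("a", 0.91), ("b", 0.62), ("c", 0.80)]))  # ['a', 'c']
```

Splitting each class separately keeps the training and testing sets balanced between hedges and non-hedges, which matches the equal counts reported above.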
Because the Hedge-O-Matic has been trained on articles from academic science journals, it is more reliable at detecting propositional hedging than interpersonal hedging, which is more common in conversational genres such as online discussion forums and email. We class sentences as hedges if they conform to one of the following four types:
Hedge Sentence Type | Example |
---|---|
Sentences that admit to limitations, incomplete knowledge or otherwise attempt to narrow the scope of claims | "It is less clear how it might be applied to broiler production, where welfare problems associated with the growth rate and skeletal weaknesses in birds will almost certainly require some form of genetic response" (Thompson 815). |
Sentences that draw qualified conclusions/predictions or make qualified recommendations | "The present results suggest that homosexual preference could be found in animals, particularly in a situation where a high female fertility has been selected for" (Barthes, Godelle, and Raymond 161). |
Sentences that couch criticism in statements designed to mitigate personal offense | "Although the details of this argument are not supplied, it appears that traditional farming methods in which animals are permitted a range of movement and opportunities to express species-typical behaviors represent a kind of baseline for animal natures in Appleby’s view" (Thompson 817). |
Sentences that emphasize the necessity of a statement in order to mitigate impositions on the reader | "It should be noted that high ambient temperatures do not seem to impair female gametogenesis or lead to an early cell division of the zygote" (Sharifi, Horst, and Simianer 2367). |
The underlined words and phrases in the Example cells of Table 1 indicate what we think of as triggers--surface-level features of the sentence that mark it as a hedge. One thing to note here is that even this small array of sentence types presents a complex interplay of rhetorical moves, including the use of modal verbs, performatives, and clauses, from which strict syntactic formulae are difficult to derive. In the section that follows, we unpack each example to give the reader a sense of how we have trained the Hedge-O-Matic to read for hedging in scientific articles.
- Example 1
- It is less clear how it might be applied to broiler production, where welfare problems associated with the growth rate and skeletal weaknesses in birds will almost certainly require some form of genetic response (Thompson 815).
Example 1 is extracted from an article focusing on the ethics of addressing the welfare needs of genetically engineered animals. Thus, the "it" in question refers specifically to the genetic manipulation of egg-laying hens. The initial subject-predicate clause limits the scope of the knowledge claim. The declaration that the case is "less clear" introduces uncertainty. Moreover, the modal "might" underscores the open quality of the argument, in which the use of eugenic techniques in egg production is still moot and not an established plan of action. The last subordinate clause of this sentence also enhances its hedgey-ness because it supplies a qualifying condition that complicates the discussion with more unknown parameters.
- Example 2
- The present results suggest that homosexual preference could be found in animals, particularly in a situation where a high female fertility has been selected for (Barthes, Godelle, and Raymond 161).
In Example 2, the verbs in the independent clause are what interests us. The verb "suggest" conveys what Meyer, drawing on speech act theory, describes as "weak illocutionary force" (28). In this case, the results in question evidence a particular signal that has informed judgment; however, the results might also be amenable to other judgments. Thus, verbs such as "suggest," "indicate," or "imply" tag the knowledge claim with a rhetorical likelihood value, whose function sharply contrasts with other means of presenting conclusions, which might include phrases such as "these results prove," "these results confirm," or "these findings establish."
We also see in Example 2 the use of a modal ("could be"). In this case, the use of the modal draws attention to the potential for homosexual preference to be found in animals as opposed to the facticity of homosexual preference in animals. Because modal verbs are used to describe conditional and/or subjunctive states, words such as "could," "might," "would," "may," and "should" are often the usual suspects of hedging because they convey indirectness and/or a deferral of validation (see Salager-Meyer 156). We should note here that not all modal usage indicates hedging. For example, a sentence such as "You can apply electronically or through mail" offers the reader two potential options for submitting an application. However, this is clearly not a hedge because both options are equally definitive. Thus, modal verbs such as "can" or "could" might simply be referring to multiple modes of ability. In a similar way, "should" can give a writer wiggle room to finesse outcomes in a sentence like "If you store eggs at 55 degrees Fahrenheit, they should hatch," which is underwritten with the implication that the eggs might not hatch. However, a statement such as "People should care more about the environment and start recycling" is a forceful recommendation. Consequently, sentences that contain modals but refer to ability or prescription have not been classed as hedges in our training set.
Other potential trigger verbs that may contribute to the sentence type in Example 2 are "appear" and "seem." One other example from the journal Evolution and Human Behavior is, "A harsh, low-resource environment then appears to promote a strategy wherein women favor lower-quality but potentially higher-investing men for long-term relationships" (Little, DeBruine, and Jones 194). Rather than stating categorically that environments of want lead women to pursue relationships based on subsistence, Little et al. highlight the inferential nature of their claim through the use of "appears," which, as a verb that connotes human sensing, calls attention to the authors' commonsensically limited positions as analysts. Furthermore, the verb "promote" identifies this behavior as a possible influence, not a necessary drive correlated with concrete outcomes.
Adverbs such as "likely," "possibly," and "evidently" also play a significant role in this sentence typology (indeed, adverbs in general are routine signs of hedging behavior). One example is, "While cues to the ability of acquiring resources may be varied, success in direct physical competition is likely partly related to physical strength and fitness" (Little, DeBruine, and Jones 194). In this sentence, the combination of "likely partly" presents two constraining moves. "Likely" limits the author's commitment to the conclusion by fixing the argument in the realm of the probable, not factual. "Partly" conveys a sense of quantity, suggesting that whatever role strength and fitness have in the attainment of resources, they are only two particular traits that inform a welter of causes.
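The trigger words discussed for Example 2 can be thought of as surface features that are counted per sentence. The sketch below is a hypothetical illustration: the word lists come from the cues quoted in this section, but the grouping into three categories and the names `CUES` and `cue_counts` are assumptions, and the actual feature set used to train the Hedge-O-Matic is not documented here.

```python
import re

# Cue words quoted in this section; the three-way grouping is illustrative.
CUES = {
    "modal": {"can", "could", "might", "may", "would", "should"},
    "epistemic_verb": {"suggest", "suggests", "indicate", "indicates",
                       "imply", "implies", "appear", "appears",
                       "seem", "seems"},
    "adverb": {"likely", "possibly", "evidently", "partly"},
}

def cue_counts(sentence):
    """Count how many tokens in the sentence fall into each cue category."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {name: sum(token in words for token in tokens)
            for name, words in CUES.items()}

hedged = ("While cues to the ability of acquiring resources may be varied, "
          "success in direct physical competition is likely partly related "
          "to physical strength and fitness")
plain = "You can apply electronically or through mail"

print(cue_counts(hedged))  # {'modal': 1, 'epistemic_verb': 0, 'adverb': 2}
print(cue_counts(plain))   # {'modal': 1, 'epistemic_verb': 0, 'adverb': 0}
```

The second sentence shows why such counts can only be inputs to a classifier and not verdicts in themselves: it contains a modal, yet, as noted above, it is a modal of ability and not a hedge.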
- Example 3
- Although the details of this argument are not supplied, it appears that traditional farming methods in which animals are permitted a range of movement and opportunities to express species-typical behaviors represent a kind of baseline for animal natures in Appleby’s view (Thompson 817).
Example 3 is an academic version of a common interpersonal hedging move in which an agent attempts to modulate the pointedness of a criticism through indirection as a means to maintain the semblance of social graces. In an academic paper, interpersonal dynamics do not apply--at least, not in the way that they would in face-to-face interactions. However, as Hyland argues, the development and syndication of scientific knowledge within the sciences does not solely rely on the strength of proofs, but also on how these proofs are negotiated as knowledge claims within existing frameworks and through the norms of critique, review, falsifiability, replicability, and professionalism, which involve relating to others in the field of science (Hedging in Scientific Research Articles). Example 3 represents a clear bid to appear professional among a community of scholars who, if not sympathetic to Appleby, are enrolled in the same system of research and publication as he is, and thus subject to the same forms of critique. In this case, the critique is not one of disagreement. Thompson is incorporating Appleby into his review of animal ethics, and his interpretive work presumes to fill in the gaps he perceives in Appleby's argument in order to advance his own. In doing so, Thompson resorts to the type of trigger words, phrases, and moves we have already discussed. Thompson begins by qualifying the limited scope of his argument (which is limited because of purported omissions by Appleby) and then imputes a rationale to Appleby with words such as "appears" and "a kind of baseline." These characterizations outline the possible incompleteness of Appleby's argument. At the same time, they signal to the scientific community that Thompson's understanding of the source may be incomplete as well.
Fetzer offers a different rationale for conceptualizing Example 3--one that draws a harder distinction between the circumstances of interpersonal dialogue and interactions between a "textual system" of generic rules and contracts (49-72). Under Fetzer's rubric, Thompson's attempt to modulate his citation of Appleby reflects an attempt to achieve appropriateness under the textual system of scientific publication. The use of the significantly fuzzy qualifier "kind of" signals the reduced scope of Thompson's argument, which alerts the readership that Thompson's claim may be inappropriate and, further, that he realizes it may be inappropriate (see Fetzer 49-54). Rhetorically speaking, Thompson can thus skirt the line of discretion by demonstrating knowledge of the conventions of citation and critique while still tendering a potentially indelicate statement.
As one would expect, these types of hedging moves often coincide with references to other works and generally occur in a literature review. Consequently, they account for a much smaller portion of the training set, but they do occur.
- Example 4
- It should be noted that high ambient temperatures do not seem to impair female gametogenesis or lead to an early cell division of the zygote (Sharifi, Horst, and Simianer 2367).
The last formula in our coding scheme conforms to what Meyer has termed "necessity as an excuse" (31). Such statements frame actions that the author will take as necessary in order to obviate the burden that the writer presumes to place on the reader and gain his/her acceptance. For example, "At this juncture, it is worth quoting at length from Derrida." Or, "It is necessary now for me to reiterate my earlier point." One other trigger word/phrase to note in this formula is the modal "should," as in, "I should point out here" or "It should be underscored that." In these cases, "should" refers to an intractable condition; however, unlike the use of "should" as recommendation (e.g. "policies should account for the needs of indigenous populations"), the combination of "should" and an authorial move to sidestep reader contention renders such sentences hedging in our model.
One thing to also acknowledge here is that these categories do overlap, which is reasonable to expect given that hedging behaviors--especially those that seek to minimize personal or professional injury--involve a degree of convolution. Thus, Example 4 features both a hedge by "necessity as an excuse" and a hedge type found in Example 2 ("seem to impair"). The presence of either lexical device would define this sentence as a hedge in our model.
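The overlap described above can be illustrated with a small pattern matcher. This is only a sketch of the lexical cues named in the text, not the Hedge-O-Matic's classifier (which is a trained SVM rather than a rule set). The pattern lists, the function name `hedge_cues`, and the cue labels are assumptions made for illustration.

```python
import re

# Hypothetical cue patterns built only from phrases quoted in the text above.
NECESSITY_AS_EXCUSE = re.compile(
    r"\b(?:it should be (?:noted|underscored)|i should point out|"
    r"it is (?:worth|necessary))\b",
    re.IGNORECASE,
)
# The "seem to" cue that the text attributes to the Example 2 hedge type.
EXAMPLE_2_TYPE = re.compile(r"\bseems? to\b", re.IGNORECASE)


def hedge_cues(sentence):
    """Return the names of the illustrative cue types found in a sentence."""
    cues = []
    if NECESSITY_AS_EXCUSE.search(sentence):
        cues.append("necessity as an excuse")
    if EXAMPLE_2_TYPE.search(sentence):
        cues.append("Example 2 type")
    return cues


example_4 = (
    "It should be noted that high ambient temperatures do not seem to impair "
    "female gametogenesis or lead to an early cell division of the zygote"
)
recommendation = "Policies should account for the needs of indigenous populations."

print(hedge_cues(example_4))       # both cue types fire
print(hedge_cues(recommendation))  # "should" as recommendation: no cue fires
```

Either cue alone would mark Example 4 as a hedge under the coding scheme, while the recommendation use of "should" triggers neither pattern.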
What sentences are not considered hedges by the Hedge-O-Matic?
Sentences that fall outside of the four types above are classified as non-hedges, with no other rhetorical inflection to distinguish between them. This lack of distinction can lead to a significant amount of noisy data, which does impact the performance of the Hedge-O-Matic.
One deliberate omission is the type of numerical hedging discussed by Dubois, Hyland (Hedgings, 54-56), and Salager-Meyer (154). This form of hedging involves the use of approximations when presenting numerical information and may include:
- Rounding - 5.98 converted to 6
- Ranges - "between 80 to 87%"
- Fractional estimates - "1/3 of all students"
- Use of qualifiers such as "about" or "around" - "The state saw about 10 inches of snow last month."
Dubois makes a convincing argument about the rhetorical effects of imprecise numbers in scientific discourse. One possible motivation is to mitigate the strength of previous research with allusions to vagueness while highlighting the precision of the author's current study. For example, "The prototype version of the Hedge-O-Matic, which used naive Bayes classification to assign hedge and non-hedge labels, was around 73 to 78% accurate; however, the current version, which uses support vector machine classification, has demonstrated a maximum accuracy of 86.1234%." In this case, the final float locks the reader to a specific figure that, in comparison with the previous blunt percentages, seems to crystallize the Hedge-O-Matic's improvement in efficiency.
In choosing to characterize these types of statements as non-hedges in our corpus of scientific articles, we are hewing to Grice's maxim of quantity (26), which generally states that communicators should not provide more information than is relevant for understanding given the constraints of a dialogue. One way to think of this is to ask, as we have, "Is it reasonable in an article about climate change to expect a series of sea-level measurements in floats in the body of the text?" In other words, is this glossing of statistics motivated by an attempt to make a proposition fuzzy or is it motivated by an effort to conserve page space and maintain reader attention? Dubois herself acknowledges that much of the numerical imprecision she observed was likely caused by the oral delivery of the information. We would contend that the same dynamics are at work in a written text. Moreover, many scientific articles that depend on numerical results feature tables and charts, which are not processed by the Hedge-O-Matic but contain precise representations of numerical data. Thus, instances of numerical imprecision at the sentence level could simply be short-hand references to these more solid pieces of data and not an attempt to mitigate the strength of a proposition. Skelton makes a similar point, noting that "there is a world of difference between the occasions when one might say, 'It got a bit hot this afternoon' and 'At 12:15 p.m. a temperature of 33.2 (correct to one decimal place) was noted,' but to describe one as either better or more certain is facile" (39).
Ultimately, the issue is not whether these types of statements are or are not hedges. The issue for us as we construct this model is whether or not such statements have a clear, performative function as hedges that can be inferred within the bounds of a given sentence (our unit of training and analysis). The answer was not always clear-cut. Thus, numerical imprecision does not factor into our model as a hedge. That said, performative verbs of estimation do factor into our training set. For example, if authors write, "We estimate that global temperatures will rise 1 degree a year for the next 25 years at current rates of CO2 emissions," then the performative verb "estimate" calls attention to the propositional nature of the rates of temperature change predicted.
What should be foregrounded here is that underlying the algorithmic processes of the Hedge-O-Matic is a model of rhetorical action that attempts to attend to both potential judgments of human readers and the limitations of computer classifiers, which depend, at their core, on the frequency counts of words. This means that our model of hedgey and non-hedgey communication is narrowly construed and ignores aspects of cultural performance, temporality, and arrangement. It can often be the case that an author has devised a hedging strategy that extends through multiple sentences, some of which are not definitively hedgey by our model but contribute to a global hedginess. These larger gestures would not be captured by the model. There is also the inescapable fact that there are no definitive determinations for what counts as a hedge or what does not count as a hedge at the lexical or syntactic level (Clemen 235-248; Fraser 23). As Holmes demonstrates, a phrase such as "you know," which has been typically identified as a hedgey modifier, can also function as a "booster" phrase that emphasizes certainty (188). Holmes contrasts "you know you've heard it before" with "I'm the boss around here you know." The first statement reminds the listener of a well-established point. The second statement, on the other hand, undercuts the declaration that the speaker is the boss. At the level of the particle, there is nothing to distinguish the first "you know" from the second, meaning that interpretation is reliant on other conditions.
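The point about frequency counts can be made concrete with a minimal bag-of-words sketch. The tokenizer and the function name `bag_of_words` are assumptions for illustration; this is not the Hedge-O-Matic's actual feature extraction. It shows only that Holmes's two sentences yield identical counts for the words in "you know."

```python
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


first = bag_of_words("you know you've heard it before")
second = bag_of_words("I'm the boss around here you know")

# In both sentences, "you" and "know" each occur exactly once, so a count-based
# representation carries no signal about which "you know" boosts and which hedges.
for word in ("you", "know"):
    print(word, first[word], second[word])
```

Any difference a count-based classifier registers between the two sentences has to come from the surrounding words, which is one way to read the claim that interpretation relies on other conditions.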
The other inescapable fact is that humans are very good at parsing the nuance of hedges despite there not being regular formulae for their use. Computers are much less equipped for that task. That said, we do feel that the tabular and graphic views that the Hedge-O-Matic provides can allow readers to focus on zones in which hedges and non-hedges collaborate in larger rhetorical patterns (more on how we see the Hedge-O-Matic contributing to rhetorical analysis below).
What are Support Vector Machines?
Support vector machines (hereafter, SVMs) are a form of supervised machine learning that was established by Cortes and Vapnik (273-297). In supervised machine learning tasks, data that has been annotated by humans is sequestered into training and test sets. The training set is processed by an algorithm that extracts salient features and then uses these features to build parameters for a classifier. The withheld test set is then used to compare computer-assigned labels and human-assigned labels in order to evaluate the accuracy of the classifier. SVMs work by taking the known, positive instances of a training set (for the sake of illustration, "Class -1") and the known, negative instances of a training set (for the sake of illustration, "Class 1") and constructing a hyperplane or dividing line between them. The margins that buffer this boundary derive from the distance between the closest points in the positive and negative classes (Feldman and Sanger 76-78).
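The supervised workflow described above can be sketched with scikit-learn. This is a minimal illustration, not the Hedge-O-Matic's code: the eight sentences and their labels are invented for the example, and the choice of `CountVectorizer`, `LinearSVC`, and a 75/25 split are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Invented toy data standing in for human-annotated sentences (1 = hedge, 0 = non-hedge).
sentences = [
    "These results may suggest a possible link.",
    "It seems that the effect could be temporary.",
    "We estimate that the rate might increase.",
    "It should be noted that this appears likely.",
    "The sample contained forty participants.",
    "We measured the temperature every hour.",
    "The experiment was repeated three times.",
    "Table 2 lists the recorded values.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Sequester the annotated data into training and withheld test sets.
train_x, test_x, train_y, test_y = train_test_split(
    sentences, labels, test_size=0.25, random_state=0, stratify=labels
)

# Extract word-count features from the training set only, then reuse them for the test set.
vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_x)
test_features = vectorizer.transform(test_x)

# Fit a linear SVM and compare its labels against the human-assigned test labels.
classifier = LinearSVC()
classifier.fit(train_features, train_y)
predicted = classifier.predict(test_features)
accuracy = accuracy_score(test_y, predicted)
print(f"accuracy on withheld test set: {accuracy:.2f}")
```

With a corpus this small the accuracy figure is not meaningful; the sketch is meant only to show where the training set, the withheld test set, and the comparison of labels sit in the process.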
Figure 1 illustrates the principles of a linear SVM classifier on a toy data set composed of 80 randomly generated points in two-dimensional space (i.e., each point contains two features that can be mapped to the x- and y-axes of a graph). The first 40 points receive the "-1" label. The last 40 receive the "1" label. The classifier is trained on these 80 points and arrays them against the hyperplane boundary, here marked by the solid white line. The parallel dashed lines indicate the maximum margin separating one class from another. The square points in the example denote the support vectors that inform the hyperplane. Only a handful of training points, which are closest in proximity to the opposite class, constitute the support vectors. Those training points most distant from the opposite class play little role in the demarcation. What is interesting to note is that the support vectors create this line because they are the most undecidable vectors in the training set. Thus, the category distinctions between binary classes depend, in one sense, on a computational form of hedging.
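A toy setup like the one in Figure 1 might be reproduced along the following lines. This is a sketch under stated assumptions: the random seed, the offsets that separate the two clusters, and the default `SVC` settings are our guesses for illustration, since the figure's exact data and parameters are not given here, and plotting is omitted.

```python
import numpy as np
from sklearn.svm import SVC

# 80 random two-dimensional points; the seed and cluster offsets are assumptions.
rng = np.random.RandomState(0)
first_forty = rng.randn(40, 2) - [2, 2]
last_forty = rng.randn(40, 2) + [2, 2]
points = np.vstack([first_forty, last_forty])
labels = np.array([-1] * 40 + [1] * 40)  # first 40 labeled -1, last 40 labeled 1

# Fit a linear SVM; the hyperplane is the set of points where w . x + b = 0.
classifier = SVC(kernel="linear")
classifier.fit(points, labels)

weights = classifier.coef_[0]
bias = classifier.intercept_[0]
margin_width = 2 / np.linalg.norm(weights)  # distance between the two dashed margin lines

# The support vectors are the training points that define the boundary.
support_vectors = classifier.support_vectors_
print(f"{len(support_vectors)} of {len(points)} points are support vectors")
print(f"margin width: {margin_width:.3f}")
```

The count printed at the end is typically a small fraction of the 80 points, which matches the observation that only the points nearest the opposite class shape the boundary.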