Published in “Raw Data” Is an Oxymoron, edited by Lisa Gitelman
Is data modern? The answer depends on what one means by “data” and what one means by “modern.” The concept of data specific to electronic computing is evidently an artifact of the twentieth century, but the ideas underlying it and the use of the term are much older. In English, “data” was first used in the seventeenth century. Yet it is not wrong to associate the emergence of the concept with that of modernity. The rise of the concept in the seventeenth and eighteenth centuries is tightly linked to the development of modern concepts of knowledge and argumentation. And, though these concepts long predate twentieth-century innovations in information technology, they played a crucial role in opening the conceptual space for that technology.
After all, Priestley was an early innovator in the field we now call data graphics. His 1765 Chart of Biography is a great achievement in this field, an engraved double-folio diagram displaying the lives of about two thousand famous historical figures on a measured grid. It was one of the earliest works to employ the conventions of linearity and regularity now common in historical timelines and the most important work of its kind published in the eighteenth century.
But all of this was new when Priestley published his charts, and the aggregate views they offered were regarded as an important and novel contribution to both social and natural science. Indeed, it is the Chart of Biography, not an achievement in experimental science, that is named on Priestley’s document of induction to the Royal Society. Later writers such as the political economist William Playfair, who debuted early versions of the line graph and bar chart in his 1786 Commercial and Political Atlas, credited Priestley for his innovative work in this area, too.
(...) Technical historical practice during the early modern period involved accommodation of historical facts to scriptural data in order to make the unknown known. Some of the most heroic efforts of this sort took place in the realm of chronology, especially in efforts to correlate European and non-European historiographical traditions. Ancient records of comets and other astronomical phenomena that posed interpretive problems for histories based on scripture provide other examples. And it is notable that chronology is one of the fields in which the English word “data” flourished earliest.
(...) It is crucial to observe that the term “data” serves a different rhetorical and conceptual function than do sister terms such as “facts” and “evidence.” To put it more precisely, in contrast to these other terms, the semantic function of data is specifically rhetorical. The question then is: what makes the concept of data a good candidate for something we would not want to deconstruct? Understanding this requires understanding what makes data different from other, closely related conceptual entities, where data came from, and how it carved out a distinctive domain within a larger conceptual and discursive sphere. So, what was data prior to the twentieth century? And how did it acquire its preanalytical, pre-factual status? In this, etymology is a good starting point.
The word “data” comes to English from Latin. It is the plural of the Latin word datum, which itself is the neuter past participle of the verb dare, to give. A “datum” in English, then, is something given in an argument, something taken for granted. This is in contrast to “fact,” which derives from the neuter past participle of the Latin verb facere, to do, and so denotes that which was done, occurred, or exists. The etymology of “data” also contrasts with that of “evidence,” from the Latin verb vidēre, to see. There are important distinctions here: facts are ontological, evidence is epistemological, data is rhetorical. A datum may also be a fact, just as a fact may be evidence. But, from its first vernacular formulation, the existence of a datum has been independent of any consideration of corresponding ontological truth. When a fact is proven false, it ceases to be a fact. False data is data nonetheless.
In English, “data” is a fairly recent word, though not as recent as one might guess. The earliest use of the term discovered by the Oxford English Dictionary occurs in a 1646 theological tract that refers to “a heap of data.” It is notable that this first OED citation is to the plural, “data,” rather than the singular, “datum.” While “datum,” too, appeared in seventeenth-century English, its usage then, as now, was limited: so limited that, in contrast to the well-accepted usage of the plural form, some critics have doubted whether the Latin datum was ever naturalized to English at all.
“Data” did not move from Latin to English without comment. Already in the eighteenth century, stylists argued over whether the word was singular or plural, and whether a foreign word of its ilk belonged in English at all. In Latin, data is always plural, but in English, even in the eighteenth century, common usage has allowed “data” to function either as a plural or as a collective singular.
[I]t seems preferable in modern English to allow context to determine whether the term should be treated as a plural or as a collective singular, since the connotations are different. When referring to individual bits or varieties of data and contrasting them with one another, it may be sensible to favor the plural as in “these data are not all equally reliable”; whereas, when referring to data as one mass, it may be better to use the singular as in “this data is reliable.” According to Steven Pinker, in English today, the latter usage has become usual. The fact that a standard English dictionary defines a “datum” as a “piece of information,” a fragment of another linguistically complex mass noun, further strengthens this intuition.
In these early years, the term “data” was still employed especially in the realm of mathematics, where it retained the technical sense that it has in Euclid, as quantities given in mathematical problems, as opposed to the quaesita, or quantities sought. It was also employed in the realm of theology, where it referred to scriptural truths, whether principles or facts, that were given by God and therefore not susceptible to questioning. In the seventeenth century, in theology, one could already speak of “historical data,” but “historical data” referred to precisely the sorts of information that were outside of the realm of the empirical. These were the God-given facts and principles that grounded the historian’s ability to determine the quaesita of history.
In seventeenth-century philosophy and natural philosophy, just as in mathematics and theology, the term “data” functioned to identify that category of facts and principles that were, by agreement, beyond argument. In different contexts, such agreement might be based on a concept of self-evident truth, as in the case of biblical data, or on simple argumentative convenience as in the case of algebra, given X = 3, and so forth. The term “data” itself implied no ontological claim. In mathematics, theology, and every other realm in which the term was used, “data” was something given by the conventions of argument. Whether these conventions were factual, counter-factual, or arbitrary had no bearing on the status of givens as data.
First: the word “data” entered the English language in the seventeenth century and was naturalized in the eighteenth. There are a number of different sources of evidence for this, and the evidence is unambiguous. The data derived from the ECCO (Eighteenth Century Collections Online) database shows a substantial increase in usage of the term during the eighteenth century. The number of books in which the English word “data” appears rises from 34 in the first decade of the century to 885 in the last decade, and the number of books in which “data” appears also rises relative to the total number of books included in ECCO for each decade, from 0.3 percent of the total in the first decade to 3 percent of the total in the last. While this tenfold increase in relative frequency did not make “data” a common word, it did make it familiar. At the beginning of the century, the term “data” was italicized in the vast majority (88 percent) of cases, an indication that the word was still considered a Latin loan. By the end of the century, “data” was italicized in only 19 percent of cases. These two trends strongly reinforce one another.
Second: the term “data” came into English in the early eighteenth century principally through discussions of mathematics and theology, which together accounted for roughly 70 percent of instances. At century’s end, mathematics and religion accounted for only about 20 percent of total instances, which were now dominated by empirical contexts such as those of medicine, finance, natural history, and geography.
Third: over the course of the eighteenth century, the main connotations of the term “data” shifted. At the beginning of the century, “data” was especially used to refer either to principles accepted as a basis of argument or to facts gleaned from scripture that were unavailable to questioning. By the end of the century, the term was most commonly used to refer to facts in evidence determined by experiment, experience, or collection. It had become usual to think of data as the result of an investigation rather than its premise. While this semantic inversion did not produce the twentieth-century meaning of data, it did make it possible. Still today we think of data as a premise for argument; however, our principal notion of data as information in numerical form relies on the late eighteenth-century development.
Seeing that “data” became much more commonly used during the eighteenth century, why did it take until the twentieth century for the term to become truly ubiquitous? It is clear that the fundamental semantic structure of the term “data” essential to the modern usage was settled by about 1750.
It appears, however, that while the newly outfitted term responded to and exemplified the epistemological perspective of the mid-eighteenth century, the term also was not fully required by it. Moreover, for all of the scientific achievements of the nineteenth century, the term “data” was still not of broad cultural importance. In effect, after its invention, the term went through a period of cultural latency. Though its usage expanded constantly within certain domains, throughout this period it played only a small role in the general culture. Ironically, this long period of latency may partly account for the great usefulness of the term in the twentieth century. In the twentieth century, when “data” reached its point of statistical takeoff, it was already a well-established concept, but it remained largely without connotative baggage. The arrival of computer technology and information theory gave new relevance to the base concept of data as established in the eighteenth century. At the same time, because the term was still relatively uncommon, it was adaptable to new associations.
Curiously, the preexisting semantic structure of the term “data” made it especially flexible in these shifting epistemological and semantic contexts. Without changing meaning, during the eighteenth century data changed connotation. It went from being reflexively associated with those things that are outside of any possible process of discovery to being the very paradigm of what one seeks through experiment and observation.
It is tempting to want to give data an essence, to define what exact kind of fact data is. But this misses the most important aspect of the term, and it obscures why the term became so useful in the mid-twentieth century. Data has no truth. Even today, when we speak of data, we make no assumptions at all about veracity. Electronic data, like the data of the early modern period, is given. It may be that the data we collect and transmit has no relation to truth or reality whatsoever beyond the reality that data helps us to construct. This fact is essential to our current usage. It was no less so in the early modern period; but in our age of communication, it is this rhetorical aspect of the term “data” that has made it indispensable.