
Unit 2
Data Journalism

1 Understanding Data Journalism

What, exactly, does data journalism mean in an age of open data portals, dazzling visualizations and freedom of information battles around the world? A dictionary definition of the two words doesn’t help much—put together, it suggests that data journalism is an occupation of producing news made up of facts or information. Data journalism has come to mean virtually any act of journalism that touches electronically held records and statistics—in other words, virtually all of journalism.

That’s why a lot of people in the field don’t think of themselves as data journalists—they’re more likely to consider themselves explanatory writers, graphic or visual journalists, audience analysts, or news application developers—all more precise names for the many tribes of this growing field. That’s not enough, so add in anything in a newsroom that requires the use of numbers, or anything that requires computer programming. What was once a garage band has now grown big enough to make up an orchestra.

Taxonomies of different branches of data journalism can help students and practitioners clarify their career preferences and the skills needed to make them successful. These different ways of doing data journalism are presented here in an approximate chronology of the development of the field.

Empirical journalism, or data in service of stories

Maurice Tamman of Reuters coined the term “empirical journalism” as a way to combine two data journalism traditions. Precision journalism, developed in the 1960s by Philip Meyer, sought to use social science methods in stories. His work ranged from conducting a survey of rioters in Detroit to directing the data collection and analysis of an investigation into racial bias in Philadelphia courts. He laid the groundwork for investigations for a generation. Empirical journalism can also encompass what became known as computer-assisted reporting in the 1990s, a genre led by Eliot Jaspin in Providence, Rhode Island. In this branch, reporters seek out documentary evidence in electronic form (or create it when they must) to investigate a tip or a story idea.

More recently, these reporters have begun using artificial intelligence and machine learning to help find stories or simplify their development. These tools can help answer simple questions or identify patterns that are difficult to spot.

These reporters are almost pure newsgatherers: their goal is neither to produce a visualization nor to tell stories with data. Instead, they use records to explore a potential story. Their work is integral to the reporting project, often driving the development of an investigation. They are usually less involved in the presentation aspects of a story.

Arguably the newest entry into this world of “data journalism” is the growing impact of visual and open-source investigations worldwide. This genre, which derives from intelligence and human rights research, expands our notion of “data” to include videos, crowdsourced social media and other digital artefacts. While it’s less dependent on coding, it fits solidly in the tradition of data journalism by uncovering, through original research, what others would like to keep secret.

Data visualization

Looking at the winners of the international Data Journalism Awards would lead a reader to think that visualization is the key to any data journalism. If statistics are currency, visualization is the price of admission to the club. Visualizations can be an important part of a data journalist’s toolbox. But they require a toolkit that comes from the design and art world as much as the data, statistics and reporting worlds. Alberto Cairo, one of the most famous visual journalists working in academia today, came from the infographics world of magazines and newspapers. His work focuses on telling stories through visualization—a storytelling role as much as a newsgathering one.

Words & Expressions

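portal: gateway; web portal

orchestra: a large ensemble of string, wind and percussion players

taxonomy: the science of classification

practitioner: a person working in a profession; a specialist

chronology: a record of events in order of time

empirical: based on experiment or observation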

2 Power of Data Journalism

Data journalism is the analysis of statistics to numerically justify stories and make predictions. To some degree, journalists have used data since the start of its widespread accessibility; some of the first examples of computer-analyzed data-based stories come from Harvard’s Nieman Foundation in the 1960s. The increased volume of polls that became publicly accessible with the development of the internet in the early 2000s made data journalism more mainstream.

Among the first sites that began focusing on data journalism was RealClearPolitics (RCP), whose purpose was to collate polls along with interesting political editorials to allow the public to find both forms of political information on one site. RCP’s first foray into true data analysis was the development of the RCP polling aggregate, which summarized all the publicly available polls by reporting their median. This simple statistical analysis took the first step towards eliminating possible polling biases and allowing the public to gain a better perception of the true state of the race. However, data-based predictions using these techniques in the 2004 presidential elections were not particularly successful compared to their non-data-based counterparts.
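The idea of a polling aggregate can be sketched in a few lines of Python. This is only an illustration of summarizing polls by their median, as described above; it is not RCP's actual method, and the poll margins are invented.

```python
from statistics import mean, median

# Hypothetical poll margins (candidate A minus candidate B, in points).
# These numbers are invented for illustration only.
polls = [3.0, 5.0, -1.0, 4.0, 12.0]


def aggregate_median(margins):
    """Summarize a set of poll margins by their median."""
    return median(margins)


print("median:", aggregate_median(polls))
print("mean:  ", mean(polls))
```

In this invented sample, the single outlying poll (12.0) pulls the mean upward more than it moves the median, which is one reason a median can dampen the effect of an unusual poll.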

In 2008, Nate Silver, a relatively unknown baseball statistician, correctly predicted every Senate race and all but one state in the presidential election. He accomplished this neither by reporting from the ground nor by using some esoteric technique of political science. Instead, he used basic statistics to analyze the large volume of polls available and predict an outcome. The message was clear: data-based electoral predictions appeared to be significantly more accurate than predictions based on traditional political science. Since then, data journalism has become increasingly popular and has made analysis of elections and other issues more accurate and quantitative.

FiveThirtyEight and other sites, such as Vox.com, have attempted to apply data to other forms of journalism beyond electoral and sports predictions. Common applications include testing conventional wisdom in political science such as the opposition party gaining seats during midterms and the party affiliation of a state fully determining Congressional races, and looking at the effects of certain media-hyped events on the campaign. Another trend is the increasing use of journalist-created or crowd-sourced datasets. More recently, data journalism has been extended by the relaunch of FiveThirtyEight at ESPN to analyze the benefits of college, nutritional guidelines, and restaurant rating systems. Nate Silver is pushing the limits of data journalism and appears to be outcompeting traditional journalism across all fields.

However, data is a double-edged sword, and it is fraught with difficulties. Most journalists are not trained statisticians: they don’t know how to interpret the probabilistic nature of data accurately, nor do they know how to deal with models that reach seemingly contradictory conclusions. The few who do are publishing their predictions and analyses on sites that aren’t mainstream. More importantly, journalism is not yet fully aware of the latent limits of data-based reporting. As powerful a tool as data is, it is also easy to misuse. Many prominent media outlets, such as The New York Times, unintentionally misreport data predictions when they report to the general public. As a result, data cuts both ways: it can help improve the public’s awareness of the world around them, but it can also dramatically mislead the public.

The first, most common, pitfall is that data is inherently probabilistic. Predictions are not reported as certainties or facts; they have associated probabilities, which in more advanced analysis have their own error terms. These probabilities arise from a variety of sources: sample sizes and random error of polls, polling biases, potential flaws in the method, etc. The probabilistic nature of data is not respected by many publications.
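One of the error sources named above, the random error that comes from sample size, can be made concrete. The sketch below uses the standard normal-approximation formula for a proportion and assumes a simple random sample; it covers sampling error only, not polling bias or flaws in method. The function name and the example figures are illustrative.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p measured on a
    simple random sample of size n (z=1.96 gives roughly 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 50% support among 1,000 respondents.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} points")
```

Because the sample size sits under a square root, quadrupling the number of respondents only halves the margin of error.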

The second, more subtle, pitfall is that while data is objective, any analysis of data must be subjective. The increasing volume of data available has highlighted a significant problem for data journalists: it is possible to find data saying almost anything. Data analysis must be performed to determine what the “truth” is or to make predictions, but this analysis has assumptions built into the model. At best, journalists should synthesize all the scientifically accurate models and report their results with as little bias as possible; this ensures that the public receives an accurate perspective of what the election would look like. In practice, however, many organizations only look at data that supports their own views or perform data analyses with assumptions that lead to favorable results.

Then how far can data go? When correctly done, data journalism can seem incredibly powerful. Probabilistic predictions of virtually anything are possible with sufficiently complicated models and complete information. One of the most powerful applications of data journalism evaluates the effects of a particular action, hypothetical or real, on public opinion or on the state of the campaign. It is virtually impossible to compete with data by using traditional means; in the time a traditional journalist can identify a potential trend by interviewing five people, a data journalist can analyze statistics relating to five million people and make generalizations about the whole country. But even the most powerful and most accurate data journalist is limited. Data can only analyze or confirm trends that are observed on the ground. Data cannot identify movements of people, nor can it explain how people think or why they think the way they do. It can only generate hypotheses about how people will behave and make models that analyze whether those hypotheses are true.

The attempted overuse and extended misuse of data today is not just limited to journalism. Society is approaching a cult of scientism, where all aspects of our lives are distilled into numeric calculations and decisions are made based on calculations. This approach is incredibly powerful, as numbers are more informed and more objective than a person’s feelings can be. However, people must be aware of the limitations of data: any model inherently introduces its own assumptions, which must be tested, and no model can understand certain aspects of an issue. Data will be a ubiquitous part of our lives, just as it will be a ubiquitous part of journalism. It remains to be seen whether it will be a boon or a hindrance for us.

Words & Expressions

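foray into: to venture into (a new field)

quantitative: expressed in or relating to quantities

probabilistic nature: the quality of being governed by probability

hypothetical: assumed; supposed

ubiquitous: present everywhere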

3 The Function of Data Journalism

The definition of data journalism is both painfully simple and frustratingly vague. In his Tow Center paper, Alex Howard offered a detailed definition for data journalism: “gathering, cleaning, organizing, analyzing, visualizing and publishing data to support the creation of acts of journalism.”

Those who practice it do tend to agree on one principle: data journalism is, first and foremost, journalism. It simply uses data as a source in addition to humans.

We tend to delineate a few categories, each with its own skills and job descriptions. While these may vary or overlap depending on who you’re talking to, they tend to fall roughly along these lines:

Acquisition: Getting data, whether that means scraping a website, downloading a spreadsheet, filing a public records request or some other means.

Analysis: Doing calculations or other manipulations on data you’ve got, to look for patterns, stories or clues.

Presentation: Publishing data in an informative and engaging way. Infographics, news apps and web design are all examples of this.
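The three categories above can be pictured as stages of a small pipeline. The following Python sketch is a toy illustration only: the county names and pothole counts are invented, the function names are hypothetical, and a real project would acquire its data from a download, a scraper or a records request instead of an inline string.

```python
import csv
import io

# Acquisition: an inline, invented CSV stands in for a downloaded spreadsheet.
RAW_CSV = """county,potholes_reported
Adams,120
Baker,45
Clark,300
"""


def acquire(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def analyze(rows):
    """Convert counts to integers and rank counties from most to fewest."""
    counts = [(r["county"], int(r["potholes_reported"])) for r in rows]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


def present(ranked, width=30):
    """Render a plain-text bar chart scaled to the largest count."""
    top = ranked[0][1]
    lines = []
    for county, count in ranked:
        bar = "#" * round(width * count / top)
        lines.append(f"{county:<6} {bar} {count}")
    return "\n".join(lines)


ranked = analyze(acquire(RAW_CSV))
print(present(ranked))
```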

Not all of these categories might fit a strict definition of reporting, but they all do constitute journalism, said Sarah Cohen, who leads a data team at The New York Times. Even news app developers who spend their days writing code are journalists, Cohen said, because they’re writing code in order to explain and communicate information to the public. The necessary skills for a data journalist are journalism and some interest in data.

While data brings its own challenges, it also offers some opportunities that are impossible or harder to get at in more traditional forms of reporting.

Data allows journalists to more authoritatively verify claims

The clearest advantage data has over other sources is that it’s fact. It’s an actual counted number of fatalities, for instance, or tax dollars or potholes. There’s not as much need to rely on anecdotal evidence when you have the real evidence in front of you.

Take a story by the Associated Press from earlier this year, which used a congressman’s Instagram account as a source for an investigation. This particular politician had been taking flights on his donors’ private jets, and billing the public for it, suggesting an overly cozy or even illicit relationship with his top donors.

The reporters had found the scoop by comparing the location data on his Instagram posts to public data on flight records. These days, anything can be data.
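Comparing two sources in this way is essentially a join on shared fields. The sketch below is not the reporters' actual method: every record, field name and aircraft label in it is invented, and it simply matches posts to flights that share a date and a city.

```python
# All records below are invented for illustration.
posts = [
    {"date": "day-01", "city": "Chicago"},
    {"date": "day-03", "city": "Denver"},
    {"date": "day-07", "city": "Miami"},
]
flights = [
    {"date": "day-01", "destination": "Chicago", "aircraft": "JET-A"},
    {"date": "day-05", "destination": "Denver", "aircraft": "JET-B"},
    {"date": "day-07", "destination": "Miami", "aircraft": "JET-C"},
]


def match_posts_to_flights(posts, flights):
    """Pair each post with every flight that shares its date and city."""
    # Index flights by (date, destination) so each post is a quick lookup.
    by_key = {}
    for flight in flights:
        key = (flight["date"], flight["destination"])
        by_key.setdefault(key, []).append(flight)

    matches = []
    for post in posts:
        for flight in by_key.get((post["date"], post["city"]), []):
            matches.append((post, flight))
    return matches


matches = match_posts_to_flights(posts, flights)
for post, flight in matches:
    print(post["date"], post["city"], "matches", flight["aircraft"])
```

A match of this kind is only a lead. It tells a reporter where to look next, not what actually happened.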

Data allows journalists to tackle bigger stories

With data, size no longer matters: reporters can easily get ahold of information ranging from the granular to the global. It might be just as easy to get budgets for every county in the state as it is to get them for just your county, opening up a wealth of new possibilities for exploration.

This capacity gives newsrooms an “investigative edge” they wouldn’t have otherwise, especially small or medium-sized newsrooms. Back in 1992, Steve Doig, a reporter at the Miami Herald, had to examine millions of building code inspections using a computer program called SAS. His investigation revealed the state had been extraordinarily lax about its inspections. Such a monumental task would have been impossible if his team of reporters had to work only with the inspection reports on paper.

Data makes it easier to find new stories

With data, reporters can suss out patterns and follow up on leads in a way they can’t with verbal stories or anecdotes. While reporters should still use their journalistic judgment, data offers a view that doesn’t lean so heavily on instinct or personal judgment. Computers are great when it comes to discovering things faster or discovering things you didn’t expect.

Data enables journalists to better illuminate murky issues

Data can also support or oppose an existing claim, or theory, or even an urban legend. Like Steve Doig, a Miami Herald reporter who found a clear connection between building inspections and hurricane damage, The Guardian used hard numbers to clear up what had been an issue of finger pointing. “The core of data journalism, on at least the analysis end, is looking for patterns,” Doig said. “The patterns are going to be what tells the story.”

Just as data can illuminate a murky social issue, it can also quantify it, which contributes valuable information to the social discourse.

In 1989, even before Doig was doing his hurricane investigation, reporters in Atlanta were trying to investigate rumors of racial discrimination in bank loans. Using six years’ worth of lender reports, the Atlanta Journal-Constitution was able to show African-Americans were denied bank loans at rates far exceeding those for whites. The paper became one of the first to win a Pulitzer for an investigation using data.

The Atlanta reporters already had anecdotes about racial discrimination, Doig said, but the data allowed them to go beyond that and establish clear patterns—even illuminating the quantity and scale of the problem.
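Moving from anecdotes to a pattern of this kind usually comes down to computing a rate for each group and comparing the rates. The sketch below uses invented records and neutral group labels; it does not reproduce the Atlanta data or the paper's method.

```python
from collections import defaultdict

# Invented loan application records: (applicant_group, approved?)
applications = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]


def denial_rates(records):
    """Return the share of applications denied for each group."""
    totals = defaultdict(int)
    denials = defaultdict(int)
    for group, approved in records:
        totals[group] += 1
        if not approved:
            denials[group] += 1
    return {group: denials[group] / totals[group] for group in totals}


rates = denial_rates(applications)
ratio = rates["group_b"] / rates["group_a"]
print(rates)
print(f"group_b is denied {ratio:.1f} times as often as group_a")
```

A gap in raw rates is a starting point for reporting. A fuller analysis would also ask whether other factors, such as income, account for some of the difference.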

Data can offer detail and distance

Data allows more capacity for showing the ‘near’ and ‘far’ views of a topic. In the past, a man-on-the-street interview would be the ‘near’ and an expert interview would be the ‘far.’ There’s not so much need to rely solely on expert testimony when data can provide the ‘far’ or ‘macro’ view more precisely.

On the other hand, the scale of the data itself can be overwhelming for the audience. While data on every police force in the United States can offer a “far”view for a story, no reader is actually going to sift through all that information if it’s put in front of them. But the web allows them to “look at their own ‘near’”.

Data offers the potential to be more transparent

At the same time, there may be a reason to share a huge data set with an audience. Data sources and web technology have made it possible for journalists to be transparent as they have never been before. Reporters can even share how they reached their conclusions, or allow readers to come to their own. “Transparency is the new objectivity” became a saying among journalists.

“Outside of the realm of science, objectivity is discredited these days as anything but an aspiration,” blogger David Weinberger wrote. “If you don’t think objectivity is possible, then presenting information as objective means hiding the biases that inevitably are there. It’d be more accurate and truthful to acknowledge those biases, so that readers can account for them in what they read.”

Data can make reporting more efficient

Reporters frequently collect information from the same sources over and over again: building permits, police reports, census surveys. Obtaining and organizing this information can be made infinitely more efficient, even totally automatic, by keying in to the data behind the reports.

Derek Willis, a developer at ProPublica, found himself constantly checking the Federal Election Commission’s website for new campaign filings. He automated this process, bit by bit, until he had a program that checked for new filings every 15 minutes, and alerted him to interesting ones. “I don’t miss a thing,” he said.

A little programming knowledge had not only made Willis’s task more accurate and efficient but also freed up his time for other reporting tasks.
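The core of such a monitor is small: remember which items have already been seen, check again at a fixed interval, and raise an alert for anything new. The sketch below is not Willis's program. The `fetch_ids` callable is a hypothetical stand-in for whatever retrieves the current list of filing IDs, and the demonstration uses canned data with no waiting between checks.

```python
import time


def find_new_filings(current_ids, seen_ids):
    """Return IDs not seen before, in order, and record them as seen."""
    new = [fid for fid in current_ids if fid not in seen_ids]
    seen_ids.update(new)
    return new


def watch(fetch_ids, on_new, interval_seconds=15 * 60, max_checks=None):
    """Call fetch_ids() repeatedly and pass each new ID to on_new().

    fetch_ids is a hypothetical callable supplied by the caller; this
    sketch does not talk to any real website or API.
    """
    seen = set()
    checks = 0
    while max_checks is None or checks < max_checks:
        for fid in find_new_filings(fetch_ids(), seen):
            on_new(fid)
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval_seconds)


# Demonstration with canned snapshots instead of a live source.
snapshots = iter([["F1", "F2"], ["F1", "F2", "F3"]])
alerts = []
watch(lambda: next(snapshots), alerts.append, interval_seconds=0, max_checks=2)
print(alerts)
```

On the first check everything counts as new, so a real monitor would probably load previously seen IDs from a file or database before it starts.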

Words & Expressions

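constitute: to make up; to form

pothole: a hole in a road surface; a cavity dissolved in rock

fatality: a death

granular: composed of grains or particles

murky: cloudy; dark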

testimony: a statement given as evidence
