Pam Baker is a freelance journalist on the big data beat. She contributes to PC Magazine and a variety of tech media outlets. Her session at EIJ 2017 focused on rapid-response protocols for newsrooms that deal with continuously flowing data and need to find relevant background and contextual information.
Baker shared her advice for aspiring data journalists looking to get into the field.
EIJ: In the simplest terms, what is data journalism?
Pam Baker: Data journalism is the evolution of journalism. It is the same thing we have done since journalism existed, which is taking information and pulling out of it a story worthy of the public’s knowledge. We take information and we make a news story that the public can use. Data journalism just takes that to more extreme steps, where new technologies let us consider more information than we could before.
How important is it for journalists to understand data journalism?
It’s vital because journalism is getting ready to make a change – a disruptive change – and if you don’t have the data skills, you won’t be able to do any reporting and you won’t be able to compete. You might be able to do an occasional niche story that you just happen to be in the right place at the right time [for], but if you don’t have those skills then you’re not going to make it.
What do you say to someone who wants to get a start in data journalism but doesn’t really know where to start?
The best thing to do is to just dive in. So look at the tools available through the Journalist’s Toolbox. Look at my reviews of those tools in PC Magazine. I’m not just touting my own work (it may sound like that, but I’m not!). It’s just a good place to look for self-service applications that you can use. Then just pull any public data set from Amazon Web Services or Google Public Data, or from the sources listed with my and Wayne Rash’s talk at the EIJ conference. Any of those sources. Pull any public data, and with a lot of these apps it’s just drag and drop. It’s that easy. So you pull the data you choose to use and just run it, start asking questions and get your hands dirty.
Get familiar with using data and with the tools, because the same tool does not fit everyone well. Some are very user-friendly, which some advanced people find to be a pain, but if you’re not used to data journalism and you’re just starting out, you won’t see it that way. So choose the app that fits you best. Choose the public dataset and pull it into that app. A lot of it is just drag and drop. That’s all there is to it. Then ask questions of it. Maybe you pull environmental data from the [Environmental Protection Agency] or from Google Earth, or whatever data you want, health trends or environmental or anything, and ask good questions through that app until you get comfortable with it. Then the next time you have a story and you’re under deadline, it’s so familiar that you can do it quickly and you know where everything is. That’s the best way: just jump in.
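For readers who would rather see the "pull a dataset, then ask questions of it" routine in code than in a drag-and-drop app, here is a minimal sketch in Python. It is not a method Baker describes. The rows are made-up sample values standing in for a downloaded public dataset, and the column names (county, year, pm25) and the helper functions load_rows and average_by are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a downloaded public dataset.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """county,year,pm25
Adams,2015,9.1
Adams,2016,8.7
Baker,2015,12.4
Baker,2016,11.8
Clark,2015,7.9
Clark,2016,8.3
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def average_by(rows, group_key, value_key):
    """Return the mean of value_key for each distinct group_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += float(row[value_key])
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# "Pull the data in", then "ask a question of it":
# which county has the highest average reading?
rows = load_rows(SAMPLE_CSV)
averages = average_by(rows, "county", "pm25")
highest = max(averages, key=averages.get)
print(f"Highest average reading: {highest} ({averages[highest]:.1f})")
```

With a real download, the same two steps apply: read the file into rows, then keep rephrasing the question (by year, by region, by category) until the data feels familiar.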
What ethical issues do journalists and editors need to consider when working with data?
Okay, the tools are very sophisticated. The [business intelligence] apps and the analytics are very sophisticated, very good tools. But there is still a lot of room for user error. If you don’t have the math skills and the like, then you may come to the wrong interpretation, and you could hurt someone’s career or a business’s value by publishing the wrong thing. So accuracy, as it always has been in journalism, is the number one concern, both from an ethical point of view and from the point of view of value to the reader.
Can you give an example of how data journalism is becoming more useful but also where it might be faulty?
Look at the election and the polls. That’s data. Everyone thought Hillary Clinton would win and Trump would lose, and everyone screamed at the end of it: “How could the data be so wrong!?” The same thing happened with Google Flu Trends. How is it that the [Centers for Disease Control and Prevention] data was right and Google Flu Trends was wrong? You know Google is very strong in big data and data tools. Well, the answer is the same thing that has always been true throughout human history: every set of information is part of the story and not the whole story. That’s good news. It means there’s always a role for journalists to make sense of the data. So where journalists are strongest is in finding the real story and doing the homework to validate the information, and where they’re weakest is in using the tools.