Seeing Data: A Conversation on Visualization
A conversation with Betsy Mason and Enrico Bertini.
March 4, 2020
NYU Arthur L. Carter Journalism Institute
20 Cooper Square
7th Floor Commons
New York, NY 10003
Kavli Conversations are hosted by NYU’s Science, Health and Environmental Reporting Program with support from the Kavli Foundation. Events are open to the public. Webcast will begin at 6:30pm.
>>Welcome to the Kavli Conversations on science and communications. Our aim here, for those of you who have not joined us before, is to bring together leading science journalists and eminent scientists to talk about how science communication is changing and how we can do it just that much better, how we can get accurate and timely information about new research to the general public, who these days need it more than ever. Just to look ahead for a moment, this is the first in our spring series. On March 25 we’ll be digging into the coverage of the recent vaping epidemic, a word we’re throwing around a lot these days, with David Downs, a remarkable reporter, and with Michael Segal from Boston University. On April 8, we’ll be exploring the gender mythology of testosterone with researcher Rebecca Jordan-Young, co-author of the new book Testosterone: An Unauthorized Biography, and with a remarkable award-winning journalist who writes about gender issues and sports medicine and other related issues. On April 21, director Dan Fagin will hold a special conversation with a renowned climate scientist, Katherine, and New York Times climate reporter Brad Plumer. But tonight what we’re looking at, if I may say so, is the big picture of big data. At its best, science journalism is the artful literature of fact, and to make those facts engaging, we enlist all of our senses in our stories: sound, smell, sight, touch and hearing. And technology, of course, extends those senses into new realms by harnessing the sensory data of social media, sensors, satellites, genetic testing and smartphones. In this sense, big data has become the brush strokes in an emerging portrait of human behavior and of the science on which we report. So I’d like to talk this evening about the ways data can enrich the vocabulary of storytelling by giving vivid form to ideas, trends, alternative futures and the invisible geography of the places that we inhabit.
We’re joined in this endeavor this evening, from San Francisco, by journalist Betsy Mason, an award-winning freelance writer and editor who specializes in science and cartography, among other things. Her work has appeared in National Geographic, Science, WIRED and Scientific American. She co-wrote a cartography blog at National Geographic and at WIRED, where she founded the WIRED Science Blogs network. She won the American Geophysical Union’s David Perlman Award for her coverage of the earthquake risk in the Bay Area, she’s secretary of the Council for the Advancement of Science Writing, and she’s co-author of a remarkable, lavishly illustrated book, All Over the Map: A Cartographic Odyssey. And from the science side, I introduce Enrico Bertini. Enrico is an assistant professor at the NYU Tandon School of Engineering in the department of computer science and engineering, and his research focuses on the study of effective data visualization methods and how to communicate complex ideas effectively through visual data presentation. Now, he also teaches an online specialization in information visualization, for those of you who are interested, and he’s co-host of a well-regarded podcast on data visualization called Data Stories. He also writes for Medium and he has often collaborated with journalists on data analysis projects. Thank you both for joining us. Let’s start this way. To be sure, the way we visualize data shapes how we perceive the world. This week, who among us does not look at and conceive of American politics through the geography of red and blue states, or think about the coronavirus in any terms except the spreading stains we constantly see projected onto global maps? So for science journalists, data visualization I think is an especially tricky matter.
We not only make our own graphic mistakes, perhaps, as infrequently as possible, maybe, but because we report on published reports of other scientific findings, we’re also running the risk of being misled by the mistakes that scientists might make in turning their own data into charts and graphs. So their mistakes compound ours. Betsy, science is littered with bad data visualizations? You’re nodding.
>>Yeah. I’m hoping there aren’t a lot of scientists watching this. That’s extremely problematic for the science and also for science journalists who are looking at those visualizations.
>>What kinds of things are we talking about?
>>I think visualizations are often, for a lot of scientists, the very last thing they do, and they think of it as just some pretty things they need to have in the paper that they’re presenting. Of course, that’s some scientists. Other scientists do an excellent job. If you’re not putting a lot of thought into it, you’re going to make mistakes for sure. There are so many ways to make mistakes. I think one of the best or easiest ways to make a mistake is with color. Scientists tend to misuse color all the time, in ways that can even mislead themselves about what is in their actual data.
>>Do you have an example of that?
>>I can tell you about an example.
>>This is, of course, third hand from a climate scientist who told me that there was a scientist who was looking at his data using a rainbow color scale, which we’ve all seen. You see them for weather maps and for what feels like every scientific map visualization there is. And because the spectrum is not visually consistent, because the perceived change isn’t even from one end of the spectrum to the other, it can appear there are sharp divisions in places where there are not. And so this scientist was interpreting a sharp division, sort of a front, in his data that wasn’t there and published a paper on it, I think, or maybe just presented it at a conference. But it just wasn’t there. The divisions between some colors look sharp while others look gradual, and it doesn’t represent the underlying data appropriately.
>>But scientific journals have standards?
>>Yeah, and the rainbow color map is standard. And that’s part of the problem. It’s very difficult to move scientists off from something that they’ve been doing for a long time. And there’s good reason for that. If you have consistency through time with the way that you do science or visualize science, then you can compare things more easily. But if you’re talking about the rainbow, every single color scale is representing data wrong in a different way. So the comparisons really aren’t that valid and it’s just, I think, a habit that’s hard to break.
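The point about uneven color steps can be checked numerically. What follows is only a minimal sketch, not the method of any scientist mentioned here: it assumes a plain HSV rainbow running from blue to red (the helper names `rainbow`, `gray` and `luminance_steps` are made up for this illustration) and uses the standard sRGB relative-luminance formula as a rough stand-in for perceived brightness. Equal steps in the data produce brightness changes that jump, stall and even reverse along the rainbow, while a gray ramp only ever gets brighter.

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an sRGB color (0 = black, 1 = white)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def rainbow(t):
    """Hypothetical rainbow scale: blue at t=0, through green, to red at t=1."""
    hue = (1.0 - t) * 240.0 / 360.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

def gray(t):
    """Simple gray ramp from black (t=0) to white (t=1)."""
    return (t, t, t)

def luminance_steps(scale, n=9):
    """Luminance change between neighbors for n equally spaced data values."""
    lums = [relative_luminance(scale(i / (n - 1))) for i in range(n)]
    return [b - a for a, b in zip(lums, lums[1:])]

rainbow_steps = luminance_steps(rainbow)
gray_steps = luminance_steps(gray)

print("rainbow:", [round(s, 3) for s in rainbow_steps])
print("gray:   ", [round(s, 3) for s in gray_steps])
```

In the rainbow output some steps are large, some are close to zero and some are negative, which is one way a smooth data set can appear to contain a sharp front. Luminance is only one ingredient of color perception, so this is an illustration of the idea rather than a full perceptual model.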
>>So I guess I see this categorized as chart crimes. I see a lot of literature talking about the chart crimes that journalists are routinely committing with data scales or cherry-picking data to make a point or whatever. Enrico, I wonder, from your standpoint now, what is it that journalists do when they approach turning data into visualizations that makes you particularly crazy?
>>Can I say it?
>>To a roomful.
>>I hope so, because otherwise we’ll have a long silence for the rest of the evening.
>>Honestly, I don’t think the problem is exclusively on the visualization side of things. Often the problem in journalism, as well as many other endeavors, people design to convey something they already have in mind. They already have a preconceived message and they try to create the visualization that best — find the data and the message and the visualization that best conveys the person’s view of the world. And I think we have this idea that when we are visualizing data or providing numbers that come from data, they are close to the truth, and I don’t think that’s that true.
>>There are so many ways in which data can be used, even without doing any crazy manipulation. It’s easy to select the data that tells the story you want to tell.
>>But isn’t that the point?
>>It’s a Catch-22.
>>I mean it’s true that we come, as readers or viewers, come to the graphic with the assumption that what we’re seeing is almost certainly true. But if I’m hearing you correctly, what you’re suggesting is that actually no, a graphic is an argument.
>>That the use of data — explain that. What do you mean?
>>What I mean is that whenever you’re trying to present something that is based on data, you have two options. With the constraints that you have in journalism, my sense is that you can’t really go too deep and present all the caveats that are behind a given piece of scientific knowledge or experiment or data that has been acquired from some source. So because of that, you just can’t explain everything in detail. But if we look at how good scientists look at data, I think they look at data with as much skepticism as possible. And I think one problem is that in journalism it’s hard to write an article when you’re making a point and then saying: Oh, by the way, I’m not that sure about this point.
>>This is interesting. I thought we were going to talk about what color bar should be on a chart, although I guess there aren’t any bars in that kind of chart. What I’m hearing is this process starts long before anybody puts pen to paper or Apple pen to iPad, that the underlying data is driving all of this. Betsy, what from your standpoint, and I know you’re intensely interested in mapping and in maps, do you look for in a data set? What makes a data set appealing visually? Potentially?
>>Oh, that’s a tough question to answer. Of course, if I’m looking at a data set, I’d want one that’s already clean.
>>What do you mean, already clean?
>>Anybody who has dealt with any data set knows you have to spend far too much time cleaning up the data, getting rid of different artifacts and empty cells and things like periods in the wrong place, all that sort of stuff. But I guess we were talking earlier about whether, just because you can make a map or chart with data, you should. And I think that mistake, of making a visualization when one is not necessary, when it’s not going to add anything to the story and could potentially be misleading, is made quite a bit. So if you have geographic data, the tendency is to think: Great, I can make a map. But if the point of your story or of the visualization isn’t geographic location, then that’s just not a helpful visualization and it might be difficult for the reader to actually get the information out of it that you want them to.
>>So from your standpoint, then, what is the purpose of a visualization? To inspire? Explain? Analyze? Educate? Wow?
>>I had you at wow?
>>All of that. I think there are generally two things you can do with data visualization. There’s analysis and exploration, where you’re looking for things in the data, you’re using it as a reporting tool, or a scientist is exploring the data to see what might be in there, asking questions of the data. And then there’s the presentation side, to show a result that you have or a point you’re trying to make. And everything in between.
>>Enrico, from your standpoint, what do you look for in a data set? What makes it appropriate for a visualization? I mean, are all numbers equal?
>>I really like what you said. In visualization, you can either use it to present or you can use it to understand something better yourself. And I think what we are looking for in a visualization is different according to whether our goal is to explore and analyze or to communicate to others. Personally, when I think about the visualizations that are used for communication, I think the goal is to make something as clear as possible. So one way I like to think about it, and what I typically tell my students to think about, is: what is the question that you want people to be able to answer when they look at a visualization? Make it explicit. What are these questions? I think this is a powerful tool, because then you can look at your visualization and say: Does it answer the question?
>>Can you give me an example? This is sounding very theoretical.
>>Sure. I have to think about it for a moment.
>>That’s permitted. You can think.
>>It’s OK. I don’t know. In my course at NYU, we use data sets coming from New York City, from the open data portal, and there’s a common one on restaurant inspections. This is data that is continuously updated about inspections that are performed in New York City restaurants. Say that I want to answer: Do different types of cuisines or restaurants have different violations? Now, I ask my students to create a visualization that answers this question, and there are so many ways to create visualizations that answer this question. You can come up with 10, 20 data visualizations on the same data, answering the same question. Now, which one is best? I think it’s the one that, if you show it to somebody, lets them answer the question quickly and easily. Some are better than others. Does that make sense?
>>Well, it does.
>>You seem surprised.
>>No. What I’m thinking is I agree with that and that sounds wonderful, but here I am with my pencil and my Apple pen and a blank pad and I don’t know what kind of form this data should take. What is most effective? I mean, what about a pie chart? What’s wrong with pie charts?
>>Well, what are you trying to show in the data? Here’s a good example. If the question you’re asking is whether different types of cuisine are likely to have different ratings, a map doesn’t make sense. If you’re wondering if different areas of the city have more problems than others, then a map would make sense. If your point is to understand the differences precisely between the cuisines and how many failing grades there are with each type of food, then you would probably want a bar chart, where it’s easy to compare the lengths of the bars and you can see precisely the difference. If what you want to say is something about how many failing grades there are relative to all of the inspections, then a pie chart might be OK. I don’t know how they grade them here, but say there are only a few A’s.
>>If, relative to the whole of all the restaurants, there are only a few A’s, a pie chart is fine. If you want to know precisely the difference between how many A’s and B’s, that’s going to be difficult for a person to read.
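To make the two questions above concrete, here is a small sketch in Python. It is an illustration only: the records are invented toy data, not real New York City inspection results, and the function names (`non_a_counts`, `text_bar_chart`, `grade_shares`) are hypothetical. One summary compares cuisines, which suits bar lengths; the other gives each grade's share of all inspections, which is the part-of-a-whole view a pie chart shows.

```python
from collections import Counter

# Hypothetical toy records (cuisine, grade). Invented for illustration only,
# NOT real inspection data.
inspections = [
    ("Pizza", "A"), ("Pizza", "A"), ("Pizza", "B"), ("Pizza", "C"),
    ("Thai", "A"), ("Thai", "A"), ("Thai", "A"), ("Thai", "B"),
    ("Bakery", "A"), ("Bakery", "C"), ("Bakery", "C"), ("Bakery", "B"),
]

def non_a_counts(records):
    """Question 1: how many non-A grades does each cuisine have?"""
    counts = Counter()
    for cuisine, grade in records:
        counts[cuisine] += int(grade != "A")
    return counts

def text_bar_chart(counts, symbol="#"):
    """Draw one bar per category, longest first, so lengths are easy to compare."""
    width = max(len(name) for name in counts)
    rows = sorted(counts.items(), key=lambda item: -item[1])
    return "\n".join(f"{name:<{width}} | {symbol * value} {value}" for name, value in rows)

def grade_shares(records):
    """Question 2: what share of all inspections received each grade?"""
    totals = Counter(grade for _, grade in records)
    n = sum(totals.values())
    return {grade: count / n for grade, count in totals.items()}

counts = non_a_counts(inspections)
chart = text_bar_chart(counts)
shares = grade_shares(inspections)

print(chart)
print(shares)
```

The same records feed both summaries; which one to draw depends on the question the reader is supposed to be able to answer.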
>>If I can cast you for a moment as the graphics guy in a typical newsroom, there’s a division of labor and you’re the journalist, it seems to me there’s a moment when there should be some kind of conversation between the two of you before the subject of doing a graphic even comes up.
>>What is that conversation? The question you need to hone?
>>I would ask her: What is it that you want to communicate. If a person is looking at the data visualization, what is the message? What should they be able to answer in a few seconds?
>>So do you think the typical newsroom those conversations are happening?
>>I don’t know what’s typical. I suppose the journalist gets through the entire reporting and they’re in the process of editing and the editor decides they need art to go along and break up the words and then we go to the art department as it’s usually called and say make a graph of something. That’s obviously not a great way to do it. But there are — in other more enlightened publications there are lots of conversations about when the graphics team should be involved. This is a thing that’s important at National Geographic because the visuals of National Geographic are very important. So the cartographers are often brought in fairly early in the story to be able to help tell the story. Because what they can tell on a map will change what the journalist, the writer, how much they need to carry, how much of the load they need to carry with the words and how much they can leave to the map.
>>I just want to interrupt and tell our unseen listeners that they can send us their questions by tweeting the hashtag #KavliConvo and we will relay them and you will be able to participate too. I remind you all again that this is a conversation, so please feel free to come around to the mic and ask questions.
>>Can I add? I just said one criterion would be fast. But I think that depends on context as well. When you said journalism and newsroom, I assumed we have to publish something really quick and that people can’t really spend too much time reading this article. But of course this is not true for every case. Even within journalism, there are certain pieces that are supposed to be read with more depth, more time, and mulling over. So it’s not necessarily the criterion that people have to be able to read something fast. With clarity, yes. Visualization has to be clear and understandable, comprehensible, but it does not necessarily have to be consumed fast.
>>But it’s a fair point. Let’s use your experience with ProPublica, if I may, as a kind of test case for our conversation. You worked with them to do some analysis, which you should explain to us, on Yelp reviews. I wonder if you could just give us a sense of that initial conversation: Is this a story? Is this actually a data graphic? What was that like, if you don’t mind bringing us in?
>>That would be the other side of data visualization that we discussed before. This is more on the exploratory side. So now the goal is no longer necessarily to create a visualization to communicate something to the large public, but it’s more like how do we use visualization to help a person or two analyze a specific data set? That’s the type of work in particular that I’ve been doing with ProPublica. We’ve been working with one or two journalists from the newsroom who wanted to analyze very large sets of reviews from Yelp specifically to understand, among other things, medical malpractice. These were all customer comments about health care services. As you can imagine, if you want to understand what people say about doctors or medical services and you have hundreds of thousands of reviews, it’s not that easy. You need some sort of tool.
>>But going into this, the reporter you were working with, he or she had no idea there was a trend or pattern or a telling insight to be had. They just knew they had an immense compost heap of undifferentiated data and they looked to you to make sense of it.
>>Absolutely, yes. They have had ideas that there are some patterns there.
>>Right. Doctors that have a lot of malpractice suits also getting bad comments on Yelp? That would be interesting to me.
>>It’s a long story, but I think some of the work they wanted to do is to be able to identify a different type of malpractice, and then what journalists at ProPublica do, once they identify a specific, egregious case, is call the person and say: Look, there are ten people who say that you’ve done this. And we want to publish an article about that. Do you have anything to say before we do it? That’s the way it works.
>>It’s very common now. I mean, New York City, we could even talk about this later, but I think the last time I looked, there are 2,600 and something or other public data sets, huge collections of municipal data of every sort that New York has been collecting in the course of doing its business and has stuck up online, and there it is. But those are just like dumps. I mean, how do you come to something like that with a question? That’s why I want to understand this Yelp business. You had no guidance going in. It was like what can we find, if anything? And we’d like to understand how visualizing that kind of data might help you analyze it.
>>I think the most important ingredient is somebody has to have some questions. I am skeptical of the idea that you just look at data. I mean, I can do it for the fun of it because I like it, but in general, the real value comes from the question and then from the data, right? I think in this case what was really good with the project with ProPublica is that they did come with questions, but they didn’t have the tools that allowed them to answer these questions. So basically what we did was to create a data visualization tool that works on top of Yelp data that includes the reviews and several statistics, and that helped them quickly focus their attention on the subset of the data that answers the questions they had. I know it’s a little abstract.
>>I understand. That’s the point. It’s an abstract intellectual exercise. One of the things that’s marvelous about maps is that they’re kind of like our original metaphor for organizing information. I wonder, looking through the history of that sort of mapping, what are the examples that stand out for you that might point us to how to approach the geography of mapping through things like geotagging or whatever, and what kinds of questions we might be asking that could yield informative visuals as maps?
>>You’re asking me, like, what are the best maps in history?
>>In terms of what they’re able to show?
>>Gosh, that’s a hard question. There’s about 15 maps just popping into my head right now.
>>One that comes to mind is a set of maps that was in an obscure Army Corps of Engineers report from the 1940s, I think, about mapping old, abandoned river channels of the Mississippi River. And these maps were commissioned by the Army Corps because they were trying to understand better how to control the river, and by looking at its history they could understand different things about where they could build levees and what the river might be prone to do. So each channel that was mapped is in a different color. So what you end up with is this incredible sort of swirl of colors. It looks kind of like different-colored spaghetti. It’s the first map in the book. At some point... here it is. That’s one piece of it. I’m sure you’re familiar with this. So there’s I think 15 different maps that look like this that map the entire Mississippi River from Illinois to the Gulf. At some point a cartographer found these maps and basically put them out there, and the cartographers went berserk for these maps because they’re such an effective way to show how dynamic the river is and how much territory it covers. This is 50 miles across here, and that’s how much the Mississippi River has traveled over the past few thousand years, and now we’re trying to hold it in one place. At the same time, in addition to that wonderful big pattern and message, there’s an extreme amount of very specific detail in these maps as well, and that’s really hard to do. So I’d say that these are a good example of how effective a map can be.
>>Professor, do you have a question?
>>I do. I thought I’d ask the first question just to get the ball rolling here. Back in the old days, when people did most of their journalism work in newsrooms, there used to be this person called the graphics editor, because back then we called visualizations graphics. And the graphics editor at the newspaper where I spent a lot of years would always say: Keep it simple. All graphics should be instantly understandable. And now we’re in an era where not only do we have these massive data sets, but we have the capability of layering multiple data sets, and it’s just not that difficult to produce very complex visualizations, what we used to call graphics, that definitely violate that editor’s old maxim. So I guess my question is: In an era when we can produce things of immense complexity without much difficulty, is that what we should be doing? Because I see very beautiful, very complex graphics, even in The New York Times, which has an amazing art department. I see things that routinely violate my old editor’s rule of instantaneous understanding. I wonder what you all think about that. Since we can do more complex things, should we be doing that, or not?
>>I think that’s a great question. Just because we can do really complicated, interactive, dynamic scatterplots where everything moves at the same time, you know, is our ability to make these visualizations outstripping our ability to really perceive them? If Dan doesn’t mind my paraphrasing there. Betsy.
>>There’s a lot there. I think that graphics editor was probably underestimating his readers. And I think that’s a really high bar, to instantly be able to understand something. With a lot of good graphics you can get the main message or some message instantaneously, but it’s nice if there’s more to explore and you can get more out of the graphic. But one big question is always what to leave out.
>>Well, but I think that this touches on something that you make a point of that impressed me, which is that in order to improve these visualizations or to make effective ones, we really need to have a better understanding of the strengths and weaknesses and biases of how human beings actually perceive the world. What are you driving at there? You mentioned color, for instance, but literally there’s a kind of neurological thing going on here when we look at stuff that affects how we judge it.
>>Yeah. Enrico might be able to speak to this a little better, because data visualization specialists are very aware of these aspects of visual perception that make humans better or worse at interpreting different types of graphs. And there’s been research for decades on this showing that it’s much easier for people to compare the lengths of two bars than it is for them to compare differences in angles, like in a pie chart, or an area, another aspect of the pie chart, or changes in color. So I think the sort of fallback position is to use the easiest possible visual representation for humans to understand that still conveys the message that you want. So if you can convey it with a bar chart, then use a bar chart, because people will be able to understand that better, especially if the point, like I was saying before, is to be able to understand accurately the differences between two values. But the research has gotten much further, and people are now testing all the different aspects of visualizations with Amazon’s Mechanical Turk, where you can cheaply get some highly unscientific group of people, a non-random sample, to try to interpret these graphics, and you can learn from that what is easier for humans to discern. With color, another pitfall is that if you put two colors next to each other, they’re going to interact. So if you have got a light color surrounded by dark colors, it will look a lot lighter than it would look if it were surrounded by a bunch of colors that are even lighter. So if you’re interpreting data where there are lots of colors next to each other in cells or whatever, and by cells I mean like on an Excel chart, not in the body, then you can actually misinterpret data because of the contrast issues between shades.
>>I know I was intrigued to see online, as part of my preparation for this, that there are a set of apps for running color graphics through various simulations of how a color-blind person would see them. There are different forms of color blindness, but it’s a significant portion of the population that doesn’t see color the same way. That clearly is a challenge that the community is dealing with. You were nodding while she was talking. What is it about the way that people perceive stuff that actually serves as a useful and important limit on how complicated these things can get?
>>Well, data visualization is based on the science of how we perceive the world with vision. So every time we design something that breaks the rules in a way, or doesn’t take into account the way we see the world with our eyes, you have problems. Some of the examples that she mentioned come from the literature of human vision. There are many of these examples, many.
>>People are — I’m sure you’ve seen online you can go on and look at different visual illusions that we’re susceptible to. If you don’t take some of those into account, then you’re going to have problems.
>>Do you have examples of things you like?
>>Yeah. I can show.
>>And then I’ll — maybe while you find those, you have a question there? Go ahead.
>>Hi. For someone who was never trained to do data science or data journalism, is there a tool out there to help us get the ball rolling? I don’t even know the term data cleaning. Where do I start? Thank you.
>>That’s actually the best question. Where does someone start, Enrico? And I know the answer is take a course with you. But I don’t think that’s what we were fishing for.
>>My gut reaction to your question is I wouldn’t start from the question of what is the best tool, but what should I learn first. First I would start with learning the main notions and principles, because tools are always changing one way or another. That said, if you want to start with data visualization in an easy way without programming, there are many tools out there. Can I mention specific tools?
>>Tableau is a great tool. I don’t know if you’ve heard of it. It’s very easy. You can very quickly load a data set and, through drag and drop, create very useful data visualizations. What I like about the way Tableau works is that it kind of exposes what is called, and I’m going to use a technical term, the grammar of graphics, which is a sort of visual language that you can use to build data visualizations. And in a way, as you learn to use Tableau, you are implicitly learning the grammar. I don’t know if it makes sense. Another tool I really like that is being created for data journalism is called Datawrapper. It comes from a company that is based in Germany. One of the founders is a former graphics editor at The New York Times. They’re doing an excellent job in providing really good data visualization defaults for a number of classic data visualization problems that journalists are confronted with. This ties back to something I wanted to say before about the previous question on complex tools. We have complex tools, and now with these tools it’s so much easier to create complex visualizations and, of course, to abuse them. I agree, but I also think that people like those who are behind Datawrapper are doing a fantastic job at providing good defaults and good ways to think about data visualization. So a tool can be powerful, but also guide people in the right direction. I think this is really important. There are tools out there that don’t have good defaults, but there are other tools that have really good defaults.
>>When you say guide people in the right direction, who are the people? The creators of the graphic or the viewer?
>>I think part of the earlier question was that people are doing graphics for people who can’t understand what is being shown. I mean, these things can get very challenging.
>>Yeah. I don’t want to turn this into an advertisement for these tools, but the two tools I mentioned, especially Datawrapper, typically don’t allow anyone to create something that is too crazy. Most of the visualizations that you can create with it are at the same time powerful but easy to read. I could go on forever.
>>Let’s take another question. You’ve been patient.
>>My focus as a journalist is climate change and I’d be curious to hear examples of where you think climate change has been visualized well and any ideas you have for how to better visualize climate change and its impacts.
>>Actually, there are a couple of interesting issues there.
>>I could go on forever with this one. It’s a great question.
>>I think there are a number of graphics that a journalist named Peter Aldhous has done for BuzzFeed. He’s a self-taught mapper. He’s done a series of maps. I think actually maybe I have them here.
>>That would be great. It’s hard to talk about pictures without showing them. It’s tricky.
>>He’s done a series that let people explore what’s going to happen in their area. So you can look at how the potential for flooding in your area has changed. You can look at when the last major fire near your house was. And I think these kinds of visualizations are important now that we are having impacts to really show people that this is everybody’s problem and to make it a less big, abstract problem and more something that you can see in your — you know, you can see where you are on the map. So I think that’s one thing that I’ve seen recently that’s nice.
>>The marriage of climate-related impact data to very local mapping. Did you have an example that you wanted to show us? While you’re looking it up, you wanted to add your point, and I have a question I wanted to ask you as a follow-up to her question. Go ahead.
>>Regarding climate change, I think one of the most successful I’ve seen around is the climate stripes. I think probably many of you saw them.
>>Are people familiar? No. Explain.
>>I’m surprised. Did you? Yeah.
>>Some people have.
>>I saw somebody had it put on their car.
>>The story behind the climate stripes is that a climatologist from the UK, an academic professor, I don’t remember exactly, I think from the University of Reading, created these climate stripes using color to show how the average temperature around the world has changed over the years. It goes from very blue to very red. It kind of goes against all the rules of how to visualize data properly, but they work really well. Why do they work well? Because they communicate very effectively what the message behind it is.
>>And the message behind it —
>>The message behind it is everything is getting warmer.
>>And you’re going to have to explain it. If I show you the climate stripe, it would be obvious.
>>That’s a great example and I want to look at this map. Just to follow up: I write about climate every once in a while, I mean regularly, and I’m always bombarded by citizen scientists, self-appointed people who are well-meaning. They can go to these data sets and instantly pick out a version of a temperature record that they like, that proves their point or whatever, and then they hurl it at my head, where it will leave a bruise. I don’t think they’re being argumentative because they’re mean-spirited. They’re looking at the data and they see that, if they start here or start there, it looks very different from what we as journalists are describing as the trend. So my question to you, and this is for both of you, is: if there’s a time series, and temperature is a good example, and climate change, then to be transparent, to be trustworthy, can I, as the journalist, pick out the part of the graph I like? Or do I have to, every time I want to talk about warming in the western hemisphere or whatever, include a fever chart that has all 1,800 years of tree ring data, or whatever the question is? How far back do I have to go in a time series to be honest?
>>It’s hard to tell. I think this goes back to what we were saying at the beginning. Necessarily when you want to communicate a message, you want to find that part of the data that communicates the message more effectively, and that’s a solution, but it’s also part of the problem. In climate science as well as in other complex systems, the problem, again, goes back to the data.
>>Right? There’s so much to say about how the data is gathered, how part of the data is gathered and some other part of the data is gathered. The science behind it is so complex that at some point you can’t express everything within the confines of an article.
>>OK. But, Betsy and Enrico, issues of trust and transparency with this kind of data and these kinds of visualization are at the heart of so many journalistic arguments that we have about these important topics. This is a question for both of you: what should we be doing as visual journalists to communicate the sources? What do we do to make ourselves trustworthy? Why should a reader, given that the data is an argument, as we established at the beginning, believe it?
>>Transparency is important, to say where the data came from. Your question about how far back to go in the record might depend on what your story is. But I think it’s important to have as much context as possible. Otherwise you’re talking about the polar vortex indicating that climate change, global warming is not happening.
>>Yeah. I mean, I get that.
>>Journalists would never do that. [LAUGHTER]
>>No, but — I don’t mean to beat this one into the ground, but it’s a nice concrete example. Every year those of us in the audience and online, whatever, who write about the warmest year yet kind of story, they’re always in January and early February. In the modern record, which goes back to 1840. That’s what that story will be about. We’ll always have readers: Oh, yeah, but if you look at it from 1998 —
>>But why would you do that.
>>Because they’re just looking at the graphs and if you look at that, it’s actually a plateau.
>>The world or industrialization started in 1998.
>>You’re inviting the argument there. You have different charts right there.
>>None of those are true.
>>But they’re not not true either. That’s sort of the problem with data visualizations.
>>There’s the question of what to include and that’s sort of what we’re talking about here. And then the fact that you can take the same data and visualize it in many different ways and get very different stories out of the exact same data. I think that’s a thing that’s important for readers of maps and visualizations to understand. And I don’t think people think about visualizations critically like that.
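The start-year argument in this exchange can be made concrete with a small sketch. It does not use any real temperature record: the series below is synthetic and invented for illustration (a steady rise plus one unusually hot year, placed at 1998 to echo the example above), and the function names are hypothetical. It only shows how the fitted trend shrinks when the window starts on the hot year.

```python
# Synthetic, made-up anomaly series: a steady rise plus one unusually hot year.
# These are NOT real measurements; every number here is an assumption.
def make_series(first_year=1980, last_year=2015, rate=0.018,
                spike_year=1998, spike=0.3):
    series = {}
    for year in range(first_year, last_year + 1):
        value = rate * (year - first_year)
        if year == spike_year:
            value += spike  # the single hot outlier
        series[year] = value
    return series


def trend_per_decade(series, start_year):
    """Ordinary least-squares slope from start_year onward, per decade."""
    points = [(y, v) for y, v in sorted(series.items()) if y >= start_year]
    n = len(points)
    mean_x = sum(y for y, _ in points) / n
    mean_y = sum(v for _, v in points) / n
    sxy = sum((y - mean_x) * (v - mean_y) for y, v in points)
    sxx = sum((y - mean_x) ** 2 for y, _ in points)
    return 10 * sxy / sxx


series = make_series()
full_record = trend_per_decade(series, 1980)
from_hot_year = trend_per_decade(series, 1998)
print(f"trend starting 1980: {full_record:.3f} per decade")
print(f"trend starting 1998: {from_hot_year:.3f} per decade")
```

The underlying rise is identical in both fits; only the choice of starting point changes the slope, which is why stating the window and linking to the full record matters.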
>>How would we source it?
>>What do you mean?
>>Well, is that part of the transparency, trust thing? Here’s my chart, here’s my map and then how much should I reveal about how I built my tool to go through Yelp?
>>As much as possible.
>>As much as possible.
>>If you can, make the tool available so other people can try it, and links to where the data came from, if you can, if it’s accessible. The more the better. That in itself will help the trust issue.
>>So for any data visualization you need to source it as thoroughly and openly as you would source an article?
>>Yeah. If I can add something to that. I think every complex problem out there is necessarily nuanced. So personally, and I don’t know if that’s true for everyone, what I like in journalism is when I read something that is nuanced, not black and white. For me, nuance is a form of trustworthiness. I understand there’s a limitation there, because you can’t write a whole textbook. You have to write an article. But I think some journalism is better than other journalism when it offers not necessarily balance but nuanced argument, and when something is uncertain, explains why it is uncertain, or how much certainty we have in one thing or another. I think that’s really important. That’s a clear divide in what I personally find interesting in journalism.
>>Do we have a question?
>>I wanted to ask: How do you think data visualization contributes to or might enable poorly devised statistical analysis like p-value mining?
>>I think in general visualization is really good at revealing problems with your data or problems with any process that has been recorded through data. So my answer is a little bit broader than what you asked. But I think in general data visualization is really useful as a diagnostic tool.
>>How so? Both of you are nodding. Expand, please.
>>Virtually every time in my work I’ve had to collaborate with someone who was the source of the questions and the data. Once we start visualizing it in a certain way, they’re surprised that the data reveals something that is somewhat counterintuitive. So one way to describe the power of data visualization, which I heard many years back, is that through data visualization you can find what is expected and what is unexpected. It’s kind of like confirmation of some of the things that you believe, but also surprises. And I think when something is surprising, it’s really interesting. And visualization is really powerful as a tool to be surprised about something. When we collect data and visualize it, the person who is looking at the visualization has an idea in their head, a mental model of the world that has been captured through the data. Once it’s visualized, now I can see it with my eyes and I can find discrepancies between what I believe is true and what is depicted in the data.
>>I’ve had a lot of conversations with scientists and data visualization experts about this, and in particular there’s a group at the Broad Institute at MIT that is there to help scientists explore their data through visualization and maybe present it too, but primarily on the exploration side. Time and time again they will come in and say: I want to see this aspect of the data, and they’ll look at different ways to visualize it and discover that either they were wrong or there’s something far more interesting in there that they hadn’t thought of before. And until they were able to see the data, they couldn’t see that aspect of their own research. I’m not going to be able to come up with an example that I remember, but that’s definitely something that happens a lot.
>>It happens all the time.
>>Hi. I really liked what Enrico said about the grammar of graphics. I come from the world of comics, and similar language is used to describe such pictorial language, like the line is the author’s voice. Do you have recommendations for grammars or tools we can use to better understand things like data visualization?
>>I’m just not sure I understand completely your question. What do you mean by tools to understand data visualization? Sorry.
>>I’m sorry. I understood the first part. I got lost on the second part.
>>Right. Better ways we can look at an image and mine it for the data that it is presenting, the data visualization data kind of.
>>As a viewer or as a creator?
>>As a viewer.
>>So shouldn’t the chart or shouldn’t the graphic be telegraphing that information to the watcher here?
>>I think most people who make charts would say that that’s probably true. But I have talked to —
>>But is it true?
>>Well, we’re capable of so much more with visualization now, and we have access to so much more data, that there is maybe an argument to be made for better visualization literacy among the population, to better be able to read. But that’s a question for someone else. I don’t know. I think trying to make visualizations will make you a better reader of visualizations. And I was going to add to the tools that Enrico brought up. If you want to make a map, there are at least tools that are accessible to anybody; any journalist could find a geographic data set and make a map. It might not be good, but there’s Mapbox, and it’s free online, and Carto also, and they have different strengths and weaknesses. But you could go on there. You could go to the open data portal for New York City, grab a data set and within an hour or two have made an actual map. And a nice thing about those is that the default is not the rainbow color scale. Like you said, defaults are important in visualization tools, and that’s part of why a lot of those rainbows get into journals, because that for many years was the default. But that’s changing.
>>I want to get to our questions, but I would like to find out what it was you like so much about the mapping visualization of this climate data.
>>The headline will tell you. You probably would be interested in looking at this.
>>The headline being: Is your home at risk of flooding from rising seas by 2050?
>>Every spot you look at on the map you get a percentage of how many houses in that zone are going to be in the flood zone and what the current total value of all those structures is, which is interesting. But I was just using these as examples of making climate change local. This shows sort of the sprawl, the amount of paved area in I think Houston. That looks like Houston. And he’s made those maps all over the country. This is looking at whether the fires were caused by humans or by natural causes like lightning. So you can see sort of what kind of fires are in your area. This is showing the areas of earth that I think are already above the 2 degrees Celsius warming.
>>Do you find the color scale is good? Is there anything technically about this visualization that appeals to you or is it the fact that they did this?
>>Yeah, it was just an example. Before we were talking about climate things. That’s not a particularly attractive map, but I think it’s an effective map because you can see what areas have already, on average, gone beyond the 2 degrees Celsius warming. I think that’s what this map is. Yeah. So things that make climate change more personal I think are sort of what we’re getting at there.
>>Enrico, do you have any thoughts you’d want to add to the visualization of climate change with mapping?
>>I think one thing that I’m always thinking about data visualization and climate change is that the very large majority of existing visualizations are visualizations about the past and data that we’ve collected. It would be much more interesting to visualize the future and future scenarios. I think that would be much more interesting.
>>That’s a hideous rainbow map.
>>And it’s a map of.
>>This is a gravity map of the moon, and can you tell?
>>I thought that was Mars. There you go.
>>Can you tell by looking at it which is high and low gravity? There’s no reason to understand if red is more than yellow or less.
>>OK. You’re so smart. What would you do? How would you solve this problem?
>>Use a single-color ramp.
>>Hues of one color or maybe two colors if you have some point in the data that’s important, like 0 degrees or places that have warmed versus cooling or above and below the average gravity, I suppose, would be something.
>>So the gradient would be naturally communicative as opposed to this, which needs a very complicated legend.
>>You can see all kinds of sharp edges in there that probably don’t exist. And if you use a single hue, then people will be better able to understand the underlying data if it’s matched.
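The single-hue and two-color ramps described here can be sketched in a few lines. This is a minimal illustration, not any tool’s default: the endpoint colors and the function names are assumptions chosen for the example. A single-hue ramp runs from light to dark, so brightness alone tells you which value is higher; the two-color version adds a meaningful midpoint such as 0 degrees or an average.

```python
# Endpoint colors are illustrative assumptions, not a standard palette.
LIGHT = (247, 251, 255)   # near-white
DARK_BLUE = (8, 48, 107)
WHITE = (255, 255, 255)
DARK_RED = (165, 0, 38)


def lerp(start, end, t):
    """Linear blend between two RGB colors, with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def single_hue(value, lo, hi):
    """Light-to-dark ramp in one hue: darker always means a higher value."""
    return lerp(LIGHT, DARK_BLUE, (value - lo) / (hi - lo))


def diverging(value, lo, mid, hi):
    """Two hues meeting at a meaningful midpoint (assumes lo < mid < hi)."""
    if value <= mid:
        return lerp(WHITE, DARK_BLUE, (mid - value) / (mid - lo))
    return lerp(WHITE, DARK_RED, (value - mid) / (hi - mid))


def brightness(rgb):
    """Rough perceived brightness using common luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


ramp = [single_hue(v, 0, 10) for v in range(11)]
for value, color in enumerate(ramp):
    print(value, color, round(brightness(color)))
```

Because brightness falls steadily along the ramp, a reader can order the values without consulting a legend, which is the property a rainbow scale lacks.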
>>And it says something I suppose that I can’t tell the moon from Mars with that either. I assumed that was Mars. I think you’re right. It’s a similar one.
>>It could very well be Mars.
>>These are the color gradients?
>>This is simultaneous contrast, where two squares are the same color. If you are looking at a typical heat map that geneticists use all the time, that green square that looks bright, if you moved it up there it’s probably one of the darker squares. In reality, perhaps it’s not. There’s an example of how that can be misleading.
>>Cool question. Thank you for your patience.
>>I like both simple and complex graphics, the ones that you can manipulate. At Scientific American and National Geographic, where Betsy has worked, we call them information graphics, the ones where you are manipulating the information, where you have to interact with the visual itself. I also like what my friend Jen Christiansen, who is great with graphics, would call an info poster: a pie chart, a bar chart, put them all together. So I started to think about the privileging (and I don’t mean this in a big social justice way; I mean more in the cultural studies way) of text literacy over visual literacy. We’re also very focused on text here at NYU journalism and in journalism in general. There’s definitely a privileging of, or a focus on, text literacy over visual literacy, on text storytelling over visual storytelling, at least still, even though we live in this incredibly video- and TV-laden environment. It’s interesting to me that we don’t have better visual literacy skills. I’m wondering, should science communicators and journalists be moving more in the direction of visual literacy? Pictures are how we first learn and understand information. It’s so interesting to me that we focus so much on the text storytelling. So should we be shifting our skills more toward visuals, and if so, where do we go next?
>>I think that’s a really interesting question. And I would ask you to stay at the microphone, because once Betsy and Enrico weigh in, you’re a science educator, I want to hear what your answer is too. Just because you asked the question doesn’t get you off the hook. Enrico, how would you answer that? Are we privileging text over visual literacy, and is that dumb? [LAUGHTER]
>>It’s a hard question.
>>Yes, it’s a great question.
>>I don’t know if I have anything intelligent to say. Yes. At the same time, I think data journalism and the use of visualization in journalism has been one of the major trends in the last ten years almost. Maybe the answer is that, yes, it takes time. But maybe — I don’t know. I’m not in journalism myself, so I may have a completely different lens from you. From the outside, I think journalism is going a lot in that direction. It takes time. I actually see the same development that happened in journalism, I would like to see them happen in other fields where I think they need to catch up. In science, for instance.
>>I was going to say, scientists typically do not get any training in data visualization, which is kind of ridiculous if you think about it, or often in statistics. Sometimes if they do have statistics, that’s the only data visualization instruction they get, and that’s problematic. But I think particularly for science journalists, visual literacy, as you were asking about before, would definitely be a good idea. Just like being able to understand how the statistics work, being able to read a graphic and find problems, or at minimum get what the scientist is getting out of the graphic, would definitely be a good idea. I don’t know. The tools are getting more and more accessible, so maybe there should be a data visualization class requirement. I would have liked that.
>>Robin, you teach as well as being an accomplished journalist, science journalist. So how has your sense of the balance between text and visual literacy, journalism, whatever you’d like to call it, how has that changed over the arc of your career?
>>Not a lot really. I think, to answer what your question is asking, I would say not a lot. Of course, my career has changed because I’ve been lucky to work at better-quality, more sophisticated publications over time. So with that comes more sophisticated graphics. But if I think about sampling visual representations and visual storytelling over the decades of my career across publications, across media, at least print and online media, I’ve seen more sophisticated storytelling, but the percentage of it is the same more or less. You know, rounding. And I still see a lot of marginalization of visual storytelling, or at least graphic and information graphic storytelling. It’s like this little ghetto. We admire them, adore them, but there’s one of them for ten of us. And it would be interesting if we changed that ratio.
>>I think it’s an interesting question. And I know that my answer would be that, yes, it’s stupid to have that separation, and it is not just a question of statistical sophistication and knowing which tool to use, that actually it requires us, as journalists, to embrace a level of esthetics and imagination that I think makes us a little uncomfortable.
>>And also I think journalists sometimes fail to apply the same critical thinking to figures. For example —
>>You want to show us something? Then I want to show something.
>>This is one of the president’s favorite maps. That’s the 2016 electoral map, and obviously the message that the winner of this election wants to convey is that most of the country is red. This is a map of geography, so it’s completely irrelevant. It’s not a map of people. And if you look at things in a different way, here’s a way from The New York Times to visualize that problem. We know —
>>I thought this was clever.
>>I don’t know, three, six, however many million more votes, but it looks like there’s very little of the country. Especially online, these electoral maps, this is by precinct, are highly saturated. I think there are also issues with how we see red versus blue that make this even worse. If you change it a little bit you start to get a different story. This is a map I think that’s putting the colors only where the population is. So that looks different. Here’s another way of looking at the same story, where the size of the circle is proportional to the amount that the county is leaning in that direction. You can also do that with population to get a more accurate —
>>Do you think that starts to get into the zone that Dan Fagin was talking about, where it’s too complicated to understand?
>>I don’t think that’s actually that complicated, and it’s more accurate in some ways. Depending on what you’re looking at. Just the point is journalists should be aware that maps like this are not what’s called normalized for the relevant value, which is population. This one actually is in some ways, but in general — I don’t think I have a state one, do I? No. It’s not a very good way of conveying that data.
>>And it’s misleading.
>>And it does more than just affect how you look at the data. This is how we now think about our version of political polarization right now, and it feeds back on itself. Dan, you’re looking at that like a moth at the candle.
>>I thought for sure, Betsy, that you were going to follow that with one of those maps where we actually distort the continent. There we go.
>>That’s called a cartogram, where you’re actually distorting the geography.
>>Ask and it shall be answered.
>>Here’s a comparison one. This might be a different election.
>>Can you unpack the idea of the cartogram here? What’s the idea?
>>The size of the state on here is proportional to how many people are in that state. So — or how many votes in that area. So you get a more accurate comparison of the two colors. There are tons of other ways.
>>If you lived on one of the coasts, that’s how you’d like the electoral college to work? Is that the idea?
>>Yeah, you know, it’s just — that looks very different, right?
>>Geography is not a great way to show election data, because of the vast differences in how many people live in different places. It’s just like if you show a map of where people are tweeting about coronavirus. I’m sure this map must be out there, right? If you’re not thinking about it, you’re basically going to be putting up a map of population centers. So that sort of thing happens a lot, and I think people, and particularly journalists, need to think more about what these visualizations are actually showing.
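The point about tweet maps turning into population maps comes down to normalizing counts by population. The sketch below uses invented place names and numbers purely for illustration (nothing here is real data, and `per_100k` is a hypothetical helper); it shows how the ranking of places can flip once counts are converted to a rate.

```python
# Hypothetical places and numbers, invented purely for illustration.
tweets = {"Big City": 50_000, "Mid Town": 6_000, "Small Town": 900}
population = {"Big City": 8_000_000, "Mid Town": 500_000, "Small Town": 20_000}


def per_100k(counts, people):
    """Convert raw counts to a rate per 100,000 residents."""
    return {place: counts[place] / people[place] * 100_000 for place in counts}


rates = per_100k(tweets, population)

# Rank places by raw count and by rate to compare the two stories.
by_count = sorted(tweets, key=tweets.get, reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print("ranked by raw count:", by_count)
print("ranked by rate per 100k:", by_rate)
```

A map shaded by the raw counts would highlight the largest city; a map shaded by the rate would highlight wherever the topic is most intense relative to the number of people living there.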
>>Enrico, you were thinking about something?
>>No? OK. [LAUGHTER]
>>A question. I didn’t want to step on your thought if you had one. I’m sorry. Please.
>>You both talked about the importance of the selection of data, picking the right data to show before you get to work on the design and portrayal of it, but I’m wondering, as a journalist, at least, once you’ve made that decision of: OK, this is data I want to represent in a story, what then — you talked about a few of them, but what then are some other potential pitfalls that journalists commonly fall into when they’re trying to show that data accurately?
>>You mean like labeling or icons?
>>Or what kind of chart.
>>I’m wondering about a couple more examples.
>>Can you address that?
>>I think in general, and I don’t know specifically only about journalists, but, yeah, selecting or using the wrong chart is a common problem, or showing too much at once is a common problem. I think, in general, it’s hard to distinguish completely between what you’re visualizing and how you visualize it. I think we have a tendency to think about visualization as only the act of creating a visual representation of something, whereas I think you can’t really remove the part where you decide what information to extract from a given data set. The two things go hand in hand.
>>Unpack that a little bit. That’s interesting.
>>I wonder if I can give an example. Not too specific, but whenever we want to create a visualization, we start from some data set. And a data set has some information. Before this information is visualized, it can be manipulated or transformed in many different ways. Actually, if you want to create a specific kind of visualization, you have to transform the data in some way. Because of that, there’s really not a sharp distinction between what you do in the realm of data and data transformation and what you do in the realm of giving a visual representation to this data. The two things go together. So some of the design decisions that you have to make when you visualize data pertain to deciding how to transform the data to generate a certain type of chart.
>>I think the most important thing is to figure out exactly what you’re trying to say with the graphic and then determine if the way you’ve chosen to do it is actually conveying that, if the aspect of the data is what’s being showcased, I guess, and what’s easiest to get out of the graphic.
>>I have to say that one of the most common problems that I see around is when I look at a chart or a visualization, what am I supposed to extract out of this graphic? Sometimes there’s so much going on that it’s really hard to figure out how am I supposed to read this? I think that’s one of the most important tests.
>>Can I add a layer to that?
>>In our conversation earlier, you mentioned something I found really very compelling, which was this idea of statistical numbing that a large, complicated graphic of particularly heart-rending, tragic, very human, large phenomena, mass murders and fill in the blank — I don’t know — that somehow so many numbers which are meant to convince us and persuade us and move us have the opposite effect; they turn everything into kind of hard informational little stale candies. I mean, what do we do, what can we do to combat that? You’ve worked with something that I heard you call anthropographics.
>>Yes. That’s one of the most “terrifying” results that I discovered in psychological research that is related to data visualization. There’s a renowned psychologist. His name is Paul Slovic. He’s been studying the problem of how we communicate information about tragedies, and how we communicate information so people are compelled to act. One of the most egregious examples he uses is the tragedy in Rwanda. Back then we had all the information we needed, but we weren’t able to convey the information in a way that made people want to act, especially decision makers. So he did a lot of research that basically shows that, in the same way there are limitations in visual perception, there are also cognitive limitations that our minds and brains have. One of these limitations is that it’s very hard for us to reason over large numbers. Because of that, if I’m talking, say, to you about the tragedy of one person, you’re going to have a very strong reaction because you can think about this person, you have a lot of empathy for one person. But if I start talking about 50 persons, it’s different. If I talk about a million persons, you can’t wrap your head around a million. A million or 2 million. The difference between 1 million and 2 million, to us, is nothing. But it’s huge. And Paul Slovic ran a number of experiments showing that’s the kind of reaction people have. If I show you the story of one person, the reaction is much stronger than showing statistics about a hundred or a thousand or a million persons. That’s discouraging in a way.
>>I was going to say. The message of this is that we shouldn’t do graphics that depict large human effects.
>>You asked the question, asking me what is the solution. Some of the research we’ve been trying to do in my lab has been trying to see if visualization can play a role.
>>– in the sense that — no. [LAUGHTER] Apparently. I don’t know. I don’t know. In science, everything is preliminary. I would love to have someone that comes back and says: Look, what you did is wrong.
>>I’ve heard people actually say one thing you should do, instead of having little dots, is have the icons be little people. Then we’ll identify with the little people who are the subject of genocide in Rwanda. I don’t mean to make light of that. I’m trying to think of a large tragedy.
>>What we tried to do in our research is to say: It’s possible that the way you visualize something is actually part of the problem. If I show you some numbers or a pie chart or a bar chart, people just don’t feel a lot of empathy for a bar chart, right? So maybe if I kind of humanize this a little more, a little better, then people start relating a little bit more to the actual people who are behind the numbers. And we developed this idea of what we called anthropographics, which are data visualizations that try to convey more closely or more effectively the idea that there are people behind the data. We failed to do that. We tried a lot of different things. I think we don’t have time. We tried a lot of different things, like little icons or using less aggregation and more units, to see that behind the statistics there are a hundred people, a million people, a hundred thousand people. It doesn’t work very well.
>>Betsy threw this up on the wall for us.
>>I meant to show it earlier. It’s an example of two different ways you can show the same data that may be effective for different points. As far as what you guys were talking about, that surprises me, because I’m sure you remember the racial dot map. When this happened, people went insane for this map, which showed one dot for each person in different cities, color coded for race. For some reason, the individual dots seemed to make a much bigger impression on people than other maps of the same data had before. And I thought it was because the dots were people.
>>That’s my intuition as well. Again, it’s possible that we failed to find an effect that does exist.
>>The dehumanizing effect.
>>Exactly the same intuition.
>>It can be profound. It did have breakthrough moments, which I suppose offer a —
>>I don’t know why people liked that map. I thought it was because of the dots.
>>We have time for one last question.
>>I have a question of projections because I’m studying civil engineering. We need a plan for people moving into urban areas. How do you show uncertainty visually?
>>Thank you very much for asking that question. I want to hear both of you. I’m projecting the future. But I got to tell you, I’m not so sure. How do I do that visually?
>>There’s a whole area of research right now that is blossoming around visualization, how to visualize uncertainty. There are some classic techniques to do that, and researchers are developing new ones. I would say there’s nothing too established.
>>You just make the line dotted.
>>I’m trying to visualize that. Please.
>>There’s a researcher at Northeastern University... no, Northwestern. Sorry. I always confuse the two. They want to kill me now. Jessica Hullman, a good friend of mine, and she’s been experimenting with what she calls hypothetical outcome plots. It would be nice to show them. I don’t have them here. But the idea is to use animation to show how frequently something happens. And we seem to be really tuned to understanding how uncertain something is when it’s animated. It’s hard to describe something with words, something that is very visual. But one of the new developments I’ve seen in recent years in this area is this idea of using animation to show how frequently something happens, and we seem to be especially tuned to extracting uncertainty or frequency of information. That said, there are standard techniques like box plots or level of transparency or even, say if you have lines, making lines more or less crisp or objects more or less crisp, to communicate the idea that something is more or less certain. It’s an open area of research and there is a lot of work to do there.
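The animation idea described here can be sketched as a list of frames, each showing one randomly drawn possible outcome. This is only a rough illustration of the concept as described in the conversation, not the researchers’ implementation: the normal distribution, the two made-up quantities A and B, and the function name `outcome_frames` are all assumptions.

```python
import random


def outcome_frames(mean_a, mean_b, spread, n_frames=200, seed=0):
    """Draw one possible (A, B) outcome per animation frame.

    Assumes, for illustration only, that both quantities are normally
    distributed with the same spread. A fixed seed makes runs repeatable.
    """
    rng = random.Random(seed)
    return [(rng.gauss(mean_a, spread), rng.gauss(mean_b, spread))
            for _ in range(n_frames)]


# Two hypothetical estimates: A is higher on average, but uncertain.
frames = outcome_frames(mean_a=10.0, mean_b=8.0, spread=2.0)

# A viewer watching the frames flip by would see A ahead in roughly
# this share of them, which conveys the uncertainty as a frequency.
share_a_ahead = sum(a > b for a, b in frames) / len(frames)
print(f"A is ahead in {share_a_ahead:.0%} of frames")
```

Each frame would be drawn as an ordinary chart; shown one after another, the frames let the viewer experience how often the ordering flips instead of reading an error bar.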
>>Betsy, as a practical matter —
>>I thought his question was about map projections. I was all excited.
>>It could be.
>>I thought so too. That’s the transparency question.
>>This is a question we deal with in text all the time too with science journalism, how to convey how much uncertainty there is with a certain finding or what we’re describing. It’s a really difficult question. I don’t know the answer. It points out this fact that visualizations, and I think in particular maps, are very powerful. If you put something on a map, people will believe it’s true because there’s an element of it that’s true. Well, you know, not exactly because of the projections, but the geography is somewhat true. So it just lends it this sort of weight that I think can be dangerous and very effective for conveying both true things and fake news.
>>Yeah. It just reminded me that another powerful method that has been researched recently is the idea of showing multiple possible outcomes. A popular one is the idea of hurricanes, what kind of trajectory a hurricane has.
>>And if you don’t like the trajectory, you can take a magic marker and extend the range of the graph. That’s very interactive. It’s a very interactive graphic.
>>That would be ridiculous.
>>The idea that has been explored is that if there are multiple future projections, then show multiple lines so you can see these are the possible ones, rather than showing the cone of uncertainty around one single path. To say, look, it could be this way or that way or that way. That’s another really powerful method that I’ve seen used recently.
>>Now if I were going to make a visualization of this evening’s conversation, I would, I think, do it as a kind of heat map where it would be very kind of blueish in the beginning, where we sort of thought about this very complicated idea, and we all then gradually warmed to the topic and things got very intense, and by the end of the evening this is pretty red stuff, interesting, hot, a hot topic. You two made it hot. Thank you very much.
Betsy Mason is a freelance journalist and the author of All Over the Map: A Cartographic Odyssey.
Enrico Bertini teaches, studies, and produces data visualizations at NYU Tandon School of Engineering.
Robert Lee Hotz is a science writer at the Wall Street Journal and a Distinguished Writer in Residence at NYU Journalism.