Use and Misuse of Graphical Representations


Introduction

In addition to bar graphs, histograms, and circle graphs (pie charts), statisticians use other types of graphs to represent data and analyze what it shows. But you have to be careful when creating and reading graphs. Graphs that are not carefully constructed can be misleading, and sometimes people make them misleading on purpose.

Choosing a Graph

Choosing what type of graph to use for a specific data set takes some trial and error, and sometimes more than one type of graph is appropriate. What you choose depends on how you want to present your data, as well as on your own preferences. Modern spreadsheet programs like Excel are very flexible at creating different types of graphs; with only a couple of clicks, you can view data represented as a bar graph, line graph, or circle graph. From there, you can choose the one that best paints the picture you want to show.

Since there is often more than one way to graph a data set, let’s look at some examples and think about the different possibilities that are available to us.

Example

Problem


A baseball writer wants to create a graph showing the total hits for the players with the greatest number of hits in the first half of the baseball season. These players have the following numbers of hits: `86, 88, 90, 90, 97, 99, 102`, and `106`. What type of graph should the writer use to represent the data?

 

A bar graph entitled Hits, First Half of Season, shows Players A through H on the x-axis and number of hits on the y-axis. Player A has just over 80. Player B has 85. Player C and Player D each have 90. Player E has just below 100. Player F has 100. Player G has just over 100. Player H has a few more hits than Player G.

The data set contains information about the hit totals of `8` players. Either a bar graph or a pictograph could help show the total number of hits that each player had compared to the other players.

 

A line graph entitled Hits, First Half of Season, shows Players A through H on the x-axis and number of hits on the y-axis. Player A has just over 80. Player B has 85. Player C and Player D each have 90. Player E has just below 100. Player F has 100. Player G has just over 100. Player H has a few more hits than Player G.

A line graph would not be appropriate, since the data is not continuous. There is no data “in between” each player’s hit totals.

 

A circle graph entitled Hits, First Half of Season, is divided into 8 equal parts labeled Players A through H.

A circle graph does not make sense either, unless the writer wants to show each player's percentage of the combined hit total. That information wouldn't be very helpful if the writer just wants to show each player's total number of hits.
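To see why a circle graph adds little here, the short Python sketch below (an illustration added for this discussion, not part of the original example) computes each player's share of the combined hit total. The player letters A through H simply follow the order of the data. Because the totals are close together, every slice comes out between roughly 11% and 14% of the circle, so the slices would look nearly identical.

```python
# Hit totals from the example, listed for Players A through H.
hits = [86, 88, 90, 90, 97, 99, 102, 106]

total = sum(hits)
# Each player's share of the combined total, as a percentage.
shares = [100 * h / total for h in hits]

for player, share in zip("ABCDEFGH", shares):
    print(f"Player {player}: {share:.1f}% of all hits")
```

A bar graph, by contrast, shows the totals themselves, which is what the writer wants to compare.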

 

Answer

A bar graph or pictograph would work best here. (A pictograph may be preferable for a small data set, and a bar graph may be preferable for a large one.)

The writer could use a stem-and-leaf plot to show the distribution of the numerical data, but this kind of graph is not as effective at showing the relationship between each player and the number of hits he has. A box-and-whisker plot, which summarizes the median and spread of a data set, is not useful here either, since the writer is interested in each player's hit total, not the typical number of hits or the spread of the data.
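For comparison, here is a minimal Python sketch (our own illustration, assuming non-negative whole-number data) that builds a stem-and-leaf plot of the hit totals. Each value is split into a stem (all digits except the last) and a leaf (the last digit).

```python
from collections import defaultdict

hits = [86, 88, 90, 90, 97, 99, 102, 106]

# Split each value into a stem (all digits except the last) and a leaf (the last digit).
leaves = defaultdict(list)
for value in sorted(hits):
    stem, leaf = divmod(value, 10)
    leaves[stem].append(leaf)

# Build one row per stem, from smallest to largest, including any stem with no leaves.
lines = []
for stem in range(min(leaves), max(leaves) + 1):
    row = " ".join(str(leaf) for leaf in leaves[stem])
    lines.append(f"{stem:>3} | {row}")

print("\n".join(lines))
```

The plot has three rows: stem 8 with leaves 6 and 8, stem 9 with leaves 0, 0, 7, and 9, and stem 10 with leaves 2 and 6. Every value survives, but nothing in the plot says which player had which total, which is exactly the weakness described above.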

Example

Problem


A statistician is collecting data on the frequency with which adults go to the dentist. She surveys `128` people and finds the following information.

 

Less than `1` time per year: `28` respondents

`1` time per year: `51` respondents

`2` times per year: `42` respondents

More than `2` times per year: `7` respondents

 

In a presentation to dentists, she especially wants to highlight the population that visits the dentist less than `1` time per year. What type of graph should she use to represent the data?

 

A horizontal bar graph entitled Frequency of Dental Visits shows Number of Respondents on the x-axis and, on the y-axis, “More than 2 times per year,” “2 times per year,” “1 time per year,” and “Less than 1 time per year.” “More than 2 times per year” reaches most of the way to 10. “2 times per year” reaches just beyond 40. “1 time per year” reaches 50. “Less than 1 time per year” reaches nearly to 30.

To show her findings, the statistician could use a couple of different graphs. A bar graph would be fine to use here since the data is categorical. She has grouped her findings into `4` categories.

 

An untitled circle graph is divided into 4 unequal parts. From largest to smallest, the parts are: 1 time per year, 40%. 2 times per year, 33%. Less than 1 time per year, 22%. More than 2 times per year, 5%.

A circle graph may be even better. The statistician is interested in the percentage, or share, of respondents who gave each answer. A circle graph allows for easy comparison among the categories surveyed.
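The percentages in a circle graph come from dividing each category count by the total, and each slice's angle is that same fraction of 360 degrees. The Python sketch below (an added illustration using the survey counts from this example) does that arithmetic. Rounded to whole numbers, the shares are 22%, 40%, 33%, and 5%, matching the circle graph.

```python
# Survey counts from the example (128 respondents in total).
responses = {
    "Less than 1 time per year": 28,
    "1 time per year": 51,
    "2 times per year": 42,
    "More than 2 times per year": 7,
}

total = sum(responses.values())

for category, count in responses.items():
    percent = 100 * count / total  # share of all respondents
    angle = 360 * count / total    # size of the slice in degrees
    print(f"{category}: {percent:.1f}% of respondents, {angle:.1f} degrees")
```

For example, the “Less than 1 time per year” slice is 28/128 = 21.875% of the circle, or about 79 degrees.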

Answer

A circle graph is best, but a bar graph would also be acceptable.

As with the first example, stem-and-leaf plots and box-and-whisker plots are not useful here: the statistician is not interested in the average number of times a person goes to the dentist each year. A line graph would not be appropriate either, as the data is not continuous.

Example

Problem


An amusement park planner wants to better understand the distribution of wait times that people experience while waiting for a popular ride. At the park one day, he asks `15` random people about the length of time they had to wait (in minutes).

 

`12, 3, 2, 10, 12, 0, 2, 0, 8, 5, 4, 0, 7, 4, 6`

 

What type of graph provides the best visual representation of this set of data: a circle graph, a box-and-whisker plot, or a bar graph?

 

An untitled circle graph is divided into 10 unequal parts. From longest wait time to shortest, the parts are: 12 minutes, 13%. 10 minutes, 6%. 8 minutes, 7%. 7 minutes, 7%. 6 minutes, 7%. 5 minutes, 7%. 4 minutes, 13%. 3 minutes, 7%. 2 minutes, 13%. 0 minutes, 20%.

The key idea here is that the planner wants to understand the distribution of wait times. A circle graph does not show distribution. The planner could create a circle graph like the one described above, showing the percentage of people who waited for different amounts of time, but it would not help him understand the distribution of the data.

 

An image shows a box-and-whisker plot. The number line goes from 0 to 13. The whiskers range from 0 to 12, and the box ranges from 2 to 8 with a median of 4.

Creating a box-and-whisker plot would be more helpful, because this type of graph shows the distribution more effectively. From the plot, you can see that half the people waited between `2` and `8` minutes for the ride.
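A box-and-whisker plot is drawn from the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Textbooks and software use several slightly different rules for quartiles. The Python sketch below assumes the "median of each half" rule (when the number of values is odd, the overall median is left out of both halves), which reproduces the plot for these wait times.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]         # values below the median position
    upper = values[(n + 1) // 2 :]   # values above the median position
    return values[0], median(lower), median(values), median(upper), values[-1]


wait_times = [12, 3, 2, 10, 12, 0, 2, 0, 8, 5, 4, 0, 7, 4, 6]
print(five_number_summary(wait_times))
```

The sorted wait times are 0, 0, 0, 2, 2, 3, 4, 4, 5, 6, 7, 8, 10, 12, 12, so the summary is (0, 2, 4, 8, 12): whiskers from 0 to 12, a box from 2 to 8, and a median of 4.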

 

An untitled bar graph shows Person A through Person O on the x-axis and the minutes that each person waited in line on the y-axis. Person A, 12 minutes. Person B, between 2 and 4 minutes. Person C, 2 minutes. Person D, 10 minutes. Person E, 12 minutes. Person F, zero minutes. Person G, 2 minutes. Person H, zero minutes. Person I, 8 minutes. Person J, between 4 and 6 minutes. Person K, 4 minutes. Person L, zero minutes. Person M, between 6 and 8 minutes. Person N, 4 minutes. Person O, 6 minutes.

A bar graph can show the length that each person waited, but it does not show much about the distribution of wait times.

Answer

A box-and-whisker plot is best.

Ultimately, the box-and-whisker plot gives the most information about distribution, so it is the most useful.

 

An oceanographer wants to make a graph that shows the height (in centimeters) of a specific coral over a period of `2` years. Which type of graph is the most appropriate?

 

A) Circle graph

 

B) Box-and-whisker plot

 

C) Line graph

 

D) Stem-and-leaf plot


Misleading Graphs

As you have seen, graphs provide a visual way to represent data sets. Pictures can be misleading, though, so you also need to know how to identify graphs that seem to show something different from what the data says. This may be due to carelessness, or it may be done on purpose. Below are some general questions to keep in mind as you read graphs.

Questions to Consider when Reading Graphs

 

  • Is the graph labeled sufficiently?
  • What is the scale?
  • Does the graph show a full picture of the data, or only a selected part of it?

 

Look at the graph that follows. The title states “Average Salary for Adjunct Professors at Four Colleges,” and four bars appear on the graph. You can tell which colleges are being compared, but you are given no information about the scale that is being used. The graph makes it appear that the average salary for adjunct professors at Central College is much higher than that at Eastern College, but without a scale, you cannot know for certain. (You do know that the salary is higher; you just do not know how much higher.) To make this graph less misleading, a `y`-axis with salary information should be included.

A bar graph entitled Average Salary for Adjunct Professors at Four Colleges shows colleges on the x-axis and no label on the y-axis, but there are five horizontal lines. Eastern College is a bar that is nearly to the third line. Western College is almost touching the third line. Central College reaches the fourth line. Southern College reaches just past the third line.

Even when both axes are present and labeled correctly, graphical representations of data can be misleading. This is shown in the set of attendance graphs that follow.

In the graph on the left, the scale begins at `0` and goes to `20,000`. The graph itself shows that attendance at Minneapolis Wildcats games has steadily increased each year since `2008`, topping out in `2010` at just over `16,000`.

Now look at the graph on the right. It appears to show that attendance at St. Paul Strikers’ games has increased even more dramatically: the bar for `2010` is more than twice as tall as that in `2008`. From looking at these two graphs, you may conclude that the Strikers have been the more popular team recently, as the height of the bars seems to indicate that their attendance has grown faster than that of the Wildcats.

But notice something interesting: the scale of the Strikers graph is very different. It begins at `10,000`! This paints a skewed picture of the data when compared with the Wildcats graph, which starts at `0`. And by examining the actual data (the actual attendance, not just the height of the bars), you can tell that attendance is actually greater at the Wildcats games. In `2010`, for instance, Wildcats attendance is a little over `16,000`, while attendance at Strikers games is below `15,000`.

A bar graph entitled Attendance: Minneapolis Wildcats shows three years on the x-axis: 2008, 2009, and 2010. The y-axis shows attendance numbers of 0, 4,000, 8,000, 12,000, 16,000, and 20,000. 2008 reaches 12,000. 2009 reaches halfway between 12,000 and 16,000. 2010 reaches just above 16,000. A bar graph entitled Attendance: St. Paul Strikers shows three years on the x-axis: 2008, 2009, and 2010. The y-axis shows attendance numbers of 10,000, 11,000, 12,000, 13,000, 14,000, and 15,000. 2008 reaches 12,000. 2009 reaches most of the way to 13,000. 2010 reaches almost to 15,000.

This brings up an important point. When you are using graphs to compare data sets, the scales need to be consistent; otherwise, it is very difficult to compare the data itself. As you can tell from the two previous graphs, changing the scale of a graph can dramatically change the way it looks and the impression the graph makes.
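A little arithmetic shows how much a non-zero starting value exaggerates change. When the axis starts at a baseline `b`, a bar for value `v` is drawn with height `v - b`, so the ratio of two bar heights is `(last - b) / (first - b)`. The Python sketch below uses approximate attendance figures read off the graphs (about `12,000` for both teams in `2008`, and assumed values of `16,200` for the Wildcats and `14,800` for the Strikers in `2010`); the function name `apparent_ratio` is our own.

```python
def apparent_ratio(first, last, baseline):
    """How many times taller the last bar looks than the first bar
    when the y-axis starts at `baseline`."""
    return (last - baseline) / (first - baseline)


# Approximate 2008 and 2010 attendance read off the graphs (assumed values).
wildcats_2008, wildcats_2010 = 12_000, 16_200
strikers_2008, strikers_2010 = 12_000, 14_800

# Wildcats graph: axis starts at 0, so bar heights are proportional to attendance.
print(apparent_ratio(wildcats_2008, wildcats_2010, baseline=0))
# Strikers graph: axis starts at 10,000, which exaggerates the growth.
print(apparent_ratio(strikers_2008, strikers_2010, baseline=10_000))
# The same Strikers data drawn on an axis that starts at 0.
print(apparent_ratio(strikers_2008, strikers_2010, baseline=0))
```

With these readings, the Strikers' `2010` bar looks about 2.4 times as tall as the `2008` bar on the truncated axis, but only about 1.23 times as tall on a zero-based axis, which is less than the Wildcats' ratio of 1.35.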

A more honest representation of the attendance data can be found in a double-bar graph, where the attendance figures for both teams are plotted side by side using the same scale. Look at the results below. Now it is clear that attendance for the Wildcats is greater than attendance for the Strikers.

A double bar graph entitled Attendance: Wildcats and Strikers shows 3 years: 2008, 2009, and 2010 on the x-axis. The y-axis shows attendance numbers of 0, 4,000, 8,000, 12,000, 16,000, and 20,000. In 2008, the Wildcats and Strikers both had attendance of 12,000. In 2009, the Wildcats had attendance of about 14,000 while the Strikers had about 13,000. In 2010, the Wildcats had attendance of just above 16,000 while the Strikers had closer to 15,000.

The circle graph here is another example of a misleading representation. The actual percentages of people who gave each response are not shown, so the viewer has to interpret the data based on the size of the sections. At first glance, this graph seems to show that many voters favor Candidate A, as the “Yes” section is very large.

Part of the reason this section appears so large is that the graph was designed to make it look large. The circle graph is presented in three-dimensional form, and the data at the front of the graph (the “Yes” slice) appears the most prominent. The creator of this graph is hoping that it will make you think that Candidate A is very popular!

A three-dimensional, exploded circle graph entitled Question: Will You Vote for Candidate A is divided into 5 unequal parts. From largest to smallest, the largest part appears to be Yes, at about two-fifths of the circle, which is closest to the viewer and is shaded blue. The next largest is No, at about one-third of the circle, and is shaded red, then Probably No, at one-fifth the circle in light red, Undecided at one-tenth the circle in purple, and Probably Yes as just a sliver in light blue.

On closer inspection, though, the data does not seem to support this impression. The “Yes” and “Probably Yes” sections combined are roughly equal to the “No” and “Probably No” sections combined, which means that the candidate is not as popular as this representation suggests. In fact, someone who did not want this candidate to appear favorable could have represented the data using the next graph. Notice the different positions of the “No” and “Probably No” sections, as well as the matching colors for related responses.

A three-dimensional, exploded circle graph entitled Question: Will You Vote for Candidate A is divided into 5 unequal parts. From largest to smallest, the largest part appears to be Yes, at slightly more than one-third of the circle, and is farthest away from the viewer and blue. The next largest is No, at about one-third of the circle, shown in red, at the forefront of the image. Next is Probably No, at one-fifth the circle, shown in the same red as (and next to) the No part. Undecided, at one-tenth the circle, is purple. Probably Yes is just a sliver, and is the same blue as (and next to) the Yes part. No and Probably No, both shaded red, seem to make up half of the circle, while Yes and Probably Yes, shaded blue, make up less than half. Undecided makes up the rest.

Notice how perspective and color make a difference in viewing and analyzing data!

Next is a more honest way of representing this data. In this graph, the circle graph is shown from above, and the actual percentages are included.

A circle graph entitled Question: Will You Vote for Candidate A is shown divided into 5 unequal parts. From largest to smallest, the parts are: Yes, in blue, 40%. No, in red, 30%. Probably No, in light red, 18%. Undecided, in purple, 9%. Probably Yes, in light blue, 3%.
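Adding up the labeled percentages confirms what the three-dimensional versions obscure. The Python sketch below (an added illustration using the percentages from the graph viewed from above) groups the responses by which way voters are leaning.

```python
# Percentages from the circle graph viewed from above.
poll = {
    "Yes": 40,
    "Probably Yes": 3,
    "Undecided": 9,
    "Probably No": 18,
    "No": 30,
}

leaning_yes = poll["Yes"] + poll["Probably Yes"]
leaning_no = poll["No"] + poll["Probably No"]

print(f"Leaning yes: {leaning_yes}%")
print(f"Leaning no: {leaning_no}%")
print(f"Undecided: {poll['Undecided']}%")
```

“Yes” is the largest single slice, yet voters leaning no (`48%`) slightly outnumber voters leaning yes (`43%`), with `9%` undecided.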

Results from a poll measuring a politician’s approval rating are shown in the table below.

 

Month: Approval Rating

January: `55%`

February: `58%`

March: `59%`

April: `56%`

May: `59%`

June: `56%`

July: `52%`

 

Which of the following graphs is most misleading?

 

A)

A line graph is entitled Approval Rating: January to July. It shows Months on the x-axis and ratings on the y-axis. January is just above 50%; February and March are slightly higher. April drops to just above 50% again. May rises, and June and July drop incrementally to about 50%.

 

B)

A line graph is entitled Approval Rating: January to July. It shows Months on the x-axis and ratings on the y-axis. January, 55%. February, 58%. March, 59%. April, 56%. May, 59%.

 

C)

A bar graph entitled Approval Rating: January to July shows months on the x-axis: January, February, March, April, May, June, and July. The y-axis shows percentages: 70%, 60%, 50%, 40%, 30%, 20%, 10%, and 0%. January is about 53%. February is about 58%. March is about 59%. April is about 53%. May is about 59%. June is about 55%. July is about 51%.

 

D)

A bar graph entitled Approval Rating: January to July shows months on the x-axis: January, February, March, April, May, June, and July. The y-axis shows percentages: 100%, 75%, 25% and 0%. January is about 53%. February is about 58%. March is about 59%. April is about 53%. May is about 59%. June is about 55%. July is about 51%.


Summary

Graphs have a big impact on how you understand a set of data. Use an appropriate type of graph and you can communicate your data effectively; use the wrong type of graph, though, and your viewers may misunderstand the story you are trying to tell. When reading graphs in newspapers and online, be sure to look at the axes, the scale, and the presentation of the data itself. These can all help you judge whether a graph represents a data set fairly or unfairly.