Introduction to data visualization

From Knowledge Kitchen


Owing to trends in human-computer interaction (in particular, ubiquitous computing and your ever-increasing digital shadow), people, governments, and organizations of all kinds are becoming inundated with data. It turns out that people, governments, and organizations are often not so good at analyzing this ongoing barrage of data.

There is a constant need to expose trends and intelligence in big data, extracting the big picture from the tiny details. As a result, data visualization and data mining are now highly sought-after skills, and are gradually becoming sophisticated tools for art and creative expression.

History

Please watch this video overview of data visualization for a sense of history and early uses of data visualization.

In modern times, Edward Tufte, originally a statistician, is considered one of the popularizers of information design. He literally wrote the book that everyone quotes, but few have actually read.


Terminology

The terms "infographic" and "data visualization" are often used interchangeably. There is no clear distinction between the two; however, "infographic" generally refers to simple data visualizations that are more graphically designed than usual and that usually appear in newspapers, websites, ads, and other heavily designed materials.

To sum it up, infographics are meant to look cool and simple to viewers, whereas data visualization is a general term for visually representing complex data, often in ways that are not pretty.

data visualizations

  • usually minimally designed
  • usually not suitable for lay audiences
  • main purpose: to help viewers identify patterns in data
  • rarely used to influence people's opinions on issues

infographics

  • usually heavily designed
  • often suitable for lay audiences
  • main purpose: to effectively communicate information that is already somewhat understood by the viewer
  • often used to influence people's opinions on issues
Infographic by Nicholas Felton
2D Histogram... a common type of data visualization. From improving-visualization.org

Quality

According to Tufte, a good data visualization should:

  • show the data
  • allow the viewer to focus on the data and not the methodology or the software
  • avoid data distortion
  • present many numbers in a small visual space
  • allow the viewer to easily compare/contrast
  • show the data at various levels of detail
  • integrate the statistical, visual, and textual descriptions of the data

Industry demand

Some recent job postings exhibiting the demand for data visualization expertise:

[Giant News and Media Corporation] is looking for a Web developer with Ruby and JavaScript skills who will develop highly interactive Ajax-based web applications and web-based data visualization components as part of a project to build a system to allow journalists to monitor election returns, exit polls, and support news reporting on election night.

[Financial Services Company] is looking for a Data Visualization Designer to help us revolutionize how investors view and interact with financial data.

At [Exhibit Design Company...] You will work on projects for world-class museums, top cultural institutions, major brands, and innovative corporations designing interactive exhibits, 3D visualizations, immersive environments, large-scale displays, websites, mobile devices, and digital branding.

I'm looking for a summer intern for the Center for Innovation in Visual Analytics at [Giant Computer Services Company Research Department]. 3 months in one of the more competitive research centers in data visualization and interaction in the world.

[...] Encourage young people from post-emergency communities (potentially in Haiti) to develop innovative ideas for DRR community projects. Facilitate the creation of a communication network; Assist in the creation of quality data visualization and advocacy materials, in collaboration with [Global NGO]

Common types of 2D graphical visualizations

Note - some of these links are broken... I am in the process of updating them - you can always find equivalents with a simple search.

Table

Rows and columns are a form of visualization! A table is an arrangement of data in rows and columns, or possibly in a more complex structure.

Airplane model data
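
To make the rows-and-columns idea concrete, here is a minimal Python sketch that lays out a small data set as an aligned plain-text table. The function name format_table and the sample values are made up for illustration; they are not the airplane data referenced above.

```python
def format_table(headers, rows):
    """Return a plain-text table with each column padded to a common width."""
    # Convert every cell to text so that widths can be measured.
    cells = [list(headers)] + [[str(value) for value in row] for row in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]
    # Separate the header row from the data rows with a rule.
    lines.insert(1, "  ".join("-" * width for width in widths))
    return "\n".join(lines)


# Hypothetical values, for illustration only.
print(format_table(["model", "seats", "range_km"],
                   [["A", 150, 5000], ["B", 300, 12000]]))
```

Even this bare layout lets a reader scan down a column to compare values, which is the whole point of a table.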

Scatter plot

Scatter plot of Old Faithful eruptions
Pay disparity by gender, as visualized in a scatter plot and line chart by the NY Times

A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display the values of (typically) two variables for a set of data. If the points are color-coded, one additional variable can be displayed.

Line chart

A line chart or line graph is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments. It is similar to a scatter plot except that the measurement points are ordered and joined with straight line segments.

A typical business revenue line chart
A parody of a line chart
A line chart and bar chart combination

Tips:

  • Line charts imply that data is continuously changing. If your data is discrete, you might consider a bar chart instead.
  • When using colors to portray data values or types, ensure that the colors used are accessible to all users. Online tools such as ColorBrewer (www.colorbrewer.org) can help identify issues with the colors used.
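
The first tip can be made concrete. Drawing a straight segment between two markers amounts to claiming a value for every point in between. The Python sketch below computes those implied values by linear interpolation; the function name interpolate is an assumption for illustration, and it assumes the markers are sorted by strictly increasing x.

```python
def interpolate(points, x):
    """Return the y value a line chart implies at x.

    points is a list of (x, y) markers sorted by strictly increasing x.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)  # fraction of the way along the segment
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the range of the markers")


markers = [(0, 0), (2, 10), (4, 6)]
print(interpolate(markers, 1))  # a value halfway between the first two markers
```

If an in-between value like this would be meaningless for your data (for example, categories or separate events), the line is suggesting something that is not there.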

Pie chart

UK Consumer spending in 1990, as visualized by Phoebe Bright; broken down into categories (bubble size) and, within that, durability (pie slices)
An average day in the life of Nicholas Felton

A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents.
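
As a small sketch of that proportionality, the Python function below converts a list of quantities into central angles in degrees. The name slice_angles and the sample numbers are assumptions made for illustration.

```python
def slice_angles(values):
    """Return the central angle, in degrees, of each pie slice."""
    total = sum(values)
    # Each slice gets the same share of 360 degrees as its share of the total.
    return [360 * value / total for value in values]


# Hypothetical quantities, for illustration only.
print(slice_angles([1, 1, 2]))
```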

Tips:

  • Pie charts are often criticized. Comparing the size of pie segments can be difficult, and many visualization experts suggest that bar charts should be used instead.
  • Using area or volume to represent data can distort data values and exaggerate differences between values. For example, if the radius of the circle is used to represent data values, the area of the circle will quadruple if the data values double. There is also an issue of 'perceptual scaling': the tendency of people to underestimate areas.
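
One common way around the radius problem described in the last tip is to scale the radius by the square root of the value, so that the area (rather than the radius) is proportional to the data. The sketch below assumes that approach; the function name and parameters are hypothetical, and it does not attempt to correct for perceptual scaling.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Return a radius such that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def area(radius):
    return math.pi * radius ** 2


# Doubling the value now doubles the area instead of quadrupling it.
small = radius_for_value(50, 100, 10)
large = radius_for_value(100, 100, 10)
print(area(large) / area(small))
```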

Area chart

An area chart or area graph displays quantitative data graphically. It is based on the line chart. The area between the axis and the line is commonly emphasized with colors, textures, and hatchings. An area chart is commonly used to compare two or more quantities layered on top of one another.

Area chart comparing quantities of salt across food categories, from Next Generation Food, 15 April, 2010

Map chart

A map chart is a map that compares measured quantities across locations using color or other visualization techniques.

Wind in real time, from http://hint.fm/wind

Bar chart

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

Bar chart of deaths from smoking, from webwell.org.uk
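
To illustrate bars with lengths proportional to the values they represent, here is a minimal Python sketch that draws a horizontal bar chart out of text characters. The function name text_bar_chart and the sample categories are assumptions for illustration only.

```python
def text_bar_chart(data, width=40):
    """Render a horizontal bar chart; the largest value fills `width` characters."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value, rounded to whole characters.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical categories and counts, for illustration only.
print(text_bar_chart({"apples": 12, "pears": 30, "plums": 21}))
```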

Stacked area graph

A stacked area graph combines a line graph and an area chart: areas are stacked on top of one another to show relative measurements along a horizontal axis.

A stacked area graph showing immigration to the US, from flowingdata.com
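
The stacking itself is simple arithmetic: each layer starts where the previous one ended. The Python sketch below computes the lower and upper boundary of every layer; the function name stack_series and the sample numbers are assumptions for illustration, and all series are assumed to have the same length.

```python
def stack_series(series):
    """Return a (lower, upper) pair of boundary lists for each layer."""
    baseline = [0] * len(series[0])
    layers = []
    for values in series:
        # The top of this layer is its baseline plus its own values.
        top = [low + value for low, value in zip(baseline, values)]
        layers.append((baseline, top))
        baseline = top  # the next layer sits on top of this one
    return layers


# Two hypothetical series measured at the same three points in time.
for lower, upper in stack_series([[1, 2, 3], [4, 4, 1]]):
    print(lower, upper)
```

Note that only the bottom layer sits on a flat baseline, which is why the upper layers of a stacked area graph are harder to read precisely.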

Bubble chart

A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet of associated data is plotted as a disk that expresses two of the three values through the disk's xy location and the third through its size.

Heat map

A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.

A heat map representation of house hunting activity on Trulia.com
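
As a rough sketch of how values in a matrix become colors, the Python function below maps each value onto a white-to-red scale. The function name heat_colors, the choice of color scale, and the sample matrix are all assumptions for illustration; real tools offer many other color scales.

```python
def heat_colors(matrix):
    """Map each value to a hex color from white (minimum) to red (maximum)."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal

    def color(value):
        t = (value - low) / span       # 0.0 at the minimum, 1.0 at the maximum
        fade = round(255 * (1 - t))    # green and blue fade out as the value rises
        return f"#ff{fade:02x}{fade:02x}"

    return [[color(value) for value in row] for row in matrix]


# Hypothetical matrix of measurements, for illustration only.
for row in heat_colors([[0, 3], [7, 10]]):
    print(row)
```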

Venn diagram

A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set.

A Venn diagram showing what "it" was, according to Charles Dickens
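
For two sets, the regions of a Venn diagram correspond directly to set operations. The Python sketch below computes them; the function name venn_regions and the sample sets are assumptions for illustration.

```python
def venn_regions(a, b):
    """Return the contents of the three regions of a two-set Venn diagram."""
    return {
        "only_a": a - b,  # inside the first curve only
        "both": a & b,    # the overlap of the two curves
        "only_b": b - a,  # inside the second curve only
    }


# Hypothetical sets, for illustration only.
print(venn_regions({"cat", "dog", "fox"}, {"fox", "owl"}))
```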

Tree map

In information visualization and computing, treemapping is a method for displaying hierarchical data using nested figures, usually rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data.

Treemap of Lebanese exports, from MIT Harvard Economic Complexity Observatory
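
The tiling described above can be sketched with the simple "slice-and-dice" layout, which splits each rectangle among its sub-branches and alternates the split direction at every level. This is only one possible layout (many tools use other layouts that produce squarer rectangles), and the function names and sample tree below are assumptions for illustration.

```python
def size(node):
    """Total size of a node: a leaf is a number, a branch is a dict of children."""
    if isinstance(node, dict):
        return sum(size(child) for child in node.values())
    return node


def treemap(node, x, y, w, h, split_x=True, name="root"):
    """Return a list of (name, x, y, w, h) rectangles, one per leaf."""
    if not isinstance(node, dict):
        return [(name, x, y, w, h)]
    rects = []
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child_name, child in node.items():
        share = size(child) / total
        if split_x:  # divide the width now, the height at the next level
            rects += treemap(child, x + offset * w, y, share * w, h, False, child_name)
        else:        # divide the height now, the width at the next level
            rects += treemap(child, x, y + offset * h, w, share * h, True, child_name)
        offset += share
    return rects


# Hypothetical hierarchy: leaf "a" is as large as branch "b" (leaves "c" and "d").
for rect in treemap({"a": 2, "b": {"c": 1, "d": 1}}, 0, 0, 4, 2):
    print(rect)
```

Each leaf's rectangle ends up with an area proportional to its size, because every split hands out width or height in proportion to the sizes of the sub-branches.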

Dendrogram

A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples, sometimes on top of heat maps.

A run-of-the-mill dendrogram showing shared ancestry of bacterial species
A radial dendrogram of canine breeds, from the Institute of Canine Biology
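
To show where the tree in a dendrogram comes from, here is a minimal Python sketch of hierarchical clustering on plain numbers. It assumes single-linkage clustering (the distance between two clusters is the distance between their closest members), which is only one of several linkage rules; the function name cluster and the sample values are made up for illustration.

```python
def cluster(values):
    """Repeatedly merge the two closest clusters of numbers.

    Returns a nested tuple describing the order of merges, which is the
    tree that a dendrogram would draw. Expects at least one value.
    """
    # Each cluster is a pair: (merge tree so far, list of member values).
    clusters = [(value, [value]) for value in values]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical measurements, for illustration only.
print(cluster([1, 2, 10, 11, 30]))
```

Values that merge early (deep in the nesting) are the ones a dendrogram joins with the shortest branches.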

Mind map

A mind map is a variety of dendrogram used to visually organize information. A mind map is hierarchical and shows relationships among pieces of the whole. It is often created around a single concept, drawn as an image in the center of a blank page, to which associated representations of ideas such as images, words and parts of words are added.

Mind map of tennis concepts

Fan chart

A fan chart is yet another variety of radial dendrogram that is most often used to display genealogical data. A fan chart's regularly-spaced design relies on there being a fixed number of sub-branches to every branch.

A fan chart of genealogical data

Network diagram

A network diagram is a visual representation of network architecture. It maps out the structure of a network with a variety of different symbols and line connections. It is a useful way to share the layout of a network because the visual presentation makes it easier for users to understand how items are connected.

A network diagram

Flow diagram

A flowchart is a type of diagram that represents an algorithm, workflow or process. The flowchart shows the steps as boxes of various kinds, and their order by connecting the boxes with arrows.

A simple example of a flow chart for a game of blackjack


Popular software and languages

Microsoft Excel

  • is familiar to most people who are computer-literate
  • can do basic charts and plots
  • can apply criteria to data sets to gather aggregate statistics and insights
  • limited design flexibility
  • generally good for small data sets
  • read the official documentation

R

  • a powerful programming environment that includes serious graphics capability
  • open source
  • widely used within the scientific computing community
  • large community of support and lots of books
  • read the official documentation

D3

  • uses JavaScript and HTML5
  • open source
  • used exclusively for creating data visualizations on websites
  • widely used in commercial industry, the arts, and academia
  • the name means "Data-Driven Documents", in case you were wondering
  • large community of support and lots of books
  • read the official documentation

RGraph

  • unrelated to R
  • uses JavaScript and HTML5
  • used exclusively for creating data visualizations on websites
  • used in social sciences, business
  • not dominating any one field
  • read the official documentation

Processing

Processing.js


Links

Overviews

Visualization artists

Common types of graphical data visualizations

Physical visualizations

Tutorials

A few tutorials:

