Bar charts come in several variations, but all share a common layout: the categories of one variable are plotted on the X axis, and the Y axis quantifies the data for each category.
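As a minimal sketch of this layout, the following R code draws a basic bar chart with ggplot2. The data frame, its column names, and its values are hypothetical and serve only as an illustration.

```r
library(ggplot2)

# Hypothetical counts per category (illustrative values only)
variants <- data.frame(
  chromosome = factor(c("chr1", "chr2", "chr3", "chr4"),
                      levels = c("chr1", "chr2", "chr3", "chr4")),
  count = c(12, 7, 15, 4)
)

# geom_col() draws one bar per X category, with its height taken from y
bar_plot <- ggplot(variants, aes(x = chromosome, y = count)) +
  geom_col() +
  labs(x = "Chromosome", y = "Number of variants")
print(bar_plot)
```

The categories sit on the X axis and the bar heights are read against the Y axis, matching the description above.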
Scatter plots and line plots display data by plotting a dependent variable (the Y axis) against an independent variable (the X axis). They use the Cartesian coordinate system, but are usually restricted to a single X-Y quadrant, such as the one with positive X and positive Y values. A scatter plot can also show multiple dependent variables for a single independent variable, and it may be drawn in three dimensions with X, Y, and Z axes.
The example uses a fictitious dataset to visualize the number of single nucleotide polymorphisms (SNPs) found versus the distance from the 5' end of an exon.
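A scatter plot of this kind can be sketched in R as follows. The values are simulated rather than taken from the dataset described above, and the column names and the Poisson trend are assumptions chosen only to give the points a visible shape.

```r
library(ggplot2)

set.seed(42)  # make the simulated values reproducible

# Simulated (not real) data: SNP counts at increasing distances from the 5' end
snps <- data.frame(distance_bp = seq(0, 950, by = 50))
snps$snp_count <- rpois(nrow(snps), lambda = 2 + snps$distance_bp / 100)

# Independent variable on X, dependent variable on Y
scatter_plot <- ggplot(snps, aes(x = distance_bp, y = snp_count)) +
  geom_point() +
  labs(x = "Distance from 5' end of exon (bp)", y = "Number of SNPs")
print(scatter_plot)
```

Replacing geom_point() with geom_line() would turn the same mapping into a line plot.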
Dot plots are used to visualize how multiple categories or labels of data relate to each other. They are useful when you want a bar-chart-style visualization for numerous categories but want to present the data in a way that isn't overly cluttered. A Wilkinson dot plot can also be used within other types of graphs, such as a layer in a Circos plot. A Cleveland dot plot uses a single dot per quantity instead of multiple dots.
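# code in R
library(ggplot2)    # provides ggplot() and geom_dotplot()
library(gcookbook)  # provides the countries dataset
# Keep the 2009 records with health expenditure above 2000
countries2009 <- subset(countries, Year == 2009 & healthexp > 2000)
p <- ggplot(countries2009, aes(x = infmortality))
# Wilkinson dot plot with a rug; the Y axis breaks and title are hidden
p + geom_dotplot(binwidth = 0.25) + geom_rug() +
  scale_y_continuous(breaks = NULL) +
  theme(axis.title.y = element_blank()) + labs(x = 'Infant Mortality')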
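The example above is a Wilkinson dot plot. For comparison, a Cleveland dot plot, with one dot per category, could be sketched as below. The data frame and its values are hypothetical and are not part of the gcookbook dataset.

```r
library(ggplot2)

# Hypothetical quantity per category (illustrative values only)
quantities <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4", "S5"),
  variant_count = c(32, 81, 54, 17, 69)
)

# reorder() sorts the categories by value so the dots form a readable ranking
cleveland_plot <- ggplot(quantities,
                         aes(x = variant_count,
                             y = reorder(sample_id, variant_count))) +
  geom_point(size = 3) +
  labs(x = "Number of variants", y = "Sample")
print(cleveland_plot)
```

Sorting the categories by value is a common design choice for this plot type, because an unsorted axis makes the dots harder to compare.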
A violin plot compares multiple data distributions. The X axis identifies the datasets being compared, and the Y axis shows the range of values in each dataset. The width of each violin at a given Y value shows how densely the values are distributed there (usually an estimate of the frequency of values at that point).
Violin plots usually have a boxplot overlay that shows the median and the spread of values.
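To illustrate what the width of a violin represents, the sketch below computes a kernel density estimate with base R's density() for the WT values from the example dataset in this section. This is only an approximation of what a plotting function does internally: geom_violin() computes its own estimate, and its defaults (such as trimming to the data range) may differ.

```r
# WT values copied from the example dataset in this section
wt_msi <- c(1, 3, 6, 3, 9, 9, 1, 9, 3, 8, 0, 7, 6, 4, 3, 5, 0, 2, 2, 0)

# density() returns a kernel density estimate evaluated on a grid of values
wt_density <- density(wt_msi)

# The violin would be widest at the value where the estimated density peaks
widest_at <- wt_density$x[which.max(wt_density$y)]
print(widest_at)
```

Here wt_density$x plays the role of the Y axis of the violin plot, and wt_density$y corresponds to the half-width of the violin at each of those values.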
The example dataset shows Microsatellite Instability (MSI) for a given cell line.
Cell_Line,MSI
WT,1
WT,3
WT,6
WT,3
WT,9
WT,9
WT,1
WT,9
WT,3
WT,8
WT,0
WT,7
WT,6
WT,4
WT,3
WT,5
WT,0
WT,2
WT,2
WT,0
Mut1,18
Mut1,10
Mut1,16
Mut1,18
Mut1,15
Mut1,0
Mut1,9
Mut1,16
Mut1,9
Mut1,9
Mut1,3
Mut1,5
Mut1,10
Mut1,16
Mut1,11
Mut1,19
Mut1,1
Mut1,7
Mut1,13
Mut1,16
Mut2,28
Mut2,39
Mut2,2
Mut2,24
Mut2,9
Mut2,26
Mut2,37
Mut2,19
Mut2,37
Mut2,11
Mut2,3
Mut2,15
Mut2,0
Mut2,21
Mut2,38
Mut2,36
Mut2,18
Mut2,20
Mut2,37
Mut2,32
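# code in R
library(ggplot2)
d <- read.csv('example_violin.csv')
p <- ggplot(d, aes(x = Cell_Line, y = MSI))
# Boxplot overlay plus a white median point; fun replaces fun.y, deprecated since ggplot2 3.3.0
p + geom_violin() +
  geom_boxplot(width = 0.1, fill = "black", outlier.colour = NA) +
  stat_summary(fun = median, geom = "point", fill = "white", shape = 21, size = 2.5)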
Circos plots are used to show relationships among numerous variables. They are commonly used in bioinformatics to show variants across an organism's entire genome.
A Circos plot has different layers to show relationships and quantification. The plot should have a detailed description for each layer that states the type and scale of the data being visualized; unfortunately, this does not always happen.
The outer layer has the identities for the dataset; in the example below, this is the chromosome number. The quantification layers show the quantities for the outer identities; in the example below, this is the number of homologous regions between mouse and human. The inner relationship layer (with 'ribbons') shows how groups of identities are similar to each other; in the example below, this is the homology among regions of the human and mouse genomes.
Note that a Circos plot has at least one quantification layer, with technically no maximum, and that the inner relationship layer may be absent.
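The layer structure can be sketched in R with the circlize package. This is an assumption: circlize is one of several tools for Circos-style plots and must be installed separately. The sector names, simulated values, and link positions below are hypothetical.

```r
library(circlize)  # assumption: the circlize package is installed

set.seed(1)  # make the simulated values reproducible
sectors <- c("chr1", "chr2", "chr3")  # hypothetical identities for the outer layer

circos.clear()
circos.initialize(sectors, xlim = c(0, 10))

# Outer layer: one labelled sector per identity
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  circos.text(CELL_META$xcenter, CELL_META$ycenter, CELL_META$sector.index)
})

# Quantification layer: simulated values drawn as vertical lines
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  positions <- 1:9
  circos.lines(positions, runif(length(positions)), type = "h")
})

# Inner relationship layer: ribbons linking regions of two sectors
circos.link("chr1", c(1, 3), "chr2", c(6, 8), col = "grey60")
circos.link("chr2", c(1, 2), "chr3", c(4, 5), col = "grey30")

circos.clear()
```

Adding further circos.track() calls would add more quantification layers, and omitting the circos.link() calls gives a plot without the relationship layer.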
- R Graphics Cookbook by Winston Chang, 2013. O'Reilly Press. Chapters 3 and 5; Chapter 6, pp. 135-141.