Histograms display a sample estimate of the density or mass function by plotting a bar graph of the frequency or proportion of times that a variable takes specific values (or falls within a range of values, for continuous data) within a sample
- Histograms are useful and easy to make, and apply to continuous, discrete and even unordered data
- They use a lot of ink and space to display very little information
- It's difficult to display several at the same time for comparisons
- For the islands data, it's probably preferable to consider log base 10, since the raw histogram simply says that most islands are small
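As a minimal sketch of the raw versus log base 10 comparison, the code below draws both histograms side by side. It assumes the islands data mentioned above is R's built-in `islands` dataset (landmass areas in thousands of square miles); the variable names `raw_hist` and `log_hist` are illustrative only.

```r
# Assumption: "islands" refers to R's built-in islands dataset
data(islands)

op <- par(mfrow = c(1, 2))  # two panels side by side

# Raw scale: almost all of the mass lands in the first bar
raw_hist <- hist(islands, main = "Raw areas",
                 xlab = "Area (1000 sq. miles)")

# log10 scale: the shape of the distribution becomes visible
log_hist <- hist(log10(islands), main = "log10 areas",
                 xlab = "log10(area)")

par(op)  # restore the previous plotting layout
```

`hist` returns the bin counts invisibly, so `raw_hist$counts` can be inspected to confirm how concentrated the raw data are in the first bin.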
- Stem-and-leaf plots are extremely useful for getting distribution information on the fly
- Read the text about creating them
- They display the complete data set and so waste very little ink
- Two data sets' stem and leaf plots can be shown back-to-back for comparisons
- Created by John Tukey, a leading figure in the development of statistical science and signal processing
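A quick way to try a stem-and-leaf plot in R is the base function `stem`, which prints the display as text in the console. The sketch below again assumes the built-in `islands` dataset as example data; note that base `stem` handles a single data set and does not produce back-to-back displays.

```r
# Assumption: R's built-in islands dataset is used as example data
data(islands)

# Text stem-and-leaf display of the log10 areas, printed to the console
stem(log10(islands))
```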
- Dotcharts simply display a data set, one point per dot
- Ordering of the dots and labeling of the axes can then display additional information
- Dotcharts show a complete data set and so have high data density
- May be impossible to construct or difficult to interpret for data sets with lots of points
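The following is a small sketch of a dotchart in which the ordering of the dots carries information. It assumes the built-in `islands` dataset as example data; `sorted_log_area` is an illustrative name.

```r
# Assumption: R's built-in islands dataset is used as example data
data(islands)

# Sorting the values means the vertical position of a dot shows its rank
sorted_log_area <- sort(log10(islands))

# One dot per landmass; the names of the vector become the row labels
dotchart(sorted_log_area, cex = 0.6,
         xlab = "log10(area in 1000 sq. miles)")
```

With 48 labeled points the chart is still readable; for much larger data sets the labels start to overlap, which is the limitation noted above.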
- For data sets in groups, you often want to display density information by group
- If the size of the data permits, displaying the whole data set is preferable
- Add horizontal lines to depict means, medians
- Add vertical lines to depict variation, e.g. confidence intervals or interquartile ranges
- Jitter the points to avoid overplotting (jitter)
- The InsectSprays dataset contains counts of insects by insecticide type (A,B,C,D,E,F)
- You can obtain the data set with the command data(InsectSprays); the code that follows draws a jittered dot plot by spray type
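data(InsectSprays)
attach(InsectSprays)
# Set up empty axes (type = "n") wide enough for the six spray types
plot(c(.5, 6.5), range(count), type = "n",
     xlab = "spray type (1 = A, ..., 6 = F)", ylab = "count")
sprayTypes <- unique(spray)
for (i in 1 : length(sprayTypes)) {
  # counts for the i-th spray type and the number of observations
  y <- count[spray == sprayTypes[i]]
  n <- sum(spray == sprayTypes[i])
  # jitter horizontally so identical counts do not overplot
  points(jitter(rep(i, n), amount = .1), y)
  # short horizontal bar at the group mean
  lines(i + c(.12, .28), rep(mean(y), 2), lwd = 3)
  # vertical bar: approximate 95% interval for the mean (mean +/- 1.96 SE)
  lines(rep(i + .2, 2),
        mean(y) + c(-1.96, 1.96) * sd(y) / sqrt(n)
  )
}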
- Boxplots are useful for the same sort of display as the dotchart, but in instances where displaying the whole data set is not possible
- The centerline of each box represents the median, while the box edges correspond to the lower and upper quartiles
- Whiskers extend out to a constant times the IQR beyond the box edges, or to the most extreme value if that is closer
- Sometimes potential outliers are denoted by points beyond the whiskers
- Skewness is indicated by the centerline being near one of the box edges
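As a brief sketch, the same InsectSprays data can be summarized with side-by-side boxplots using base R's `boxplot`. The object name `bp` is illustrative. In R the whisker constant is set by the `range` argument, which defaults to 1.5.

```r
data(InsectSprays)

# One box per spray type; points beyond the whiskers are drawn individually
bp <- boxplot(count ~ spray, data = InsectSprays,
              xlab = "spray", ylab = "count")

# bp$stats holds, for each group, the lower whisker, lower box edge,
# median, upper box edge and upper whisker (one column per spray type)
bp$stats
```

Comparing the third row of `bp$stats` (the medians) with the second and fourth rows gives a rough numerical check on the skewness that the plot shows visually.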