Instructor notebooks: https://observablehq.com/d/fa16a9680714478d?collection=@observablehq/data-vis-course
YouTube recordings:
Simple(st?) chart: just mapping two values from dataset to see relationship (e.g. fuel economy (mpg) vs power (hp))
If not seeing continuous values, probably not the chart to use (e.g. cylinder count, though a number, is not continuous)
Bar chart can have its axis sorted arbitrarily, scatterplot has to be continuous scale
Precision doesn't matter so much for data vis
Good for quickly comparing values by category
Bad practice: starting at non-zero for bar chart value axis misrepresents data (makes it hard to make comparison between bars)
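To make the distortion concrete, a small sketch. The values 90 and 100 and the helper name apparentRatio are made up for illustration. A bar's drawn length is its value minus the axis baseline, so a truncated axis inflates the visible ratio between bars:

```javascript
// Hypothetical helper: ratio of the drawn bar lengths for values a and b,
// given the value at which the axis starts.
function apparentRatio(a, b, baseline) {
  return (b - baseline) / (a - baseline);
}

// Axis starts at 0: bars differ by about 11%, same as the data.
const honest = apparentRatio(90, 100, 0); // 100 / 90, roughly 1.11

// Axis starts at 80: the second bar is drawn twice as long as the first.
const truncated = apparentRatio(90, 100, 80); // 20 / 10 = 2

console.log(honest, truncated);
```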
Unlike bar charts, need to show whole dataset because it represents parts-to-whole
Not good for comparison between the parts (use bar chart for that)
Pie chart & bar chart are similar in that they both have two mappings, to category and to value
Generally "continuous process" that changes over time
Almost always the horizontal axis is time
When can line charts go wrong?
Rule of thumb for line chart (not based on research, but common practice): "bank to 45 degrees"
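One way to turn the rule of thumb into a number, as a sketch only: pick the chart's width/height ratio so that the median line segment is drawn at 45 degrees. This "median absolute slope" formulation is an assumption on my part, not something from the course, and the function names are hypothetical.

```javascript
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// points: array of [x, y] pairs sorted by x. Returns a width / height ratio.
// Assumes the y values are not all equal (otherwise the y range is 0).
function bankTo45(points) {
  const xs = points.map(p => p[0]);
  const ys = points.map(p => p[1]);
  const xRange = Math.max(...xs) - Math.min(...xs);
  const yRange = Math.max(...ys) - Math.min(...ys);
  const slopes = [];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - points[i - 1][0];
    const dy = points[i][1] - points[i - 1][1];
    if (dx !== 0) slopes.push(Math.abs(dy / dx));
  }
  // On screen: slope = dataSlope * (xRange / yRange) * (height / width).
  // Setting the median on-screen slope to 1 (45 degrees) and solving:
  return (median(slopes) * xRange) / yRange;
}

// A zigzag with steep segments wants a wide, short chart.
console.log(bankTo45([[0, 0], [1, 2], [2, 0], [3, 2]])); // 3
```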
Instructor answered question about "radar charts": "haven't seen a good use case for radar chart" :)
Bad practices are collected in "How to lie with statistics"
Charts are not meant to be used for high-precision, just to get a rough sense of what the numbers are
Using retinal variables to encode data: e.g. simple scatterplot, one variable would be position
Retinal variables also called visual encodings
... discussion of JavaScript, how Observable works, and Plot API ...
Marks are specified by function calls. For example dot
Plot.plot({
  marks: [
    // This is mapping cars data to the position retinal variable
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)" })
  ]
})
Plot & vega-lite (and similar libraries) share terminology (such as "marks") from common ancestry in:
Plot.plot({
  marks: [
    // This is a constant color
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "steelblue" })
  ]
})
Plot.plot({
  marks: [
    // This is another encoding: fill is mapped to the cylinders column
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ]
})
In general, it's not a good idea to add too many variables mapped to visual variables, becomes impossible to read
Distinguishing between continuous and categorical scales
Plot.plot({
  marks: [
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ],
  color: {
    // By default this will be a _continuous_ color scheme
    // (because of magic inspection of cylinders domain)
    legend: true
  }
})
Plot.plot({
  marks: [
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ],
  color: {
    legend: true,
    // This makes the categories of color scheme explicit
    // (and changes the color scheme, because no longer continuous)
    domain: [3, 4, 6, 8]
  }
})
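A library-free sketch of the difference between the two kinds of scale. The helper names linearScale and ordinalScale and the color names are hypothetical, and this is not how Plot implements its scales. A continuous scale interpolates, so in-between inputs get in-between outputs; a categorical scale is a lookup with nothing in between.

```javascript
// Continuous: interpolate between two endpoints of the domain.
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Categorical: one discrete output per listed domain value.
function ordinalScale(domain, range) {
  const lookup = new Map(domain.map((d, i) => [d, range[i % range.length]]));
  return value => lookup.get(value);
}

// Cylinders treated as continuous: mapped to a lightness between 90 and 30.
const lightness = linearScale([3, 8], [90, 30]);
// Cylinders treated as categories: each count gets its own color.
const swatch = ordinalScale([3, 4, 6, 8], ["red", "orange", "green", "blue"]);

console.log(lightness(5.5)); // 60: halfway through the domain, halfway through the range
console.log(swatch(6));      // "green"
console.log(swatch(5));      // undefined: 5 is not one of the categories
```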
Different from scatterplot: has a categorical axis (like car make) and, within each category, a continuous scale onto which the values in that category are mapped
Looking at cars dataset, can see the breakdown of different "tiers" of performance within car models: for Pontiac, a couple at the high-end, three or so in mid-high-range, then rest clustered towards the lower end
Plot's table view is really nice! Gives DB-table view but with some smart histograms/filtering/sorting baked in.
Different from dot: for any given date (x value) there can only be one (and should be exactly one) y value
Force a line chart to include 0 with Plot.ruleY([0]) mark
Important distinction between different types of data:
People tend to get this wrong (e.g., showing alphabet frequency as line chart, doesn't make sense because it's showing continuity as if there were values between individual letters)
One of the reasons radar charts are bad, changing the axes around changes the shape of the chart
Distinction is not always clear: for example, when aggregating continuous data, you might want to show large bins as categorical (years as bars)
Pop-out effect (pre-attentive vision): its importance is overstated in data vis
The thought is that a single color bar stands out and is processed before anything else
Motion is effective, as is a distinctive color (single blue bar in an otherwise-gray bar chart)
Specifically useful for presentation, when pointing people to something
In early phases of data exploration don't want that (want to avoid any kind of "bias")
Early stages of data vis (exploration) should be boring
"Don't use presets because they often have colors" (this is true of a lot of chart libraries, Plot's defaults try to be boring)
Unusual charts are memorable, take longer to process (Sankey)
Not biasing, but helping to understand the chart: e.g., adding a 1/26 line for the alphabet frequency (those bars extending above are more frequent than if they were all equally probable) or adding reference line such as average among many lines in line chart
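A sketch of the alphabet example in plain JavaScript. The sample sentence and the helper name letterFrequencies are made up for illustration. It computes each letter's share and checks it against the 1/26 reference value:

```javascript
// Share of each letter A-Z among all letters in the text.
function letterFrequencies(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text.toUpperCase()) {
    if (ch >= "A" && ch <= "Z") {
      counts.set(ch, (counts.get(ch) || 0) + 1);
      total++;
    }
  }
  return new Map([...counts].map(([letter, n]) => [letter, n / total]));
}

// The reference line: the share each letter would have if all were equally likely.
const uniform = 1 / 26;

const freqs = letterFrequencies("the quick brown fox jumps over the lazy dog");
const aboveUniform = [...freqs]
  .filter(([, share]) => share > uniform)
  .map(([letter]) => letter)
  .sort();

console.log(aboveUniform); // ["E", "H", "O", "R", "T", "U"]
```

In Plot the reference value would go into a rule mark, e.g. Plot.ruleY([1 / 26]), drawn alongside the bars.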
For comparisons between spikey data, useful to smooth the values (makes higher level pattern easier to parse)
Binning creates a histogram, shows distribution of data
If looking at a bunch of overlapping data, could use dodge, which "pushes apart" plotted data points (dots) so they stack
Sometimes affected by quantization (or reveals quantization, you can see higher aggregations around specific points, like multiples of 10)
(Same as beeswarm chart? I think, but I missed what he said there)
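A much-simplified sketch of the idea, to show why quantized values become visible. This is not Plot's dodge algorithm; the function name and the row-based placement are assumptions for illustration. Each dot goes on the lowest row where it does not overlap a dot already placed on that row.

```javascript
// xs: horizontal pixel positions of the dots; r: dot radius in pixels.
// Returns one { x, level, y } per dot, where y is the vertical offset.
function dodgeRows(xs, r) {
  const placed = [];
  for (const x of [...xs].sort((a, b) => a - b)) {
    let level = 0;
    // Move up one diameter at a time until no dot on that row overlaps in x.
    while (placed.some(p => p.level === level && Math.abs(p.x - x) < 2 * r)) {
      level++;
    }
    placed.push({ x, level, y: level * 2 * r });
  }
  return placed;
}

// Three values rounded to the same position pile up into a column;
// the lone value at 30 stays on the bottom row.
const dots = dodgeRows([10, 30, 10, 10], 3);
console.log(dots.map(d => `${d.x}@${d.level}`)); // ["10@0", "10@1", "10@2", "30@0"]
```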
Binning is similar to dodge, but slices data into ranges (bins); can control the granularity/size of bins to reduce the spikiness/noise of raw data
Can also pass in explicit threshold values instead of number of bins
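A library-free sketch of both options. binCounts and evenThresholds are hypothetical helper names, and this is not Plot's bin transform. Counting against explicit thresholds is the general case; asking for a number of bins just generates evenly spaced thresholds first.

```javascript
// thresholds: ascending bin edges. [0, 10, 20, 30] gives the bins
// [0, 10), [10, 20) and [20, 30] (the last bin includes its upper edge).
function binCounts(values, thresholds) {
  const last = thresholds.length - 1;
  const counts = new Array(last).fill(0);
  for (const v of values) {
    if (v < thresholds[0] || v > thresholds[last]) continue; // outside all bins
    // Find the first edge greater than v; v belongs to the bin before it.
    let i = thresholds.findIndex(t => t > v);
    if (i === -1) i = last; // v equals the top edge: count it in the final bin
    counts[i - 1]++;
  }
  return counts;
}

// Evenly spaced edges for a requested number of bins.
function evenThresholds(min, max, binCount) {
  return Array.from({ length: binCount + 1 }, (_, i) => min + (i * (max - min)) / binCount);
}

const values = [1, 9, 10, 15, 29, 30, 31];
console.log(binCounts(values, [0, 10, 20, 30]));           // [2, 2, 2]
console.log(binCounts(values, evenThresholds(0, 30, 1)));  // [6]: one coarse bin
```

Fewer, wider bins hide more of the noise; more, narrower bins show more of it.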
"Noise can be interesting (might point to errors in data gathering), but could just be noise"
Stacking can be a bad idea.
(Mentions bad idea to have too many categories)
Comparison can be hard with stacked bar charts (bars aren't starting at same point on axis). People have a harder time with stacked bar charts than pie charts (comparison also difficult with pie charts).
Stacked area: same issue, can read the area on the bottom pretty well, but once they're stacked it's hard to compare.
Hard to read the vertical width of each section (which is the actual value), especially at steep angles, which distort how wide the section looks
Other views (silhouette, wiggle) try to make the stack easier to interpret
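A small sketch of what stacking does to the numbers. The function name and the two made-up series are for illustration only. Each series is drawn from the running total of the series below it, so only the bottom one keeps a flat baseline.

```javascript
// series: { name: [one value per x position] }, all arrays the same length.
// Returns { name: [[y1, y2], ...] }, the lower and upper edge at each position.
function stack(series) {
  const names = Object.keys(series);
  const length = series[names[0]].length;
  const baseline = new Array(length).fill(0);
  const stacked = {};
  for (const name of names) {
    stacked[name] = series[name].map((value, i) => {
      const y1 = baseline[i];
      baseline[i] = y1 + value;
      return [y1, baseline[i]];
    });
  }
  return stacked;
}

// Series b is constant (always 3), but it sits on top of a, so its band
// moves up and down with a and no longer looks constant.
const result = stack({ a: [1, 5, 2], b: [3, 3, 3] });
console.log(result.a); // [[0, 1], [0, 5], [0, 2]]
console.log(result.b); // [[1, 4], [5, 8], [2, 5]]
```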
For continuous value, smooth by taking average over window (neighborhood of values)
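The arithmetic behind window smoothing, as a minimal sketch. The function name is hypothetical, and the choice to shrink the window at the edges is an assumption; other treatments of the edges are possible. Plot has a window transform (Plot.windowY) for this; the code below only shows the idea.

```javascript
// Centered moving average over k neighboring values (k should be odd).
// Near the edges the window shrinks to the values that exist.
function movingAverage(values, k) {
  const half = Math.floor(k / 2);
  return values.map((_, i) => {
    const start = Math.max(0, i - half);
    const end = Math.min(values.length, i + half + 1);
    let sum = 0;
    for (let j = start; j < end; j++) sum += values[j];
    return sum / (end - start);
  });
}

// The spike of 9 is spread over its neighbors.
const smoothed = movingAverage([1, 2, 9, 2, 1], 3);
console.log(smoothed[0], smoothed[1]); // 1.5 4
```

A larger k smooths more aggressively, at the cost of flattening real short-lived features along with the noise.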
(I was very tired and didn't take many notes on this one)
Facets
Grouped bar charts: give you something that is difficult in other repre
Index chart
Stock prices, e.g., can show starting from a collapsed single point (0) and then plot multiples of initial value
Shows change (not comparison of absolute value)
In Plot, you can use normalizeY for index chart
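The calculation behind an index chart, sketched in plain JavaScript. The function name and the prices are made up, and this only approximates what I understand normalizeY to do when the first value is the basis. Every series is divided by its own first value, so all series start at the same point.

```javascript
// Express each value as a multiple of the first value in the series.
function indexByFirst(values) {
  const basis = values[0];
  return values.map(v => v / basis);
}

// Two stocks with very different absolute prices become comparable:
// both start at 1, and 2 means "doubled since the start".
const cheap = indexByFirst([50, 55, 45, 100]);
const pricey = indexByFirst([2000, 2200, 1800, 2100]);

console.log(cheap);  // [1, 1.1, 0.9, 2]
console.log(pricey); // [1, 1.1, 0.9, 1.05]
```

Subtracting 1 from each result gives change relative to the start, with every series beginning at 0.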