@autocorr
Last active November 9, 2017 17:50
Create histograms of DPDFs
# Here we import the numpy module to use arrays, pandas for dataframes, and
# matplotlib for plotting. Pyplot is a sub-module of matplotlib designed to be a
# good interface for interactive plotting; matplotlib in general contains many
# sub-modules defining all sorts of behavior, from colors and fonts to how
# tickmarks are rendered. We import the modules themselves and use the "as"
# keyword to give them short aliases, rather than star-importing, so we don't pull
# literally thousands of function names into the scope of the program (in IPython
# just type "np.<tab>" to see how many symbols are defined).
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# This assumes you have already loaded the catalog into a dataframe named "df" in
# the program scope, with something like
# df = pd.read_csv('foo.csv')
# where 'foo.csv' is a placeholder for your catalog file.
# Here we just want to plot the distances of clumps that actually have well-resolved
# distances, and not include all the others where it's 50/50 whether they are at
# near or far distance. We can do this via the method "query" on df:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html
# https://pandas.pydata.org/pandas-docs/stable/indexing.html
# This is the bread-and-butter of pandas, querying and transforming data, along with
# merging different datasets.
good_df = df.query('kdar in ["N", "F"]')
# This selects the rows of the dataframe where the column "kdar" has a value that
# is contained in the list of two strings "N" and "F", which are flags for
# near and far distance resolutions.
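# A minimal sketch of what this kind of query does, using a tiny made-up
# dataframe. The name "demo_df" and its values are hypothetical, not from the
# catalog, and the "U" flag just stands in for any unresolved value. The import
# is repeated so the snippet also runs on its own.
import pandas as pd
demo_df = pd.DataFrame({
    'kdar': ['N', 'F', 'U', 'N'],
    'dist_ml': [1200.0, 9800.0, 4500.0, 3100.0],
})
# Keep only the rows flagged near ("N") or far ("F").
demo_good = demo_df.query('kdar in ["N", "F"]')
# The same selection written as a boolean mask with "isin".
demo_mask = demo_df[demo_df['kdar'].isin(['N', 'F'])]
print(demo_good)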
# Next we create instances of the Figure and Axes classes in matplotlib: the Figure
# is the essential entity that contains all the stuff, and an Axes is basically the
# white square part with the plot, ticks, and labels. (Note the class is spelled
# "Axes"; an "Axis" in matplotlib is a single x- or y-axis belonging to an Axes.) A
# Figure instance can contain multiple Axes objects to make sub-plots (such as 2x2
# for a grid of four plots in the same Figure). If you don't pass any arguments to
# subplots, it's implicit that you only want one Axes object in the figure
# (number of plot rows = 1, number of plot columns = 1).
fig, ax = plt.subplots()
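# As a sketch of the multi-panel case (the names "demo_fig" and "demo_axes" are
# hypothetical), passing nrows and ncols returns a 2D numpy array of plotting
# panels that you index by [row, column]. The import is repeated so the snippet
# also runs on its own.
from matplotlib import pyplot as plt
demo_fig, demo_axes = plt.subplots(nrows=2, ncols=2)
demo_axes[0, 0].set_title('top left')
demo_axes[1, 1].set_title('bottom right')
# Close the demo figure so it doesn't linger alongside the real one.
plt.close(demo_fig)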
# A histogram requires us to count the number of values within a set of ranges
# (bins). If we don't pass explicit bins, the histogram method will just divide the
# data [min,max] range into a default number of equal-width bins (10, unless the
# matplotlib default has been changed). We can use the numpy function for a linear
# sequence of N evenly spaced points, linspace, to make the bin edges ourselves.
# This assumes the distances are in pc; if they are in kpc, divide the bin edges
# by a thousand.
# Divide the range of numbers between 0 and 20,000 into 50 intervals of width 400.
# We ask for 51 points because N bins need N+1 edges, and linspace includes both
# endpoints by default (unlike Python's range, which is inclusive-left,
# exclusive-right [l,r)).
bins = np.linspace(0, 2e4, 51)
# => 0, 400, 800, ..., 20000
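# A quick check of how linspace spaces the edges, as a sketch (the "demo_" names
# are hypothetical). 51 points from 0 to 20,000 are 400 apart; if you would rather
# have bins 500 wide over the same range, ask for 41 points instead.
import numpy as np
demo_edges = np.linspace(0, 2e4, 51)
demo_widths = np.diff(demo_edges)      # spacing between consecutive edges
demo_edges_500 = np.linspace(0, 2e4, 41)
print(len(demo_edges), demo_widths[0], np.diff(demo_edges_500)[0])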
# Now we call the histogram method in order to draw a histogram plot onto the Axes
# (i.e. the white square part). The dataframe good_df exposes its columns as
# attributes accessible with the dot, as long as the column name is a valid Python
# identifier (attributes are attached values, methods are attached functions);
# good_df['dist_ml'] always works too. We also pass the bins to tell it what
# intervals we want to divide this data set into. The first is a positional
# argument: the histogram method expects the first thing to be some kind of a
# list of data. Other arguments are optional and can be referenced by name, as
# we're doing with "bins" here. I chose to use the same name for my variable as
# the parameter, so we end up with the somewhat curious looking bins=bins. You can
# see the full list of parameters/keyword-arguments that you can pass by typing
# "ax.hist?" into a jupyter notebook or ipython terminal. matplotlib also includes
# "plt.hist" to do the same thing, where it will implicitly figure out which Axes
# is the current one (normally the last one you created or drew to) and use that.
# Using the methods on ax is the preferred style, because you are being explicit
# about what you are drawing where (onto this instance, "ax"), not whichever was
# used last. This is part of a general principle of programming in Python, that
# "explicit is better than implicit". You can see other proverbs in the Zen of
# Python by typing "import this" :)
ax.hist(good_df.dist_ml, bins=bins)
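# To see the counting that a histogram does without drawing anything, here is a
# sketch using numpy's own histogram function on a few made-up distances (the
# "demo_" names and values are hypothetical). Each bin is inclusive-left,
# exclusive-right, except the last bin, which also includes its right edge.
import numpy as np
demo_dist = np.array([100.0, 400.0, 450.0, 799.0, 1200.0])
demo_counts, demo_bin_edges = np.histogram(demo_dist, bins=[0, 400, 800, 1200])
print(demo_counts)
# => [1 3 1]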
# Save the figure to a PDF file. This assumes you have a folder called "plots" in
# your current directory to write the new file into; savefig will not create it.
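# One way to make sure the "plots" folder exists before saving, as an optional
# sketch (skip it if you prefer to create the folder by hand). With exist_ok=True
# this does nothing if the folder is already there.
import os
os.makedirs('plots', exist_ok=True)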
fig.savefig('plots/resolved_distances_hist.pdf')