Last active November 9, 2017 17:50
Gist autocorr/b8c9698f5e0e62a61979319e382c8ac3
Create histograms of DPDFs
# Here we import the numpy module to use arrays, pandas for dataframes, and
# matplotlib for plotting. Pyplot is a sub-module of matplotlib designed to be a
# good interface for interactive plotting; matplotlib in general contains many
# sub-modules defining all sorts of behavior, from colors and fonts to how tick
# marks are rendered. We use the "as" keyword to give each module a short alias.
# Because every name stays behind its prefix (np., pd., plt.), we don't pull
# literally thousands of function names into the scope of the program, as
# "from numpy import *" would (in IPython, just type "np.<tab>" to see how many
# symbols are defined).
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# This assumes you have already put your code to load the catalog into the
# program scope as a dataframe named "df", something like
#     df = pd.read_csv('foo.csv')
# Here we just want to plot the distances of clumps that actually have
# well-resolved distances, and not include all the others where it's 50/50
# whether they are at the near or far distance. We can do this via the "query"
# method on df:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html
# https://pandas.pydata.org/pandas-docs/stable/indexing.html
# This is the bread and butter of pandas: querying and transforming data, along
# with merging different datasets.
good_df = df.query('kdar in ["N", "F"]')
# This selects the rows of the dataframe where the column "kdar" has a value that
# is contained in the list of the two strings "N" and "F", which are the flags
# for near and far distance resolutions.
# Next we create instances of the Figure and Axes classes in matplotlib, the
# essential entities: the Figure contains all the stuff, and the Axes is
# basically the white square part with the plot, ticks, and labels. A Figure
# can contain multiple Axes objects to make sub-plots (such as 2x2 for a grid of
# four plots in the same Figure). If you don't pass any arguments to subplots,
# it assumes you want a single Axes in the figure (number of plot rows = 1,
# number of plot columns = 1).
fig, ax = plt.subplots()
# A histogram requires us to count the number of values within each range. If we
# don't pass explicit bins, the hist method just divides the data's [min, max]
# range into 10 equal-width bins by default. Instead we build the bin edges with
# the numpy function linspace, which returns evenly spaced numbers over an
# interval. This assumes the distances are in pc; if they are in kpc, divide the
# bin edges by a thousand.
# Place 51 evenly spaced bin edges between 0 and 20,000 pc. linspace includes
# both endpoints by default, so 51 edges give 50 bins, each 400 pc wide (use 41
# edges for 500 pc bins). Each bin is half-open, [left, right), except the last
# one, which also includes its right edge.
bins = np.linspace(0, 2e4, 51)
# => 0, 400, 800, ..., 20000
# Now we call the hist method to draw a histogram onto the Axes (i.e. the white
# square part). The dataframe good_df exposes its columns as attributes
# accessible with the dot (attributes are attached values, methods are attached
# functions). This works as long as the column name is a valid Python identifier
# that doesn't clash with an existing DataFrame attribute; otherwise use
# good_df['dist_ml']. We also pass the bins to tell it what intervals we want to
# divide this data set into. The first is a positional argument: the hist method
# expects the first thing to be some kind of list or array of data. Other
# arguments are optional and can be referenced by name, as we're doing with
# "bins" here. I chose to use the same name for my variable as the parameter, so
# we end up with the somewhat curious looking bins=bins. You can see the full
# list of parameters/keyword arguments that you can pass by typing "ax.hist?"
# into a Jupyter notebook or IPython terminal. matplotlib also includes
# "plt.hist" to do the same thing, which implicitly draws onto the "current"
# Axes, usually the last one you created or drew to. Using the methods on ax is
# the preferred style, because you are being explicit about what you are drawing
# where (onto this instance, "ax"), not whichever was used last. This is part of
# a general principle of programming in Python, that "explicit is better than
# implicit". You can see other proverbs in the Zen of Python by typing
# "import this" :)
ax.hist(good_df.dist_ml, bins=bins)
# Save the figure to a PDF file. This assumes there is already a folder called
# "plots" in your current directory to write the new file into; savefig raises
# an error if that folder doesn't exist.
fig.savefig('plots/resolved_distances_hist.pdf')
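To see what the query step in the script keeps, here is a minimal sketch on a small made-up dataframe. The distances and the "A" flag (standing in for an unresolved near/far ambiguity) are invented for illustration and are not taken from the catalog. The sketch also builds the equivalent boolean mask with Series.isin, which selects the same rows.

```python
import pandas as pd

# Hypothetical toy catalog: "A" is an invented stand-in for an unresolved flag.
toy = pd.DataFrame({
    "kdar": ["N", "F", "A", "N", "A"],
    "dist_ml": [1200.0, 8500.0, 4000.0, 3100.0, 9900.0],
})

# Keep only rows whose "kdar" flag is "N" or "F" (same query string as the script).
resolved = toy.query('kdar in ["N", "F"]')

# The same selection written as an explicit boolean mask.
resolved_mask = toy[toy["kdar"].isin(["N", "F"])]

print(resolved)
```

Both forms keep the original row labels (0, 1 and 3 here). The query string is often easier to read, and the isin mask is easier to build from a list that lives in a variable.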
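The comments on plt.subplots mention that one Figure can hold a grid of Axes. Here is a short sketch of the 2x2 case next to the no-argument call used in the script. The plotted numbers are arbitrary, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
from matplotlib import pyplot as plt

# One Figure holding a 2x2 grid; "axes" is a 2-D numpy array of Axes objects.
fig, axes = plt.subplots(2, 2)
axes[0, 0].plot([0, 1, 2], [0, 1, 4])   # draw into the top-left panel
axes[1, 1].hist([1, 2, 2, 3, 3, 3])     # draw into the bottom-right panel

# With no arguments, subplots returns a single Axes instead of an array.
fig1, ax1 = plt.subplots()

plt.close("all")
```

Indexing into the array (axes[row, col]) keeps each drawing call explicit about its target panel, in the same spirit as calling ax.hist rather than plt.hist.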
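linspace counts both endpoints, so the number of bins is always one less than the number of edges. This sketch checks the bin widths for 51 and 41 edges. It then feeds a few hand-picked values to np.histogram to show the half-open [left, right) rule and the closed last bin. The sample values are arbitrary and were chosen to land exactly on bin edges.

```python
import numpy as np

# 51 edges -> 50 bins of width 400 (linspace includes both endpoints by default).
edges_400 = np.linspace(0, 2e4, 51)
# 41 edges -> 40 bins of width 500.
edges_500 = np.linspace(0, 2e4, 41)

# Values sitting on or just below bin edges.
sample = np.array([0.0, 399.9, 400.0, 20000.0])
counts, _ = np.histogram(sample, bins=edges_400)

# 0.0 and 399.9 fall in the first bin [0, 400); 400.0 starts the second bin;
# 20000.0 is counted because the last bin also includes its right edge.
print(counts[:2], counts[-1])
```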
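ax.hist draws the bars and also returns the counts and edges it used, so you can check the histogram numerically. Here is a minimal sketch under that assumption. It uses a hypothetical stand-in for good_df with made-up distances in pc, confirms that attribute and bracket access give the same column, and compares the counts with np.histogram over the same edges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for good_df, with invented distances in pc.
toy = pd.DataFrame({"dist_ml": [1200.0, 3100.0, 3300.0, 8500.0, 15000.0]})
bins = np.linspace(0, 2e4, 51)

fig, ax = plt.subplots()
# ax.hist returns the per-bin counts, the bin edges, and the drawn bar artists.
n, edges, patches = ax.hist(toy.dist_ml, bins=bins)

# Attribute access and bracket access return the same column.
same_column = toy.dist_ml.equals(toy["dist_ml"])

# The counts match numpy's own histogram for the same edges.
np_counts, _ = np.histogram(toy["dist_ml"], bins=bins)

plt.close(fig)
```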
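The final savefig call assumes a "plots" folder already exists. One way to drop that assumption is to create the folder first with pathlib, as in the sketch below. The file name example_hist.pdf is hypothetical, and the toy histogram only gives savefig something to write.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
from pathlib import Path
from matplotlib import pyplot as plt

# Create the output folder if needed; exist_ok=True means no error if it exists.
out_dir = Path("plots")
out_dir.mkdir(parents=True, exist_ok=True)

fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3], bins=3)

out_path = out_dir / "example_hist.pdf"  # hypothetical file name
fig.savefig(out_path)
plt.close(fig)
```

mkdir with exist_ok=True is safe to run every time the script runs, so the save step no longer depends on how the working directory was set up.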