mahynski / research_thermodynamic_extrapolation.md

Last active November 4, 2024 07:14

Using Thermodynamic Extrapolation to Accurately Predict Fluid Properties #research

Data Science
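This is a rough sketch of the core idea only, not necessarily the method used in the write-up. First-order temperature extrapolation estimates a canonical-ensemble average <A> at a new inverse temperature beta from samples collected at beta0. It uses the fluctuation identity d<A>/dbeta = -(<AU> - <A><U>), where U is the potential energy. The sketch assumes A has no explicit beta dependence. Both function names are hypothetical. The exact reweighting of the same samples is included only as a comparison.

```python
from math import exp
from statistics import fmean


def extrapolate_first_order(a_samples, u_samples, beta0, beta):
    """First-order Taylor extrapolation of <A> in inverse temperature.

    a_samples, u_samples: observable A and potential energy U recorded from
    the same configurations sampled at beta0. Assumes A has no explicit
    dependence on beta.
    """
    a_mean = fmean(a_samples)
    u_mean = fmean(u_samples)
    au_mean = fmean(a * u for a, u in zip(a_samples, u_samples))
    # d<A>/dbeta = -(<AU> - <A><U>), i.e. minus the A-U covariance
    derivative = -(au_mean - a_mean * u_mean)
    return a_mean + derivative * (beta - beta0)


def reweight(a_samples, u_samples, beta0, beta):
    """Exact reweighting of the same finite sample set to beta (for comparison)."""
    weights = [exp(-(beta - beta0) * u) for u in u_samples]
    return sum(w * a for w, a in zip(weights, a_samples)) / sum(weights)
```

For small changes in beta, the two estimates agree to first order. Higher-order extrapolations would add terms built from higher cumulants of U.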

mahynski / examples_bokeh_periodic_table.ipynb

Last active October 5, 2023 17:11

Create an interactive periodic table using Bokeh


mahynski / notes_conda_colab.md

Last active October 16, 2023 16:06

Colab vs. Conda #notes

tl;dr

Google Colab is essentially a web-based Jupyter notebook. It links to GitHub and/or a Google Drive account, so you can save your work in the cloud and run it on Google's servers. It also gives you free access to GPU and TPU resources. Google provides a number of introductions to Colab, but if you are already familiar with Python and Jupyter (and Markdown), you are ready to go.

One purpose of this guide is to be a quick-start (under 5 minutes) for writing and running (Python) code, doing data science, machine learning, etc. on the go or from whatever remote location you please.

As the name suggests, Colab also lets you collaborate with others, just as you would on any other Google document.

In contrast, if you don't work remotely (i.e., away from your own computing resources), you may prefer to configure things locally. Working locally also has many other benefits, and is still…
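When the same notebook has to run both on Colab and in a local conda environment, it helps to check where it is running. The sketch below is a minimal version built on two assumptions. It relies on the common idiom that a Colab runtime has already imported its `google.colab` package. It also relies on conda setting `CONDA_DEFAULT_ENV` when an environment is activated. The helper names and the `./data` fallback path are hypothetical. The Drive mount point follows Colab's usual layout.

```python
import os
import sys


def running_in_colab() -> bool:
    """True inside a Colab runtime, which pre-imports `google.colab`."""
    return "google.colab" in sys.modules


def conda_env_name():
    """Name of the activated conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def data_dir() -> str:
    """Hypothetical data location: Google Drive on Colab, a local folder otherwise."""
    if running_in_colab():
        from google.colab import drive  # only importable on Colab
        drive.mount("/content/drive")
        return "/content/drive/MyDrive"
    return "./data"
```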

mahynski / notes_jupyter_notebooks.md

Last active October 16, 2023 16:07

Best Practices for Jupyter Notebooks #notes

tl;dr

This is an opinionated guide to using Jupyter notebooks, and it is certainly not the first; Google even has a manifesto on the subject. Tastes will clearly differ depending on the application. I generally use Jupyter for Python prototyping when developing scientific code and workflows. This means that functions and classes are usually spun up in a notebook first. Once they are basically functional, they are converted to scripts or built into existing libraries for proper version control, unit testing, and issue tracking. There is ipytest for testing inside the notebook, but I recommend moving tests out to a repo (using pre-commit, for example) so they can be tracked with git. The notebook is therefore primarily for (1) initial testing and sandbox-level development, or (2) scripting these together…
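The sketch below is a minimal version of the "move it out of the notebook" step. It assumes a hypothetical module (e.g. `mylib/stats.py`) holding a function that was first prototyped in a notebook cell. The test uses the standard-library `unittest` rather than ipytest, so it can run from the command line or, for example, from a pre-commit hook. The notebook then just imports `normalize`. In a real repo the test class would normally live in its own file, such as a hypothetical `tests/test_stats.py`, and be run with `python -m unittest`. Both are collapsed into one block here.

```python
import unittest


def normalize(values):
    """Scale a sequence of numbers so that it sums to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_zero_total_raises(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])
```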

mahynski / notes_hyperopt_sklearn.md

Last active October 16, 2023 16:07

Using hyperopt-sklearn #notes

tl;dr

hyperopt-sklearn is a Python package that wraps hyperopt and extends its functionality to scikit-learn, enabling automatic hyperparameter tuning and other pipeline optimization. It was originally presented in Komer B., Bergstra J., and Eliasmith C. "Hyperopt-Sklearn: automatic hyperparameter configuration for Scikit-learn," Proc. SciPy 2014. It is an AutoML framework, though not quite as powerful as others like TPOT or auto-sklearn. However, it can be more transparent, and it is easier to extract optimized models from it. The primary advantage that I find with this tool is its simplicity and how ea…

mahynski / notes_explainer_dashboard.md

Last active October 16, 2023 16:07

Configuring ExplainerDashboard for SHAP #notes

tl;dr

Modern computational software and hardware have made it relatively easy to process data and train machine learning models. However, putting these models into practice requires the trust of end users, which means explainable AI methods, such as SHAP, need to be leveraged. Another common problem in scientific research is that the data science/analysis is often performed by one individual (or a team). Meanwhile, the audience or collaborators have more detailed scientific knowledge of the problem and less expertise in data science. An interactive visualization tool is needed to collaborate with them and/or present the results to a more general audience. Such a tool enables scientists and engineers to explore a model, perform "what-if" analyses to test its limits, and extract scientific knowledge. Several dashboard tools have been developed to that end. This tutorial focuses on setting up ExplainerDashboard…
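SHAP attributions are Shapley values from cooperative game theory. The sketch below is a brute-force illustration of that idea. It does not use the `shap` or ExplainerDashboard APIs. It computes exact Shapley values for a tiny model by replacing "absent" features with a single baseline point. This is an assumption, since SHAP's explainers typically average over a background dataset instead. `shapley_values` is a hypothetical helper, and its cost grows as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. Cost grows as
    2**len(x), so this is only for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    # Toy model with an interaction term
    return 3.0 * z[0] + z[0] * z[1]


print(shapley_values(model, [1.0, 2.0], [0.0, 0.0]))  # [4.0, 1.0]
```

The attributions always sum to model(x) - model(baseline), here 5.0. Changing one input and recomputing is a crude version of the "what-if" exploration that a dashboard automates.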

mahynski / notes_jax_numba.md

Last active January 3, 2025 02:15

Using Numba and JAX to Accelerate Python Code #notes

tl;dr

Numba is a "just-in-time" (JIT) compiler for Python. It is designed to optimize floating-point operations and loops, which makes it ideal for scientific code. The simplest way to use it is to decorate your function as shown below, which instructs Numba to compile the function into fast machine code the first time it is called.

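```python
from numba import jit

@jit(nopython=True)  # compile on first call; no fallback to Python object mode
def dotproduct(v1, v2):
    isum = 0.0
    for i in range(len(v1)):  # explicit loops are fine: Numba compiles them
        isum += v1[i] * v2[i]
    return isum
```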
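The tl;dr also names JAX. Assuming `jax` is installed, the sketch below compiles the same dot product with `jax.jit`. Unlike the Numba version, it uses a vectorized `jax.numpy` call. The reason is that `jax.jit` traces the function, so a Python loop over array elements would be unrolled during tracing. JAX also uses 32-bit floats by default.

```python
import jax
import jax.numpy as jnp


@jax.jit  # traced and compiled via XLA on first call for each input shape/dtype
def dotproduct(v1, v2):
    # Vectorized jax.numpy ops compile well; avoid element-wise Python loops
    return jnp.dot(v1, v2)


v = jnp.arange(3.0)  # [0., 1., 2.], float32 by default
result = dotproduct(v, v)
print(float(result))  # 5.0
```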

Nathan A. Mahynski (mahynski)
