Import the required packages. Convert the BOM daily maximum and minimum temperature data into pandas DataFrames, then concatenate the individual DataFrames into a single DataFrame of mean maximum and mean minimum temperatures. Data source: http://www.bom.gov.au/climate/data/index.shtml
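A minimal sketch of those steps follows; the file paths and the layout of the BOM CSVs are assumptions for illustration, not the actual downloads used here.
# Sketch only: file paths below are assumed, not the actual BOM downloads.
import pandas as pd
max_temps = pd.read_csv("./data/bom_mean_max_temp.csv")   # maximum temperature observations
min_temps = pd.read_csv("./data/bom_mean_min_temp.csv")   # minimum temperature observations
# Combine the two sets of observations side by side into a single dataframe
# (axis=0 would stack them instead, depending on the layout of the files)
temps = pd.concat([max_temps, min_temps], axis=1)
temps.head()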
from great_tables import GT, md, html, system_fonts
import pandas as pd
power_cie_prepared_tbl = pd.read_csv("./data/2023_cie_power_cons.csv")
# Create a Great Tables object
ciep_gt_tbl = GT(data=power_cie_prepared_tbl)
# Apply wider color ranges & formatting
gt_tbl = ciep_gt_tbl \
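The chained formatting calls above are truncated; a minimal sketch of how such a chain could continue is shown below, where the column name "MWh", the palette, and the header text are assumptions rather than the original project's choices.
# Sketch only: column name, palette, and header text are assumed.
gt_tbl = (
    ciep_gt_tbl
    .tab_header(
        title=md("**2023 Power Consumption**"),
        subtitle="Assumed subtitle text",
    )
    .fmt_number(columns="MWh", decimals=1)                 # assumed numeric column
    .data_color(columns="MWh", palette=["white", "red"])   # wider colour range
    .tab_source_note(md("Source: CIE power consumption data"))
    .opt_table_font(font=system_fonts("humanist"))
)
gt_tbl.show()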
# Load libraries ----
library(tidyverse)
library(tidyquant)
library(tidytext)
library(showtext)
showtext_opts(dpi = 300)
showtext_auto(enable = TRUE)
font_add_google("Fira Sans Condensed", "fira sans")
library(tidyverse)
library(scales)
# Read in data and clean (column) names
# Source: https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/downloads.html
# Below assumes you save the CSV in a folder called DATA
bp_wide <- read_csv("DATA/bp-stats-review-2022-consolidated-dataset-panel-format.csv",
                    name_repair = ~ janitor::make_clean_names(., case = "snake"))
bp_wide %>% glimpse()
# Libraries ----
library(tidyverse)
library(tidyquant)
library(timetk)
library(formattable)
library(scales)
# library(fredr)
library(gt)
In this project I have attempted to create supervised learning models to help classify employee data. The classes to predict are as follows:
- Active - the employee is still in their role
- Non-active - the employee has resigned
Because the data set was small (1,056 rows), I pre-processed it in Excel, removing one outlier and deriving new features. Some categorical features were also converted to numeric values in Excel; for example, Gender was originally "M" or "F", which I converted to 0 and 1 respectively. I also removed the employee number, as it provides no value as a feature and could compromise privacy.
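The same preprocessing could equally be done in pandas; the sketch below is illustrative only, and the file and column names (employee_data.csv, Gender, EmployeeNumber) are assumptions rather than the project's actual names.
# Sketch only: file and column names are assumed, not the project's actual data.
import pandas as pd
emp = pd.read_csv("employee_data.csv")                # assumed file name
emp["Gender"] = emp["Gender"].map({"M": 0, "F": 1})   # same M/F -> 0/1 mapping as in Excel
emp = emp.drop(columns=["EmployeeNumber"])            # assumed column name; no predictive value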
After doing some research (see References), I found that the scikit-learn library does not handle categorical (string) features correctly in Decision Trees using the above approach. When added, these features provided no increase in accuracy, so I removed them. For example, Department: some departments have a higher
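For context, one common way to feed string features such as Department to a scikit-learn decision tree is to one-hot encode them first; the sketch below illustrates that approach with assumed file and column names (employee_data.csv, Department, Status), not the project's actual pipeline.
# Sketch only: one-hot encode a string feature before fitting a decision tree.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
emp = pd.read_csv("employee_data.csv")                       # assumed file name
emp = pd.get_dummies(emp, columns=["Department"])            # one dummy column per department
X = emp.drop(columns=["Status"])                             # "Status" holds Active / Non-active (assumed)
y = emp["Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(model.score(X_test, y_test))                           # accuracy on the held-out split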
This map covers the area where I live, so it will be interesting to see what the database querying reveals.
Problems encountered with the map
I created several small samples of the data and worked with the smallest one most of the time, until I had my auditing procedures working correctly. Using the code snippet provided in Project Details, I changed the k size and created samples called sampleK10.osm, sampleK25.osm, sampleK35.osm and sampleK100.osm. Once I had sampleK100.osm working correctly, I progressively moved up in size. The problems I came across are listed below.
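The sampling approach described above follows the general pattern sketched below, which writes every k-th top-level element of the OSM file to a smaller sample file; the file names are placeholders and the original Project Details snippet may differ in detail.
# Sketch only: keep every k-th top-level element of an OSM file.
import xml.etree.ElementTree as ET
OSM_FILE = "map.osm"            # placeholder input file
SAMPLE_FILE = "sampleK100.osm"
k = 100                         # larger k -> smaller sample
def get_element(osm_file, tags=("node", "way", "relation")):
    """Yield each top-level node, way and relation element in the file."""
    context = ET.iterparse(osm_file, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag in tags:
            yield elem
            root.clear()        # free memory as the file is streamed
with open(SAMPLE_FILE, "wb") as output:
    output.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
    output.write(b"<osm>\n")
    for i, element in enumerate(get_element(OSM_FILE)):
        if i % k == 0:
            output.write(ET.tostring(element, encoding="utf-8"))
    output.write(b"</osm>")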