GitHub Gists by Dejan Simic (simicd)
// useFetch.ts
import { useState, useEffect } from "react";

interface RequestProps {
  url: RequestInfo;
  init?: RequestInit;
}

type DogImageType = { message: string; status: string };

export const useFetch = ({ url, init }: RequestProps) => {
  // Response state
  const [data, setData] = useState<DogImageType>();

  useEffect(() => {
    // Define asynchronous function - the useEffect hook can't take an async callback directly
    // (the body below reconstructs the truncated preview: fetch the URL and store the parsed JSON)
    const fetchData = async () => {
      const response = await fetch(url, init);
      setData(await response.json());
    };
    fetchData();
  }, [url, init]);

  return { data };
};
// DogImage.tsx
import React, { FC, useState, useEffect } from "react";
type DogImageType = { message: string; status: string };

export const DogImage: FC = () => {
  const [data, setData] = useState<DogImageType>();
  useEffect(() => {
    // Define asynchronous function - since the useEffect hook can't handle async directly,
    // declare an inner async function and call it (dog.ceo random-image endpoint assumed here)
    const fetchData = async () =>
      setData(await (await fetch("https://dog.ceo/api/breeds/image/random")).json());
    fetchData();
  }, []);
  // ...
simicd / Dog Image API.postman_collection.json (last active August 14, 2020): Postman array iteration
{
  "info": {
    "_postman_id": "c18ab42d-2677-4ede-b043-99535f4da9f6",
    "name": "Dog Image API",
    "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
  },
  "item": [
    {
      "name": "Dog API - Loop through breeds",
      "event": [
import os
import pandas as pd
import psutil

# Measure initial memory consumption
memory_init = psutil.Process(os.getpid()).memory_info().rss >> 20
# Read csv
col_csv = pd.read_csv("penguin-dataset.csv")["Flipper Length (mm)"]
memory_post_csv = psutil.Process(os.getpid()).memory_info().rss >> 20
# Read parquet
col_parquet = pd.read_parquet("penguin-dataset.parquet", columns=["Flipper Length (mm)"])
memory_post_parquet = psutil.Process(os.getpid()).memory_info().rss >> 20
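The preview stops before the measurements are reported; a short follow-up sketch (not part of the original snippet) that prints the memory growth recorded above:

# Report memory growth in MB per format (follow-up sketch, not from the original gist)
print(f"CSV column loaded: {memory_post_csv - memory_init} MB")
print(f"Parquet column loaded: {memory_post_parquet - memory_post_csv} MB")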
# Read csv and calculate mean
%%timeit
pd.read_csv("penguin-dataset.csv")["Flipper Length (mm)"].mean()
# Read parquet and calculate mean
%%timeit
pd.read_parquet("penguin-dataset.parquet", columns=["Flipper Length (mm)"]).mean()
# Write to csv
df.to_csv("penguin-dataset.csv")

# Write to parquet
df.to_parquet("penguin-dataset.parquet")

# Write to Arrow
import pyarrow as pa
# Convert from pandas to Arrow
table = pa.Table.from_pandas(df)
# Write out to file (file name assumed to match the csv/parquet names above)
with pa.OSFile("penguin-dataset.arrow", "wb") as sink:
    with pa.RecordBatchFileWriter(sink, table.schema) as writer:
        writer.write_table(table)
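The preview does not show the corresponding Arrow read; a minimal sketch, assuming the penguin-dataset.arrow file written above, that memory-maps the file and pulls the same column back into pandas:

# Read Arrow via memory mapping (sketch; file name assumed from the write step above)
import pyarrow as pa
source = pa.memory_map("penguin-dataset.arrow", "r")
table = pa.ipc.open_file(source).read_all()
col_arrow = table.column("Flipper Length (mm)").to_pandas()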
## Read Palmer Station Penguin dataset from GitHub
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/allisonhorst/"
                 "palmerpenguins/47a3476d2147080e7ceccef4cf70105c808f2cbf/"
                 "data-raw/penguins_raw.csv")

# Increase dataset to 1m rows and reset index
df = df.sample(1_000_000, replace=True).reset_index(drop=True)

# Update sample number (0 to 999'999)
df["Sample Number"] = df.index  # column name assumed from the raw dataset
simicd / spark_tips_and_tricks.md (created February 14, 2020, forked from dusenberrymw/spark_tips_and_tricks.md): Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the expected size of the expanded data (see the PySpark sketch after this list).
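To make the last two partitioning tips concrete, here is a minimal PySpark sketch (the dataset, expansion factor, and target partition count are illustrative, not taken from the forked gist): the flatMap expansion keeps the original partition count, so its output is explicitly repartitioned before any memory-heavy step, aiming for partitions in the ~128MB range.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-after-flatmap").getOrCreate()

# Small DataFrame of indices; downstream operations inherit its 8 partitions
indices = spark.range(0, 1_000_000, numPartitions=8)

# flatMap expands every row into 100 rows, but the partition count stays at 8,
# so each partition now holds far more data than before
expanded = indices.rdd.flatMap(lambda row: [(row.id, i) for i in range(100)])

# Repartition before the memory-heavy step; 256 is an illustrative target chosen
# so that each partition lands roughly in the ~128MB range suggested above
df = expanded.repartition(256).toDF(["id", "offset"])
print(df.rdd.getNumPartitions())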