Irene Steves isteves

@isteves
isteves / pkg_db_connection.md
Last active January 10, 2021 09:20
Managing a DB connection in an R package

In our department, there's almost always just a single database that we want to connect to. Thus, managing the connection throughout our code quickly becomes annoying and redundant:

conn <- odbc::dbConnect(odbc::odbc(), ...)

DBI::dbGetQuery(conn, statement1)
DBI::dbGetQuery(conn, statement2)
DBI::dbGetQuery(conn, statement3)
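The same pattern carries over to other stacks. Here is a minimal Python sketch, using sqlite3 and hypothetical `get_conn()` / `get_query()` helpers, of a module that caches one shared connection so callers never have to pass it around:

```python
import sqlite3

_conn = None  # module-level cache, analogous to a package-private connection object

def get_conn(path=":memory:"):
    """Return the shared connection, opening it on first use."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(path)
    return _conn

def get_query(statement):
    """Run a query against the shared connection."""
    return get_conn().execute(statement).fetchall()
```

Because `_conn` is cached at module level, every call to `get_query()` reuses the same connection, mirroring a package-private connection object in R.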
@isteves
isteves / glue_in_function.md
Created January 26, 2021 19:31
Using glue::glue() inside of another function

Using glue inside of another function

The key is passing the calling environment via the .envir argument!

test_glue <- function(cmd, e = parent.frame()) {
  crayon::red(glue::glue(cmd, .envir = e))
}

test_fxn <- function(name) {
@isteves
isteves / neo4j.md
Last active October 31, 2021 17:42
neo4j learnings

Undirected: (a)-[r]-(b)
Directed: (a)-[r]->(b)

where a and b are nodes and r is the relationship (link) between them.

In the following calls, the curly brackets hold extra parameters (JSON form):

CALL apoc.import.graphml("file://graph.graphml", {})
CALL apoc.import.graphml("file://graph.graphml", {readLabels: true})

There are properties and labels. Labels are what you see as different colors in neo4j, and they are defined in a graphml file as shown below (see ":Person"). Properties are other attributes that you can query by, such as age ("> 30 years old").
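As a purely illustrative (hypothetical) graphml fragment, a node's label typically appears in a labels attribute or data entry, while properties like name and age are ordinary data entries:

```xml
<node id="n0" labels=":Person">
  <data key="labels">:Person</data>
  <data key="name">Alice</data>
  <data key="age">42</data>
</node>
```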

@isteves
isteves / resources.md
Last active January 25, 2022 09:39
Resource collection
@isteves
isteves / pyspark_tricks.md
Last active May 25, 2022 11:40
PySpark tricks

PySpark tricks

"Exploding" aggregations

If you want to apply the same aggregation to many columns, you can write it this way to be more succinct:

cols_min = ["size", "age"]

df \
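The snippet above is truncated, so here is a pure-Python sketch of the same idea (hypothetical data; in PySpark the pattern would be built with a comprehension passed to df.agg): generate the per-column aggregations programmatically instead of writing one line per column.

```python
rows = [{"size": 3, "age": 40}, {"size": 1, "age": 25}]
cols_min = ["size", "age"]

# One comprehension covers every column, instead of one line per aggregation.
mins = {f"min_{c}": min(r[c] for r in rows) for c in cols_min}
print(mins)  # {'min_size': 1, 'min_age': 25}
```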
@isteves
isteves / tidyverse2pyspark.md
Last active February 28, 2023 13:34
tidyverse2pyspark_translation

Tidyverse to pyspark translations

Adding count of a column as a new column

df %>% add_count(some_col)

# assumes: from pyspark.sql.functions import count
#          from pyspark.sql.window import Window
df.withColumn("n", count("*").over(Window.partitionBy("some_col")))
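To make the semantics concrete, here is a pure-Python sketch (hypothetical data) of what add_count / the windowed count produces: every row gains an n column holding the size of its group.

```python
from collections import Counter

rows = [{"some_col": "a"}, {"some_col": "a"}, {"some_col": "b"}]

# Count rows per group, then attach the group size to each row, like add_count().
counts = Counter(r["some_col"] for r in rows)
with_n = [{**r, "n": counts[r["some_col"]]} for r in rows]
print(with_n)  # [{'some_col': 'a', 'n': 2}, {'some_col': 'a', 'n': 2}, {'some_col': 'b', 'n': 1}]
```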