


Claudio Zanettini c1au6i0

  • Baltimore, MD
ericmjl / ds-project-organization.md
Last active May 30, 2025 20:38
How to organize your Python data science project

UPDATE: I have baked the ideas in this file into a Python CLI tool called pyds-cli. You can find it here: https://github.com/ericmjl/pyds-cli

How to organize your Python data science project

Having done a number of data projects over the years, and having seen many others on GitHub, I've come to see that projects vary widely in how "readable" they are. I'd like to share some practices I have adopted in my own projects, which I hope will bring some organization to yours.

Disclaimer: I'm hoping nobody takes this to be "the definitive guide" to organizing a data project; rather, I hope you, the reader, find useful tips that you can adapt to your own projects.

Disclaimer 2: What I’m writing below is primarily geared towards Python users. Some ideas may transfer to other languages; others may not. Please feel free to remix whatever you see here!

seandavi / TCGAtranslateID.R
Last active January 8, 2024 21:12
Translate GDC file_ids to TCGA barcodes
library(GenomicDataCommons)
library(magrittr)
TCGAtranslateID = function(file_ids) {
    info = files() %>%
        GenomicDataCommons::filter( ~ file_id %in% file_ids) %>%
        GenomicDataCommons::select('cases.samples.submitter_id') %>%
        results_all()
    # The mess of code below is to extract TCGA barcodes
    # id_list will contain a list (one item for each file_id)
parmentf / GitCommitEmoji.md
Last active June 6, 2025 16:10
Git Commit message Emoji