
Last active January 31, 2022 16:33
Important Methods/Attributes - Psych Project


  1. Helps create the flattened representation of the timeseries tensor from the raw files

DatasetAllocator.record: 1.

extract_timeseries_dataset.main: 1.


  1. Time series tensor representation of the dataset with dimensions (num_patients, num_timesteps, num_codes).
  2. Each entry of the tensor stores the sum of a patient's code counts binned into 2-week intervals.

TimeseriesDataset.fact_counts / DatasetAllocator.fact_counts / fact_counts.npy:

  1. Stores the actual count data in a flattened representation (i.e. it's a 1D array).
  2. It is created by iterating through the patients and saving the counts for each (timestep, code) observed (sorted by time)
  3. To reconstruct the counts, we would also need the fact_codes and fact_steps arrays.

TimeseriesDataset.fact_codes / DatasetAllocator.fact_codes / fact_codes.npy:

  1. Stores the fact codes corresponding to the counts in fact_counts array.
  2. It has values in [0, total_concepts] = [0, ~107k]. These values are mapped to the original code names using the ConceptDefs.concept_idx method.
  3. Shape is the same as fact_counts

TimeseriesDataset.fact_steps / DatasetAllocator.fact_steps / fact_steps.npy:

  1. Stores the time steps corresponding to the counts in fact_counts array.
  2. It has values in [-2, 4] years = [-51, 104] fortnights (0 corresponds to the fortnight of the patient's first MDD diagnosis).
  3. Shape is the same as fact_counts
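
As a rough illustration of how the three flattened arrays fit together, the sketch below scatters them back into a dense (num_patients, num_timesteps, num_codes) tensor. The notes do not say how patient boundaries are stored, so `fact_patients` (a per-fact patient index) and the `densify` helper are hypothetical names introduced only for this example.

```python
import numpy as np

# Step range taken from the notes: -51 to 104 fortnights inclusive.
MIN_STEP, MAX_STEP = -51, 104
NUM_STEPS = MAX_STEP - MIN_STEP + 1  # 156

def densify(fact_patients, fact_steps, fact_codes, fact_counts,
            num_patients, num_codes):
    """Scatter flattened facts into a dense 3D tensor.

    fact_patients is a HYPOTHETICAL array holding the patient row of each
    fact; it is an assumption, not something described in the notes.
    """
    tensor = np.zeros((num_patients, NUM_STEPS, num_codes), dtype=np.int64)
    # np.add.at accumulates correctly even if an index triple repeats.
    np.add.at(tensor,
              (fact_patients, fact_steps - MIN_STEP, fact_codes),
              fact_counts)
    return tensor

# Toy data: 2 patients, 3 codes, 4 facts.
fact_patients = np.array([0, 0, 1, 1])
fact_steps = np.array([-51, 0, 0, 104])
fact_codes = np.array([2, 0, 1, 1])
fact_counts = np.array([3, 1, 5, 2])

tensor = densify(fact_patients, fact_steps, fact_codes, fact_counts,
                 num_patients=2, num_codes=3)
print(tensor.shape)  # (2, 156, 3)
```

A dense tensor is only practical for toy sizes; with ~107k codes a sparse layout is needed for real data.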


  1. Creates a sparse 2D matrix representing all patient facts, with caching.
  2. Ideally this would be a 3D matrix of (patient, code, time), but scipy.sparse only supports 2D, so it is instead shaped (patient x time, code).
  3. Contiguous blocks of rows_per_patient = 156 rows (6 years of fortnights) represent consecutive fortnights for an individual patient.
  4. Uses the 1D arrays [fact_counts, fact_codes, fact_steps] to determine which entry to populate with which count.
  5. Internally, it (a) remaps step values from [-51, 104] to [0, 155], (b) offsets them to match the patient's block of rows in the 2D matrix, and (c) uses the offset steps as rows and the codes as cols for the sparse matrix.
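
The index arithmetic in the last step can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: `fact_patients` (a per-fact patient index) and `sparse_coordinates` are hypothetical names, since the notes do not describe how each fact is tied to a patient.

```python
import numpy as np

ROWS_PER_PATIENT = 156   # 6 years of fortnights, per the notes
MIN_STEP = -51           # earliest step value, per the notes

def sparse_coordinates(fact_patients, fact_steps, fact_codes):
    """Return (rows, cols) for a (num_patients * 156, num_codes) matrix.

    fact_patients is a HYPOTHETICAL per-fact patient index (an assumption).
    """
    remapped = fact_steps - MIN_STEP                     # [-51, 104] -> [0, 155]
    rows = fact_patients * ROWS_PER_PATIENT + remapped   # offset into the patient's block
    cols = fact_codes                                    # codes index the columns directly
    return rows, cols

fact_patients = np.array([0, 0, 1])
fact_steps = np.array([-51, 104, 0])
fact_codes = np.array([7, 2, 4])

rows, cols = sparse_coordinates(fact_patients, fact_steps, fact_codes)
print(rows.tolist())  # [0, 155, 207]
```

The resulting `(rows, cols)` pair, together with fact_counts as the data, is in the coordinate form that `scipy.sparse.coo_matrix((data, (rows, cols)), shape=...)` accepts.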


  1. Subsets the timeseries tensor to (a) only [2 years after first MDD diagnosis ~= 60 fortnights] and (b) only antidepressant indices.
  2. Returns a 3D tensor of shape [n_patients, 60, n_antidepressant_codes]
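
A minimal sketch of this subsetting on a dense toy tensor. It assumes the window starts at step 0 (index 51 after remapping [-51, 104] to [0, 155]) and includes it; the helper name and the antidepressant index values are made up for illustration.

```python
import numpy as np

STEP_ZERO_INDEX = 51   # index of step 0 once [-51, 104] is remapped to [0, 155]
WINDOW = 60            # fortnights kept after the first MDD diagnosis, per the notes

def antidepressant_window(tensor, antidepressant_idx):
    """Slice (n_patients, 156, n_codes) down to (n_patients, 60, n_antidepressant_codes)."""
    # Assumption: the window begins at step 0 and is 60 fortnights long.
    after = tensor[:, STEP_ZERO_INDEX:STEP_ZERO_INDEX + WINDOW, :]
    return after[:, :, antidepressant_idx]

rng = np.random.default_rng(0)
tensor = rng.integers(0, 3, size=(4, 156, 10))
antidepressant_idx = np.array([1, 4, 7])   # hypothetical code indices

subset = antidepressant_window(tensor, antidepressant_idx)
print(subset.shape)  # (4, 60, 3)
```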


  1. Same subset as above (60 fortnights + antidepressants) but the time window for counts is now [3 months = 6 fortnights] wide (resulting in 10 three-month time blocks in 2 years).
  2. Returns a 3D tensor of shape [n_patients, 10, n_antidepressant_codes]


  1. Same subset as above (60 fortnights + antidepressants) but the time window for counts is now [6 months = 12 fortnights] wide (resulting in 5 six-month time blocks in 2 years).
  2. Returns a 3D tensor of shape [n_patients, 5, n_antidepressant_codes]
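
Both the three-month and six-month views amount to summing consecutive fortnights along the time axis. One way to sketch that, assuming a dense array and a hypothetical `rebin` helper, is a reshape followed by a sum:

```python
import numpy as np

def rebin(two_week, fortnights_per_block):
    """Sum consecutive fortnights into wider blocks along the time axis."""
    n_patients, n_steps, n_codes = two_week.shape
    if n_steps % fortnights_per_block != 0:
        raise ValueError("time axis must divide evenly into blocks")
    n_blocks = n_steps // fortnights_per_block
    # Split the time axis into (block, fortnight-within-block), then sum the latter.
    blocks = two_week.reshape(n_patients, n_blocks, fortnights_per_block, n_codes)
    return blocks.sum(axis=2)

two_week = np.ones((3, 60, 2), dtype=np.int64)
three_month = rebin(two_week, 6)    # 10 blocks of 6 fortnights
six_month = rebin(two_week, 12)     # 5 blocks of 12 fortnights
print(three_month.shape, six_month.shape)  # (3, 10, 2) (3, 5, 2)
```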


  1. Subsets the timeseries tensor to [2 years before first MDD diagnosis = 51 fortnights] and returns total counts of codes for this period.
  2. Returns a 2D tensor of shape [n_patients, n_codes]
  3. I feel this is confusing notation


  1. Same as TimeseriesDataset.count_representation but now restricted to [6 months before first MDD diagnosis].
  2. Returns a 2D tensor of shape [n_patients, n_codes]
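
The two pre-diagnosis count matrices can be sketched as a slice over the fortnights before step 0 followed by a sum over time. The helper below is hypothetical, and the choice of 12 fortnights for "6 months" is an assumption carried over from the 6-months = 12-fortnights convention used elsewhere in these notes.

```python
import numpy as np

STEP_ZERO_INDEX = 51   # index of step 0 once [-51, 104] is remapped to [0, 155]

def counts_before(tensor, n_fortnights):
    """Total count per (patient, code) over the n_fortnights preceding step 0."""
    if not 0 < n_fortnights <= STEP_ZERO_INDEX:
        raise ValueError("window must lie within the pre-diagnosis period")
    start = STEP_ZERO_INDEX - n_fortnights
    return tensor[:, start:STEP_ZERO_INDEX, :].sum(axis=1)

tensor = np.ones((2, 156, 4), dtype=np.int64)
two_years_before = counts_before(tensor, 51)    # all 51 pre-diagnosis fortnights
six_months_before = counts_before(tensor, 12)   # ASSUMES 6 months = 12 fortnights
print(two_years_before.shape, six_months_before.shape)  # (2, 4) (2, 4)
```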


  1. Subsets the TimeseriesDataset.count_representation matrix by excluding the antidepressant codes, and appends patient demographic info to it.
  2. Returns a 2D tensor of shape [n_patients, n_codes - n_antidepressant_codes + n_dem_features]


  1. Same as TimeseriesDataset._counts_and_dems_without_antidepressants but it also appends patient age as a feature
  2. Same shape as TimeseriesDataset._counts_and_dems_without_antidepressants except that it has 1 more col
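
A rough sketch of the column bookkeeping described above: drop the antidepressant columns, append the demographic features, and optionally append age as one extra column. The function name, argument names, and toy values are assumptions made for illustration only.

```python
import numpy as np

def drop_ad_and_append(counts, antidepressant_idx, dems, ages=None):
    """Remove antidepressant columns and append demographics (and optionally age)."""
    keep = np.ones(counts.shape[1], dtype=bool)
    keep[antidepressant_idx] = False          # mask out antidepressant codes
    blocks = [counts[:, keep], dems]
    if ages is not None:
        blocks.append(ages.reshape(-1, 1))    # age becomes one extra column
    return np.concatenate(blocks, axis=1)

counts = np.arange(18).reshape(3, 6)          # 3 patients, 6 codes
antidepressant_idx = np.array([1, 4])         # hypothetical antidepressant columns
dems = np.zeros((3, 2))                       # 2 hypothetical demographic features
ages = np.array([30.0, 40.0, 50.0])

without_age = drop_ad_and_append(counts, antidepressant_idx, dems)
with_age = drop_ad_and_append(counts, antidepressant_idx, dems, ages)
print(without_age.shape, with_age.shape)  # (3, 6) (3, 7)
```

Here 6 codes - 2 antidepressant codes + 2 demographic features gives 6 columns, and the age variant has one more.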


  1. Analogous function to TimeseriesDataset._counts_dems_and_ages_without_antidepressants but uses TimeseriesDataset.count_representation_6_mos_before


  1. Analogous to TimeseriesDataset.two_week_antidepressants except:
    1. This excludes antidepressants.
    2. Sums the counts of all the included codes for each time block
  2. Returns an (expanded) 3D tensor of shape [n_patients, 60, 1]
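
The collapse over codes can be sketched as a masked sum that keeps the code axis with length 1. This assumes a dense two-week tensor over all codes; `collapse_non_ad` and the index values are hypothetical.

```python
import numpy as np

def collapse_non_ad(two_week_all, antidepressant_idx):
    """Sum all non-antidepressant codes per (patient, fortnight), keeping a size-1 code axis."""
    keep = np.ones(two_week_all.shape[2], dtype=bool)
    keep[antidepressant_idx] = False
    # keepdims=True yields the "expanded" trailing axis of length 1.
    return two_week_all[:, :, keep].sum(axis=2, keepdims=True)

two_week_all = np.ones((2, 60, 5), dtype=np.int64)
antidepressant_idx = np.array([0, 3])          # hypothetical antidepressant columns

collapsed = collapse_non_ad(two_week_all, antidepressant_idx)
print(collapsed.shape)  # (2, 60, 1)
```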


  1. Analogous to TimeseriesDataset.two_week_antidepressants except that this includes psych codes using the ConceptDefs.is_psych_code filter.
  2. Returns a 3D tensor of shape [n_patients, 60, n_psych_codes]


  1. This method defines several outcomes for each patient (all are 1D arrays of shape [n_patients]).
  2. The count data used are TimeseriesDataset.two_week_antidepressants, TimeseriesDatasetExtended.two_week_non_AD_collapsed_codes and TimeseriesDatasetExtended.two_week_psych_codes.
  3. These counts all correspond to 2 years from the index prescription. A prediction_time_window_in_fortnights variable is also defined as [3 months = 6 fortnights] (call that variable W in the discussion below).
  4. The outcomes defined are:
    1. switched_treatment: Identifies whether the AD treatment changed during the first W fortnights (i.e. 3 months). It does so by first finding the timesteps where a unique treatment was provided (some code count > 0), and returns True if there is more than one such timestep.
    2. antidepressant_afterwards: Identifies whether an AD treatment was provided to patient in [W, 2W] / [3 month - 6 month period]. Does so by summing all counts at each step and checking if any timestep with count > 0 exists.
    3. same_treatment_afterwards: Computes unique AD treatment timesteps in [0, W] and [W, 2W] windows and checks if these two match. One thing to note is that this method doesn't check if only one treatment is provided in the window.
    4. remains_in_care: If any non-AD code exists in [W, 2W] window, return True.
    5. psych_afterwards: If any psych code exists in [W, 2W] window, return True.
    6. stable_treatment: Computed as (not switched_treatment) and (same_treatment_afterwards), i.e. the patient sticks to the same treatment in the [0, 2W] window.
    7. dropped_treatment: Computed as (not antidepressant_afterwards) and (remains_in_care and (not psych_afterwards)), i.e. the patient received neither an AD nor a psych code but did receive a non-AD code in the [W, 2W] window (which means they dropped treatment but are still visiting the hospital).
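
To make the boolean logic concrete, here is a hedged sketch of the outcomes on toy arrays. It is one possible reading of the descriptions above, not the project's implementation: the windows are taken as half-open index ranges [0, W) and [W, 2W), switched_treatment is interpreted as "more than one distinct AD code seen in the first window", and same_treatment_afterwards as "the same set of AD codes appears in both windows". The function and variable names are hypothetical.

```python
import numpy as np

W = 6  # prediction_time_window_in_fortnights (3 months), per the notes

def codes_used(window):
    """Boolean (n_patients, n_codes): was each code seen at least once in the window?"""
    return (window > 0).any(axis=1)

def outcomes(ad, non_ad, psych):
    """ad, non_ad, psych: arrays shaped (n_patients, n_fortnights, n_codes)."""
    first, second = slice(0, W), slice(W, 2 * W)    # half-open windows: an ASSUMPTION
    ad_first = codes_used(ad[:, first])
    ad_second = codes_used(ad[:, second])

    switched = ad_first.sum(axis=1) > 1              # INTERPRETATION: >1 distinct AD code
    ad_after = ad_second.any(axis=1)
    same_after = (ad_first == ad_second).all(axis=1) # same set of AD codes in both windows
    remains = (non_ad[:, second] > 0).any(axis=(1, 2))
    psych_after = (psych[:, second] > 0).any(axis=(1, 2))
    return {
        "switched_treatment": switched,
        "antidepressant_afterwards": ad_after,
        "same_treatment_afterwards": same_after,
        "remains_in_care": remains,
        "psych_afterwards": psych_after,
        "stable_treatment": ~switched & same_after,
        "dropped_treatment": ~ad_after & (remains & ~psych_after),
    }

# Toy data: 3 patients, 60 fortnights, 2 AD codes, 1 collapsed non-AD column, 1 psych code.
ad = np.zeros((3, 60, 2), dtype=np.int64)
non_ad = np.zeros((3, 60, 1), dtype=np.int64)
psych = np.zeros((3, 60, 1), dtype=np.int64)

ad[0, 1, 0] = ad[0, 7, 0] = 1      # patient 0: same AD code in both windows
non_ad[0, 8, 0] = 1
ad[1, 0, 0] = ad[1, 3, 1] = 1      # patient 1: two AD codes early, none afterwards
non_ad[1, 9, 0] = 1
ad[2, 2, 1] = 1                    # patient 2: one AD code early, psych code afterwards
non_ad[2, 10, 0] = psych[2, 10, 0] = 1

result = outcomes(ad, non_ad, psych)
print(result["stable_treatment"].tolist())   # [True, False, False]
print(result["dropped_treatment"].tolist())  # [False, True, False]
```

Under this reading, a patient with no AD codes in either window would also count as same_treatment_afterwards, because two empty code sets compare equal.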
