Important Methods/Attributes - Psych Project

DatasetAllocator:

Helps create the flattened representation of the timeseries tensor from the raw files

DatasetAllocator.record: 1.

extract_timeseries_dataset.main: 1.

TimeseriesDataset:

Time series tensor representation of the dataset with dimensions (num_patients, num_timesteps, num_codes)
Each entry of the tensor stores the sum of a patient's code counts binned into 2-week intervals (fortnights)

TimeseriesDataset.fact_counts / DatasetAllocator.fact_counts / fact_counts.npy:

Stores the actual count data in a flattened representation (i.e. it's a 1D array).
It is created by iterating through the patients and saving the counts for each (timestep, code) observed (sorted by time)
To reconstruct the counts, we would also need the fact_codes and fact_steps arrays.

TimeseriesDataset.fact_codes / DatasetAllocator.fact_codes / fact_codes.npy:

Stores the fact codes corresponding to the counts in fact_counts array.
It has values in [0, total_concepts] = [0, ~107k]. These values are mapped to original code names using ConceptDefs.concept_idx method
Shape is the same as fact_counts

TimeseriesDataset.fact_steps / DatasetAllocator.fact_steps / fact_steps.npy:

Stores the time steps corresponding to the counts in fact_counts array.
It has values in [-2, 4] years = [-51,104] fortnights (0 corresponds to fortnight of first MDD diagnosis for the patient)
Shape is the same as fact_counts
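
To make the flattened layout concrete, here is a minimal sketch of rebuilding the dense (num_patients, num_timesteps, num_codes) tensor from the three 1D arrays. The notes above do not say how patient boundaries are stored, so the patient_offsets array below is a hypothetical stand-in, and the toy values are illustrative only.

```python
import numpy as np

# Toy flat arrays for 2 patients (illustrative values, not real data).
fact_counts = np.array([2, 1, 3, 1])      # count per observed (timestep, code)
fact_codes = np.array([0, 2, 1, 2])       # code index for each count
fact_steps = np.array([-51, 0, 0, 104])   # fortnight relative to first MDD diagnosis

# ASSUMPTION: hypothetical array marking where each patient's facts begin and end.
patient_offsets = np.array([0, 2, 4])

MIN_STEP, MAX_STEP = -51, 104
n_steps = MAX_STEP - MIN_STEP + 1         # 156 fortnights
n_codes = 3                               # toy value; the real dataset has ~107k
n_patients = len(patient_offsets) - 1

dense = np.zeros((n_patients, n_steps, n_codes), dtype=np.int64)
for p in range(n_patients):
    lo, hi = patient_offsets[p], patient_offsets[p + 1]
    rows = fact_steps[lo:hi] - MIN_STEP   # shift steps from [-51, 104] to [0, 155]
    cols = fact_codes[lo:hi]
    # np.add.at accumulates correctly even if a (step, code) pair repeats.
    np.add.at(dense[p], (rows, cols), fact_counts[lo:hi])
```

A dense tensor is only practical for toy sizes; with ~107k codes the sparse form described under as_csr is the realistic choice.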

TimeseriesDataset.as_csr:

Create a sparse 2D matrix representing all patient facts w/ caching.
Really should be a 3D matrix of (patient,code,time), but scipy.sparse only supports 2D, so instead it's (patient x time,code)
Contiguous blocks of {rows_per_patient = 6 years = 156 fortnights} rows represent neighboring fortnights for an individual patient
Uses the 1D arrays [fact_counts, fact_codes, fact_steps] to know which entry to populate with what
Internally, it (1) remaps step values from [-51, 104] to [0, 155], (2) offsets them by the patient's starting row in the 2D matrix, and (3) uses the offset steps and the codes as row and column indices for the sparse matrix
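
The three internal steps can be sketched as follows. This is an illustration under assumptions, not the actual as_csr code: the function name facts_to_csr and the per-fact fact_patients array (the patient index of each fact) are hypothetical, and caching is omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix

MIN_STEP = -51            # earliest fortnight relative to first MDD diagnosis
ROWS_PER_PATIENT = 156    # 6 years of fortnights

def facts_to_csr(fact_counts, fact_codes, fact_steps, fact_patients,
                 n_patients, n_codes):
    """Build a (n_patients * 156, n_codes) sparse matrix from flat fact arrays.

    fact_patients is a HYPOTHETICAL array holding the patient index of each fact.
    """
    steps = fact_steps - MIN_STEP                     # 1. remap [-51, 104] -> [0, 155]
    rows = steps + fact_patients * ROWS_PER_PATIENT   # 2. offset into the patient's block
    # 3. rows/cols index the sparse matrix; duplicate (row, col) entries are summed.
    return csr_matrix((fact_counts, (rows, fact_codes)),
                      shape=(n_patients * ROWS_PER_PATIENT, n_codes))

# Toy data for 2 patients and 3 codes.
m = facts_to_csr(
    fact_counts=np.array([2, 1, 3, 1]),
    fact_codes=np.array([0, 2, 1, 2]),
    fact_steps=np.array([-51, 0, 0, 104]),
    fact_patients=np.array([0, 0, 1, 1]),
    n_patients=2,
    n_codes=3,
)
```

For small inputs, m.toarray().reshape(n_patients, 156, n_codes) recovers the 3D view that the 2D layout stands in for.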

TimeseriesDataset.two_week_antidepressants:

Subsets the timeseries tensor to 1. only [2 years after first MDD diagnosis ~= 60 fortnights] and 2. only antidepressant indices
Returns a 3D tensor of shape [n_patients, 60, n_antidepressant_codes]

TimeseriesDataset.three_month_antidepressants:

Same subset as above (60 fortnights + antidepressants) but the time window for counts is now [3 months = 6 fortnights] wide (resulting in 10 three-month time blocks in 2 years).
Returns a 3D tensor of shape [n_patients, 10, n_antidepressant_codes]

TimeseriesDataset.six_month_antidepressants:

Same subset as above (60 fortnights + antidepressants) but the time window for counts is now [6 months = 12 fortnights] wide (resulting in 5 six-month time blocks in 2 years).
Returns a 3D tensor of shape [n_patients, 5, n_antidepressant_codes]
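
The three-month and six-month views can be derived from the two-week tensor by summing consecutive fortnights. A minimal sketch, assuming the input is a dense NumPy array of shape [n_patients, 60, n_antidepressant_codes]; the helper name rebin is hypothetical.

```python
import numpy as np

def rebin(two_week, fortnights_per_block):
    """Sum consecutive fortnights into wider time blocks (hypothetical helper)."""
    n_patients, n_steps, n_codes = two_week.shape
    if n_steps % fortnights_per_block != 0:
        raise ValueError("number of fortnights must divide evenly into blocks")
    n_blocks = n_steps // fortnights_per_block
    # Split the time axis into (block, fortnight-within-block), then sum within blocks.
    return two_week.reshape(n_patients, n_blocks, fortnights_per_block, n_codes).sum(axis=2)

# Toy tensor: every patient has one count of every code in every fortnight.
two_week = np.ones((4, 60, 3), dtype=np.int64)
three_month = rebin(two_week, 6)    # shape (4, 10, 3)
six_month = rebin(two_week, 12)     # shape (4, 5, 3)
```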

TimeseriesDataset.count_representation:

Subsets the timeseries tensor to [2 years before first MDD diagnosis = 51 fortnights] and returns total counts of codes for this period.
Returns a 2D tensor of shape [n_patients, n_codes]
I feel this is confusing notation

TimeseriesDataset.count_representation_6_mos_before:

Same as TimeseriesDataset.count_representation but now restricted to [6 months before first MDD diagnosis].
Returns a 2D tensor of shape [n_patients, n_codes]
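
Both count representations amount to summing a pre-diagnosis slice of the time axis. A minimal sketch on a dense toy tensor, assuming the pre-diagnosis window covers steps -51 through -1 (whether step 0 is included is not stated above) and that 6 months is 12 fortnights as in six_month_antidepressants; the function name is hypothetical.

```python
import numpy as np

MIN_STEP = -51  # earliest fortnight relative to first MDD diagnosis

def counts_before_diagnosis(tensor, n_fortnights_before):
    """Total code counts over the last n fortnights before diagnosis.

    tensor has shape [n_patients, 156, n_codes]; index 51 on the time axis is step 0.
    ASSUMPTION: the window ends just before step 0.
    """
    end = -MIN_STEP                      # time index of step 0
    start = end - n_fortnights_before
    return tensor[:, start:end, :].sum(axis=1)

toy = np.ones((2, 156, 3), dtype=np.int64)
two_years_before = counts_before_diagnosis(toy, 51)   # like count_representation
six_months_before = counts_before_diagnosis(toy, 12)  # like count_representation_6_mos_before
```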

TimeseriesDataset._counts_and_dems_without_antidepressants:

Subsets the TimeseriesDataset.count_representation matrix by excluding the antidepressant codes, and appends patient demographic info to it.
Returns a 2D tensor of shape [n_patients, n_codes - n_antidepressant_codes + n_dem_features]

TimeseriesDataset._counts_dems_and_ages_without_antidepressants:

Same as TimeseriesDataset._counts_and_dems_without_antidepressants but it also appends patient age as a feature
Same shape as TimeseriesDataset._counts_and_dems_without_antidepressants except that it has 1 more col

TimeseriesDataset._counts_dems_and_ages_without_antidepressants_6_mos_before:

Analogous function to TimeseriesDataset._counts_dems_and_ages_without_antidepressants but uses TimeseriesDataset.count_representation_6_mos_before
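
The feature matrices above are built by dropping the antidepressant columns and appending extra columns. A minimal sketch with toy data; the arrays antidepressant_idx, dems and ages are hypothetical stand-ins for however the project stores these values.

```python
import numpy as np

counts = np.arange(12).reshape(3, 4)         # toy [n_patients=3, n_codes=4] count matrix
antidepressant_idx = np.array([1, 3])        # HYPOTHETICAL antidepressant column indices
dems = np.array([[1, 0], [0, 1], [1, 0]])    # HYPOTHETICAL demographic features
ages = np.array([34, 51, 27])                # HYPOTHETICAL patient ages

# Boolean mask that keeps every non-antidepressant code column.
keep = np.ones(counts.shape[1], dtype=bool)
keep[antidepressant_idx] = False

# [n_patients, n_codes - n_antidepressant_codes + n_dem_features]
features = np.hstack([counts[:, keep], dems])

# Same matrix with age appended as one extra column.
features_with_age = np.hstack([features, ages[:, None]])
```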

TimeseriesDatasetExtended.two_week_non_AD_collapsed_codes:

Analogous to TimeseriesDataset.two_week_antidepressants except:
1. This excludes antidepressants.
2. Sums the counts of all the included codes for each time block
Returns an (expanded) 3D tensor of shape [n_patients, 60, 1]
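
Collapsing the non-antidepressant codes is a masked sum over the code axis that keeps a trailing dimension of size 1. A minimal sketch on a dense toy tensor; ad_idx is a hypothetical list of antidepressant code indices.

```python
import numpy as np

all_codes = np.ones((2, 60, 5), dtype=np.int64)   # toy [n_patients, 60, n_codes] tensor
ad_idx = [0, 3]                                   # HYPOTHETICAL antidepressant indices

# Mask selecting every code that is not an antidepressant.
non_ad = np.ones(all_codes.shape[2], dtype=bool)
non_ad[ad_idx] = False

# Sum the remaining codes per time block; keepdims retains the last axis.
collapsed = all_codes[:, :, non_ad].sum(axis=2, keepdims=True)
```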

TimeseriesDatasetExtended.two_week_psych_codes:

Analogous to TimeseriesDataset.two_week_antidepressants except that this includes psych codes using the ConceptDefs.is_psych_code filter.
Returns a 3D tensor of shape [n_patients, 60, n_psych_codes]

TimeseriesDatasetExtended.build_outcomes:

This method defines several outcomes for each patient (all are 1D arrays of shape [n_patients]).
The count data used is TimeseriesDataset.two_week_antidepressants, TimeseriesDatasetExtended.two_week_non_AD_collapsed_codes and TimeseriesDatasetExtended.two_week_psych_codes.
These counts all correspond to 2 years from the index prescription. Also a prediction_time_window_in_fortnights variable is defined to be [3 months = 6 fortnights] (call that variable W in the below discussion)
Outcomes defined are:
1. switched_treatment: Identifies whether the AD treatment changed during the first W fortnights (i.e. 3 months). It does so by first finding timesteps where a unique treatment was provided (some code count > 0), and returns True if there is more than 1 such timestep.
2. antidepressant_afterwards: Identifies whether an AD treatment was provided to patient in [W, 2W] / [3 month - 6 month period]. Does so by summing all counts at each step and checking if any timestep with count > 0 exists.
3. same_treatment_afterwards: Computes unique AD treatment timesteps in [0, W] and [W, 2W] windows and checks if these two match. One thing to note is that this method doesn't check if only one treatment is provided in the window.
4. remains_in_care: If any non-AD code exists in [W, 2W] window, return True.
5. psych_afterwards: If any psych code exists in [W, 2W] window, return True.
6. stable_treatment: Computed as (not switched_treatment) and (same_treatment_afterwards), i.e. the patient sticks to the same treatment in the [0, 2W] window.
7. dropped_treatment: Computed as (not antidepressant_afterwards) and (remains_in_care and (not psych_afterwards)), i.e. the patient received neither an AD nor a psych code but did receive a non-AD code in the [W, 2W] window (which means they dropped treatment but are still receiving care).
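
The outcome definitions above can be sketched with boolean array operations. This is one reading of the description, not the actual build_outcomes code: it treats a patient's "treatment" in a window as the set of AD codes with a positive count there, so switched_treatment means more than one distinct AD code in the first window. The function name and the half-open windows [0, W) and [W, 2W) are assumptions.

```python
import numpy as np

W = 6  # prediction_time_window_in_fortnights (3 months)

def build_outcomes_sketch(ad, non_ad, psych, w=W):
    """ad: [n, 60, n_ad], non_ad: [n, 60, 1], psych: [n, 60, n_psych] count tensors."""
    first = ad[:, :w, :].sum(axis=1) > 0          # AD codes seen in [0, W)
    second = ad[:, w:2 * w, :].sum(axis=1) > 0    # AD codes seen in [W, 2W)

    switched = first.sum(axis=1) > 1              # ASSUMED reading: >1 distinct AD code
    ad_after = second.any(axis=1)
    same_after = (first == second).all(axis=1)    # same set of codes in both windows
    remains = (non_ad[:, w:2 * w, :] > 0).any(axis=(1, 2))
    psych_after = (psych[:, w:2 * w, :] > 0).any(axis=(1, 2))

    return {
        "switched_treatment": switched,
        "antidepressant_afterwards": ad_after,
        "same_treatment_afterwards": same_after,
        "remains_in_care": remains,
        "psych_afterwards": psych_after,
        "stable_treatment": ~switched & same_after,
        "dropped_treatment": ~ad_after & (remains & ~psych_after),
    }

# Toy data: 3 patients, 2 AD codes, 1 psych code.
ad = np.zeros((3, 60, 2), dtype=np.int64)
non_ad = np.zeros((3, 60, 1), dtype=np.int64)
psych = np.zeros((3, 60, 1), dtype=np.int64)
ad[0, 0, 0] = ad[0, 7, 0] = 1     # patient 0: same AD in both windows
non_ad[0, 8, 0] = 1
ad[1, 1, 0] = ad[1, 3, 1] = 1     # patient 1: two different ADs early, nothing later
ad[2, 0, 1] = 1                   # patient 2: one AD early, then only non-AD care
non_ad[2, 9, 0] = 1

outcomes = build_outcomes_sketch(ad, non_ad, psych)
```

In this toy data patient 0 is stable, patient 1 switched, and patient 2 dropped treatment while remaining in care.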
