@fluffywaffles
Created May 10, 2016 05:23
def bias_replace_missing_with_avg(data_set):
    '''
    For some reason, this is my longest function.
    It's all the partitioning and partition undoing.
    Replace 'None' values (missing values) with the average of all existing
    values for that attribute.
    '''
    notNone = lambda x: x is not None
    # Partition by attribute: pair each column with the mean of its known values.
    partitioned_by_attr = [
        (mean(filter(notNone, attr_values)), attr_values)
        for attr_values in [ getall(data_set, attr) for attr in
                             range(len(data_set[0])) ]
    ]
    # Fill each missing value with its column's mean.
    filled_in = [
        [ avg if value is None else value for value in values ]
        for (avg, values) in partitioned_by_attr
    ]
    # Undo the partitioning: turn the columns back into rows.
    return [
        [ biased_attr_values[i] for biased_attr_values in filled_in ]
        for i in range(len(data_set))
    ]
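
The function above relies on two helpers, `mean` and `getall`, that this snippet does not define. Here is a minimal runnable sketch of the same partition, fill, and transpose steps. It assumes `mean` behaves like `statistics.mean` and that `getall(data_set, attr)` returns the column at index `attr`. Both are guesses at the helpers, not the author's definitions.

```python
from statistics import mean  # assumption: stands in for the gist's own `mean`


def getall(data_set, attr):
    # Hypothetical stand-in for the gist's helper: the column at index `attr`.
    return [row[attr] for row in data_set]


def bias_replace_missing_with_avg(data_set):
    # Partition the rows into one list of values per attribute (column).
    columns = [getall(data_set, attr) for attr in range(len(data_set[0]))]
    filled_in = []
    for values in columns:
        # Average only the values that are present in this column.
        avg = mean(v for v in values if v is not None)
        filled_in.append([avg if v is None else v for v in values])
    # Undo the partitioning: rebuild rows from the filled-in columns.
    return [[column[i] for column in filled_in] for i in range(len(data_set))]


data = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
result = bias_replace_missing_with_avg(data)
print(result)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

With these stand-ins, a column made up entirely of `None` raises `statistics.StatisticsError`, because there is nothing to average. The input rows are not modified.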