@spolu
Last active August 29, 2015 14:06
/**********************************************************************/
/* count_: the number of occurrences of the feature (n-gram or word) */
/* all_count: training set size */
/* f_count_: the number of occurrences of `feature & female` */
/* m_count_: the number of occurrences of `feature & male` */
/* all_count_f: number of occurrences of `female` */
/* all_count_m: number of occurrences of `male` */
/**********************************************************************/
/* p(feature) */
double p_ = (double)count_ / all_count;
/* p(feature & gender) */
double p_feature_f = (double)f_count_ / all_count;
double p_feature_m = (double)m_count_ / all_count;
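/* mutual information, restricted to the two cells where the feature is */
/* present; max(..., 0.001) keeps log() finite when a joint count is 0, */
/* in which case the whole term is 0 because of the leading factor */
mi_ = p_feature_f * log(max(p_feature_f, 0.001) / (((double)all_count_f / all_count) * p_)) +
      p_feature_m * log(max(p_feature_m, 0.001) / (((double)all_count_m / all_count) * p_));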
/* p(gender|feature); assumes count_ > 0, otherwise p_ is 0 and both divisions are undefined */
p_f_c_ = p_feature_f / p_;
p_m_c_ = p_feature_m / p_;
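/**********************************************************************/
/* A minimal, self-contained sketch of the same computation, assuming */
/* the counts are plain integers. `FeatureStats`, `mi_term` and */
/* `feature_stats` are hypothetical names, not part of the original. */
/* It swaps the 0.001 clamp for the usual 0 * log(0) = 0 convention */
/* and returns zeros when the feature never occurs; that is a design */
/* choice of this sketch, not necessarily the author's intent. */
/**********************************************************************/

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the three values the snippet above produces.
struct FeatureStats {
  double mi;     // mutual information, feature-present terms only
  double p_f_c;  // p(female | feature)
  double p_m_c;  // p(male | feature)
};

// One term p(x,y) * log(p(x,y) / (p(x) * p(y))).
// Uses the convention 0 * log(0) = 0 instead of clamping p(x,y) to 0.001.
static double mi_term(double p_joint, double p_x, double p_y) {
  if (p_joint <= 0.0 || p_x <= 0.0 || p_y <= 0.0) return 0.0;
  return p_joint * std::log(p_joint / (p_x * p_y));
}

// Arguments mirror count_, f_count_, m_count_, all_count_f, all_count_m
// and all_count from the snippet above.
FeatureStats feature_stats(long count, long f_count, long m_count,
                           long all_count_f, long all_count_m,
                           long all_count) {
  FeatureStats s = {0.0, 0.0, 0.0};
  // Without any occurrence of the feature the conditionals are undefined;
  // this sketch reports zeros in that case.
  if (count <= 0 || all_count <= 0) return s;

  const double n = static_cast<double>(all_count);
  const double p = count / n;            // p(feature)
  const double p_feature_f = f_count / n;  // p(feature & female)
  const double p_feature_m = m_count / n;  // p(feature & male)

  s.mi = mi_term(p_feature_f, p, all_count_f / n) +
         mi_term(p_feature_m, p, all_count_m / n);
  s.p_f_c = p_feature_f / p;
  s.p_m_c = p_feature_m / p;
  return s;
}
```

/* Usage: feature_stats(10, 10, 0, 50, 50, 100) describes a feature seen */
/* 10 times, always with `female`, in a balanced set of 100 examples. */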