@joshbode
Created February 7, 2013 07:01
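fix_numbers = function(x) {
  # strip out irrelevant characters (currency symbols, commas, brackets, '+')
  # and trim the leading/trailing spaces this can leave behind
  x = trimws(gsub('[^a-zA-Z0-9. ]', '', x))
  # detect and remove a trailing GST tag (case-insensitive); done before the
  # SI step so that e.g. '$150K + GST' still ends in its unit suffix
  gst_re = '\\s*GST$'
  with_gst = grepl(gst_re, x, ignore.case=TRUE)
  x = sub(gst_re, '', x, ignore.case=TRUE)
  # map SI units to multipliers; scaling after conversion (rather than
  # appending zeros to the text) keeps decimals such as '1.5M' correct
  si_map = c('k'=10^3, 'M'=10^6, 'G'=10^9)
  multiplier = rep(1, length(x))
  for (unit in names(si_map)) {
    unit_re = paste0('(?<=\\d)', unit, '$')
    has_unit = grepl(unit_re, x, ignore.case=TRUE, perl=TRUE)
    multiplier[has_unit] = si_map[[unit]]
    x = sub(unit_re, '', x, ignore.case=TRUE, perl=TRUE)
  }
  # NA-out any non-numeric remnants, convert, then apply the SI multiplier
  x[!grepl('^\\d+(\\.\\d+)?$', x)] = NA
  x = as.double(x) * multiplier
  # apply GST (10%)
  gst = 0.10
  x[with_gst] = (1 + gst) * x[with_gst]
  return(x)
}

x = c('1,234,545,655.19', '$150K', '$1M', '$324G', ') $134234', 'This is not real', '$123412 + GST')
fix_numbers(x)
# returns 1234545655.19, 150000, 1e6, 3.24e11, 134234, NA and 135753.2 (123412 plus 10% GST)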
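A quick check of the design choice in the SI-unit step: replacing a suffix with zeros as text only works for whole numbers, because a decimal amount such as '1.5M' becomes '1.5000000', which is still 1.5. The sketch below compares that textual approach with scaling after numeric conversion. `append_zeros` and `scale_suffix` are hypothetical helpers written only for this illustration, and they handle just the 'M' suffix.

```r
# Compare two ways of expanding an 'M' (million) suffix.
# append_zeros and scale_suffix are hypothetical helpers for illustration only.

append_zeros = function(s) {
  # textual approach: swap a trailing 'M' for six zeros, then convert
  as.double(sub('M$', '000000', s))
}

scale_suffix = function(s) {
  # numeric approach: strip the suffix, convert, then multiply where it was present
  has_m = grepl('M$', s)
  value = as.double(sub('M$', '', s))
  ifelse(has_m, value * 1e6, value)
}

amounts = c('1M', '1.5M', '250')
append_zeros(amounts)  # 1e6, 1.5, 250: '1.5000000' is still 1.5
scale_suffix(amounts)  # 1e6, 1.5e6, 250
```

Both agree on whole numbers; only the numeric approach scales fractional amounts correctly, which is why the multiplier is applied after `as.double`.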