@lokeshh
Created July 13, 2016 01:36
> try = Daru::DataFrame.from_csv 'try.csv'
> Statsample::GLM.compute try, 'y', :logistic
ExceptionForMatrix::ErrNotRegular: Not Regular Matrix
from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/backports-3.6.8/lib/backports/1.9.2/stdlib/matrix.rb:933:in `block in inverse_from'
c_yes b y
0 62.1 0
1 74.7 1
0 69.7 1
0 71 0
0 56.9 1
0 58.7 0
0 63.3 0
1 70.4 1
0 70.5 1
0 59.2 0
0 76.4 0
0 71.7 0
1 57.5 1
0 61.1 1

agisga commented Jul 13, 2016

I think that's some weird problem with statsample-glm...

For example if I do

> try = Daru::DataFrame.from_csv 'try.csv'
> try["c_yes"][0] = 1.0    # change just one value!
> mod = Statsample::GLM.compute try, 'y', :logistic
> mod.coefficients    # same coefficient estimates as computed in R!
=> 
#<Daru::Vector:47381151439340 @name = nil @metadata = {} @size = 2 >
                                      nil
                   0   1.5085866237794123
                   1 -0.00618870762797073

then everything works fine...
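
One possible reason the single-value change matters: in the original rows, every observation with c_yes = 1 also has y = 1, and setting the first c_yes to 1 adds a row with c_yes = 1 and y = 0. Here is a minimal sketch in plain Ruby (no Daru or statsample-glm involved) that checks this. The helper names `outcome_counts` and `single_outcome_levels` are made up for illustration, and the rows are copied by hand from try.csv.

```ruby
# Rows of try.csv as [c_yes, b, y], copied from the gist.
ROWS = [
  [0, 62.1, 0], [1, 74.7, 1], [0, 69.7, 1], [0, 71.0, 0],
  [0, 56.9, 1], [0, 58.7, 0], [0, 63.3, 0], [1, 70.4, 1],
  [0, 70.5, 1], [0, 59.2, 0], [0, 76.4, 0], [0, 71.7, 0],
  [1, 57.5, 1], [0, 61.1, 1]
].freeze

# Hypothetical helper: for each level of a predictor column,
# count how often each outcome value occurs.
def outcome_counts(rows, predictor_index, outcome_index)
  counts = Hash.new { |hash, level| hash[level] = Hash.new(0) }
  rows.each { |row| counts[row[predictor_index]][row[outcome_index]] += 1 }
  counts
end

# Hypothetical helper: predictor levels for which only one outcome value occurs.
def single_outcome_levels(rows, predictor_index, outcome_index)
  outcome_counts(rows, predictor_index, outcome_index)
    .select { |_level, tally| tally.size == 1 }
    .keys
end

# Original data: which levels of c_yes (column 0) see only one value of y (column 2)?
puts single_outcome_levels(ROWS, 0, 2).inspect

# Same data with the first c_yes set to 1, as in the console session above.
patched = ROWS.map(&:dup)
patched[0][0] = 1
puts single_outcome_levels(patched, 0, 2).inspect
```

If the first call reports a level and the second reports none, the edit has removed the one-sided group, which would be consistent with the fit succeeding afterwards. This is only a check on the data; it says nothing about what statsample-glm does internally.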

By the way, R does not have any problem with these data:

dat <- read.csv("try.csv")
mod <- glm(y ~ 0 + c_yes + b, dat, family = "binomial")
summary(mod)
# 
# Call:
# glm(formula = y ~ 0 + c_yes + b, family = "binomial", data = dat)
# 
# Deviance Residuals: 
#     Min       1Q   Median       3Q      Max  
# -0.9668  -0.9448  -0.4537   1.0475   1.4503  
# 
# Coefficients:
#         Estimate Std. Error z value Pr(>|z|)
# c_yes  1.916e+01  3.762e+03   0.005    0.996
# b     -8.822e-03  9.566e-03  -0.922    0.356
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 19.408  on 14  degrees of freedom
# Residual deviance: 14.361  on 12  degrees of freedom
# AIC: 18.361
# 
# Number of Fisher Scoring iterations: 17
# 

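However, notice the huge p-value of 0.996 for c_yes in the R output, together with its enormous standard error (3.762e+03) and the 17 Fisher scoring iterations. In the data, y is 1 in every row where c_yes is 1, so c_yes (quasi-)separates the outcome and its coefficient estimate can grow without bound; R simply stops at a very large value. The p-value of 0.356 for b, by contrast, only says that there is little evidence of a relationship between b and y. The separation might be related to the problem that statsample-glm has...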
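
To illustrate why the c_yes estimate can end up so large, here is a rough sketch in plain Ruby that evaluates the logistic log-likelihood of the no-intercept model for a few values of the c_yes coefficient. It holds the b coefficient fixed at the value R reported. The function name `log_likelihood` and the chosen grid of coefficients are assumptions made for illustration; this is not how statsample-glm or R compute the fit.

```ruby
# Rows of try.csv as [c_yes, b, y], copied from the gist.
ROWS = [
  [0, 62.1, 0], [1, 74.7, 1], [0, 69.7, 1], [0, 71.0, 0],
  [0, 56.9, 1], [0, 58.7, 0], [0, 63.3, 0], [1, 70.4, 1],
  [0, 70.5, 1], [0, 59.2, 0], [0, 76.4, 0], [0, 71.7, 0],
  [1, 57.5, 1], [0, 61.1, 1]
].freeze

# Coefficient of b taken from the R summary above (assumed fixed here).
BETA_B = -8.822e-03

# Hypothetical helper: log-likelihood of the model y ~ 0 + c_yes + b
# for given coefficients, using log p(y) = -log(1 + exp(-s * eta)),
# where s is +1 for y = 1 and -1 for y = 0.
def log_likelihood(rows, beta_c, beta_b)
  rows.inject(0.0) do |total, (c_yes, b, y)|
    eta = beta_c * c_yes + beta_b * b
    sign = y == 1 ? 1.0 : -1.0
    total - Math.log(1.0 + Math.exp(-sign * eta))
  end
end

# Evaluate on an arbitrary grid ending at R's reported estimate for c_yes.
[0.0, 5.0, 10.0, 19.16].each do |beta_c|
  ll = log_likelihood(ROWS, beta_c, BETA_B)
  puts format('beta_c = %5.2f  log-likelihood = %.6f', beta_c, ll)
end
```

If the printed log-likelihood keeps creeping upward as beta_c grows, there is no finite maximum in that direction, and an iterative solver may run into a nearly singular matrix along the way. Whether that is what actually triggers `ErrNotRegular` in statsample-glm is a guess.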
