We're noticing two major things:
- Occasionally, the prediction service returns illegal NaN values. This should never happen: if there is a problem with the input data, the service should fail immediately and raise an exception rather than return bogus data. This series generates a NaN:
242.0 267.0 756.0 1101.0 1211.0 1181.0 929.0 271.0 381.0 1212.0 1277.0 1265.0 1207.0 955.0 233.0 268.0 1020.0 1049.0 1140.0 1185.0 925.0 251.0 286.0 1020.0 1187.0 1094.0 1082.0 863.0 214.0 305.0 972.0 1014.0 1046.0 1046.0 929.0 213.0 285.0 1119.0 1224.0 1140.0 1062.0 862.0 234.0 273.0 1099.0
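Whatever the root cause turns out to be, the service should validate its output before returning it. A minimal sketch of such a guard, assuming the service is Python and that `validate_forecast` is a hypothetical helper name (the real service would call it on the model's forecast before returning):

```python
import math

def validate_forecast(values):
    """Raise instead of silently returning NaN/inf predictions."""
    if not values:
        raise ValueError("empty forecast")
    for i, v in enumerate(values):
        if math.isnan(v) or math.isinf(v):
            raise ValueError(f"illegal forecast value {v!r} at index {i}")
    return values

# A well-formed forecast passes through unchanged:
validate_forecast([242.0, 267.0, 756.0])

# A NaN raises immediately instead of propagating bogus data:
try:
    validate_forecast([242.0, float("nan")])
except ValueError as exc:
    print("rejected:", exc)
```

This only treats the symptom; the series above should also be kept as a regression case so the underlying cause can be found.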
- It's still a little too slow. Previously we had discussed not trying every possible combination of ARIMA parameters and instead picking a few representative combinations to try. What is your assessment of this strategy as a tactic for speeding things up? What datasets do you need to perform this analysis and make this happen?
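The speedup from that strategy is roughly the ratio of the full grid size to the number of hand-picked candidates, since each candidate still has to be fitted and scored (e.g. by AIC). A sketch of the idea, where both the grid bounds and the representative (p, d, q) choices are hypothetical placeholders, not the service's actual values:

```python
from itertools import product

# Exhaustive grid we would otherwise search (hypothetical bounds):
full_grid = list(product(range(4), range(3), range(4)))  # 48 (p, d, q) combos

# Hand-picked representative candidates instead (hypothetical choices):
representative = [
    (0, 1, 1),  # exponential-smoothing-like model
    (1, 1, 1),  # basic differenced trend model
    (2, 1, 2),  # richer short-memory dynamics
    (1, 0, 0),  # plain AR(1) for stationary series
    (0, 1, 0),  # random-walk baseline
]

print(f"full grid: {len(full_grid)} fits; representative: {len(representative)} fits")
```

The risk is that a series whose best parameters fall outside the short list gets a noticeably worse model, which is exactly what the sample datasets below should measure.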
Let's start with six samples as the basis for the ARIMA modeling:
- the attached observations.csv below, an extraction from a single metric
- a metric consisting of only zeroes
- a metric consisting of only a constant non-zero value
- a metric consisting of a linearly increasing value
- a metric consisting of a logarithmically increasing value
- a metric consisting of a slowly but exponentially increasing value
and use the ARIMA parameters derived from those as our test cases.
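The five synthetic samples are easy to generate deterministically. A sketch, where the series length, the constant level, and the growth rates are all arbitrary placeholders chosen for illustration:

```python
import math

N = 100  # hypothetical series length

samples = {
    "zeros":       [0.0] * N,
    "constant":    [7.0] * N,                                # arbitrary non-zero level
    "linear":      [2.0 * t for t in range(N)],              # linearly increasing
    "logarithmic": [math.log(t + 1) for t in range(N)],      # logarithmically increasing
    "exponential": [math.exp(0.02 * t) for t in range(N)],   # slow exponential growth
}

for name, series in samples.items():
    print(name, [round(v, 3) for v in series[:3]], "...")
```

The degenerate cases (all zeroes, constant) matter because differencing and model fitting can misbehave on zero-variance input, which may well be related to the NaN issue above.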