How to identify recurring patterns in this set of transactional data
I'm working on a dataset of banking transactions and would like to find recurring transactions.
I've been mapping the transactions for each merchant to a time series and tried using `acf` from `statsmodels.tsa.stattools` to compute the autocorrelation function, but I'm not getting the expected results:
`r = acf(ts, fft=False)`
For example, this set of transactions (ASSURANCE DESJ) gets an ACF score of 0.3159 even though it is obviously a recurring transaction (same amount, same frequency).
[![enter image description here][1]][1]
Another example of recurring transactions with acf=0.22775:
[![enter image description here][2]][2]
But this one should not be detected as a recurring transaction, yet it gets a score not far from the previous set (0.26919):
[![enter image description here][3]][3]
I've tried many different methods. I actually came up with a combination of autocorrelation on the regular time series, autocorrelation on a time series with amount=1, stationarity checks, and other rules, but the results are still far from perfect. I've also looked at ARIMA and other methods without luck.
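To make the two series concrete, here is a minimal sketch of how the regular series and the amount=1 series could be built. It assumes a daily grid with 0 on days without a transaction, which the question does not state; the helper name `to_daily` and the three-transaction sample are hypothetical.

```python
from datetime import date, timedelta


def to_daily(transactions, as_indicator=False):
    """Expand {ISO date: amount} into one value per calendar day.

    Days without a transaction get 0 (an assumption). With
    as_indicator=True every transaction day gets 1 instead of its
    amount, which gives the "amount=1" series.
    """
    by_day = {date.fromisoformat(d): amt for d, amt in transactions.items()}
    start, end = min(by_day), max(by_day)
    n_days = (end - start).days + 1
    series = []
    for offset in range(n_days):
        amount = by_day.get(start + timedelta(days=offset))
        if amount is None:
            series.append(0.0)
        else:
            series.append(1.0 if as_indicator else amount)
    return series


# Hypothetical merchant with three transactions, for illustration only.
sample = {"2020-01-01": 50.0, "2020-01-04": 75.0, "2020-01-07": 50.0}
amounts = to_daily(sample)
indicator = to_daily(sample, as_indicator=True)
print(amounts)    # [50.0, 0.0, 0.0, 75.0, 0.0, 0.0, 50.0]
print(indicator)  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

The indicator series removes the influence of the amounts, so any autocorrelation computed on it reflects timing only.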
**Would you have a better way to detect recurring transactions from time series?**
Dataset A ('ASSURANCE DESJ. ASS. GEN.'):
```
{
'2019-07-15': 9831.0,
'2019-08-15': 9818.0,
'2019-09-16': 9818.0,
'2019-10-15': 9818.0,
'2019-11-15': 9818.0,
'2019-12-16': 9818.0,
'2020-01-15': 9818.0,
'2020-02-17': 9818.0,
'2020-03-16': 9818.0
}
```
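For reference, here is a sketch of how an autocorrelation score can be computed for Dataset A without `statsmodels`. It rests on two assumptions that the question does not confirm: `ts` is a daily series with 0 on days without a transaction, and the score is the largest value of the usual biased sample autocorrelation over lags 1 to 40. The helper names `daily_series` and `sample_acf` are hypothetical.

```python
from datetime import date

dataset_a = {
    "2019-07-15": 9831.0, "2019-08-15": 9818.0, "2019-09-16": 9818.0,
    "2019-10-15": 9818.0, "2019-11-15": 9818.0, "2019-12-16": 9818.0,
    "2020-01-15": 9818.0, "2020-02-17": 9818.0, "2020-03-16": 9818.0,
}


def daily_series(transactions):
    """Zero-filled daily series from first to last transaction (assumed layout)."""
    days = {date.fromisoformat(d): amt for d, amt in transactions.items()}
    start = min(days)
    series = [0.0] * ((max(days) - start).days + 1)
    for day, amount in days.items():
        series[(day - start).days] = amount
    return series


def sample_acf(x, nlags):
    """Biased sample autocorrelation for lags 0..nlags.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(nlags + 1)
    ]


ts = daily_series(dataset_a)
r = sample_acf(ts, nlags=40)
best_lag = max(range(1, len(r)), key=lambda k: r[k])
print(best_lag, round(r[best_lag], 4))
```

Under these assumptions the peak is at lag 31 and its value comes out close to the 0.3159 quoted above. The gaps between payments range from 28 to 33 days, so only three of the eight consecutive pairs are exactly 31 days apart, which would explain why a visibly monthly payment still gets a low score at any single lag.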
Dataset B ('STATI NEMENT VILLE MTL'):
```
{
'2018-12-10': 447.0,
'2019-02-11': 107.0,
'2019-02-25': 82.0,
'2019-03-12': 418.0,
'2019-03-28': 142.0,
'2019-04-01': 167.0,
'2019-04-04': 261.0,
'2019-04-17': 127.0,
'2019-04-22': 223.0,
'2019-04-29': 326.0,
'2019-05-14': 657.0,
'2019-06-20': 332.0,
'2019-07-02': 332.0,
'2019-07-17': 332.0,
'2019-07-29': 69.0,
'2019-09-09': 277.0,
'2019-12-12': 332.0,
'2019-12-31': 169.0,
'2020-01-19': 169.0,
'2020-02-21': 657.0,
'2020-02-28': 657.0,
'2020-02-29': 537.0,
'2020-03-06': 575.0
}
```
Dataset C ('STATI NEMENT VILLE MTL'):
```
{
'2018-12-10': 447.0,
'2019-02-11': 107.0,
'2019-02-25': 82.0,
'2019-03-12': 418.0,
'2019-03-28': 142.0,
'2019-04-01': 167.0,
'2019-04-04': 261.0,
'2019-04-17': 127.0,
'2019-04-22': 223.0,
'2019-04-29': 326.0,
'2019-05-14': 657.0,
'2019-06-20': 332.0,
'2019-07-02': 332.0,
'2019-07-17': 332.0,
'2019-07-29': 69.0,
'2019-09-09': 277.0,
'2019-12-12': 332.0,
'2019-12-31': 169.0,
'2020-01-19': 169.0,
'2020-02-21': 657.0,
'2020-02-28': 657.0,
'2020-02-29': 537.0,
'2020-03-06': 575.0
}
```
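One simple way to contrast these datasets, which is not among the methods tried above and is offered only as a sketch, is to look at the gaps in days between consecutive transactions instead of at a daily series. The function name `gap_stats` and the use of the coefficient of variation as a regularity measure are assumptions made for illustration.

```python
from datetime import date
from statistics import mean, pstdev


def gap_stats(dates):
    """Return (mean gap in days, coefficient of variation of the gaps)."""
    days = sorted(date.fromisoformat(d) for d in dates)
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    avg = mean(gaps)
    return avg, pstdev(gaps) / avg


# Transaction dates copied from Dataset A and Dataset B above.
dates_a = [
    "2019-07-15", "2019-08-15", "2019-09-16", "2019-10-15", "2019-11-15",
    "2019-12-16", "2020-01-15", "2020-02-17", "2020-03-16",
]
dates_b = [
    "2018-12-10", "2019-02-11", "2019-02-25", "2019-03-12", "2019-03-28",
    "2019-04-01", "2019-04-04", "2019-04-17", "2019-04-22", "2019-04-29",
    "2019-05-14", "2019-06-20", "2019-07-02", "2019-07-17", "2019-07-29",
    "2019-09-09", "2019-12-12", "2019-12-31", "2020-01-19", "2020-02-21",
    "2020-02-28", "2020-02-29", "2020-03-06",
]

mean_a, cv_a = gap_stats(dates_a)
mean_b, cv_b = gap_stats(dates_b)
print(f"A: mean gap {mean_a:.1f} days, CV {cv_a:.2f}")
print(f"B: mean gap {mean_b:.1f} days, CV {cv_b:.2f}")
```

A low coefficient of variation means the gaps are nearly constant, as in Dataset A, where they stay within 28 to 33 days. Any threshold separating recurring from non-recurring merchants would have to be tuned on real data, and the amounts are ignored here.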
[1]: https://i.stack.imgur.com/hw8n1.png
[2]: https://i.stack.imgur.com/1XLNn.png
[3]: https://i.stack.imgur.com/tmw6J.png