SciPy: Improving Numerical Integration of Time Series

The goal of this project is to improve interoperability between scipy.integrate and pandas.TimeSeries. Currently, time-stamped arrays have to be converted to an integer-based domain before numerical integration can work, which incurs computational overhead. The aim is to add runtime TimeSeries detection and to compute numerical integrals (e.g. with the trapezoidal rule) without any domain transformation, using the timestamp arithmetic that is already available.

Proposal Detailed Description:

The main motivation is that numerically integrating a time series (pandas.TimeSeries) with the functions in scipy.integrate currently requires transforming the data domain into integers in the desired unit (e.g. 1 unit = 1 second). This transformation incurs additional computational overhead and results in an "unpythonic" (not user-friendly) interface.
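To make the manual transformation concrete, here is a minimal sketch of the workaround. The power readings are made-up illustration data, and a plain pandas Series with a datetime index stands in for pandas.TimeSeries. The import fallback is only there because the trapezoidal function has gone by different names across SciPy releases.

```python
import numpy as np
import pandas as pd

try:
    from scipy.integrate import trapezoid  # name used by newer SciPy releases
except ImportError:
    from scipy.integrate import trapz as trapezoid  # name used by older releases

# Hypothetical data: power readings in watts at irregular timestamps.
index = pd.to_datetime([
    "2013-05-01 00:00:00",
    "2013-05-01 00:00:10",
    "2013-05-01 00:00:30",
    "2013-05-01 00:01:00",
])
power = pd.Series([100.0, 200.0, 200.0, 100.0], index=index)

# Manual domain transformation: timestamps -> seconds since the first sample.
elapsed = (power.index - power.index[0]).total_seconds()
seconds = np.asarray(elapsed, dtype=float)

# Only now can the numeric integration routine be applied (watts * seconds).
energy_joules = trapezoid(power.values, x=seconds)
```

The conversion step is the part the user has to remember and get right (unit, reference point, dtype) before integration is possible at all.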

The goal of this project is to improve how SciPy performs numerical integration of time series, both in performance and in usability. This will be accomplished by extending SciPy to use the available timestamp arithmetic when time-stamped data is detected (dynamically, without introducing any dependencies), instead of forcing the user to transform the data domain manually.
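One way the detection could work is sketched below. This is only an illustration under stated assumptions, not a design for scipy.integrate: the function name integrate_trapezoid and its unit parameter are hypothetical, the detection is plain duck typing on an index attribute (so pandas is never imported by the function itself), and the timestamps are assumed to be timezone-naive.

```python
import numpy as np


def integrate_trapezoid(y, x=None, unit="s"):
    """Trapezoidal rule that accepts time-stamped input.

    Hypothetical helper for illustration; not part of scipy.integrate.
    """
    # Duck-typed detection: a pandas Series carries its timestamps in `.index`.
    # Requiring a `dtype` rules out unrelated attributes such as list.index.
    index = getattr(y, "index", None)
    if x is None and hasattr(index, "dtype"):
        x = index

    y = np.asarray(y, dtype=float)
    if x is None:
        dx = np.ones(len(y) - 1)  # unit spacing when no sample points are given
    else:
        dx = np.diff(np.asarray(x))  # timestamp arithmetic yields timedeltas
        if np.issubdtype(dx.dtype, np.timedelta64):
            dx = dx / np.timedelta64(1, unit)  # express intervals in `unit`

    return float(np.sum(0.5 * (y[1:] + y[:-1]) * dx))


# Usage with made-up power readings (watts) on a datetime index.
import pandas as pd

index = pd.to_datetime([
    "2013-05-01 00:00:00",
    "2013-05-01 00:00:10",
    "2013-05-01 00:00:30",
    "2013-05-01 00:01:00",
])
power = pd.Series([100.0, 200.0, 200.0, 100.0], index=index)

energy_joules = integrate_trapezoid(power)        # watts * seconds
energy_wh = integrate_trapezoid(power, unit="h")  # watts * hours
plain = integrate_trapezoid([0.0, 1.0, 2.0])      # ordinary input still works
```

In this sketch the caller passes the series directly and only chooses the unit of the result; the interval lengths come from subtracting neighbouring timestamps rather than from a converted integer axis.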

Project timeline:

3 - 24 May - Get to know the community and participate in the mailing list and IRC channel. Read all the developer documentation. Go through SciPy's source code to get the lay of the land.
25 May - 2 June - Start communicating with the mentor. Fix a simple bug to understand how code and tests are written in SciPy.
3 June - 16 June - Learn more about the technologies relevant to improving numerical integration: the details of Python's and Pandas' DateTime/Timestamp objects and the numpy data structures that SciPy's integration functions use. Tackle a more advanced bug related to scipy.integrate. First solution design milestone.
17 June - 30 June - Develop a first rudimentary time series numerical integration function based on the trapezoidal rule.
1 July - 14 July - Expand the solution (and tests) to cover the other sample-based rules (Simpson's and Romberg's).
15 July - 4 August - Benchmark and revise the code to improve the quality of the new features. Feature freeze milestone.
5 August - 18 August - Test thoroughly on various real-world data and fix any bugs. Start writing the documentation.
19 August - 1 September - Most of the work should be done by now. Finish the documentation. Work done milestone.
2 - 16 September - Buffer for any remaining tasks. If everything has gone as planned, the usability of scipy.integrate for time series integration could be improved further by adding some new features to Pandas too. For example, a resample option that includes both borders would benefit numerical integration. Pencils down.

Link to a patch/code sample:

Simple documentation fix to test the SciPy GitHub workflow - scipy/scipy#523
philharmonic - a project I am working on that uses scipy.integrate and pandas to calculate electricity prices

Links to additional information:

My other open source contributions are visible on GitHub, Launchpad and here
StackOverflow profile
CV

Sub-organization Information

Sub-organization with whom I hope to work: SciPy

Student Information

Name: Dražen Lučanin
Email: drazen.lucanin at gmail.com
Telephone: +385958838241
IRC: kermit666 at irc.freenode.net
GitHub: kermit666
Launchpad: kermit666
gtalk: kermit666 at gmail.com
Skype: kermit665
Twitter: @kermit666
Home Page: http://www.infosys.tuwien.ac.at/staff/drazen/
Blog: http://kermit.epska.org
G+: https://plus.google.com/102247421944915765889/posts

University Information

University: Vienna University of Technology, Austria
Major: Computer science
Current Year and Expected Graduation date: 2nd year, expecting graduation in 2014
Degree: PhD