@metakermit
Last active December 17, 2015 01:10
Dražen's GSoC 2013 SciPy proposal

SciPy: Improving Numerical Integration of Time Series

The goal of this project is to improve interoperability between scipy.integrate and pandas.TimeSeries. Currently, time-stamped arrays have to be converted to an integer-based domain (which incurs computational overhead) before numerical integration will work. The aim is to add runtime TimeSeries detection so that numerical integrals (e.g. using the trapezoidal rule) can be calculated without any domain transformation, by using the timestamp arithmetic that is already available.

Proposal Detailed Description:

The main motivation is that numerically integrating a time series (pandas.TimeSeries) with the functions available in scipy.integrate currently requires transforming the data domain into integers that correspond to the desired unit (e.g. 1 unit = 1 second). This transformation incurs additional computational overhead and results in an "unpythonic" (not user-friendly) interface.
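To make the current workaround concrete, here is a minimal sketch of the manual domain transformation. It deliberately uses only the Python standard library, so the hand-written trapezoid function stands in for the sample-based routines in scipy.integrate and the plain lists stand in for a pandas time series; the function name, the variable names and the sample data are all made up for illustration.

```python
from datetime import datetime


def trapezoid(y, x):
    """Plain trapezoidal rule over numeric sample points x (illustrative stand-in)."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
        for i in range(len(x) - 1)
    )


# Hypothetical, irregularly sampled power readings in watts.
timestamps = [
    datetime(2013, 5, 1, 12, 0, 0),
    datetime(2013, 5, 1, 12, 0, 10),
    datetime(2013, 5, 1, 12, 0, 30),
]
power_watts = [100.0, 200.0, 200.0]

# The manual step the user has to perform today: map every timestamp
# to an integer number of seconds since the first sample.
seconds = [int((t - timestamps[0]).total_seconds()) for t in timestamps]

# Only after the conversion can the numeric routine be applied.
energy_joules = trapezoid(power_watts, seconds)
print(energy_joules)
```

The extra pass over the timestamps and the need to pick a unit by hand are the overhead and the usability problem described above.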

The goal of this project is to improve how numerical integration of time series is performed in SciPy, both in terms of performance and usability. This will be accomplished by extending SciPy to use the available timestamp arithmetic when time-stamped data is detected (dynamically, without introducing any dependencies), instead of forcing the user to transform the data domain manually.
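The idea can be sketched roughly as follows. This is only an illustration of runtime detection combined with timestamp arithmetic, not the design that would end up in SciPy: the function name trapezoid_auto and its unit parameter are hypothetical, and standard-library datetime objects are used in place of pandas timestamps so that the sketch has no dependencies.

```python
from datetime import datetime, timedelta


def trapezoid_auto(y, x, unit=timedelta(seconds=1)):
    """Trapezoidal rule that accepts numeric or datetime sample points.

    Hypothetical helper: if the sample points are datetimes, interval
    widths are obtained directly from timestamp subtraction and expressed
    in multiples of `unit`; otherwise plain numeric differences are used.
    """
    pairs = list(zip(x, x[1:]))
    if x and isinstance(x[0], datetime):
        # Timestamp arithmetic: datetime - datetime gives a timedelta,
        # and timedelta / timedelta gives a float.
        widths = [(b - a) / unit for a, b in pairs]
    else:
        widths = [b - a for a, b in pairs]
    return sum(0.5 * (y0 + y1) * w for y0, y1, w in zip(y, y[1:], widths))


# Hypothetical power readings in watts, sampled at irregular times.
timestamps = [
    datetime(2013, 5, 1, 12, 0, 0),
    datetime(2013, 5, 1, 12, 0, 10),
    datetime(2013, 5, 1, 12, 0, 30),
]
power_watts = [100.0, 200.0, 200.0]

# No manual conversion of the time axis is needed by the caller.
energy_joules = trapezoid_auto(power_watts, timestamps)
energy_watt_hours = trapezoid_auto(power_watts, timestamps, unit=timedelta(hours=1))
```

Checking the type of the sample points at call time keeps the numeric path unchanged, which is one way to support time-stamped input without adding an import-time dependency.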

Project timeline:

  • 3 - 24 May - Get to know the community and participate in the mailing list and IRC channel. Read all the developer documentation. Go through SciPy's source code to get the lay of the land.
  • 25 May - 2 June - Start communicating with the mentor. Fix a simple bug to understand how code and tests are written in SciPy.
  • 3 June - 16 June - Learn more about the technologies relevant to improving numerical integration: the details of Python's and pandas' DateTime/Timestamp objects and the NumPy data structures that SciPy's integration functions operate on. Tackle a more advanced bug related to scipy.integrate. First solution design milestone.
  • 17 June - 30 June - Develop a first rudimentary time series numerical integration function based on the trapezoidal rule.
  • 1 July - 14 July - Expand the solution (and tests) to cover the other sample-based rules (Simpson's and Romberg's).
  • 15 July - 4 August - Benchmark the new features and revise the code to improve their quality. Feature freeze milestone.
  • 5 August - 18 August - Test thoroughly on various real-world data and fix any bugs. Start writing the documentation.
  • 19 August - 1 September - Most of the work should be done by now. Finish the documentation. Work done milestone.
  • 2 - 16 September - Buffer for any remaining tasks. If everything before went as planned, the usability of scipy.integrate for time series integration could be further improved by adding some new features to pandas too. For example, a resample option that includes both borders would be beneficial to numerical integration. Pencils down.

Link to a patch/code sample:

  • Simple documentation fix to test the SciPy GitHub workflow - scipy/scipy#523
  • philharmonic - a project I am working on that uses scipy.integrate and pandas to calculate electricity prices

Links to additional information:

Sub-organization Information 

Sub-organization with whom I hope to work: SciPy

Student Information

University Information

  • University: Vienna University of Technology, Austria
  • Major: Computer science
  • Current Year and Expected Graduation date: 2nd year, expecting graduation in 2014
  • Degree: PhD