[This is a follow-up to the work I did last semester. Some of the descriptive text is the same.]
PyPy is an implementation of Python, written in (a restricted subset of) Python, featuring an advanced tracing just-in-time compiler. It is currently the fastest implementation of Python available.
NumPy is a framework for vector and matrix numerical calculations in Python. It is written largely in C using the CPython extension API, and thus does not run on PyPy.
Reimplementing NumPy for PyPy is considered advantageous because one of NumPy's chief goals is performance, and PyPy a) is the demonstrated performance king for Python VMs, and b) has the potential to further speed up NumPy. NumPy is widely used by scientists in all fields, and the ability to bring greater speed to their applications, without the need to resort to languages such as C/C++/Java, is considered both a technical and social gain.
Since last semester, NumPyPy has made incredible strides in compatibility. We now support multi-dimensional arrays with a wide variety of data types. My goal for this semester is therefore to increase compatibility further by adding more of the methods NumPy provides for interacting with data. At the time of writing, NumPyPy implements 66/161 methods on arrays, 22/48 methods on datatypes, and 121/559 module-level methods.
This project will address these missing methods in three chunks:
- First, exposing ndarray.ctypes, which provides low-level information about the memory location of an array's data and can be used, in conjunction with f2pypy, to interface directly with existing Fortran and C libraries, such as ATLAS, BLAS, and LAPACK.
- Second, implementing methods that are views on the memory of existing data, such as ravel(), which exposes a single-dimensional view of a multi-dimensional array. This step includes allowing arrays to be created in Fortran iteration order (column-major), as opposed to the default C order (row-major).
- The final goal will be bringing in more of NumPy's pure-Python code. Though NumPy is largely written in C, there are portions of it written in Python. This step will involve analysing the dependencies of different pure-Python code, implementing those, and then bringing in the pure-Python code. For example, NumPy includes a pure-Python Matrix class, which relies on being able to subclass ndarray, which NumPyPy doesn't yet support.
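To illustrate why exposing the memory location of an array's data matters, here is a minimal sketch that uses only the Python standard library. It is not NumPyPy code: array.array stands in for an ndarray (an assumption made so the example runs without NumPy), and scale_in_place is a hypothetical helper that plays the role of an external C or Fortran routine receiving nothing but a raw pointer and an element count.

```python
import array
import ctypes


def scale_in_place(address, length, factor):
    """Multiply `length` C doubles starting at `address` by `factor`.

    Hypothetical stand-in for a foreign routine that only sees a raw
    pointer and an element count.
    """
    # Reinterpret the raw memory as a C array of doubles; nothing is copied.
    buf = (ctypes.c_double * length).from_address(address)
    for i in range(length):
        buf[i] *= factor


data = array.array("d", [1.0, 2.0, 3.0])
# buffer_info() returns (memory address, number of items).
address, length = data.buffer_info()
scale_in_place(address, length, 2.0)
print(data.tolist())  # [2.0, 4.0, 6.0]
```

Because the routine writes through the address, the original container sees the change without any copying, which is the property that makes handing array data to libraries such as BLAS or LAPACK cheap.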
These would be implemented on the following timeline:
- Weeks 1-3: Exposing ndarray.ctypes and exposing any Fortran libraries that NumPy does.
- Weeks 4-6: Implementing Fortran/column-major storage for arrays.
- Weeks 7-8: Adding in new data-manipulation methods.
- Weeks 9-10: Adding support for subclassing ndarray.
- Weeks 11-14: Bringing in additional NumPy Python code.
All milestones include tests and documentation.
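To make the column-major storage milestone concrete, the following pure-Python sketch shows how the same multi-dimensional index maps to different positions in a flat buffer under C (row-major) and Fortran (column-major) order. The function names strides and offset are hypothetical and chosen for illustration; strides are counted in elements rather than bytes, and this is not how NumPyPy is necessarily implemented.

```python
def strides(shape, order="C"):
    """Return per-axis strides, in elements, for a dense array of `shape`."""
    result = [0] * len(shape)
    step = 1
    # C order: the last axis varies fastest. Fortran order: the first does.
    if order == "C":
        axes = range(len(shape) - 1, -1, -1)
    else:
        axes = range(len(shape))
    for axis in axes:
        result[axis] = step
        step *= shape[axis]
    return tuple(result)


def offset(index, shape, order="C"):
    """Flat buffer position of a multi-dimensional `index`."""
    return sum(i * s for i, s in zip(index, strides(shape, order)))


shape = (2, 3)
c_strides = strides(shape, "C")        # (3, 1)
f_strides = strides(shape, "F")        # (1, 2)
c_offset = offset((0, 1), shape, "C")  # 0*3 + 1*1 = 1
f_offset = offset((0, 1), shape, "F")  # 0*1 + 1*2 = 2
print(c_strides, f_strides, c_offset, f_offset)
```

Under this model, a flattening view such as ravel() can walk the buffer without copying only when the requested iteration order matches the order in which the data is stored.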