@davidhewitt
Last active October 25, 2023 13:08
Dreaming of arbitrary self types for PyO3
//! The following is a simplified form of a possible PyO3 API which shows
//! cases where arbitrary self types would help resolve papercuts.
use std::ops::Deref;
use std::ptr::NonNull;
// ----------------------------------------------------------------------------------
//
// Case 1 - PyO3's object hierarchy. We have a smart pointer type Py<T> and want to
// use it as a receiver for Python method calls.
//
//
/// Python's C API is wrapped by the `pyo3-ffi` crate, also exported as the
/// `pyo3::ffi` submodule.
mod ffi {
    extern "C" {
        /// A Python object. For this model we don't care about its contents, so we
        /// just use the unstable "extern type" syntax to name it.
        pub type PyObject;
    }
}
/// A smart pointer to a Python object, which is reference counted. A good enough
/// description is that it is approximately an `Arc<T>` where the memory is
/// stored on the Python heap and reference counting is synchronized by the
/// Python GIL (Global Interpreter Lock).
///
/// Here in this model we ignore the existence of the Python GIL as it is just a
/// distraction. In PyO3's real API we have a lifetime `'py` on several types to
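/// model this.
///
/// `PhantomData<T>` is only here so that the otherwise unused parameter `T`
/// is accepted by the compiler.
struct Py<T>(NonNull<ffi::PyObject>, std::marker::PhantomData<T>);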
// -- Some zero-sized types to describe Python's object hierarchy. --
/// Any Python object.
struct PyAny(());
/// A concrete subtype, a Python list.
struct PyList(());
// -- Implementations of methods on these types --
// In practice these methods return results, we'll ignore that here.
impl PyAny {
    /// Get an attribute on this object. In Python syntax this is `self.name`.
    ///
    /// Receiver is `&Py<PyAny>` - arbitrary self type!
    fn getattr(self: &Py<PyAny>, name: &str) -> Py<PyAny> { /* ... */ }
}
impl PyList {
    /// Get an element from this list. In Python syntax this is `self[idx]`.
    ///
    /// Receiver is `&Py<PyList>` - arbitrary self type!
    fn get_item(self: &Py<PyList>, idx: usize) -> Py<PyAny> { /* ... */ }
}
// In addition, we want to call `getattr` with a `Py<PyList>`, because this is
// a valid operation too. The cleanest way to do this is with `Deref`:
impl Deref for Py<PyList> {
    type Target = Py<PyAny>;
    fn deref(&self) -> &Py<PyAny> { /* ... */ }
}
// ... but if arbitrary self types are tied to `Deref`, we instead have to write
// the following (which conflicts with the impl above, so only one can exist):
impl Deref for Py<PyList> {
    type Target = PyList;
    fn deref(&self) -> &PyList { /* ... */ }
}
// We could find other ways to make Py<PyList> have a getattr method without
// `Deref`, e.g. by moving all of `PyAny` methods onto a trait and implementing
// it for `Py<PyAny>`, `Py<PyList>` and so on. This leads to a lot of repetition;
// N trait implementations for N concrete types PyAny, PyList, etc.
// Also the `&PyList` reference on its own is useless, so `Deref<Target = PyList>`
// is a little weird.
// ----------------------------------------------------------------------------------
//
// Case 2 - PyO3's "refcell" container synchronized by the GIL. This has a close
// cousin in `std::cell::RefCell`.
//
//
/// PyO3 has a `#[pyclass]` macro which generates a Python type for a Rust
/// struct.
/// - `Foo` continues to be the plain old Rust struct.
/// - `Py<Foo>` is a smart pointer to a Python object which contains a `Foo`.
#[pyclass]
struct Foo { /* ... */ }
/// To implement methods on the Python type PyO3 has a `#[pymethods]` macro.
///
/// Users can use `&self` and `&mut self` receivers. To make this possible,
/// `Py<Foo>` behaves like `RefCell<Foo>` but uses the Python GIL for synchronization.
/// `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` are the guards to `Py<Foo>`.
impl Foo {
    /// Receive by `&self`, read only the Rust data. Possible today.
    fn a(&self) { /* ... */ }
    /// Receive by `&mut self`, read or write only the Rust data. Possible today.
    fn b(&mut self) { /* ... */ }
    /// Receive by `Py<Foo>`. `Py<Foo>` implements `Deref<Target = Py<PyAny>>`
    /// so that all Python operations are accessible.
    ///
    /// This is an arbitrary self type.
    ///
    /// Current users of PyO3 have to use `slf: Py<Foo>`, which is awkward
    /// and also loses method call syntax.
    fn c(self: Py<Foo>) { /* ... */ }
    /// Receive by `PyRef<'_, Foo>`. `PyRef<'_, Foo>` is a pointer to the Python
    /// data. It implements `Deref<Target = Foo>` to give read access to the Rust
    /// data.
    ///
    /// This is an arbitrary self type.
    ///
    /// Same workarounds for current users of PyO3 apply.
    fn d(self: PyRef<'_, Foo>) { /* ... */ }
    /// Receive by `PyRefMut<'_, Foo>`. `PyRefMut<'_, Foo>` is a pointer to the Python
    /// data. It implements `DerefMut<Target = Foo>` to give read and write access to
    /// the Rust data.
    ///
    /// This is an arbitrary self type.
    ///
    /// Same workarounds for current users of PyO3 apply.
    fn e(self: PyRefMut<'_, Foo>) { /* ... */ }
}
// Note that in the above, `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` both implement
// `Deref<Target = Foo>` so would fit fine with deref-based arbitrary self types.
//
// However `Py<Foo>` cannot implement `Deref<Target = Foo>`, just like how `RefCell<T>`
// cannot implement `Deref<Target = T>`.
//
// To make `Py<Foo>` be able to implement `Deref`, we must give up its refcell-like
// feature. This removes `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>`, and it also
// removes the ability to have `&mut self` as a receiver. The mutable access
// needs the runtime refcell protection due to Python code being incompatible with
// the borrow checker.
//
// One could argue that removing `&mut self` and the refcell feature would be
// a good thing, but that feature is also _extremely_ ergonomic for users. We could
// have a long conversation about whether PyO3 made the wrong API choice here. There
// is `#[pyclass(frozen)]`, which opts in to this restriction, so by flipping the
// default and then removing the option we could evolve PyO3's API over time if we
// think deref-based arbitrary self types is the correct formulation of arbitrary
// self types.
//
// If you feel like a long distraction, we can discuss how Python might
// be removing the GIL, and how that means that PyO3 might be forced to change
// anyway.
@adetaylor

Thanks @madsmtm and @davidhewitt for all the discussion.

I think I agree with your last comment David - I think you can make everything work with just a Receiver impl (without Deref) and then some traits.

That said,

  • I am pretty sure that the blanket impl of Receiver for Deref will be seen as Not The Rust Way as soon as I raise the PR. It is certainly unusual. I am worried we will sink a ton of time discussing this without really clear arguments on either side. If you can think of a way to avoid this, I'm all ears!
  • It's not your fault that you overlooked the possibility of Receiver without Deref. I think the RFC is insufficiently clear about this, and I'll work on it.
  • Your PyO3 example made me realize an assumption underlying our blanket impl: we're assuming that folks are using Deref because their type is a smart pointer containing something (has-a relationships). You want(ed) to use Deref for a completely different purpose, to express is-a relationships, along the lines of coercion. In this case, people might validly want their Deref resolution and their Receiver resolution to point in different directions. I think we need to be more explicit in the RFC that we're choosing not to be compatible with such use-cases, and they should be achieved using traits (or some future Coercion trait) instead.
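For concreteness, here is a minimal sketch of expressing the is-a relationship with traits instead of `Deref`. It runs on stable Rust and needs neither arbitrary self types nor a `Receiver` impl. Everything in it is an assumption made for illustration: `IsAny`, `AnyMethods`, the `label` field and the string return values are hypothetical stand-ins, not PyO3's API and not anything the RFC proposes.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the gist's zero-sized hierarchy types.
struct PyAny;
struct PyList;

/// Model smart pointer. A real `Py<T>` would hold a pointer to a Python
/// object; a string label stands in for it here (assumption of this sketch).
struct Py<T> {
    label: &'static str,
    _marker: PhantomData<T>,
}

impl<T> Py<T> {
    fn new(label: &'static str) -> Self {
        Py { label, _marker: PhantomData }
    }
}

/// Marker trait for the is-a relationship: "T is a Python object".
trait IsAny {}
impl IsAny for PyAny {}
impl IsAny for PyList {}

/// Methods every Python object supports, written once.
trait AnyMethods {
    fn getattr(&self, name: &str) -> String;
}

/// One blanket impl covers `Py<PyAny>`, `Py<PyList>`, and any later type
/// that opts in through `IsAny`.
impl<T: IsAny> AnyMethods for Py<T> {
    fn getattr(&self, name: &str) -> String {
        format!("{}.{}", self.label, name)
    }
}

/// List-only methods stay in an inherent impl on `Py<PyList>`.
impl Py<PyList> {
    fn get_item(&self, idx: usize) -> String {
        format!("{}[{}]", self.label, idx)
    }
}

fn main() {
    let any: Py<PyAny> = Py::new("obj");
    let list: Py<PyList> = Py::new("lst");
    assert_eq!(any.getattr("name"), "obj.name");
    // `getattr` is also callable on the list pointer, with no `Deref` involved.
    assert_eq!(list.getattr("append"), "lst.append");
    assert_eq!(list.get_item(0), "lst[0]");
}
```

The marker trait plus blanket impl is one way to soften the "N implementations for N types" repetition mentioned in the gist: each concrete type needs only a one-line `IsAny` impl. Callers do have to bring `AnyMethods` into scope, which `Deref` would not require.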

So this has been a most useful discussion, thanks!
