Skip to content

Instantly share code, notes, and snippets.

@loic
Created January 23, 2014 11:17
Show Gist options
  • Save loic/8576940 to your computer and use it in GitHub Desktop.
Save loic/8576940 to your computer and use it in GitHub Desktop.
Django custom index types.
ML thread:
https://groups.google.com/d/topic/django-developers/hVTLAp_jLzQ/discussion
2013-10-03:
20:15 loic84: akaariai: what's your opinion no the recent ML thread regarding custom index types?
20:18 akaariai: loic84: write the SQL by hand... Or, maybe you could have a VirtualField that just creates indexes for you.
20:20 loic84: akaariai: I've been thinking about integrating PG arrays and JSON support in my projects.
20:20 loic84: your work on custom lookups would help quite a bit for that
20:20 akaariai: I mean, maybe you could have a virtual field hash_idx_id = HashIndex(id)
20:21 loic84: but without the index, it's not exactly useful, and SQL by hand is not very elegant.
20:21 akaariai: where HashIndex is custom written virtual field, it just creates the index for you, nothing else.
20:21 akaariai: koniiiik might know better if the above is actually possible to do...
20:25 akaariai: loic84: I don't know what is wrong in adding indexes by raw SQL in a migration?
20:25 loic84: well with migrations I guess the situation improves, when it was in a fixture I wasn't a big fan.
20:26 loic84: I was quite glad to switch to index_together when it landed.
20:26 loic84: that said, having it in a different file makes mistakes (missing index) more difficult to spot.
20:27 loic84: if you only need db_index, unique_together and index_together, a quick look at a Model and you know the whole story about the current index situation.
20:34 loic84: akaariai: as a sidenote, I think supporting more PG types would be a killer feature, I'm not a big fan of lowest common denominator in general.
20:36 akaariai: I try to implement public API infrastructure to implement more PG types.
20:37 akaariai: if those implementations should be in core isn't *that* big of a deal for me.
20:40 akaariai: if you can implement JSONField with public API that is much better than hacking JSONField into current code.
20:44 loic84: True that, if then can be nicely and reliably implemented outside of core using only public API, it's not that big of a deal.
2013-10-05:
15:57 akaariai: loic84: coming back to custom index definitions in models.py... I changed my mind...
15:57 loic84: akaariai: what made you change your mind?
15:58 akaariai: for example a module offering pg_trgm would be a lot better if it could actually also create the indexes
15:58 akaariai: basically, custom fields on postgresql want to customize the indexes, and currently the only way (I know of) to do that is to subclass the used db backend
15:58 akaariai: also, GIS needs this...
15:58 loic84: yeah...
15:59 akaariai: I don't know what the right API is, but ability to do what GIS needs is pretty good starting point.
15:59 loic84: also a ArrayField used for lookups would be rather useless without a GIN index.
16:01 akaariai: I have a new goal in ORM development - make contrib.gis as much as possible public API.
16:01 akaariai: if gis is possible with public API it means a lot of interesting stuff is possible.
16:01 loic84: and yeah as I mentioned last time, it's very nice to be able to scan a model complete with index in a single file.
16:02 loic84: especially if you work on a project where most of the code isn't yours.
16:03 akaariai: I'll have to think about this, but it seems what is needed is for a field to specify the SQL needed for field creation & index creation.
16:03 akaariai: currently it isn't that way, the backend decides what the field needs, and that isn't good for extensibility.
16:04 loic84: yeah, that sounds right.
16:09 loic84: It would be great to brainstorm with koniiiik regarding an API, I'm just he's got a few things in mind for VirtualFields.
16:10 loic84: but a custom index for a single field would probably belong to that field's parameters, for legibility.
16:12 loic84: multicol indexes though would be nicer as VirtualFields rather than Meta.
16:22 akaariai: loic84: I think we can support both - if we just ask the field "what indexes do you want to create" both single field indexes by field flag and multicolumn indexes by virtual fields would be possible.
16:24 loic84: true
16:27 loic84: akaariai: I don't know GIS at all so I might be completely off, but how a single field type would deal with different index for different backends?
16:28 loic84: that's not a problem for a PG type like Array or JSON, since it's 1 field for 1 backend, but for GIS, dunno.
16:34 akaariai: loic84: a idea is that the backend asks the field for index definitions, that is indexes.extend(field.get_index_definitions())
16:34 akaariai: the index_sql = [def.as_sql(connection) for def in index_definitions]
16:35 akaariai: if the index definition supports only one db, the code would likely be: if connection.vendor == 'postgresql': return sql; else raise Exeption("Not supported")
16:36 akaariai: for gis the way would likely be to call back to connection.gis.get_index_sql(field)
16:37 loic84: that would work
16:37 akaariai: now, how you register the gis module for the connection without subclassing it is left as an exercise...
16:38 akaariai: (if there was a well defined application startup order that wouldn't be a big problem...)
16:39 akaariai: but I think registering modules for backend, not subclassing it is the correct way to support things like gis.
16:39 loic84: definitely
16:39 akaariai: basic problem currently is that if you need both gis and say hstore, both of which require custom backend -> not easy.
16:40 loic84: I was gonna mention that earlier, the backend is too much of a blanket right now
16:41 loic84: hstore or gis without requiring a custom backend would be a lovely improvement.
16:42 loic84: we can always move the registration to setings...
16:43 loic84: after all, it's already in there with custom backends...
16:44 akaariai: loic84: something like DATABASES['default']['modules']: {'gis': 'django.contrib.gis.db.backends.postgis'}?
16:44 loic84: yup
16:46 loic84: if people scream no more settings, we can scream back, unreliable loading order :P
16:46 akaariai: yeah, that would work... Another possibility might be to do "for app in settings.INSTALLED_APPS: if has_connection_modules('app'): self.register_module(app)" in connection class creation.
16:47 loic84: or that yes.
16:48 akaariai: app loading refactor could give something for this use case
2014-01-23:
12:26 loic84: akaariai_: now that class based lookups are merged, we should discuss custom indexes ;)
13:13 akaariai: loic84: for custom indexes - any way that lets users add whatever indexes they want
13:16 loic84: akaariai: I was thinking maybe class based indexes, that can go either in db_index, or index_together.
13:16 akaariai: loic84: yeah, class based seems to be the way to go
13:16 loic84: and ideally custom fields should be able to define some indexes for themselves.
13:18 akaariai: seems like we need support for GISField(index=True) => custom index types (depending on the DB backend), TextField(index=True) => two indexes when using PostgreSQL.
13:19 akaariai: so, fields need to be able to define what index=True means per field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment