Skip to content

Instantly share code, notes, and snippets.

@cobusc
Last active July 26, 2024 09:16
Show Gist options
  • Save cobusc/5875282 to your computer and use it in GitHub Desktop.
Save cobusc/5875282 to your computer and use it in GitHub Desktop.
Short explanation of the issues faced when trying to create a PostgreSQL index using the date() function and how to resolve it.

Update: 2023/11/30 Bruce Momjian's take on things: https://momjian.us/main/blogs/pgblog/2023.html#November_22_2023

Given a table...

CREATE TABLE foo (
  id SERIAL PRIMARY KEY,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  ...
);

...you realize that you have to perform complex queries with the following type of condition:

WHERE ...
  AND DATE(created_at) = DATE('2013-01-01') ...

The table is big and other indexes are not helping, so you decide to create an index on DATE(created_at):

postgres=> CREATE INDEX ON foo (DATE(created_at));
ERROR:  functions in index expression must be marked IMMUTABLE

It turns out that the function DATE(TIMESTAMP WITH TIME ZONE) is mutable, due to the time zone. On the other hand DATE(TIMESTAMP WITHOUT TIME ZONE) is immutable, as shown in the following example:

postgres=> CREATE TABLE test (created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW());
CREATE TABLE
postgres=> CREATE INDEX ON test (date(created_at));
CREATE INDEX

So how do we handle the case where we have a time zone? Well, since the DATE() function may give different results based on the time zone, we have to decide which time zone is applicable. In this example we use UTC:

postgres=> CREATE INDEX ON foo (DATE(created_at AT TIME ZONE 'UTC'));
CREATE INDEX

That worked. Now how do get the query planner to use it? I was under the impression that I simply had to use the same function I used to create the index in the WHERE condition. Let's see what the query planner says:

postgres=> EXPLAIN SELECT * FROM foo WHERE DATE(created_at AT TIME ZONE 'UTC') = DATE('2013-01-01');
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Seq Scan on foo  (cost...)
   Filter: (date(timezone('UTC'::text, created_at)) = '2013-01-01'::date)

The index is not used. Let's go look at the index definition:

postgres=> \d foo_date_idx 
       Index "public.foo_date_idx"
 Column | Type |               Definition                
--------+------+-----------------------------------------
 date   | date | date(timezone('UTC'::text, created_at))
btree, for table "public.foo"

Let's try to use the definition as stored by the database. Note that this is (and should be) functionally equivalent to DATE(created_at AT TIME ZONE 'UTC'), since the documentation says:

The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp AT TIME ZONE zone.
postgres=> EXPLAIN SELECT * FROM foo WHERE DATE(TIMEZONE('UTC'::text, created_at)) = DATE('2013-01-01');
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo (cost...)
   Recheck Cond: (date(timezone('UTC'::text, created_at)) = '2013-01-01'::date)
   ->  Bitmap Index Scan on foo_date_idx  (cost...)
         Index Cond: (date(timezone('UTC'::text, created_at)) = '2013-01-01'::date)

This seems to have done the trick. It is strange though that PostgreSQL rewrites the function used to create the index to a canonical form, but does not seem to do the same when the function is used in the WHERE clause (in order to match the index function).

Footnote: PostgreSQL 9.1.9 was used.

Copy link

ghost commented Oct 13, 2015

wow excellent post

@davidqhr
Copy link

Hi @cobusc,

I use pg 9.5. I still got functions in index expression must be marked IMMUTABLE error, when I create function index with sql below.

CREATE UNIQUE INDEX unique_idx_card_logs_on_created_at_date ON card_logs ( date(created_at AT TIME ZONE 'UTC+8') );

It seem created_at AT TIME ZONE 'UTC+8' return value is immutable, but data(..) function is not

Any help will be greatly appreciated!

@eldog
Copy link

eldog commented Jan 7, 2017

Really useful. I used this for an ORDER BY with a LIMIT. Thought the index wasn't working, but then read the manual and saw the LIMIT I was using was too high (1000), so the planner decided to do a Seq Scan. A lower LIMIT (100) made it use the index.

@MichaelDBA
Copy link

MichaelDBA commented Jan 17, 2017

A little more elaboration might suffice here....

Although this example uses UTC as the timezone value, if you have something specified differently as timezone in postgresql.conf, then you must use that value instead of UTC when creating and using the index in a query. Assuming timezone = 'localtime', then this needs to happen:

CREATE INDEX ON foo (DATE(created_at AT TIME ZONE 'localtime'));
EXPLAIN SELECT * FROM foo WHERE DATE(TIMEZONE('localtime'::text, created_at)) = DATE('2013-01-01');

If you don't do that, you may be deceived by erroneous query results, even though the explain would still show that it's using UTC.

I stumbled on this while implementing this solution in our own environment. Before the index, I got 188 rows back. Afterwards, I got none back. But when I changed index and query to use "localtime", then I got the expected 188 rows back using the index.

@MichaelDBA
Copy link

A little more elaboration might suffice here....

Although this example uses UTC as the timezone value, if you have something specified differently as timezone in postgresql.conf, then you must use that value instead of UTC when creating and using the index in a query. Assuming timezone = 'localtime', then this needs to happen:

CREATE INDEX ON foo (DATE(created_at AT TIME ZONE 'localtime'));
EXPLAIN SELECT * FROM foo WHERE DATE(TIMEZONE('localtime'::text, created_at)) = DATE('2013-01-01');
If you don't do that, you may be deceived by erroneous query results, even though the explain would still show that it's using UTC.

I stumbled on this while implementing this solution in our own environment. Before the index, I got 188 rows back. Afterwards, I got none back. But when I changed index and query to use "localtime", then I got the expected 188 rows back using the index.

@kehao95
Copy link

kehao95 commented Aug 2, 2018

Good post, very helpful! Thank you.

@RhodiumToad
Copy link

More or less everything in this post is wrong.

Firstly, life is so much simpler if you follow the rule of always having a bare column on one side of a condition in the WHERE clause (and the other side should be a pseudoconstant, i.e. not referring to any other column of the same table).

Thus WHERE created_at >= '2013-01-01' AND created_at < '2013-01-02', which will use an index on created_at with no fuss.

Secondly, even if you don't make that simple fix, there's no reason to rewrite using timezone(), using the same expression in the query as in the index always works. Most likely here the poster was misled by statistics (but with the cost numbers redacted from the explains there's no way to know).

@ssi-anik
Copy link

I was creating an index with extract(month from the_field at time zone 'UTC') and querying like extract(month from the_field) = month_variable which was not hitting the index.

Then I found your solution. And the idea here is to use exactly the same thing you did for indexing. So, extract(month from the_field at time zone 'UTC') = month_variable definitely hits the index now. Thanks for the gist. And I'd request you to put it somewhere that is easily indexed in search results like publishing on Medium.

@md-redwan-hossain
Copy link

None of the solutions work in Postgres 16

@cobusc
Copy link
Author

cobusc commented Nov 30, 2023

None of the solutions work in Postgres 16

I've added a link to a recent blogpost by Bruce Momjian on the subject: https://momjian.us/main/blogs/pgblog/2023.html#November_22_2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment