Skip to content

Instantly share code, notes, and snippets.

@abn
Last active August 21, 2024 14:26
Show Gist options
  • Save abn/779166b0c766ce67351c588489831852 to your computer and use it in GitHub Desktop.
Save abn/779166b0c766ce67351c588489831852 to your computer and use it in GitHub Desktop.
A slugify function for postgres
-- original source: https://medium.com/adhawk-engineering/using-postgresql-to-generate-slugs-5ec9dd759e88
-- https://www.postgresql.org/docs/9.6/unaccent.html
CREATE EXTENSION IF NOT EXISTS unaccent;
-- create the function in the public schema
CREATE OR REPLACE FUNCTION public.slugify(
v TEXT
) RETURNS TEXT
LANGUAGE plpgsql
STRICT IMMUTABLE AS
$function$
BEGIN
-- 1. trim trailing and leading whitespaces from text
-- 2. remove accents (diacritic signs) from a given text
-- 3. lowercase unaccented text
-- 4. replace non-alphanumeric (excluding hyphen, underscore) with a hyphen
-- 5. trim leading and trailing hyphens
RETURN trim(BOTH '-' FROM regexp_replace(lower(unaccent(trim(v))), '[^a-z0-9\\-_]+', '-', 'gi'));
END;
$function$;
@GemN
Copy link

GemN commented Aug 23, 2021

@libovness

Replace the regex by [^a-z0-9\-_]+ (remove the extra slash)

There is an extra \ which means according to regex101

\-_ matches a single character in the range between \ (index 92) and _ (index 95) (case insensitive)

Which means that \, ], ^, _ won't be replaced.

@benkroeger
Copy link

if you put the hyphen to the end, there is no need for escaping at all [^a-z0-9_-]+

@kbsali
Copy link

kbsali commented May 23, 2023

Thanks!
Note that I've had issues when trying to restore a dump of a db using this slugify method. The solution was to specify the schema for unaccent (not obvious, it took me hours to find the solution) :

CREATE OR REPLACE FUNCTION public.slugify(
  v TEXT
) RETURNS TEXT
  LANGUAGE plpgsql
  STRICT IMMUTABLE AS
$function$
BEGIN
  -- 1. trim trailing and leading whitespaces from text
  -- 2. remove accents (diacritic signs) from a given text
  -- 3. lowercase unaccented text
  -- 4. replace non-alphanumeric (excluding hyphen, underscore) with a hyphen
  -- 5. trim leading and trailing hyphens
  RETURN trim(BOTH '-' FROM regexp_replace(lower(public.unaccent(trim(v))), '[^a-z0-9\\-_]+', '-', 'gi'));
END;
$function$;

@billrobertson42
Copy link

@kbsali if public is in your user's search path, you should not need to explicitly need to prefix unaccent with public.

AFAIK, the default search path is "$user", public

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment