z3niths · November 19, 2020 06:39
diff --git a/mysql_to_postgresql.txt b/mysql_to_postgresql.txt
 To help you with the process of converting a MySQL app to PostgreSQL, I collected a list of differences between MySQL and PostgreSQL (PG).

 Important changes:

 * String literals are quoted with '...' or with $token$...$token$. A single quote inside '...' is escaped by doubling it ('').
 * Identifiers are folded to lowercase, unless they are quoted with "..." which makes them case-sensitive
 * The max identifier length is 63 bytes; longer names are truncated
 * System column names (ctid, xmin, xmax, cmin, cmax, tableoid) cannot be used as column names (probably not a problem)
 * Expressions are evaluated in arbitrary order, so WHERE x > 0 AND y/x > 1.5 can lead to a division by zero and has to be replaced with WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE FALSE END
 * SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END would still lead to a division by zero because the planner evaluates the constant subexpression 1/0 before any row is processed
 * Permissions are granted to roles per object (database, schema, table, column, sequence, function, ...) with GRANT. Row-level access is controlled with Row Security Policies.
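
 The quoting, case-folding, and evaluation-order points above can be sketched as follows. The tables `demo` and `samples` are hypothetical and only exist for illustration.

 ```sql
 -- String literals: double the single quote, or use dollar quoting.
 SELECT 'It''s here' AS doubled, $msg$It's here$msg$ AS dollar_quoted;

 -- Unquoted identifiers fold to lowercase; quoted ones keep their case.
 CREATE TABLE Demo ("MixedCase" integer, plain integer);  -- table is named demo
 SELECT "MixedCase", PLAIN FROM DEMO;                     -- resolves to plain, demo
 -- SELECT MixedCase FROM demo;  -- would fail: column mixedcase does not exist

 -- Guard the division explicitly instead of relying on the order of AND.
 CREATE TABLE samples (x numeric, y numeric);
 SELECT *
 FROM samples
 WHERE CASE WHEN x > 0 THEN y / x > 1.5 ELSE FALSE END;
 ```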

 Types:

 * SERIAL instead of AUTO_INCREMENT
 * VARCHAR (for text) doesn't need a length limit, is called text and does not allow 0-bytes or byte sequences that are invalid in the database encoding (e.g. invalid UTF-8)
 * VARCHAR (for binary data) doesn't need a length limit and is called bytea
 * DATETIME is called timestamp
 * Casts are written CAST(value AS type) or value::type; a literal can also be typed by prepending the type, as in date '2020-11-19'
 * There's a proper boolean type
 * ENUM types have to be created explicitly or use text (see below)
 * There is a jsonb type (and json which preserves the original text)
 * There is a GUID type, called uuid
 * There are IP address types
 * There are geometric types
 * There are range types (time ranges, price ranges, etc.) that can be unioned, etc.
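
 As a rough sketch of how these types look in a table definition (the table `orders`, its columns, and the type `order_status` are made up for this example):

 ```sql
 -- ENUM types are created once and then used like any other type.
 CREATE TYPE order_status AS ENUM ('new', 'paid', 'shipped');

 CREATE TABLE orders (
     id          serial PRIMARY KEY,            -- instead of AUTO_INCREMENT
     customer    text NOT NULL,                 -- no length limit needed
     receipt_pdf bytea,                         -- binary data
     created_at  timestamp NOT NULL DEFAULT now(),
     is_gift     boolean NOT NULL DEFAULT false,
     status      order_status NOT NULL DEFAULT 'new',
     details     jsonb,
     token       uuid,
     client_ip   inet,
     valid       daterange
 );

 -- Three ways to spell a cast.
 SELECT CAST('42' AS integer), '42'::integer, date '2020-11-19';

 -- Overlapping ranges can be unioned with +.
 SELECT daterange('2020-01-01', '2020-06-01') + daterange('2020-05-01', '2021-01-01');
 ```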

 Features we should use to replace equivalent MySQL constructs:

 * timestamp with time zone
 * FULL OUTER JOIN
 * Foreign keys (`something_id integer references somethings` points to the primary key of `somethings`)
 * FILTER for aggregate functions
 * Additional constraints with CHECK
 * VALUES() subselects instead of SELECT 'a' UNION SELECT 'b' ...
 * Condition pushdown and subquery caching
 * Functionally dependent columns are properly detected, so we don't have to give all columns in GROUP BY if we give the primary key
 * Grouping sets are WITH ROLLUP on steroids
 * We can define distinctness with SELECT DISTINCT ON ...
 * Besides UNION we also have intersection (INTERSECT) and non-symmetric difference (EXCEPT)
 * Expression indexes
 * Partial indexes
 * RECURSIVE queries
 * DELETE/UPDATE + RETURNING
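 * DELETE/UPDATE + RETURNING + WITH (note that the sub-statements and the main query run at the same time on the same snapshot, so they cannot see each other's changes; pass rows along with RETURNING)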
 * Arbitrary precision (but slow) math with NUMERIC (= DECIMAL)
 * NUMERIC does not need to be limited in scale and precision
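 * BETWEEN SYMMETRIC (like BETWEEN but also works when the bounds are given in reverse order)
 * IS DISTINCT FROM is a NULL-safe <>, and IS NOT DISTINCT FROM is a NULL-safe = (two NULLs count as equal)
 * There are more string operators and functions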
 * Hahaha, there is a square root operator and it's |/
 * JSON aggregation with json_agg() (or string_agg() / array_agg()) instead of GROUP_CONCAT
 * Index sort order is honored
 * OR is not ridiculously expensive because indexes can be combined to some extent
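
 A few of the constructs in this list, sketched against a hypothetical `payments` table (all table, column, and index names here are assumptions for illustration):

 ```sql
 CREATE TABLE payments (
     id       serial PRIMARY KEY,
     customer text NOT NULL,
     amount   numeric NOT NULL CHECK (amount >= 0),
     paid_at  timestamp with time zone NOT NULL DEFAULT now(),
     note     text
 );

 -- FILTER on aggregates; string_agg/json_agg replace GROUP_CONCAT.
 SELECT customer,
        count(*)                                AS payments,
        count(*) FILTER (WHERE amount > 100)    AS large_payments,
        string_agg(note, ', ' ORDER BY paid_at) AS notes,
        json_agg(amount ORDER BY paid_at)       AS amounts
 FROM payments
 GROUP BY customer;

 -- Grouping by the primary key is enough to select the other columns.
 SELECT id, customer, amount FROM payments GROUP BY id;

 -- VALUES list instead of SELECT 'a' UNION SELECT 'b' ...
 SELECT v.code FROM (VALUES ('a'), ('b'), ('c')) AS v(code);

 -- DISTINCT ON: the latest payment per customer.
 SELECT DISTINCT ON (customer) customer, amount, paid_at
 FROM payments
 ORDER BY customer, paid_at DESC;

 -- NULL-safe comparison and bounds in either order.
 SELECT * FROM payments WHERE note IS DISTINCT FROM 'refund';
 SELECT 5 BETWEEN SYMMETRIC 10 AND 1;  -- true, plain BETWEEN would be false

 -- RETURNING hands back the affected rows.
 DELETE FROM payments WHERE amount = 0 RETURNING id, customer;

 -- Data-modifying WITH: move old rows into an archive in one statement.
 CREATE TABLE payments_archive (LIKE payments);
 WITH moved AS (
     DELETE FROM payments
     WHERE paid_at < now() - interval '1 year'
     RETURNING *
 )
 INSERT INTO payments_archive SELECT * FROM moved;

 -- RECURSIVE query: count from 1 to 5.
 WITH RECURSIVE counter(n) AS (
     SELECT 1
     UNION ALL
     SELECT n + 1 FROM counter WHERE n < 5
 )
 SELECT n FROM counter;

 -- Partial index and expression index.
 CREATE INDEX payments_large_idx ON payments (paid_at) WHERE amount > 100;
 CREATE INDEX payments_customer_lower_idx ON payments (lower(customer));
 ```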

 Additional features:

 * Data migrations with USING (ALTER TABLE ... ALTER COLUMN ... TYPE ... USING expression)
 * UNIQUE indexes can contain more than one NULL value
 * Default values can be expressions
 * RANK() and other window functions
 * VIEWs
 * Indexes with operator classes
 * ARRAYs and unnest()
 * Multiple unnest()s have two modes: arrays with the same number of elements and arrays with different numbers of elements
 * JOINs to set returning functions like unnest() or generate_series()
 * Foreign keys can have some NULL values (see `MATCH`)
 * Bit string, text search and XML types
 * There are array types
 * There are custom composite types (structs)
 * We can run EXPLAIN ANALYZE on UPDATE queries (which really executes them) if we wrap it in BEGIN ... ROLLBACK
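
 Some of these features in action, as a minimal sketch. The `events` table is hypothetical, and the multi-array unnest() shown here is the FROM-clause form, which pads the shorter array with NULLs.

 ```sql
 CREATE TABLE events (
     id          serial PRIMARY KEY,
     happened_at text,        -- stored as text on purpose, converted below
     tags        text[]
 );

 -- Change the column type and convert the existing data in one step.
 ALTER TABLE events
     ALTER COLUMN happened_at TYPE timestamp USING happened_at::timestamp;

 -- One output row per array element.
 SELECT e.id, t.tag
 FROM events e
 CROSS JOIN LATERAL unnest(e.tags) AS t(tag);

 -- Several arrays at once: the shorter one is padded with NULL.
 SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['a', 'b']) AS u(n, letter);

 -- Join to generate_series() to get a row for every day, even empty ones.
 SELECT d::date AS day, count(e.id) AS events
 FROM generate_series('2020-11-01'::timestamp,
                      '2020-11-07'::timestamp,
                      interval '1 day') AS d
 LEFT JOIN events e
        ON e.happened_at >= d AND e.happened_at < d + interval '1 day'
 GROUP BY d
 ORDER BY d;

 -- EXPLAIN ANALYZE executes the UPDATE, so roll it back afterwards.
 BEGIN;
 EXPLAIN ANALYZE UPDATE events SET tags = '{}' WHERE id = 1;
 ROLLBACK;
 ```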

 Finicky features:

 * Window functions (aggregates over other rows that share some property, without collapsing the rows)
 * Table inheritance
 * Row security policies
 * Table subselects can be named with WITH
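
 A window function combined with a named subselect might look like this. The `salaries` table is an assumption made up for the example.

 ```sql
 CREATE TABLE salaries (
     employee   text PRIMARY KEY,
     department text NOT NULL,
     salary     numeric NOT NULL
 );

 -- WITH names the subselect; the window functions look at all rows of the
 -- same department without collapsing them into one row.
 WITH ranked AS (
     SELECT employee,
            department,
            salary,
            rank() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
            avg(salary) OVER (PARTITION BY department)                 AS dept_avg
     FROM salaries
 )
 SELECT * FROM ranked WHERE dept_rank <= 3;
 ```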

 Useless features:

 * money type

 Other features:

 * Owners
 * If a role received a permission WITH GRANT OPTION and passed it on, the passed-on permissions are revoked too when the GRANT OPTION is taken away with CASCADE (without CASCADE the REVOKE fails)
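
 The grant option behaviour can be tried out roughly like this. The roles `alice` and `bob` and the table `reports` are hypothetical, and the sketch assumes it is run as a superuser.

 ```sql
 CREATE ROLE alice;
 CREATE ROLE bob;
 CREATE TABLE reports (id integer PRIMARY KEY);

 -- alice may read the table and may pass that right on.
 GRANT SELECT ON reports TO alice WITH GRANT OPTION;

 SET ROLE alice;
 GRANT SELECT ON reports TO bob;   -- bob's privilege depends on alice's grant option
 RESET ROLE;

 -- Taking the grant option away also revokes what alice passed on.
 -- Without CASCADE this statement fails because of bob's dependent privilege.
 REVOKE GRANT OPTION FOR SELECT ON reports FROM alice CASCADE;
 -- alice can still SELECT from reports, bob no longer can.
 ```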


 ENUMs:

 There are three typical ways to represent ENUMs in PostgreSQL:

 * User-defined ENUM type
 * text with CHECK constraint
 * ID and lookup table
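
 The three variants could be declared as follows. All table and type names are made up for this sketch.

 ```sql
 -- 1. User-defined ENUM type
 CREATE TYPE ticket_state AS ENUM ('open', 'closed');
 CREATE TABLE tickets_enum (
     id    serial PRIMARY KEY,
     state ticket_state NOT NULL
 );

 -- 2. text with CHECK constraint
 CREATE TABLE tickets_text (
     id    serial PRIMARY KEY,
     state text NOT NULL CHECK (state IN ('open', 'closed'))
 );

 -- 3. ID and lookup table
 CREATE TABLE ticket_states (
     id   smallint PRIMARY KEY,
     name text NOT NULL UNIQUE
 );
 INSERT INTO ticket_states VALUES (1, 'open'), (2, 'closed');
 CREATE TABLE tickets_lookup (
     id       serial PRIMARY KEY,
     state_id smallint NOT NULL REFERENCES ticket_states
 );
 ```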

 The text solution isn't as bad as one might think:

                                    ENUM                            text
                        -----------------------------   -----------------------------
 Key length      Rows  Storage*  Index  Toast  Table   Storage*  Index  Toast  Table   Loss of text
 ----------  --------  --------  -----  -----  -----   --------  -----  -----  -----   ------------
          1   1000000         4     21      0     42          2     21      0     42           0 MB
          4   1000000         4     21      0     42          5     21      0     42           0 MB
          7   1000000         4     21      0     42          8     21      0     42           0 MB
         10   1000000         4     21      0     42         11     21      0     50           8 MB
          7   5000000         4    107      0    211          8    107      0    211           0 MB

 * Storage requirement per value in bytes, according to the docs. Index, Toast and Table sizes are in MB; Loss of text is the extra table size of the text variant.
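
 The source does not say how these sizes were measured. One way to get comparable numbers, assuming a hypothetical test table `enum_size_test`, is to fill it with generate_series() and ask PostgreSQL for the sizes:

 ```sql
 -- Hypothetical test table for the text variant with a 7-character key.
 CREATE TABLE enum_size_test (
     id    integer PRIMARY KEY,
     state text NOT NULL
 );

 INSERT INTO enum_size_test
 SELECT n, 'pending' FROM generate_series(1, 1000000) AS n;

 -- Heap, index and total size (total includes TOAST).
 SELECT pg_size_pretty(pg_relation_size('enum_size_test'))       AS table_size,
        pg_size_pretty(pg_indexes_size('enum_size_test'))        AS index_size,
        pg_size_pretty(pg_total_relation_size('enum_size_test')) AS total_size;
 ```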