The branch used for this test was this one.
After checking it out, and installing everything in a virtual environment, you will need pip
and Docker
(or have a running Postgres server).
# install Postgres Python driver/module
pip install psycopg2
# start a postgres database for the suite five (or any other suite you'd like to try)
docker run --name cylc-postgres -p 5432:5432 -e POSTGRES_PASSWORD=cylc -e POSTGRES_USER=cylc -e POSTGRES_DB=cylc-five -d postgres
Finally,
(venv) kinow@ranma:~/Development/python/workspace/cylc-flow$ cylc run --no-detach five
._.
| | The Cylc Suite Engine [8.0a1]
._____._. ._| |_____. Copyright (C) 2008-2019 NIWA
| .___| | | | | .___| & British Crown (Met Office) & Contributors.
| !___| !_! | | !___. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____! This program comes with ABSOLUTELY NO WARRANTY.
.___! | It is free software, you are welcome to
!_____! redistribute it under certain conditions;
see `COPYING' in the Cylc source distribution.
2020-02-02T04:39:22Z INFO - Suite server: url=tcp://ranma:43028/ pid=21747
2020-02-02T04:39:22Z INFO - Suite publisher: url=tcp://ranma:43077
2020-02-02T04:39:22Z INFO - Run: (re)start=0 log=1
2020-02-02T04:39:22Z INFO - Cylc version: 8.0a1
2020-02-02T04:39:22Z INFO - Run mode: live
2020-02-02T04:39:22Z INFO - Initial point: 20130808T0000Z
2020-02-02T04:39:22Z INFO - Final point: None
2020-02-02T04:39:22Z INFO - Cold Start 20130808T0000Z
2020-02-02T04:39:22Z INFO - [prep.20130808T0000Z] -submit-num=01, owner@host=ranma
2020-02-02T04:39:22Z INFO - [prep.20130808T0000Z] -triggered off []
2020-02-02T04:39:23Z INFO - [prep.20130808T0000Z] status=ready: (internal)submitted at 2020-02-02T04:39:23Z for job(01)
2020-02-02T04:39:23Z INFO - [prep.20130808T0000Z] -health check settings: submission timeout=None
2020-02-02T04:39:24Z INFO - [client-command] put_messages kinow@ranma:cylc-message
2020-02-02T04:39:24Z INFO - [prep.20130808T0000Z] status=submitted: (received)started at 2020-02-02T04:39:24Z for job(01)
2020-02-02T04:39:24Z INFO - [prep.20130808T0000Z] -health check settings: execution timeout=None
You can check the folders where you would normally find the db
SQLite databases. They should both be
And you can also confirm that you have now everything stored in your remote/local Postgres database.
kinow@ranma:~$ docker exec -t -i cylc-postgres /bin/bash
root@8e89eba5cb0c:/# psql -h localhost -U cylc -W cylc-five
Password:
psql (12.1 (Debian 12.1-1.pgdg100+1))
Type "help" for help.
cylc-five=# select * from suite_params;
key | value
--------------+--------------------------------------
uuid_str | d8cab0e4-cb72-44ba-92ba-9a28bf8d061c
cylc_version | 8.0a1
UTC_mode | true
(3 rows)
cylc-five=# select * from task_pool;
cycle | name | spawned | status | is_held
----------------+------+---------+---------+---------
20130808T0000Z | prep | 1 | running | f
20130808T0000Z | foo | 0 | waiting | f
20130808T0000Z | bar | 0 | waiting | f
(3 rows)
cylc-five=#
That's it. Everything works! 🎉
ps: if you have already executed the suite, and would like to re-create the Postgres database, it takes 2-3 seconds:
$ docker stop cylc-postgres; docker rm cylc-postgres; docker run --name cylc-postgres -p 5432:5432 -e POSTGRES_PASSWORD=cylc -e POSTGRES_USER=cylc -e POSTGRES_DB=cylc-five -d postgres
There is a draft pull request that adds support to SQLAlchemy Core (not ORM) here.
All tests passed. It does add SQLAlchemy core, but alas it doesn't mean it's that simple to use Cylc with different databases. Here are the pending issues:
- Cylc database is single-tenant. Schema changes would be required to allow multi-tenancy (regarding workflows).
- The INSERT statements in Cylc are performed using SQLite
INSERT OR REPLACE
; that's problematic because not all databases support it. And the ones that do, offer different ways of doing it. - The schema uses
INTEGER
for some fields likeis_held
, and passesfalse
values for example. That works with SQLite usingsqlite3
, and with SQLAlchemy when the database is SQLite. With Postgres,false
can only be used forBOOLEAN
types. I suspect the same is true for Oracle & MySQL. And I haven't tested, but I think SQLite should work withBOOLEAN
too. - Cylc uses two databases, private and public. This obviously doesn't work with Postgres, Oracle, MySQL, etc.
- We would need to extend the Cylc configuration file, to support passing the connection parameters for each
implementation. If a site uses Postgres, then we would need
user
,port
,database
, etc. But also, the user could want to customize the connection string, to pass extra parameters, or use a different implementation (a different Python module).
So here's a summary of what changes were required, besides the work in the draft pull request.
Cylc Flow uses sqlite
for its database. It keeps two databases, private & public. When the scheduler starts
it creates a private database. This is the most important database.
Then it copies the private database as public database. And in some parts of the process, the scheduler and other components of the system will send queries to the private database, or to both.
Errors in the public database are logged. Errors in the private database result in system errors.
This commit contains a change disables the creation of the public database, and also comments some code so that it is not called causing runtime errors.
The connection URL used by SQLAlchemy for SQLite is something like sqlite:///home/kinow/five/log/db
. But for Postgres,
that could be postgres://cylc:cylc@localhost:5432/cylc-five
. Or it could have extra parameters like
psycog2://cylc:cylc@localhost:5432/cylc-five?timeout=???
(if supported, haven't tested but quite sure it
has this feature just like Hibernate).
So I've hardcoded the connection string to self.conn_url = "postgresql://cylc:cylc@localhost/cylc-five"
in this commit.
That was due to Postgres not liking to have a field INTEGER
receiving a false
or true
as value.
Same commit mentioned before.
Postgres supports UPSERT. It can update or ignore when there are duplicates. But you must specify the constraint name or the index used to identify duplicated values. And also need to specify what gets updated.
This is clearly more complex than INSERT OR REPLACE
from SQLite. MySQL has a similar feature. Not
quite sure if there is a similar feature in Oracle.
It would be much easier to support multiple databases if we used just INSERT
. Not sure if doable,
as that would mean either querying the database to confirm whether values exist or not, or keep
some variable in memory or a cache in memory I think.
Done in the same commit as before.