My mental model of a Docker image is "the environment in which a given application runs"; from the application's perspective, it has the entire machine to itself.
My mental model of a service is a single entry point ("address") that dispatches incoming service requests to one of N different processes. Each process is running inside a Docker container.
A process is a program running inside a Docker container. A process has 0 to N service dependencies and connects to these services by way of their addresses.
When a process with dependencies starts, it must know service addresses by either:
A) being explicitly told at invocation (docker run -e SERVICE_URL=xxx), or
B) discovering the service addresses on its own, before running the target program
It's worth noting that (A) can be accomplished by a (B) implementation, simply by reading the service addresses from the environment and returning them to the target process.
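As an illustration, such a pass-through discovery script is a one-liner, using the /environment convention proposed below:

/environment/DATABASE_URL:

#!/bin/bash
# Trivial (B) implementation of (A): pass through the value that
# "docker run -e DATABASE_URL=..." placed in the environment.
echo "$DATABASE_URL"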
Ideally, addressable services run inside Docker. An addressable service is composed of N "service processes". Each service process may carry process-specific metadata that defines domain-specific information about how to connect to it. This metadata will be used in determining the service address.
When an addressable service process starts in Docker, one of the following must happen:
A) The service process announces its existence + metadata
B) Something discovers the existence of the service process + metadata
After the service's existence is known, all of the following must happen:
A) Something remembers this service process's existence
B) Something determines the address for the service, given all the service processes in existence
C) Something must notice when the service process goes away, and re-define the service address accordingly (if at all)
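As a sketch of (C), something could watch Docker's event stream and re-derive the address whenever a container dies (recompute-address is a hypothetical program that would implement (B)):

#!/bin/bash
# Watch for dying containers and re-derive the service address.
# "recompute-address" is hypothetical: it would re-run step (B) above.
docker events | while read line; do
  case "$line" in
    *die*) recompute-address ;;
  esac
done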
Simply knowing the existence of a collection of service processes is likely not enough information to determine the service address. For example, running a clustered PostgreSQL service requires that the entry-point load balancer (like pgPool) know the authentication credentials for every PostgreSQL back-end instance. Clients will connect to pgPool as the "address" and be unaware of the back-end instances.
Any "director" type program that knows about all PostgreSQL instances and produces the address for the pgPool load balancer must have access to these credentials.
As such, service processes need to be able to declare process-specific metadata.
When a process starts, it must be aware of its dependency addresses and their associated metadata. I propose the following:
A) An /environment directory in the process's container
B) An executable program for each environment variable/address to be discovered
C) Before executing the process (whatever the target of run is), sequentially execute the programs in /environment and add their outputs to the current environment.
Example:
/environment/DATABASE_URL:
#!/bin/bash
# In a large installation, reach out to Zookeeper to find my database.
curl "http://zookeeper.vip.phx.ebay.com/znodes/v1/dbcluster/master?dataformat=utf8"
And on execution:
root@46af61bef758:/# /environment/DATABASE_URL
postgresql://Ohm3quu7:[email protected]/Ieb8owee
root@46af61bef758:/#
And this ought to run in a harness, after which the environment variable is set:
DATABASE_URL=postgresql://Ohm3quu7:[email protected]/Ieb8owee
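The harness itself can be a few lines of shell. A minimal sketch, assuming every file in /environment is executable and named for the variable it produces:

#!/bin/bash
# Minimal environment harness: run each discovery program in /environment,
# export its output under the program's own name, then exec the target
# command (e.g. "env-harness myproject-server").
for program in /environment/*; do
  export "$(basename "$program")=$("$program")"
done
exec "$@"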
There are several key advantages to this approach:
A) It allows for deployment-specific code to determine service addresses and metadata, and this deployment-specific code runs in the application container, not in the host launching the docker process.
B) It can trivially work without central coordination. The Zookeeper example provided demonstrates that it can work with central coordination, but it's certainly reasonable for these environment-variable scripts to do trivial work, like pass through environment variables set by the docker invocation, or read data from the local dockerd.
Clearly, you will not be connecting to the production database from development. We usually have separate configs for development, testing, and production, which can be trivially implemented with this approach:
my-project/
  environments/
    production/
      DATABASE_URL
      MEMCACHE_SERVER
      PAYPAL_CLIENT_KEY
      PAYPAL_CLIENT_SECRET
    development/
      DATABASE_URL
      MEMCACHE_SERVER
      PAYPAL_CLIENT_KEY
      PAYPAL_CLIENT_SECRET
And when you run your app, define the "environment" with a volume mapping:
ted@workstation:~/my-project $ docker run -v $(pwd)/environments/development:/environment -d myproject
The "environment harness" then executes the environment programs before executing the main cmd
of the container.
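One way to wire this in is at image build time; a sketch, where env-harness stands for the hypothetical harness script above and myproject-server is an illustrative command:

# Dockerfile fragment: the harness wraps whatever cmd the container is given.
ADD env-harness /usr/local/bin/env-harness
ENTRYPOINT ["/usr/local/bin/env-harness"]
CMD ["myproject-server"]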
On the other side of the coin, when a service process starts, it must make itself and its metadata known to the world.
I propose the mirror-image of the discovery method:
A) An /export directory in the service process's container
B) An executable program in /export for each variable to be declared
C) A special executable, /export/bootstrap, that is run before any of the declaration executables are run
D) When the container starts, a harness runs /export/bootstrap and then runs each of the declaration executables
Like above, these declaration executables are free to do what they please, for example, register the process with an instance of Zookeeper in production. They are not required to print anything to STDOUT, as this sort of metadata registration will be installation-specific.
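This harness is the mirror image of the environment harness; a minimal sketch:

#!/bin/bash
# Minimal export harness: run /export/bootstrap first (if present), then
# every declaration program, then exec the service process itself.
[ -x /export/bootstrap ] && /export/bootstrap
for program in /export/*; do
  [ "$program" != /export/bootstrap ] && "$program"
done
exec "$@"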
All of the above can be implemented without any changes to Docker, but the single-machine case can be well served by some minor changes:
A) Allow containers to POST arbitrary key/value pairs to dockerd
B) Allow containers to GET these key/value pairs from dockerd
This allows us to do the following:
Service Process Starting Locally
/export/DATABASE_URL:
#!/bin/bash
curl -X POST -d "postgresql://pgdeveloper:pgdeveloper@$CONTAINER_HOST/pgdeveloper" \
http://$DOCKERD/environment/DATABASE_URL
Application Process Starting Locally
/environment/DATABASE_URL:
#!/bin/bash
curl http://$DOCKERD/environment/DATABASE_URL
These could easily be shortened with some syntactic sugar, but you get the idea.
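For example, hypothetical announce/discover helpers wrapping those curl calls would reduce each script to a single line (the helper names are illustrative, not an existing API):

/export/DATABASE_URL:

#!/bin/bash
# "announce" is a hypothetical wrapper around the POST to $DOCKERD; it could
# infer the variable name from this script's own filename.
announce "postgresql://pgdeveloper:pgdeveloper@$CONTAINER_HOST/pgdeveloper"

/environment/DATABASE_URL:

#!/bin/bash
# "discover" is a hypothetical wrapper around the GET from $DOCKERD.
discover DATABASE_URL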