One of the best design decisions the Presto designers made is that it is loosely coupled from storage.
Presto is a distributed SQL execution engine; it doesn't manage table schemas or metadata by itself, and it doesn't read data from storage by itself either. That work is done by plugins called connectors. Presto comes with a built-in Hive connector, which connects Hive's metastore and HDFS to Presto.
We can connect any storage to Presto by writing a connector plugin.
The interface presto.spi.Plugin is the entry point for every kind of Presto plugin. presto.server.PluginManager loads Plugin implementations using Java's standard java.util.ServiceLoader.
A Plugin needs to implement <T> List<T> getServices(Class<T> type). PluginManager calls Plugin.getServices(ConnectorFactory.class) to get presto.spi.ConnectorFactory implementations, so a Plugin needs to instantiate its ConnectorFactory and return it. PluginManager then registers those factories with presto.connector.ConnectorManager (in the presto-main module).
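To make the flow concrete, here is a minimal, self-contained sketch. The Plugin and ConnectorFactory interfaces below are simplified stand-ins for the presto-spi types (the real signatures differ in details), and ExamplePlugin / ExampleConnectorFactory are hypothetical names used only for illustration:

```java
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

// Simplified stand-ins for the presto-spi interfaces described above.
interface ConnectorFactory {}

interface Plugin {
    <T> List<T> getServices(Class<T> type);
}

// A hypothetical plugin that exposes a single ConnectorFactory.
class ExamplePlugin implements Plugin {
    @Override
    public <T> List<T> getServices(Class<T> type) {
        if (type == ConnectorFactory.class) {
            return Collections.singletonList(type.cast(new ExampleConnectorFactory()));
        }
        return Collections.emptyList();
    }
}

class ExampleConnectorFactory implements ConnectorFactory {}

// PluginManager discovers Plugin implementations via java.util.ServiceLoader,
// so a plugin jar has to ship a provider-configuration file, e.g.
//   META-INF/services/<fully-qualified name of the Plugin interface>
// containing one line: the implementation class name (ExamplePlugin here).
class LoaderDemo {
    public static void main(String[] args) {
        for (Plugin plugin : ServiceLoader.load(Plugin.class)) {
            List<ConnectorFactory> factories = plugin.getServices(ConnectorFactory.class);
            System.out.println(plugin.getClass().getName() + " provides " + factories.size() + " factories");
        }
    }
}
```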
A ConnectorFactory needs to implement the String getName() and Connector create(connectorId, config) methods. ConnectorManager calls ConnectorFactory.create when it receives a createConnection() request from presto.metadata.CatalogManager.
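As a rough sketch of what a factory looks like: getName() is the name used to refer to the connector type, and create() builds a Connector for one catalog from its configuration. The Map<String, String> config type and the example.uri property here are assumptions of this sketch, not the exact SPI:

```java
import java.util.Map;

// Simplified versions of the SPI types named in the post.
interface Connector {}

interface ConnectorFactory {
    String getName();                                         // e.g. "hive", "jmx"
    Connector create(String connectorId, Map<String, String> config);
}

// A hypothetical factory: create() is where a connector would set up clients
// for its metastore / storage, based on the catalog's configuration.
class ExampleConnectorFactory implements ConnectorFactory {
    @Override
    public String getName() {
        return "example";
    }

    @Override
    public Connector create(String connectorId, Map<String, String> config) {
        String uri = config.getOrDefault("example.uri", "localhost:1234");
        return new ExampleConnector(connectorId, uri);
    }
}

class ExampleConnector implements Connector {
    private final String connectorId;
    private final String uri;

    ExampleConnector(String connectorId, String uri) {
        this.connectorId = connectorId;
        this.uri = uri;
    }
}
```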
The CatalogManager instance is owned by presto.server.PrestoServer. So the relationship is:
- PrestoServer has a CatalogManager
- CatalogManager has many ConnectorManager(s)
- ConnectorManager has many Connector(s) (managed in a Map<String, Connector> keyed by connector name)
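The sketch below mimics that ownership with a hypothetical registry class: ConnectorFactory instances are kept by name, and created Connector instances are kept in a Map<String, Connector> as described above. The class and method names are invented for illustration, not the actual presto-main code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stubs so this sketch compiles on its own.
interface Connector {}
interface ConnectorFactory {
    String getName();
    Connector create(String connectorId, Map<String, String> config);
}

// Hypothetical stand-in for the ConnectorManager side of the relationship.
class ConnectorRegistry {
    private final Map<String, ConnectorFactory> factories = new ConcurrentHashMap<>();
    private final Map<String, Connector> connectors = new ConcurrentHashMap<>();

    // Called once per factory that PluginManager found.
    void addConnectorFactory(ConnectorFactory factory) {
        factories.put(factory.getName(), factory);
    }

    // Called once per catalog configuration file: look up the factory by name,
    // create the Connector, and remember it under the connector's name.
    void createConnection(String connectorName, String factoryName, Map<String, String> config) {
        ConnectorFactory factory = factories.get(factoryName);
        if (factory == null) {
            throw new IllegalArgumentException("No connector factory named " + factoryName);
        }
        connectors.put(connectorName, factory.create(connectorName, config));
    }
}
```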
CatalogManager.loadCatalogs() reads the files in a directory. Each file contains the configuration of one connector, in Java's properties file format. CatalogManager does almost nothing except loading those configuration files for connectors.
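Loading such a file is plain java.util.Properties handling. The file path etc/catalog/example.properties and the connector.name key in this sketch are assumptions about the catalog file layout, used only to show the idea:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical catalog file etc/catalog/example.properties:
//   connector.name=example
//   example.uri=localhost:1234
// The connector name picks the ConnectorFactory; the remaining entries become
// the config map handed to ConnectorFactory.create().
public class CatalogFileDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("etc/catalog/example.properties")) {
            props.load(in);
        }

        String factoryName = props.getProperty("connector.name");
        Map<String, String> config = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.equals("connector.name")) {
                config.put(key, props.getProperty(key));
            }
        }
        System.out.println("factory=" + factoryName + " config=" + config);
    }
}
```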
Connector actually has a design similar to Plugin's. A Connector needs to implement a <T> T getService(Class<T> type) method, and ConnectorManager calls that method with each of the following classes (a small sketch follows the list):
- connector.getService(ConnectorMetadata.class)
- connector.getService(ConnectorSplitManager.class)
- connector.getService(ConnectorDataStreamProvider.class)
  - The DataStreamProvider can be null; if it is, ConnectorManager calls connector.getService(ConnectorRecordSetProvider.class) instead
- connector.getService(ConnectorHandleResolver.class)
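Here is the kind of dispatch a Connector ends up doing for getService(). The empty interfaces are stand-ins for the real SPI interfaces (which carry many methods), and ExampleConnector is hypothetical; it has no ConnectorDataStreamProvider, so it answers with a ConnectorRecordSetProvider instead:

```java
// Simplified stand-ins for the SPI service interfaces listed above.
interface ConnectorMetadata {}
interface ConnectorSplitManager {}
interface ConnectorRecordSetProvider {}
interface ConnectorHandleResolver {}

interface Connector {
    <T> T getService(Class<T> type);
}

// A hypothetical connector that answers getService() for each class
// ConnectorManager asks about.
class ExampleConnector implements Connector {
    private final ConnectorMetadata metadata = new ConnectorMetadata() {};
    private final ConnectorSplitManager splitManager = new ConnectorSplitManager() {};
    private final ConnectorRecordSetProvider recordSetProvider = new ConnectorRecordSetProvider() {};
    private final ConnectorHandleResolver handleResolver = new ConnectorHandleResolver() {};

    @Override
    public <T> T getService(Class<T> type) {
        if (type == ConnectorMetadata.class) {
            return type.cast(metadata);
        }
        if (type == ConnectorSplitManager.class) {
            return type.cast(splitManager);
        }
        if (type == ConnectorRecordSetProvider.class) {
            return type.cast(recordSetProvider);
        }
        if (type == ConnectorHandleResolver.class) {
            return type.cast(handleResolver);
        }
        return null; // e.g. no ConnectorDataStreamProvider in this connector
    }
}
```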
In other words, to create our own connector, we need to implement the interfaces above. ConnectorManager then registers those service instances with other managers:
- ConnectorMetadata to presto.metadata.MetadataManager
- ConnectorSplitManager to presto.split.SplitManager
- ConnectorDataStreamProvider to presto.split.DataStreamManager
- ConnectorHandleResolver to presto.metadata.HandleResolver
Presto has 4 ConnectorFactory implementations:
- presto.hive.HiveConnectorFactory
  - Reads metadata from the Hive metastore
  - Reads records from HDFS using Hive record readers
- presto.connector.jmx.JmxConnectorFactory
  - Reads metadata from a remote javax.management.MBeanServer
  - Reads records from a remote javax.management.MBeanServer
    - It calls mbeanServer.getAttributes() and converts the attributes to records (see the sketch after this list)
- presto.connector.NativeConnectorFactory
  - Reads metadata from an RDBMS
  - Reads records through the presto.metadata.LocalStorageManager interface
    - The actual LocalStorageManager instance is injected by dependency injection
    - Presto has one built-in implementation, presto.metadata.DatabaseLocalStorageManager
    - DatabaseLocalStorageManager reads data from local files; the list of files is, again, managed by an RDBMS
- presto.tpch.TpchConnectorFactory
  - Metadata is hard-coded in the TpchMetadata class
  - Reads records through the TpchBlocksProvider interface; Presto has 2 implementations: DataFileTpchBlocksProvider, which reads data from local files, and InMemoryTpchBlocksProvider, which reads data from memory
  - Tpch is mainly used for benchmarks and tests
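To illustrate how MBean attributes turn into records, here is a small standalone example against the local platform MBeanServer (the real JMX connector talks to a remote MBeanServer and goes through Presto's record interfaces; the class below is just a demo):

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Roughly what the JMX connector does per MBean: ask the MBeanServer for all
// attribute values and turn them into one row.
public class JmxRowDemo {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("java.lang:type=Runtime");

        // Column names come from the MBean's attribute metadata.
        MBeanAttributeInfo[] infos = server.getMBeanInfo(name).getAttributes();
        String[] attributeNames = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            attributeNames[i] = infos[i].getName();
        }

        // getAttributes() returns the values; one MBean becomes one record.
        Map<String, Object> row = new LinkedHashMap<>();
        for (Attribute attribute : server.getAttributes(name, attributeNames).asList()) {
            row.put(attribute.getName(), attribute.getValue());
        }
        System.out.println(row);
    }
}
```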