@frsyuki
Last active December 10, 2019 22:34

Presto connector development 1

One of the best design decisions the Presto designers made is keeping Presto loosely coupled to storage.

Presto is a distributed SQL execution engine. It doesn't manage table schemas or metadata by itself, and it doesn't read data from storage by itself. That work is done by plugins called connectors. Presto ships with a built-in Hive connector, which connects Hive's metastore and HDFS to Presto.

We can connect any storage system to Presto by writing a connector plugin.

Plugin Architecture

Plugin interface

The interface presto.spi.Plugin is the entry point for every kind of Presto plugin. presto.server.PluginManager loads plugins using Java's standard java.util.ServiceLoader. A Plugin needs to implement <T> List<T> getServices(Class<T> type).

PluginManager

presto.server.PluginManager calls Plugin.getServices(ConnectorFactory.class) to get presto.spi.ConnectorFactory implementations. The plugin needs to instantiate its ConnectorFactory and return it.
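As a rough illustration of the plugin side of this call, here is a minimal sketch. The Plugin and ConnectorFactory interfaces below are simplified stand-ins declared locally, not the real presto.spi types; the generic signature of getServices is an assumption, and ExamplePlugin and the name "example" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for presto.spi.Plugin (assumed generic signature).
interface Plugin {
    <T> List<T> getServices(Class<T> type);
}

// Simplified stand-in for presto.spi.ConnectorFactory; only getName() is modelled here.
interface ConnectorFactory {
    String getName();
}

// Hypothetical plugin that offers exactly one ConnectorFactory.
class ExamplePlugin implements Plugin {
    @Override
    public <T> List<T> getServices(Class<T> type) {
        List<T> services = new ArrayList<>();
        if (type == ConnectorFactory.class) {
            ConnectorFactory factory = () -> "example";
            // Class.cast keeps the conversion to T checked at runtime.
            services.add(type.cast(factory));
        }
        // Any service type this plugin does not provide yields an empty list.
        return services;
    }
}
```

With this shape, the caller's side amounts to plugin.getServices(ConnectorFactory.class). For java.util.ServiceLoader to discover an implementation, its class name also has to be listed in a provider-configuration file under META-INF/services named after the fully qualified name of the Plugin interface.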

PluginManager registers them with presto.connector.ConnectorManager (in the presto-main directory). A ConnectorFactory needs to implement the String getName() and Connector create(connectorId, config) methods.
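A factory with those two methods might look roughly like the sketch below. The interfaces are local stand-ins rather than the real SPI; the parameter types (String for connectorId, Map<String, String> for config) are assumptions, and ExampleConnector, ExampleConnectorFactory and the "example.path" key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for presto.spi.Connector.
interface Connector {
}

// Simplified stand-in for presto.spi.ConnectorFactory (parameter types are assumed).
interface ConnectorFactory {
    String getName();

    Connector create(String connectorId, Map<String, String> config);
}

// Hypothetical connector that just remembers how it was configured.
class ExampleConnector implements Connector {
    final String connectorId;
    final Map<String, String> config;

    ExampleConnector(String connectorId, Map<String, String> config) {
        this.connectorId = connectorId;
        // Copy defensively so later changes by the caller do not leak in.
        this.config = new HashMap<>(config);
    }
}

// Hypothetical factory: getName() identifies the connector type,
// create() builds one configured instance.
class ExampleConnectorFactory implements ConnectorFactory {
    @Override
    public String getName() {
        return "example";
    }

    @Override
    public Connector create(String connectorId, Map<String, String> config) {
        return new ExampleConnector(connectorId, config);
    }
}
```

The name returned by getName() is what lets a configuration refer to this factory, while each create() call yields a separately configured Connector.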

ConnectorManager

ConnectorManager calls ConnectorFactory.create when it gets a createConnection() request from presto.metadata.CatalogManager. The CatalogManager instance is owned by presto.server.PrestoServer. So the relationship is:

  • PrestoServer has a CatalogManager
  • CatalogManager has many ConnectorManager(s)
  • ConnectorManager has many Connector(s) (managed in Map<String, Connector> name => connector)
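The bookkeeping described above can be sketched as a small registry. This is not Presto's ConnectorManager: SketchConnectorManager, its method parameters, and the duplicate checks are assumptions made for illustration, and the interfaces are local stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the SPI types (parameter types are assumed).
interface Connector {
}

interface ConnectorFactory {
    String getName();

    Connector create(String connectorId, Map<String, String> config);
}

// Hypothetical registry: factories keyed by factory name,
// created connectors keyed by catalog name.
class SketchConnectorManager {
    private final Map<String, ConnectorFactory> factories = new HashMap<>();
    private final Map<String, Connector> connectors = new HashMap<>();

    void addConnectorFactory(ConnectorFactory factory) {
        if (factories.putIfAbsent(factory.getName(), factory) != null) {
            throw new IllegalArgumentException("duplicate factory: " + factory.getName());
        }
    }

    // Looks up the factory by name and stores the connector it creates.
    void createConnection(String catalogName, String connectorName, Map<String, String> config) {
        ConnectorFactory factory = factories.get(connectorName);
        if (factory == null) {
            throw new IllegalArgumentException("no factory for connector: " + connectorName);
        }
        if (connectors.containsKey(catalogName)) {
            throw new IllegalArgumentException("catalog already exists: " + catalogName);
        }
        connectors.put(catalogName, factory.create(catalogName, config));
    }

    Connector getConnector(String catalogName) {
        return connectors.get(catalogName);
    }
}
```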

CatalogManager

CatalogManager.loadCatalogs() reads the files in a directory. Each file contains the configuration of one connector, in Java's properties file format. CatalogManager does almost nothing except load configuration files for connectors.
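To make the loading step concrete, here is a minimal sketch of reading one properties file per catalog from a directory. It is not the actual CatalogManager code: the ".properties" extension, deriving the catalog name from the file name, and the CatalogLoaderSketch class are all assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical loader: one *.properties file per catalog.
class CatalogLoaderSketch {
    // Returns catalog name (file name without ".properties") -> configuration.
    static Map<String, Map<String, String>> loadCatalogs(Path directory) throws IOException {
        Map<String, Map<String, String>> catalogs = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "*.properties")) {
            for (Path file : files) {
                Properties properties = new Properties();
                try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    properties.load(reader);
                }
                // Copy into a plain String map, the shape a connector factory would receive.
                Map<String, String> config = new TreeMap<>();
                for (String key : properties.stringPropertyNames()) {
                    config.put(key, properties.getProperty(key));
                }
                String fileName = file.getFileName().toString();
                String catalogName = fileName.substring(0, fileName.length() - ".properties".length());
                catalogs.put(catalogName, config);
            }
        }
        return catalogs;
    }
}
```

Each resulting entry could then be handed to ConnectorManager as a createConnection() request, with one of the properties naming the ConnectorFactory to use.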

Connector

A connector actually has a design similar to Plugin's. A Connector needs to implement the <T> T getService(Class<T> type) method. ConnectorManager calls the method for each of the following classes:

  • connector.getService(ConnectorMetadata.class)
  • connector.getService(ConnectorSplitManager.class)
  • connector.getService(ConnectorDataStreamProvider.class)
    • ConnectorDataStreamProvider can be null. If it's null, ConnectorManager calls:
      • connector.getService(ConnectorRecordSetProvider.class)
      • connector.getService(ConnectorHandleResolver.class)
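The lookup with its null fallback can be sketched as follows. All interfaces here are empty local stand-ins for the SPI types, the generic getService signature is an assumption, and ExampleConnector and ServiceLookupSketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Empty stand-ins for the SPI service interfaces named above.
interface ConnectorMetadata {
}

interface ConnectorSplitManager {
}

interface ConnectorDataStreamProvider {
}

interface ConnectorRecordSetProvider {
}

interface ConnectorHandleResolver {
}

// Simplified stand-in for presto.spi.Connector (assumed generic signature).
interface Connector {
    <T> T getService(Class<T> type);
}

// Hypothetical connector that offers a record set provider but no data stream provider.
class ExampleConnector implements Connector {
    private final Map<Class<?>, Object> services = new HashMap<>();

    ExampleConnector() {
        services.put(ConnectorMetadata.class, new ConnectorMetadata() {});
        services.put(ConnectorSplitManager.class, new ConnectorSplitManager() {});
        services.put(ConnectorRecordSetProvider.class, new ConnectorRecordSetProvider() {});
        services.put(ConnectorHandleResolver.class, new ConnectorHandleResolver() {});
    }

    @Override
    public <T> T getService(Class<T> type) {
        // Class.cast passes null through, so unregistered services come back as null.
        return type.cast(services.get(type));
    }
}

// Hypothetical caller mirroring the fallback order described above.
class ServiceLookupSketch {
    static String describeRecordSource(Connector connector) {
        if (connector.getService(ConnectorDataStreamProvider.class) != null) {
            return "data stream provider";
        }
        if (connector.getService(ConnectorRecordSetProvider.class) != null) {
            return "record set provider";
        }
        throw new IllegalStateException("connector provides no way to read records");
    }
}
```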

In other words, to create our own Connector, we need to implement the above interfaces. ConnectorManager registers those instances with other managers:

  • ConnectorMetadata to presto.metadata.MetadataManager
  • ConnectorSplitManager to presto.split.SplitManager
  • ConnectorDataStreamProvider to presto.split.DataStreamManager
  • ConnectorHandleResolver to presto.metadata.HandleResolver

Examples

Presto has 4 ConnectorFactory implementations:

  • presto.hive.HiveConnectorFactory
    • Reads metadata from Hive metastore
    • Reads records from HDFS using Hive record readers
  • presto.connector.jmx.JmxConnectorFactory
    • Reads metadata from a remote javax.management.MBeanServer server
    • Reads records from a remote javax.management.MBeanServer server
      • It calls mbeanServer.getAttributes and converts the attributes to records
  • presto.connector.NativeConnectorFactory
    • Reads metadata from an RDBMS
    • Reads records from interface presto.metadata.LocalStorageManager
      • actual instance of LocalStorageManager is injected by Dependency Injection
      • Presto has one built-in implementation presto.metadata.DatabaseLocalStorageManager
        • DatabaseLocalStorageManager reads data from local files; the list of files is, again, managed by an RDBMS
  • presto.tpch.TpchConnectorFactory
    • Metadata is hard coded in TpchMetadata class
    • Reads records through the TpchBlocksProvider interface; Presto has 2 implementations:
      • DataFileTpchBlocksProvider to read data from local files
      • InMemoryTpchBlocksProvider to read data in memory
    • Tpch is mainly used for benchmarks and tests
@haitaoyao

Thanks! It's very helpful to me.
