The Datahub::Factory is a Catmandu-based toolkit for setting up and managing ETL pipelines easily and efficiently. A pipeline transforms data and transports it between two systems. The toolkit was primarily conceived for use cases in the GLAM (Galleries, Libraries, Archives & Museums) domain.
Out of the box, the Datahub::Factory is a generic, extensible toolkit. You can use the importer and exporter modules included with the core app, or extend its functionality with your own custom modules.
The Arthub Flanders platform is a digital platform governed by the Flemish Art Collection non-profit. The Datahub::Factory is used to ingest metadata from a wide array of content providers into the platform.
To accommodate the particularities of the platform, a separate set of modules called Datahub::Factory::Arthub was created. These modules contain pre-processing logic specific to the Arthub ecosystem.
You will need the Datahub::Factory and all its dependencies installed. We'll assume you have a functional Perl environment and that all system dependencies have been met.
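You will need OpenSSL. On macOS with Homebrew:
$ brew install openssl
On Ubuntu (the libssl-dev package provides the headers needed to compile Perl modules against OpenSSL):
$ sudo apt-get install openssl libssl-dev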
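Before installing, it can help to confirm that the required tools are on your PATH. The function below is a minimal sketch assuming a Bash-compatible shell; its name, missing_tools, is hypothetical and not part of Datahub::Factory.

```shell
# Hypothetical helper: print each command from the argument list that is
# not found on PATH. Returns non-zero if at least one is missing.
missing_tools() {
  local status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `$ missing_tools perl cpanm git openssl` prints nothing and succeeds when everything is present; add `carton` to the list if you plan to use Carton.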
On macOS, if you installed OpenSSL through Homebrew, install the modules with the following command. Change the paths so they point to your OpenSSL installation; the version in the path (here 1.0.2l) will likely differ on your system.
$ OPENSSL_INCLUDE=/usr/local/Cellar/openssl/1.0.2l/include OPENSSL_LIB=/usr/local/Cellar/openssl/1.0.2l/lib/ cpanm --notest Datahub::Factory::Arthub
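To avoid hard-coding a versioned Cellar path, you can ask Homebrew for the OpenSSL prefix with `brew --prefix openssl`, which prints a version-independent path under Homebrew's opt directory. The wrapper below is a minimal sketch assuming a Bash-compatible shell; the function name with_openssl is hypothetical. It runs a command with OPENSSL_INCLUDE and OPENSSL_LIB set to the include/ and lib/ directories under the given prefix, just like the command above.

```shell
# Hypothetical helper: run a command with OPENSSL_INCLUDE and OPENSSL_LIB
# pointing at the include/ and lib/ directories below the given prefix.
with_openssl() {
  local prefix="$1"
  shift
  OPENSSL_INCLUDE="$prefix/include" OPENSSL_LIB="$prefix/lib" "$@"
}
```

Usage:
$ with_openssl "$(brew --prefix openssl)" cpanm --notest Datahub::Factory::Arthub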
On other platforms:
$ cpanm --notest Datahub::Factory::Arthub
This should install these modules:
- Datahub::Factory::Arthub
- Datahub::Factory
- Catmandu
- ... and all their sub-dependencies.
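To check that the installation succeeded, you can try loading each module with `perl -m<Module> -e 1`, which exits with a non-zero status if the module cannot be found. The function below is a minimal sketch assuming a Bash-compatible shell and `perl` on your PATH; the name check_modules is hypothetical.

```shell
# Hypothetical helper: try to load each Perl module (without calling import)
# and report whether it was found. Returns non-zero if any module is missing.
check_modules() {
  local status=0
  for module in "$@"; do
    if perl -m"$module" -e 1 2>/dev/null; then
      echo "ok: $module"
    else
      echo "missing: $module"
      status=1
    fi
  done
  return "$status"
}
```

Usage:
$ check_modules Datahub::Factory::Arthub Datahub::Factory Catmandu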
To run the bleeding-edge version from GitHub, use Carton. If Carton is not installed yet:
$ cpanm --notest Carton
Then:
$ git clone https://github.com/thedatahub/Datahub-Factory-Arthub
$ cd Datahub-Factory-Arthub
$ carton install
$ carton exec dhconveyor transport -p <my-pipeline.ini>
You can also install the Arthub modules manually in a Perl environment.
$ git clone https://github.com/thedatahub/Datahub-Factory-Arthub
$ cd Datahub-Factory-Arthub
$ cpanm --notest --installdeps .
$ perl Build.PL
$ ./Build && ./Build install
$ dhconveyor transport -p <my-pipeline.ini>
The core Datahub::Factory can be installed from source in the same way:
$ git clone https://github.com/thedatahub/Datahub-Factory
$ cd Datahub-Factory
$ cpanm --notest --installdeps .
$ perl Build.PL
$ ./Build && ./Build install
$ dhconveyor transport -p <my-pipeline.ini>