@haamond
Last active March 10, 2021 07:11
Linkfire.md

Linkfire Data Ingestion solution

The solution splits the provided files into smaller pieces (1,000,000 lines each), imports them into a relational database (MySQL), and then aggregates (indexes) the data into Elasticsearch.
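
The splitting step can be sketched with the standard `split` utility (the chunk size is reduced to 3 lines here for brevity, and the file names are illustrative; the actual `splitter.sh` is not shown):

```shell
# Create a small sample input file (10 lines).
seq 1 10 > artists.txt

# Split it into fixed-size chunks with numeric suffixes;
# the solution uses 1,000,000 lines per chunk instead of 3.
# Produces artists_chunk_00 .. artists_chunk_03.
split -l 3 -d artists.txt artists_chunk_

ls artists_chunk_*
```

Each chunk can then be imported independently, which keeps memory usage bounded and makes the import step restartable per chunk.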

Infrastructure and requirements

  • Docker as the container runtime.
  • MySQL and Elasticsearch images.
  • .NET Core v3.1 SDK
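
The actual `docker/docker-compose.yml` is not reproduced here; a minimal sketch of what it might contain (image tags, ports, and credentials are assumptions, not the project's real values):

```yaml
version: "3"
services:
  mysql:
    image: mysql:8.0                   # assumed tag
    environment:
      MYSQL_ROOT_PASSWORD: example     # placeholder credential
    ports:
      - "3306:3306"
  elasticsearch:
    image: elasticsearch:7.9.2         # assumed tag
    environment:
      discovery.type: single-node
    ports:
      - "9200:9200"
```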

Projects

The solution includes 5 projects:

  • Linkfire.DataIngestion.Core Holds the domain models and provides domain and application services.
  • Linkfire.DataIngestion.Infrastructure Contains the data-storage persistence implementations.
  • Linkfire.DataIngestion.App The main entry point of the app; responsible for importing data.
  • Linkfire.DataIngestion.UnitTests
  • Linkfire.DataIngestion.IntegrationTests

Algorithms and Methodologies

  • The Worker Service project type is used to run long-running jobs.
  • The solution follows the Clean Architecture and SOLID principles.
  • It uses MediatR as an in-memory event bus (another message broker could easily replace it).

Config, Build & Run

  • To start the databases, run docker-compose -f docker/docker-compose.yml up -d
  • To build and deploy the main app, run ./build.sh
  • To split files into smaller pieces, run ./splitter.sh artist /path/to/Article
  • To import a file into the relational database, run ./import.sh artist
  • To convert the data (records not yet converted), run ./convert.sh

Known issues

  • Part of the solution depends on Linux shell scripts.
  • The data-import phase could probably be implemented more conveniently and performantly with a MemSQL import pipeline.
  • Lack of functional tests.
  • Hard-coded configurations in source.

Further Improvements

  • The Import process could easily be executed in parallel.
  • A message broker like Apache Kafka could manage events to achieve out-of-process distribution.
  • There is room for performance improvement.
  • Many of the test scenarios have been ignored for the sake of time.
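
Since each chunk is independent, the parallel import mentioned above can be sketched with `xargs -P` (a hypothetical illustration: the real `./import.sh` is replaced here by `wc -l` as a stand-in, and file names are made up):

```shell
# Produce some chunk files, as the splitter step would.
seq 1 10 > data.txt
split -l 3 -d data.txt chunk_

# Run up to 4 "imports" concurrently, one per chunk file.
# In the real pipeline this would invoke ./import.sh per chunk.
ls chunk_* | xargs -P 4 -n 1 wc -l > counts.txt

sort -n counts.txt
```

Because the chunks share no state, the only coordination needed is on the database side (e.g. connection-pool sizing), which is why parallelizing this step is straightforward.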

An API project is included to demonstrate the desired result.



Written with StackEdit.
