@haamond
Last active March 10, 2021 07:11
Linkfire.md

Linkfire Data Ingestion solution

The solution splits the provided files into smaller pieces (1,000,000 lines each), imports them into a relational database (MySQL), and then aggregates (indexes) the data into Elasticsearch.
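
The splitting step can be sketched with the standard `split` utility (the chunk size is reduced to 3 lines here for brevity, and the file names are illustrative; the actual `splitter.sh` is not shown):

```shell
# Create a small sample input file (10 lines).
seq 1 10 > artists.txt

# Split it into fixed-size chunks with numeric suffixes;
# the solution uses 1,000,000 lines per chunk instead of 3.
# Produces artists_chunk_00 .. artists_chunk_03.
split -l 3 -d artists.txt artists_chunk_

ls artists_chunk_*
```

Each chunk can then be imported independently, which keeps memory usage bounded and makes the import step restartable per chunk.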

Infrastructure and requirements

  • Docker as the container runtime.
  • MySQL and Elasticsearch images.
  • .NET Core v3.1 SDK
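
The actual `docker/docker-compose.yml` is not reproduced here; a minimal sketch of what it might contain (image tags, ports, and credentials are assumptions, not the project's real values):

```yaml
version: "3"
services:
  mysql:
    image: mysql:8.0                   # assumed tag
    environment:
      MYSQL_ROOT_PASSWORD: example     # placeholder credential
    ports:
      - "3306:3306"
  elasticsearch:
    image: elasticsearch:7.9.2         # assumed tag
    environment:
      discovery.type: single-node
    ports:
      - "9200:9200"
```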

Projects

The solution includes 5 projects:

  • Linkfire.DataIngestion.Core Holds the domain models and provides domain and application services.
  • Linkfire.DataIngestion.Infrastructure Contains the data-storage persistence implementations.
  • Linkfire.DataIngestion.App The main entry point of the app; responsible for importing data.
  • Linkfire.DataIngestion.UnitTests
  • Linkfire.DataIngestion.IntegrationTests

Algorithms and Methodologies

  • The Worker Service project type is used to run long-running jobs.
  • The solution follows the Clean Architecture and SOLID principles.
  • It uses MediatR as an in-memory event bus (another message broker could easily replace it).

Config, Build & Run

  • To start the databases, run docker-compose -f docker/docker-compose.yml up -d
  • To build and deploy the main app, run ./build.sh
  • To split files into smaller pieces, run ./splitter.sh artist /path/to/Article
  • To import a file into the relational database, run ./import.sh artist
  • To convert the data (records not yet converted), run ./convert.sh

Known issues

  • Part of the solution depends on Linux shell scripts.
  • The data-import phase could probably be implemented more conveniently and performantly with a MemSQL import pipeline.
  • Lack of functional tests.
  • Hard-coded configurations in source.

Further Improvements

  • The Import process could easily be executed in parallel.
  • A message broker like Apache Kafka could manage events to achieve out-of-process distribution.
  • There is room for performance improvement.
  • Many of the test scenarios have been ignored for the sake of time.
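
Since each chunk is independent, the parallel import mentioned above can be sketched with `xargs -P` (a hypothetical illustration: the real `./import.sh` is replaced here by `wc -l` as a stand-in, and file names are made up):

```shell
# Produce some chunk files, as the splitter step would.
seq 1 10 > data.txt
split -l 3 -d data.txt chunk_

# Run up to 4 "imports" concurrently, one per chunk file.
# In the real pipeline this would invoke ./import.sh per chunk.
ls chunk_* | xargs -P 4 -n 1 wc -l > counts.txt

sort -n counts.txt
```

Because the chunks share no state, the only coordination needed is on the database side (e.g. connection-pool sizing), which is why parallelizing this step is straightforward.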

An API project is included to demonstrate the desired result.



Written with StackEdit.
