The solution splits the provided files into small pieces (1,000,000 lines each), imports them into a relational database (MySQL), and then aggregates (indexes) the data into Elasticsearch.
- Docker as the container runtime.
- MySQL and Elasticsearch images.
- .NET Core v3.1 SDK.
The solution includes 5 projects:
- Linkfire.DataIngestion.Core: holds the domain models and provides the domain and application services.
- Linkfire.DataIngestion.Infrastructure: contains the data storage and persistence technology.
- Linkfire.DataIngestion.App: the main entry point of the app, responsible for data importing.
- Linkfire.DataIngestion.UnitTests
- Linkfire.DataIngestion.IntegrationTests
- The Worker Service project type is used to run long-running jobs.
- The solution follows the Clean Architecture and SOLID principles.
- It uses MediatR as an in-memory event bus (another message broker could easily replace it); see the sketch below.
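
A minimal sketch of how the Worker Service and the MediatR event bus fit together. The notification, handler, and chunk names here are illustrative assumptions, not the actual types in the repository:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MediatR;
using Microsoft.Extensions.Hosting;

// Illustrative event raised after a chunk has been imported into MySQL.
public class ChunkImportedNotification : INotification
{
    public string ChunkPath { get; set; }
}

// Illustrative handler that would index the imported data into Elasticsearch.
public class IndexChunkHandler : INotificationHandler<ChunkImportedNotification>
{
    public Task Handle(ChunkImportedNotification notification, CancellationToken cancellationToken)
    {
        Console.WriteLine($"Indexing {notification.ChunkPath} into Elasticsearch...");
        return Task.CompletedTask;
    }
}

// Worker Service that runs the long-running import job and publishes
// events through MediatR's in-memory bus.
public class ImportWorker : BackgroundService
{
    private readonly IMediator _mediator;

    public ImportWorker(IMediator mediator) => _mediator = mediator;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // ... import the next chunk into MySQL here ...
            await _mediator.Publish(
                new ChunkImportedNotification { ChunkPath = "chunk_0.txt" },
                stoppingToken);

            // Pause between chunks; the real scheduling logic would differ.
            await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);
        }
    }
}
```

Because publishing goes through the `IMediator` abstraction, swapping the in-memory bus for an out-of-process broker only changes the publish side, not the handlers.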
- Run `docker-compose -f docker/docker-compose.yml up -d` to start the databases.
- To build and deploy the main app, run `./build.sh`.
- To split the input files into smaller pieces, run `./splitter.sh artist /path/to/Article` (see the sketch after this list).
- To import a split file into the relational database, run `./import.sh artist`.
- To convert the data that has not been converted yet, run `./convert.sh`.
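
For reference, a minimal C# sketch of what the splitting step does (the repository actually uses `splitter.sh`; the chunk file naming below is a hypothetical):

```csharp
using System.IO;

public static class FileSplitterSketch
{
    private const int ChunkSize = 1_000_000; // lines per chunk, matching the pipeline

    public static void Split(string inputPath, string outputDir)
    {
        using var reader = new StreamReader(inputPath);
        StreamWriter writer = null;
        var chunkIndex = 0;
        var linesInChunk = ChunkSize; // forces a new chunk on the first line

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (linesInChunk == ChunkSize)
            {
                // Close the previous chunk and start the next one.
                writer?.Dispose();
                writer = new StreamWriter(Path.Combine(outputDir, $"chunk_{chunkIndex++}.txt"));
                linesInChunk = 0;
            }
            writer.WriteLine(line);
            linesInChunk++;
        }
        writer?.Dispose();
    }
}
```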
- Part of the solution depends on Linux shell scripts.
- The data import phase could probably be implemented in a more convenient and performant way with a MemSQL import pipeline.
- Lack of functional tests.
- Hard-coded configurations in source.
- The import process could easily be parallelized across the split chunk files (see the first sketch after this list).
- A message broker like Apache Kafka could manage events to achieve out-of-process distribution (see the second sketch after this list).
- There is room for performance improvement.
- Many of the test scenarios have been ignored for the sake of time.
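
As a sketch of the parallel import mentioned above (`importChunkAsync` stands in for whatever loads one chunk into MySQL, and the concurrency limit of 4 is an arbitrary assumption):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelImportSketch
{
    public static async Task ImportAllAsync(string chunkDir, Func<string, Task> importChunkAsync)
    {
        // Throttle to a few concurrent imports so MySQL is not overwhelmed.
        using var throttle = new SemaphoreSlim(4);

        var tasks = Directory.EnumerateFiles(chunkDir).Select(async path =>
        {
            await throttle.WaitAsync();
            try { await importChunkAsync(path); }
            finally { throttle.Release(); }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}
```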
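
And a sketch of the out-of-process distribution idea using the Confluent.Kafka client (the broker address, topic name, and message payload are assumptions for illustration):

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class KafkaPublisherSketch
{
    public static async Task PublishAsync(string artistJson)
    {
        var config = new ProducerConfig { BootstrapServers = "localhost:9092" };

        using var producer = new ProducerBuilder<Null, string>(config).Build();

        // Publish the imported record so other processes can react to it.
        var result = await producer.ProduceAsync(
            "artists-imported",
            new Message<Null, string> { Value = artistJson });

        Console.WriteLine($"Delivered to {result.TopicPartitionOffset}");
    }
}
```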
An API project is included to demonstrate the desired result.