DataCat plans

Background:

TaskCat needs a dedicated reporting microservice. This epic should accumulate all the work associated with TaskCat reporting. This background section serves as a proposal for the TaskCat reporting system.

Components and protocols

Api layer design:

The API layer's design guidelines are:

1. The API layer exposes only the READ facilities of the underlying MongoDB; under no circumstances should write/update/create operations be exposed.
2. The API layer is authenticated through TaskCat.Auth.
3. The API layer should accept native JSON MongoDB query objects and know how to translate them into proper MongoDB queries.
4. The API layer should know how and when to translate the result set into the proper POCO where needed, since many properties are populated only through that mapping. Note that if .NET Core is used, this may require some new formatter-specific code.
5. The API layer should be able to translate and execute aggregation over the retrieved result.
6. The API layer should be able to provide saved queries and aggregations for future use.

Exposing mongodb read facilities:

The MongoDB C# driver can find documents based on a JSON query; such queries can be run through the driver's collection.Find() method. We may also need to translate a native MongoDB aggregation pipeline. No concrete spec exists yet, but a simple JSON model like the one below should suffice. Because of C# driver complexities, we might also need a projection segment, though I'm not entirely sure of this. If it comes to that, a query can carry a projection payload as well. It won't be needed if the projection can be embedded in a regular query. If we do end up using it, we should not add {{project}} unless we need only some of the properties of the result objects.

{
    "query": "{<Some query goes here>}",
    "project": "{<Projection payload goes here>} // Optional, only should be used if needed" 
}

For aggregation, the format follows the same shape:

{
    "aggregate": "[{<Aggregate Pipeline object goes here>}]"
}

The API layer may reject a request if {{query}} and {{aggregate}} both exist in the same payload. Keep in mind that this aggregation refers to MongoDB's native aggregation, not the API-level aggregation over the retrieved result (the fifth design guideline above).
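
The rejection rule can be sketched as a small validation step. This is illustrative Python only (the real API layer would presumably sit on the C# driver); the function name `validate_payload` and the error messages are assumptions, not part of the proposal.

```python
def validate_payload(payload: dict) -> str:
    """Return "query" or "aggregate", or raise ValueError for a bad payload.

    Hypothetical helper: the name and error messages are illustrative only.
    """
    has_query = "query" in payload
    has_aggregate = "aggregate" in payload
    if has_query and has_aggregate:
        # The proposal lets the API layer reject payloads carrying both keys.
        raise ValueError("payload must not contain both 'query' and 'aggregate'")
    if not has_query and not has_aggregate:
        # Assumption: a payload with neither key is also rejected.
        raise ValueError("payload must contain either 'query' or 'aggregate'")
    return "query" if has_query else "aggregate"
```

Rejecting the ambiguous payload up front keeps the later translation step free of precedence rules between the two keys.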

Authentication through TaskCat.Auth:

TaskCat.Auth authentication is built on JWT tokens and OAuth2. TaskCat.Auth uses role-based authentication; for now, the reporting microservice should only be exposed to users with the BackOfficeAdmin or Administrator role. Enterprise users do have some need for it, and we will eventually work out how to present reports to them.
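
A minimal sketch of the role gate, in illustrative Python. It assumes the JWT has already been verified elsewhere and that roles appear under a `role` claim as a string or a list; the claim name and the function name `can_access_reporting` are assumptions, not the TaskCat.Auth API.

```python
# Roles named in the proposal as allowed to use the reporting service.
ALLOWED_ROLES = {"BackOfficeAdmin", "Administrator"}


def can_access_reporting(claims: dict) -> bool:
    """Check already-verified JWT claims against the allowed roles.

    Assumption: roles live under a "role" claim (string or list of strings).
    """
    roles = claims.get("role", [])
    if isinstance(roles, str):
        roles = [roles]
    return any(role in ALLOWED_ROLES for role in roles)
```

Keeping the allowed roles in one set makes it easy to extend later, for example once the enterprise use case is settled.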

Translating to proper mongodb queries:

The API layer retrieves a native MongoDB query from the request payload, but it needs to translate it into a proper find or aggregate call based on the model provided. For an aggregate, if the pipeline defines an $out stage or a map-reduce sequence, the API layer should return a proper reference or result that the user can access once the request has finished. MongoDB projection issues are already discussed above.
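
The dispatch described above might look roughly like this, in illustrative Python. It assumes the payload strings are strict JSON and that an output-writing pipeline ends in a `$out` stage; the function name `plan_request` and the shape of the returned plan are hypothetical.

```python
import json


def plan_request(payload: dict) -> dict:
    """Decide which MongoDB read operation a payload maps to.

    Hypothetical helper: it only builds a plan and does not talk to MongoDB.
    """
    if "aggregate" in payload:
        pipeline = json.loads(payload["aggregate"])
        # A pipeline whose last stage is $out writes to a collection instead
        # of returning documents, so the caller needs a reference, not rows.
        returns_reference = bool(pipeline) and "$out" in pipeline[-1]
        return {
            "operation": "aggregate",
            "pipeline": pipeline,
            "returns_reference": returns_reference,
        }
    filter_doc = json.loads(payload["query"])
    # The projection is optional, as in the payload model above.
    projection = json.loads(payload["project"]) if "project" in payload else None
    return {"operation": "find", "filter": filter_doc, "projection": projection}
```

The resulting plan would then be handed to whatever driver call performs the actual find or aggregate.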

Api layer translation of retrieved result:

If a retrieved list is not projected, we can retrieve the full POCO version we currently use for our OData queries. The upside is that we get properties that are not present in the database models, which is sometimes a big plus. Properties like HRIDState are valuable for human-readable reports, and projecting to specific columns might drop them. If we project on the API side instead, this could be avoided; otherwise we can store properties like HRIDState in the database. In any case, the API layer should understand when to apply the POCO translation based on the request provided.

Api layer aggregation over the retrieved result:

This should not be confused with MongoDB's native aggregation pipeline. Since reports often summarize the result, we would like to extend some of the facilities here. The API layer should be able to take a result set and aggregate over it. For example, if we want the sum of a property, we should be able to write a query like this:

{
    "query": "{<Some query goes here>}",
    "project": {
        "count": 1,
        "name": 1
    }, 
    "apiAggregate": [{"$sum": "count"}]
}

Here the $ denotes an API-level aggregation operator; it should add a new row at the end of the result holding the sum of count across all result objects. To start with, $avg, $sum, $distinctCount and $count would be a sufficient set of operators to support.
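
A minimal sketch of how the four starting operators could be applied over a result set, in illustrative Python. The shape of the appended summary row (`operator`, `field`, `value`) and the function name `apply_api_aggregate` are assumptions; the proposal only says that a new row is added at the end.

```python
def apply_api_aggregate(rows: list, api_aggregate: list) -> list:
    """Return the rows followed by one summary row per apiAggregate entry."""
    operators = {
        "$sum": lambda values: sum(values),
        "$avg": lambda values: sum(values) / len(values) if values else None,
        "$count": lambda values: len(values),
        "$distinctCount": lambda values: len(set(values)),
    }
    result = list(rows)  # copy so the caller's list is left untouched
    for spec in api_aggregate:
        for operator, field in spec.items():
            if operator not in operators:
                raise ValueError(f"unsupported operator: {operator}")
            # Only rows that actually carry the field take part.
            values = [row[field] for row in rows if field in row]
            result.append(
                {"operator": operator, "field": field, "value": operators[operator](values)}
            )
    return result
```

With the sample payload above, `[{"$sum": "count"}]` would append a single row carrying the total of `count`.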

MongoDB 3.4 introduced facets (the $facet aggregation stage), which we might use if need be. Custom API-level aggregation extensions that aggregate and append their output to the result set should also be allowed if possible.

Saved queries and aggregations for future use:

Along with real-time queries and results, the API layer should be able to save a query format and reuse it later on request. Queries can be saved against a user or, if need be, as public queries. Every query name has to be unique, or the API layer should take steps to make it unique within a user's scope or the public domain. A sample query template would look roughly like this:

{
    "query": "{name: %name%}"
}

This template says that we can search on any name; all the user has to provide is the name instead of the full payload. Once the template has been saved, the user should be able to invoke it like this:

{
    "query_template" : "Template_name",
    "inputs": {
        "name": "prateek"
    }
}
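
Filling a saved template from the supplied inputs could be sketched as follows, in illustrative Python. The `%key%` placeholder syntax comes from the sample above; the function name `render_template` and the choice to JSON-encode each value are assumptions.

```python
import json
import re


def render_template(template: str, inputs: dict) -> str:
    """Replace each %key% placeholder with the JSON-encoded input value."""

    def substitute(match):
        key = match.group(1)
        if key not in inputs:
            raise KeyError(f"missing input: {key}")
        # json.dumps quotes and escapes the value, so it is inserted as a
        # single JSON literal rather than as raw query text.
        return json.dumps(inputs[key])

    return re.sub(r"%(\w+)%", substitute, template)
```

For the sample template and inputs above, this yields the query string `{name: "prateek"}`, which the API layer would then translate like any other query.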

For repetitive data-crunching microservices, it would be much nicer to use Python extensions. But those should only be used when we are generating really deep data sets.

Multiple reporting format

Every request has to be served in the proper format, including .xlsx and .csv. A separate format parameter in the request would select the output format.
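
A minimal sketch of switching on such a format parameter, in illustrative Python. Only JSON and CSV are covered here because .xlsx output would need a spreadsheet library; the function name `render_report`, the `fmt` parameter name and the default of JSON are assumptions.

```python
import csv
import io
import json


def render_report(rows: list, columns: list, fmt: str = "json") -> str:
    """Serialize result rows according to the requested format."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    # .xlsx is intentionally left out of this sketch.
    raise ValueError(f"unsupported format: {fmt}")
```

Fields not listed in `columns` are dropped from the CSV, and missing fields are written as empty cells.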

Front-end design guidelines:

The front-end design guidelines are based on Angular2. The front end should use the same set of configs as the API layer, but should also contain front-end-specific configs for accurate presentation of the data.

For a tabular config a sample can be:

{
    "dataQuery": {
        "query": "{<Some query goes here>}",
        "project": {
            "count": 1,
            "name": 1
        }, 
        "apiAggregate": [{"$sum": "count"}]
    },
    "dataView": {
        "type": "table // Defines that the representation view is a table", 
        "columnDef": [
            {
                "name":"Name // Defines the column name",
                "ref": "name // Defines the data matched from result, we are only getting count and name back"
            },
            {
                "name":"Count",
                "ref": "count"
            }
        ],
        "includeTitleRow": true
    }
}

For this specific representation, {{columnDef}} should be laid out left to right in the UI, with each entry's array index giving its column position counted from the leftmost side.
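
The column ordering can be sketched as a small transformation from result rows to table rows. This is illustrative Python rather than Angular code; the function name `build_table` is hypothetical, and treating {{includeTitleRow}} as "emit a header row built from the column names" is an assumption.

```python
def build_table(data_view: dict, rows: list) -> list:
    """Turn result rows into table rows ordered by columnDef index."""
    column_defs = data_view["columnDef"]
    table = []
    if data_view.get("includeTitleRow", False):
        # Header cells follow the same left-to-right order as columnDef.
        table.append([column["name"] for column in column_defs])
    for row in rows:
        # Each cell is looked up by the column's "ref"; missing values are None.
        table.append([row.get(column["ref"]) for column in column_defs])
    return table
```

Because the cells are built by walking {{columnDef}} in order, the key order of the incoming result objects does not affect the layout.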

I will add more info after further reading. Please keep adding questions and references if need be. We haven't yet settled on the set of UI controls we will end up using.

thehoneymad/DataCat.md