@mserranom
Last active May 31, 2018 16:39
feature_store_mvp_backend.md

Feature Store MVP Backend

Data

Entities and cardinality:

                                                  1 +------------+
                                             +------+ stream_dtc |
+-------------+ 1     n +-------------+ 1    |      +------------+
| data_source +---------+ feature_set +------+
+-------------+         +------+------+      |    1 +------------+
                               | 1           +------+ window_dtc |
                               |                    +------------+
                               |
                               |
                               | n
                    +----------+----------+ 1       1 +---------------+
                    | feature_set_version +-----------+ training_data |
                    +---------------------+           +---------------+
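The cardinalities in the diagram can be sketched as plain Python dataclasses. This is only an illustration under stated assumptions: the class and field names are hypothetical, the DTCs are stored as plain YAML strings, and `training_data` is modelled as optional so that a version can exist before its training data has been generated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrainingData:
    # Hypothetical field: location of the generated training data.
    s3_url: str


@dataclass
class FeatureSetVersion:
    version: str
    # 1:1 with training_data; None until it has been generated (assumption).
    training_data: Optional[TrainingData] = None


@dataclass
class FeatureSet:
    name: str
    stream_dtc: str  # 1:1 with the feature set, kept as plain YAML text
    window_dtc: str  # 1:1 with the feature set, kept as plain YAML text
    versions: List[FeatureSetVersion] = field(default_factory=list)  # 1:n


@dataclass
class DataSource:
    name: str
    url: str
    feature_sets: List[FeatureSet] = field(default_factory=list)  # 1:n


# Usage: one data source with one feature set that has a single version.
weather = DataSource(name="denmark_weather", url="<S3_URL>")
rainfall = FeatureSet(
    name="copenhagen_rainfall",
    stream_dtc="<PLAIN_YAML>",
    window_dtc="<PLAIN_YAML>",
)
rainfall.versions.append(FeatureSetVersion(version="1.0"))
weather.feature_sets.append(rainfall)
```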

Endpoints

GET /data_sources

Response body

{
   "data_sources":[
      {
         "name":"denmark_weather",
         "url": "<S3_URL>"
      },
      {
         "name":"denmark_traffic",
         "url": "<S3_URL>"
      }
   ]
}

GET /feature_sets

Retrieves the list of feature sets.

Response body

{
   "feature_sets":[
      {
         "name":"copenhagen_rainfall",
         "data_source": "denmark_weather",
         "versions":{
            "1.0": {
                "stream_dtc":"<PLAIN_YAML>",
                "window_dtc":"<PLAIN_YAML>",
                "training_data_generation_status": "idle|training|success|failed",
                "training_data_generation_duration": "<DURATION_IN_SECONDS_IF_NOT_IDLE>"
            },
            "2.0": {  }
         },
         "latest_version": "2.0"
      },
      {
         "name":"copenhagen_traffic",
         "data_source":"denmark_traffic",
         "versions":{  },
         "latest_version":{  },
         "training_data_generation_status" : "training|idle|failed"
      }
   ]
}
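A client will often want the training data status of each feature set's latest version. The following is a minimal sketch of that lookup, assuming the response shape shown above. The sample payload uses made-up concrete values, and `latest_status` is a hypothetical helper name, not part of the backend.

```python
import json
from typing import Optional

# Hypothetical payload shaped like the response above, with concrete values.
sample_body = json.dumps({
    "feature_sets": [
        {
            "name": "copenhagen_rainfall",
            "data_source": "denmark_weather",
            "versions": {
                "1.0": {"training_data_generation_status": "success"},
                "2.0": {"training_data_generation_status": "training"},
            },
            "latest_version": "2.0",
        },
        {
            "name": "copenhagen_traffic",
            "data_source": "denmark_traffic",
            "versions": {},
            "latest_version": {},
        },
    ]
})


def latest_status(feature_set: dict) -> Optional[str]:
    """Return the status of the latest version, or None if there is none."""
    latest = feature_set.get("latest_version")
    versions = feature_set.get("versions", {})
    # A feature set without versions has no usable latest_version key.
    if not isinstance(latest, str) or latest not in versions:
        return None
    return versions[latest].get("training_data_generation_status")


statuses = {
    fs["name"]: latest_status(fs)
    for fs in json.loads(sample_body)["feature_sets"]
}
print(statuses)
```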

POST /feature_set

Creates a new feature set from an existing data source. Only S3 files are supported.

Request body

{
   "name": "feature_set_name",
   "data_source" : "<FULL_S3_URL>"
}

Response body

Same as GET /feature_sets, including the new feature set, whose versions will be empty.
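As an illustration, the request could be built with the Python standard library as follows. This sketch only constructs the request and does not send it. The base URL and the JSON content type are assumptions, since this document does not specify them, and `build_create_feature_set_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the backend's base URL is not given in this document.
BASE_URL = "http://localhost:8080"


def build_create_feature_set_request(name: str, data_source: str) -> urllib.request.Request:
    """Build (but do not send) a POST /feature_set request."""
    # Only S3 files are supported as data sources.
    if not data_source.startswith("s3://"):
        raise ValueError("data_source must be a full S3 URL")
    body = json.dumps({"name": name, "data_source": data_source}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/feature_set",
        data=body,
        # Assumption: the backend expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_feature_set_request(
    "copenhagen_rainfall", "s3://example-bucket/denmark_weather.csv"
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running backend.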

GET /s3_file_content?url=<FULL_S3_URL>

Returns the text content of the S3 file (s3://bucket/key).
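Because the S3 URL travels as a query parameter, it needs to be percent-encoded. A minimal sketch of splitting an `s3://bucket/key` URL and building the request path follows; the helper names and the example bucket are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode, urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key), rejecting anything else."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError("expected a URL of the form s3://bucket/key")
    return parsed.netloc, key


def s3_file_content_path(url: str) -> str:
    """Build the path and query string for GET /s3_file_content."""
    split_s3_url(url)  # validate before encoding
    return "/s3_file_content?" + urlencode({"url": url})


bucket, key = split_s3_url("s3://example-bucket/dtc/stream.yaml")
path = s3_file_content_path("s3://example-bucket/dtc/stream.yaml")
```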

POST /training_data

Starts a Spark job to generate training data.

Request body

{
   "s3_source": "<FULL_S3_URL_OF_SOURCE_CONSUMED>",
   "s3_target" : "<FULL_S3_URL_OF_TRAINING_DATA_DESTINATION>",
   "stream_dtc" : "<FULL_S3_URL>",
   "window_dtc" : "<FULL_S3_URL>"
}

Response body

Same as GET /feature_sets, with the training_data_generation_status property updated accordingly.
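Since all four request fields are full S3 URLs, a client can check them before starting a job. The sketch below assumes that every field is required, which this document does not state explicitly. `training_data_body` and the example URLs are hypothetical.

```python
import json

REQUIRED_FIELDS = ("s3_source", "s3_target", "stream_dtc", "window_dtc")


def training_data_body(**urls: str) -> str:
    """Serialize the POST /training_data request body after basic checks."""
    missing = [name for name in REQUIRED_FIELDS if name not in urls]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    not_s3 = [name for name in REQUIRED_FIELDS if not urls[name].startswith("s3://")]
    if not_s3:
        raise ValueError("not full S3 URLs: " + ", ".join(not_s3))
    # Keep only the documented fields, in the documented order.
    return json.dumps({name: urls[name] for name in REQUIRED_FIELDS})


body = training_data_body(
    s3_source="s3://example-bucket/denmark_weather.csv",
    s3_target="s3://example-bucket/training/copenhagen_rainfall/",
    stream_dtc="s3://example-bucket/dtc/stream.yaml",
    window_dtc="s3://example-bucket/dtc/window.yaml",
)
```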

POST /feature_set/<FEATURE_SET_NAME>/dtc

Creates or updates the DTCs of a feature set. The change applies to latest_version only.

Request body

{
   "stream_dtc" : "<PLAIN_YAML>",
   "window_dtc" : "<PLAIN_YAML>"
}

Response body

{
   "status" : "success|failed",
   "error_message" : "<USER_FEEDBACK_IF_STATUS_FAILED>"
}
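On the client side, a failed update can be turned into an exception that carries the user feedback. This is a minimal sketch assuming the response shape above; `DtcUpdateError` and `check_dtc_response` are hypothetical names, and the sample error message is made up.

```python
import json


class DtcUpdateError(Exception):
    """Raised when the backend reports that a DTC update failed."""


def check_dtc_response(body: str) -> None:
    """Return silently on success, raise DtcUpdateError otherwise."""
    payload = json.loads(body)
    if payload.get("status") == "success":
        return
    # error_message is only expected when the status is "failed".
    raise DtcUpdateError(payload.get("error_message", "unknown error"))


check_dtc_response('{"status": "success"}')

try:
    check_dtc_response('{"status": "failed", "error_message": "invalid YAML"}')
    feedback = None
except DtcUpdateError as error:
    feedback = str(error)
```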