MLR Pipeline Sequence Diagram
Gist ebernhardson/54e3bb60f234a0f956d824c55d673cb0, created January 15, 2019 19:53
@startuml
== click log generation ==
oozie -> oozie: schedule label generation
note left
arrows signify the initiator
of communication, not
the direction of data flow
end note
activate oozie
database hdfs
oozie -> hdfs: retrieve click data
oozie -> hdfs: retrieve query data
oozie -> oozie: compute search click
oozie -> hdfs: store search click
deactivate oozie
== sampling and labeling ==
actor operator
operator -> "mjolnir (spark)": start mjolnir
activate "mjolnir (spark)"
"mjolnir (spark)" -> hdfs: retrieve search click
"mjolnir (spark)" -> "mjolnir (spark)": grouping queries (1st pass, stemming)
database kafka
"mjolnir (spark)" -> kafka: grouping queries (2nd pass, clustering)
"inactive search cluster (codfw)" -> kafka: retrieve queries to be run
"inactive search cluster (codfw)" --> kafka: send query results back
"mjolnir (spark)" -> kafka: retrieve results of grouping queries
"mjolnir (spark)" -> "mjolnir (spark)": sampling
"mjolnir (spark)" -> "mjolnir (spark)": label generation\nwith DBN click model
== feature vector retrieval ==
"mjolnir (spark)" -> kafka: send queries for feature vectors
"inactive search cluster (codfw)" -> kafka: retrieve queries to be analyzed
"inactive search cluster (codfw)" --> kafka: send feature vectors back
"mjolnir (spark)" -> kafka: retrieve feature vectors
"mjolnir (spark)" -> "mjolnir (spark)": feature selection
"mjolnir (spark)" -> hdfs: store query x feature vectors matrix\n(training data)
== machine learning ==
"mjolnir (spark)" -> hdfs: retrieve query x feature vectors matrix
"mjolnir (spark)" -> "mjolnir (spark)": create decision trees with xgboost
"mjolnir (spark)" -> operator: store decision trees
deactivate "mjolnir (spark)"
== upload to production ==
operator -> "elasticsearch\ncirrus": upload decision trees to production
note right
upload to production
isn't automated yet
end note
@enduml
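The "compute search click" step in the click log generation stage joins click data with query data. A minimal sketch of that join in plain Python follows, assuming (hypothetically) that each query log row carries a `search_id`, the query text and the ranked `hit_page_ids`, and that each click row carries a `search_id` and the clicked `page_id`; the real Oozie job's schema may differ. Searches without clicks are kept, since a click model also learns from results that were shown but not clicked.

```python
from typing import Dict, Iterable, List


def join_search_clicks(queries: Iterable[dict], clicks: Iterable[dict]) -> List[dict]:
    """Attach clicked page ids to each logged search (all field names are hypothetical)."""
    # Index clicks by the search they belong to.
    clicks_by_search: Dict[str, List[str]] = {}
    for click in clicks:
        clicks_by_search.setdefault(click["search_id"], []).append(click["page_id"])

    # Emit one row per search; searches with no clicks get an empty list.
    return [
        {
            "query": q["query"],
            "hit_page_ids": list(q["hit_page_ids"]),
            "clicked_page_ids": clicks_by_search.get(q["search_id"], []),
        }
        for q in queries
    ]
```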
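The first query-grouping pass ("1st pass, stemming") pools clicks from queries that differ only superficially. The sketch below illustrates the idea under loud assumptions: `crude_stem` is a hypothetical toy that only strips a plural "s", standing in for whatever language-aware stemmer the real pipeline uses, and the second clustering pass through Kafka is not modeled.

```python
import re
from collections import defaultdict
from typing import Dict, List


def crude_stem(token: str) -> str:
    """Hypothetical toy stemmer: strip a plural 's' (but not 'ss') from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize_query(query: str) -> str:
    # Lowercase, split on non-word characters, stem each token, rejoin with spaces.
    return " ".join(crude_stem(t) for t in re.findall(r"\w+", query.lower()))


def group_queries(queries: List[str]) -> Dict[str, List[str]]:
    """First-pass grouping: queries sharing a normalized form land in one group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        groups[normalize_query(q)].append(q)
    return dict(groups)
```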
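The "label generation with DBN click model" step turns clicks into relevance labels. The full dynamic Bayesian network (DBN) model is usually fit with expectation maximization; the sketch below instead uses the simplified DBN (SDBN) estimate, a different and cruder technique chosen because it fits in a few lines. SDBN assumes users scan results top-down and stop after their last click, so attractiveness is clicks divided by examinations, satisfaction is last clicks divided by clicks, and relevance is their product. The input format, (ranked page ids, clicked page ids) pairs for one query group, is a hypothetical choice.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One search session: (ranked page ids shown, page ids clicked).
Session = Tuple[Sequence[str], Sequence[str]]


def sdbn_relevance(sessions: List[Session]) -> Dict[str, float]:
    """Simplified DBN: relevance = attractiveness * satisfaction for each page."""
    examined: Dict[str, int] = defaultdict(int)      # shown at or above the last click
    clicked: Dict[str, int] = defaultdict(int)       # clicked at all
    last_clicked: Dict[str, int] = defaultdict(int)  # lowest-ranked click of its session
    for ranking, clicks in sessions:
        click_set = set(clicks)
        positions = [i for i, page in enumerate(ranking) if page in click_set]
        if not positions:
            continue  # sessions without clicks carry no signal in SDBN
        last = positions[-1]
        for page in ranking[: last + 1]:
            examined[page] += 1
        for i in positions:
            clicked[ranking[i]] += 1
        last_clicked[ranking[last]] += 1

    relevance: Dict[str, float] = {}
    for page, n_examined in examined.items():
        attractiveness = clicked[page] / n_examined
        satisfaction = last_clicked[page] / clicked[page] if clicked[page] else 0.0
        relevance[page] = attractiveness * satisfaction
    return relevance
```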
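In the feature vector retrieval stage, Spark never calls the search cluster directly: mjolnir publishes requests to Kafka, the inactive codfw cluster consumes them, and the feature vectors come back through Kafka. The sketch below simulates that round trip in-process, with two `queue.Queue` objects standing in for the request and response topics and a thread standing in for the search cluster. `fake_features` is a hypothetical placeholder and does not reflect the features the real cluster computes.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

# (request id, query, candidate page ids); None is the end-of-stream sentinel.
Request = Optional[Tuple[int, str, List[str]]]


def fake_features(query: str, page_id: str) -> List[float]:
    # Hypothetical placeholder for real feature logging: two toy numeric features.
    return [float(len(query)), float(len(page_id))]


def search_cluster_worker(requests: queue.Queue, responses: queue.Queue) -> None:
    """Stands in for the inactive cluster: consume requests, publish feature vectors."""
    while True:
        item: Request = requests.get()
        if item is None:
            break
        req_id, query, page_ids = item
        responses.put((req_id, {pid: fake_features(query, pid) for pid in page_ids}))


def collect_feature_vectors(
    batch: List[Tuple[str, List[str]]]
) -> Dict[int, Dict[str, List[float]]]:
    """Stands in for mjolnir: send every request, then gather responses by request id."""
    requests: queue.Queue = queue.Queue()
    responses: queue.Queue = queue.Queue()
    worker = threading.Thread(target=search_cluster_worker, args=(requests, responses))
    worker.start()
    for req_id, (query, page_ids) in enumerate(batch):
        requests.put((req_id, query, page_ids))
    requests.put(None)  # tell the worker there is nothing more to process
    worker.join()

    results: Dict[int, Dict[str, List[float]]] = {}
    while not responses.empty():
        req_id, vectors = responses.get()
        results[req_id] = vectors
    return results
```

Keying responses by request id is the important design choice here: with real topics and several consumers, responses can arrive out of order, so position alone cannot match them to their requests.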
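The "store query x feature vectors matrix (training data)" step combines the labels with the retrieved feature vectors. One common text layout for learning-to-rank data is the LETOR/svmlight style `label qid:N 1:f1 2:f2 ...`, with rows of the same query kept contiguous; the sketch below produces that layout as an assumption, since the diagram does not say how the real pipeline stores the matrix on HDFS. Keys are hypothetical (query group id, page id) pairs, and pages with a label but no feature vector are dropped.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (query group id, page id)


def to_letor_lines(
    labels: Dict[Key, float], vectors: Dict[Key, List[float]]
) -> List[str]:
    """Join labels and feature vectors into LETOR/svmlight-style text lines."""
    lines = []
    # Sorting by (qid, page id) keeps each query's rows contiguous, as the format expects.
    for key in sorted(labels.keys() & vectors.keys()):
        qid, _page_id = key
        features = " ".join(f"{i}:{v:g}" for i, v in enumerate(vectors[key], start=1))
        lines.append(f"{labels[key]:g} qid:{qid} {features}")
    return lines
```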