| Kedro layer | Comment |
|---|---|
| raw | In this situation three data sources are described: an Excel file, a multi-part CSV export from a database, and a single CSV export from a personnel management system. |
| intermediate | The intermediate layer is a typed mirror of the raw layer, with one minor transformation applied to the equipment extract: the multi-part data received has been concatenated into a single Parquet dataset. |
| primary | Two domain-level datasets have been constructed from the intermediate layer, which model equipment shutdowns and operator actions. |
| feature | Several features have been constructed from the primary layer which represent variables we think may be predictors of equipment shutdowns, such as the maintenance schedule and recent shutdowns. |
| model_input | Two model inputs have been created since we are experimenting with two modelling approaches: one time-series based, and another equipment-centric without a temporal element. |
| models | The trained models constructed have been serialised. |
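The feature-layer step above can be illustrated with a minimal sketch. This is not the project's actual code: the function name, the tuple-based input shape, and the equipment ids are all illustrative assumptions, showing only the idea of deriving a "days since last shutdown" feature from a primary-layer shutdowns dataset.

```python
from datetime import date

def days_since_last_shutdown(shutdowns, as_of):
    """Per equipment id, days between the most recent shutdown and `as_of`.

    `shutdowns` is a list of (equipment_id, shutdown_date) tuples, standing
    in for a primary-layer table of shutdown events.
    """
    last = {}
    for eq, d in shutdowns:
        # Keep only the most recent shutdown date per equipment id.
        if eq not in last or d > last[eq]:
            last[eq] = d
    return {eq: (as_of - d).days for eq, d in last.items()}

# Illustrative data: equipment 1 shut down twice, equipment 2 once.
events = [
    (1, date(2024, 1, 1)),
    (1, date(2024, 3, 1)),
    (2, date(2024, 2, 15)),
]
print(days_since_last_shutdown(events, date(2024, 3, 11)))  # → {1: 10, 2: 25}
```

In a real Kedro project this logic would live in a node whose input is the primary-layer dataset and whose output is registered in the feature layer of the catalog.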
```shell
# Generate the commit message using llm
current_branch=$(git rev-parse --abbrev-ref HEAD)
git_diff=$(git diff "$current_branch")
commit_message=$(echo "$git_diff" | llm prompt --model gpt-4o-mini "
- Generate a conventional commit message based on the provided git diff.
- Start with one of the following prefixes: 'build', 'chore', 'ci', 'docs', 'feat', 'fix', 'perf', 'refactor', 'revert', 'style', 'test'.
- Summarize the changes at a high level without listing every code modification.
- Use concise bullet points to describe key changes (up to 5 bullets).
- Skip detailed descriptions for cosmetic changes by ruff.
")
```
""" | |
This module provides custom Kedro dataset | |
""" | |
import hashlib | |
import json | |
import logging | |
from pathlib import Path | |
from typing import Any, Dict, Optional, Union | |
from urllib.parse import urlparse |
```python
from kedro.pipeline import Pipeline, node

def create_template_pipeline() -> Pipeline:
    """Template declared here with real inputs, but placeholder outputs and parameters."""
    return Pipeline(
        [
            node(
                func=create_model_inputs,
                inputs=[  # These inputs are never overridden
                    "feat_days_since_last_shutdown",
                    "feat_days_between_shutdown_last_maintenance",
                    "feat_fte_maintenance_hours_last_6m",
```
| Layer | Order | Description |
|---|---|---|
| raw | Sequential | Initial start of the pipeline, containing the sourced data model(s) that should never be changed; it forms your single source of truth to work from. These data models can be un-typed in most cases, e.g. CSV, but this will vary from case to case. Given the relative cost of storage today, painful experience suggests it's safer to never work with the original data directly! |
| intermediate | Sequential | This stage is optional if your data is already typed. Typed representation of the raw layer, e.g. converting string-based values into their correct typed representation as numbers, dates etc. Our recommended approach is to mirror the raw layer in a typed format like Apache Parquet. Avoid transforming the structure of the data, but simple operations like cleaning up field names or 'unioning' multi-part CSVs are permitted. |
| primary | Sequential | |
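The two operations the intermediate layer permits (coercing string values to types, and unioning multi-part CSVs) can be sketched in plain Python. The file contents and the two-column schema below are illustrative assumptions, not part of the convention; a real project would typically do this with pandas and write the result to Parquet.

```python
import csv
import io

# Stand-ins for a multi-part raw CSV export; in practice these would
# be files on disk registered in the raw layer of the catalog.
RAW_PARTS = [
    "equipment_id,runtime_hours\n1,120.5\n2,98.0\n",
    "equipment_id,runtime_hours\n3,310.25\n",
]

def union_and_type(parts):
    """Union multi-part CSV extracts and coerce string values to types."""
    rows = []
    for part in parts:
        for rec in csv.DictReader(io.StringIO(part)):
            rows.append({
                "equipment_id": int(rec["equipment_id"]),      # string -> int
                "runtime_hours": float(rec["runtime_hours"]),  # string -> float
            })
    return rows

print(union_and_type(RAW_PARTS))
```

Note that the structure of the data is untouched: the output has the same columns as the raw extracts, which is exactly the "typed mirror" the table above describes.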