Last active November 25, 2020 18:15
Check for Datasets not being used in any Pipeline in Azure Data Factory. Just clone the repo, set repo_path and dataFactory_name, and run!
import os

# set repo location and ADF name
repo_path = "D:/Work/Python/Azure"
dataFactory_name = "Highwayman-ADFv2"
full_path = "/".join([repo_path, dataFactory_name])

# get list of dataset and pipeline files
datasets = os.listdir("{}/dataset".format(full_path))
pipelines = os.listdir("{}/pipeline".format(full_path))

# just the name of the dataset, without .json
# (splitext keeps names that themselves contain dots intact)
list_datasets = [os.path.splitext(x)[0] for x in datasets]

# iterate over pipelines looking for dataset names
for pipeline in pipelines:
    # read the whole pipeline definition; the with block closes the file
    with open("{}/pipeline/{}".format(full_path, pipeline), "r", encoding="utf-8") as f:
        stringPipe = f.read()
    for o in range(len(list_datasets)):
        # if the dataset is being used, replace it with ''
        # note: this is a plain substring match, so a name that is
        # contained in a longer dataset name also counts as used
        if list_datasets[o] in stringPipe:
            list_datasets[o] = ""

# remove used datasets from the list
list_datasets = [s for s in list_datasets if s != ""]

# datasets not removed are the ones not being used
print("These are your datasets that are not being used.")
print(list_datasets)
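The script matches dataset names as plain substrings of the pipeline text, so a dataset named `Sales` would count as used whenever `SalesArchive` appears in a pipeline. A stricter variant could parse each pipeline file as JSON and collect only objects that look like dataset references. The sketch below assumes references have the shape `{"referenceName": "...", "type": "DatasetReference"}` and that each dataset's file name equals its name, as the script above also assumes. The helper names `collect_references` and `unused_datasets` are hypothetical and not part of the gist.

```python
import json
import os


def collect_references(node, ref_type, found):
    """Recursively gather referenceName values from objects whose type equals ref_type."""
    if isinstance(node, dict):
        name = node.get("referenceName")
        # Only plain string names are collected; anything else is skipped.
        if node.get("type") == ref_type and isinstance(name, str):
            found.add(name)
        for value in node.values():
            collect_references(value, ref_type, found)
    elif isinstance(node, list):
        for item in node:
            collect_references(item, ref_type, found)
    return found


def unused_datasets(full_path):
    """Return dataset names never referenced by any pipeline file under full_path."""
    dataset_dir = os.path.join(full_path, "dataset")
    pipeline_dir = os.path.join(full_path, "pipeline")
    names = {os.path.splitext(f)[0] for f in os.listdir(dataset_dir) if f.endswith(".json")}
    used = set()
    for file_name in os.listdir(pipeline_dir):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(pipeline_dir, file_name), "r", encoding="utf-8") as f:
            collect_references(json.load(f), "DatasetReference", used)
    return sorted(names - used)
```

Calling `unused_datasets(full_path)` with the same `full_path` as in the script returns a sorted list instead of printing it. Datasets referenced only from places other than pipelines (for example data flows, if the factory has any) would still be reported, so the result is best treated as a list of candidates to review.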
Absolutely! As I said on Twitter, it can also be applied to linked services by iterating over the datasets. I love your module! I've been trying to find some time to study it and contribute :)
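A rough sketch of that linked-service idea, under a few assumptions: the repo has a `linkedService` folder next to `dataset`, each file name equals the linked service name, and a dataset points at its linked service through `properties.linkedServiceName.referenceName`. The function name `unused_linked_services` is hypothetical.

```python
import json
import os


def unused_linked_services(full_path):
    """Return linked service names that no dataset file under full_path refers to."""
    linked_dir = os.path.join(full_path, "linkedService")  # assumed folder name
    dataset_dir = os.path.join(full_path, "dataset")
    names = {os.path.splitext(f)[0] for f in os.listdir(linked_dir) if f.endswith(".json")}
    used = set()
    for file_name in os.listdir(dataset_dir):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(dataset_dir, file_name), "r", encoding="utf-8") as f:
            dataset = json.load(f)
        # Assumed location of the reference inside a dataset definition.
        ref = dataset.get("properties", {}).get("linkedServiceName", {})
        if isinstance(ref, dict) and isinstance(ref.get("referenceName"), str):
            used.add(ref["referenceName"])
    return sorted(names - used)
```

This only looks at datasets, as described in the comment. A linked service may also be referenced from elsewhere in the factory, so anything this returns is a candidate to check by hand rather than something safe to delete.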
Thanks, and please do. If you want, I can leave this feature for you to contribute and implement. No rush.
Great, I'll try my best to rewrite this in PowerShell and make it work like you describe in the issue.
Good stuff! I think it is a very interesting idea, so I will use it to extend my PS module:
Azure-Player/azure.datafactory.tools#37