This demo loads a Pandas DataFrame into memory from a CSV file that resides on Azure Data Lake. It is mostly meant as a proof of concept for use in something like a Jupyter notebook, for example during exploratory data analysis. The demo uses an interactive login, but you can also authenticate with lib.auth() using a Service Principal or a username/password/tenant ID combination.
Created September 10, 2019 17:01
Pandas read_csv from Azure Data Lake with interactive login
from azure.datalake.store import core, lib, multithread
import pandas as pd


class ADLSHelper:
    def __init__(self, store_name='mystorename'):
        """
        When initializing this helper, it will prompt you to do an interactive login to connect to your data lake store.
        It uses Azure Active Directory for authentication, and you use the token returned from
        your login process to connect to your Azure Data Lake instance.
        You can also authenticate with username/password or ServicePrincipal for production.
        """
        token = lib.auth()
        self.client = core.AzureDLFileSystem(token, store_name=store_name)

    def get_df(self, dataframe_path):
        """
        Reads the Pandas Dataframe from your Azure Data Lake instance at the given path.
        Dataframe is loaded into memory, not saved to disk.
        """
        with self.client.open(dataframe_path) as dataframe_file_ptr:
            df = pd.read_csv(dataframe_file_ptr)
        return df


# Example
helper = ADLSHelper(store_name='dog_facts_fake_datalake')
# At this point you'll be asked to log in. You click a link to go to a separate screen and input a unique code generated here.
# Once you've connected, you can get the dataframe like so:
df = helper.get_df('/archive/interesting_facts/dog_facts.csv')
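
The description mentions that lib.auth() can also take Service Principal credentials instead of prompting for an interactive login. Below is a minimal sketch of that variant, not part of the original gist. The environment variable names (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET) and the helper names service_principal_kwargs and connect_with_service_principal are assumptions chosen for illustration; the tenant_id, client_id and client_secret keyword arguments are the ones lib.auth() accepts.

```python
import os

# Hypothetical environment variable names; substitute whatever your deployment defines.
ENV_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def service_principal_kwargs(env=None):
    """Build the keyword arguments for lib.auth() from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in ENV_VARS.items()}


def connect_with_service_principal(store_name, env=None):
    """Return an AzureDLFileSystem client without an interactive login."""
    # Imported lazily so the credential helper above works without the Azure SDK installed.
    from azure.datalake.store import core, lib

    token = lib.auth(**service_principal_kwargs(env))
    return core.AzureDLFileSystem(token, store_name=store_name)
```

The returned client can be used the same way as self.client in ADLSHelper.get_df, for example by passing client.open(path) to pd.read_csv. Keeping the secret in the environment rather than in the notebook avoids committing credentials alongside the code.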