Skip to content

Instantly share code, notes, and snippets.

@jdthorpe
Last active August 9, 2017 16:39
Show Gist options
  • Save jdthorpe/0fd799562a7f421d2aaf62378f6ddba6 to your computer and use it in GitHub Desktop.
Save jdthorpe/0fd799562a7f421d2aaf62378f6ddba6 to your computer and use it in GitHub Desktop.
Returning Multiple Result Sets from a Azure Data Warehouse

Getting data from Azure SQL Data Warehouse into R

TL;DR

This note describes the process of automating the execution of queries against an Azure SQL Data Warehouse using Active Directory Integrated Authentication to produce (possibly multiple) CSV files

Automation with SqlCmd: Which SQLCMD to use?

There are (at least) 3 versions of SQLCMD out there:

  • SQLCMD.exe ships with SQL server, and which comes in both 32-bit and 64-bit versions, which are located here:

     C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\sqlcmd.exe
     C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\sqlcmd.exe
    
  • The Powershell command Invoke-Sqlcmd which is installed with SQL Server Management Studio, and another Invoke-Sqlcmd command which is part of the SQLServer PowerShell Module

The advantages of using the PowerShell commands is that they return .NET objects that represent the returned (and parsed) result sets whereas SQLCMD.exe only returns text output (and cannot return a valid CSV file!). Between the two versions of the invoke-sqlcmd, only the latter one allows the use of connection strings (which is required for using Active Directory Integrated Authentication) and supports the -OutputAs option (with it's alias -as), which allows returning multiple result sets (queries) from a single script, in addition to several other improvements.

Specifically, we'll use the -as DataTables option so that the command will return an array of table objects -- one for each query in the script which makes it easy to handle multiple related queries in a single script.

Hence, you'll want to install the latter invoke-sqlcmd by opening PowerShell as Admin and typing:

install-module -name SqlServer -AllowClobber

(Re-)Installing Active Directory Authentication Library

If you've installed Sql Studio Management Studio, you'll have the 32-bit version of the ADA library (i.e. adalsql.dll), and if you've installed SQL Server Data Tools you'll have the 64-bit version of adalsql.dll installed, but as of this writing, these are buggy, and if you get an error message stating "Unable to load adalsql.dll", then you'll need to go to settings > Add or Remove Programs and then uninstall Active Directory Authentication Library and re-install it from here

Also, by way of background, the 32-bit version of SQLCMD.exe requires the 32-bit adalsql.dll, and all the remaining options require the 64-bit version.

Executing Queries

Finally, to execute a script against an Azure Data Warehouse using Active Directory Authentication, you can use a PowerShell Script like this:

import-module SqlServer

# RUN THE QUERIES
$my_tables = invoke-sqlcmd `
	-Query "SELECT * FROM Customers; SELECT * FROM Products"  `
	-QueryTimeout (60*90) ` # * see note below
	-ConnectionString "..."
-- OR -- 
$my_tables = invoke-sqlcmd  `
	-InputFile path/to/my/queries.sql   `
	-QueryTimeout (60*90) `
	-ConnectionString "..."

# SAVE THE RESULTS TO FILE
$my_tables[0] | 
    Export-Csv -NoTypeInformation `
               -Path "Customers.csv" `
               -Encoding UTF8

$my_tables[1] | 
    Export-Csv -NoTypeInformation `
               -Path "Products.csv" `
               -Encoding UTF8

Where -ConnectionString is a valid SQL Server connection string which includes the AD Authentication option Authentication=Active Directory Integrated;

* Note that there is a bug in both versions of Invoke-Sqlcmd that will cause them to time out in the absence of the -querytimeout parameter (despite the docs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment