Loading Data from Azure Data Lake Store to Azure SQL Server via SSIS
Azure Data Lake store is the storage account in Azure and Azure SQL Server is the SQL Server instance in Azure. Data Lake can store unlimited data for both structed and instructed format and quite often we need to load data from Data Lake to Azure SQL Server to either build data warehouse or just process the data for reporting.
The common approach to load data from Azure Data Lake to Azure SQL Server is using Data Factory to a pipeline and specify the schedule for the pipeline. However, if you are more familiar with SSIS, you can use SSIS package to complete the same task and the procedure is straightforward. After the SSIS package is built, you can deploy it to the Azure SQL server and set the schedule at the On-Premise SQL Server by setting the Azure SQL server as the linked server to the On-Premise Server.
The task can be accomplished in the following 4 steps:
- Created Data Lake Store and Azure SQL Server instance at Azure Portal (can be skipped if they exist already). In the Azure Portal site, you can create an account first. Then click Create a resource -> Data + Analytics to create a new Data Lake Store, a new Data Factory.
Click Create a resource -> Databases to create a new Azure SQL Server. You can simply follow these steps to make these creations.
After complete these creations, you can load a sample file (inputEmp.txt) to your Azure Data Lake Store account (mydatalake123) using the Data Explorer in Data Lake store. The same file looks as below.
- Download and Install Azure Feature Pack to your Visual Studio
Go to https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis to download and install the corresponding Azure Feature Pack for Integration Services (SSIS).
After the installation, you should be able to see Data Lake project option when you try to create a new project in visual studio.
- Build the SSIS package via Visual Studio
I am using Visual Studio 2015 to build the SSIS package. First we need to build the connection to the Azure Data Lake Store. Create a new Integration Services package. In Connection Managers, right click -> New Connection -> Azure Data Lake. Then key in your Data Lake Name, user name and password for the Data Lake Account. Then click Ok.
Next we will build the connection for the Azure SQL Server. Still right click -> New OLEDB Connection -> New -> key in your Azure SQL Server name, user name, password, and the database name. Click Ok.
The start to build the package. For the source, in the toolbox, select Azure Data Lake Store Source and specify the Connection manager, the input file path and name, and delimiter.
For the destination, select OLE DB Destination and specify the Azure SQL Server connection, destination table as below. Then execute the package to make sure it works.
- Create a SSIS integration Runtime in Azure Data Factory
To deploy the package to the Azure SQL server, we need to create a SSIS Integration Runtime via Azure Data Factory. Go to Azure Portal, select the Data Factory you created -> Author & Monitor (you should use Chrome as the browser, otherwise you cannot open Author & Monitor) -> Configure SSIS Integration Runtime -> Integration Runtimes -> New -> Lift-and-shift existing SSIS packages to execute in Azure -> key in the integration run time name, data factory username and password. It will take around 10 mins to create the SSIS run time, so be patient.
Once it is created, the run time should like as follows.
- Deploy the SSIS package to the Azure SQL Server
Start your SSMS, connect to the Azure SQL Server using data engine. Remember to click Options -> key in ‘SSISDB’ in the Connect to database field
Once the Azure SQL server is connected, you can creat a folder under SSISDB (I named it as dataLake). Then right click Project and follow the instructions to deploy the SSIS package we just created here.
Once the SSIS package is deployed. Right click the package.dtsx at the Azure SQL Server to make sure it is executable.
- In your on-premise SQL Server, set the Azure SQL Server as the linked server
At your on-premise SQL Server, run the following scripts to set the Azure SQL Server as a linked server.
After it, you should be able to see the Azure SQL server is shown in the Linked Server node.
- In your on-premise SQL Server, set up a job and call the SSIS package in the Azure SQL Server and set up the job schedule.