Created January 30, 2020 05:18
Read *.csv File directly with SQL
-- Read a *.csv file from the IFS
With x as (-- Split the IFS file into rows (at CRLF)
     Select Ordinal_Position as RowKey, Element as RowInfo
       from Table(SysTools.Split(Get_Clob_From_File('/home/Hauser/Employee.csv'), x'0D25')) a
       Where Trim(Element) > ''),
     y as (-- Split the rows into columns (and remove leading/trailing double quotes ")
     Select x.*, Ordinal_Position ColKey,
            Trim(Both '"' from Element) as ColInfo
       from x cross join Table(SysTools.Split(RowInfo, ',')) a)
-- Return the result as a table
Select RowKey,
       Min(Case When ColKey = 1 Then ColInfo End) EmployeeNo,
       Min(Case When ColKey = 2 Then ColInfo End) Name,
       Min(Case When ColKey = 3 Then ColInfo End) FirstName,
       Min(Case When ColKey = 4 Then ColInfo End) Address,
       Min(Case When ColKey = 5 Then ColInfo End) Country,
       Min(Case When ColKey = 6 Then ColInfo End) ZipCode,
       Min(Case When ColKey = 7 Then ColInfo End) City
  From y
  Where RowKey > 1 -- Remove the header row
  Group By RowKey;
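For readers who want to trace the logic outside of Db2, here is a minimal Python sketch of the same technique: split the file contents into rows at the line delimiter (the ASCII counterpart of the EBCDIC x'0D25' used above), split each row into columns at the comma, strip surrounding double quotes, and drop the header row. The column names match the Employee.csv layout assumed in the SQL; like the naive comma split in SysTools.Split, this does not handle commas embedded inside quoted fields.

```python
# Sketch of the SQL above: row split, column split, quote trim, header skip.
COLUMNS = ["EmployeeNo", "Name", "FirstName", "Address",
           "Country", "ZipCode", "City"]

def parse_csv(text):
    rows = []
    for line in text.split("\r\n"):        # mirrors: Split(..., x'0D25')
        if not line.strip():               # mirrors: Where Trim(Element) > ''
            continue
        cells = [c.strip('"') for c in line.split(",")]  # mirrors: Trim(Both '"' ...)
        rows.append(dict(zip(COLUMNS, cells)))
    return rows[1:]                        # mirrors: Where RowKey > 1

sample = ('EmployeeNo,Name,FirstName,Address,Country,ZipCode,City\r\n'
          '"1","Hauser","Birgitta","Somestreet 1","DE","12345","Kassel"\r\n')
print(parse_csv(sample)[0]["Name"])        # -> Hauser
```

The GROUP BY with MIN(CASE ...) in the SQL does the same pivot that `dict(zip(COLUMNS, cells))` does here: it turns one row per cell into one row per record with named columns.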
One needs to be cognizant of the size limitation split() currently imposes.
Looking at the parameter definition, INPUT_LIST is defined as CLOB(1048576). I have a (relatively) large CSV file (40 columns, just under 30K rows, just under 7 MB in size). The first CTE, like the new UDTF parsecsv(), will cut processing short when run against that file.
The size of the input parameter for split() could probably be bumped up to help with the size limitation, but I would be concerned about the performance impact. The performance of the new UDTF parsecsv() is not great as it is.
It seems to me that, while very cool, this routine can only be deployed against relatively small files. Birgitta, would you agree, or am I missing something?
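The arithmetic behind the comment above: CLOB(1048576) caps split()'s input at 1 MiB, so a file just under 7 MB exceeds it roughly sevenfold. A quick pre-check of the stream file's size makes the cutoff explicit. This is a hedged sketch: the constant mirrors the quoted parameter definition, the helper name is mine (not an IBM API), and byte size only approximates the CLOB character length for single-byte CCSIDs.

```python
import os

SPLIT_CLOB_LIMIT = 1_048_576  # CLOB(1048576), i.e. 1 MiB, per the parameter definition

def fits_split_limit(path):
    """True if the stream file is small enough to pass to split() in one piece."""
    return os.path.getsize(path) <= SPLIT_CLOB_LIMIT
```

A ~7 MB file fails this check, so it would need to be processed in chunks or imported with a different tool (e.g. the CPYFRMIMPF CL command) rather than through split().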