-
-
Save BirgittaHauser/e5872aba85764b8738eac3adfb0e0279 to your computer and use it in GitHub Desktop.
-- Read *csv File from IFS | |
With x as (-- Split IFS File into Rows (at CRLF) | |
Select Ordinal_Position as RowKey, Element as RowInfo | |
from Table(SysTools.Split(Get_Clob_From_File('/home/Hauser/Employee.csv'), x'0D25')) a | |
Where Trim(Element) > ''), | |
y as (-- Split IFS File Rows into Columns (and remove leading/trailing double quotes ") | |
Select x.*, Ordinal_Position ColKey, | |
Trim(B '"' from Element) as ColInfo | |
from x cross join Table(SysTools.Split(RowInfo, ',')) a) | |
-- Return the Result as Table | |
Select RowKey, | |
Min(Case When ColKey = 1 Then ColInfo End) EmployeeNo, | |
Min(Case When ColKey = 2 Then ColInfo End) Name, | |
Min(Case When ColKey = 3 Then ColInfo End) FirstName, | |
Min(Case When ColKey = 4 Then ColInfo End) Address, | |
Min(Case When ColKey = 5 Then ColInfo End) Country, | |
Min(Case When ColKey = 6 Then ColInfo End) ZipCode, | |
Min(Case When ColKey = 7 Then ColInfo End) City | |
From y | |
Where RowKey > 1 -- Remove header | |
Group By RowKey; |
NickLitten just raise your hand, the excuses will line up to find you :)
Every shop has tons of spreadsheets, and we programmers incessantly run the venerable cpyfrmimpf to satisfy the business needs.
I would point out the power of SQL that the tool makes applicable to the analysis of raw user data plus the ability to implement a configurable field mapping and casting as the main reasons to consider this tool.
These statements run in ACS's Run SQL will generate a table function offering an easy to use generic interface to run Birgitta's `code.
CREATE FUNCTION PARSECSV (
@path CHAR(256) , @startwith INTEGER )
RETURNS TABLE (
ROWKEY INTEGER ,
A CHAR(50) ,
B CHAR(50) ,
C CHAR(50) ,
D CHAR(50) ,
E CHAR(50) ,
F CHAR(50) ,
G CHAR(50) ,
H CHAR(50) ,
I CHAR(50) ,
J CHAR(50) ,
K CHAR(50) ,
L CHAR(50) ,
M CHAR(50) ,
N CHAR(50) ,
O CHAR(50) ,
P CHAR(50) ,
Q CHAR(50) ,
R CHAR(50) ,
S CHAR(50) ,
T CHAR(50) ,
U CHAR(50) ,
V CHAR(50) ,
W CHAR(50) ,
X CHAR(50) ,
Y CHAR(50) ,
Z CHAR(50) ,
AA CHAR(50) ,
AB CHAR(50) ,
AC CHAR(50) ,
AD CHAR(50) ,
AE CHAR(50) ,
AF CHAR(50) ,
AG CHAR(50) ,
AH CHAR(50) ,
AI CHAR(50) ,
AJ CHAR(50) ,
AK CHAR(50) ,
AL CHAR(50) ,
AM CHAR(50) ,
AN CHAR(50) ,
AO CHAR(50) ,
AP CHAR(50) ,
AQ CHAR(50) ,
AR CHAR(50) ,
"AS" CHAR(50) ,
"AT" CHAR(50) ,
AU CHAR(50) ,
AV CHAR(50) ,
AW CHAR(50) ,
AX CHAR(50) ,
AY CHAR(50) ,
AZ CHAR(50) ,
BA CHAR(50) ,
BB CHAR(50) ,
BC CHAR(50) ,
BD CHAR(50) ,
BE CHAR(50) ,
BF CHAR(50) ,
BG CHAR(50) ,
BH CHAR(50) ,
BI CHAR(50) ,
BJ CHAR(50) ,
BK CHAR(50) ,
BL CHAR(50) ,
BM CHAR(50) ,
BN CHAR(50) ,
BO CHAR(50) )
LANGUAGE SQL
SPECIFIC PARSECSV
NOT DETERMINISTIC
MODIFIES SQL DATA
CALLED ON NULL INPUT
CONCURRENT ACCESS RESOLUTION DEFAULT
SET OPTION ALWBLK = *ALLREAD ,
ALWCPYDTA = *OPTIMIZE ,
COMMIT = *CS ,
DECRESULT = (31, 31, 00) ,
DYNUSRPRF = *USER ,
SRTSEQ = *HEX
RETURN
WITH X AS ( -- Split IFS File into Rows (at CRLF)
SELECT ORDINAL_POSITION AS ROWKEY , ELEMENT AS ROWINFO
FROM TABLE ( SYSTOOLS . SPLIT ( GET_CLOB_FROM_FILE ( trim(@path) ) , X'0D25' ) ) A
WHERE TRIM ( ELEMENT ) > '' )
,
-- Split IFS File Rows into Columns (and remove leading/trailing double quotes ")
Y AS (
SELECT X . * , ORDINAL_POSITION COLKEY , TRIM ( B '"' FROM ELEMENT ) AS COLINFO
FROM X CROSS
JOIN TABLE ( SYSTOOLS . SPLIT ( ROWINFO , ',' ) ) A
-- ;
)
SELECT ROWKEY , MIN ( CASE WHEN COLKEY = 1 THEN COLINFO END ) A , MIN ( CASE WHEN COLKEY = 2
THEN COLINFO END ) B , MIN ( CASE WHEN COLKEY = 3 THEN COLINFO END ) C , MIN ( CASE
WHEN COLKEY = 4 THEN COLINFO END ) D , MIN ( CASE WHEN COLKEY = 5 THEN COLINFO END )
E , MIN ( CASE WHEN COLKEY = 6 THEN COLINFO END ) F , MIN ( CASE WHEN COLKEY = 7 THEN
COLINFO END ) G , MIN ( CASE WHEN COLKEY = 8 THEN COLINFO END ) H , MIN ( CASE WHEN
COLKEY = 9 THEN COLINFO END ) I , MIN ( CASE WHEN COLKEY = 10 THEN COLINFO END ) J ,
MIN ( CASE WHEN COLKEY = 11 THEN COLINFO END ) K , MIN ( CASE WHEN COLKEY = 12 THEN
COLINFO END ) L , MIN ( CASE WHEN COLKEY = 13 THEN COLINFO END ) M , MIN ( CASE WHEN
COLKEY = 14 THEN COLINFO END ) N , MIN ( CASE WHEN COLKEY = 15 THEN COLINFO END ) O ,
MIN ( CASE WHEN COLKEY = 16 THEN COLINFO END ) P , MIN ( CASE WHEN COLKEY = 17 THEN
COLINFO END ) Q , MIN ( CASE WHEN COLKEY = 18 THEN COLINFO END ) R , MIN ( CASE WHEN
COLKEY = 19 THEN COLINFO END ) S , MIN ( CASE WHEN COLKEY = 20 THEN COLINFO END ) T ,
MIN ( CASE WHEN COLKEY = 21 THEN COLINFO END ) U , MIN ( CASE WHEN COLKEY = 22 THEN
COLINFO END ) V , MIN ( CASE WHEN COLKEY = 23 THEN COLINFO END ) W , MIN ( CASE WHEN
COLKEY = 24 THEN COLINFO END ) X , MIN ( CASE WHEN COLKEY = 25 THEN COLINFO END )
Y , MIN ( CASE WHEN COLKEY = 26 THEN COLINFO END ) Z , MIN ( CASE WHEN COLKEY
= 27 THEN COLINFO END ) AA , MIN ( CASE WHEN COLKEY = 28 THEN COLINFO END )
AB , MIN ( CASE WHEN COLKEY = 29 THEN COLINFO END ) AC , MIN ( CASE WHEN COLKEY = 30
THEN COLINFO END ) AD , MIN ( CASE WHEN COLKEY = 31 THEN COLINFO END ) AE , MIN ( CASE
WHEN COLKEY = 32 THEN COLINFO END ) AF , MIN ( CASE WHEN COLKEY = 33 THEN COLINFO END )
AG , MIN ( CASE WHEN COLKEY = 34 THEN COLINFO END ) AH , MIN ( CASE WHEN COLKEY = 35
THEN COLINFO END ) AI , MIN ( CASE WHEN COLKEY = 36 THEN COLINFO END ) AJ , MIN ( CASE
WHEN COLKEY = 37 THEN COLINFO END ) AK , MIN ( CASE WHEN COLKEY = 38 THEN COLINFO END )
AL , MIN ( CASE WHEN COLKEY = 39 THEN COLINFO END ) AM , MIN ( CASE WHEN COLKEY = 40
THEN COLINFO END ) AN , MIN ( CASE WHEN COLKEY = 41 THEN COLINFO END ) AO , MIN ( CASE
WHEN COLKEY = 42 THEN COLINFO END ) AP , MIN ( CASE WHEN COLKEY = 43 THEN COLINFO
END ) AQ , MIN ( CASE WHEN COLKEY = 44 THEN COLINFO END ) AR , MIN ( CASE WHEN
COLKEY = 45 THEN COLINFO END ) AS "AS", MIN ( CASE WHEN COLKEY = 46 THEN COLINFO
END ) AT , MIN ( CASE WHEN COLKEY = 47 THEN COLINFO END ) AU , MIN ( CASE WHEN COLKEY
= 48 THEN COLINFO END ) AV , MIN ( CASE WHEN COLKEY = 49 THEN COLINFO END ) AW ,
MIN ( CASE WHEN COLKEY = 50 THEN COLINFO END ) AX , MIN ( CASE WHEN COLKEY = 51 THEN
COLINFO END ) AY , MIN ( CASE WHEN COLKEY = 52 THEN COLINFO END ) AZ ,
MIN ( CASE WHEN COLKEY = 53 THEN COLINFO END ) BA , MIN ( CASE WHEN COLKEY = 54 THEN
COLINFO END ) BB , MIN ( CASE WHEN COLKEY = 55 THEN COLINFO END ) BC , MIN ( CASE
WHEN COLKEY = 56 THEN COLINFO END ) BD , MIN ( CASE WHEN COLKEY = 57 THEN COLINFO
END ) BE , MIN ( CASE WHEN COLKEY = 58 THEN COLINFO END ) BF , MIN ( CASE WHEN
COLKEY = 59 THEN COLINFO END ) BG , MIN ( CASE WHEN COLKEY = 60 THEN COLINFO END )
BH , MIN ( CASE WHEN COLKEY = 61 THEN COLINFO END ) BI , MIN ( CASE WHEN
COLKEY = 62 THEN COLINFO END ) BJ , MIN ( CASE WHEN COLKEY = 63 THEN COLINFO END )
BK , MIN ( CASE WHEN COLKEY = 64 THEN COLINFO END ) BL , MIN ( CASE WHEN COLKEY
= 65 THEN COLINFO END ) BM , MIN ( CASE WHEN COLKEY = 66 THEN COLINFO END ) BN ,
MIN ( CASE WHEN COLKEY = 67 THEN COLINFO END ) BO
FROM Y
WHERE ROWKEY > (@startwith - 1) -- Remove header if necessary
GROUP BY ROWKEY
ORDER BY ROWKEY
;
LABEL ON SPECIFIC FUNCTION PARSECSV
IS 'CSV Parser' ;
--Grant permissions as necessary
--GRANT ALTER , EXECUTE
--ON SPECIFIC FUNCTION PARSECSV
--TO WITH GRANT OPTION ;
For example, the function could be run as
select * from table(parsecsv('/somefolder/somefile.csv',1)) t;
Or, if you don't want to have the header row returned, substitute the second parameter with a 2.
Thanks Birgitta. I think Modifies SQL Data would also be required here. Based on the ACS wizard, the default behavior would be Reads SQL Data. Without the Modifies, the system will return SQL0577 for this routine.
Also, just a matter of taste, I would name the columns after their Excel counterparts (i.e., A,B,C,...AA,AB,AC,...) to more easily map them out to the origins.
One needs to be cognizant about certain size limitation split() currently imposes.
Looking at the parameter definition, INPUT_LIST is defined as CLOB(1048576). I have a (relatively) large csv file (40 columns, just under 30K rows, and the size stands at just under 7M). The first CTE from the new UDTF parsecsv() will cut the processing short when run against that file.
The size of the input parm for split() could probably be bumped up to help the size limitation issue, but I would be concerned about the performance impact. The performance of the new UDTF parsecsv() is not too great already.
It seems to me that, while very cool, this routine could only be deployed to run against relatively small files. Birgitta, would you agree, or am I missing anything?
and concised! thanks Birgitta for sharing.