Skip to content

Instantly share code, notes, and snippets.

@DecisionNerd
Created November 13, 2015 03:13
Show Gist options
  • Save DecisionNerd/3de707bc656cf757a0cb to your computer and use it in GitHub Desktop.
Save DecisionNerd/3de707bc656cf757a0cb to your computer and use it in GitHub Desktop.
CSV to JSON converter using BASH. Original script from http://blog.secaserver.com/2013/12/convert-csv-json-bash/
#!/bin/bash
# CSV to JSON converter using BASH
# original script from http://blog.secaserver.com/2013/12/convert-csv-json-bash/
# thanks SecaGuy!
# Usage ./csv2json.sh input.csv > output.json
input=$1
[ -z $1 ] && echo "No CSV input file specified" && exit 1
[ ! -e $input ] && echo "Unable to locate $1" && exit 1
read first_line < $input
a=0
headings=`echo $first_line | awk -F, {'print NF'}`
lines=`cat $input | wc -l`
while [ $a -lt $headings ]
do
head_array[$a]=$(echo $first_line | awk -v x=$(($a + 1)) -F"," '{print $x}')
a=$(($a+1))
done
c=0
echo "{"
while [ $c -lt $lines ]
do
read each_line
if [ $c -ne 0 ]; then
d=0
echo -n "{"
while [ $d -lt $headings ]
do
each_element=$(echo $each_line | awk -v y=$(($d + 1)) -F"," '{print $y}')
if [ $d -ne $(($headings-1)) ]; then
echo -n ${head_array[$d]}":"$each_element","
else
echo -n ${head_array[$d]}":"$each_element
fi
d=$(($d+1))
done
if [ $c -eq $(($lines-1)) ]; then
echo "}"
else
echo "},"
fi
fi
c=$(($c+1))
done < $input
echo "}"
@grmpfhmbl
Copy link

grmpfhmbl commented Sep 14, 2018

@linosteenkamp's version does not work with CSV that contain quoted comma (",") e.g. printf "head1,head2,head3\n1,\"foo, bar, baz\",\"foo bar baz\"" | ./csv2json.sh will result in

[
        {
                "head1": 1,
                "head2": ""foo",
                "head3": " bar"
                "": " baz""
                "": "foo bar baz"
        }
]

Quick fix for @outwitevil's script (https://gist.github.com/dsliberty/3de707bc656cf757a0cb#gistcomment-2103308) is to replace the \r in the sed regex with $(printf '\r'). The script will still struggle with empty lines, so you have to delete them beforehand. A simple one-liner

printf "head1,head2,head3\n\n\n1,\"foo, bar, baz\",\"foo bar baz\"\n\n" | sed '/^[[:space:]]*$/d' | ./csv2json.sh
[
    {
        "head1": 1,
        "head2": "foo, bar, baz",
        "head3": "foo bar baz"
    }
]

I haven't checked if there are any side effects on Linux now.

#!/bin/bash

# CSV to JSON converter using BASH
# original script from https://gist.github.com/dsliberty/3de707bc656cf757a0cb
# Usage ./csv2json.sh input.csv > output.json
#       cat <input.csv> | csv2json > output.json
#set -x
shopt -s extglob

input="${1:-/dev/stdin}"

SEP=","

[ -z "${input}" ] && echo "No CSV input file specified" && exit 1
[ ! -e "${input}" ] && echo "Unable to locate ${input}" && exit 1

csv_nextField()
{
    local line="$(echo "${1}" | sed 's/$(printf '\r')//g')"
    local start=0
    local stop=0

    if [[ -z "${line}" ]]; then
        return 0
    fi

    local offset=0
    local inQuotes=0
    while [[ -n "${line}" ]]; do
        local char="${line:0:1}"
        line="${line:1}"

        if [[ "${char}" == "${SEP}" && ${inQuotes} -eq 0 ]]; then
            inQuotes=0
            break
        elif [[ "${char}" == '"' ]]; then
            if [[ ${inQuotes} -eq 1 ]]; then
                inQuotes=0
            else
                inQuotes=1
            fi
        else
            echo -n "${char}"
        fi
        offset=$(( ${offset} + 1 ))
    done

    echo ""
    return $(( ${offset} + 1 ))
}

read -r first_line < "${input}"
a=0
headings=$(echo "${first_line}" | awk -F"${SEP}" {'print NF'})

if [ "${input}" = "/dev/stdin" ]; then
  while read -r line
  do
    lines_str+="$line"$'\n'
    c=1
  done < "${input}"
else
  lines_str="$(cat "${input}")"
  c=0
fi

lines_num=$(echo "${lines_str}" | wc -l)

while [[ ${a} -lt ${headings} ]]; do
    field="$(csv_nextField "${first_line}")"
    first_line="${first_line:${?}}"
    head_array[${a}]="${field}"
    a=$(( ${a} + 1 ))
done

#c=0
echo "["
while [ ${c} -lt ${lines_num} ]
do
    read -r each_line
    each_line="$(echo "${each_line}" | sed 's/$(printf '\r')//g')"

    if [[ ${c} -eq 0 ]]; then
        c=$(( ${c} + 1 ))
    else
        d=0
        echo "    {"
        while [[ ${d} -lt ${headings} ]]; do
            item="$(csv_nextField "${each_line}")"
            each_line="${each_line:${?}}"
            echo -n "        \"${head_array[${d}]}\": "
            case "${item}" in
                "")
                    echo -n "null"
                    ;;
                null|true|false|\"*\"|+([0123456789]))
                    echo -n ${item}
                    ;;
                *)
                    echo -n "\"${item}\""
                    ;;
            esac
            d=$(( ${d} + 1 ))
            [[ ${d} -lt ${headings} ]] && echo "," || echo ""
        done

        echo -n "    }"

        c=$(( ${c} + 1 ))
        [[ ${c} -lt ${lines_num} ]] && echo "," || echo ""
    fi

done <<< "${lines_str}"
echo "]"

@onkar-cliqr
Copy link

#!/bin/bash

CSV to JSON converter using BASH

Usage ./csv2json input.csv > output.json

input=$1
[ -z $1 ] && echo "No CSV input file specified" && exit 1
[ ! -e $input ] && echo "Unable to locate $1" && exit 1
read first_line < $input
a=0
headings=echo $first_line | awk -F, {'print NF'}
lines=cat $input | wc -l
while [ $a -lt $headings ]
do
head_array[$a]=$(echo $first_line | awk -v x=$(($a + 1)) -F"," '{print $x}' | tr -d '\r')
a=$(($a+1))
done
c=0
echo "["
while [ $c -le $lines ]
do
read each_line
if [ $c -ne 0 ]; then
d=0
echo -n "{"
while [ $d -lt $headings ]
do
each_element=$(echo $each_line | awk -v y=$(($d + 1)) -F"," '{print $y}' | tr -d '\r')
if [ $d -ne $(($headings-1)) ]; then
echo -n ""${head_array[$d]}":"$each_element","
else
echo -n ""${head_array[$d]}":"$each_element""
fi
d=$(($d+1))
done
if [ $c -eq $(($lines)) ]; then
echo "}"
else
echo "},"
fi
fi
c=$(($c+1))
done < $input
echo "]"

This should give with quatation and array of json objects

@jamie-cole70
Copy link

Question, the script runs fine but does not output a json file?

@idzob
Copy link

idzob commented Aug 5, 2020

Problem if field value have more than 254 characters.After that field every other field will the same

@mrabino1
Copy link

I have a field that has the following value
"doc":0000000000000000000000000000000000000000000000000000000000000000,
what is interesting is that the all zeros is failing to be parsed by JSON tools ...

they either want a 0 or a "0000000000000000000000000000000000000000000000000000000000000000"

Is there a way that we can put quotes around all values even if they are numbers? or is that outside the accepted formatting of JSON?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment