Method to randomize column order in a CSV using Bash

I received a question about how to randomize the column order in a text file. I came up with a method that uses common Unix command-line tools (sort, sed, tr, join) and the Bash shell. It has been tested with the BSD command-line tools on macOS 12 (running zsh) and with the GNU command-line tools on CentOS (running bash).

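NOTE: this does not handle quoted fields that contain commas; the only commas in the file should be the delimiters. Also, because join matches rows on their first field, the first column should not contain repeated values: rows that share a first-column value can be repeated in the output.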

Randomize all columns

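NUM_COLS=10                 # number of columns in the input
NUM_RANDOMIZED_OUTPUT=5     # number of shuffled copies to write
INPUT_FILE=input.csv
OUTPUT_PREFIX=output

# Write one random column order per line, e.g. 1.3,1.8,1.1,...
# ("1.N" means field N of file 1 in join's -o syntax).
seq 1 ${NUM_COLS} > cols
for x in $(seq 1 "${NUM_RANDOMIZED_OUTPUT}"); do
  sort --random-sort cols | sed 's/^/1./' | tr "\n" , | sed 's/,$//'
  echo
done > randomized_col_order

# Join the input with itself on field 1 and print the fields
# in the order given on line y of randomized_col_order.
for y in $(seq 1 "${NUM_RANDOMIZED_OUTPUT}"); do
  current_order=$(sed -n "${y}p" randomized_col_order)
  join -t, -o "${current_order}" "${INPUT_FILE}" "${INPUT_FILE}" > "${OUTPUT_PREFIX}_${y}"
done

rm cols randomized_col_order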
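The reordering relies on joining the file with itself: both copies match on field 1, and -o lists which fields of file 1 to print and in what order. A minimal sketch on a hypothetical three-column file demo.csv (not part of the original) whose first-column values are unique:

```shell
#!/usr/bin/env bash
# Hypothetical three-column file; the first-column values are unique.
printf 'a,b,c\nd,e,f\n' > demo.csv

# Join demo.csv with itself on field 1 and print fields 3, 1, 2 of file 1.
join -t, -o 1.3,1.1,1.2 demo.csv demo.csv
# prints:
# c,a,b
# f,d,e
```

Each line of randomized_col_order in the script is simply a longer, shuffled version of the 1.3,1.1,1.2 list.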

Example output

input.csv:

1,2,3,4,5,6,7,8,9,10
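Rather than hard-coding NUM_COLS=10, the column count can be read from the first line of the input. A minimal sketch, assuming every row has as many fields as the first one (and no quoted commas); it also recreates the example input.csv, so run it in a scratch directory:

```shell
#!/usr/bin/env bash
INPUT_FILE=input.csv

# Recreate the example input: the numbers 1-10 on one comma-separated line.
seq 1 10 | paste -s -d, - > "${INPUT_FILE}"

# Print the number of comma-separated fields on line 1, then stop reading.
NUM_COLS=$(awk -F, 'NR == 1 { print NF; exit }' "${INPUT_FILE}")
echo "${NUM_COLS}"   # prints 10
```

awk is used here only to count fields; it is an addition, not one of the tools in the original method.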

Output files:

ls output_*
output_1
output_2
output_3
output_4
output_5

cat output_*
3,8,1,9,2,4,6,5,10,7
3,1,7,4,10,9,6,2,8,5
6,1,10,8,7,3,5,2,4,9
5,3,2,4,7,6,1,9,8,10
6,4,3,2,7,9,8,1,5,10

Randomize columns but keep the first column in place

This version keeps the first column in its original position and randomizes only the rest, which is useful when the first column is an identifier. Two changes to the code are needed: seq 1 ${NUM_COLS} > cols becomes seq 2 ${NUM_COLS} > cols, and echo -n "1.1," is printed at the start of each line of randomized_col_order.
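The order string for one output file, with column 1 pinned, can also be sketched as a single command group. In this sketch printf replaces echo -n (echo -n is not portable to every sh) and paste -s -d, - replaces the tr/sed pair; both are substitutions, not part of the original script:

```shell
#!/usr/bin/env bash
NUM_COLS=10

# Build one order string: "1.1" first, then columns 2..NUM_COLS shuffled,
# each written as "1.N" (field N of file 1 in join's -o syntax).
order=$(
  printf '1.1,'
  seq 2 "${NUM_COLS}" | sort --random-sort | sed 's/^/1./' | paste -s -d, -
)
echo "${order}"
```

Running it once per output file, as the loop in the script does, gives a fresh shuffle each time.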

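NUM_COLS=10                 # number of columns in the input
NUM_RANDOMIZED_OUTPUT=5     # number of shuffled copies to write
INPUT_FILE=input.csv
OUTPUT_PREFIX=output

# Shuffle only columns 2..NUM_COLS; column 1 ("1.1") is always printed first.
seq 2 ${NUM_COLS} > cols
for x in $(seq 1 "${NUM_RANDOMIZED_OUTPUT}"); do
  echo -n "1.1,"
  sort --random-sort cols | sed 's/^/1./' | tr "\n" , | sed 's/,$//'
  echo
done > randomized_col_order

# Join the input with itself on field 1 and print the fields
# in the order given on line y of randomized_col_order.
for y in $(seq 1 "${NUM_RANDOMIZED_OUTPUT}"); do
  current_order=$(sed -n "${y}p" randomized_col_order)
  join -t, -o "${current_order}" "${INPUT_FILE}" "${INPUT_FILE}" > "${OUTPUT_PREFIX}_${y}"
done

rm cols randomized_col_order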

Example output

cat output_*
1,3,4,6,2,5,10,9,7,8
1,5,7,10,8,4,6,2,9,3
1,2,4,7,3,5,10,9,8,6
1,6,2,3,7,4,5,8,9,10
1,6,3,7,9,4,5,2,8,10
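Because join pairs rows by their first field, rows that repeat a first-column value can come out repeated: on the hypothetical dup.csv below, join -t, -o 1.3,1.1,1.2 dup.csv dup.csv prints five rows instead of three. A sketch of a swapped-in alternative (awk, not the original join method) that reorders fields row by row and so keeps the row count unchanged. ORDER, dup.csv and reordered.csv are hypothetical names, and ORDER uses plain field numbers rather than join's 1.N form:

```shell
#!/usr/bin/env bash
# Hypothetical input whose first column repeats a value.
printf 'a,1,x\na,2,y\nb,3,z\n' > dup.csv

# Desired column order as plain field numbers (no "1." prefix).
ORDER=3,1,2

# Split ORDER into an array once, then print each row's fields in that order.
awk -F, -v OFS=, -v order="${ORDER}" '
  BEGIN { n = split(order, idx, ",") }
  {
    line = $(idx[1])
    for (i = 2; i <= n; i++) line = line OFS $(idx[i])
    print line
  }' dup.csv > reordered.csv

cat reordered.csv
# prints:
# x,a,1
# y,a,2
# z,b,3
```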

mkweskin/randomize_columns.md
