New example. I used a \t (tab) to separate the new column from the old text, but you can use whatever you want.
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
my @new_column = (
    "Lineage",
    "cellular organisms; XXX;",
    "cellular organisms; YYY;",
);
open my $INFILE, '<', 'data.txt'; #Standard way to open a file for reading (autodie above handles errors opening file)
open my $OUTFILE, '>', 'new_data.txt'; #Standard way to open a file for writing
my $i = 0;
while (my $line = <$INFILE>) { #Read the file line by line.
    chomp $line; #Remove the newline at the end of the line.
    say {$OUTFILE} "$line\t$new_column[$i]"; #Write to the file. $new_column[0] is the first element in @new_column.
    ++$i;
}
close $INFILE;
close $OUTFILE;
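One caveat with the script above: if data.txt has more lines than @new_column has elements, $new_column[$i] is undef and Perl warns about an uninitialized value. Here is a minimal sketch of one way to guard against that, assuming an empty field is an acceptable filler. The temp directory and the three-line input are made up so the example can run on its own.

```perl
use strict;
use warnings;
use 5.020;
use autodie;
use File::Temp qw(tempdir);

# Hypothetical demo data: three input lines but only two column values.
my $dir = tempdir(CLEANUP => 1);
my ($in_path, $out_path) = ("$dir/data.txt", "$dir/new_data.txt");

open my $SETUP, '>', $in_path;
print {$SETUP} "Taxon Id\n9606\n9483\n";
close $SETUP;

my @new_column = ("Lineage", "cellular organisms; XXX;");

open my $INFILE,  '<', $in_path;
open my $OUTFILE, '>', $out_path;

my $i = 0;
while (my $line = <$INFILE>) {
    chomp $line;
    # The defined-or operator (//) substitutes an empty string
    # once @new_column has run out of elements.
    my $extra = $new_column[$i] // '';
    say {$OUTFILE} "$line\t$extra";
    ++$i;
}

close $INFILE;
close $OUTFILE;

# Read the result back to show what was written.
open my $CHECK, '<', $out_path;
my @result = <$CHECK>;
close $CHECK;
print @result;
```

If a missing value should be treated as an error instead, you could replace the `// ''` with a `die` that reports the line number.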
Perl has facilities that allow you (seemingly) to edit a file in place, much like sed -i or gawk -i inplace. If you supply a backup extension, a backup file is created first, so you don't lose your original data if perl crashes while you are editing the file. By contrast, if you open a file in write mode yourself, the file is erased immediately, and if perl then crashes, goodbye data.
The perl variable $^I is set to undef by default. If you set $^I to a string, you turn on inplace editing, and that string is used as the extension of the backup file (an empty string means no backup file). After turning on inplace editing, it takes a few more steps to set things up properly. Once things are set up, any output sent to the default output handle (a plain print or say with no filehandle) gets written to the file.
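The word "seemingly" is there because the original file is not modified byte by byte. Roughly speaking, perl moves the original aside, writes a new file under the old name, and makes that new file the default output handle. The sketch below does those steps by hand to show the idea. It is an approximation, not perl's actual implementation (the details differ between Perl versions), and the scratch file and the "EXTRA" column are made up for the demonstration.

```perl
use strict;
use warnings;
use 5.020;
use autodie;
use File::Temp qw(tempdir);

# Hypothetical scratch file so the demo does not touch real data.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";

open my $SETUP, '>', $file;
print {$SETUP} "first\nsecond\n";
close $SETUP;

# Step 1: move the original out of the way; this becomes the backup.
rename $file, "$file.bak";

# Step 2: read from the backup, write a brand-new file under the old name.
open my $IN,  '<', "$file.bak";
open my $OUT, '>', $file;

# Step 3: make $OUT the default handle, so a bare print/say goes to the file.
my $old_handle = select $OUT;
while (my $line = <$IN>) {
    chomp $line;
    say "$line\tEXTRA";    # no filehandle given, so this goes to $OUT
}
select $old_handle;        # restore the previous default (normally STDOUT)

close $IN;
close $OUT;

# Read the rewritten file back to show the result.
open my $CHECK, '<', $file;
my @result = <$CHECK>;
close $CHECK;
print @result;
```

Setting $^I and reading with the diamond operator gets perl to do this bookkeeping for you.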
Starting file:
$ cat data.txt
Taxon Id Common Name Scientific Name
9606 human Homo sapiens
9483 white-tufted-ear marmoset Callithrix jacchus
Perl script:
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
my @new_column = (
    "Lineage",
    "cellular organisms; XXX;",
    "cellular organisms; YYY;",
);
{
    local $^I = ".bak"; #Blank string for no backup file.
    local @ARGV = "data.txt"; ####REQUIRED. Cannot use: while (my $line = <$INFILE>)
                              ## and get inplace editing
    my $i = 0;
    while (my $line = <>) { #The "diamond" operator reads from the files specified in @ARGV
        chomp $line;
        say "$line\t$new_column[$i]"; #Written to file.
        ++$i;
    }
}
Ending file:
$ cat data.txt
Taxon Id Common Name Scientific Name Lineage
9606 human Homo sapiens cellular organisms; XXX;
9483 white-tufted-ear marmoset Callithrix jacchus cellular organisms; YYY;
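One detail that helps when debugging: while inplace editing is active, only output that goes to the default handle (a bare print or say) lands in the file. Output sent explicitly to STDERR or STDOUT still reaches the terminal, so you can use it for progress messages. Here is a small self-contained sketch of that, using a made-up scratch file and a made-up "checked" column. It also reads the .bak file back to confirm the original content was preserved.

```perl
use strict;
use warnings;
use 5.020;
use autodie;
use File::Temp qw(tempdir);

# Hypothetical scratch file so the demo does not touch real data.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";

open my $SETUP, '>', $file;
print {$SETUP} "9606\thuman\n9483\tmarmoset\n";
close $SETUP;

{
    local $^I   = '.bak';    # turn on inplace editing, keep a backup
    local @ARGV = ($file);   # the file(s) the diamond operator will read

    while (my $line = <>) {
        chomp $line;
        say "$line\tchecked";               # default handle: written to the file
        say {*STDERR} "processed line $.";  # explicit handle: shown on the terminal
    }
}

# Read back both the edited file and the backup.
open my $CHECK, '<', $file;
my @result = <$CHECK>;
close $CHECK;

open my $BACKUP, '<', "$file.bak";
my @backup = <$BACKUP>;
close $BACKUP;
```

Here $. is perl's built-in line counter for the most recently read filehandle.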
Thank you very much for your help! Actually, I will create a new file to store the combined data. I don't quite understand the use of $^I in your code; could you please explain?