@7stud
Last active February 10, 2018 18:17
Inplace editing in perl

New example, which writes the combined data to a new file instead of editing the original. I used a \t (tab) to separate the new column from the old text, but you can use whatever separator you want.

use strict;
use warnings; 
use 5.020;
use autodie;
use Data::Dumper;

my @new_column = (
    "Lineage",
    "cellular organisms; XXX;",
    "cellular organisms; YYY;",
);

open my $INFILE, '<', 'data.txt';   #Standard way to open a file for reading (autodie above handles errors opening file)
open my $OUTFILE, '>', 'new_data.txt';   #Standard way to open a file for writing 

my $i = 0;

while(my $line = <$INFILE>) {   #Read the file line by line.
    chomp $line;  #Remove the newline at the end of the line.
    say {$OUTFILE} "$line\t$new_column[$i]";   #Write to the file.  $new_column[0] is the first element in @new_column.
    ++$i;
}

close $INFILE;
close $OUTFILE;

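Perl has facilities that allow you (seemingly) to do inplace editing of a file, much like sed -i or gawk -i inplace. I say "seemingly" because perl does not really modify the file where it sits: your output goes to a new file that ends up with the original name. If you ask for a backup file, the original data is kept under the backup name, so you don't lose it if perl crashes while you are editing the file. By contrast, if you open a file in write mode yourself, the file is erased immediately, and if perl then crashes, goodbye data.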

The perl variable $^I is set to undef by default. If you set $^I to a string, then you turn on inplace editing. After turning on inplace editing, it takes a few more steps to set things up properly (see the script below). After things are set up for inplace editing, any output sent to the default output handle, i.e. a plain print or say with no filehandle, gets written to the file instead of to stdout.

Starting file:

$ cat data.txt
Taxon Id        Common Name     Scientific Name
9606    human   Homo sapiens
9483    white-tufted-ear marmoset       Callithrix jacchus

Perl script:

use strict;
use warnings; 
use 5.020;
use autodie;
use Data::Dumper;

my @new_column = (
    "Lineage",
    "cellular organisms; XXX;",
    "cellular organisms; YYY;",
);

{
    local $^I = ".bak";         #Turns on inplace editing; the original is saved as data.txt.bak.  Use "" for no backup file.
    local @ARGV = ("data.txt"); ####REQUIRED.  Cannot use: while (my $line = <$INFILE>)
                                ##             and get inplace editing

    my $i = 0;

    while (my $line = <>) {   #The "diamond" operator reads from the files specified in @ARGV
        chomp $line;
        say "$line\t$new_column[$i]";   #Written to file.
        ++$i;
    }
}

Ending file:

$ cat data.txt
Taxon Id        Common Name     Scientific Name	Lineage
9606    human   Homo sapiens	cellular organisms; XXX;
9483    white-tufted-ear marmoset       Callithrix jacchus	cellular organisms; YYY;
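
To make "output gets written to the file" a bit more concrete, here is a minimal sketch. It assumes a throwaway file created in a temporary directory (the file name and its contents are made up for the demonstration). A plain say lands in the file, while output sent to an explicitly named handle such as STDERR still goes to the terminal. Once <> has read the last line, plain print goes back to stdout.

```perl
use strict;
use warnings;
use 5.020;
use autodie;
use File::Temp qw(tempdir);

# Work in a scratch directory so no real data is touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";   # hypothetical file, created just for this demo

open my $OUT, '>', $file;
print {$OUT} "9606\thuman\n9483\tmarmoset\n";
close $OUT;

{
    local $^I   = '.bak';     # inplace editing on, backup in data.txt.bak
    local @ARGV = ($file);

    while (my $line = <>) {
        chomp $line;
        say "$line\tnew";                 # no filehandle: written to data.txt
        say STDERR "processed line $.";   # explicit handle: shown on the terminal
    }
}

# The loop is finished, so plain print goes to stdout again.
open my $IN, '<', $file;
my @lines = <$IN>;
close $IN;
print @lines;
```

The final print shows the edited contents of the file, with the extra column appended to each line.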

@vincent507cpu

Thank you very much for your help! Actually I will create a new file to store the combined data. I don't quite understand the use of $^I in your code, could you please explain?

@7stud
Author

7stud commented Feb 7, 2018

Perl has lots of predefined global variables that are used for a variety of different purposes. One of perl's predefined global variables is $^I. I explained what it does in the second paragraph. But, if you are going to create a new file for the modified data, then forget about $^I. I'll post another example instead.

@vincent507cpu

I don't understand local $^I = ".bak"; #Blank string for no backup file., particularly the comment. (Sorry, English is my second language) If you don't need to set up a backup file, why do you write down local $^I = ".bak"? Just to turn on inplace editing?

@7stud
Author

7stud commented Feb 8, 2018

You said that you wanted to write the modified data to a new file. In that case, inplace editing is irrelevant, and it is not something you need to know about. So, look at the new example I posted.

If you want to learn about inplace editing anyway, then there are three possibilities:

  1. The "diamond" operator will not do inplace editing: $^I = undef; (the default)
  2. The "diamond" operator will do inplace editing, but not create a backup file: $^I = '';
  3. The "diamond" operator will do inplace editing, and create a backup file: $^I = '.bak';

If you assign any string to $^I, it turns on inplace editing.
If you assign a blank string to $^I, no backup file will be created.
If you assign any non-blank string to $^I, a backup file will be created, and the name of the backup file will be: "originalname" . $^I. So if the original file name was 'data.txt' and the string assigned to $^I was '.1.2.3hello', then the backup file would be named: data.txt.1.2.3hello. If the original file name was 'my_data', and '-!-hello' was assigned to $^I, then the backup file would be named: 'my_data-!-hello'. (The one exception is a string containing *, which perl treats as a placeholder for the original file name rather than as a plain suffix.)
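
The naming rules above can be checked with a small sketch. The helper files_after_edit is a made-up name for this illustration, not something perl provides. It creates a scratch file in a temporary directory, edits it in place with the given value of $^I, and returns the names of the matching files left behind.

```perl
use strict;
use warnings;
use 5.020;
use autodie;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # scratch directory, removed at exit

# Hypothetical helper: edit $dir/$name in place using $suffix as $^I,
# then list the files in $dir whose names start with $name.
sub files_after_edit {
    my ($name, $suffix) = @_;
    my $path = "$dir/$name";

    open my $OUT, '>', $path;
    say {$OUT} "original";
    close $OUT;

    {
        local $^I   = $suffix;
        local @ARGV = ($path);
        while (my $line = <>) {
            print uc $line;        # written back to $path
        }
    }

    opendir my $DH, $dir;
    my @names = sort grep { index($_, $name) == 0 } readdir $DH;
    closedir $DH;
    return @names;
}

say join ', ', files_after_edit('data.txt',  '.1.2.3hello');
say join ', ', files_after_edit('my_data',   '-!-hello');
say join ', ', files_after_edit('plain.txt', '');   # blank string: no backup
```

On a Unix-like system this should list each original file followed by its backup, and only plain.txt for the blank string.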

local tells perl to temporarily change the value of the specified global variable for the duration of the current scope:

{
      #Braces delimit scopes
}

When the closing brace is encountered, perl magically restores all the variables declared to be local to the values they had before the opening brace was encountered. When using perl's predefined global variables, it's considered good practice to only change them for as long as you need them.
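
Here is a minimal sketch of that restore behavior. It uses $" (the separator perl puts between array elements inside a double-quoted string) purely as a convenient stand-in for $^I. The variable and sub names are made up for the demonstration.

```perl
use strict;
use warnings;
use 5.020;

my @items = ('a', 'b', 'c');

# Interpolating an array in a string joins its elements with $",
# which is a single space by default.
sub joined { return "@items" }

my $before = joined();
my $inside;
{
    local $" = '-';        # temporary change, visible in subs called from here
    $inside = joined();
}
my $after = joined();      # the closing brace restored the old value

say "$before | $inside | $after";   # prints: a b c | a-b-c | a b c
```

The change made with local is seen by joined() while the block is running, and is gone as soon as the block ends.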
