@drio
Created April 30, 2010 14:54
These are the key changes we suggest implementing in order to successfully
integrate the incoming HiSeq instruments:
1. Transfer process from Instruments to Cluster.
Currently the two GAIIs dump to a single intermediate machine (slxdump) and the
data is then moved to the cluster. We suggest removing the intermediate machine
and having the instruments dump directly to the cluster volumes, similar to the
process we already use for SOLiD.
The only protocol Illumina supports for transferring data off the instruments
is SMB, which means Samba will have to be set up on the dumping servers.
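As a post-setup sanity check, something along these lines could confirm from
the cluster side that the instrument-facing shares are actually exported. This
is only a sketch; the server names and the share name are placeholders, not
the real ones:

    #!/usr/bin/env python3
    # Check that each dumping server exposes the expected SMB share.
    # Hostnames and the share name are hypothetical placeholders.
    import subprocess

    DUMP_SERVERS = ["dumphost1", "dumphost2"]   # placeholder dumping servers
    EXPECTED_SHARE = "hiseq_runs"               # placeholder share name

    def share_visible(server, share):
        # Anonymous listing of the server's shares via smbclient.
        out = subprocess.run(["smbclient", "-L", "//" + server, "-N"],
                             capture_output=True, text=True)
        return share in out.stdout

    for server in DUMP_SERVERS:
        ok = share_visible(server, EXPECTED_SHARE)
        print("%s: %s %s" % (server, EXPECTED_SHARE, "ok" if ok else "MISSING"))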
2. What data to transfer.
The HiSeq does not save images by default. Image saving can be enabled, but it
requires about 30 TB of free space per run. In addition, with the new software
Illumina no longer saves intensities by default; keeping them would require
around 4 TB per run.
We agree with Illumina's default behaviour and recommend keeping only the base
calls. Even so, that still requires roughly 400 GB per run.
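For planning purposes, the per-run figures above translate into yearly storage
roughly as follows (the per-run sizes come from this note; the runs-per-year
number is only a made-up planning figure):

    # Per-run sizes from this note; runs_per_year is only an assumption.
    run_sizes_tb = {
        "images enabled":   30.0,
        "intensities only":  4.0,
        "base calls only":   0.4,   # the recommended default
    }
    runs_per_year = 50  # placeholder; adjust to the real run schedule

    for option, tb_per_run in run_sizes_tb.items():
        print("%-18s %6.1f TB/run  -> %8.1f TB/year" %
              (option, tb_per_run, tb_per_run * runs_per_year))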
3. Archiving Solution.
We need a better way to archive data. The current archiving procedure requires
human interaction, which is already a problem and will only get worse once the
HiSeqs arrive. Moving to a programmatic approach will save time and avoid human
errors. The old method of dropping the run locations in a file worked fine for
us, but we are open to any ideas the sysadmins may have.
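As a starting point, a minimal sketch of what a programmatic pass could look
like, keeping the old idea of listing run locations in a plain file but
removing the manual steps (all paths below are placeholders):

    #!/usr/bin/env python3
    # Read run directories from a locations file (one path per line) and
    # copy each one to the archive volume with rsync, reporting failures.
    import subprocess
    import sys

    LOCATIONS_FILE = "/var/tmp/runs_to_archive.txt"   # placeholder path
    ARCHIVE_ROOT = "/archive/hiseq"                    # placeholder path

    def archive(run_dir):
        # -a preserves permissions/timestamps; True means rsync succeeded.
        return subprocess.call(["rsync", "-a", run_dir, ARCHIVE_ROOT + "/"]) == 0

    failed = []
    with open(LOCATIONS_FILE) as fh:
        for line in fh:
            run_dir = line.strip()
            if not run_dir or run_dir.startswith("#"):
                continue
            if not archive(run_dir):
                failed.append(run_dir)

    for run_dir in failed:
        print("FAILED: " + run_dir, file=sys.stderr)
    sys.exit(1 if failed else 0)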
4. Computer Processing Requirements.
We are still working out how much parallelization we need to use the compute
resources efficiently. Illumina's documentation states that processing 200 GB
of data takes 44 hours on their three machines (8 cores and 32 GB of RAM each).
WashU can process 200 GB in 30 hours using 64 cores across eight machines of
the same spec (8 cores, 32 GB of RAM). One thing is clear: we will need 8-core
machines with 32 GB of RAM. Note that the ELAND step in the GAP spawns N
processes at the OS level rather than submitting jobs at the cluster level.
We should also consider moving the pipeline to Ardmore, but we will need the
sysadmins' feedback on that.
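A quick back-of-the-envelope comparison of the two figures above, expressed as
GB per core-hour (ignoring I/O and the fact that ELAND parallelizes within a
node rather than across the cluster):

    # Throughput figures quoted above: 200 GB per run in both cases.
    run_gb = 200.0
    configs = {
        "Illumina docs": {"hours": 44, "nodes": 3, "cores_per_node": 8},
        "WashU":         {"hours": 30, "nodes": 8, "cores_per_node": 8},
    }

    for name, c in configs.items():
        cores = c["nodes"] * c["cores_per_node"]
        per_core_hour = run_gb / (c["hours"] * cores)
        print("%-14s %2d cores, %2d h wall clock, %.2f GB per core-hour" %
              (name, cores, c["hours"], per_core_hour))

If those figures hold, WashU's shorter wall-clock time comes from using more
nodes, not from better per-core efficiency (roughly 0.10 GB per core-hour for
WashU vs 0.19 for Illumina's quoted setup).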