@aorjoa
Last active January 12, 2016 08:55
copy local file to qfs

Test: copy a local file to QFS

Create a 20 GB file at /media/data_ssd/testfile20G with fallocate.

=== Space usage ===
Total space	:	659.81 GB
Used space	:	0.00 bytes
Free space	:	626.05 GB 94.88%

Chunkservers

3 chunkservers:
192.168.5.202:21001
192.168.5.203:21001
192.168.5.204:21001

$ df -h

/+++++++++++++++++++++++++++++++++++++++++++++++++++++/
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        14G   13G  689M  95% /
devtmpfs        937M  4.0K  937M   1% /dev
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            189M  684K  189M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            945M  144K  945M   1% /run/shm
none            100M   28K  100M   1% /run/user
/dev/sda1       220G   21G  189G  10% /media/data_ssd *** 20GB of fallocate /media/data_ssd/testfile20G is here! ***
/+++++++++++++++++++++++++++++++++++++++++++++++++++++/

3-way replication

Copy the local file to QFS with replication = 3. (The first attempt logged the write error shown below; on a second run the error did not occur.)

$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/testfile20G -d /media/data_ssd/testfile20G

======= BEGIN QFS Logging =======
12-23-2015 22:52:20.576 INFO - (NetManager.cc:591) timer overrun 1450861690 seconds detected
12-23-2015 22:52:20.576 ERROR - (KfsNetClient.cc:589) PW 1,13,2,testfile20G  closing connection: 192.168.5.201:60687 to: 192.168.5.202 21001 due to inactivity timeout pending: read: 0 write: 2277339 ops: 8 auth failures: 0 error: 
12-23-2015 22:52:20.576 ERROR - (Writer.cc:1320) PW 1,13,2,testfile20G operation failure, seq: 1104947600562878279 status: -10110 msg:  op: write-prepare: chunkid: 34468 version: 1 offset: 58720256 numBytes: 1048576 checksum: 15728641 location: 192.168.5.202 21001 writeId: 2861784459341043837  location: 192.168.5.203 21001 writeId: 3780602366989424094  location: 192.168.5.204 21001 writeId: 1337298376634685395  current chunk server: 192.168.5.202 21001 chunkserver: partial data sent
Request:
WRITE_PREPARE
Cseq: 1104947600562878279
Version: KFS/1.0
Client-Protocol-Version: 114
UserId: 0
GroupId: 0
User: root
Chunk-handle: 34468
Chunk-version: 1
Offset: 58720256
Num-bytes: 1048576
Checksum: 15728641
Checksum-entries: 0
Reply: 1
Num-servers: 3
Servers:192.168.5.202 21001 2861784459341043837 192.168.5.203 21001 3780602366989424094 192.168.5.204 21001 1337298376634685395 
12-23-2015 22:52:20.576 INFO - (Writer.cc:1383) PW 1,13,2,testfile20G scheduling retry: 1 of 6 in 1 sec. op: write-prepare: chunkid: 34468 version: 1 offset: 58720256 numBytes: 1048576 checksum: 15728641 location: 192.168.5.202 21001 writeId: 2861784459341043837  location: 192.168.5.203 21001 writeId: 3780602366989424094  location: 192.168.5.204 21001 writeId: 1337298376634685395
======= END QFS Logging =======
======= BEGIN TIME Logging =======
	Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/testfile20G -d /media/data_ssd/testfile20G"
	User time (seconds): 65.29
	System time (seconds): 437.60
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 403017:26:39
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 29060
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 5248852
	Voluntary context switches: 221084
	Involuntary context switches: 75572
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
======= END TIME Logging =======
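
The elapsed time above (403017:26:39, roughly 46 years) cannot be a real copy duration. A minimal Python sketch of the arithmetic, using only figures printed in this gist, suggests that the system clock was stepped from near the Unix epoch to the current date while the copy was running. That reading is an inference, not something the logs state outright. The sketch converts the elapsed field to seconds and compares it with the ctime and mtime that stat reports for the same file.

```python
# Figures copied from the logs in this gist.
elapsed_h, elapsed_m, elapsed_s = 403017, 26, 39   # "Elapsed (wall clock) time"
timer_overrun_s = 1450861690                       # NetManager "timer overrun" message
ctime, mtime = 23979, 1450886779                   # stat output for /qfs/tmp/testfile20G

elapsed_total_s = elapsed_h * 3600 + elapsed_m * 60 + elapsed_s
stat_span_s = mtime - ctime
remainder_s = elapsed_total_s - timer_overrun_s

print(elapsed_total_s)   # seconds reported by /usr/bin/time
print(stat_span_s)       # seconds between file creation and last modification
print(remainder_s)       # elapsed time left after subtracting the reported overrun
```

The two measurements agree to within one second, and ctime 23979 falls a few hours after the epoch. If the overrun equals the size of the clock step (an assumption), the remaining 1109 seconds, about 18.5 minutes, would be a plausible estimate of the real copy time.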

Check the file's stat to confirm that it was stored in replication mode.

$ /usr/bin/time -v qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/testfile20G

======= BEGIN QFS Logging =======
File:             /qfs/tmp/testfile20G
ctime:            23979
mtime:            1450886779
Size:             21474836480
Id:               13
Replication:      3
Chunks:           320
Files:            0
Dirs:             0
Owner:            0
Group:            0
Mode:             644
MinTier:          15
MaxTier:          15
======= END QFS Logging =======
======= BEGIN TIME Logging =======
	Command being timed: "qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/testfile20G"
	User time (seconds): 0.04
	System time (seconds): 0.02
	Percent of CPU this job got: 92%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.07
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 4588
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1323
	Voluntary context switches: 12
	Involuntary context switches: 44
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
======= END TIME Logging =======
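
The stat output above reports 320 chunks for the 21474836480-byte file. That is consistent with a fixed chunk size of 64 MiB, which is assumed here as the QFS chunk size and is not shown anywhere in these logs. A small sketch of the check:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed QFS chunk size (64 MiB); not stated in the logs

def replicated_chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of logical chunks for a plainly replicated file (ceiling division)."""
    return -(-file_size // chunk_size)

print(replicated_chunk_count(21474836480))
```

Note that the count is per file, not per replica: stat shows 320 chunks with Replication 3, while the space usage report accounts for all three copies.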

Space usage

Total space	:	659.81 GB
Used space	:	60.00 GB
Free space	:	565.97 GB 85.78%

=====================================================================================

Reed-Solomon 6,3 replication 1

$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/testfile20G -d /media/data_ssd/testfile20G

======= BEGIN TIME Logging =======
Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/testfile20G -d /media/data_ssd/testfile20G"
	User time (seconds): 281.08
	System time (seconds): 319.90
	Percent of CPU this job got: 75%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 13:17.11
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 49448
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 7875496
	Voluntary context switches: 208448
	Involuntary context switches: 89355
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
======= END TIME Logging =======
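
To turn the elapsed time above into a transfer rate, the sketch below parses the `h:mm:ss or m:ss` field printed by /usr/bin/time and divides the file size by it. The helper name `parse_elapsed` is made up for this illustration, and the result is only the end-to-end rate seen by the client for this single run.

```python
def parse_elapsed(text: str) -> float:
    """Convert an 'h:mm:ss' or 'm:ss.ss' elapsed field to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

FILE_SIZE = 21474836480  # bytes, from the stat output for testfile20G

elapsed = parse_elapsed("13:17.11")
throughput_mib_s = FILE_SIZE / elapsed / (1024 * 1024)
print(f"{elapsed:.2f} s, {throughput_mib_s:.1f} MiB/s")
```

The same helper can be applied to the other TIME logs in this gist, for example `parse_elapsed("0:14.28")` for the smaller ADAM copy.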

Check the file's stat to confirm that it was stored in Reed-Solomon mode.

$ /usr/bin/time -v qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/testfile20G

======= BEGIN QFS Logging =======
File:             /qfs/tmp/testfile20G
ctime:            1450889939
mtime:            1450890736
Size:             21474836480
Id:               15
Replication:      1
Chunks:           486
Files:            0
Dirs:             0
Owner:            0
Group:            0
Mode:             644
MinTier:          15
MaxTier:          15
Stripe size:      65536
Data stripes :    6
Recovery stripes: 3
Type:             2
======= END QFS Logging =======
======= BEGIN TIME Logging =======
	Command being timed: "qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/testfile20G"
	User time (seconds): 0.04
	System time (seconds): 0.02
	Percent of CPU this job got: 90%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.07
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 4588
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1323
	Voluntary context switches: 11
	Involuntary context switches: 42
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
======= END TIME Logging =======
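
The 486 chunks reported for the Reed-Solomon copy can be reproduced with a short calculation. The sketch assumes a 64 MiB chunk size and assumes that every group of 6 data chunks, including a partly filled last group, is stored together with 3 recovery chunks. Both are assumptions that happen to match the stat output, not facts taken from the logs.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # assumed QFS chunk size (64 MiB)
DATA_STRIPES = 6                # "Data stripes" in the stat output
RECOVERY_STRIPES = 3            # "Recovery stripes" in the stat output

def rs_chunk_count(file_size: int) -> int:
    """Chunks for a striped file, assuming each (possibly partial) group of
    DATA_STRIPES data chunks comes with RECOVERY_STRIPES recovery chunks."""
    group_bytes = DATA_STRIPES * CHUNK_SIZE
    groups = -(-file_size // group_bytes)  # ceiling division
    return groups * (DATA_STRIPES + RECOVERY_STRIPES)

print(rs_chunk_count(21474836480))
```

For this file that is 54 groups of 9 chunks. The formula is only meant for files much larger than one stripe row; very small files may behave differently.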

Space usage

Total space	:	659.81 GB
Used space	:	30.00 GB
Free space	:	596.04 GB 90.33%

Free space (RS 6,3) - free space (3-way replication) = 596.04 - 565.97 = 30.07 GB
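
The difference of about 30 GB follows from the storage overhead of the two modes: 3-way replication stores every byte three times, while Reed-Solomon 6,3 stores 9 stripes for every 6 data stripes, a factor of 1.5. A minimal sketch of that arithmetic, ignoring padding and metadata (the function name is made up for this illustration):

```python
def raw_usage_gib(logical_gib: float, replication: int = 1,
                  data_stripes: int = 0, recovery_stripes: int = 0) -> float:
    """Raw space consumed by a file, ignoring per-chunk padding and metadata."""
    striping_factor = 1.0
    if data_stripes > 0:
        striping_factor = (data_stripes + recovery_stripes) / data_stripes
    return logical_gib * replication * striping_factor

three_way = raw_usage_gib(20, replication=3)
rs_6_3 = raw_usage_gib(20, replication=1, data_stripes=6, recovery_stripes=3)
print(three_way, rs_6_3, three_way - rs_6_3)
```

These values match the reported used space of 60.00 GB and 30.00 GB for the 20 GB test file.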


aorjoa commented Dec 24, 2015

Test copy SAM 800MB

First, we convert the .SAM file (800 MB) to .ADAM.

3-way replication

$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam

======= BEGIN TIME Logging =======
Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam"
    User time (seconds): 1.08
    System time (seconds): 3.08
    Percent of CPU this job got: 29%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.28
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 17364
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1
    Minor (reclaiming a frame) page faults: 42471
    Voluntary context switches: 2974
    Involuntary context switches: 1216
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
======= END TIME Logging =======

Space usage

Total space :   659.81 GB
Used space  :   466.31 MB
Free space  :   624.87 GB 94.71%

Reed-Solomon 6,3 replication 1

$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam

======= BEGIN TIME Logging =======
Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam"
    User time (seconds): 3.23
    System time (seconds): 2.23
    Percent of CPU this job got: 62%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.80
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 20884
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 61908
    Voluntary context switches: 3324
    Involuntary context switches: 1753
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
======= END TIME Logging =======

Space usage

Total space :   659.81 GB
Used space  :   238.24 MB
Free space  :   625.56 GB 94.81%

@MooxxNew

Test copy SAM 800MB

3-way replication
$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam
====== BEGIN TIME Logging ======
Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -r 3 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam"
    User time (seconds): 1.16
    System time (seconds): 2.97
    Percent of CPU this job got: 27%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.04
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 17364
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 42472
    Voluntary context switches: 2981
    Involuntary context switches: 1696
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
====== END TIME Logging ======

Space usage
Total space :   659.81 GB
Used space  :   466.31 MB
Free space  :   625.59 GB 94.81%

adam-shell

measure the running time
scala> :paste
// Entering paste mode (ctrl-D to finish)

def time[T](block: => T): T = {
  // Evaluate the by-name block once and print its wall-clock time in milliseconds.
  val start = System.currentTimeMillis
  val res = block
  val totalTime = System.currentTimeMillis - start
  println("Elapsed time: %1d ms".format(totalTime))
  res
}

// Exiting paste mode, now interpreting.

time: [T](block: => T)T

Counting K-mers with QFS
scala> :paste
// Entering paste mode (ctrl-D to finish)

time{
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

val ac = new ADAMContext(sc)
val reads = ac.loadAlignments(
  "/qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam",
  projection = Some(
    Projection(
      AlignmentRecordField.sequence,
      AlignmentRecordField.readMapped,
      AlignmentRecordField.mapq
    )
  )
)

val kmers = reads.flatMap(_.getSequence.sliding(10).map(k => (k, 1L))).reduceByKey(_ + _).map(_.swap).sortByKey(ascending = false)

kmers.take(10).foreach(println)
}

// Exiting paste mode, now interpreting.

(114977,AAAAAAAAAA)
(38703,NNNNNNNNNN)
(35542,CCCCCCCCCC)
(34159,GGGGGGGGGG)
(18949,CACACACACA)
(18435,ACACACACAC)
(18024,GTGTGTGTGT)
(17909,TGTGTGTGTG)
(12169,CTTTTTTTTT)
Elapsed time: 2187978 ms
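
For readers without an adam-shell at hand, the counting logic of the Spark job above can be sketched in plain Python on toy data. This only illustrates the same steps (sliding windows of length 10, a count per k-mer, sort by count descending, take the top 10); it is not the ADAM pipeline. The `reads` list is made up, and the order of k-mers with equal counts may differ from Spark's.

```python
from collections import Counter
from typing import Iterator, List, Tuple

def sliding(seq: str, k: int) -> Iterator[str]:
    """Mimic Scala's sliding(k) on a string: all length-k windows, or the
    whole sequence once if it is non-empty but shorter than k."""
    if 0 < len(seq) < k:
        yield seq
        return
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def top_kmers(reads: List[str], k: int = 10, n: int = 10) -> List[Tuple[int, str]]:
    counts = Counter(kmer for read in reads for kmer in sliding(read, k))
    # (count, kmer) pairs, highest count first, like map(_.swap).sortByKey(ascending = false)
    return [(count, kmer) for kmer, count in counts.most_common(n)]

reads = ["AAAAAAAAAAA", "AAAAAAAAAAC"]  # hypothetical toy reads
for pair in top_kmers(reads):
    print(pair)
```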

Reed-Solomon 6,3 replication 1
$ /usr/bin/time -v cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam
====== BEGIN TIME Logging ======
    Command being timed: "cptoqfs -s 192.168.5.201 -p 20000 -S -r 1 -k /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam -d /media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam"
    User time (seconds): 3.23
    System time (seconds): 2.74
    Percent of CPU this job got: 63%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.46
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 20884
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 2
    Minor (reclaiming a frame) page faults: 61911
    Voluntary context switches: 3828
    Involuntary context switches: 1852
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
====== END TIME Logging =======

Space usage
Total space :   659.81 GB
Used space  :   238.24 MB
Free space  :   625.81 GB 94.85%

adam-shell

Counting K-mers with QFS
scala> :paste
// Entering paste mode (ctrl-D to finish)

time{
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

val ac = new ADAMContext(sc)
val reads = ac.loadAlignments(
  "/qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam",
  projection = Some(
    Projection(
      AlignmentRecordField.sequence,
      AlignmentRecordField.readMapped,
      AlignmentRecordField.mapq
    )
  )
)

val kmers = reads.flatMap(_.getSequence.sliding(10).map(k => (k, 1L))).reduceByKey(_ + _).map(_.swap).sortByKey(ascending = false)

kmers.take(10).foreach(println)
}

// Exiting paste mode, now interpreting.

(114977,AAAAAAAAAA)
(38703,NNNNNNNNNN)
(35542,CCCCCCCCCC)
(34159,GGGGGGGGGG)
(18949,CACACACACA)
(18435,ACACACACAC)
(18024,GTGTGTGTGT)
(17909,TGTGTGTGTG)
(12169,CTTTTTTTTT)
Elapsed time: 2207018 ms

Counting K-mers without QFS
scala> :paste
// Entering paste mode (ctrl-D to finish)

time{
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

val ac = new ADAMContext(sc)
val reads = ac.loadAlignments(
  "/media/data_ssd/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam",
  projection = Some(
    Projection(
      AlignmentRecordField.sequence,
      AlignmentRecordField.readMapped,
      AlignmentRecordField.mapq
    )
  )
)

val kmers = reads.flatMap(_.getSequence.sliding(10).map(k => (k, 1L))).reduceByKey(_ + _).map(_.swap).sortByKey(ascending = false)

kmers.take(10).foreach(println)
}

// Exiting paste mode, now interpreting.

(114977,AAAAAAAAAA)
(38703,NNNNNNNNNN)
(35542,CCCCCCCCCC)
(34159,GGGGGGGGGG)
(18949,CACACACACA)
(18435,ACACACACAC)
(18024,GTGTGTGTGT)
(17909,TGTGTGTGTG)
(12169,CTTTTTTTTT)
Elapsed time: 2197093 ms
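
The three elapsed times are easier to compare side by side. The sketch below converts them to minutes and shows each run's slowdown relative to the fastest one. Each configuration was run once here, so small differences may simply be run-to-run noise.

```python
# Elapsed times printed by the three k-mer runs in this comment.
timings_ms = {
    "QFS, 3-way replication": 2187978,
    "QFS, Reed-Solomon 6,3": 2207018,
    "local SSD (no QFS)": 2197093,
}

fastest = min(timings_ms.values())
slowdowns_pct = {}
for label, ms in timings_ms.items():
    minutes = ms / 60000
    slowdowns_pct[label] = (ms - fastest) / fastest * 100
    print(f"{label}: {minutes:.1f} min, +{slowdowns_pct[label]:.2f}% vs fastest")
```

All three runs take about 36 to 37 minutes and differ by less than 1%, which hints that this job is not limited by where the input file is stored.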


aorjoa commented Jan 12, 2016

QFS stat of the ADAM file

# qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam
Output:

File:             /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam
ctime:            5225
mtime:            5225
Size:             162987186
Id:               312
Replication:      1
Chunks:           0
Files:            54
Dirs:             1
Owner:            0
Group:            0
Mode:             755
MinTier:          15
MaxTier:          15

# qfsshell -s 192.168.5.201 -p 20000 -q -- stat /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam/part-r-00025.gz.parquet
Output:

File:             /qfs/tmp/HG00096.chrom20.ILLUMINA.bwa.GBR.exome.20120522.adam/part-r-00025.gz.parquet
ctime:            5239
mtime:            5239
Size:             3905765
Id:               336
Replication:      3
Chunks:           1
Files:            0
Dirs:             0
Owner:            0
Group:            0
Mode:             644
MinTier:          15
MaxTier:          15

It seems that QFS treats the .adam path as an ordinary directory, which reports a replication of 1, while the Parquet part files inside it show a replication of 3, as we expected.
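
The directory size reported above can be cross-checked against the space usage from the Dec 24 copies. The sketch assumes that the .adam directory had the same total size in those earlier copies and that "MB" in the space usage report means MiB; neither is confirmed by the logs.

```python
MIB = 1024 * 1024
adam_dir_bytes = 162987186   # "Size" from the stat output of the .adam directory

expected_three_way_mib = adam_dir_bytes * 3 / MIB
expected_rs_mib = adam_dir_bytes * (6 + 3) / 6 / MIB

print(f"3-way: expected {expected_three_way_mib:.2f} MiB, reported 466.31 MB")
print(f"RS 6,3: expected at least {expected_rs_mib:.2f} MiB, reported 238.24 MB")
```

The 3-way figure matches the reported used space, which supports the observation that the part files really are stored three times even though the directory itself shows a replication of 1. The Reed-Solomon figure comes out a few MiB below the reported value; one possible explanation, offered only as a guess, is per-file striping overhead across the 54 small part files.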
