Last active
October 2, 2018 06:39
split.rb
| #!/usr/bin/env ruby | |
| # Author : Ky-Anh Huynh | |
| # Date : Sep 2018 | |
| # License: MIT | |
| # Purpose: Given N lines on STDIN, this script creates about N/P files, | |
| # each containing up to P lines of the input (P is $max below). | |
| # Usage : ruby -n split.rb < input (the -n flag supplies each line in $_) | |
| BEGIN { | |
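| # Numbering offset: the first file written is input.<START + 1>.txt | |
| $start_group = (ENV["START"] ? ENV["START"].to_i : 0) | |
| $idx = 0 # lines currently buffered | |
| $buffers = [] # pending lines for the next file | |
| $max = 5000 # lines per output file (P) | |
| $groups = $start_group # number of the last file written | |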
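| # Write the buffered lines to the next numbered file and reset the buffer. | |
| # The s3sync-input/ directory must already exist. | |
| def ship | |
| $groups += 1 | |
| puts "flushing buffer #{$groups}, size = #{$buffers.size}" | |
| File.open("s3sync-input/input.#{$groups}.txt", "w") { |f| | |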
| f.puts $buffers.join | |
| } | |
| $buffers = [] | |
| $idx = 0 | |
| end | |
| } | |
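| # Buffer the current input line; flush once $max lines have been collected. | |
| # (The original `if $idx <= $max` guard was always true, so it is dropped.) | |
| $buffers << $_ | |
| $idx += 1 | |
| ship if $idx >= $max | |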
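| END { | |
| # Flush the remaining lines; skip when the buffer is empty so that no | |
| # blank extra file is written when N is a multiple of $max (or N is 0). | |
| ship unless $buffers.empty? | |
| puts "number of groups: #{$groups - $start_group}" | |
| } | |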
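The same chunking can be sketched as a plain method, without globals or `ruby -n`. This is only an illustration under stated assumptions: `split_lines` and its `dir`, `max`, and `start` parameters are hypothetical names that do not appear in the script above. The sketch also creates the output directory if it is missing, and it writes lines verbatim.

```ruby
require "fileutils"

# Hypothetical helper (not part of split.rb): writes consecutive groups of at
# most `max` lines read from `io` to "<dir>/input.<n>.txt", numbering the
# files from start + 1. Returns the number of files written.
def split_lines(io, dir: "s3sync-input", max: 5000, start: 0)
  FileUtils.mkdir_p(dir)                      # assumption: create dir if absent
  count = 0
  io.each_line.each_slice(max) do |lines|     # yields arrays of up to `max` lines
    count += 1
    File.write(File.join(dir, "input.#{start + count}.txt"), lines.join)
  end
  count
end
```

It could be called as `split_lines($stdin, start: ENV.fetch("START", "0").to_i)`. Because `each_slice` never yields an empty group, empty input produces no files.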