@minaguib
Created September 11, 2017 19:39
A quick linux sparse file primer
# Create a 10-Meg file
$ dd if=/dev/zero of=foo bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 0.0142283 s, 737 MB/s
# Check with ls = 10M
$ ls -la foo
-rw-rw-r-- 1 mina mina 10485760 Sep 11 15:32 foo
# Check on-disk usage = 10M
$ du -sh foo
10M foo
# du is not required; ls with -s also shows on-disk usage (first column):
# 10M on disk, 10M apparent size
$ ls -lash foo
10M -rw-rw-r-- 1 mina mina 10M Sep 11 15:32 foo
# stat the file - 10M, 20480 blocks (of 512 bytes each, 20480 * 512 = 10M)
$ stat foo
File: ‘foo’
Size: 10485760 Blocks: 20480 IO Block: 4096 regular file
Device: 801h/2049d Inode: 419430546 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1004/ mina) Gid: ( 1005/ mina)
Access: 2017-09-11 15:32:36.428688277 -0400
Modify: 2017-09-11 15:32:36.443688551 -0400
Change: 2017-09-11 15:32:36.443688551 -0400
Birth: -
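# The same numbers can be pulled out with stat format strings instead of being read off by eye.
# A minimal sketch, assuming GNU coreutils stat; the helper name "sizes" is made up for illustration:

```shell
# sizes FILE: print the apparent size and the allocated size, both in bytes
sizes() {
    apparent=$(stat -c %s "$1")     # logical size, as shown by ls -l
    blocks=$(stat -c %b "$1")       # number of allocated blocks
    block_size=$(stat -c %B "$1")   # bytes per block as reported by %b (usually 512)
    echo "apparent=$apparent allocated=$((blocks * block_size))"
}

# Example on a scratch file, so that foo is left untouched
scratch=$(mktemp)
truncate -s 1M "$scratch"           # extend to 1 MiB without writing any data
sizes "$scratch"
rm -f "$scratch"
```

# For the foo shown above, this would be expected to report both values as 10485760 (20480 * 512).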
# We can list the extents backing those 20480 blocks - the tool depends on the FS - this is xfs:
$ xfs_bmap foo
foo:
0: [0..20479]: 132784608..132805087
# Let's punch 2 holes, range 0-1M then range 5M-6M
$ fallocate -p -o 0 -l 1M foo
$ fallocate -p -o 5M -l 1M foo
# On-disk usage is now 8M, but file still presents as 10M:
$ ls -lash foo
8.0M -rw-rw-r-- 1 mina mina 10M Sep 11 15:33 foo
# stat confirms we're only using 16384 blocks (*512 bytes each = 8M)
$ stat foo
File: ‘foo’
Size: 10485760 Blocks: 16384 IO Block: 4096 regular file
Device: 801h/2049d Inode: 419430546 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1004/ mina) Gid: ( 1005/ mina)
Access: 2017-09-11 15:32:36.428688277 -0400
Modify: 2017-09-11 15:33:57.049159380 -0400
Change: 2017-09-11 15:33:57.049159380 -0400
Birth: -
# Examining the block extents - note that holes no longer have associated underlying blocks on disk:
$ xfs_bmap foo
foo:
0: [0..2047]: hole
1: [2048..10239]: 132786656..132794847
2: [10240..12287]: hole
3: [12288..20479]: 132796896..132805087
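# The hole ranges above follow directly from the byte offsets passed to fallocate, since xfs_bmap
# counts in 512-byte blocks. A small arithmetic sketch; the helper name "range_blocks" is hypothetical:

```shell
# range_blocks OFFSET_BYTES LENGTH_BYTES: print the matching range of 512-byte blocks
range_blocks() {
    first=$(( $1 / 512 ))
    last=$(( ($1 + $2) / 512 - 1 ))
    echo "[$first..$last]"
}

range_blocks 0 $((1024 * 1024))                      # [0..2047]
range_blocks $((5 * 1024 * 1024)) $((1024 * 1024))   # [10240..12287]
```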
# With a block-aligned offset, a punch length that is not a multiple of 8 blocks (4096 bytes) is rounded down to one:
# Less than 8 blocks (3999 bytes):
$ fallocate -p -o 7M -l 3999 foo
# Frees nothing - no new hole is punched:
$ xfs_bmap foo
foo:
0: [0..2047]: hole
1: [2048..10239]: 132786656..132794847
2: [10240..12287]: hole
3: [12288..20479]: 132796896..132805087
# 9 blocks (4608 bytes):
$ fallocate -p -o 7M -l 4608 foo
# Is rounded down to 8 - a new hole 8 blocks wide is punched:
$ xfs_bmap foo
foo:
0: [0..2047]: hole
1: [2048..10239]: 132786656..132794847
2: [10240..12287]: hole
3: [12288..14335]: 132796896..132798943
4: [14336..14343]: hole
5: [14344..20479]: 132798952..132805087
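# The rounding seen above can be reproduced with plain arithmetic: only whole filesystem blocks that
# lie entirely inside the requested range get freed. A sketch, assuming a 4096-byte filesystem block
# (8 * 512, as observed in this transcript); the helper name "freed_blocks" is hypothetical:

```shell
# freed_blocks OFFSET LENGTH [FS_BLOCK_BYTES]: number of 512-byte blocks a punch would free
freed_blocks() {
    off=$1; len=$2; bs=${3:-4096}           # 4096 is an assumption, not queried from the FS
    start=$(( (off + bs - 1) / bs * bs ))   # round the start up to an FS block boundary
    end=$(( (off + len) / bs * bs ))        # round the end down to an FS block boundary
    if [ "$end" -gt "$start" ]; then
        echo $(( (end - start) / 512 ))
    else
        echo 0
    fi
}

freed_blocks $((7 * 1024 * 1024)) 3999   # 0
freed_blocks $((7 * 1024 * 1024)) 4608   # 8
```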
# A tool that's not aware of sparse files simply reads NUL bytes where the holes are - total 10M read:
$ cat foo | wc -c
10485760
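# That holes read back as NUL bytes can be checked by comparing a file against /dev/zero.
# A minimal sketch, assuming GNU stat and cmp; the helper name "reads_as_zeros" is hypothetical:

```shell
# reads_as_zeros FILE: succeed if every byte read from FILE is NUL
reads_as_zeros() {
    size=$(stat -c %s "$1")
    cmp -s -n "$size" "$1" /dev/zero   # compare only the first $size bytes, quietly
}

# Example on a scratch file that is extended without writing any data
scratch=$(mktemp)
truncate -s 1M "$scratch"
reads_as_zeros "$scratch" && echo "all $(stat -c %s "$scratch") bytes read back as zeros"
rm -f "$scratch"
```

# Note that foo itself was filled from /dev/zero, so this check cannot tell its holes from its data.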
# Using a tool that seeks/writes to offsets where there's a hole:
$ dd if=/dev/zero of=foo conv=notrunc bs=512 count=1024 seek=0
1024+0 records in
1024+0 records out
524288 bytes (524 kB) copied, 0.00128071 s, 409 MB/s
# Causes the hole to be "vivified" by having disk blocks allocated to cover the written range:
$ xfs_bmap foo
foo:
0: [0..1023]: 131060504..131061527
1: [1024..2047]: hole
2: [2048..10239]: 132786656..132794847
3: [10240..12287]: hole
4: [12288..14335]: 132796896..132798943
5: [14336..14343]: hole
6: [14344..20479]: 132798952..132805087
# Note that block allocation from disk happens in chunks of 8 blocks (*512 bytes each = 4096 bytes)
# whether you're writing/appending to a new file, or vivifying blocks in punched ranges
# Repeat the same write as above, but with 1025 blocks instead of 1024:
$ dd if=/dev/zero of=foo conv=notrunc bs=512 count=1025 seek=0
1025+0 records in
1025+0 records out
524800 bytes (525 kB) copied, 0.00124019 s, 423 MB/s
# Shows that extent 0 grew to 1032 blocks rather than 1025 (1025 is not a multiple of 8, so it is rounded up):
$ xfs_bmap foo
foo:
0: [0..1031]: 131060504..131061535
1: [1032..2047]: hole
2: [2048..10239]: 132786656..132794847
3: [10240..12287]: hole
4: [12288..14335]: 132796896..132798943
5: [14336..14343]: hole
6: [14344..20479]: 132798952..132805087
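# The 1025 -> 1032 step is just rounding up to the next multiple of 8 blocks.
# A small arithmetic sketch; the helper name "round_up_blocks" is hypothetical, and the default of 8
# reflects what was observed on this particular filesystem:

```shell
# round_up_blocks COUNT [CHUNK]: round COUNT up to the next multiple of CHUNK (default 8)
round_up_blocks() {
    chunk=${2:-8}
    echo $(( ($1 + chunk - 1) / chunk * chunk ))
}

round_up_blocks 1024   # 1024
round_up_blocks 1025   # 1032
```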