Created September 13, 2017 05:57
Xml Compression tests
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

namespace Test
{
    class Test
    {
        static void Main()
        {
            // Time loading the uncompressed XML document straight from disk.
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            using (var inputStream = File.OpenRead(@"c:\junk\supp2017.xml"))
            {
                var doc = new XmlDocument();
                doc.Load(inputStream);
            }
            stopWatch.Stop();
            Console.WriteLine("Loading xml {0}", stopWatch.Elapsed.TotalMilliseconds);
            Console.WriteLine("Done");
        }
    }
}
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Xml;

namespace Test
{
    class Test
    {
        static void Main()
        {
            // Time loading the same document from a DeflateStream-compressed copy,
            // decompressing on the fly while parsing.
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            using (var inputStream = File.OpenRead(@"c:\junk\supp2017.xml.cmp"))
            {
                using (DeflateStream decompressingStream = new DeflateStream(inputStream, CompressionMode.Decompress))
                {
                    var doc = new XmlDocument();
                    doc.Load(decompressingStream);
                }
            }
            stopWatch.Stop();
            Console.WriteLine("Loading xml {0}", stopWatch.Elapsed.TotalMilliseconds);
            Console.WriteLine("Done");
        }
    }
}
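The gists don't show how the .cmp file was produced. A minimal sketch of how it could be created so that the decompression test above can read it (the file names simply mirror the ones used above and are otherwise an assumption):

using System.IO;
using System.IO.Compression;

namespace Test
{
    class Compress
    {
        static void Main()
        {
            // Write a DeflateStream-compressed copy of the XML file,
            // readable by the CompressionMode.Decompress test above.
            using (var inputStream = File.OpenRead(@"c:\junk\supp2017.xml"))
            using (var outputStream = File.Create(@"c:\junk\supp2017.xml.cmp"))
            using (var compressingStream = new DeflateStream(outputStream, CompressionMode.Compress))
            {
                inputStream.CopyTo(compressingStream);
            }
        }
    }
}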
You convinced me that XmlDocument.Load() is slow enough to be on par with the extremely slow disk I/O (especially for GB-sized documents), but still not slow enough to render compression worthless, as you stated. So your reasoning is not completely wrong, and I'm willing to remove the downvote from your answer. But that can only be done after you edit it, because more than 2 days have elapsed. Thank you for the constructive discussion.
Indeed, the first run of the tests should have been noticeably slower than the subsequent runs. How much slower is debatable, but the difference should have been clearly visible. Since it was not, the file system cache must have affected the tests.
I have downloaded the supp2017.xml file you used. (That's 572M, not 572K.) Then I used a different approach: I read the whole file into memory and measured the time of only XmlDocument.Load() on it. This reported 9.1-9.3 seconds on my computer. This is the cost of building the object graph in the given .NET environment, work done entirely in memory.

Now I should add the time it takes to load the file, with or without compression. In fact the question is not the sum, but the ratio of the two parts of that sum to each other (graph building vs. loading). Since it is difficult to measure the file load time precisely (because of the file system cache), I use estimates based on everyday practice. We don't need the exact figure anyway; we only want to know whether it is much more or much less than 9s.
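A minimal sketch of that in-memory measurement, assuming the same file path as the tests above and buffering via a MemoryStream (the exact program used is not shown here):

using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

namespace Test
{
    class LoadOnlyTest
    {
        static void Main()
        {
            // Read the whole file into memory first, so disk I/O is excluded
            // from the measurement (path assumed to match the tests above).
            byte[] data = File.ReadAllBytes(@"c:\junk\supp2017.xml");

            var stopWatch = Stopwatch.StartNew();
            using (var memoryStream = new MemoryStream(data))
            {
                var doc = new XmlDocument();
                doc.Load(memoryStream); // only the object-graph build is timed
            }
            stopWatch.Stop();
            Console.WriteLine("XmlDocument.Load only: {0} ms", stopWatch.Elapsed.TotalMilliseconds);
        }
    }
}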
Clearly it depends on the speed of the media the XML file is being read from. I usually get about 100M/s read speed from my HDD, so I estimate a cold load of the 572M uncompressed file would take approx. 6s. Compressed to 44M, the file would load in about 0.5s, plus roughly 3s for decompression (I measured this with a program similar to the one above, which reads the compressed file into a byte[] array and then times decompression + XmlDocument.Load() together).

You are right that this "approx. 6s" is not much larger than, but actually smaller than, the 9s cost of XmlDocument.Load(). So the cost of the in-memory work is not dwarfed by the disk I/O, and you are right that the in-memory work is the more expensive part of the process. Yet the lack of compression still makes the whole process about 20% slower (15s vs. 12.5s), and this gets more and more pronounced as the speed of the media decreases:
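A sketch of that combined measurement, under the same file-name assumptions as above (the compressed file is buffered in memory first so that only decompression and parsing are timed):

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Xml;

namespace Test
{
    class DecompressAndLoadTest
    {
        static void Main()
        {
            // Buffer the compressed file in memory so the timing covers only
            // decompression + XmlDocument.Load (file name is an assumption).
            byte[] compressed = File.ReadAllBytes(@"c:\junk\supp2017.xml.cmp");

            var stopWatch = Stopwatch.StartNew();
            using (var memoryStream = new MemoryStream(compressed))
            using (var decompressingStream = new DeflateStream(memoryStream, CompressionMode.Decompress))
            {
                var doc = new XmlDocument();
                doc.Load(decompressingStream);
            }
            stopWatch.Stop();
            Console.WriteLine("Decompress + Load: {0} ms", stopWatch.Elapsed.TotalMilliseconds);
        }
    }
}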