tfheen · September 20, 2010 08:56
 08:07 < sky> phk: so we cannot actually stitch together multiple gzipped items
 08:07 <@phk> sky, you lost me there ?
 08:08 < sky> phk: each ESI fragment cannot be gzipped separately
 08:08 <@phk> it sure can
 08:08 < sky> nope, the browsers aren't RFC compliant
 08:08 < sky> I tried
 08:08 < sky> curl does the right thing
 08:08 < sky> the browsers don't
 08:09 < victori> so I am guessing you're proposing on-the-fly compression?
 08:09 < sky> my initial approach did that
 08:09 < sky> victori: yes
 08:10 < sky> phk: so we can store a gzipped result of an ESI document, but then we need to invalidate it when an object is banned or its TTL expires
 08:11 <@phk> sky, are you saying that the browsers do not grok Z_FULL_FLUSH / Z_FINISH ?
 08:12 < sky> phk: I am saying that as soon as you Z_FINISH you can't continue with another gzip stream
 08:12 <@phk> Ok, that's silly (of the browsers)
 08:12 <@phk> but you can get pretty much the same effect with Z_FULL_FLUSH
 08:12 < sky> yes, very, it is a violation of the gzip rfc
 08:12 <@phk> takes a bit more work etc.
 08:13 < sky> Z_FULL_FLUSH requires a continuous input stream of uncompressed data
 08:13 <@phk> no it doesn't
 08:13 <@phk> (but they don't tell you :-)
 08:14 < sky> you can't Z_FULL_FLUSH one chunk, and then append it to another as far as I can tell and get the browser to read it
 08:14 <@phk> Z_FULL_FLUSH is the state you are in, right after the magic-string header.
 08:14 < sky> so you are saying compress each object using Z_FULL_FLUSH and then write them one at a time?
 08:15 <@phk> well, more than that.
 08:15 <@phk> to compress a ESI-component:  Gzip it, end with Z_FULL_FLUSH.  Strip the magic byte header.
 08:15 <@phk> To deliver an ESI doc:
 08:16 < sky> yes
 08:16 < sky> send gzip header, plus each component
 08:16 <@phk> send magic byte header, send N{ESI COMPONENTS}, send magic byte stop sequence.
 08:17 < sky> and on top of that, ungzip things from the backend if they are gzipped
 08:17 <@phk> well, that takes you into the big kettle of fish
 08:19 < sky> do we store both gzipped and ungzipped copies?
 08:19 <@phk> that's a VCL decision
 08:19 <@phk> it affects storage use and working set size in a BIG way, so VCL has to decide.
 08:20 < sky> how? it doesn't have the notion of multiple varys right now
 08:20 <@phk> sky, what is the impact from doing delivery time gzip ?
 08:20 < sky> phk: I haven't put it in real production yet
 08:20 <@phk> ohh, so you want me to commit an untested patch ?  :-)
 08:20 < sky> but when we had Apache in front of the server doing gzip, it was negligible
 08:21 < sky> I tend to run the latest committed version :)
 08:21 <@Mithrandir> sky: didn't we find out that machine of yours did like 2GByte/sec of gzip -6 ?
 08:22 <@Mithrandir> (it'll depend on data, obviously)
 08:22 < sky> or rather, once I commit to trunk, I can easily rebase it to 2.0 branch
 08:23 <@phk> so you're saying that simply doing delivery time gzip is feasible CPU wise with no concerns ?
 08:23 < sky> yes
 08:23 <@phk> ok, then we should do that, because doing the fetch thing is a nightmare.
 08:23 < sky> the one big downside is that we now store things uncompressed on disk
 08:24 < sky> i suggest the solution to that would be a compressed filesystem like btrfs or zfs
 08:24 <@Mithrandir> sky: have you tried btrfs in production?
 08:25 <@Mithrandir> doesn't it, like, fall over when you fill the disk and such, or has it gotten better now?
 08:25 <@phk> sky, compressed filesystems = NO-NO, that would just mean more compression/decompression load for the CPU
 08:25 < sky> phk: which is probably a worthwhile tradeoff considering we are limited on I/O bandwidth and have a ton of CPU
 08:27 <@phk> I doubt it.
 08:27 <@Mithrandir> that's something we can leave to the sysadmin, though.
 08:27 <@phk> absolutely
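
The scheme phk outlines at 08:15-08:16 (compress each ESI component ending with Z_FULL_FLUSH, drop the per-component header, then deliver header + N components + stop sequence) can be sketched in Python with the standard zlib module. This is only an illustration under stated assumptions, not Varnish code: the function names are hypothetical, raw deflate (negative wbits) is used so there is no per-component header to strip, and the CRC is computed from the uncompressed fragments at assembly time, whereas a cache holding only precompressed segments would presumably keep per-component CRCs and merge them with zlib's crc32_combine(). The result is checked only against Python's gzip module, so it says nothing about how any particular browser behaves.

```python
import gzip
import zlib

# Fixed 10-byte gzip header: magic bytes, deflate method, no flags,
# zero mtime, no extra flags, OS = unknown (0xff).
GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"


def compress_component(data: bytes) -> bytes:
    """Compress one fragment as a headerless deflate segment (hypothetical helper).

    Z_FULL_FLUSH ends the segment on a byte boundary without marking the
    stream as finished, so segments can be concatenated afterwards.
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    return comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH)


def stop_sequence(crc: int, size: int) -> bytes:
    """Empty final deflate block plus the gzip trailer (CRC32, length mod 2**32)."""
    final_block = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS).flush()
    trailer = crc.to_bytes(4, "little") + (size & 0xFFFFFFFF).to_bytes(4, "little")
    return final_block + trailer


def assemble_document(fragments: list[bytes]) -> bytes:
    """Send header, then each compressed component, then the stop sequence."""
    crc, size, segments = 0, 0, []
    for fragment in fragments:
        segments.append(compress_component(fragment))
        crc = zlib.crc32(fragment, crc)  # running CRC over the uncompressed data
        size += len(fragment)
    return GZIP_HEADER + b"".join(segments) + stop_sequence(crc, size)


if __name__ == "__main__":
    parts = [b"<html><body>", b"<p>included fragment</p>", b"</body></html>"]
    document = assemble_document(parts)
    # A single gzip member that decompresses to the concatenated fragments.
    print(gzip.decompress(document) == b"".join(parts))
```

Because every segment comes from a fresh compressor, no segment refers back to data in an earlier one, which is what lets independently compressed components be stitched together in any order.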