darosior · March 27, 2026 18:43
diff --git a/2015-01-bitcoin-dev.log b/2015-01-bitcoin-dev.log
 2015-01-03 21:34:02	0|michagogo|Hmm
 2015-01-03 21:34:16	0|michagogo|My syncing seems to be stalled
 2015-01-03 21:34:21	0|sipa|:(
 2015-01-03 21:34:38	0|michagogo|It's been at 337077 for a very long time now
 2015-01-03 21:34:44	0|michagogo|(this is on v0.10.0rc1)
 2015-01-03 21:34:52	0|sipa|;;blocks
 2015-01-03 21:34:53	0|gribble|337339
 2015-01-03 21:35:20	0|michagogo|getblockchaininfo says I have all the headers
 2015-01-03 21:35:38	0|sipa|what does getchaintips say?
 2015-01-03 21:35:58	0|michagogo|https://www.irccloud.com/pastebin/MjcFrbB7
 2015-01-03 21:36:58	0|sipa|and getpeerinfo?
 2015-01-03 21:37:06	0|sipa|any blocks in flight?
 2015-01-03 21:37:10	0|michagogo|https://www.irccloud.com/pastebin/CzcL4eHu
 2015-01-03 21:37:11	0|michagogo|yes
 2015-01-03 21:37:24	0|michagogo|337078, from the first peer listed
 2015-01-03 21:40:11	0|michagogo|Any ideas?
 2015-01-03 21:42:56	0|michagogo|Ah
 2015-01-03 21:43:11	0|michagogo|Okay, I closed the connection to that peer and now I'm synced up
 2015-01-03 21:43:20	0|michagogo|;;blocks
 2015-01-03 21:43:21	0|gribble|337340
 2015-01-03 21:43:35	0|michagogo|Question is, though, why did this happen?
 2015-01-03 21:43:53	0|michagogo|I thought the whole point of headers-first was to not let this kind of thing happen
 2015-01-03 21:46:05	0|sipa|indeed, i don't understand it
 2015-01-03 21:48:27	0|moa|michagogo:  i saw it happen once also, back in Oct(?) using git head, got stuck waiting for one block in flight from one peer ... think I restarted and never saw it happen again
 2015-01-03 21:48:56	0|michagogo|sipa: does anything get logged that could help explain this?
 2015-01-03 21:49:11	0|moa|and was on testnet btw
 2015-01-03 21:50:28	0|michagogo|BTW, it took exactly 80 seconds to get up to date after I closed that socket
 2015-01-03 21:51:39	0|moa|how do you disconnect a single peer?
 2015-01-03 21:51:51	0|michagogo|moa: http://www.nirsoft.net/utils/cports.html
 2015-01-03 21:52:17	0|moa|ah, thought you have been referring to a bitcoin rpc command
 2015-01-03 21:52:21	0|michagogo|Ah, nope
 2015-01-03 21:52:34	0|moa|which i believe there is an issue out there somewhere for ..
 2015-01-03 21:56:25	0|morcos|you don't think this and #5588 could be related to https://github.com/bitcoin/bitcoin/pull/5463 ?
 2015-01-03 21:56:39	0|dfletcher|is walletnotify fired if a fork affects local wallet transactions? and what would the hint be in the json that this happened? confirmations would go back to zero or go negative or something? or the tx just disappears?
 2015-01-03 22:01:08	0|morcos|sipa: not sure if you talked more with ajweiss, but we think #5463 or something that addresses that problem would be good to get in for 0.10
 2015-01-03 22:14:54	0|phantomcircuit|sipa, i would guess that there's some peers which stall sending the block, is there a timeout in asking a different peer for that block?
 2015-01-03 22:21:05	0|Jouke|Hmm, with 10rc1 I was connected to a very slow node and it slowed the syncing a lot (starting from block 0), but an hour later I lost connection to that vps, so couldn't investigate further.
 2015-01-03 22:21:38	0|gmaxwell|Jouke: How do you know that "it slowed the syncing a lot"?
 2015-01-03 22:22:16	0|Jouke|looking at peerinfo, I saw there were blocks inflight from that node
 2015-01-03 22:22:19	0|gmaxwell|By design it cannot slow the syncing more than not having a connection in that slot.  (not that there might not be a bug, but that's why I'm skeptical of the claim without some explanation as to why you believe that)
 2015-01-03 22:22:42	0|gmaxwell|Jouke: That doesn't mean that it was making it slower, it adaptively requests fewer blocks from slower peers.
 2015-01-03 22:23:17	0|gmaxwell|Jouke: there is a rolling window of 1000 blocks in flight, and if it finds itself with the window full and waiting on a peer, it will disconnect the peer that has stalled the window.
 2015-01-03 22:23:21	0|Jouke|But it was waiting for that peer to finish sending those blocks that were in flight
 2015-01-03 22:23:41	0|gmaxwell|Jouke: why do you believe it was waiting?
 2015-01-03 22:24:03	0|Jouke|because for minutes it was the only peer with those blocks in flight
 2015-01-03 22:24:30	0|gmaxwell|Jouke: that doesn't mean it was waiting. It downloads blocks out of order.
 2015-01-03 22:25:24	0|Jouke|there were no other blocks in flight at other peers and getinfo was "stuck" while those blocks were in flight
 2015-01-03 22:26:39	0|gmaxwell|Jouke: sounds like a bug then, since by design it shouldn't be able to get into that state. If you observe something like that again, please capture the getpeerinfo output and the debug.log.
 2015-01-03 22:27:27	0|Jouke|yeah, wanted to, but then isp just blocked all traffic on that vps
 2015-01-03 22:27:52	0|Jouke|($89 dollar per three year vps)
 2015-01-03 22:28:44	0|gmaxwell|huh. did they refund your money?
 2015-01-03 22:28:49	0|phantomcircuit|gmaxwell, is there a timeout for pulling a block from a specific peer?
 2015-01-03 22:29:30	0|gmaxwell|phantomcircuit: at runtime? yes. IIRC 10 minutes. During initial sync it's primarily controlled by your other peers and the download window.
 2015-01-03 22:29:57	0|gmaxwell|phantomcircuit: it can't be too short or you'll make it easy to partition the network with DDOS attacks, and make it impossible to run a node on a slow link.
 2015-01-03 22:30:55	0|gmaxwell|(presumably we could make the timeout configurable; for those who know they are on a fast link... though I think in general the main p2p protocol should be optimizing for anti-partitioning and robustness, not latency. Use the relay network protocol to minimize latency.)
 2015-01-03 22:31:45	0|sdaftuar|i think pr 5463 tries to introduce a timeout, but currently stall detection is just based on a window of certain number of blocks...?
 2015-01-03 22:32:29	0|Jouke|Yeah, it was not a huge problem, on an other new vps it worked just fine.
 2015-01-03 22:33:44	0|phantomcircuit|gmaxwell, if you already have all the headers is it really an issue?
 2015-01-03 22:35:20	0|phantomcircuit|bbl
 2015-01-03 22:35:49	0|gmaxwell|sdaftuar: oh... hm. I thought that made it in.
 2015-01-03 22:36:53	0|sdaftuar|not yet, i have a guess that this issue is what causes 5588 (just posted there).  not totally sure without seeing more logs, but i think it fits the evidence
 2015-01-03 22:37:11	0|gmaxwell|sdaftuar: we should still continue when the next block on the network shows up, however.
 2015-01-03 22:38:18	0|sdaftuar|not sure i follow -- i think the headers continue to build up, but we'll never request the block from anyone else, so the tip won't update right?
 2015-01-03 22:39:13	0|sdaftuar|(never request the stuck block, i agree the code would continue to download new blocks on the chain)
 2015-01-03 22:41:49	0|gmaxwell|hm. May be so. Headers first may have regressed our prior protection against getting stuck. (we'd eventually request the block from another peer who told us about a successor block.)
 2015-01-03 22:46:25	0|michagogo|00:23:01 <gmaxwell> By design it cannot slow the syncing more than not having a connection in that slot.  (not that there might not be a bug, but that's why I'm skeptical of the claim without some explanation as to why you believe that)
 2015-01-03 22:46:38	0|michagogo|gmaxwell: I was held up on one block inflight from one peer for 2 hours
 2015-01-03 22:46:50	0|michagogo|block 337077
 2015-01-03 22:47:09	0|michagogo|(see 21:30 UTC in here)
 2015-01-03 22:47:31	0|gmaxwell|michagogo: I was referring to during initial download sync, not time at the tip.
 2015-01-03 22:47:43	0|michagogo|gmaxwell: hmm?
 2015-01-03 22:47:48	0|michagogo|This was initial sync
 2015-01-03 22:47:54	0|michagogo|Well, catch-up on starting the node
 2015-01-03 22:47:55	0|gmaxwell|The initial download "can't be worse than just not having the peer" doesn't apply unless there are at least 1000 more headers.
 2015-01-03 22:48:02	0|michagogo|ah
 2015-01-03 22:49:08	0|gmaxwell|Basically it uses all your other sync peers as a dynamic measurement of if a particular peer sucks, instead of having a hardcoded definition of suckyness.
 2015-01-03 22:49:27	0|gmaxwell|But when you're near the tip and there aren't multiple blocks in flight it's not possible to do that.
 2015-01-03 22:49:42	0|michagogo|gmaxwell: there was just one in flight, actually, as far as I could tell
 2015-01-03 22:49:58	0|michagogo|seems a bit broken to me
 2015-01-03 22:50:22	0|michagogo|As soon as I killed the socket to that one peer, I heard my computer fan get louder
 2015-01-03 22:50:27	0|michagogo|80 seconds later I was synced up
 2015-01-03 22:50:42	0|gmaxwell|michagogo: I just said, that heuristic cannot work when there aren't at least 1000 blocks ahead of you.
 2015-01-03 22:50:56	0|michagogo|gmaxwell: yeah
 2015-01-03 22:51:01	0|gmaxwell|It's not broken, it just doesn't apply there.
 2015-01-03 22:51:14	0|michagogo|gmaxwell: well, something's broken about the mechanism as a whole
 2015-01-03 22:51:18	0|michagogo|(the syncing)
 2015-01-03 22:51:40	0|michagogo|One peer delaying a block for some reason shouldn't freeze sync for 2+ hours :-/
 2015-01-03 22:52:21	0|gmaxwell|michagogo: well what do you want it to do?  Let's imagine that your internet connection is very slow such that it takes you two hours to download the block. If you terminate fetching you'll never make progress once you reach that point.
 2015-01-03 22:52:36	0|michagogo|gmaxwell: uh
 2015-01-03 22:52:45	0|michagogo|if it takes 2 hours to download a block you have bigger problems
 2015-01-03 22:53:14	0|gmaxwell|(the old behavior is that we'd use announcement of the next block on the network as a beacon to make progress, but it sounds like we lost that behavior)
 2015-01-03 22:53:36	0|michagogo|Even if it didn't terminate, I would expect it to simultaneously try for that block from another peer or something after a couple minutes
 2015-01-03 22:53:41	0|michagogo|or... something
 2015-01-03 22:53:58	0|gmaxwell|michagogo: today it does because there is a maximum block size, so you can say e.g. if you can't download a block in 20 minutes you're screwed anyways. But that doesn't apply where people want to do things like remove or greatly increase the maximum.
 2015-01-03 22:54:18	0|gmaxwell|michagogo: simultaneously means that it now takes 4 hours to fetch that block. :P
 2015-01-03 22:54:34	0|michagogo|gmaxwell: um, in any case you're screwed if it takes 2 hours to fetch a block
 2015-01-03 22:54:46	0|michagogo|Because if you assume that they're coming in every 10 minutes you'll never catch up
 2015-01-03 22:54:52	0|gmaxwell|michagogo: :( did you read what I wrote where it starts with "today"
 2015-01-03 22:54:59	0|michagogo|I did
 2015-01-03 22:55:17	0|michagogo|But if block sizes are increased to the point where it takes more than 10 minutes on average to download a block...
 2015-01-03 22:55:23	0|michagogo|you're still screwed
 2015-01-03 22:55:32	0|michagogo|Unless you mean one huge megablock among many small ones
 2015-01-03 22:55:35	0|Jouke|10 minutes seems like a good time to switch a peer
 2015-01-03 22:55:39	0|gmaxwell|michagogo: not all blocks are the average size.
 2015-01-03 22:55:56	0|gmaxwell|(even now there is a 2:1 difference between the average and the maximum)
 2015-01-03 22:56:15	0|michagogo|Hm, do we have any mechanism to learn about the size of a block?
 2015-01-03 22:56:21	0|michagogo|(before fetching it)
 2015-01-03 22:56:33	0|gmaxwell|michagogo: not in advance, which is annoying. (If we replace getheaders with something else, that should be fixed)
 2015-01-03 22:57:06	0|gwillen|if you know a rough estimate of the current average transaction rate and size, you can make estimates like "I am currently downloading the bytes of this block more slowly than new bytes of transactions are being created"
 2015-01-03 22:57:18	0|gwillen|"therefore I better give up and try something else"
 2015-01-03 22:57:39	0|michagogo|Somehow, I don't think it's a good idea to release an 0.10.0 that will stall for 2+ hours waiting on a peer to send a block when catching up
 2015-01-03 22:57:48	0|michagogo|ACTION goes to file an issue
 2015-01-03 22:58:01	0|michagogo|What information should I be providing to maximize helpfulness?
 2015-01-03 22:58:11	0|gmaxwell|Getheaders sends a tx_count with the headers but it's always zero.
 2015-01-03 22:58:21	0|michagogo|gmaxwell: m(
 2015-01-03 22:58:22	0|gmaxwell|michagogo: yea sure, I consider this blocking.
 2015-01-03 22:58:37	0|gwillen|another thing you could try would be estimating the user's connection speed
 2015-01-03 22:58:38	0|michagogo|What is that tx_count supposed to mean?
 2015-01-03 22:58:39	0|gmaxwell|My argument above was only trying to make the point that it's not just a trivial thing to handle.
 2015-01-03 22:58:48	0|michagogo|gmaxwell: Okay, fair enough
 2015-01-03 22:58:51	0|gwillen|and get mad if you are currently downloading at less than, say, half that
 2015-01-03 22:59:00	0|michagogo|But I feel like even a 10-15 minute timeout, right now, would be fine
 2015-01-03 22:59:11	0|michagogo|If we introduced larger blocks, we could... change that.
 2015-01-03 22:59:12	0|Jouke|gwillen: but it could be your own connection
 2015-01-03 22:59:54	0|gwillen|well, I handwaved away the process of figuring out your own connection speed
 2015-01-03 22:59:55	0|gmaxwell|(rebroad previously, back in 0.9.x -- where we wouldn't get stuck -- was trying in his usual fashion to get us to take a patch that kicks (bans?) peers after it took a minute or two to fetch a block.. so I'm a bit defensive about over simplifications of this. :) )
 2015-01-03 23:00:13	0|Jouke|If we introduce larger blocks, you should still have less than 10 minutes time to get blocks
 2015-01-03 23:00:30	0|michagogo|Jouke: right, that's something that would need to be figured out
 2015-01-03 23:00:48	0|gmaxwell|Jouke: on average, but that doesn't speak to any particular block.
 2015-01-03 23:00:55	0|firelegend|why do nodes with newer protocol send an addr message without the timestamp?
 2015-01-03 23:01:03	0|michagogo|gmaxwell: so what information, if any, should I include in the issue?
 2015-01-03 23:01:06	0|gmaxwell|Also, the failure mode if the network is too fast for you should be clean, not be one that lets you get partitioned.
 2015-01-03 23:01:15	0|gmaxwell|michagogo: I think we already have an issue open.
 2015-01-03 23:01:20	0|michagogo|Ah, okay
 2015-01-03 23:01:21	0|michagogo|ACTION looks
 2015-01-03 23:01:26	0|Jouke|gmaxwell: right :)
 2015-01-03 23:01:31	0|gmaxwell|michagogo: https://github.com/bitcoin/bitcoin/issues/5588
 2015-01-03 23:02:04	0|gwillen|gmaxwell: it seems like the failure mode where the network is inherently too fast for you is going to be extremely rare
 2015-01-03 23:02:17	0|gwillen|at least in the near term
 2015-01-03 23:02:49	0|michagogo|gmaxwell: I'm not sure this is the same issue
 2015-01-03 23:03:04	0|Jouke|maybe if you are only connected through tor? (don't have any experience with that use case)
 2015-01-03 23:03:10	0|gmaxwell|michagogo: pretty sure it's exactly the same issue.
 2015-01-03 23:03:14	0|gmaxwell|Jouke: nah, tor is pretty fast.
 2015-01-03 23:03:28	0|gmaxwell|Jouke: keep in mind that the maximum rate for the network is 14kbit/sec.
 2015-01-03 23:03:59	0|michagogo|gmaxwell: hm? I don't see anything there about a wrong or stale block
 2015-01-03 23:04:17	0|michagogo|(no failures logged, etc, the UpdateTips just stop)
 2015-01-03 23:05:14	0|michagogo|What getchaintips showed as active was in the main chain, and it did have the chain up to date headers-only
 2015-01-03 23:05:55	0|gmaxwell|gwillen is indeed correct. But I think that it's likely that people who are not thoughtful or are not long term focused will force onto the network a hard fork that makes the maximum size effectively unlimited. It would be disingenuous for me to sit quietly when people propose protocol behavior that will result in network failures with much larger blocks and then later turn around and use those shortcomings to argue against allowing larger blocks...
 2015-01-03 23:06:24	0|gwillen|heh, *nods*
 2015-01-03 23:06:41	0|michagogo|And looking at the log, I do see block files being switched
 2015-01-03 23:07:21	0|michagogo|Maybe the underlying issue is the same, but our two cases *seem* to be different as far as I can tell
 2015-01-03 23:07:35	0|gmaxwell|michagogo: you see block files being switched because it downloaded all the pending headers beyond that one. There just weren't enough of them to trigger disconnection of the stalled peer.  Ultimately I think the behavior is the same: you're waiting on a block to show up, and you'll wait forever.
 2015-01-03 23:09:57	0|Jouke|gmaxwell: in my debug log of that latest node it says that there were peers stalling and they were disconnected. Are those peers really disconnected from the node, or only for the download stage?
 2015-01-03 23:10:43	0|Jouke|Can one get info on old peers?
 2015-01-03 23:10:49	0|moa|so block size is a potentially network splitting problem as well as de-decentralisation?
 2015-01-03 23:14:39	0|Jouke|I think all my tor peers were disconnected because they were too slow.
 2015-01-03 23:23:39	0|Jouke|Ok, found in the logs as well. But indeed, all the tor nodes I was connected to during initial sync were disconnected.
 2015-01-03 23:31:01	0|gmaxwell|Jouke: actually disconnected.
 2015-01-03 23:32:41	0|gmaxwell|moa: yes, but there isn't a clear bright line. Like, pretty obviously almost everyone can keep up with 1MB blocks... and pretty obviously only large corporations could keep up with 10 gigabyte blocks. Exactly what the decentralization vs operating cost curve is... is unknown. And the costs change over time as bandwidth/storage/cpu change in price.
 2015-01-03 23:33:42	0|moa|sounds like an optimisation problem
 2015-01-03 23:35:16	0|Jouke|I am now hosting a full node at a cost of 2 euros per month. (still have to see how reliable it actually is)
 2015-01-03 23:36:06	0|Jouke|That is the cheapest node I have atm.
 2015-01-03 23:36:48	0|justanotheruser|maybe we can extrapolate on the number of full nodes vs block size and extrapolate to estimate the number of full nodes at any block size
 2015-01-03 23:37:12	0|justanotheruser|s/can extrapolate on/can take/
 2015-01-03 23:57:42	0|gmaxwell|justanotheruser: Doubt it works like that. :)
 2015-01-04 00:07:37	0|justanotheruser|extrapolation doesn't usually work with bitcoin in general does it :P
 2015-01-04 00:09:26	0|moa|extrapolation doesn't usually work in general.
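
The window-based stall rule gmaxwell describes above (22:23 and 22:47) can be sketched roughly as follows. This is a toy Python model, not Bitcoin Core's code: the function name `find_staller`, the `received` set, the `in_flight` mapping and the peer labels are all assumptions made for illustration. Only the 1000-block window and the block heights in the first example come from the log.

```python
BLOCK_DOWNLOAD_WINDOW = 1000  # window size quoted in the discussion


def find_staller(tip, best_header, received, in_flight,
                 window=BLOCK_DOWNLOAD_WINDOW):
    """Return the peer stalling the download window, or None if the rule
    does not apply.

    tip         -- height of the last block connected to the chain
    best_header -- height of the best known header
    received    -- heights downloaded out of order, not yet connected
    in_flight   -- mapping of requested height -> peer it was requested from
    """
    window_end = tip + window
    # Near the tip there are no headers beyond the window, so the window
    # can never fill up and the rule never fires.
    if best_header <= window_end:
        return None
    # The window is full only when every slot is downloaded or requested.
    for height in range(tip + 1, window_end + 1):
        if height not in received and height not in in_flight:
            return None  # something is still requestable; nobody blocks us
    # Blocks connect in order, so whoever holds the lowest missing block
    # is the one holding up the whole window.
    return in_flight.get(tip + 1)


# michagogo's situation: 262 blocks behind, one block stuck at one peer.
near_tip = find_staller(
    tip=337077,
    best_header=337339,
    received=set(range(337079, 337340)),
    in_flight={337078: "peer-1"},
)
print(near_tip)  # None: the window is not full, so nobody gets disconnected

# Initial sync far from the tip: the same stuck block now fills the window.
far_behind = find_staller(
    tip=0,
    best_header=5000,
    received=set(range(2, 1001)),
    in_flight={1: "slow-peer"},
)
print(far_behind)  # slow-peer
```

In this model the first call returns None for exactly the reason given at 22:47: with fewer than a window's worth of headers ahead, there are no other peers' results to compare against, so a separate timeout would be needed to get unstuck.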

 ...

 2015-01-07 20:30:32	0|ajweiss|so it looks to me like if block download timeouts need a backoff, the same would need to happen to ping timeouts
 2015-01-07 20:30:54	0|ajweiss|since ping replies get backed up behind blocks and there's a hard timeout at 20m
 2015-01-07 20:36:22	0|gmaxwell|ajweiss: ideally it would be better if we were distinguishing block in flight (e.g. tarpit) vs sending nothing.
 2015-01-07 20:36:38	0|sipa|?
 2015-01-07 20:39:06	0|Luke-Jr|gmaxwell: we want to ban/disconnect on tarpit too..?
 2015-01-07 20:39:46	0|sipa|what is tarpit?
 2015-01-07 20:39:54	0|gmaxwell|Luke-Jr: after some limit, but if it's not even sending anything we can be more aggressive.
 2015-01-07 20:40:17	0|gmaxwell|sipa: e.g. a tarpit is a peer that is sending block data but very slowly; as opposed to not sending anything at all.
 2015-01-07 20:41:06	0|Luke-Jr|sipa: in the TCP sense, a tarpit is a peer that refuses to let you close/reset an established connection
 2015-01-07 20:42:01	0|ajweiss|i don't see the benefit, a hopeless transfer is a hopeless transfer, no?
 2015-01-07 20:42:15	0|gmaxwell|ajweiss: the benefit of what?
 2015-01-07 20:42:34	0|ajweiss|if we're gettin 'em slower than they're makin 'em, it just isn't gonna work out regardless
 2015-01-07 20:42:49	0|ajweiss|distinguishing tarpit v. nothing
 2015-01-07 20:42:57	0|gmaxwell|ajweiss: that's not true. ::sigh:: as I said the other day, a _single_ block taking longer than average is fine.
 2015-01-07 20:43:27	0|gmaxwell|And it's preferable to give up early if we can safely do so, which is why distinguishing makes sense.
 2015-01-07 20:44:01	0|ajweiss|i briefly considered something like that, but that looked like monkeying around in the network stack too much
 2015-01-07 20:44:06	0|teward|gmaxwell: you mean bitcoin-qt? If that's the case it never said that anywhere
 2015-01-07 20:44:17	0|teward|gmaxwell: responding to the time delta issue i had earlier
 2015-01-07 20:44:20	0|teward|(it never said that)
 2015-01-07 20:44:55	0|ajweiss|ideally we'd see it's an incoming block and track the incoming data rate as it comes, but i'm not sure if it'd be worth retrofitting the network stuff
 2015-01-07 20:45:02	0|gmaxwell|ajweiss: if you were hyper aggressive at killing connections right at the limit of the average time, it's possible for a network with adequate capacity to simply stop working all on its own when chance synchronization causes delays and a storm of repeated transmissions that aren't successful. (though it's unlikely unless things really are right at the limit)
 2015-01-07 20:45:26	0|gmaxwell|ajweiss: yea probably not worth doing that right now. Though we could tell by virtue of responses to pings.
 2015-01-07 20:45:36	0|ajweiss|yeah but i'm not arguing for average, we have a hard max, no?
 2015-01-07 20:45:52	0|ajweiss|i suppose that will go away in future, but for now...
 2015-01-07 20:46:20	0|gmaxwell|ajweiss: A maximum will not go away while I'm still involved with Bitcoin.
 2015-01-07 20:46:59	0|gmaxwell|That isn't the point I'm making. Just assume for a minute that all blocks are the same size, and that the network has just enough capacity to handle that.
 2015-01-07 20:47:24	0|gmaxwell|ajweiss: if there is just a blurp in one part of the network where the capacity goes too low, then the whole thing can collapse.
 2015-01-07 20:47:46	0|gmaxwell|Their a theoretical weakness but one that ought to be mitigated.
 2015-01-07 20:48:22	0|ajweiss|sure
 2015-01-07 20:48:45	0|gmaxwell|s/Their/It's/
 2015-01-07 20:48:57	0|ajweiss|that mitigation will also involve adjusting the ping stuff too
 2015-01-07 20:49:15	0|gmaxwell|Indeed. It's a really good spot there.
 2015-01-07 20:49:59	0|ajweiss|incidentally, under severe bandwidth constraints i also saw where it would get into connect -> timeout no messages in 60s disconnect loops
 2015-01-07 20:51:21	0|gmaxwell|yes, I'm not actually too concerned about that in that once you're up you're up. Many tcp stacks fail after 2 minutes in connect in any case.
 2015-01-07 20:52:02	0|ajweiss|yeah and my tests used trickle, which will act differently there anyway
 2015-01-07 20:54:44	0|gmaxwell|ajweiss: so it's funny, there was previously a PR that decreased the ping timeout to 1 minute. I was pretty unhappy about that, but didn't think of the head of line blocking on block transmission...  Wumpus argued for 20 minutes. I was happy with 5. I guess we dodged a bullet on that. :-/
 2015-01-07 20:55:01	0|ajweiss|hahaha yeah
 2015-01-07 20:55:05	0|ajweiss|i wasn't expecting it
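
ajweiss's point that ping replies queue up behind a block in transit can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes a 1,000,000-byte maximum block (the 1MB figure mentioned earlier in the log), a 600-second average block interval, and an arbitrary 56 kbit/s link. The helper names `transfer_seconds` and `ping_survives` are hypothetical. The 1, 5 and 20 minute timeouts are the values discussed at 20:54.

```python
MAX_BLOCK_BYTES = 1_000_000  # assumed 1 MB maximum block size
BLOCK_INTERVAL_S = 600       # assumed 10-minute average block interval
SLOW_LINK_BPS = 56_000       # illustrative slow link, not a measured value


def transfer_seconds(size_bytes, link_bits_per_s):
    """Time to push size_bytes through a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s


def ping_survives(size_bytes, link_bits_per_s, ping_timeout_s):
    """A pong queued behind a block only arrives once the block is sent."""
    return transfer_seconds(size_bytes, link_bits_per_s) < ping_timeout_s


# Sustained rate needed to keep up with back-to-back maximum-size blocks.
network_rate_bps = MAX_BLOCK_BYTES * 8 / BLOCK_INTERVAL_S

print(f"network rate: {network_rate_bps / 1000:.1f} kbit/s")
print(f"one block at 56 kbit/s: {transfer_seconds(MAX_BLOCK_BYTES, SLOW_LINK_BPS):.0f} s")
for timeout_min in (1, 5, 20):
    ok = ping_survives(MAX_BLOCK_BYTES, SLOW_LINK_BPS, timeout_min * 60)
    print(f"{timeout_min:>2} min ping timeout: {'ok' if ok else 'disconnect'}")
```

Under these assumptions the sustained rate comes out near 13.3 kbit/s, close to the 14kbit/sec figure quoted on 2015-01-03. A single full block takes about 143 seconds on the slow link, so a 1-minute ping timeout would drop a peer that is still making progress, while 5 or 20 minutes would not.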

 ...