@jleskovar
Last active January 12, 2019 19:15
btcd watchdog
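#!/bin/bash
# Watchdog for btcd: evicts the sync peer or restarts btcd when the block height stops advancing.
POST_INIT_SYNC_DELAY=60   # seconds to wait after starting btcd before polling
POLL_DELAY=60             # seconds between two block-height samples
STALL_THRESHOLD=5         # stalls tolerated before btcd itself is restarted

# Start btcd if it is not already running.
if [ -z "$(pidof btcd)" ]; then
    echo "Starting btcd"
    nohup btcd &
    sleep "$POST_INIT_SYNC_DELAY"
fi

stalls=0
while true; do
    # Sample the block height twice, POLL_DELAY seconds apart.
    start=$(btcctl --notls getinfo | jq -r .blocks)
    sleep "$POLL_DELAY"
    end=$(btcctl --notls getinfo | jq -r .blocks)
    echo "Processed $((end - start)) blocks in the last $POLL_DELAY seconds"

    if [[ "$start" == "$end" ]]; then
        if (( stalls > STALL_THRESHOLD )); then
            echo "Too many stalls detected. Restarting btcd..."
            # Left unquoted so that several PIDs are passed as separate arguments.
            kill $(pidof btcd)
            sleep 10
            nohup btcd &
            stalls=0
        else
            # Address (port stripped) of the peer btcd is currently syncing from.
            syncnode=$(btcctl --notls getpeerinfo | jq -r '.[] | select(.syncnode == true) | .addr' | cut -f1 -d:)
            if [ -z "$syncnode" ]; then
                echo "Stall detected, but no syncnode found. Restarting btcd..."
                kill $(pidof btcd)
                sleep 10
                nohup btcd &
                stalls=0
            else
                echo "Stall detected! Evicting potentially bad node $syncnode"
                btcctl --notls node disconnect "$syncnode"
                stalls=$(( stalls + 1 ))
            fi
        fi
    fi
done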
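
If btcctl fails (for instance while btcd is still restarting), start and end both come back empty, the arithmetic in the echo line errors out, and the two empty strings compare equal, so the failed call is counted as a stall. A minimal sketch of a guard, assuming a hypothetical helper named is_height that is not part of the script above:

```shell
#!/bin/bash
# is_height VALUE: succeed only if VALUE is a non-empty string of decimal digits.
# Hypothetical helper for illustration; the original script has no such check.
is_height() {
    [[ "$1" =~ ^[0-9]+$ ]]
}

# Possible use inside the polling loop (sketch only):
#   if ! is_height "$start" || ! is_height "$end"; then
#       echo "Could not read block height; skipping this round"
#       continue
#   fi
```

Skipping the round rather than counting a stall is one possible policy; restarting btcd after several unreadable rounds would be another.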
@Sjors

Sjors commented Dec 14, 2017

For OSX you'll need a replacement for pidof, e.g. brew install pidof.
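
An alternative to installing pidof is to fall back to pgrep, which ships with macOS and most Linux distributions. A minimal sketch, assuming a hypothetical wrapper named find_pids that the watchdog would call instead of pidof:

```shell
#!/bin/bash
# find_pids NAME: print the PIDs of processes whose name is exactly NAME.
# Hypothetical wrapper; uses pidof when available, otherwise pgrep -x.
find_pids() {
    if command -v pidof >/dev/null 2>&1; then
        pidof "$1"
    else
        pgrep -x "$1"
    fi
}

# Sketch of use in the watchdog:
#   if [ -z "$(find_pids btcd)" ]; then echo "Starting btcd"; fi
```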

I also had to remove the --notls bit, otherwise I'd get net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
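
The bytes \x15\x03\x01 in that error look like the start of a TLS alert record, which suggests the RPC server was speaking TLS while btcctl, run with --notls, sent plain HTTP. Since some setups need the flag and others do not, one option is to make it configurable. A minimal sketch, assuming a hypothetical BTCCTL_OPTS variable and btcctl_cmd wrapper that are not part of the original script:

```shell
#!/bin/bash
# BTCCTL_OPTS (hypothetical) holds extra btcctl options.
# It defaults to --notls only when unset; export BTCCTL_OPTS="" to drop the flag.
if [ -z "${BTCCTL_OPTS+set}" ]; then
    BTCCTL_OPTS="--notls"
fi

btcctl_cmd() {
    # BTCCTL_OPTS is intentionally unquoted so several options split into words
    # and an empty value contributes no argument at all.
    btcctl $BTCCTL_OPTS "$@"
}

# Sketch of use in the watchdog:
#   start=$(btcctl_cmd getinfo | jq -r .blocks)
```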

@Sjors

Sjors commented Dec 15, 2017

This also won't work if you have multiple instances of btcd running, e.g. one for testnet and one for mainnet, because pidof btcd will just pick the first one.
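
One way around name-based lookup is to remember the PID of the instance the watchdog itself started. A minimal sketch, assuming a hypothetical pid-file path and hypothetical start_daemon / stop_daemon helpers (none of which appear in the original script):

```shell
#!/bin/bash
# Hypothetical pid-file location; use a different file per btcd instance.
PIDFILE="${PIDFILE:-/tmp/btcd-watchdog.pid}"

# start_daemon CMD [ARGS...]: launch CMD in the background and record its PID.
start_daemon() {
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# stop_daemon: signal only the recorded PID, and only if it is still alive.
stop_daemon() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
}

# Sketch of use: start_daemon btcd --testnet ... later: stop_daemon
```

This only covers instances started by the watchdog; a btcd that was already running would still need to be identified some other way.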

@guggero

guggero commented Jan 14, 2018

thanks, very useful! I've been trying to sync my btcd for three days now. Hopefully with the watchdog it will now work without interruptions.

@bajohns

bajohns commented Jan 21, 2018

This is working very well for me; thanks for posting

@xelawafs

@Sjors, I have both btcd mainnet and testnet running. By "first one", do you mean whichever of the two services was started first? It seems to be working fine for me with mainnet so far. I had testnet already synced at 100%, shut btcd down, restarted on mainnet, then resumed testnet.

@adiack

adiack commented Apr 20, 2018

Works like a charm, thank you. In my case I only had to remove --notls.
./watchdog_btcd.sh

+ POST_INIT_SYNC_DELAY=60
+ POLL_DELAY=60
+ STALL_THRESHOLD=5
++ pidof btcd
+ '[' -z 5465 ']'
+ stalls=0
+ true
++ jq -r .blocks
++ btcctl getinfo
+ start=384672
+ sleep 60
++ btcctl getinfo
++ jq -r .blocks
+ end=384672
+ echo 'Processed 0 blocks in the last 60 seconds'
Processed 0 blocks in the last 60 seconds
+ [[ 384672 == \3\8\4\6\7\2 ]]
+ ((  stalls > STALL_THRESHOLD  ))
++ btcctl getpeerinfo
++ jq -r '.[] | select(.syncnode == true) | .addr'
++ cut -f1 -d:
+ syncnode=217.23.8.80
+ '[' -z 217.23.8.80 ']'
+ echo 'Stall detected! Evicting potentially bad node 217.23.8.80'
Stall detected! Evicting potentially bad node 217.23.8.80
+ btcctl node disconnect 217.23.8.80
2018-04-20 09:28:00.697 [INF] SYNC: Lost peer 217.23.8.80:8333 (outbound)
2018-04-20 09:28:00.697 [INF] SYNC: Syncing to block height 519094 from peer 83.248.113.248:8333
+ stalls=1
+ true
++ jq -r .blocks
++ btcctl getinfo
+ start=384672
+ sleep 60
2018-04-20 09:28:00.977 [INF] SYNC: New valid peer 5.15.98.67:8333 (outbound) (/Satoshi:0.16.0/)
2018-04-20 09:28:01.391 [INF] SYNC: Processed 1 block in the last 7m29.19s (2 transactions, height 384673, 2015-11-21 19:38:21 +0000 UTC)
2018-04-20 09:28:11.851 [INF] SYNC: Processed 3 blocks in the last 10.46s (1207 transactions, height 384676, 2015-11-21 19:47:05 +0000 UTC)
2018-04-20 09:28:24.364 [INF] SYNC: Processed 6 blocks in the last 12.51s (3072 transactions, height 384682, 2015-11-21 20:19:26 +0000 UTC)
2018-04-20 09:28:36.536 [INF] SYNC: Processed 2 blocks in the last 12.17s (3743 transactions, height 384684, 2015-11-21 20:55:52 +0000 UTC)
2018-04-20 09:28:52.387 [INF] SYNC: Processed 4 blocks in the last 15.85s (2171 transactions, height 384688, 2015-11-21 21:24:00 +0000 UTC)

@githorray

I was having issues with the script being unable to ban stalled IPv6 hosts. It is easier to ban by node ID than by IP.

syncnode=`btcctl --notls getpeerinfo | jq -r '.[] | select(.syncnode == true) | .id'`
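
For context, cut -f1 -d: keeps only the text before the first colon, so an IPv6 address is truncated after its first group. If the address rather than the node ID is wanted, a minimal sketch that strips only the trailing port follows; strip_port is a hypothetical helper, and it assumes IPv6 peers are reported in the bracketed [addr]:port form. Whether btcctl node disconnect accepts the bracketed result is not verified here.

```shell
#!/bin/bash
# strip_port HOSTPORT: remove the final ":port" suffix, leaving earlier colons intact.
# Hypothetical helper; not part of the original script.
strip_port() {
    # ${1%:*} deletes the shortest suffix that starts with a colon.
    printf '%s\n' "${1%:*}"
}

# Sketch of use:
#   strip_port "217.23.8.80:8333"
#   strip_port "[2001:db8::1]:8333"
```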

@neogeno

neogeno commented Aug 9, 2018

This helped a lot

@ccdle12

ccdle12 commented Oct 3, 2018

Thank you, very helpful
