insidethesim.eth on Farcaster


https://ethstaker.cc

Thomas

@aviationdoctor.eth

Woke up to a flurry of failed attestations from my solo validator. The NUC’s SSD finally gave up the ghost after three years of intense I/O. Not the most pleasant start to a Sunday, but I was expecting it at some point and had a spare SSD on hand. Swapped the disks, flashed Ubuntu Server, clean-installed eth-docker, and less than an hour of tinkering later, I am now syncing from checkpoint, which should complete by tomorrow. Running a solo validator does take some effort, and the rewards of ~2.7% APY aren’t much, but putting the de- in decentralization is priceless.

InsideTheSim 🎩🍪

@insidethesim.eth

How over the top would it be to have a failover setup with, say, 4 drives where any 2 can fail? The idea would be to have time to replace a failed drive without downtime.

Thomas

@aviationdoctor.eth

For 4 drives where any 2 can fail, you'd need RAID 6 or an equivalent double-parity setup. To be honest, I'm not aware of this being done with the kind of high-performance SSDs and intensive I/O required to keep up with the Ethereum chain head. Maybe with a super-powered NAS, but I'm skeptical; at the very least it would have to be a prohibitively expensive system.

I guess you could rsync the primary disk to a mirror backup every 24 hours, so that if disk #1 fails, disk #2 would be at most 24 hours (12 hours on average) behind the chain head. In practice, though, that adds a lot of extra read load on disk #1 whenever rsync is running. That would not only shorten its lifespan even further but could also throttle the SSD on top of the I/O load from node operations, unless you stopped the node for the duration of each rsync and missed a bunch of attestations every day.

Alternatively, you could run a second node (RPC only, not validating: virtually everyone who has ever been slashed made the mistake of bringing a validating backup node online concurrently). Then, if node #1 fails, all you would need to do is import your keys into node #2, which is a short SCP command away. But that means buying and maintaining twice the computing CapEx, consuming twice the bandwidth and power, etc.

Honestly, I don't think any of this is worth it to mitigate the few hours, or even few days, of downtime that happen every few years when a node dies. The missed-attestation penalties are quite small.
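A rough Python sketch of the two numbers behind the reply above, under simplified assumptions. RAID 6 is modelled as surviving any two drive failures; a RAID 10 array (not mentioned in the original post, added only for comparison) is modelled as losing data only when both drives of the same mirror pair fail. The rsync part assumes a failure is equally likely at any moment between syncs, so the mirror is on average half a sync interval behind. The function names are hypothetical, and this models fault tolerance only, not SSD performance or wear.

```python
from itertools import combinations


def survives(layout: str, drives: int, failed: set[int]) -> bool:
    """Return True if the array still holds all data after `failed` drives die.

    Simplified model (assumption, not a real RAID implementation):
    - "raid6": double parity, so any 2 failures are tolerated.
    - "raid10": drives are mirrored in pairs (0,1), (2,3), ...; data is lost
      only if both drives of some pair fail.
    """
    if layout == "raid6":
        return len(failed) <= 2
    if layout == "raid10":
        if drives % 2:
            raise ValueError("raid10 model assumes an even number of drives")
        pairs = [(i, i + 1) for i in range(0, drives, 2)]
        return all(not (a in failed and b in failed) for a, b in pairs)
    raise ValueError(f"unknown layout: {layout}")


def fraction_of_double_failures_survived(layout: str, drives: int = 4) -> float:
    """Share of all possible two-drive failures the layout survives."""
    combos = list(combinations(range(drives), 2))
    ok = sum(survives(layout, drives, set(c)) for c in combos)
    return ok / len(combos)


def rsync_lag_hours(interval_hours: float) -> tuple[float, float]:
    """(worst-case, average) age of the mirror copy when the primary dies,
    assuming the failure time is uniform between periodic syncs."""
    return interval_hours, interval_hours / 2


if __name__ == "__main__":
    print(fraction_of_double_failures_survived("raid6"))   # prints 1.0
    print(fraction_of_double_failures_survived("raid10"))  # 4 of 6 pairs, ~0.667
    print(rsync_lag_hours(24))                              # prints (24, 12.0)
```

With 4 drives, the RAID 6 model survives all 6 possible two-drive failures while the RAID 10 model survives only 4 of them, which is why "any 2 can fail" points to double parity. A daily rsync leaves the mirror up to 24 hours behind, 12 hours on average.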