Varun Srinivasan
@v
Here's a deep dive into what caused this very painful outage. Tap into the thread for details.

When you link a token to a cast, like $DEGEN, it creates an embed. Our server hydrates this embed with data like the image and token creator, and stores it in our database so we can show it to you quickly.

When we launched token news, we added a lot of info to the token object, like the casts that were included in the news. We didn't realize that our hydrator was including all the news and news casts in this object every time we created a token link.

We recently ended up in a recursive state because some of the casts in the token news had token links themselves. Now a cast with a token link would trigger our server to include the news object, which contained casts, some of which contained the same token link, and so on. These casts would quickly balloon to 5 to 10 MB each.

Our feed generator tries to fetch all the casts you want to read, order them, compress them, and put them into Redis. This compression made the CPU on the feed workers stall. It got very bad very fast: as soon as a worker came up, it would start picking jobs off the queue and stall immediately. We couldn't even SSH into the box to figure out what was going on.

The way we ended up handling it was to shut off various parts of feed generation until we could isolate a few problematic lines of code. We also scaled back the processor and forced it to run very slowly so we could get into the boxes and profile things. Both of these threads eventually led us to the culprits and the fixes.
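For readers who want the shape of the bug rather than the prose: here is a minimal TypeScript sketch of the recursive hydration. Every type and function name (TokenEmbed, hydrateTokenEmbed, the placeholder data) is invented for illustration; Warpcast's actual hydrator is not public.

```typescript
// Illustrative types only -- not Warpcast's actual schema.
interface TokenNews {
  summary: string;
  casts: Cast[];
}

interface TokenEmbed {
  symbol: string;
  imageUrl: string;
  creator: string;
  news?: TokenNews;
}

interface Cast {
  hash: string;
  text: string;
  embeds: TokenEmbed[];
}

// Stand-in for the casts included in a token's news. In the incident,
// some of these casts contained the same token link.
const NEWS_CAST_HASHES = ["0xaaa", "0xbbb", "0xccc"];

// The buggy shape: hydrating a token link pulls in the full news object,
// and every news cast is hydrated the same way, so a news cast containing
// the same $SYMBOL link re-embeds the news, and so on. The depth guard is
// the kind of fix that stops it; without one, the object grows
// geometrically (3 nested casts per level in this toy example).
function hydrateTokenEmbed(symbol: string, depth = 0, maxDepth = 1): TokenEmbed {
  const embed: TokenEmbed = {
    symbol,
    imageUrl: `https://example.com/${symbol}.png`, // placeholder
    creator: "0xcreator",                          // placeholder
  };
  if (depth < maxDepth) {
    embed.news = {
      summary: `News about ${symbol}`,
      casts: NEWS_CAST_HASHES.map((hash) => ({
        hash,
        text: `Check out ${symbol}!`,
        // Recursive hydration: the nested cast's token link hydrates
        // the news again unless the depth guard stops it.
        embeds: [hydrateTokenEmbed(symbol, depth + 1, maxDepth)],
      })),
    };
  }
  return embed;
}

// With maxDepth = 1 the serialized embed stays small; raise maxDepth to
// watch the payload size explode the way the 5-10 MB casts did.
console.log(JSON.stringify(hydrateTokenEmbed("$DEGEN")).length, "bytes");
```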
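The feed-worker stall can be sketched the same way. This is not the real pipeline (the actual fix was in the hydrator, as described above); it only shows two defensive guards implied by the failure mode: cap the size of anything you are willing to compress, and use Node's async zlib API so compression runs in the libuv thread pool instead of blocking the event loop that picks jobs off the queue. The Cast shape, the cap value, and the Redis comment are assumptions.

```typescript
import { gzip } from "node:zlib";
import { promisify } from "node:util";

const gzipAsync = promisify(gzip) as (buf: Buffer) => Promise<Buffer>;

// Illustrative cast shape; real feed entries carry far more fields.
interface Cast {
  hash: string;
  json: string; // serialized cast; 5-10 MB each while the bug was live
}

// Assumed cap -- not Warpcast's actual number.
const MAX_CAST_BYTES = 256 * 1024;

// Rough sketch of the feed-worker step: take candidate casts, order
// them, compress the page. Two guards keep one pathological cast from
// stalling the worker:
//   1. skip anything over the size cap instead of compressing it;
//   2. compress asynchronously so the work runs off the event loop.
async function buildFeedPage(casts: Cast[]): Promise<Buffer> {
  const kept = casts.filter((c) => {
    const size = Buffer.byteLength(c.json);
    if (size > MAX_CAST_BYTES) {
      console.warn(`skipping oversized cast ${c.hash} (${size} bytes)`);
      return false;
    }
    return true;
  });

  // Stand-in ordering; the real generator ranks casts, this just sorts.
  kept.sort((a, b) => (a.hash < b.hash ? 1 : -1));

  const page = Buffer.from(JSON.stringify(kept));
  // The caller would then write the compressed page to Redis under the
  // user's feed key with a TTL.
  return gzipAsync(page);
}
```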
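And a toy version of the "force it to run very slowly" mitigation, assuming nothing about the real queueing setup: concurrency of exactly one plus a forced pause between jobs, which trades throughput for enough idle CPU to SSH in and attach a profiler.

```typescript
// Job shape and function parameters are invented for illustration.
interface FeedJob {
  userFid: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function drainSlowly(
  popJob: () => Promise<FeedJob | null>,
  handle: (job: FeedJob) => Promise<void>,
  pauseMs = 2000,
): Promise<void> {
  // One job at a time, with a pause after each, so the box stays responsive.
  for (let job = await popJob(); job !== null; job = await popJob()) {
    await handle(job);
    await sleep(pauseMs);
  }
}
```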
27 replies
25 recasts
253 reactions

bertwurst
@bertwurst.eth
1 reply
0 recast
11 reactions

Naomi
@afrochicks
BERT
1 reply
0 recast
3 reactions

bertwurst
@bertwurst.eth
I got to "Our server hydrates" and tapped out.
2 replies
0 recast
5 reactions