Kasra Rahjerdi
@jc4p
hey if you're wondering where the public data dump for April is, I need help figuring out the implications of my decisions with the Snapchain setup -- would love any ideas. Short of it: Snapchain prefers a 24/7 running instance, can't afford to run a heavy box 24/7, stuck on filtering 90GB of data on 8GB of RAM
10 replies
4 recasts
28 reactions
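[Editor's note: one common way around the 90GB-vs-8GB constraint described above (not proposed in the thread, just a hedged sketch) is to stream the dump and filter it record by record, so memory use stays flat no matter how big the file is. The field names below are assumptions, not the actual dump schema.]

```python
import io
import json

def filter_stream(lines, message_type):
    """Yield only messages of the given type, one at a time.

    Streaming keeps memory flat regardless of dump size: only one
    record is ever in memory, never the whole file.
    """
    for line in lines:
        msg = json.loads(line)
        if msg.get("type") == message_type:
            yield msg

# Tiny synthetic stand-in for the dump file (the real dump's schema
# is an assumption here; adapt field names to the actual data).
dump = io.StringIO(
    '{"type": "like", "fid": 1}\n'
    '{"type": "cast", "fid": 2}\n'
    '{"type": "like", "fid": 3}\n'
)

likes = list(filter_stream(dump, "like"))
print(len(likes))  # 2
```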
draftcode
@draftcode
have you thought about using Redis to store just the data you need to access, under a TTL that gets renewed by whatever criteria you choose?
1 reply
0 recast
1 reaction
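[Editor's note: a minimal in-process sketch of the TTL-renewal idea above. With real Redis this is just `SET key value EX ttl` (in redis-py, `r.set(key, value, ex=ttl)`); the class here only illustrates the expiry/renewal behavior.]

```python
import time

class TTLCache:
    """In-process sketch of Redis-with-TTL: entries expire after
    `ttl` seconds unless an access renews them."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, renew=True):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # expired: evict lazily on read
            return None
        if renew:  # "renewed by any criteria" -- here, renewed on read
            self._store[key] = (value, time.monotonic() + self.ttl)
        return value

cache = TTLCache(ttl=0.05)
cache.set("likes:2025-04", 42)
print(cache.get("likes:2025-04"))  # 42
time.sleep(0.06)
print(cache.get("likes:2025-04"))  # None (expired, nothing renewed it)
```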
Kasra Rahjerdi
@jc4p
@christopher has a great solution with this (where you can like, do extra filtering at the redis consumption layer) https://github.com/officialunofficial/waypoint -- my main issue rn stems from doing lifetime dumps instead of incremental dumps, if i had a queue of the last day's likes or something it would be a lot easier
2 replies
0 recast
2 reactions
christopher
@christopher
That's how Waypoint and Redis Streams work. We build the last day's worth of messages, then you create a specific consumer group (likes-consumer-group) and provision N consumers to burn it down pub-sub style.
2 replies
0 recast
1 reaction
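[Editor's note: an in-process sketch of the consumer-group burn-down pattern described above. With real Redis Streams this is `XGROUP CREATE` plus N workers calling `XREADGROUP`/`XACK`; a `queue.Queue` stands in for the stream here, so this shows the pattern, not Waypoint's implementation.]

```python
import queue
import threading

def burn_down(stream_entries, n_consumers):
    """One shared stream of the day's messages, N consumers pulling
    from it until it's drained; each message is handled exactly once."""
    stream = queue.Queue()
    for entry in stream_entries:
        stream.put(entry)

    processed = [[] for _ in range(n_consumers)]

    def consumer(i):
        while True:
            try:
                msg = stream.get_nowait()  # stands in for XREADGROUP
            except queue.Empty:
                return
            processed[i].append(msg)       # handle, then "XACK"
            stream.task_done()

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed

# e.g. burn down a day's worth of likes with 4 consumers
day_of_likes = [{"type": "like", "id": i} for i in range(100)]
shards = burn_down(day_of_likes, n_consumers=4)
print(sum(len(s) for s in shards))  # 100
```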
Kasra Rahjerdi
@jc4p
yeahhh that's very smart. i just... don't need persistent synchronized data.. i just want a monthly dump, but maybe that's the best move
3 replies
0 recast
1 reaction
christopher
@christopher
We have a Postgres dump every 30 minutes, and that's been good enough for data modeling and pipelines.
2 replies
0 recast
1 reaction
draftcode
@draftcode
i guess the key for that use case is more in defining and computing the deltas; after that it's just a matter of dropping them into a memcache.
0 reply
0 recast
1 reaction
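[Editor's note: the delta computation mentioned above could look something like this hedged sketch: given two full dumps, keep only what's new since the last one. Keying on an `"id"` field is an assumption; the real dumps may use a different identity field.]

```python
def compute_delta(previous_snapshot, current_snapshot):
    """Return the messages in current_snapshot that are not in
    previous_snapshot -- an incremental dump derived from two
    lifetime dumps."""
    seen = {m["id"] for m in previous_snapshot}
    return [m for m in current_snapshot if m["id"] not in seen]

march = [{"id": 1}, {"id": 2}]
april = [{"id": 1}, {"id": 2}, {"id": 3}]
print(compute_delta(march, april))  # [{'id': 3}]
```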