Varun Srinivasan
@v
We're making the Warpcast spam dataset public. Over 400,000 accounts have been processed by our model, which determines the accounts that are most likely to generate inauthentic content or unwanted notifications. https://github.com/warpcast/labels
32 replies
61 recasts
212 reactions
Varun Srinivasan
@v
Developers can use this to protect their apps from spammy users. Spam labels are provided as a JSONL file which follows the FIP: Labels specification (still in review). Data will be updated weekly with the latest labels.
2 replies
0 recast
12 reactions
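For developers who want to try this, here is a minimal Python sketch of reading a labels JSONL file one line at a time and collecting flagged accounts. The field names (`type.fid`, `label_type`, `label_value`) and the meaning of `flagged_value` are assumptions for illustration, not taken from the thread. The FIP: Labels specification is still in review, so check the repository for the actual schema before relying on this.

```python
import json
from typing import Iterable, Iterator, Set


def iter_labels(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (JSONL)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        yield json.loads(line)


def flagged_fids(lines: Iterable[str],
                 label_type: str = "spam",
                 flagged_value: int = 0) -> Set[int]:
    """Collect account ids whose label matches the flagged value.

    ASSUMPTION: each record looks like
    {"type": {"target": "fid", "fid": <int>},
     "label_type": <str>, "label_value": <int>}
    and `flagged_value` marks a likely-spam account. Both are hypothetical.
    """
    flagged = set()
    for record in iter_labels(lines):
        if record.get("label_type") != label_type:
            continue
        if record.get("label_value") != flagged_value:
            continue
        fid = record.get("type", {}).get("fid")
        if isinstance(fid, int):
            flagged.add(fid)
    return flagged


# Hypothetical sample records, made up for this example.
sample = [
    '{"type": {"target": "fid", "fid": 1}, "label_type": "spam", "label_value": 0}',
    '',
    '{"type": {"target": "fid", "fid": 2}, "label_type": "spam", "label_value": 2}',
]
spam_accounts = flagged_fids(sample)
```

To run it against a downloaded file, pass an open file handle instead of the list, for example `with open(path) as f: flagged_fids(f)`, where `path` is wherever you saved the file. Iterating over the handle avoids loading the whole file into memory at once.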
Varun Srinivasan
@v
While we've taken a lot of care to correct mistakes, it's possible that a small number of legitimate accounts are misclassified. If you notice this, please reply to this thread or DM me. We will use these reports to improve the model.
7 replies
0 recast
12 reactions
C O M P Ξ Z 🧬
@compez.eth
I don't think parsing a file like this would be attractive in terms of resource consumption! I'll be converting Essen to an API soon.
1 reply
0 recast
0 reaction