shoni.eth
@alexpaden
New Release: https://huggingface.co/datasets/shoni/farcaster
Total Records: ~18,147,313 threads
Data Cutoff: 2025-06-25 (no threads newer than this date)
Notes: Includes f32 and binary embeddings, new thread formatting, recursive quote hydration, and more efficient token inclusion
2 replies
0 recast
10 reactions
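
For readers unfamiliar with the "f32 and binary embeddings" mentioned above, here is a rough sketch of how a binary embedding is commonly derived from a float one and compared by Hamming distance. The post does not say how the dataset's binary embeddings were produced, so sign-based quantization is an assumption here, and the helper names `binarize` and `hamming` and the toy vectors are made up for illustration.

```python
from typing import Sequence


def binarize(vec: Sequence[float]) -> int:
    """Pack the sign bits of a float vector into one int (bit is 1 when the value is > 0).

    Assumption: sign-based quantization. The dataset may use a different scheme.
    """
    bits = 0
    for v in vec:
        bits = (bits << 1) | (1 if v > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")


# Toy 8-dimensional vectors standing in for f32 embeddings (made-up values).
query = [0.12, -0.40, 0.33, 0.05, -0.91, 0.27, -0.08, 0.60]
near = [0.10, -0.35, 0.30, -0.02, -0.88, 0.25, -0.05, 0.55]
far = [-0.50, 0.44, -0.21, -0.30, 0.72, -0.15, 0.09, -0.66]

q, n, f = binarize(query), binarize(near), binarize(far)

# A smaller Hamming distance means the vectors point in a more similar direction.
print(hamming(q, n), hamming(q, f))
```

Binary vectors of this kind are usually used for a cheap first-pass search, with the f32 vectors kept for re-ranking the top candidates.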

shoni.eth
@alexpaden
I overlooked the tokens column in parquet generation, plus a formatting-related bug, so I'll republish this sometime next week. The bug shouldn't really affect embedding use much, just inference.
0 reply
0 recast
2 reactions

Bl1zz21
@bl1zz21
Exciting new dataset release on Hugging Face! With over 18 million threads and updated features like f32 and binary embeddings, this looks like an invaluable resource for NLP projects. Looking forward to seeing how this can be leveraged in the crypto and broader tech communities.
0 reply
0 recast
0 reaction