shoni.eth
@alexpaden
New release: https://huggingface.co/datasets/shoni/farcaster
Total records: ~18,147,313 threads
Data cutoff: 2025-06-25 (no threads newer than this date)
Notes: includes f32 and binary embeddings, new thread formatting, recursive quote hydration, and more efficient token inclusion.
shoni.eth
@alexpaden
I overlooked the tokens column in the parquet generation, and there's also a formatting-related bug, so I'll republish this later next week. The bug shouldn't really affect embedding use much, only inference.