christopher
@christopher
Launching Trek, an open source web content extraction library built in Rust! A core part of our work is understanding the content behind any link on the Internet. That also means extracting metadata quickly so users can get context, e.g. in a feed. To do this, we're building on @kepano's work on Defuddle, and then some. Trek also compiles to WASM, so anyone can extract content in a clean, decluttered way in a TS/JS project. It leverages lol_html from Cloudflare to stream HTML in for content extraction, instead of building the entire page and "trekking" the DOM as a normal scraper would. This means it's really fast and, more importantly, memory efficient. Check out the playground here: https://officialunofficial.github.io/trek/ Docs: https://github.com/officialunofficial/trek
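For a rough feel of why streaming helps, here is a minimal sketch in plain Rust. It is not Trek's code and it does not use lol_html's API; `TitleScanner` and `title_from_chunks` are hypothetical names made up for illustration. It pulls the `<title>` text out of HTML that arrives in chunks, keeping only a few small fields of state instead of the whole page or a DOM. It assumes chunks arrive as valid `&str` slices and ignores comments, scripts, and `>` inside attribute values, which a real parser has to handle.

```rust
/// Hypothetical streaming scanner (illustration only, not Trek's or
/// lol_html's API): extracts the <title> text from HTML fed in chunks,
/// without buffering the page or building a DOM.
#[derive(Default)]
struct TitleScanner {
    in_tag: bool,   // currently between '<' and '>'
    in_title: bool, // currently between <title> and </title>
    done: bool,     // </title> has been seen
    tag: String,    // start of the tag being read, capped at 16 bytes
    title: String,  // title text collected so far
}

impl TitleScanner {
    /// Feed the next chunk. Returns true once the title is complete,
    /// so the caller can stop reading the rest of the page.
    fn feed(&mut self, chunk: &str) -> bool {
        for c in chunk.chars() {
            if self.done {
                break;
            }
            if self.in_tag {
                if c == '>' {
                    self.close_tag();
                } else if self.tag.len() < 16 {
                    // Only the start of the tag is needed to read its name.
                    self.tag.push(c.to_ascii_lowercase());
                }
            } else if c == '<' {
                self.in_tag = true;
            } else if self.in_title {
                self.title.push(c);
            }
        }
        self.done
    }

    /// Called when '>' closes a tag: update state based on the tag name.
    fn close_tag(&mut self) {
        let name = self.tag.split_whitespace().next().unwrap_or("");
        match name {
            "title" => self.in_title = true,
            "/title" => {
                self.in_title = false;
                self.done = true;
            }
            _ => {}
        }
        self.tag.clear();
        self.in_tag = false;
    }
}

/// Run the scanner over a sequence of chunks, stopping early when possible.
fn title_from_chunks(chunks: &[&str]) -> String {
    let mut scanner = TitleScanner::default();
    for chunk in chunks {
        if scanner.feed(chunk) {
            break;
        }
    }
    scanner.title
}
```

Calling `title_from_chunks` with a page split mid-tag, e.g. `["<html><head><ti", "tle>Trek pl", "ayground</title></head>"]`, yields the title even though no single chunk contains the whole tag, and memory use stays bounded by the title length rather than the page size.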
Darryl Yeo 🛠️
@darrylyeo
👀 👀 👀