Launching Trek, an open source web content extraction library built in Rust!

A core part of our work is understanding the content behind any link on the Internet. That also means extracting metadata quickly so users can get context, e.g. in a feed.

We're building on @kepano's work on Defuddle, and then some, to do this.

Trek also compiles to WASM, so anyone can extract clean, decluttered content data in a TS/JS project.

It leverages lol_html from Cloudflare to stream HTML in for content extraction, instead of building the entire page and "trekking" the DOM as a normal scraper would. This means it's really fast and, more importantly, memory efficient.
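To make the streaming idea concrete, here's a toy, std-only Rust sketch. It is not lol_html's API and not Trek's code; `TitleStream` and its methods are made-up names for illustration. It pulls the `<title>` text out of HTML that arrives in chunks, keeping only a small tag buffer and a few flags instead of a parsed tree. It ignores comments, scripts, and other edge cases a real parser handles.

```rust
/// Toy streaming extractor (hypothetical, for illustration only).
/// State is a small tag buffer plus flags; no DOM is ever built.
#[derive(Default)]
struct TitleStream {
    in_tag: bool,   // currently between '<' and '>'
    tag: String,    // raw text of the tag being read
    in_title: bool, // currently inside <title>...</title>
    title: String,  // title text collected so far
    done: bool,     // closing </title> has been seen
}

impl TitleStream {
    /// Feed the next chunk of HTML. Tags may be split across chunks.
    fn write(&mut self, chunk: &str) {
        for ch in chunk.chars() {
            if self.done {
                return; // nothing left to extract, skip the rest of the page
            }
            match (self.in_tag, ch) {
                // A tag starts: reset the tag buffer.
                (false, '<') => {
                    self.in_tag = true;
                    self.tag.clear();
                }
                // A tag ends: check whether it opens or closes <title>.
                (true, '>') => {
                    self.in_tag = false;
                    let name = self.tag.trim().to_ascii_lowercase();
                    if name == "title" || name.starts_with("title ") {
                        self.in_title = true;
                    } else if name == "/title" {
                        self.in_title = false;
                        self.done = true;
                    }
                }
                // Inside a tag: accumulate its text.
                (true, c) => self.tag.push(c),
                // Outside a tag: keep text only while inside <title>.
                (false, c) => {
                    if self.in_title {
                        self.title.push(c);
                    }
                }
            }
        }
    }

    /// The title collected so far, with surrounding whitespace trimmed.
    fn title(&self) -> &str {
        self.title.trim()
    }
}
```

Feeding it `"<html><head><ti"`, then `"tle>Hello "`, then `"Trek</title></head><body>..."` leaves `title()` returning `Hello Trek`, and everything after `</title>` is skipped without being stored.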

Check out the playground here: https://officialunofficial.github.io/trek/

Docs: https://github.com/officialunofficial/trek

cool! is there anything you learned building this that could be incorporated into defuddle?

I think having a config TOML with obvious sensible defaults would help a lot. Noticed that Defuddle would show certain elements like navigation or menus when the thresholds weren't met during declutter.

Other than that not much!

https://github.com/kepano/defuddle/blob/9677af23a3c8e7f14349c6c557e30f7179d667ca/src/scoring.ts#L328
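
A rough sketch of what "config with sensible defaults" could look like on the Rust side. Everything here is hypothetical: the field names, default values, and the `keep_block` helper are made up for illustration and are not taken from Trek or Defuddle. A TOML file would typically deserialize into a struct like this, with `Default` filling in anything the user leaves out.

```rust
/// Hypothetical declutter settings; names and values are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct DeclutterConfig {
    /// Minimum score a block needs to be kept as content.
    min_content_score: f32,
    /// Maximum share of a block's text that may sit inside links.
    max_link_density: f32,
    /// Always drop nav/menu elements, regardless of their score.
    strip_navigation: bool,
}

impl Default for DeclutterConfig {
    fn default() -> Self {
        Self {
            min_content_score: 1.0,
            max_link_density: 0.5,
            strip_navigation: true,
        }
    }
}

/// Decide whether a block survives decluttering under the given config.
fn keep_block(cfg: &DeclutterConfig, tag: &str, score: f32, link_density: f32) -> bool {
    // With the default config, navigation never leaks through,
    // even when the score thresholds alone would have kept it.
    if cfg.strip_navigation && matches!(tag, "nav" | "menu") {
        return false;
    }
    score >= cfg.min_content_score && link_density <= cfg.max_link_density
}
```

Overriding a single setting stays short with struct update syntax, e.g. `DeclutterConfig { strip_navigation: false, ..Default::default() }`.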