@aviationdoctor.eth
Last night, I fired up ChatGPT and quickly vibe-coded a Python script that recursively consolidates all ~500 .md files in my Obsidian vault into a single file and strips out all non-semantic elements (such as callout formatting, internal links, transclusions, and attachments) using just enough regex voodoo.
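For readers who want to try the consolidation step, here is a minimal sketch of what such a script might look like. It is not the script described above: the function name `consolidate`, the per-note path heading, and the UTF-8 assumption are all illustrative choices, and it only merges files without stripping any Obsidian syntax.

```python
from pathlib import Path


def consolidate(vault_dir: str, output_file: str) -> int:
    """Concatenate every .md file under vault_dir into output_file.

    Hypothetical helper for illustration; returns the number of notes merged.
    """
    vault = Path(vault_dir)
    out_path = Path(output_file).resolve()
    parts = []
    for md in sorted(vault.rglob("*.md")):  # rglob walks subfolders recursively
        if md.resolve() == out_path:
            continue  # skip a previous output file if it lives inside the vault
        text = md.read_text(encoding="utf-8", errors="replace")
        # Assumption: the note's relative path becomes a top-level heading,
        # so each note stays identifiable inside the merged file.
        parts.append(f"# {md.relative_to(vault).as_posix()}\n\n{text.strip()}\n")
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

Calling `consolidate("path/to/vault", "vault.md")` would write one merged file and return how many notes went into it.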
I then created a dedicated GPT and uploaded the resulting ~50MB file as its knowledge base. I also primed the model to look for answers only within the file and say candidly if it couldn’t find any (rather than hallucinating one).
I can now trivially refresh the knowledge base by running the script again if I make enough changes to my notes in the future. File generation takes milliseconds.
In under an hour of tinkering, start to finish, I was able to engage with my vault like never before. No API needed, no extra cost beyond the OpenAI subscription, and none of the privacy exposure that existing Obsidian LLM plugins incur.
The trick is to strip the file down to its semantically meaningful structure, such as # headers and styling, so that the LLM’s tokenization and chunking are as effective as possible.
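The kind of cleanup this implies can be sketched with a handful of regular expressions. The patterns below are assumptions about common Obsidian syntax, not the ones from the actual script: they drop embeds and images, reduce internal links to their display text, and remove callout markers, while leaving headers and bold/italic styling untouched. Edge cases (for example links inside code blocks) are not handled.

```python
import re

# Assumed patterns for common Obsidian syntax; illustrative, not exhaustive.
EMBED = re.compile(r"!\[\[[^\]]*\]\]")           # ![[note]] transclusions, ![[file.png]] attachments
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")   # ![alt](path) standard Markdown images
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")  # [[target]] or [[target|alias]]
CALLOUT = re.compile(r"^(>[ \t]*)\[![\w-]+\][+-]?[ \t]*", re.MULTILINE)  # > [!note]- Title


def clean(text: str) -> str:
    """Strip non-semantic Obsidian markup, keeping headers and styling."""
    text = EMBED.sub("", text)       # embeds first, so they are not treated as links
    text = MD_IMAGE.sub("", text)
    # Keep the alias if present, otherwise the link target itself.
    text = WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)
    # Drop the [!type] marker but keep the blockquote prefix and the title.
    text = CALLOUT.sub(r"\1", text)
    return text
```

Embeds are removed before links are rewritten because `![[...]]` contains a valid `[[...]]` link, and the order keeps the leading `!` from being left behind.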
Next step might be to build a plugin to generate that consolidated output directly from Obsidian, with optional filters for certain tags, properties, or folders.
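An actual Obsidian plugin would be written against Obsidian's own plugin API, but the filtering logic itself can be prototyped in the same Python script first. The sketch below is a stand-in under stated assumptions: `select_notes`, `include_folders`, and `exclude_tags` are hypothetical names, only inline `#tags` in the note body are detected, and properties (YAML frontmatter) are not parsed at all.

```python
import re
from pathlib import Path

# Inline #tags only. Headings such as "# Title" do not match because the
# character right after "#" must be a letter (a simplification of Obsidian's rules).
TAG = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def select_notes(vault_dir, include_folders=None, exclude_tags=None):
    """Return vault-relative paths of the notes that pass the filters."""
    vault = Path(vault_dir)
    include = set(include_folders or [])
    excluded = set(exclude_tags or [])
    selected = []
    for md in sorted(vault.rglob("*.md")):
        rel = md.relative_to(vault)
        # Folder filter: when folders are given, keep only notes whose
        # top-level folder is listed (notes at the vault root are skipped).
        if include and (len(rel.parts) < 2 or rel.parts[0] not in include):
            continue
        tags = set(TAG.findall(md.read_text(encoding="utf-8", errors="replace")))
        if tags & excluded:
            continue  # the note carries at least one excluded tag
        selected.append(rel.as_posix())
    return selected
```

For example, `select_notes("path/to/vault", include_folders=["work"], exclude_tags=["private"])` would keep only notes under `work/` that do not contain `#private`.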
I love how I was able to use the LLM to refine the idea first, then generate the code, ingest the resulting file, and converse with my notes. Proper Swiss Army multitool.