
Founder @ Rampage Esports/Startups/Research/STEM; I suck at math, M is for Memes. Tech optimist, indoor plant serial killer, digital n̶o̶mad 🖤 ☕&🎬 ibaikov.com
Did you know there are invisible zero pixels wide symbols? There are some in this line. 'Did' has 5. Some unicode characters can’t be seen in most text editors and websites. They can be used for fingerprinting, and affect lots of stuff including LLMs!
There are 13 Unicode characters like this; the best known are the zero-width space, the zero-width non-joiner and the word joiner. Some people use them to watermark text: if somebody posts a copy, the hidden characters come along with it, so it's obvious where the text was taken from. You can also make multiple versions of a text and encode the IDs or usernames of the people you are sending it to, so a leaked copy points back to its recipient.
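Here is a minimal sketch of the per-recipient watermark idea in Python. The scheme is my own assumption for illustration (zero-width space = bit 0, zero-width non-joiner = bit 1, payload hidden after the first letter), and `embed_id` / `extract_id` are hypothetical helper names, not some standard tool.

```python
from typing import Optional

ZERO = "\u200b"  # ZERO WIDTH SPACE, used here as bit 0 (assumed scheme)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here as bit 1 (assumed scheme)


def embed_id(text: str, user_id: int, bits: int = 16) -> str:
    """Hide user_id as invisible characters right after the first character."""
    if not 0 <= user_id < 2 ** bits:
        raise ValueError("user_id does not fit in the requested number of bits")
    # Most significant bit first.
    payload = "".join(
        ONE if (user_id >> i) & 1 else ZERO for i in reversed(range(bits))
    )
    return text[:1] + payload + text[1:]


def extract_id(text: str) -> Optional[int]:
    """Recover the hidden ID, or None if the text carries no marker characters."""
    marks = [c for c in text if c in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | int(c == ONE)
    return value


marked = embed_id("Did you know", 1337)
print(len("Did you know"), len(marked))  # 12 28
print(extract_id(marked))                # 1337
```

The marked string looks identical on screen, but it is 16 characters longer and gives the ID back when it gets pasted somewhere else. A real watermark would probably want to spread the bits around and repeat them, since this one dies as soon as somebody retypes the first word.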
These symbols affect parsers and URL encoding, and they might affect SEO. A link might be corrupted if one of these characters ends up inside it. Surprisingly, there are symbols that are one character long, map to two tokens and are invisible.
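To see how a link gets corrupted, here is a small Python sketch using only the standard library. The URL is a made-up example; the hidden character is written as the escape `\u200b` so you can actually see it in the code.

```python
from urllib.parse import quote

clean = "https://example.com/page"
tainted = "https://example.com/pa\u200bge"  # ZERO WIDTH SPACE hidden inside "page"

# Both strings render the same, but they are not equal.
print(clean == tainted)  # False

# Percent-encoding exposes the three UTF-8 bytes of the hidden character.
print(quote(tainted, safe=":/"))  # https://example.com/pa%E2%80%8Bge
```

A server that gets `/pa%E2%80%8Bge` is being asked for a different path than `/page`, which is why a link copied with one of these characters in it can end up broken.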
This leads to a hidden token overspending attack: invisible characters placed in a text will make it consume more tokens than anticipated. An invisible string, seemingly 0 characters long, can max out the input token limit.
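One possible defense is to check the input before it reaches the model. The sketch below is an assumption on my side, not a complete filter: it drops every character in the Unicode "format" category (`Cf`), which covers the zero-width space, non-joiner, joiner and word joiner. `strip_invisible` and `hidden_count` are hypothetical helper names. Exact token counts depend on the tokenizer, so the example only shows code point and byte counts.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), including the zero-width ones."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")


def hidden_count(text: str) -> int:
    """Number of characters that strip_invisible would remove."""
    return len(text) - len(strip_invisible(text))


payload = "\u200b" * 10_000  # renders as an empty string

print(len(strip_invisible(payload)))  # 0
print(hidden_count(payload))          # 10000
print(len(payload.encode("utf-8")))   # 30000
```

Keep in mind that blind stripping is not always safe: the zero-width joiner holds some emoji sequences together, and the non-joiner is used in real writing in some languages. Rejecting or flagging input with a suspiciously high `hidden_count` might be the gentler option.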