Founder @ Rampage Esports/Startups/Research/STEM; I suck at math, M is for Memes. Tech optimist, indoor plant serial killer, digital n̶o̶mad 🖤 ☕&🎬
ibaikov.com
I wrote a thread on zero-width characters and LLMs. Hope you find it interesting.
Did you know there are invisible zero pixels wide symbols? There are some in this line. 'Did' has 5.
Some Unicode characters are invisible in most text editors and on most websites. They can be used for fingerprinting, and they affect a lot of things, including LLMs!
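A minimal sketch of how such characters could be spotted, assuming a small hand-picked set of code points. `ZERO_WIDTH` and `find_invisible` are illustrative names, and the set is not an exhaustive list of invisible characters.

```python
import unicodedata

# Assumed, non-exhaustive set of zero-width code points.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_invisible(text):
    """Return (index, Unicode name) for every zero-width character in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


# Looks like "Did you know?" on screen, but holds two extra characters.
sample = "D\u200bi\u200bd you know?"
print(len(sample))             # 15, not the 13 a reader would count
print(find_invisible(sample))  # two ZERO WIDTH SPACE hits, at indexes 1 and 3
```

Comparing `len(text)` with the number of characters you can actually see is often the quickest first check.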
There are 13 Unicode characters like this. The best known are the zero-width space (U+200B), the zero-width non-joiner (U+200C) and the word joiner (U+2060).
Some people use them to watermark text: if somebody posts a copy, the hidden characters make it obvious where it came from. You can also make multiple versions of a text and encode the ID or username of each recipient in their copy.
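Here is a rough sketch of the per-recipient idea, under the assumption that a numeric ID is written as bits, with a zero-width space for 0 and a zero-width non-joiner for 1. The function names, the 16-bit width and the insertion point are all made up for illustration; real watermarking tools may work quite differently.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0 (assumed convention)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1 (assumed convention)


def embed_id(text, recipient_id, bits=16):
    """Hide recipient_id as invisible bits right after the first character."""
    if not 0 <= recipient_id < 2 ** bits:
        raise ValueError("recipient_id does not fit in the given bit width")
    payload = "".join(
        ONE if (recipient_id >> i) & 1 else ZERO
        for i in reversed(range(bits))  # most significant bit first
    )
    return text[:1] + payload + text[1:]


def extract_id(text):
    """Read the hidden bits back, or return None if there are none."""
    bits = ["1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE)]
    return int("".join(bits), 2) if bits else None


marked = embed_id("Hello world", 42)
print(len("Hello world"), len(marked))  # 11 27
print(extract_id(marked))               # 42
```

The marked copy renders the same as the original in most editors, so a leaked copy can be traced back by running `extract_id` on it.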
These symbols affect parsers and URL encoding, and they might affect SEO. A link that contains one of them may break, because the URL no longer matches the one you see.
Surprisingly, some of these symbols are a single character long, map to two tokens, and are invisible.
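A small sketch of the URL side of this, using only the standard library. The `example.com` URL is a placeholder. The last line hints at why a single invisible character can cost more than one token: it takes three bytes in UTF-8, and many LLM tokenizers operate on bytes. The exact token count depends on the tokenizer, so treat that part as an assumption rather than a measurement.

```python
from urllib.parse import quote

ZWSP = "\u200b"  # ZERO WIDTH SPACE

clean = "example.com/page"
dirty = "example.com/pa" + ZWSP + "ge"  # renders the same as clean

# The two strings look identical on screen but are not equal.
print(clean == dirty)  # False

# Percent-encoding exposes the hidden character.
print(quote(dirty))    # example.com/pa%E2%80%8Bge

# One character, three UTF-8 bytes.
print(len(ZWSP), len(ZWSP.encode("utf-8")))  # 1 3
```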
This enables a hidden token-overspending attack: invisible characters placed in the input consume more tokens than anticipated. A string that looks empty on screen can max out the input token limit.
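One possible mitigation, sketched here as an assumption rather than a recommended fix, is to drop format characters before text reaches the model. `strip_format_chars` is a hypothetical helper that removes everything in Unicode category `Cf`, which covers the zero-width space, the joiners and the word joiner.

```python
import unicodedata


def strip_format_chars(text):
    """Remove every character whose Unicode general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Looks like an empty string when rendered, but is far from empty.
payload = "\u200b" * 10_000
print(len(payload))                      # 10000
print(len(payload.encode("utf-8")))      # 30000
print(len(strip_format_chars(payload)))  # 0
```

Note that joiners are legitimately needed in some scripts and in emoji sequences, so blanket stripping can change visible text. Capping input by byte length or token count before the request is sent is a gentler alternative.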