Igor (baikov)

Founder @ Rampage Esports/Startups/Research/STEM; I suck at math, M is for Memes. Tech optimist, indoor plant serial killer, digital n̶o̶mad 🖤 ☕&🎬 ibaikov.com

Recent casts

Wrote a thread on zero-width characters and LLMs; hope you'll find it interesting.

D​i​​​​d y​o​u know there are invisible, zero-pixel-wide symbols? There are some in this line.​ 'D​i​​​​d' alone has 5. Some Unicode characters can’t be seen in most text editors and websites. They can be used for fingerprinting and can affect lots of stuff, including LLMs!
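A quick way to check a line like this is to list its code points by name. This is a minimal Python sketch: the `INVISIBLE` set is an assumed, illustrative subset (not an exhaustive list of invisible characters), and `find_invisible` is a hypothetical helper.

```python
import unicodedata

# Illustrative subset of invisible code points (assumption, not exhaustive).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every invisible character in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in INVISIBLE
    ]


sample = "D\u200bi\u200b\u200b\u200b\u200bd"  # renders as "Did"
print(len(sample))                  # 8, although it looks 3 characters long
print(len(find_invisible(sample)))  # 5
```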


There are 13 Unicode characters like this; the best known are zero-width space (U+200B), zero-width non-joiner (U+200C) and word joiner (U+2060). Some people use them to watermark text: if somebody posts a copy, it's obvious that they stole it. You can also make multiple versions of a text and encode the IDs or usernames of the people you send it to.
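A minimal Python sketch of the per-recipient watermark idea, using the three characters named here: zero-width space for bit 0, zero-width non-joiner for bit 1, and word joiner as an end marker. This encoding, the 16-bit payload and the helper names `embed_id`/`extract_id` are hypothetical choices for illustration, not a standard scheme.

```python
from typing import Optional

# Hypothetical encoding: bit 0 -> ZERO WIDTH SPACE, bit 1 -> ZERO WIDTH
# NON-JOINER, and a WORD JOINER marks the end of the hidden payload.
ZERO, ONE, END = "\u200b", "\u200c", "\u2060"


def embed_id(text: str, user_id: int, bits: int = 16) -> str:
    """Hide user_id (most significant bit first) right after the first word."""
    if not 0 <= user_id < 2 ** bits:
        raise ValueError(f"user_id must fit in {bits} bits")
    payload = "".join(
        ONE if (user_id >> i) & 1 else ZERO for i in reversed(range(bits))
    )
    head, sep, tail = text.partition(" ")
    return head + payload + END + sep + tail


def extract_id(text: str) -> Optional[int]:
    """Recover the hidden ID, or None if no complete watermark is present."""
    bits = []
    for ch in text:
        if ch == ZERO:
            bits.append("0")
        elif ch == ONE:
            bits.append("1")
        elif ch == END:
            return int("".join(bits), 2) if bits else None
    return None


# Each recipient gets a visually identical copy carrying their own ID.
copy_for_alice = embed_id("Secret launch date is Friday", 1042)
print(copy_for_alice == "Secret launch date is Friday")  # False
print(extract_id(copy_for_alice))  # 1042
```

The copy renders exactly like the original, but anyone who strips zero-width characters removes the mark, so this catches careless copying rather than proving theft.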

These symbols can break parsers and URL encoding, and might affect SEO: a link can be corrupted if one of these characters ends up inside it. Surprisingly, some of these symbols are one character long and invisible, yet map to two tokens.
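A short Python illustration of the URL problem, using only the standard library: a pasted link containing a zero-width space looks identical to the clean one, but compares unequal and percent-encodes differently. The example URL and the `strip_zero_width` helper are hypothetical, and the character set is an illustrative subset.

```python
from urllib.parse import quote

# Illustrative subset of zero-width code points (assumption, not exhaustive).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def strip_zero_width(text: str) -> str:
    """Remove zero-width characters before parsing or comparing links."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


clean = "https://example.com/docs"
pasted = "https://example.com/do\u200bcs"  # renders exactly like `clean`

print(clean == pasted)                    # False
print(quote(pasted, safe=":/"))           # https://example.com/do%E2%80%8Bcs
print(strip_zero_width(pasted) == clean)  # True
```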

This leads to a hidden token overspending attack: inserted invisible characters consume more tokens than anticipated. An invisible string that seems to be 0 characters long can max out a model's input token limit.
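One possible guard against hidden token overspending is to count invisible characters before a prompt reaches the model. This Python sketch assumes a byte-level tokenizer, so it reports UTF-8 byte length only as a rough cost proxy (each character in the set is 3 bytes in UTF-8); real token counts depend on the model's tokenizer. `invisible_report`, `guard_prompt` and the `max_hidden` threshold are hypothetical names and values.

```python
# Illustrative subset of invisible code points (assumption, not exhaustive).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def invisible_report(prompt: str) -> dict:
    """Compare what a prompt looks like with what it actually contains."""
    hidden = sum(ch in INVISIBLE for ch in prompt)
    return {
        "visible_chars": len(prompt) - hidden,
        "hidden_chars": hidden,
        # Rough cost proxy (assumes a byte-level tokenizer): every character
        # in INVISIBLE takes 3 bytes in UTF-8.
        "utf8_bytes": len(prompt.encode("utf-8")),
    }


def guard_prompt(prompt: str, max_hidden: int = 0) -> str:
    """Reject prompts that hide more invisible characters than allowed."""
    report = invisible_report(prompt)
    if report["hidden_chars"] > max_hidden:
        raise ValueError(f"prompt hides {report['hidden_chars']} invisible characters")
    return prompt
```

For example, `"Summarize this." + "\u2060" * 10_000` looks like a 15-character prompt, yet `invisible_report` counts 10,000 hidden characters and 30,015 UTF-8 bytes, and `guard_prompt` rejects it.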
