So we have the XSS cheat sheet to test our XSS filtering - but other than an example benign page I can't find any evil or malformed test data to make sure that my UTF-8 code can handle misbehaving data.
Where can I find some good uh.. bad data to test with? Or what is a tricky sequence of chars?
Off the top of my head:
0xff and 0xfe - Neither byte ever appears in valid UTF-8
Single high-bit bytes - A lone lead byte or a lone continuation byte
Overlong encodings, i.e. multi-byte representations of single-byte (ASCII) characters - A good way of smuggling nulls past early checks
Byte-order marks - Are you going to ignore them?
NFC vs. NFD
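
To make the list above concrete, here is a rough Python sketch of what such test data could look like. The dictionary name, the keys and the helper `is_valid_utf8` are made up for illustration, and the byte strings are just one example per category, not an exhaustive corpus. It leans on Python's built-in strict UTF-8 decoder as the reference; your own decoder may be more lenient, which is exactly what you want to find out.

```python
import unicodedata

# Hypothetical test vectors, one or two per category from the list above.
MALFORMED = {
    "byte_ff": b"\xff",                    # never valid in UTF-8
    "byte_fe": b"\xfe",                    # never valid in UTF-8
    "lone_continuation": b"\x80",          # continuation byte with no lead byte
    "lone_lead_byte": b"\xc3",             # lead byte with nothing after it
    "truncated_sequence": b"\xe2\x82",     # 3-byte sequence cut short
    "overlong_nul": b"\xc0\x80",           # 2-byte encoding of U+0000
    "overlong_slash": b"\xc0\xaf",         # 2-byte encoding of "/"
    "overlong_3byte_nul": b"\xe0\x80\x80", # 3-byte encoding of U+0000
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Every malformed vector should be rejected by a strict decoder.
rejected = {name: not is_valid_utf8(raw) for name, raw in MALFORMED.items()}

# Byte-order mark: valid UTF-8, but you have to decide whether to keep it.
with_bom = b"\xef\xbb\xbfabc"
kept_bom = with_bom.decode("utf-8")          # BOM survives as U+FEFF
stripped_bom = with_bom.decode("utf-8-sig")  # BOM is dropped

# NFC vs. NFD: same visible text, different code points and bytes.
nfc = "\u00e9"    # e-acute as one precomposed code point
nfd = "e\u0301"   # "e" followed by a combining acute accent
same_after_normalizing = unicodedata.normalize("NFC", nfd) == nfc
```

Feed the same byte strings to whatever decoder or filter you are testing and compare. Anything that accepts the overlong forms, or that treats `nfc` and `nfd` as different strings where you expected a match, deserves a closer look.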