The company I work for requires us to protect the area where articles are rendered. I've implemented some measures against automated web scraping, but the problem remains for manual scraping.
The anti-scraping bot protection seems to be working well so far, but I can see clients attempting manual scraping.
What I have tried to protect the article contents:
- Set a `copy` event handler on the article's wrapper element to block copying (see the sketch after this list). -> Clients can use userscripts (Greasemonkey, etc.) to bypass this easily, either by removing the event handler or by writing scripts that extract the contents and save them to a file
- Developer console protection -> useless
- Redirect if F12 pressed -> useless
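For reference, the copy handler is roughly like this (simplified sketch; `article-wrapper` is a placeholder id):

```javascript
// Simplified sketch of the copy-blocking handler (element id is a placeholder).
const wrapper = document.getElementById('article-wrapper');

wrapper.addEventListener('copy', (event) => {
  // Cancel the copy and clear whatever would have landed on the clipboard.
  event.preventDefault();
  event.clipboardData.setData('text/plain', '');
});

// Also block the context menu and text selection on the wrapper.
wrapper.addEventListener('contextmenu', (event) => event.preventDefault());
wrapper.style.userSelect = 'none';
```

A userscript can undo all of this trivially, e.g. by reading `document.querySelector('#article-wrapper').innerText` directly, which is exactly the kind of bypass I'm seeing.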
It seems like protecting HTML is not doable (unless someone tells me otherwise), so I'd like to know other ways to display the text that make it completely impossible to copy.
Things I've considered:
- JS detection mechanisms to check whether the user has any userscript running, i.e. whether malicious JS is being injected and executed to extract the text (rough sketch after this list)
- Transforming the article's HTML into a PDF and displaying it inline with some kind of text-selection/copy protection (if such a thing even exists).
- Transforming the article's HTML into chunks of base64-encoded images, so the text itself can no longer be selected or copied (see the second sketch below)
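For the userscript-detection idea, the best I've come up with is a crude heuristic (just an assumption on my part, not a proven technique): watch the DOM for script elements injected after load. Userscript managers typically run in an isolated world, so this will likely catch very little:

```javascript
// Heuristic sketch: flag <script> elements added after load that are inline
// or not served from our own origin. Easily defeated, shown only for discussion.
const observer = new MutationObserver((mutations) => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      if (node.tagName === 'SCRIPT' && !node.src.startsWith(location.origin)) {
        console.warn('Possibly injected script:', node.src || '(inline)');
        // e.g. report to the server or degrade the page here
      }
    }
  }
});
observer.observe(document.documentElement, { childList: true, subtree: true });
```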
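For the image idea, a rough client-side sketch of the rendering part could look like this (selector, font, and sizes are placeholders). I'm aware that doing this on the client is pointless for protection, since the text still ships as text, so it would really have to happen server-side; this only shows the rendering:

```javascript
// Sketch: rasterize a paragraph of text into a base64 PNG and swap it in,
// so no selectable text node is left in the DOM.
function paragraphToImage(text, widthPx = 600) {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  const font = '16px sans-serif';
  ctx.font = font;

  // Naive word wrap against the target width.
  const lines = [];
  let line = '';
  for (const word of text.split(' ')) {
    const test = line ? line + ' ' + word : word;
    if (ctx.measureText(test).width > widthPx && line) {
      lines.push(line);
      line = word;
    } else {
      line = test;
    }
  }
  if (line) lines.push(line);

  // Resizing the canvas resets context state, so set the font again.
  const lineHeight = 22;
  canvas.width = widthPx;
  canvas.height = lines.length * lineHeight + 10;
  ctx.font = font;
  ctx.fillStyle = '#222';
  lines.forEach((l, i) => ctx.fillText(l, 0, (i + 1) * lineHeight));

  const img = document.createElement('img');
  img.src = canvas.toDataURL('image/png'); // base64 data URL
  return img;
}

// Usage (selector is a placeholder for the real article markup):
document.querySelectorAll('#article-wrapper p').forEach((p) => {
  p.replaceWith(paragraphToImage(p.textContent));
});
```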
Are there any good ways to prevent my content from being stolen without interfering much with the user experience? Unfortunately Flash applets are no longer supported; that approach worked like a charm back in the day.
EDIT: Come on folks, I just need ideas that at least make the end user's efforts a bit harder, e.g. you can't select text when it's displayed as images, you can only select the images themselves.
Thanks!
As soon as you ship HTML out of your machine, whoever gets it can mangle it at leisure. You can make it harder, but not impossible.
Rethink your approach. "Give information out" and "forbid its use" somewhat clash...