I need to generate an href for a URI. This is easy except for reserved characters that need percent-encoding, e.g. a link to /some/path;element should appear as <a href="/some/path%3Belement"> (I know that path;element represents a single entity).
Initially I was looking for a Java library that does this but I ended up writing something myself (look below for what failed with Java, as this question isn't Java-specific).
RFC 3986 does say when NOT to encode: as I read it, a character should be left alone when it falls into the unreserved class (ALPHA / DIGIT / "-" / "." / "_" / "~"). So far so good. But what about the opposite case? As far as I can see, the RFC only says that a percent sign (%) always needs encoding. What about the other characters?
Question: is it correct to assume that everything that is not unreserved can, or should, be percent-encoded? For example, an opening parenthesis ( does not necessarily need encoding, but a semicolon ; does. If I don't encode it, I end up looking for /first [*] when following <a href="/first;second">, whereas following <a href="/first(second"> always leads to /first(second, as expected. What confuses me is that both ( and ; are in the same sub-delims class as far as the RFC goes. I imagine that encoding everything non-unreserved is a safe bet, but what about SEO and user friendliness when it comes to localized URIs?
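To make the "encode everything that is not unreserved" option concrete, here is a minimal sketch in Java. The class name StrictSegmentEncoder and its method are hypothetical, not part of any library, and the sketch assumes UTF-8 and that it is applied to one path segment at a time (it would also encode a "/").

```java
import java.nio.charset.StandardCharsets;

public final class StrictSegmentEncoder {
    // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private StrictSegmentEncoder() {}

    /** Percent-encodes every UTF-8 byte of a single segment that is not unreserved. */
    public static String encode(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80 && UNRESERVED.indexOf(v) >= 0) {
                out.append((char) v);   // unreserved: copy as-is
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

With this sketch, StrictSegmentEncoder.encode("path;element") yields path%3Belement, but it also turns ( into %28 and non-ASCII letters into multi-byte escapes, which is the readability cost the question is about.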
Now, what failed with Java libs. I have tried doing it like
new java.net.URI("http", "site", "/pa;th", null).toASCIIString()
but this gives http://site/pa;th which is no good. Similar results observed with:
- javax.ws.rs.core.UriBuilder
- Spring's UriUtils - I have tried both encodePath(String, String) and encodePathSegment(String, String)
[*] /first is the result of a call to HttpServletRequest.getServletPath() on the server side after clicking on <a href="/first;second">
EDIT: I probably need to mention that this behaviour was observed under Tomcat, and I have checked both Tomcat 6 and 7 behave the same way.
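The ABNF for an absolute path part (RFC 3986, section 3.3):
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment       = *pchar
segment-nz    = 1*pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pchar includes sub-delims, so you would not have to encode any of these characters in the path part:
:@-._~!$&'()*+,;=
I wrote my own URL builder, which includes an encoder for the path. As always, caveat emptor.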
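As a rough illustration of the pchar rule (this is not the URL builder mentioned above), a path-segment encoder might look like the sketch below. The class PcharSegmentEncoder and the extraEscapes parameter are hypothetical names introduced here. The sketch assumes UTF-8 and a single segment as input, and it always escapes a literal %, so it must not be fed already-encoded text.

```java
import java.nio.charset.StandardCharsets;

public final class PcharSegmentEncoder {
    // Characters that RFC 3986 allows literally in a path segment (pchar
    // without pct-encoded): unreserved, sub-delims, ":" and "@".
    private static final String PCHAR_LITERALS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~"          // remaining unreserved characters
        + "!$&'()*+,;="   // sub-delims
        + ":@";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private PcharSegmentEncoder() {}

    /** Encodes only what the pchar rule does not allow literally. */
    public static String encode(String segment) {
        return encode(segment, "");
    }

    /**
     * Same as encode(segment), but also escapes every character listed in
     * extraEscapes, e.g. ";" for servers that treat it as a path-parameter
     * delimiter.
     */
    public static String encode(String segment, String extraEscapes) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean literal = v < 0x80
                && PCHAR_LITERALS.indexOf(v) >= 0
                && extraEscapes.indexOf(v) < 0;
            if (literal) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

PcharSegmentEncoder.encode("pa;th") leaves the segment untouched, which matches the grammar, while PcharSegmentEncoder.encode("pa;th", ";") produces pa%3Bth for containers such as the Tomcat setup described in the question.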