Before asking my question, here is some background on what I am actually trying to do:

I need to refactor a large number of GSP files in my Grails project. I first tried writing my own Groovy script for that and realized it was far beyond my current skill level in any language. Then I found this article, which helped me a lot with parsing HTML content.

After a while, I put together my own script that parses an HTML file, serializes it again, and saves the result to a new file. This is my script:

```groovy
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2')
import groovy.xml.*

def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
tagsoupParser.setFeature(tagsoupParser.namespacesFeature, false)

def slurper = new XmlSlurper(tagsoupParser)
def xmlFile = 'list.gsp'
def htmlParser = slurper.parse(xmlFile)

/*

TODO: Manipulation code goes here

*/

def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind{ mkp.yield htmlParser })

result = result.replaceAll(/<\?.+\?>/, '')

def newFile = new File('neu.html')

newFile.text = result
```

Note that I do not want an XML prolog in my GSP files, so I remove it with a regex. (That is not my actual question, but if anybody knows a more "Groovy" way to do this, please let me know!)
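A slightly narrower variant of the regex above, sketched under the assumption that the declaration is always the very first thing in the serialized string: anchoring the pattern to the start removes only the `<?xml ...?>` declaration instead of every `<?...?>` processing instruction in the file. `stripXmlDeclaration` is a hypothetical helper name.

```groovy
// Hypothetical helper: removes a leading XML declaration such as
// <?xml version="1.0" encoding="UTF-8"?> together with the whitespace after it.
// Any other <?...?> processing instruction later in the text is left alone.
String stripXmlDeclaration(String xml) {
    xml.replaceFirst(/^\s*<\?xml[^?]*\?>\s*/, '')
}

def serialized = '<?xml version="1.0" encoding="UTF-8"?>\n<html><body/></html>'
println stripXmlDeclaration(serialized)
```

In the script, `result = stripXmlDeclaration(result)` would take the place of the `replaceAll` line.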

Also, I set namespacesFeature to false, since namespaces are useless for my purpose.

Because that worked like a charm with HTML files, I thought I was ready to loop over my folder recursively, find all GSP files named list.gsp, and refactor them automatically. But when I tested it with a single list.gsp, the serialization failed because of the unbound prefix g on the element g:set:

The prefix "g" for element "g:set" is not bound.
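The recursive search for every `list.gsp` mentioned above could be sketched with the GDK's `File.eachFileRecurse(FileType, Closure)`. The starting directory `grails-app/views` is an assumption about the project layout, and `findListGsps` is a hypothetical helper name.

```groovy
import groovy.io.FileType

// Hypothetical helper: returns every file named 'list.gsp' anywhere below rootDir.
List<File> findListGsps(File rootDir) {
    List<File> matches = []
    // FileType.FILES calls the closure for regular files only;
    // subdirectories are still descended into.
    rootDir.eachFileRecurse(FileType.FILES) { File f ->
        if (f.name == 'list.gsp') {
            matches << f
        }
    }
    matches
}

// Assumed location of the views in a standard Grails project layout.
def viewsDir = new File('grails-app/views')
if (viewsDir.isDirectory()) {
    findListGsps(viewsDir).each { File gsp ->
        // Parse, manipulate and serialize each file here, as in the script above.
        println "Would refactor: ${gsp.path}"
    }
}
```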

Now, I kind of understand that what I am trying to do is not the usual purpose of XML parsing and serializing. But in my case, I not only want to disable the namespaces feature, I also want the parser to ignore the prefixes of all GSP tags and treat them as regular opening and closing tags; in other words, to ignore the colon in any tag name.
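One possible workaround, sketched here as a text-level trick rather than a TagSoup or XmlSlurper feature, is to replace the colon in prefixed tag names with a marker before parsing and put it back after serializing. The marker `__colon__` and both helper names are hypothetical; the marker must not occur anywhere else in the GSPs, and attribute names containing a colon are not handled.

```groovy
// Hypothetical helpers for round-tripping prefixed GSP tags through an XML toolchain.
// The marker is lowercase in case the HTML parser changes the case of unknown tag names.

// <g:set ...>  ->  <g__colon__set ...>, and </g:if>  ->  </g__colon__if>
String maskTagPrefixes(String gsp, String marker = '__colon__') {
    gsp.replaceAll(/(<\/?)([A-Za-z][\w-]*):([A-Za-z])/) { String all, String open, String prefix, String first ->
        open + prefix + marker + first
    }
}

// Turns every marker back into ':' once the document has been serialized.
String unmaskTagPrefixes(String text, String marker = '__colon__') {
    text.replace(marker, ':')
}

def gsp = '<div><g:if test="${book}"><g:set var="title" value="${book.title}"/></g:if></div>'
def masked = maskTagPrefixes(gsp)
println masked
```

The masked text would be what gets passed to `slurper.parseText(...)`, and `unmaskTagPrefixes` would be applied to `result` before writing the file.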

The other thing I am concerned about is non-HTML GSP syntax such as the page directive <%@ page import="<class>" %>. Right now I'm only getting the exception mentioned earlier, but this will probably need to be resolved as well.
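The `<%@ page import="<class>" %>` directive (and any other `<% ... %>` block) could be handled the same way: swap each block for a placeholder before parsing and restore it afterwards. This is a sketch under two assumptions: that no GSP already contains the hypothetical token `@@gsp-block-N@@`, and that a plain-text token survives the parse/serialize round trip. A directive in front of `<html>` may still be moved by the parser's HTML clean-up, so the output needs checking.

```groovy
// Hypothetical helpers: hide <% ... %> blocks (page directives, scriptlets,
// <%-- --%> comments) from the HTML parser and restore them afterwards.

// Returns the masked text plus the original blocks, in order of appearance.
Map protectScriptlets(String gsp) {
    List<String> blocks = []
    // (?s) lets '.' match newlines, so multi-line blocks are captured too;
    // the reluctant '.*?' stops at the first closing '%>'.
    String masked = gsp.replaceAll(/(?s)<%.*?%>/) { String block ->
        blocks << block
        "@@gsp-block-${blocks.size() - 1}@@"
    }
    [text: masked, blocks: blocks]
}

// Replaces each @@gsp-block-N@@ token with the N-th saved block.
String restoreScriptlets(String text, List<String> blocks) {
    text.replaceAll(/@@gsp-block-(\d+)@@/) { String token, String index ->
        blocks[index.toInteger()]
    }
}

def gsp = '<%@ page import="com.example.Book" %>\n<html><body><% def n = 1 %></body></html>'
def protectedGsp = protectScriptlets(gsp)
println protectedGsp.text
```

Groovy's closure form of `replaceAll` inserts the closure's result literally, so `$` or `\` inside a restored block is not treated as a group reference.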

Any help is highly appreciated.
