How to parse XML and avoid namespace parsing errors by setting a default namespace?


Background

I'm trying to parse the manifest XML files of some Android APK files.

The problem

I've noticed that in some cases the XML files aren't valid: often no namespace is defined at all, or some XML attributes don't use a namespace.

For example, this is a tiny part of Chrome APK manifest file (it's much longer):

<manifest
    versionCode="495157437">

    <uses-sdk
        minSdkVersion="29"/>
    
</manifest>

You can see that there is no namespace being defined here, and the attributes don't use one either. In fact, this is what the IDE also generates out of the APK file, so I've reported this here.

In some cases, an XML attribute uses a namespace prefix that isn't defined.

I want to be able to parse such APKs by assuming a default namespace declaration (xmlns:android="http://schemas.android.com/apk/res/android") and a default use of it as the prefix of each attribute (meaning android:).

So the example above would be treated as if it were written like this:

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    android:versionCode="495157437">

    <uses-sdk
        android:minSdkVersion="29"/>
    
</manifest>
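
A minimal sketch of the rewrite shown above, using only the JDK's DOM parser and Transformer rather than XmlPullParser. The helper name addDefaultAndroidNamespace and the constant ANDROID_NS are made up for illustration, and the rule "prefix every attribute that has no colon" is an assumption taken from the description above, not an established fix:

```kotlin
import java.io.StringReader
import java.io.StringWriter
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Element
import org.xml.sax.InputSource

const val ANDROID_NS = "http://schemas.android.com/apk/res/android"

// Hypothetical helper: declares xmlns:android on the root element and
// adds the android: prefix to every attribute that has no prefix.
fun addDefaultAndroidNamespace(input: String): String {
    val factory = DocumentBuilderFactory.newInstance()
    // Namespace awareness stays off so undeclared prefixes do not fail here.
    factory.isNamespaceAware = false
    val doc = factory.newDocumentBuilder().parse(InputSource(StringReader(input)))

    val root = doc.documentElement
    if (!root.hasAttribute("xmlns:android")) {
        root.setAttribute("xmlns:android", ANDROID_NS)
    }

    val elements = doc.getElementsByTagName("*")
    for (i in 0 until elements.length) {
        val element = elements.item(i) as Element
        val attrs = element.attributes
        // Copy the names first, because the attribute map is live.
        val names = (0 until attrs.length).map { attrs.item(it).nodeName }
        for (name in names) {
            // Skip attributes that already have a prefix, and xmlns itself.
            if (':' in name || name == "xmlns") continue
            val value = element.getAttribute(name)
            element.removeAttribute(name)
            element.setAttribute("android:$name", value)
        }
    }

    // Serialize the modified tree back to a string for the real parser.
    val writer = StringWriter()
    val transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    transformer.transform(DOMSource(doc), StreamResult(writer))
    return writer.toString()
}
```

The returned string could then be passed to a namespace-aware parser. Note that this sketch would also prefix attributes that legitimately have no prefix in a real manifest, such as package, so an exclusion list would probably be needed. The order of attributes in the output may also differ from the input.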

In some other weird cases, I saw something like this for an XML attribute:

<manifest http://schemas.android.com/apk/res/android:versionCode="495157437" >
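
A name like this isn't well-formed XML even with namespace awareness turned off, so it probably has to be repaired on the raw string before any parser sees it. A minimal sketch, assuming the full namespace URI followed by a colon only ever appears as such a broken prefix (replaceUriPrefix and ANDROID_NS_URI are hypothetical names):

```kotlin
const val ANDROID_NS_URI = "http://schemas.android.com/apk/res/android"

// Hypothetical helper: turns "<URI>:attr" into "android:attr".
// The xmlns:android="<URI>" declaration is left alone, because there
// the URI is followed by a quote rather than a colon.
fun replaceUriPrefix(input: String): String =
    input.replace("$ANDROID_NS_URI:", "android:")
```

The result still needs xmlns:android to be declared somewhere. An attribute value that happens to contain the URI followed by a colon would be rewritten as well.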

What I've tried

Originally I tried to generate the namespaces on my own, so that the input to the parser would be correct from the beginning, but this became quite complex. I'm hoping to find a solution that is flexible about the input: I would tell it which default namespace to use, and it would apply that namespace to every attribute that doesn't use one.

The code I use to parse the XML from a given string is:

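import java.io.StringReader
import org.xmlpull.v1.XmlPullParserFactory

fun getXmlFromString(input: String): XmlTag? {
    val factory = XmlPullParserFactory.newInstance()
    // Namespace-aware mode: undeclared prefixes make the parser throw
    factory.isNamespaceAware = true
    val xpp = factory.newPullParser()
    xpp.setInput(StringReader(input))
    ...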

Sadly, sooner or later XmlPullParser throws an exception on such weird cases.

I still want to find out about other weird issues if they exist, so I keep isNamespaceAware set to true.

The questions

  1. Is such a thing possible with XmlPullParser? I want to catch all these cases and handle them with a default namespace whenever no valid namespace is found.

  2. Alternatively, or even better: is there some way to fix the XML before parsing it, so that I can tell it what to do with invalid or missing namespace declarations and usages?

Note that while this is related to Android, this question could benefit anyone who uses Java/Kotlin.
