Excel IFilters, Concatenated Strings xlsx / xls


I have an application that uses the delivered MS Office IFilters to extract text content from Excel files.

I have an issue with .xlsx files and concatenated strings: the IFilter extracts the text of ordinary cells, but not the results of concatenation formulas.

The .xls IFilter does return concatenated strings. I know the two are different file formats, and that .xlsx is essentially a zip archive with the data stored as XML.

An example is: A1=ABC, H2=123, G3=XYZ, D1=Concatenate(A1, H2, G3)

The .xls IFilter returns the concatenated string ("ABC123XYZ") exactly as it appears in the file; the .xlsx IFilter does not return it.

If the source cells are adjacent, it may look as though the .xlsx IFilter is returning the concatenated value, but it is not: only the individual cell values are returned.

I have also tried unzipping the .xlsx and parsing the .xml files myself, but again I could not get the concatenated string.
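For reference, here is a minimal sketch of the unzip-and-parse approach, written in Python purely for illustration. As far as I understand SpreadsheetML, a formula cell stores its formula in an `<f>` element and, when the saving application calculated it, the last computed result in a sibling `<v>` element, with `t="str"` for string results; that cached text is written inline and not in sharedStrings.xml. The function name `formula_string_results` and the default sheet path are my own assumptions: real workbooks list their sheets in xl/workbook.xml, and the cached value may be missing if the file was written by a tool that did not evaluate the formulas.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def formula_string_results(xlsx_bytes, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: cached_text} for formula cells with string results.

    sheet_path is an assumed default; it is not looked up in workbook.xml.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        root = ET.fromstring(zf.read(sheet_path))

    results = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)  # present only on formula cells
        value = cell.find("m:v", NS)    # cached result, if one was saved
        if formula is not None and value is not None and cell.get("t") == "str":
            results[cell.get("r")] = value.text or ""
    return results
```

It would be called on the raw file contents, for example `formula_string_results(open("book.xlsx", "rb").read())`. Formula cells with numeric results carry no `t="str"` attribute and are skipped by this sketch.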

I'm really after suggestions as to how best to handle this. Ultimately I need to be able to extract the concatenated strings from .xlsx files.

Is my only option to convert the file to .xls before extracting the text? Is there an easy way to do this on the fly, without a significant performance hit and without actually saving the file? Or would I be better off extracting the text with Microsoft.Office.Interop.Excel and somehow copying and pasting it into a ListView? Either seems like it would be a huge performance hit.

Any help and advice is gratefully received!
