I have a OneNote document in archive format, downloaded from SharePoint using CSOM. I am trying to extract its plain-text content using IFilter, but the process fails with a FILTER_E_UNKNOWNFORMAT (0x8004170C) error. I have OneNote installed, and the IFilter for .one files is registered properly. If I open the document in OneNote, it displays a small banner offering to convert it to an editable format. After the conversion, I can load the file into IFilter and read the text. The same works for notebooks created locally. I'd like to achieve the same result programmatically, without user interaction. I've tried using the OneNote interop libraries to convert the notebook to PDF and then extract the text from the PDF file, but that seems like overkill to me:
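Microsoft.Office.Interop.OneNote.IApplication app = new Microsoft.Office.Interop.OneNote.ApplicationClass();
try
{
    // Open the downloaded file; cftNone means "do not create it if it is missing".
    app.OpenHierarchy("d:\\Note.one", string.Empty, out string hierarchyId, Microsoft.Office.Interop.OneNote.CreateFileType.cftNone);
    // Make sure the content is loaded before publishing.
    app.SyncHierarchy(hierarchyId);
    // Export to PDF; the text is then extracted from the PDF in a separate step.
    app.Publish(hierarchyId, "d:\\Note.pdf", Microsoft.Office.Interop.OneNote.PublishFormat.pfPDF);
}
finally
{
    // Release the COM reference to the OneNote application object.
    System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
}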
I know I can access the OneNote document's content directly, without converting to PDF, but I want to avoid using the interop libraries if possible. Does anybody have experience with reading OneNote documents programmatically, or know of a tool that performs the above-mentioned conversion? Or is there a different way of downloading OneNote documents from SharePoint that does not produce such archived files? Any suggestions would be appreciated.
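For reference, this is roughly what I mean by accessing the content directly through interop. It is only a minimal sketch, not something I have verified against the archived file: it assumes the OneNote 2013 XML schema (and its namespace), and the names `OneNoteText`, `ExtractText` and `ReadSection` are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using OneNote = Microsoft.Office.Interop.OneNote;

static class OneNoteText
{
    // Assumption: pages are requested with the 2013 schema, so this namespace applies.
    static readonly XNamespace One = "http://schemas.microsoft.com/office/onenote/2013/onenote";

    // Collects the text of every <one:T> element in a page XML string.
    // The element content is an HTML fragment, so inline tags are removed
    // and entities are decoded.
    public static string ExtractText(string pageXml)
    {
        var lines = XDocument.Parse(pageXml)
            .Descendants(One + "T")
            .Select(t => WebUtility.HtmlDecode(Regex.Replace(t.Value, "<[^>]+>", string.Empty)));
        return string.Join(Environment.NewLine, lines);
    }

    // Opens a .one section file and returns the text of all of its pages.
    public static string ReadSection(string path)
    {
        OneNote.IApplication app = new OneNote.ApplicationClass();
        try
        {
            app.OpenHierarchy(path, string.Empty, out string sectionId, OneNote.CreateFileType.cftNone);

            // List the pages below the opened section.
            app.GetHierarchy(sectionId, OneNote.HierarchyScope.hsPages, out string hierarchyXml, OneNote.XMLSchema.xs2013);
            var pageIds = XDocument.Parse(hierarchyXml)
                .Descendants(One + "Page")
                .Select(p => (string)p.Attribute("ID"));

            var text = new StringBuilder();
            foreach (string pageId in pageIds)
            {
                app.GetPageContent(pageId, out string pageXml, OneNote.PageInfo.piBasic, OneNote.XMLSchema.xs2013);
                text.AppendLine(ExtractText(pageXml));
            }
            return text.ToString();
        }
        finally
        {
            Marshal.ReleaseComObject(app);
        }
    }
}
```

Calling `OneNoteText.ReadSection("d:\\Note.one")` would return the page text, but it still needs OneNote installed and a COM round trip for every page, which is exactly the dependency I am hoping to avoid.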