I'm using IFilter to index some MS Office documents. Loading from a file works fine, just as in the manuals and samples:
HRESULT hr_f = LoadIFilter(filename, 0, (void **)&pFilter);
However, the BindIFilterFromStream API fails, and I can't figure out how to use it properly.
HRESULT hr_ss = BindIFilterFromStream(spStream/*my IStream* impl*/, 0, (void **)&pFilter);
I implemented the IStream interface. The only method invoked during initialization (apart from the IUnknown methods) is Stat:
HRESULT StreamFilter::Stat(STATSTG * pstatstg, DWORD grfStatFlag)
{
    //Microsoft Office Ifilter from Windows Registry
    const IID CLSID_IFilter = {
        0xf07f3920,
        0x7b8c,
        0x11cf,
        { 0x9b, 0xe8, 0x00, 0xaa, 0x00, 0x4b, 0x99, 0x86 }
        //{f07f3920-7b8c-11cf-9be8-00aa004b9986}
    };
    LARGE_INTEGER pSize;
    int fl = GetFileSizeEx(_hFile, &pSize);
    memset(pstatstg, 0, sizeof(STATSTG));
    pstatstg->clsid = CLSID_IFilter;
    pstatstg->type = STGTY_STREAM;
    pstatstg->cbSize.QuadPart = pSize.QuadPart;
    return S_OK;
}
After that, hr_ss is E_FAIL and pFilter is NULL.
There is a related question, "Using IFilter in C#", and that method works in C++ too, but only for *.pdf files, not for MS Office documents.
I figured out how to initialize IFilter properly; here is the code:
After that, getting the text from your document works as in the usual MSDN sample.
You do not need to implement your own IStream; just initialize one from your buffer.