Initialize IFilter from IStream

I'm using IFilter to index some MS Office documents. Loading from a file works fine, just as in the manuals and samples:

HRESULT hr_f = LoadIFilter(filename, 0, (void **)&pFilter);

However, the BindIFilterFromStream API fails, and I can't figure out how to use it properly:

HRESULT hr_ss = BindIFilterFromStream(spStream/*my IStream* impl*/, 0, (void **)&pFilter);

I implemented the IStream interface. The only method (apart from IUnknown's) that gets invoked during initialization is Stat:

HRESULT StreamFilter::Stat(STATSTG * pstatstg, DWORD grfStatFlag)
{
   //Microsoft Office Ifilter from Windows Registry
   const IID CLSID_IFilter = {
       0xf07f3920,
       0x7b8c,
       0x11cf,
       { 0x9b, 0xe8, 0x00, 0xaa, 0x00, 0x4b, 0x99, 0x86 }

       //{f07f3920-7b8c-11cf-9be8-00aa004b9986}
   };
   LARGE_INTEGER pSize;
   int fl = GetFileSizeEx(_hFile, &pSize);
   memset(pstatstg, 0, sizeof(STATSTG));
   pstatstg->clsid = CLSID_IFilter;
   pstatstg->type = STGTY_STREAM;
   pstatstg->cbSize.QuadPart = pSize.QuadPart;

   return S_OK;
}

After that, hr_ss is E_FAIL and pFilter is NULL.

There is a related question, "Using IFilter in C#". The method described there also works in C++, but only for *.pdf files, not for MS Office documents.

tomato On

I figured out how to initialize IFilter properly. Here is the code:

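// Load the Office filter by extension only; the document itself comes from the stream below.
HRESULT hr = LoadIFilter(L".doc", 0, (void **)&pFilter);
IPersistStream *stream = nullptr;
HRESULT hr_qi = pFilter->QueryInterface(&stream);

// Read the whole document into memory.
std::ifstream ifs(filename, std::ios::binary);
std::string content((std::istreambuf_iterator<char>(ifs)),
    (std::istreambuf_iterator<char>()));

// Copy the bytes into an HGLOBAL and wrap it in a COM stream.
IStream *comStream = nullptr;
HGLOBAL hMem = ::GlobalAlloc(GMEM_MOVEABLE, content.size());
LPVOID pDoc = ::GlobalLock(hMem);
memcpy(pDoc, content.data(), content.size());
::GlobalUnlock(hMem);
// TRUE: the stream frees hMem when it is released.
HRESULT hr_mem = ::CreateStreamOnHGlobal(hMem, TRUE, &comStream);
// Hand the in-memory document to the filter.
HRESULT hr_stream_load = stream->Load(comStream);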
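
The snippet above never checks whether the file actually opened, so a bad path silently produces an empty buffer and a zero-byte GlobalAlloc. A minimal sketch of a safer read step, using only the standard library; ReadFileBytes is a hypothetical helper name, not part of any Windows API:

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a byte string,
// throwing instead of returning an empty buffer when the file cannot be opened.
std::string ReadFileBytes(const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs)
        throw std::runtime_error("cannot open " + path);

    // Binary mode plus istreambuf_iterator keeps embedded NUL bytes intact.
    return std::string(std::istreambuf_iterator<char>(ifs),
                       std::istreambuf_iterator<char>());
}
```

The returned string can then be copied into the HGLOBAL exactly as before. It is probably also worth checking that the size is non-zero before allocating.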

Getting the text out of the document then follows the usual MSDN sample:

if (SUCCEEDED(hr))
{
  DWORD flags = 0;
  HRESULT hr = pFilter->Init(IFILTER_INIT_INDEXING_ONLY |
                             IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                             IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES |
                             IFILTER_INIT_FILTER_OWNED_VALUE_OK |
                             IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                             0, 0, &flags);
  if (FAILED(hr))
  {
     pFilter->Release();
     throw exception("IFilter::Init() failed");
  }

  Start();

  STAT_CHUNK stat;
  while (SUCCEEDED(hr = pFilter->GetChunk(&stat)))
  {
     if ((stat.flags & CHUNK_TEXT) != 0)
        ProcessTextChunk(pFilter, stat);

     if ((stat.flags & CHUNK_VALUE) != 0)
        ProcessValueChunk(pFilter, stat);
  }

  Finish();

  pFilter->Release();      
}
else
{
  throw exception("LoadIFilter() failed");
}
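
The sample calls ProcessTextChunk without showing it. Below is a rough sketch of what the text-draining part of such a helper might look like, assuming the documented IFilter::GetText contract (the count is the buffer capacity on input and the number of characters written on output). ReadChunkText is a hypothetical name. It is written as a template only so the loop can be tried with a stand-in object; with the real API you would pass an IFilter*. The non-Windows definitions are stand-ins whose values are meant to mirror filterr.h, so treat them as an assumption:

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <filter.h>    // IFilter
#include <filterr.h>   // FILTER_E_NO_MORE_TEXT, FILTER_S_LAST_TEXT
#else
// Stand-ins so the sketch compiles off Windows (assumed to match filterr.h).
#include <cstdint>
using HRESULT = std::int32_t;
using ULONG = std::uint32_t;
constexpr HRESULT FILTER_E_NO_MORE_TEXT = static_cast<HRESULT>(0x80041701u);
constexpr HRESULT FILTER_S_LAST_TEXT = 0x00041709;
#endif

// Hypothetical helper: collect all text of the current chunk.
// Filter is expected to expose GetText(ULONG*, wchar_t*) like IFilter does.
template <class Filter>
std::wstring ReadChunkText(Filter* filter)
{
    std::wstring text;
    wchar_t buffer[1024];

    for (;;)
    {
        ULONG count = 1024;  // in: capacity in characters, out: characters written
        HRESULT hr = filter->GetText(&count, buffer);

        // FILTER_E_NO_MORE_TEXT ends the chunk; any other failure also stops the loop.
        if (hr < 0)
            break;

        // The buffer is not guaranteed to be NUL-terminated, so append by count.
        text.append(buffer, count);

        // FILTER_S_LAST_TEXT means this call returned the final piece.
        if (hr == FILTER_S_LAST_TEXT)
            break;
    }
    return text;
}
```

Inside the GetChunk loop this would be called as ReadChunkText(pFilter) whenever the chunk has CHUNK_TEXT set.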

You do not need to implement your own IStream; just create one from your buffer.