Many text files are not indexed by default


Recoll does not index many text files by default. It seems to only index files where the mimemap explicitly includes the mime type, but not other "obvious" file types.

Examples:

  • yaml files -- file -i shows text/plain; charset=us-ascii, but recollindex -e -i /path/to/foo.yaml shows Recoll detecting the file as application/x-yaml via xdg-mime. That isn't an officially registered MIME type, but if Recoll uses xdg-mime, one would expect it to handle every value xdg-mime can return
  • awk scripts -- same thing, with application/x-awk; this is in the default mimeconf.
  • perl scripts -- same thing, with application/x-perl; this is in the default mimeconf.
  • shell scripts -- same thing, with application/x-shellscript; this is in the default mimeconf.
  • kotlin and other source code files -- Recoll sees a Kotlin file as text/x-kotlin, again a non-standard type reported by xdg-mime. The type begins with text/, so Recoll should know the file is text, yet it still doesn't index it
  • readme files -- same thing, with text/x-readme

Now, this can be worked around on a case-by-case basis by adding something like this to ~/.recoll/mimeconf:

[index]
application/x-yaml = internal text/plain
text/x-kotlin = internal text/plain
text/x-readme = internal text/plain

Doing this one file type at a time seems silly, though. Is there a way to tell Recoll to:

  1. index everything with MIME type text/* as text/plain, unless Recoll already has a more specific parser for the type;
  2. index obvious textual data (e.g. if file -i returns text/plain) as text/plain, again unless Recoll already has a more specific parser for the type?
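
As a stopgap for item 1, the per-type entries could at least be generated instead of typed by hand. Below is a minimal sketch, not anything Recoll provides. It assumes that /usr/share/mime/types is the shared-mime-info list of known types (one per line) and that the distribution's default mimeconf lives at /usr/share/recoll/examples/mimeconf; both paths may differ on other systems. The function names handled_types and text_fallback_section are made up for illustration.

```python
from pathlib import Path

# Assumed locations; both may differ by distribution.
SYSTEM_TYPES = Path("/usr/share/mime/types")  # shared-mime-info type list
DEFAULT_MIMECONF = Path("/usr/share/recoll/examples/mimeconf")


def handled_types(mimeconf_text):
    """Collect the MIME types that already have a handler in the [index] section."""
    handled, section, continued = set(), "", False
    for raw in mimeconf_text.splitlines():
        line = raw.strip()
        if continued:  # tail of a backslash-continued value
            continued = line.endswith("\\")
            continue
        if not line or line.startswith("#"):
            continue
        continued = line.endswith("\\")
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "index" and "=" in line:
            handled.add(line.split("=", 1)[0].strip())
    return handled


def text_fallback_section(mime_types, handled):
    """Build an [index] section sending every unhandled text/* type to text/plain."""
    extra = sorted(
        {t.strip() for t in mime_types
         if t.strip().startswith("text/") and t.strip() not in handled}
    )
    return "\n".join(["[index]"] + [f"{t} = internal text/plain" for t in extra])


# Only runs when both assumed files are present on this machine.
if __name__ == "__main__" and SYSTEM_TYPES.exists() and DEFAULT_MIMECONF.exists():
    known = handled_types(DEFAULT_MIMECONF.read_text(errors="replace"))
    types = SYSTEM_TYPES.read_text(errors="replace").splitlines()
    print(text_fallback_section(types, known))
```

The printed section would then be merged by hand into ~/.recoll/mimeconf. This only covers text/* types, so application/x-yaml and the file -i case from item 2 would still need their own entries, and it is still a generated list rather than a real wildcard rule.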

If it matters, I'm using recoll packaged by Fedora.
