I am trying to update existing documents in a (Sentry-secured) Solr collection. The updates are accepted by Solr, but when I query, the document seems to have disappeared from the collection.
What is going on?
I am using Cloudera (CDH) 5.8.3, and Sentry with document-level access control enabled.
When using document-level access control, Sentry uses a field (whose name is defined in solrconfig.secure.xml, but the default is sentry_auth) to determine which roles can see each document.
If you update a document but forget to supply a sentry_auth field, the updated document doesn't belong to any roles, so nobody can see it: it becomes essentially invisible! This is easily done, because the sentry_auth field is typically not a stored field, so it won't be returned by any queries.
You therefore cannot just retrieve a document, modify a field, then update the document. You need to know which roles that document belongs to, so you can supply a properly populated sentry_auth field.
You can make the sentry_auth field a "required" field in the Solr schema, which will prevent you from accidentally omitting it. However, this won't prevent you from supplying a blank sentry_auth field (or supplying incorrect roles), either of which will also make the document "disappear".
Also note that you can update a document that you do not have document-level access to, provided you have write access to the collection as a whole and you have the ID of the document. This means that users can (deliberately or accidentally) overwrite or delete documents that they cannot see. This is a design choice, made so that users cannot find out whether a particular document ID exists when they do not have document-level access to it.
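One way to guard against the disappearing-document problem is to build every update through a small helper that refuses to produce a request body unless a non-empty list of roles is supplied. The following is only a sketch in Python: the helper name build_update is hypothetical, the unique key is assumed to be called id, and the auth field is assumed to be a multi-valued field named sentry_auth (check your own schema and solrconfig.secure.xml).

```python
import json

# Assumed default field name; yours is whatever solrconfig.secure.xml defines.
AUTH_FIELD = "sentry_auth"


def build_update(doc_id, fields, roles, auth_field=AUTH_FIELD):
    """Return a JSON body for a Solr update that always carries the roles.

    Hypothetical helper: raises ValueError rather than letting a document
    be re-indexed with a missing or blank auth field.
    """
    # Drop empty or whitespace-only role names, since a blank value
    # hides the document just as effectively as a missing field.
    cleaned = [r.strip() for r in roles if r and r.strip()]
    if not cleaned:
        raise ValueError("refusing to update %r without any roles" % doc_id)

    doc = {"id": doc_id}      # assumes the schema's unique key is "id"
    doc.update(fields)
    doc[auth_field] = cleaned  # set last so `fields` cannot override it
    return json.dumps([doc])


if __name__ == "__main__":
    # The roles must come from your own records: they cannot be read back
    # from Solr when the auth field is not stored.
    print(build_update("doc-1", {"title": "New title"}, ["analyst"]))
```

The resulting body is a JSON array of documents, which is the shape Solr's JSON update handler accepts. Note that this only catches missing or blank roles; it cannot tell whether the roles you pass are the correct ones for that document.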
See the Cloudera documentation: