Exclude certain object fields from search?

How do I exclude certain object data from Pawtucket search? 

Specifically, I would like 'Notes (internal use only)' and 'Archivist Note' data to not be indexed and used on the public search.

I've looked on these forums, Googled, looked through search.conf and search_indexing.conf and can't figure it out, apologies.

Thank you for any assistance!

Comments

  • From https://docs.collectiveaccess.org/wiki/Search_Indexing_Configuration


    For convenience all configurable metadata elements specific to your installation are indexed using the special _metadata field. This obviates the need for you to enumerate each metadata element individually. If you need to not index certain elements, you can specify individual elements to index using keys starting with _ca_attribute_ followed by element codes (ex. metadata element "description" would be listed as "_ca_attribute_description").

  • Hi Jeff,

    Thank you very much for your quick response, much appreciated. Can I please clarify, then, what I need to do is:


    1) Edit pawtucket2/app/conf/search_indexing.conf


    2) Remove '_metadata = { },' from here:

    ca_objects = {

    fields = {

    _metadata = { },


    3) In its place, add a line for every attribute I DO want included, but don't include ones I don't. E.g.:

    ca_objects = {

    fields = {

    _ca_attribute_description = { },

    _ca_attribute_caption = { },

    _ca_attribute_unitdate = { },

    (etc).


    Is that correct? Thank you.

  • Honestly I haven't had a need to exclude search fields so haven't tested it out. Initially I thought it would be

    _metadata = {_ca_attribute_description,_ca_attribute_caption....}

    but guess there's only one way to find out. Remember to rebuild your search indices.

  • Hi Jeff,

    Thanks for the reply. It's a bit disappointing we can't be more certain, as we have over 100,000 objects and rebuilding search indices takes several hours.

    There's nobody else at CA who might know better?

    Anyway, I'll give it a go now and see what happens. Thank you again for your quick help, it's appreciated.

  • Hopefully someone provides an answer, but you can test it out without reindexing by updating the code and then adding a new item.

    https://docs.collectiveaccess.org/wiki/Search_Indexing_Configuration

    Rebuilding the search index

    Changes to search_indexing.conf take effect immediately for all subsequent indexing. Any items indexed prior to the change will not reflect the configuration modifications. To update the entire search index to reflect the new configuration, rebuild the index using "Rebuild search indices" web interface under Manage > Administrate > Maintenance; or reindex using the command-line caUtils rebuild-search-indices command.

  • Hi Jeff,

    I adjusted the _metadata entry in search_indexing.conf in both the Providence and Pawtucket directories to:

    _metadata = {_ca_attribute_description, _ca_attribute_caption, _ca_attribute_unitdate, _ca_attribute_tag, _ca_attribute_project_phase, _ca_attribute_education_themes, _ca_attribute_projects},

    I then successfully ran a rebuild of the indexes. However, the problem persists. For example, we have put dummy text entries in the 'Notes (internal use only)' and 'Archivist Note' fields of one of our objects (see attached image). In theory, this text should not be searchable given the configuration above.

    However, when you search for these terms at https://archive.kaldorartprojects.org.au/ you land on Kaldor Public Art Project : Item : Gregor Schneider, 'Die Familie Schneider', Email Announcement, Mar–Apr 2006 [P16-F01-S02-0005] (kaldorartprojects.org.au) which is the object with this dummy text, clearly showing it has been indexed.

    Can you please advise how to fix this, or point us in the direction of someone who can? We feel we are executing the instructions as described and can't find any other documentation.

    Thank you.

  • Hi Pete, sorry I do not know and will not be able to test it out, I hope someone else can help.

  • Hi,

    _metadata is shorthand to index all metadata elements. If you want to omit metadata elements from indexing then you'll need to remove the _metadata entry entirely and enumerate all of the elements you do want to index, each requiring its own _ca_attribute_<element code> entry (Ex. _ca_attribute_education_themes).
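
The enumeration described above can be tedious for an installation with many elements. As a minimal sketch (this is not part of CollectiveAccess), the Python snippet below generates the `_ca_attribute_<element code>` lines to paste into the `fields` block of search_indexing.conf. The function name `attribute_entries` is hypothetical, the element codes are examples taken from this thread, and `internal_notes` and `archivistnote` are assumed codes for the two note fields; substitute the codes from your own installation.

```python
def attribute_entries(all_codes, excluded):
    """Return search_indexing.conf lines for every element code not in `excluded`."""
    skip = set(excluded)
    # "{{ }}" in an f-string renders as the literal "{ }" used in the conf file.
    return [f"_ca_attribute_{code} = {{ }}," for code in all_codes if code not in skip]


# Example element codes from this thread; replace with your installation's codes.
ALL_CODES = ["description", "caption", "unitdate", "tag", "internal_notes", "archivistnote"]
# Assumed codes for 'Notes (internal use only)' and 'Archivist Note'.
EXCLUDED = ["internal_notes", "archivistnote"]

if __name__ == "__main__":
    for line in attribute_entries(ALL_CODES, EXCLUDED):
        print(line)
```

The printed lines would replace the single `_metadata = { },` entry; a reindex is still needed for items indexed before the change.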

    Note that reconfiguring indexing this way will make the excluded element(s) unsearchable on both the back-end and in Pawtucket. If you just want to exclude the metadata elements in Pawtucket you can set excludeFieldsFromSearch to a list of element codes in search and browse definitions in your Pawtucket theme search.conf and browse.conf. The search config file controls behavior in the "multisearch" results view. Any search with refine controls is actually handled as a browse, so you'll probably want to set it in both places.

    There's also a companion restrictSearchToFields option that can be set in either or both config files. This lets you enumerate the elements you want to search, rather than the ones you don't, which can be useful when you're looking to exclude most metadata elements from search.
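
To make the difference between the two options concrete, here is a toy Python model of a deny-list (excludeFieldsFromSearch) versus an allow-list (restrictSearchToFields). It is only an illustration and not CollectiveAccess code: the function `searchable_fields` and the element codes are hypothetical, and how CollectiveAccess behaves when both options are set together is not stated in this thread, so applying both filters is an assumption of the sketch.

```python
def searchable_fields(indexed, exclude=None, restrict_to=None):
    """Toy model: which indexed element codes a search definition would consult."""
    fields = list(indexed)
    if restrict_to:
        # Allow-list: keep only the elements that are explicitly named.
        allowed = set(restrict_to)
        fields = [f for f in fields if f in allowed]
    if exclude:
        # Deny-list: drop the elements that are explicitly named.
        denied = set(exclude)
        fields = [f for f in fields if f not in denied]
    return fields


# Hypothetical element codes for illustration.
INDEXED = ["description", "caption", "internal_notes", "archivistnote"]

if __name__ == "__main__":
    print(searchable_fields(INDEXED, exclude=["internal_notes", "archivistnote"]))
    print(searchable_fields(INDEXED, restrict_to=["caption"]))
```

With only a couple of fields to hide, the deny-list is shorter to maintain; with most fields hidden, the allow-list is.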

    Seth

  • Hi Seth,

    Thank you for the detailed reply. Firstly, it's good to know I was on track with syntax re _metadata.

    However, using the excludeFieldsFromSearch sounds like the exact solution we're looking for, thank you. Understood re search.conf and browse.conf, no problems.

    I couldn't find any examples or documentation about this so if I may I'd like to confirm syntax before executing. Are you saying it would be something like this:


    (search.conf)

    multisearchTypes = {

    objects = {

    displayName = _(Objects),

    table = ca_objects,

    restrictToTypes = [],

    view = Search/ca_objects_search_subview_html.php,

    excludeFieldsFromSearch = [internal_notes, archivistnote, note],

    (etc.)


    (browse.conf)

    browseTypes = {

    objects = {

    displayName = _(Objects),

    labelSingular = _("object"),

    labelPlural = _("objects"),

    table = ca_objects,

    restrictToTypes = [],

    availableFacets = [],

    excludeFieldsFromSearch = [internal_notes, archivistnote, note],

    (etc.)

    Thank you.

  • Hi Seth or Jeff,

    Just wondering if anyone could please confirm whether the above is the correct syntax, or point me in the direction of a support document?

    Thank you!
