How to search for exact field values?

I am trying to search for objects that exactly match a search term in the preferred_labels. I am using the following query:

ca_objects.preferred_labels.name:30412

This returns all objects where 30412 occurs anywhere in the preferred_labels, e.g. "30412", "abc 30412", etc., but I am only interested in exact matches, i.e. objects whose label is exactly "30412".

What's more curious is that I also get results where the search term is contained in a related object, in a completely different field. E.g. for the above search, an object is returned that has a different preferred_label and does not contain 30412 in any of its fields, but is linked to another object that has 30412 in a completely different field. Could this behavior be caused only by an out-of-date index, or could something else be responsible?

Comments

  • The index is full text; the engine will find the term or terms occurring in a field but won't do whole field exact matching. There is an internal API call that does do the sort of field-exact search you describe, but it's not exposed in the current web service API. It will be available in the new GraphQL-based API that is currently in development.

    If you're getting results that match on content from related records, it's likely due to related-record indexing. That is, the indexer can index an object with content from related records if so configured. For most relationships, related indexing is intuitive. Indexing the names of related entities against the objects they've created, so that objects can be pulled by creator name, makes sense in almost all use cases. Indexing objects using object <=> object relationships makes sense when related objects are treated as synonyms, but can be confusing if your data isn't like that, and it sounds like yours is not.

    We've had object-object related indexing enabled by default for several versions now (4 or 5 years), so your setup probably uses it. To disable it, create a local copy of search_indexing.conf (if you haven't already) and remove the following object indexing entries (note that both begin with related = { and can be found by searching for that text):

    related = {
    	fields = {
    		idno = { STORE, DONT_TOKENIZE, INDEX_AS_IDNO, BOOST = 100 }
    	}
    }
    
    

    and

    related = {
    	fields = {
    		name = { BOOST = 100, INDEX_ANCESTORS, INDEX_ANCESTORS_START_AT_LEVEL = 0, INDEX_ANCESTORS_MAX_NUMBER_OF_LEVELS = 4, INDEX_ANCESTORS_AS_PATH_WITH_DELIMITER = . }
    	}
    } 
    

    Then reindex and see what you get.

  • Wow, great, thank you for the detailed explanation! I will look into this configuration.

    Prior to that I also played around with the indexing, but ran into issues during re-indexing. I posted this in a separate thread: https://collectiveaccess.org/support/index.php?p=/discussion/300854/indexing-process-aborts-abruptly#latest
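
Since the index is full text and field-exact search is not exposed in the current web service API, one possible stopgap is to filter the returned results on the client side. The following is only a minimal sketch under stated assumptions: it assumes the search results have already been fetched and converted into a list of dictionaries, and the names `exact_label_matches`, `preferred_label`, and `object_id` are hypothetical, not part of any CollectiveAccess API.

```python
def exact_label_matches(records, term, label_key="preferred_label"):
    """Keep only records whose label equals the search term exactly.

    `records` is assumed to be a list of dicts built from search results;
    `label_key` is a hypothetical key holding each object's preferred label.
    Leading and trailing whitespace is ignored on both sides.
    """
    wanted = term.strip()
    return [r for r in records if str(r.get(label_key, "")).strip() == wanted]


# Hypothetical results, shaped like what the full-text query might return.
results = [
    {"object_id": 1, "preferred_label": "30412"},
    {"object_id": 2, "preferred_label": "abc 30412"},
    {"object_id": 3, "preferred_label": "unrelated title"},  # matched via a related object
]

exact = exact_label_matches(results, "30412")
print([r["object_id"] for r in exact])  # prints [1]
```

This does not change what the search engine returns; it only discards partial matches and hits that came from related-record indexing after the fact, so it is most useful for small result sets.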
