Search doesn't find records anymore and returns unreliable results

I've been asked to debug the following issue with a CA Providence installation:

Recently (with no change on the admin side that they are aware of), search stopped showing entries that are definitely there. The database contains several thousand entries and has been in active use for a few years.

The first thing we did was rebuild the search indices from the Administration menu. Since then, search seems completely broken.

**For example:**

1) Searching entities for "*" returns thousands of entries, but searching for, e.g., a name that is displayed in that very result list returns "Your search found no entities".

2) Searching objects returns partial results (again, we know there are more, but they aren't displayed). Sorry for not being more specific; we haven't found a pattern. We just know the results are incorrect or incomplete.


It seems the search index got broken or frozen somehow, somewhere.

Is there a way to completely reset and rebuild the search engine?


Thanks in advance.

Comments

  • If you're reindexing from the browser user interface, it's possible that the connection is being broken at some point (indexing is often a long-running operation), killing the process midway. The first thing I'd recommend is reindexing on the command line using caUtils, to be sure it runs to completion.

    To do this, run: caUtils rebuild-search-index

    Assuming you're in the support/ directory of the installation, the command is: bin/caUtils rebuild-search-index

    caUtils displays a progress bar during indexing and, upon completion, prints a message with the total indexing time, so you'll know it's running and when it's done. On a large system indexing may take a while (more than an hour). In that case you may want to run the command inside screen or tmux to ensure nothing interrupts it.

  • Thanks for the prompt reply!

    I'll try rebuild-search-index on the command line. Is there anything I should be aware of? Would you recommend a database backup first, or is rebuilding the index always safe?


    And is there anything else I could do to completely wipe and reset whatever search data CA has built so far?

    Thanks!

  • Reindexing modifies only the index, not your data, and should be safe. Backups are never a bad idea, but one certainly isn't required for this command.

    The reindexing process truncates the two tables the index uses: ca_sql_search_words and ca_sql_search_word_index. If you want to empty them manually, you can do so on the MySQL command line (e.g. TRUNCATE TABLE ca_sql_search_words; TRUNCATE TABLE ca_sql_search_word_index;), but it really won't make a difference and I wouldn't bother. I'm fairly sure the issue is indexing not running to completion, not incorrect index data lingering somehow.
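If you want to see whether a reindex actually filled the index, or confirm that a manual TRUNCATE emptied it, counting the rows in those two tables is a quick check. Below is a minimal Python sketch, assuming you can open a DB-API 2.0 connection to the CA database (for MySQL that means a third-party driver, not shown here). The helper name is hypothetical and not part of CA; the demo uses an in-memory SQLite database only as a stand-in.

```python
import sqlite3

# The two index tables named above; the helper below is hypothetical.
INDEX_TABLES = ("ca_sql_search_words", "ca_sql_search_word_index")


def index_table_counts(conn):
    """Return {table: row_count} for the search index tables.

    `conn` can be any DB-API 2.0 connection, e.g. one opened with a MySQL
    driver against the CA database. The table names are fixed constants,
    so nothing user-supplied is interpolated into the SQL.
    """
    counts = {}
    cur = conn.cursor()
    try:
        for table in INDEX_TABLES:
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


if __name__ == "__main__":
    # Stand-in demo: an in-memory SQLite database with empty placeholder
    # tables (the real CA tables live in MySQL and have more columns).
    demo = sqlite3.connect(":memory:")
    for table in INDEX_TABLES:
        demo.execute(f"CREATE TABLE {table} (id INTEGER)")
    print(index_table_counts(demo))
```

On a database with several thousand records, a ca_sql_search_word_index that is still empty after a reindex has supposedly finished would suggest the run was cut short.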

  • Roger that, and thanks for the background on how to do it manually.

    I ran the reindex (it took slightly over an hour), and now the search results look fine again :)

    Unfortunately, some issues persist. These were the reason we started reindexing in the first place (from the browser UI, which probably bailed out midway, as you guessed):

    1) A newly created entity (e.g. a person) doesn't show up when we try to use it in a relationship to an object.

    2) Newly created entries (objects, etc.) don't show up in the search results.


    Could it be that whatever triggers ongoing updates of the search index (when editing, creating or deleting entries) isn't being started or executed? Is there anything I could check?


    Thanks again.

  • By default, indexing is performed in the background. In some environments it may fail to trigger: PHP doesn't have real process management, so we rely on the server being able to connect to itself. This usually works fine, but in some environments it isn't possible for a variety of reasons (networking, security, etc.).

    Try disabling background indexing by setting disable_out_of_process_search_indexing = 1 in app.conf. All indexing will then occur in-process, which slows down responses on save somewhat but ensures that indexing is performed immediately.

    After changing this setting, reindex again, just to be sure everything is up to date.

  • Do I understand correctly that setting disable_out_of_process_search_indexing=1 is a workaround rather than a fix?

    Is there any documentation I could use to debug why the default background indexing "may fail to trigger"? This is a default Ubuntu server installation, so I'm hoping it's possible to find and fix the cause of the non-working indexing for newly created entries... :)

    In the meantime, I'll try the workaround to see whether it fixes our issue.


    Thanks again.

  • edited June 1

    Setting disable_out_of_process_search_indexing reverts to how indexing used to work; it's not a workaround. Background indexing can speed up response times when saving records with many relationships, but the end result is the same with or without background processing.

    To trigger a background indexing process, the application connects to itself locally. The parameters for this connection are derived automatically from the environment, but the connection can fail if for whatever reason the server cannot connect to itself, e.g. because the hostname used doesn't resolve locally.
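Before overriding anything, it can help to test by hand whether the server can reach itself on the hostname and port the application would use. The following is a minimal Python sketch, assuming Python 3 is available on the CA server and is run there. It only checks that a TCP connection can be opened; it does not reproduce CA's actual trigger request. The function name, script name and hostname in the comments are hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a plain TCP connection to host:port, roughly like the
    local "connect to itself" step described above.

    Returns (True, None) on success or (False, error_message) on failure.
    Only the TCP handshake is tested; nothing is sent over the socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers DNS failures, refusals and timeouts
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_self_connect.py collections.example.org 443
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 80
    ok, err = can_connect(host, port)
    print("OK" if ok else f"FAILED ({err})")
```

A failure here for the site's public hostname, while the same check against localhost succeeds, points toward the name-resolution or networking problem described above.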

    You can try setting the parameters manually, overriding the automatically derived ones, using these app.conf settings:

    # Hostname to use when triggering out of process indexing
    # By default the site hostname configured in setup.php is used but you can override it
    # here if the hostname resolvable on the server differs from that used for incoming requests
    out_of_process_search_indexing_hostname =

    # Socket protocol to use when triggering out of process indexing. May be set to tcp or tls
    # By default tcp is used when the incoming request is http and tls is used when https is employed.
    # You can override it here to use a constant protocol.
    out_of_process_search_indexing_protocol =

    # Port to use when triggering out of process indexing
    # By default the port used is 80 when the incoming request is http or 443 when https is used.
    # You can override it here to use a constant port.
    out_of_process_search_indexing_port =

    # Disable verification of SSL certificate when connection uses https.
    # With some server/network configurations it may not be possible to validate the certificate
    # when connecting internally to trigger indexing. You can disable certificate checks here if
    # you need to. No data is transferred over the connection used to trigger indexing, but disabling
    # certificate checks is never a great idea. It is recommended that disabling be done only as a
    # last resort.
    out_of_process_search_indexing_dont_verify_ssl_cert = 0
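Read together, those comments describe a simple defaulting rule. The function below restates it in Python purely as an illustration, based only on the comments above; it is not CA's PHP implementation, and the function name is hypothetical. It can be handy for working out which host, protocol and port a given request would lead to before testing that combination by hand.

```python
def derive_trigger_params(request_scheme, site_hostname, overrides=None):
    """Return (hostname, protocol, port) for the indexing trigger connection.

    Defaults follow the app.conf comments above: the site hostname from
    setup.php, tcp/80 for http requests and tls/443 for https requests.
    A non-empty value in `overrides` (keyed by the app.conf setting name)
    replaces the derived default; an empty value, as in "key =", does not.
    """
    overrides = overrides or {}
    secure = request_scheme.lower() == "https"
    defaults = {
        "out_of_process_search_indexing_hostname": site_hostname,
        "out_of_process_search_indexing_protocol": "tls" if secure else "tcp",
        "out_of_process_search_indexing_port": 443 if secure else 80,
    }
    resolved = {}
    for key, default in defaults.items():
        value = overrides.get(key)
        resolved[key] = default if value in (None, "") else value

    # The comments only allow tcp or tls as the protocol.
    protocol = str(resolved["out_of_process_search_indexing_protocol"]).lower()
    if protocol not in ("tcp", "tls"):
        raise ValueError(f"protocol must be tcp or tls, got {protocol!r}")
    port = int(resolved["out_of_process_search_indexing_port"])
    return resolved["out_of_process_search_indexing_hostname"], protocol, port


# Example: derive_trigger_params("https", "collections.example.org")
# returns ("collections.example.org", "tls", 443)
```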

    The connection is local and serves only to kick off a process, so disabling SSL certificate checking isn't really a problem (there's no data to snoop on). But if doing so resolves the problem, it implies the server is using outdated certificates, and that should be fixed. I've seen this happen on machines that haven't been updated in a long while.
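To see whether certificate verification specifically is what breaks the self-connection, one can compare a verified and an unverified TLS handshake against the same host and port. This is a minimal Python sketch using the standard ssl module; it does not use CA's code, and the function name is hypothetical. The trust store Python uses may not be identical to PHP's, so treat the result as an indication rather than proof.

```python
import socket
import ssl


def check_tls(host, port=443, verify=True, timeout=5.0):
    """Attempt a TLS handshake with host:port and classify the outcome.

    Returns "ok", "cert-error" (the certificate failed verification) or
    "unreachable" (DNS, connection or other TLS failure).
    verify=False turns certificate checks off, similar in spirit to
    out_of_process_search_indexing_dont_verify_ssl_cert = 1.
    """
    context = ssl.create_default_context()
    if not verify:
        context.check_hostname = False  # must be disabled before CERT_NONE
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError:
        return "cert-error"
    except OSError:  # SSLError and socket errors are OSError subclasses
        return "unreachable"


if __name__ == "__main__":
    import sys

    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print("verified:  ", check_tls(host))
    print("unverified:", check_tls(host, verify=False))
```

If the verified check reports "cert-error" while the unverified one reports "ok", the certificate is the likely culprit, which is the case where updating the server's certificates is the real fix.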

    If the hostname the system uses externally doesn't resolve locally, either change things so it does, or set a locally resolvable hostname in out_of_process_search_indexing_hostname. You can check whether a hostname resolves using the nslookup or dig commands. Adding the hostname to the server's /etc/hosts file is a quick way to get it resolving locally, although some may consider it bad form.
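One caveat when combining those two tips: nslookup and dig query DNS servers directly and do not read /etc/hosts, so they may still report a failure after a hosts entry has been added, even though the name now resolves for applications (getent hosts <name> goes through the system resolver instead). The Python sketch below performs both checks; it assumes Python 3 on the server, and the function names are hypothetical.

```python
import socket


def resolves_locally(hostname):
    """Return the sorted IP addresses the system resolver gives for hostname.

    socket.getaddrinfo goes through the system resolver, so /etc/hosts
    entries are honoured (subject to nsswitch.conf). An empty list means
    the name did not resolve.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def hosts_file_lookup(hosts_text, hostname):
    """Return the addresses mapped to hostname in /etc/hosts-style text."""
    addresses = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # First field is the address; the rest are the hostname and aliases.
        if hostname.lower() in (name.lower() for name in fields[1:]):
            addresses.append(fields[0])
    return addresses


if __name__ == "__main__":
    import sys

    name = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print("system resolver:", resolves_locally(name) or "does not resolve")
    try:
        with open("/etc/hosts", encoding="utf-8") as fh:
            print("/etc/hosts:", hosts_file_lookup(fh.read(), name) or "no entry")
    except FileNotFoundError:
        print("/etc/hosts not found")
```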

  • Roger that!

    I've now set "disable_out_of_process_search_indexing=1", and initial tests show it's behaving correctly again. Thanks!

    Thanks also for the insights; we'll check the machine's configuration for any "quirks" that may cause the default configuration to misbehave.

  • Given that it worked for years before exhibiting this behavior, I'd guess that there has been a networking change. Have a good weekend.
