Providence search indexing performance does not improve and does not seem to scale

edited April 2020 in General Support

Hello,
we have two VMs (one for the app, one for MySQL):
CentOS 7, PHP 7.2, Apache 2.4, MySQL 5.7

The app server now has 4 CPUs and 16 GB of RAM.
The DB server has 4 CPUs and 32 GB of RAM (it had 8 GB before the test).

We see an indexing time of about 8-9 days with the default search_indexing.conf and the following content:

606,283 objects, 0 object lots, 54,910 entities, 8,268 places, 3,032 occurrences, 0 collections, and 130,346 object representations.

We also have a local/conf/search_indexing.conf, but for this test we used the default configuration to isolate the impact of the individual parameters.

We set the following MySQL variables:
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 4294967296
innodb_log_file_size = 1073741824
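
For reference, 4 GB is a small buffer pool for a dedicated 32 GB database server; a common rule of thumb is around 70% of RAM. A possible adjusted my.cnf section (the sizes are assumptions for this VM, not verified recommendations):

[mysqld]
# Assumes a dedicated 32 GB DB VM; leave headroom for the OS and connections
innodb_buffer_pool_size = 22G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 1G
# Fewer fsyncs during bulk reindexing; risks losing up to ~1s of transactions
# on a crash, so revert to 1 for normal operation
innodb_flush_log_at_trx_commit = 2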

We found that /app/lib/Search/SearchIndexer.php contains this block of code:
if (!($vn_i % 500)) { // Preload attribute values for the next 500 items to be indexed; improves index performance
    $va_id_slice = array_slice($va_ids, $vn_i, 500);
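
Raising the batch size here is one of the tweaks we tried; a sketch of that change (the value 2000 is our experiment, not an upstream default):

if (!($vn_i % 2000)) { // Preload attribute values for the next 2000 items; trades memory for fewer round trips to MySQL
    $va_id_slice = array_slice($va_ids, $vn_i, 2000);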

We also found that /app/lib/Utils/CLIUtils.php contains this block for reindexing:
ini_set('memory_limit', '4000m');
set_time_limit(24 * 60 * 60 * 7); /* maximum indexing time: 7 days :-) */
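
For context, we launch the reindex with the stock CLI utility (the path assumes a standard Providence checkout). Note that because the memory limit is hard-coded via ini_set() above, raising it means editing that line; a php -d memory_limit=... override on the command line is replaced at runtime:

support/bin/caUtils rebuild-search-index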

Even after tuning these parameters and code blocks, and after increasing the RAM, the process does not seem to scale or reduce the indexing time.

We are using SQLSearch; we are currently unable to switch to ElasticSearch.
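
For reference, the engine is selected with the search_engine_plugin directive in app/conf/search.conf (or the local/conf override); in our install it reads:

search_engine_plugin = SqlSearch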

Can you help us?

Comments

  • It shouldn't take that long, although I can't really say why you're seeing these times based upon these settings. The stock answer to this is to use Elastic, which I know is not the answer you want.

  • We have tried connecting to ElasticSearch in the past, but saw no significant improvement between the two indexing modes (on machines with the same CPU and RAM), and it introduced other problems.
    The hierarchy of our data is also a critical factor, because we have many nested levels (more than twenty in some cases).
    With a custom search_indexing.conf we reduced the time to 45-50 hours, but now we need to index all of the metadata, not just some fields.
    What else could we try, other than ElasticSearch?

  • You would have to look at the specifics of your setup and find the bottleneck. I can't really say any more than that based on the current information.

  • It seems the reindexing process launches just a single thread.
    As it turns out, one CPU on the DB VM sits at 100% usage while the others stay idle. Is there a way to make the reindexing process multithreaded?
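    One approach we are considering, assuming our caUtils build supports per-table reindexing (still to be verified with caUtils help rebuild-search-index), is to shard the reindex by table and run one process per core; we have not yet confirmed that concurrent runs are safe with SQLSearch:

    support/bin/caUtils rebuild-search-index --tables=ca_objects &
    support/bin/caUtils rebuild-search-index --tables=ca_entities &
    support/bin/caUtils rebuild-search-index --tables=ca_object_representations &
    wait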
