Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

In this Discussion

Welcome to the CollectiveAccess support forum! Here the developers and community answer questions related to use of the software. Please include the following information in every new issue posted here:

  1. Version of the software that is used, along with browser and version

  2. If the issue pertains to Providence, Pawtucket or both

  3. What steps you’ve taken to try to resolve the issue

  4. Screenshots demonstrating the issue

  5. The relevant sections of your installation profile or configuration including the codes and settings defined for your local elements.


If your question pertains to data import or export, please also include:

  1. Data sample

  2. Your mapping


Answers may be delayed for posts that do not include sufficient information.

Providence SearchIndexing performance does not improve and does not seem to be scalable

edited April 7 in Troubleshooting

Hello,
we have 2 VMs (one for the app, one for mySQL).
Centos 7, PHP 7.2, Apache 2.4, mySQL 5.7

The server app now has 4 CPUs, 16 GB of RAM.
The db server has 4 CPUs, 32 RAM (before the test was 8 GB of RAM).

We have an indexing time of about 8/9 days with the default search_indexing.conf and these contents:

606283 objects, 0 lots of objects, 54910 entities, 8268 positions, 3032 occurrences, 0 collections and 130346 representations of objects.

We also have a local/conf/search_indexing.conf, but for this test we used the default to check the impacts related to the different parameters.

We set the mysql variables:
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 4294967296
innodb_log_file_size = 1073741824

We have found that in /app/lib/Search/SearchIndexer.php there is this block of code:
if (! ($ vn_i% 500)) {// Preload attribute values ​​for the next 500 articles to be indexed; improves index performance
$ va_id_slice = array_slice ($ va_ids, $ vn_i, 500);

We also found that in /app/lib/Utils/CLIUtils.php there is, for reindexing, this block:
ini_set ('memory_limit', '4000m');
set_time_limit (24 * 60 * 60 * 7); / * maximum indexing time: 7 days :-) * /

Even by improving these parameters and blocks, also by increasing the RAM, the process does not seem to resize or reduce the indexing time.

We are using SQLSearch, we are currently unable to switch to ElasticSearch.

You can help us?

Comments

  • It shouldn't take that long, although I can't really say why you're seeing these times based upon these settings. The stock answer to this is to use Elastic, which I know is not the answer you want.

  • edited April 7

    We have tried in the past to connect it to ElasticSearch, but without obtaining significant improvements between the two indexing modes (with same CPU and RAM machines) and with other problems.
    The hierarchy of our datas also is a critical factor, because we have a lot of nested levels (more then twenty in some cases).
    With a custom search_indexing.conf we redured to 45/50 hours, but now we have the necessity to index all the metadata, not just some fields.
    What else could we try? Excluding ElasticSearch?

  • You would have to look at the specifics of your set up and find the bottleneck. I can't really say any more than that based on current information.

  • It seems the reindexing process launch just a single threat.
    As a turns out, the db vm has 1 CPU with 100% of usage but doesn't use other CPUs. There is a way to allow the process of reindexing to be multithread?

  • Not currently.

Sign In or Register to comment.