Creating a set for a large number of records

edited March 17 in Troubleshooting
Because I accidentally used "overwrite_on" instead of "merge_on" during import, a few thousand occurrence records were duplicated. I need to delete the duplicates, so I am trying to build a set in the UI, but it is taking forever and may time out. Can this be done faster on the backend, and if so, how? If not, is there a way to find duplicate (identical) records and delete them?

Thank you.

Comments

  • I'm not sure I fully understood your situation, but here are a few ideas.

    Were all your imports done on the same day? If not, you could search on the creation/modification date and create a set from all the results.
    If any constant value differs from your original import, you could also filter on that criterion to find all the duplicates and then create a set.

    Once your set is created, you can delete all its records at once. If you are worried about a timeout, you could use the task queue: set it to 1 in setup.php and add a cron job to process it, something like */15 * * * * /var/www/html/CA/support/utils/processTaskQueue.php, which runs the task queue every 15 minutes.

    I hope it helps.

    John

  • There is a deduplication utility available through caUtils on the backend. It is documented here: http://docs.collectiveaccess.org/wiki/Deduplication.

    By default it only checks for duplicate records, so you can review the results before actually deleting anything. Even so, I would recommend backing up your database before running it, just to be safe!
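
For anyone who wants to see the general idea behind "find identical records, review, then delete", here is a minimal Python sketch. It is not the caUtils implementation and does not touch a CollectiveAccess database. It assumes the records have already been exported as plain dictionaries, and the field names (`id`, `idno`, `name`) and the helper `find_duplicates` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_duplicates(records, id_field="id"):
    """Group records whose fields (other than the primary key) are identical.

    Returns a list of (kept_id, [duplicate_ids]) tuples, one per group
    that contains more than one record. Nothing is deleted here; the
    result is meant to be reviewed first.
    """
    groups = defaultdict(list)
    for rec in records:
        # Fingerprint: every field except the primary key, in a stable order.
        # Assumes all field values are hashable (e.g. strings or numbers).
        fingerprint = tuple(
            sorted((k, v) for k, v in rec.items() if k != id_field)
        )
        groups[fingerprint].append(rec[id_field])
    # The first record seen in each group is treated as the one to keep.
    return [(ids[0], ids[1:]) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data standing in for an export of occurrence records.
records = [
    {"id": 1, "idno": "OCC.001", "name": "Opening night"},
    {"id": 2, "idno": "OCC.002", "name": "Gallery talk"},
    {"id": 3, "idno": "OCC.001", "name": "Opening night"},
]

print(find_duplicates(records))  # [(1, [3])]
```

The list of duplicate ids produced this way could then be reviewed by hand before anything is removed, which mirrors the check-first behaviour described above.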