Importer has no errors or warnings, yet fails to import everything

I thought the last of our imports had run fairly successfully (apart from CollectiveAccess's continuing refusal to import dimensions, no matter what I try). Then we started importing our photos, and a number of them failed because there was no object to attach them to.

We went back to the spreadsheets we imported from, and the objects in question are in them. Further testing showed that two spreadsheets had 100% of their records imported. Others had only 80%, and some had much less: in one case more than 70% of the data failed to import, with no errors, warnings, or informational messages. The logs looked fine, although with more than 9,000 records in some of these spreadsheets, we didn't go through them line by line to see what the importer was doing.

Running the importers again picks up some of the missing records: each rerun recovers anywhere from a tiny fraction of them to most.
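Rather than reading 9,000-line logs, one way to see exactly which records are missing after each run is to compare each spreadsheet against the database. Below is a minimal Python sketch, assuming the spreadsheet is exported as CSV with an identifier column (the name `idno` is a hypothetical placeholder) and that the identifiers currently in the ca_objects table have been dumped separately (for example with a query along the lines of `SELECT idno FROM ca_objects WHERE deleted = 0`; those column names are an assumption about the schema).

```python
import csv
import io


def read_spreadsheet_ids(csv_text, column="idno"):
    """Return the non-empty identifiers from one spreadsheet exported as CSV.

    The column name "idno" is a hypothetical placeholder; use whatever
    column your mapping reads the object identifier from.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            ids.append(value)
    return ids


def import_report(sheet_ids, db_ids):
    """Compare spreadsheet identifiers with identifiers found in ca_objects.

    Returns (missing identifiers in spreadsheet order, percent imported).
    db_ids is assumed to come from a separate dump of the ca_objects table.
    """
    if not sheet_ids:
        return [], 0.0
    found_in_db = set(db_ids)
    missing = [i for i in sheet_ids if i not in found_in_db]
    rate = 100.0 * (len(sheet_ids) - len(missing)) / len(sheet_ids)
    return missing, rate


# Usage with made-up identifiers:
sheet = "idno,label\nA-1,Bone fragment\nA-2,Shell\nA-3,Bone fragment\n"
missing, rate = import_report(read_spreadsheet_ids(sheet), {"A-1", "A-2"})
print(missing, round(rate, 1))  # ['A-3'] 66.7
```

Running this after every import gives a per-spreadsheet list of missing identifiers, which makes it easier to tell whether a rerun actually recovered records or just renumbered existing ones.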

I'm attaching two of our mappings. Archaeo imported 100% of its records, while Natural History imported less than 30%. Does anyone have a moment to look these over and tell me whether something in the mappings is causing the problem?

And if it's not the mappings, what is it?

Comments

  • Further testing suggests that the importer may have processed these records but is not writing them to the ca_objects table, or that ElasticSearch is somehow failing to index them (even after reindexing and/or restarting ElasticSearch completely). It's also doing something strange with object identifiers.

    Take object ID 8263 as an example. (It's a weird number. Lots of our numbers are, and admittedly the weird identifiers seem more likely to have problems, but some of them imported just fine.)

    When it first imported this object, it gave it the identifier 9383. Later that day, during the same import, it changed it to 8263, then to 8740, and then to 8755. A subsequent import, run while we were trying to hunt down the missing items, seems to have repeated the process. The identifier CollectiveAccess assigned, 38544, stays the same throughout. The other numbers, 9383 and 8740, do not appear anywhere in this object's record.

    The result is that sometimes we can find things and sometimes we can't. So we import them again; sometimes that solves it (by changing the number to one we'll find), and sometimes it doesn't. We're digging into the non-CA server logs to see whether anything there explains what's going on. The script we use to check our imports looks at the ca_objects table.

  • Update: it is completely ignoring the existing record policy. Whether the policy is set to none or to merge_on_idno_and_preferred_labels, it merges records whenever it finds the same object ID or the same preferred label. We're on 1.7.11, but I don't see any evidence that 1.7.12 fixes this.
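Merging on "same ID or same label" could also account for the shifting identifiers described above. The toy model below is not CollectiveAccess code; it is a hypothetical Python sketch of an importer that looks up an existing record either by idno AND preferred label (what the name merge_on_idno_and_preferred_labels suggests) or by idno OR preferred label (the behaviour observed). Under OR matching, rows that share a label collapse into one record whose internal id stays fixed while its idno is overwritten by each later row.

```python
def simulate_import(rows, match):
    """Toy model of an importer's existing-record lookup (not CollectiveAccess code).

    rows: list of (idno, label) tuples in import order.
    match: "and" merges only when idno and label both match a stored record;
           "or" merges when either one matches.
    Returns the stored records; each has a fixed internal "id" plus the
    idno/label last written to it.
    """
    if match not in ("and", "or"):
        raise ValueError("match must be 'and' or 'or'")
    records = []
    for idno, label in rows:
        target = None
        for rec in records:
            same_idno = rec["idno"] == idno
            same_label = rec["label"] == label
            hit = (same_idno and same_label) if match == "and" else (same_idno or same_label)
            if hit:
                target = rec
                break
        if target is None:
            # No existing record matched: create a new one with the next internal id.
            records.append({"id": len(records) + 1, "idno": idno, "label": label})
        else:
            # Merging overwrites the stored idno, so the record "moves" to a new number.
            target["idno"] = idno
            target["label"] = label
    return records


# Usage with made-up rows: three share the label "Bone fragment".
rows = [("A-1", "Bone fragment"), ("A-2", "Shell"),
        ("A-3", "Bone fragment"), ("A-4", "Bone fragment")]
for policy in ("and", "or"):
    stored = simulate_import(rows, policy)
    print(policy, len(stored), [r["idno"] for r in stored])
# and 4 ['A-1', 'A-2', 'A-3', 'A-4']
# or 2 ['A-4', 'A-2']
```

In the OR case, four spreadsheet rows produce only two records, and record 1 ends up under the last idno written to it, so a check against ca_objects finds A-4 but not A-1 or A-3.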
