Export all metadata from CollectiveAccess
I am very new to this and do not know much about CollectiveAccess. I apologize for being so new, but my boss needs an export of all the metadata in our CollectiveAccess system as a CSV so we can make bulk changes and re-upload them to the system. Can someone please advise on how I can do this and what I need to do to export all of the metadata to a CSV?
Thank you.
Comments
You can create a display with the fields you need. Keep in mind that exporting fields in related records and repeating fields in a CSV will require "flattening" of data. If you are exporting collection objects and want to include related entities, for example, you'll have to decide how you want to represent those entities in the object rows. If there's minimal data in the related/repeating data items then serializing them into a single column might work. Or you may consider exporting each of the various record types as a separate file with identifiers as references to link rows in one file to rows in another.
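To make the two flattening options concrete, here is a minimal Python sketch using only the standard csv module. The sample records, the field names (idno, title, entities) and the "; " separator are hypothetical; a real display export will have its own columns and delimiter settings. The single-column option only round-trips cleanly if the separator never appears inside a value.

```python
import csv
import io

# Hypothetical sample data: each object may have several related entities.
objects = [
    {"idno": "2019.001", "title": "Letter", "entities": ["Smith, Jane", "Doe, John"]},
    {"idno": "2019.002", "title": "Photograph", "entities": []},
]

def flatten_to_single_column(rows, sep="; "):
    """Option 1: serialize repeating values into one delimited cell per object row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["idno", "title", "entities"])
    for row in rows:
        writer.writerow([row["idno"], row["title"], sep.join(row["entities"])])
    return out.getvalue()

def split_into_linked_files(rows):
    """Option 2: one file per record type, linked by the object identifier."""
    objects_out, links_out = io.StringIO(), io.StringIO()
    obj_writer = csv.writer(objects_out, lineterminator="\n")
    link_writer = csv.writer(links_out, lineterminator="\n")
    obj_writer.writerow(["idno", "title"])
    link_writer.writerow(["object_idno", "entity"])
    for row in rows:
        obj_writer.writerow([row["idno"], row["title"]])
        # One link row per related entity; objects with none get no link rows.
        for entity in row["entities"]:
            link_writer.writerow([row["idno"], entity])
    return objects_out.getvalue(), links_out.getvalue()

if __name__ == "__main__":
    print(flatten_to_single_column(objects))
```

The first option keeps one row per object, which is convenient for editing in a spreadsheet; the second avoids delimiter collisions but means the bulk edits have to be re-joined on the identifier before re-import.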
Seth, thank you for your advice; I think this will work. How would I search all items in the CollectiveAccess database so I can export the metadata using the display I have now created?
Search on the wildcard: *
I thought I tried that but was getting an error. I will try again. I am now getting a new error when running the export, though, which I believe is because this is a large export of data. This is running on a VM with 6 dedicated cores and 24 GB of RAM. Should I increase the specs of the machine, or would I need to adjust config.php?
Exporting a ton of data in the web UI is going to hit timeouts. How much is there? If it's a ton, you can either disable timeouts and hope the connection between the browser and the server stays up for the duration, or write a script to dump the data.
Yes, it's a large amount of data: about 26,000 objects. How would I disable the timeouts? Do I need to comment something out?
You can try setting max_execution_time to 0 in php.ini (PHP treats 0 as "no limit") and see if that works. If you don't want to change it in php.ini, you can also put this line in your setup.php:
set_time_limit(0);
If you're still having problems, it might be time to write a script or write an export mapping to be run on the command line. Which one you'd choose depends on the form you ultimately need the data in.
Okay, thank you. I will try changing the execution time, since I wouldn't even know how to create a script to export the data.
It looks like it was already set, but for some reason it is still timing out at 7200 seconds.
Then you might want to set it explicitly in setup.php.
Did you ever solve this? I normally do smaller exports using displays, so I assumed that would be the easiest approach for me. I spent a lot of time building a display with lots of extra formatting for a bigger export, but now I'm realizing that it doesn't really work with some 30,000 objects. I'm getting timeout errors: "PHP Fatal error: Maximum execution time of -1"? I'm guessing I'll have to redo all the work with the proper export tool, but if you solved it somehow, it would be great to hear about it!
Okay, I found a workaround. I realize I may be writing for no one here, but since I often find abandoned threads on issues I'm looking for answers to, I figured I'd share my solution. Maybe someone will find themselves in the same situation someday.
My system timed out at 7200 seconds, just as dlomet described, even though I had set the time limit to unlimited. So I figured I needed to split up the export somehow, and I did it in parts of 10,000 objects each. My way of doing this was to create sets and then use them as the basis for new searches that returned fewer results, so that no single export would time out.
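The split itself is simple arithmetic: with N objects and a batch size of 10,000 you need ceil(N / 10,000) export runs. A minimal Python sketch, assuming you already have an ordered list of object identifiers (how you obtain it is outside this sketch); the function names are hypothetical, and nothing here talks to CollectiveAccess, since in the workaround the sets were created by hand in the UI.

```python
import math

def num_batches(total, batch_size=10_000):
    """Number of export runs needed so that no single run exceeds batch_size objects."""
    return math.ceil(total / batch_size)

def batch_ranges(total, batch_size=10_000):
    """1-based (first, last) positions covered by each export run."""
    return [
        (start + 1, min(start + batch_size, total))
        for start in range(0, total, batch_size)
    ]

def batches(ids, batch_size=10_000):
    """Yield consecutive slices of an identifier list, one slice per export run."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

if __name__ == "__main__":
    print(num_batches(26_000))   # 3
    print(batch_ranges(26_000))  # [(1, 10000), (10001, 20000), (20001, 26000)]
```

Smaller batches mean more manual steps but a larger margin under the timeout; if a run still gets close to the 7200-second limit, lowering batch_size is the first thing to try.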
First, I made a super simple display with just one column and ran a search with it (* in my case, since I wanted all objects exported). I did this because my export display was too complex, and the search results page couldn't show more than a couple of hundred objects before it hung. I added the option of viewing 10,000 search results per page in app.conf. With the simple display chosen and 10,000 objects per results page, I used the set tool ("create set from checked") to create a set for each page. I then ran an advanced object search for the set codes with the export display, but now with only 12 objects per page to reduce loading time, and used the export tool to get the tab-separated file I wanted. This got me 4 files of 10,000 objects each, which I could merge and edit further in Excel.
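The final merge step can also be scripted rather than done by hand. A minimal Python sketch, assuming each part is a UTF-8 tab-separated file with the same header row (the file names are hypothetical): it writes the header once and refuses to merge a file whose header differs, so columns can't silently shift.

```python
import csv
import tempfile
from pathlib import Path

def merge_tsv(paths, out_path):
    """Concatenate tab-separated exports that share a header row; return data rows written."""
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in paths:
            # utf-8-sig strips a byte-order mark if one is present (assumption: UTF-8 input).
            with open(path, newline="", encoding="utf-8-sig") as f:
                reader = csv.reader(f, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: header does not match the first file")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written

if __name__ == "__main__":
    # Small self-check with two hypothetical parts in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "part1.tsv").write_text("idno\ttitle\n1\tA\n2\tB\n", encoding="utf-8")
        (tmp / "part2.tsv").write_text("idno\ttitle\n3\tC\n", encoding="utf-8")
        print(merge_tsv([tmp / "part1.tsv", tmp / "part2.tsv"], tmp / "all.tsv"), "rows merged")
```

For example, merge_tsv(["part1.tsv", "part2.tsv", "part3.tsv", "part4.tsv"], "all_objects.tsv") would produce one file with a single header row, ready to open in Excel.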
That's a clever way to do it, but it does sound like a bit of a hassle – sorry. I guess one way to make this easier would be to add a command line option for running an export of search results. Generally, there's no time limit on command line processes.
Yes! Definitely a hassle :) But still it was faster than doing a new export mapping. At least for me in this case. But of course it would be better to do it the proper way and then run it from caUtils.
Would you mind creating an issue for this on GitHub? https://github.com/collectiveaccess/providence/issues
Sure!