Where are media files actually stored?

I have a CollectiveAccess (CA) system up and running successfully, with content managers actively adding to it. I have another (Node.js) site which dips into the same database, presenting a read-only view of some of the data. I'm trying to figure out how to pull images out of CA's MySQL database. I figured images would be stored in the ca_object_representations table. I see the media and media_metadata BLOB fields in that table, but it doesn't appear that the images are actually stored in the BLOBs: the byte lengths don't match. I see files on disk in /media/collectiveaccess/images/0, so maybe the images aren't stored as BLOBs after all. But if so, what's the mapping from database field to filesystem path? That is, given a record from ca_object_representations, how do I either do a SQL select to pull the binary data out of MySQL, or map some field to a filesystem path? I did a test upload of an image, and now see the new file in /media/collectiveaccess/images/0 named 20897_ca_object_representations_media_1_original.png. But where does the 20897 prefix come from?

Thanks and sorry if the above is riddled with dumb questions.  Feeling pretty lost at the moment :-\

Comments

  • Media is stored under the media/ directory. You can reconstruct the file path using the serialized data stored in the "media" field of the ca_object_representations table. The numeric filename prefix is a random number prepended to the file name to prevent automated scraping of files by URL guessing. It is present in the serialized metadata.


    seth
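To make the path reconstruction described above concrete, here is a minimal sketch. It assumes the "media" field has already been decoded into a plain object, and that each version entry ("original" and so on) carries HASH (sub-directory), MAGIC (the numeric prefix) and FILENAME keys. Those key names, the helper name mediaPathForVersion, and the volume root path are assumptions for illustration, not something confirmed in this thread, so check them against your own decoded data.

```javascript
const path = require('path');

// Hypothetical helper: builds the on-disk path for one media version.
// ASSUMPTION: each version entry has HASH, MAGIC and FILENAME keys;
// verify these names against the data decoded from your own records.
function mediaPathForVersion(mediaInfo, version, volumeRoot) {
  const entry = mediaInfo[version];
  if (!entry) {
    throw new Error(`No "${version}" version in media info`);
  }
  // <volume root>/<hash directory>/<magic>_<filename>
  return path.posix.join(
    volumeRoot,
    String(entry.HASH),
    `${entry.MAGIC}_${entry.FILENAME}`
  );
}

// Sample shaped after the file named in the question; key names are assumed.
const sample = {
  original: {
    HASH: '0',
    MAGIC: 20897,
    FILENAME: 'ca_object_representations_media_1_original.png',
  },
};

console.log(
  mediaPathForVersion(sample, 'original', '/media/collectiveaccess/images')
);
// prints /media/collectiveaccess/images/0/20897_ca_object_representations_media_1_original.png
```

The volume root is passed in rather than hard-coded because it depends on how the installation's media volumes are configured.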
  • Ah, thanks! I poked around and found that the serialized data in the media and media_metadata fields is either gzipped or base64 encoded. In my case, it's gzip compressed. For anyone else wanting to extract data from these fields with Node.js, it's pretty easy thanks to Node's built-in zlib module and an existing PHP unserialization library. It boils down to this:

    const zlib = require('zlib');
    const PHPUnserialize = require('php-unserialize');
    const media = ... // load this from the appropriate record in ca_object_representations
    zlib.unzip(media, (err, buffer) => {
      if (err) {
        // deal with error
      }
      else {
        console.log(JSON.stringify(PHPUnserialize.unserialize(buffer), null, 3));
      }
    });
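
Since the field can apparently be either compressed or base64 encoded, a small wrapper that tries both may be handy. The following is only a sketch under that assumption: decodeSerializedField is a hypothetical name, it uses nothing beyond Node's built-in zlib, and it returns the PHP-serialized text, which you would then hand to an unserializer as in the snippet above.

```javascript
const zlib = require('zlib');

// Hypothetical helper: returns the PHP-serialized text held in a media column.
// ASSUMPTION: the value is either gzip/zlib-compressed bytes or base64 text,
// as observed in this thread; other encodings are not handled.
function decodeSerializedField(raw) {
  const buf = Buffer.isBuffer(raw) ? raw : Buffer.from(raw, 'latin1');
  try {
    // unzipSync auto-detects gzip and zlib (deflate) headers.
    return zlib.unzipSync(buf).toString('utf8');
  } catch (err) {
    // Not compressed: fall back to treating the bytes as base64 text.
    return Buffer.from(buf.toString('latin1'), 'base64').toString('utf8');
  }
}

// Demo with a compressed, empty PHP array ("a:0:{}").
const demo = zlib.deflateSync(Buffer.from('a:0:{}'));
console.log(decodeSerializedField(demo)); // prints a:0:{}
```

The synchronous call keeps the sketch short; for many rows in a live service, the callback form of zlib.unzip shown above avoids blocking the event loop.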