Removing watermarks?

Would it be possible to remove (or change) watermarks from CA images once loaded? Removing WATERMARK items from "rule_original_image" in "app/conf/media_processing.conf" does not work for me...

This is an excerpt from my "app/conf/media_processing.conf" file, in case this can help...:

...
rule_original_image = {
WATERMARK = {
image = /var/www/collectiveaccess/app/watermarks/my_watermark_image.png, width = 72, height = 85, position = south_east, opacity = 0.4
},
...

Thanks for your help!

Comments

  • Since the watermark is applied to the derivative images themselves at upload time, removing the rule will only affect watermarks for images uploaded after the change. This is kind of the point of watermarks: they are actually part of the image and (hopefully) difficult to remove.

    The only way to change watermarks globally would be to reprocess all media in the database. Use the reprocessMedia.php script in support/utils to do this. If you decide to use the script, pay attention to file permissions. You will need to run it as the web server user or temporarily change the permissions of the existing media files; otherwise you will get access denied errors.
  • Thanks for your answer. It sounds absolutely rational.
    I just tried the "support/utils/reprocessMedia.php" script (run as the apache user), but I get a lot of warnings like this one:

    PHP Warning: exif_read_data(54748_ca_object_representations_media_344_original.jpg): Incorrect APP1 Exif Identifier Code in /var/www/collectiveaccess/app/lib/core/Plugins/Media/GD.php on line 342

    In the end, my "tile-pics" still have the watermark, and my "original" pics are no longer readable (I had to restore the db and the media tree).

    Any suggestions?
    Thanks again for your help!

    P.S.: If it can help, my collectiveaccess collection is online at "http://ks.koinesistemi.it/collectiveaccess", user: seth, pass: htes
    --
    Marcolino
  • The warning is most likely due to problems with the EXIF data in the files you are uploading. We should probably suppress that warning since there's nothing we can do about it and it only affects the metadata extracted from the file, not the processing of the file itself.

    When you say the "original" is not readable, what do you mean? Is the image not there, or is it corrupt? Can you send me one of these images to work with?
  • I mean that when I go to the default editor page for the object, I usually see a small image in the left panel under "more informations"; after reprocessMedia it is empty. Moreover, if I open the flash viewer and choose the "original" version, I see an empty image (white background); choosing the "tilepic" version, I see the image correctly (with watermark).
    You can see one of the images here:

    http://ks.koinesistemi.it/collectiveaccess/media/baron_gamba/images/3/98301_ca_object_representations_media_308_original.jpg

    The watermark is the "KS" on the bottom right of the image.
    I have since restored the database and the media tree.
  • What SVN revision of CA are you running? It looks like it's really old?
  • You are right. "http://ks.koinesistemi.it/collectiveaccess" is my current on-line version, which I do not dare to upgrade, since it is currently being populated... You can check "http://ks.koinesistemi.it/collectiveaccess2/", which is at revision 5547. The behaviour is the same.
  • Hi,

    We set up a test environment and tried to reproduce the issue you are reporting. Unfortunately we were not able to reproduce the problem. It all worked fine with the JPEG you reference in the URL above as test media. This is not a surprise... we use reprocessMedia all the time on very large image collections (> 20,000 images) and it has proven reliable, at least on the Linux and Mac boxes we run on.

    My only guess right now is that maybe you have permissions set wrong and some files aren't getting written because of that? Can you try running reprocessMedia as root? If it's permissions then running as root will fix that (and indicate that we're not doing proper error checking somewhere).

    Also, I noticed you have the watermark set in your "original" version. This is probably not what you want. By putting a watermark on the original you're modifying it, which has two implications:

    1. You will never be able to generate derivatives without the watermark configured at upload time since the watermark is burned into the original file as stored in CA at upload time. If you change watermarks later (or remove them altogether) and run reprocessMedia, it's going to use the now-watermarked original.

    2. Every time you run reprocessMedia you will burn in the watermark. If the original is a lossy format this means degradation of quality every time you run reprocessMedia. CA is smart enough to not open and re-save media versions that don't require processing. If you leave the "original" version with no processing rules configured it will be passed through untouched. But if you put rules in there, like the watermarking, then we have to process and re-save. With JPEG originals that means re-compression.

    The conclusion: you should probably put the watermarks on the derivatives and not the original.
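
    A rough sketch of what that could look like in "app/conf/media_processing.conf", reusing the WATERMARK settings from the excerpt in the question. The rule name "rule_large_image" is only an assumed example of a derivative rule (use whichever derivative rules your file actually defines), and the "#" comment lines are assumed to be accepted in this file:

    ```
    rule_original_image = {
        # no processing rules here, so the uploaded file is passed through untouched
    },
    rule_large_image = {
        # assumed derivative rule: keep its existing entries and add the watermark to them
        WATERMARK = {
            image = /var/www/collectiveaccess/app/watermarks/my_watermark_image.png, width = 72, height = 85, position = south_east, opacity = 0.4
        }
    },
    ```

    Images that are already stored keep whatever was burned into them at upload time; a change like this only helps for originals that went in clean.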
  • I see.
    This means:
    a) I will not be able to change the watermark, since it's buried in all my images... I had hoped an untouched copy was saved in the database...
    b) if the "original" image should not be watermarked, and since it is obviously available on the web interface (which is possibly public), what is the use of watermarking?

    Thanks for your help.
    My best regards.
  • I hope so too. Note this is not a bug. Someone *might* want to do this, but usually they wouldn't :-(

    You don't need to keep an original around - there's nothing wrong with watermarking it, throwing it away, or whatever. I know of users who have their installs discard the original after processing to save disk space. But if you don't keep an original in CA, then you can't regenerate derivatives at full quality and you can't, of course, download a full quality version from CA.

    If you want to have a watermarked "original" but also keep a true original, then the best thing to do is create another version ("watermarked_original"?) that has the watermark applied, but leave "original" as it is - with no processing rules defined.
  • I see.
    My concern is to prevent people from accessing the original, non-watermarked images, to avoid possible image theft.
    If I leave an original version in CA with no processing rules applied, users will be able to access it from the web (see http://ks.koinesistemi.it/collectiveaccess/media/baron_gamba/images/3/89338_ca_object_representations_media_306_original.jpg): not a noob, perhaps, but with some trial and error work... :-)
    Of course, I suppose the person who uploaded the images to my CA installation still has the originals... Would it be possible for me to ask for those images and retry "reprocessMedia"? I suppose not, since there is no correlation between the images' original filenames and the CA filenames... Though, just thinking, I could change the script to find each original image in the originals directory I would recover, by comparing the image *content* with each one of ".../originals-directory/*.jpg": of course image content will be unique, I hope... ;-)
    Do you think this approach could work?
    My best regards.
    --
    Marcolino
  • You can of course always go ahead and change the storage location of the "original" version via media_processing.conf and media_volumes.conf to a directory that is not accessible via the web (which should solve your problem). Moving existing things to a different location is a little bit harder to pull off but it's possible. You could also do some Apache config magic to deny requests for the original versions. However, as Seth stated above, you should not watermark the originals.
  • Each image stored in CA is prefixed with a random number between 0 and 99999 to make "sucking" of images more difficult. Note this doesn't make it impossible - just more work. The intention is to make the effort to automatically download an entire collection prohibitive, not necessarily an individual image.

    If you are really concerned about someone sucking down originals, you can always write a file-matching regex in Apache that prohibits access altogether, or limits access to a few trusted IP addresses. This is not difficult to do.

    In terms of figuring out which originals go with which records, you can't easily compare content (say, with md5 hashes) since the images in your database now differ from the originals. What you can do is compare the "original" file name (the name of the file when it was imported) with the files you have from your client. If each filename is unique (a pretty good bet they are) and they haven't changed since import then you're in business.
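
    As a rough illustration of the Apache approach mentioned above, a rule along these lines could restrict the "original" versions. It is only a sketch: it assumes Apache 2.4 syntax and that the media files keep the "_original" naming seen in the URLs in this thread, and the IP address is a placeholder:

    ```apache
    # Sketch only (Apache 2.4 syntax); the address below is a placeholder
    <FilesMatch "_ca_object_representations_media_\d+_original\.[A-Za-z0-9]+$">
        # Only the listed address may fetch original versions; other clients are refused
        Require ip 192.0.2.10
    </FilesMatch>
    ```

    To prohibit access altogether, replace the "Require ip" line with "Require all denied". On Apache 2.2 the equivalent is "Order deny,allow" and "Deny from all", plus an "Allow from" line for each trusted address.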
  • Thanks for your answers.
    I suppose the apache regex should be a better solution than watermarking, to avoid image sucking: I simply didn't think about it :-(.
    About my current problem of recovering the originals: of course the filenames of the original collection did not change, and they are unique. But do you mean the original filename (which I don't know, because I was not the one who imported the images) can be matched to the CA image filenames?
  • Yes, CA should have the name of the original file on the user's hard drive when it was uploaded. Look at the representation list in the object editor on the back-end (Providence). The original name should be displayed next to the file type, size and other information.

    (Note that it is possible that this information will not be in the database. Not all browsers reliably relay this information, but all the modern ones seem to.)
  • Ok, thanks, I see the original file name (for example: "309 AC.tif") in the editor.
    But if this information is not in the database, where do I get it from in my script? Is it inside the image metadata?
  • Anybody? ;-)
  • If your script uses a database query then something like this:

    ```php
    <?php
    require_once("./setup.php");
    require_once(__CA_LIB_DIR__."/core/Db.php");

    $o_db = new Db();

    // Fetch every representation, ordered by id
    $qr_reps = $o_db->query("SELECT * FROM ca_object_representations ORDER BY representation_id");

    while($qr_reps->nextRow()) {
        // Media info recorded for this representation, including the uploaded file's name
        $va_info = $qr_reps->getMediaInfo('media');
        print "The original filename is ". $va_info['ORIGINAL_FILENAME']."\n";
    }
    ?>
    ```
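
    Once the original filenames are available, pairing them with the files recovered from the client could look roughly like the sketch below. The function match_originals() and its arguments are hypothetical names, not part of CollectiveAccess, and the exact-name comparison assumes the filenames are unique and unchanged since import, as discussed above:

    ```php
    <?php
    // Sketch only: match_originals() and its arguments are hypothetical names,
    // not part of CollectiveAccess.

    /**
     * Pair original filenames reported by CA with files in a local directory.
     *
     * @param array  $pa_names  ORIGINAL_FILENAME values collected from CA
     * @param string $ps_dir    directory holding the files from the client
     * @return array            'matched' => (name => path), 'missing' => (names)
     */
    function match_originals(array $pa_names, $ps_dir) {
        $va_entries = is_dir($ps_dir) ? scandir($ps_dir) : false;
        if ($va_entries === false) {
            throw new RuntimeException("Cannot read directory: $ps_dir");
        }

        // Index the regular files in the directory by their exact name
        $va_on_disk = array();
        foreach ($va_entries as $vs_entry) {
            $vs_path = $ps_dir . DIRECTORY_SEPARATOR . $vs_entry;
            if (is_file($vs_path)) {
                $va_on_disk[$vs_entry] = $vs_path;
            }
        }

        // Sort each CA filename into "matched" (with its path) or "missing"
        $va_result = array('matched' => array(), 'missing' => array());
        foreach ($pa_names as $vs_name) {
            if (isset($va_on_disk[$vs_name])) {
                $va_result['matched'][$vs_name] = $va_on_disk[$vs_name];
            } else {
                $va_result['missing'][] = $vs_name;
            }
        }
        return $va_result;
    }
    ```

    In the loop above you would collect $va_info['ORIGINAL_FILENAME'] into an array instead of printing it, then pass that array and the directory of recovered files to the function. Anything left in 'missing' needs to be checked by hand.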

  • O.k. perfect.
    So, it *is* in the database.
    I was confused by your sentence "Note that it is possible that this information will not be in the database": in which cases can this happen?
    Thanks again!
  • Some older browsers didn't reliably transmit original filename data.