Large media import

benben
edited November 9 in Troubleshooting

Is there a command-line option for media import, as there is for metadata, so that I can use tmux or something similar to ensure that my browser/client's connection doesn't affect the process?

When doing media import in-browser, what happens if my client/laptop loses its connection to the server? Is the client connection irrelevant once the process has started? If so, which logs on the server can I monitor to keep tabs on progress, and where are they?

Comments

  • Ok I found the import-media command for caUtils.

    I'm getting the following error: "You must specify a directory to import media from". There is no help info or documentation anywhere that I can find that explains the proper usage or syntax.

    Here's what I'm doing: $ sudo /var/www/html/support/bin/caUtils import-media [path here]

  • edited November 12

    ETA: I misread part of your question and realized it as soon as I hit send. I'm leaving the original response in case it helps others.

    Going back to your original question, have you enabled the background queue? Here's the link from the instructions in the installation guide.

    You will probably still need to be connected to the server in order to upload the files, as far as I know.

    One thought would be to upload the batches of images using FTP/SFTP, perform a media import via the GUI, and then have the server run the processing at a later time if the files are too large. Let me try this option out and let you know if it works.

  • benben
    edited November 12

    Yes, I've enabled the background queue.

    The media already lives on the server in a folder, awaiting import.

    I have tried three import tactics so far with limited success:
    1. Import via the browser from a folder on the server, not in the background – fails (silently, no error messages) after about 3 large videos – not ideal but at least it kind of works?
    2. Import via the browser with background processing enabled – seems better suited to my use case of a large volume of largish videos – I've enabled background processing, and it adds them to the queue, but when I run the "process queue" command, it finishes almost instantly without doing anything.
    3. Import via the CLI using the above commands – this strikes me as really the most ideal for my use case as I would assume the CLI tool has some kind of progress output, so I could leave this running in a tmux session on the server for as long as it needs, and check back in later – I can't get this to work at all. When I use the above syntax, it fails with the error message about needing to specify a directory but with no example of the correct syntax. caUtils doesn't appear to be documented at all. Any pointers, other than me sifting through the 1,000s of lines of code to figure it out?

  • @ben Have you tried running bin/caUtils import-media help? This is the output:

    Help for "import-media":
    
        Import media from a directory or directory tree.
    
    Options for import-media are:
    
        --source (-s)            Data to import. For files provide the path; for database, OAI and other
                                 non-file sources provide a URL.
    
        --username (-u)          User name of user to log import against.
    
        --log (-l)               Path to directory in which to log import details. If not set no logs will
                                 be recorded.
    
        --log-level (-d)         Logging threshold. Possible values are, in ascending order of important:
                                 DEBUG, INFO, NOTICE, WARN, ERR, CRIT, ALERT. Default is INFO.
    
        --add-to-set (-S)        Optional identifier of set to add all imported items to.
    
        --log-to-tmp-directory-as-fallback Use the system temporary directory for the import log if the application
                                 logging directory is not writable. Default report an error if the
                                 application log directory is not writeable.
    
        --include-subdirectories Process media in sub-directories. Default is false.
    
        --match-type             Sets how match between media and target record identifier is made. Valid
                                 values are: STARTS, ENDS, CONTAINS, EXACT. Default is EXACT.
    
        --match-mode             Determines how matches are made between media and records. Valid values are
                                 DIRECTORY_NAME, FILE_AND_DIRECTORY_NAMES, FILE_NAME. Set to DIRECTORY_NAME
                                 to match media directory names to target record identifiers; to
                                 FILE_AND_DIRECTORY_NAMES to match on both file and directory names; to
                                 FILE_NAME to match only on file names. Default is FILE_NAME.
    
        --import-mode            Determines if target records are created for media that do not match
                                 existing target records. Set to TRY_TO_MATCH to create new target records
                                 when no match is found. Set to ALWAYS_MATCH to only import media for
                                 existing records. Default is TRY_TO_MATCH.
    
        --allow-duplicate-media  Import media even if it already exists in CollectiveAccess. Default is
                                 false – skip import of duplicate media.
    
        --import-target          Table name of record to import media into. Should be a valid
                                 representation-taking table such as ca_objects, ca_entities,
                                 ca_occurrences, ca_places, etc. Default is ca_objects.
    
        --import-target-type (-t)Type to use for all newly created target records. Default is the first type
                                 in the target's type list.
    
        --import-target-idno (-i)Identifier to use for all newly created target records.
    
        --import-target-idno-mode (-m)Sets how identifiers of newly created target records are set. Valid values
                                 are AUTO, FILENAME, FILENAME_NO_EXT, DIRECTORY_AND_FILENAME. Set to AUTO to
                                 use an identifier calculated according to system numbering settings; set to
                                 FILENAME to use the file name as identifier; set to FILENAME_NO_EXT to use
                                 the file name stripped of extension as the identifier; use
                                 DIRECTORY_AND_FILENAME to set the identifer to the directory name and file
                                 name with extension. Default is AUTO.
    
        --import-target-access (-a)Set access for newly created target records. Possible values are 0 (not
                                 accessible to public), 1 (accessible to public), 2 (restricted public
                                 access). Default is 0 (not accessible to public).
    
        --import-target-status (-w)Set status for newly created target records. Possible values are 0 (new), 1
                                 (editing in progress), 2 (editing complete), 3 (review in progress), 4
                                 (completed). Default is 0 (new).
    
        --representation-type (-T)Type to use for all newly created representations. Possible values are
                                 after_treatment (Image AT), analysis (Analysis), archical (Archival),
                                 before_treatment (Image BT), collection_item (Conservation),
                                 collection_management (Collection Management), diagram (Diagram),
                                 during_treatment (Image DT), non_treatment (Image Non-Treatment), other
                                 (Contextual), primary (Primary), publication (Publication). Default is .
    
        --representation-idno (-I)Identifier to use for all newly created representation records.
    
        --representation-idno-mode (-M)Sets how identifiers of newly created representations are set. Valid values
                                 are AUTO, FILENAME, FILENAME_NO_EXT, DIRECTORY_AND_FILENAME. Set to AUTO to
                                 use an identifier calculated according to system numbering settings; set to
                                 FILENAME to use the file name as identifier; set to FILENAME_NO_EXT to use
                                 the file name stripped of extension as the identifier; use
                                 DIRECTORY_AND_FILENAME to set the identifer to the directory name and file
                                 name with extension. Default is AUTO.
    
        --representation-access (-A)Set access for newly created representations. Possible values are 0 (not
                                 accessible to public), 1 (accessible to public), 2 (restricted public
                                 access). Default is 0 (not accessible to public).
    
        --representation-status (-W)Set status for newly created representations. Possible values are 0 (new),
                                 1 (editing in progress), 2 (editing complete), 3 (review in progress), 4
                                 (completed). Default is 0 (new).
    
        --remove-media-on-import (-R)Remove media from directory after it has been successfully imported.
                                 Default is false.
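The "You must specify a directory" error reported earlier in the thread is consistent with this help text: the directory is passed with `--source` rather than as a bare argument. A minimal sketch, assuming the caUtils path from the earlier post. `/path/to/media` is a placeholder, the `run` helper is hypothetical, and whether options beyond `--source` are required is not confirmed here. The command is only printed unless `RUN=1` is set.

```shell
# Path taken from the command posted earlier in this thread.
CA_UTILS="/var/www/html/support/bin/caUtils"
# Placeholder: replace with the server directory that holds the media.
MEDIA_DIR="/path/to/media"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# The directory goes in --source (-s); --username (-u) names the user
# the import is logged against.
run "$CA_UTILS" import-media --source "$MEDIA_DIR" --username ben
```

Printing first makes it easy to review the full command line before starting a long import.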
    
  • In general, all caUtils commands are documented by invoking the command followed by "help". You can get a list of all commands by running caUtils followed by "help".

    Note that import-media is basically a CLI version of the web UI for media imports. Just about every option in the web UI is available in the CLI version and operates similarly. If you're doing fuzzy-ish matching of file names against record identifiers, the same app.conf config used to define matching behaviors for the web UI is used for the CLI as well.

    Also, when you run things on the command line you should mind permissions. The media directories must be writeable by the web server. The user you're running caUtils as may not have enough privs on some systems. Running as the web server user via sudo or some other mechanism may be called for.

    I hope this helps.
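The two points above (help lookup and running as the web server user) can be sketched together. This assumes the web server runs as `www-data`, which is common on Debian and Ubuntu but differs on other systems. The `as_web_user` helper is hypothetical and only prints the command. Remove the leading `echo` to execute it.

```shell
# Path taken from the command posted earlier in this thread.
CA_UTILS="/var/www/html/support/bin/caUtils"
# Assumption: adjust to the account your web server actually runs as.
WEB_USER="www-data"

# Hypothetical helper: print a caUtils invocation that would run as the
# web server user, so files it creates stay writable by the web server.
as_web_user() {
  echo sudo -u "$WEB_USER" "$CA_UTILS" "$@"
}

as_web_user help                # list all caUtils commands
as_web_user import-media help   # show the options for one command
```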

  • That helps (pun intended)! I wasn't able to find that anywhere in the docs and had been trying -h.

  • Ok, gave it my first shot, and I'm getting the following error:

    CollectiveAccess 1.7.8 (158/RELEASE) Utilities
    (c) 2013-2019 Whirl-i-Gig
    
    PHP Fatal error:  Uncaught Error: Call to a member function get() on null in /var/www/html/app/lib/Utils/CLIUtils.php:4340
    Stack trace:
    #0 /var/www/html/support/bin/caUtils(167): CLIUtils::import_media(Object(Zend_Console_Getopt))
    #1 {main}
      thrown in /var/www/html/app/lib/Utils/CLIUtils.php on line 4340
    

    Here's my command:

    sudo /var/www/html/support/bin/caUtils import-media --source /mnt/075b500f-49ed-4d7c-b83d-6594a9c1be82/import/CLItest/ --username ben  --log ~/ --log-level DEBUG --add-to-set "batch_1" --include-subdirectories --match-mode DIRECTORY_NAME --import-mode ALWAYS_MATCH --import-target ca_objects --import-target-type "Video"
    
  • Hmm. This isn't an issue with current code. I'll check the release and try to reproduce.
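Returning to the original question about keeping a long import independent of the client connection: once the CLI import works, it could be wrapped in a detached tmux session roughly as follows. This is a sketch, not a tested recipe for this installation. The session name, the `run` helper, and the `/path/to/...` values are placeholders, and the command is only printed unless `RUN=1` is set.

```shell
# Hypothetical session name.
SESSION="media-import"
# Placeholder paths; the options are the ones listed in the help output above.
IMPORT_CMD="sudo /var/www/html/support/bin/caUtils import-media --source /path/to/media --username ben --log /path/to/logs"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Start a detached session; the import keeps running on the server
# even if the SSH or browser connection drops.
run tmux new-session -d -s "$SESSION" "$IMPORT_CMD"

# Later, from any login on the server:
#   tmux attach -t media-import   # reattach to watch the output
#   tmux ls                       # check whether the session still exists
```

By default the tmux session closes when the command exits, so setting `--log` is what preserves a record of the run. If sudo needs a password, the prompt appears inside the session, so attach once right after starting it.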
