WARC file format

Hi

Recently one of our archivists asked to store a page from the Internet, so I looked into the available options.

I eventually settled on WARC, the Web ARChive format used by archive.org and most libraries (see https://en.wikipedia.org/wiki/Web_ARChive).

I would like to hear about your experience with it. I will try to find a way to include WARC in CollectiveAccess, so I would also appreciate any feedback on that.

Regards.

Comments

  • WARC is the de facto standard for archiving web resources. Widely used open source software such as WebRecorder (https://webrecorder.io) can create WARC files, so it's a fairly easy format to work with.

    CA can currently accept WARC files as binary data, without any parsed metadata or web-based preview. This is workable, I guess, but definitely not great.

    We've been discussing integrating support for WARC parsing + preview using a WebRecorder-related project for self-hosted embedding of WARC files. See https://webrecorder.org/2019/11/06/self-hosted-archival-embeds.html for more information. We've met its developer, and integrating it all into CA appears very doable.

    We agree that WARC is the way to go and it would be great to have full support for it in CA. If you want to work on this, let's coordinate efforts. Feel free to contact me directly if you want to discuss further.

    seth
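
To make "parsed metadata" a bit more concrete, here is a minimal Python sketch of reading the record headers out of an uncompressed WARC file. It is not CollectiveAccess code and not how CA or WebRecorder do it; the function name `read_warc_records` and the `SAMPLE` record are made up for illustration. It relies only on the basic WARC record layout (a `WARC/x.y` version line, `Name: value` header lines, a blank line, then `Content-Length` bytes of content) and does not handle folded header lines or gzip-compressed `.warc.gz` files.

```python
import io


def read_warc_records(stream):
    """Yield (version, headers, payload) for each record in an
    uncompressed WARC byte stream. Hypothetical helper for illustration."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")

        # Named header fields run until the first blank line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The content block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield version, headers, payload


# A tiny hand-written record, used only to exercise the parser.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

if __name__ == "__main__":
    for version, headers, payload in read_warc_records(io.BytesIO(SAMPLE)):
        print(version, headers.get("WARC-Type"), headers.get("WARC-Target-URI"), len(payload))
```

Fields such as `WARC-Target-URI` and `WARC-Type` are the kind of values that could be mapped to metadata elements on import. For real-world files, an established WARC library would be a safer choice than a hand-rolled parser like this one.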
