Importer ca_entities: Split name string into surname, forename, etc.

I've read the Data Importer documentation, but I couldn't find how to do this (only how to join cells):

My source data has people's names in a single spreadsheet cell.
Like this, for example:

  • "Adorno, Theodor W."
  • "Adzhibegashvili, Aleksandr N."
  • "Agamben, Giorgio"
  • "Ahmed, Sara"

The format is consistent as "surname, forename [middle name]".
Can I have the importer split this somehow?
Or should I do this as a preprocessing step in the spreadsheet application instead?

Thank you very much in advance!

Comments

  • edited October 20

    Would the importer option "applyRegularExpressions" work for this?

    Its documentation sounds promising:

    match: a regular expression applied to source data values; replaceWith: if a match is found, it will be replaced with whatever is contained in "replaceWith".

  • I think it might be more efficient to split the column in your raw data. There are usually several fairly quick ways to do this sort of thing, depending on the spreadsheet application you are using.

  • @nobody
    I think you're probably right. The regex option sounded promising, but "\w" doesn't match umlauts or other diacritics, and then there are name particles like "van xxx", which would be misread as middle names...

    :smile:

    So pre-splitting it in the spreadsheet it is!
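
For anyone who would rather script the pre-splitting than do it by hand in the spreadsheet, here is a minimal sketch in Python. It assumes every value follows the "surname, forename [middle name]" pattern described in the question. The function name `split_name` and the keys `surname`, `forename` and `middlename` are made up for illustration and are not CollectiveAccess identifiers. It splits on the first comma rather than using "\w", so diacritics are not a problem, but it does not try to recognise particles like "van": anything after the first forename token ends up in the middle-name field.

```python
def split_name(full_name):
    """Split 'surname, forename [middle name]' into three parts.

    Illustrative helper only; the returned keys are hypothetical labels.
    """
    # Everything before the first comma is treated as the surname.
    surname, _, rest = full_name.partition(",")
    # The first whitespace-separated token after the comma is the forename;
    # whatever follows it is kept together as the middle name.
    tokens = rest.split(maxsplit=1)
    forename = tokens[0] if tokens else ""
    middlename = tokens[1].strip() if len(tokens) > 1 else ""
    return {
        "surname": surname.strip(),
        "forename": forename,
        "middlename": middlename,
    }


if __name__ == "__main__":
    # Sample values taken from the question above.
    for name in [
        "Adorno, Theodor W.",
        "Adzhibegashvili, Aleksandr N.",
        "Agamben, Giorgio",
        "Ahmed, Sara",
    ]:
        print(split_name(name))
```

The three resulting values can then be written to separate spreadsheet columns and mapped individually in the import mapping.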
