For Developers

Code

The MIT-licensed code is available on GitHub. The technologies involved include Apache Spark to group occurrence records by their raw recordedBy and identifiedBy entries and to import the results into MySQL; Neo4j to store similarity scores between similarly structured people names; Elasticsearch to search people names once they are parsed and cleaned; Redis to coordinate the processing queues; and Sinatra/Ruby for the application layer.
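The grouping step can be pictured with a small in-memory sketch in plain Ruby. This is not the project's Spark job: the record shape (`:id`, `:recordedBy`, `:identifiedBy` keys) and the helper `group_by_raw_agent` are hypothetical. The idea shown is that records sharing an identical raw agent string are collected together, so each distinct string only needs to be parsed once.

```ruby
# Hypothetical occurrence records; only the agent fields matter here.
OCCURRENCES = [
  { id: 1, recordedBy: "Lepschi BJ; Albrecht DE", identifiedBy: "Albrecht DE" },
  { id: 2, recordedBy: "Lepschi BJ; Albrecht DE", identifiedBy: nil },
  { id: 3, recordedBy: "Albrecht DE", identifiedBy: "Lepschi BJ" }
].freeze

# Hypothetical helper: map each distinct raw value of +field+ to the ids
# of the records that carry it. Missing and blank values are skipped.
def group_by_raw_agent(records, field)
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |record, groups|
    raw = record[field]
    next if raw.nil? || raw.strip.empty?

    groups[raw] << record[:id]
  end
end

recorded = group_by_raw_agent(OCCURRENCES, :recordedBy)
identified = group_by_raw_agent(OCCURRENCES, :identifiedBy)

# Only the distinct raw strings need to go to the name parser.
recorded.keys # => ["Lepschi BJ; Albrecht DE", "Albrecht DE"]
```

In the running system, Spark performs this kind of grouping over the full dataset before the results are imported into MySQL; the sketch only illustrates the shape of the output.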

Parse Names

Ruby gem

A stand-alone Ruby gem, dwc_agent, may be used to parse people names and to score given names for structural similarity. It also includes a command-line executable, dwcagent, that parses and cleans names and then produces JSON as output.

$ gem install dwc_agent
$ irb
> parsed = DwcAgent.parse "Lepschi BJ; Albrecht DE"
  => [#<Name family="BJ" given="Lepschi">, #<Name family="DE" given="Albrecht">]
> DwcAgent.clean parsed[0]
  => {:title=>nil, :appellation=>nil, :given=>"B.J.", :particle=>nil, :family=>"Lepschi", :suffix=>nil}
> DwcAgent.similarity_score "J.R.", "Jill R."
  => 2
> exit
$ dwcagent "Lepschi BJ; Albrecht DE"
[{"title":null,"appellation":null,"given":"B.J.","particle":null,"family":"Lepschi","suffix":null},{"title":null,"appellation":null,"given":"D.E.","particle":null,"family":"Albrecht","suffix":null}]
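Because dwcagent writes JSON, its output is easy to consume from other programs. Below is a minimal sketch using Ruby's standard json library, with the output of the command above pasted in as a string. The helper `display_name` and its choice of fields (given, particle, family, suffix, skipping title and appellation) are hypothetical and not part of dwc_agent.

```ruby
require "json"

# Hypothetical helper: join the non-nil name parts in display order,
# ignoring the title and appellation fields.
def display_name(name)
  %w[given particle family suffix].map { |key| name[key] }.compact.join(" ")
end

# Output of: dwcagent "Lepschi BJ; Albrecht DE"
output = '[{"title":null,"appellation":null,"given":"B.J.","particle":null,"family":"Lepschi","suffix":null},' \
         '{"title":null,"appellation":null,"given":"D.E.","particle":null,"family":"Albrecht","suffix":null}]'

names = JSON.parse(output) # an Array of Hashes with String keys

names.map { |name| display_name(name) } # => ["B.J. Lepschi", "D.E. Albrecht"]
```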