  1. The original data file was a four-column CSV. The second column contained a topic and scope statement, which I split into parts based on the pattern "{person 1} to {person 2}: scope note". Naturally this is not very systematic, as some entries in the data didn't fit the pattern.
  2. I could have spent some time looking for obvious spelling differences and tidying up the data, but I will leave that for people with greater knowledge of the Middle Ages.
  3. A rough-and-ready conversion to an RDF model captures the relationships between a letter, who wrote it, and to whom it was sent.
  4. arbor.js was used to build the visualisation of the graph. Any nodes with unknown or bad data are still visualised.
  5. The final step was to throw some prettiness at it with Bootstrap.
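The splitting and RDF steps (1 and 3) can be sketched roughly as follows. This is a minimal sketch in Python using only the standard library, not the code actually used here. It assumes the description sits in the second CSV column and follows the "{person 1} to {person 2}: scope note" pattern; entries that don't match keep their raw text as the scope note. The base URI `http://example.org/letters/` and the `sender`, `recipient` and `scopeNote` predicates are hypothetical placeholders, and the output is plain N-Triples.

```python
import csv
import io
import re
from urllib.parse import quote

# Assumed pattern: "{person 1} to {person 2}: scope note".
# The non-greedy groups stop at the first " to " and the first ":" after it.
LETTER_PATTERN = re.compile(r"^\s*(?P<sender>.+?) to (?P<recipient>.+?):\s*(?P<scope>.*)$")

# Hypothetical base URI and predicate names, for illustration only.
BASE = "http://example.org/letters/"

def split_description(text):
    """Return (sender, recipient, scope); unmatched entries keep the raw text as scope."""
    match = LETTER_PATTERN.match(text)
    if match is None:
        return None, None, text.strip()
    return (match.group("sender").strip(),
            match.group("recipient").strip(),
            match.group("scope").strip())

def person_uri(name):
    # Mint a URI from the name exactly as written, so spelling variants
    # still end up as separate nodes.
    return f"<{BASE}person/{quote(name, safe='')}>"

def literal(value):
    # Escape backslashes, double quotes and newlines for an N-Triples literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def rows_to_ntriples(csv_text):
    """Yield N-Triples lines linking each letter to its sender and recipient."""
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        if len(row) < 2:
            continue  # skip blank or truncated rows
        sender, recipient, scope = split_description(row[1])
        letter = f"<{BASE}letter/{row_number}>"
        yield f"{letter} <{BASE}scopeNote> {literal(scope)} ."
        if sender:
            yield f"{letter} <{BASE}sender> {person_uri(sender)} ."
        if recipient:
            yield f"{letter} <{BASE}recipient> {person_uri(recipient)} ."

if __name__ == "__main__":
    # Made-up sample rows; the other three columns are placeholders.
    sample = ('ref1,"Hubert de Burgh, Justicar to the king: request for money",c1,c2\n'
              'ref2,Petition concerning land,c1,c2\n')
    for line in rows_to_ntriples(sample):
        print(line)
```

A row that doesn't fit the pattern still produces a letter node with a scope note but no sender or recipient, which mirrors the "unknown or bad data" nodes that remain in the visualisation.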

What could be better?

  1. Where one person has written to (or received letters from) multiple people, only one set of letter details is shown when hovering over the node. This is because the details on the from (or to) node are overwritten by the last letter read from the data.
  2. If the names of the people writing the letters could be accurately matched to their Name Authority File records, it would give a better picture of the relationships. For example, "Hubert de Burgh, Justicar" is also spelt "Hubert de Burgh, Justicer"; my guess is these are one and the same.
  3. The full text of the letters would be useful for entity extraction, since the letters may mention other people (one type of entity to look for, at least). I suspect this would be difficult, though, because of the language used in the originals.
  4. At the moment the data is not in Kasabi. I plan to load it there and use the SOLR search indexes to provide facets that I could perhaps use to add more contextual navigation to the user interface.
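For the first two points, one possible approach is sketched below in Python, under stated assumptions. For point 1, each person keeps a list of all letters sent and received rather than a single value that the last letter overwrites, so the hover text could list them all. For point 2, this does not match against Name Authority File records. Instead it swaps in a much cruder technique: `difflib.SequenceMatcher` flags pairs of names with similar spellings for a human to review. The input shape (`(sender, recipient, scope)` tuples) and the 0.9 threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def letters_by_person(letters):
    """Collect every letter per person instead of keeping only the last one read.

    `letters` is an iterable of (sender, recipient, scope) tuples (assumed shape).
    """
    sent = defaultdict(list)
    received = defaultdict(list)
    for sender, recipient, scope in letters:
        # Append rather than assign, so earlier letters are not overwritten.
        sent[sender].append((recipient, scope))
        received[recipient].append((sender, scope))
    return sent, received

def candidate_duplicates(names, threshold=0.9):
    """Return (name_a, name_b, ratio) for name pairs at or above the assumed threshold.

    This only flags likely spelling variants for review; it is not a substitute
    for matching against authority records.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return pairs

if __name__ == "__main__":
    names = ["Hubert de Burgh, Justicar", "Hubert de Burgh, Justicer", "the king"]
    for a, b, ratio in candidate_duplicates(names):
        print(f"{a!r} ~ {b!r} ({ratio})")
```

The two Hubert de Burgh spellings differ by a single character, so they score well above the threshold, while unrelated names fall far below it. A threshold this high will still miss variants that differ by more than a letter or two, so any pairs it reports are candidates rather than confirmed matches.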