Posts Tagged: data

May 12

Working the internal data universe

When I started first writing code, I wrote in a typically haphazard way. (actually I still do this, but now have half an eye on well formed code style)

But what I definitely used to do was think about how to store persistent data that the application needed in a relational database backend. I wrote a CMS from scratch to host this website on (long since decommissioned), and a system for working out how much parents of nursery children owed based on the time their children had been in the nursery compared to contracted time (there were penalties for late collection). Both of these web apps required persistant data in the backend, and at the time my first recourse to a data store was a relational database.

Working with a relational database didn’t sit well with my style of fast iteration to get a problem solved, then tidy up afterwards, to make sure that the code was workable and readable.

Having to constantly add columns to a table as new peices of information were required to be stored, for either optimisation or to support some new feature, was a pain. It broke code on more than one occasion, and created hard to pin down bugs that were usually the result of using a select statement that called for named columns that were now either renamed or moved or gone.

I wasn’t agreeing my schema up front, because I didn’t know exactly what data I would need to solve the problems. I admit I could have used some coding strategies that would have abstracted the database layer from the application layer, but even though I tried to implement MVC, it seemed to just create more code and dependancies.

Contrast that to how I think about solving a code problem today.

Graph thinking; graph databases; the graph; are all things I didn’t know about when I started writing code. Now, when I am thinking of solving a problem I think about exactly two things.

  • What data does my application use?
  • What do I need to show to the user?

The answer to the first question is not a database schema, nor is it something that will break my code if changed. It is a description of the data my app will interact with, but described as a graph so that I can just add more description as appropriate.

The answer to the second question, is that each bit of code knows about a small part of the graph that it needs in order to function correctly. Each function will know exactly which bits of the graph it needs, and how to get it. If the shape of the graph changes it will either flag an error if something critical is missing, or simply ignore the changes and continue working as before.

I said earlier that I didn’t agree my scheme up front because I didn’t know what data I would need to solve my problems. I realise now that if I had simply described the things my application would be interested in, I would have had more than enough data to get going with. I would also have been able to add new data back into the graph thus persisting the solutions to the problems I was solving.

But I couldn’t do that with a relational database because of the key difference between using relational databases and graph databases: I am no longer forced into using the assumption that the edge of my relational database is the edge of the world, and that anything outside of my database does not exist.

There are challenges to this way of working too, but I find it a far more natural way of working. I even find that new opportunities for data display or analysis become apparent as the data morphs into shape.

In short: working with graph data is more like working with the data universe that is inside my head. And that view of the world is exactly that, a view of the real world.