Notes from the Cultural Heritage and the Semantic Web day

I promised a few people that I would take copious notes at the British Museum Semantic Web event last week. To be honest, I don’t really understand the semantic web. Shocking for a museums web geek! I know that Linked Data is a good thing, but I couldn’t really tell you why, or how on earth you go about doing it. So I went along in the hope that I would become better informed, or at least be able to do more than nod and smile when someone mentions semantic webness. The whole point of the event was to focus on projects that are already using semantic web technology. To quote from the programme: “By presenting a more practical insight into the use of the semantic web in the sector it is hoped that the current gap between the technologists and others who stand to benefit from the technology can be bridged.”

I don’t quite know if they managed that. It did seem like semantic web people talking to other semantic web people. But perhaps that is how it has to happen at first: the people in the know discuss and ponder before it can filter down successfully to the rest of us.

First up was Wendy Hall.  Wendy showed us lots of pictures of conferences she attended in the 80s and 90s…

But she then said some interesting things, such as “scruffy works”: to make it scale, you need to let links fail. You shouldn’t aim for your linked data to be perfect first time round; there has to be room for experiment and exploration. Another of her key points was that the network is everything, and that open and free standards are hugely important. Wendy also talked about 5 star data; the W3C has a handy mug explaining the 5 star system for good linked data.
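For anyone (like me) who needs the 5 star scheme spelled out: the stars are cumulative, so each one only counts if all the earlier ones are met. The little Python sketch below is just an illustration of that rule, assuming a made-up `Dataset` description with one yes/no flag per star; it is not an official W3C checker.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative only)."""
    on_web_open_licence: bool  # 1 star: on the web under an open licence
    machine_readable: bool     # 2 stars: structured, e.g. a spreadsheet, not a scanned image
    non_proprietary: bool      # 3 stars: open format, e.g. CSV rather than Excel
    uses_uris: bool            # 4 stars: URIs identify things (RDF, W3C standards)
    links_to_others: bool      # 5 stars: linked to other people's data for context


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        d.on_web_open_licence,
        d.machine_readable,
        d.non_proprietary,
        d.uses_uris,
        d.links_to_others,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A made-up example: an openly licensed CSV export of a collection, with no URIs yet.
csv_export = Dataset(True, True, True, False, False)
print(star_rating(csv_export))  # 3
```

So a tidy CSV download gets you three stars; the last two are where the "linked" part of Linked Data comes in.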

Two key questions came out of Wendy’s talk: how successful have cultural heritage practitioners been in developing and using semantic web tech? And where are the starting points?

Kenneth Hamma taught me the wonderful word bumbulum, and discussed Zen and the Art of the Internet, following up with a section on “the wrong containers”: the notion of ‘my collection’, silos of information, gatekeepers and information containers. These don’t really work in the physical world, so why have we as museums extended this reasoning and practice to the Web? The semantic web breaks this tradition; there has to be an act of ‘letting go’. He encouraged us to imagine what people will do with museum data if we let it go and allow all museum information to be joined up. This of course isn’t without its challenges: starting to be open and free with data is difficult when you have to take the jump and ‘let go’.

John Sheridan from The National Archives spoke brilliantly. I like the fact that TNA, which operates the UK Government Web Archive (the second most used web archive in the world), is beginning to be seen as a sort of semantic knowledge base. John spoke about developing standards for the responsible publishing of key types of data and a commitment to publishing in open standards; The National Archives has taken this opportunity to publish data in Linked Data form and make it available via its website. This in turn makes it easy for people to consume data programmatically, through Linked Data APIs that can deliver data in multiple formats as well as native Linked Data.
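To make the ‘multiple formats’ point a bit more concrete: a Linked Data API usually decides which serialisation to send back from the HTTP Accept header the client sends (content negotiation). Below is a minimal, hypothetical Python sketch of that decision. It is not TNA’s actual code, and the format names in `SUPPORTED` are my own assumptions.

```python
# Hypothetical mapping from media types to output formats (not TNA's real API).
SUPPORTED = {
    "text/turtle": "turtle",
    "application/rdf+xml": "rdfxml",
    "application/ld+json": "jsonld",
    "application/json": "json",
    "text/html": "html",
}


def choose_format(accept_header: str, default: str = "html") -> str:
    """Return the supported format with the highest q-value in an Accept header.

    Falls back to `default` when nothing acceptable matches; q=0 means "not wanted".
    """
    best_format, best_q = default, 0.0
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as unacceptable
        # Strict ">" means that on a tie the type listed first wins.
        if media_type in SUPPORTED and q > best_q:
            best_format, best_q = SUPPORTED[media_type], q
    return best_format


print(choose_format("text/html;q=0.5, application/ld+json"))  # jsonld
```

The same URI can then serve a web page to a browser and machine-readable RDF to a script, which is what makes programmatic consumption easy.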

John also spoke about having appropriate standards for different levels; one thing I really liked was the concept of ‘re-use where we can, create where we must’. He also demonstrated data cleansing with Google Refine, which I liked particularly because non-coder types like myself can publish RDF data by clicking a few buttons, without having to touch any complicated </>’s.
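Under the buttons, publishing a spreadsheet as RDF boils down to turning each row into statements of the form subject, property, value. The hand-rolled Python sketch below shows that idea by writing N-Triples from CSV. It is not what Google Refine does internally; the base URIs are hypothetical placeholders, the sample row is made up, and it assumes the `id` values and column names are already URI-safe.

```python
import csv
import io

# Hypothetical base URIs, for illustration only.
BASE = "http://example.org/record/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str = "id") -> list[str]:
    """Turn each CSV row into one triple per non-empty column (N-Triples lines).

    Assumes id values and column names are URI-safe (no spaces etc.).
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{VOCAB}{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data.
sample = "id,title,department\nrec-1,Annual report,Home Office\n"
for line in rows_to_ntriples(sample):
    print(line)
```

Each column header becomes a property and each row a subject, which is roughly the mapping you set up by pointing and clicking.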

Hugh Glaser from Seme4 started with the idea that time and location are very important, but that what matters more is knitting everything together so that it makes sense. He first mentioned the BBC’s dynamic semantic publishing of its World Cup coverage using RDF, then went on to talk about the classic data fusion problem, which exists at many museums and other organisations where separate silos exist within the organisation. The British Museum Collection Online (COL) is a prime example. The cataloguing data is in one database, the conservation data in another, the acquisition data somewhere else, and the science data in yet another. Using some very clever ontology work, all of that data is now tabbed at the bottom of catalogue entries. I found this fascinating. Why? Well, I’ve done some work on the information-seeking behaviour of users of the BM’s COL, and not one of them mentioned the linked data. Nor did I notice any mention of it on the COL itself. My worry is that not enough people understand how linked data works, myself included, and that nifty things like all the data about the Rosetta Stone from lots of different databases being linked together in one place are being overlooked and, possibly more importantly, not being shouted about.
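Here is a toy Python sketch of the data fusion idea as I understood it: once every silo can point at the same identifier for an object, pulling everything into one record with a ‘tab’ per source is easy. The records are made up, and the sketch assumes the hard part (mapping each database onto a shared URI, which is where the clever ontology work comes in) has already been done; it is not how the BM or Seme4 system is actually built.

```python
from collections import defaultdict

# Made-up records held in four separate silos. Assumption: each silo has already
# been mapped to a shared (hypothetical) URI for the same object.
OBJ = "http://example.org/object/1"

catalogue = {OBJ: {"title": "Inscribed stone", "material": "stone"}}
conservation = {OBJ: {"last_treatment": "surface clean"}}
acquisition = {OBJ: {"acquired": "donation"}}
science = {OBJ: {"analysis": "XRF"}}

SILOS = {
    "catalogue": catalogue,
    "conservation": conservation,
    "acquisition": acquisition,
    "science": science,
}


def fuse(silos: dict[str, dict[str, dict[str, str]]]) -> dict[str, dict[str, dict[str, str]]]:
    """Group every silo's facts under the shared object URI, one 'tab' per silo."""
    fused: dict[str, dict[str, dict[str, str]]] = defaultdict(dict)
    for silo_name, records in silos.items():
        for uri, facts in records.items():
            fused[uri][silo_name] = facts
    return dict(fused)


record = fuse(SILOS)[OBJ]
print(sorted(record))  # ['acquisition', 'catalogue', 'conservation', 'science']
```

The shared identifier is doing all the work here, which is why so much of the day was about URIs.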

Hugh went on to demonstrate the Resist Knowledge Base (RKB) and RKB Explorer, a knowledge-enabled infrastructure that displays the semantic relationships between individuals.

Hugh then argued that linked data brings ‘added value’ through more sophisticated services, and that an open system means you don’t have to do everything yourself. Added value to whom? And how, for example, do semantic relationships deal with provenance data, object biographies or the mapping of historical data?

Atanas Kiryakov had a brilliant analogy for the semantic web: it is like teenage sex. Lots of people talk about it, not that many actually do it, and for those that do, it is a less than satisfactory experience.

Linked Data is hard for people to comprehend, and its sheer diversity is problematic. The Linked Data Web is also unreliable: most of the servers are slow, because dealing with distributed data on the web is slow, and downtime is high.

I liked Atanas’ talk; it was straight to the point. Linked data is a good idea, and he does believe that it adds value to proprietary data through better description, whilst also making data more open. But in practice it isn’t well used, because there are no well-established views on what exactly linked data can ‘buy’ businesses. There is a need to facilitate better data integration, and to provide additional public information that can help with aligning and linking information up.

Jonathan Whitson Cloud started his talk with ‘it would be a shame to come to the BM and not talk about objects’, and then used a couple of lovely looking cuneiform tablets as examples, explaining that it all started with structured data…

Jonathan went on to discuss the conservation and scientific research documentation project. He noted that adding taxonomy afterwards is quite tricky, and showed that there are lots of concerns around sharing data, although at least there has been a lot of talking about it. Different types of people have different issues about sharing data and then linking it up:

  • The British Museum has its reputation to uphold, and likes being the first to do things.
  • Conservators are concerned about data quality, data protection, academic process and ownership, and personal as well as institutional reputation.
  • Scientists are concerned about academic process and ownership, data content, previous failed systems, effort vs reward, data quality, and personal and institutional reputation.
  • Documentation and IS are concerned about data quality and the hierarchies and thesauri used.

The list could go on. That is a lot to think about when producing a business case for sharing data. However, the process of moving towards sharing data acts as a catalyst for data cleaning and structuring, and BM collections data is to be structured and stored semantically by the end of February 2011. I did notice that with all this talk of people, not once was the ‘end user’ mentioned. It’s all well and good restructuring data for internal sharing and a more cohesive organisation, with nice linked data. But what does that mean for the web user who wants to find out about a specific object and its location in the museum? I have a big scribble in my notepad (laptop battery fail) simply saying WHAT ABOUT THE USERS?

Leif Isaksen gave an interesting talk about the past, present and future of the semantic web in cultural heritage. His argument was that it is not data integration technology that is changing society, nor the volume of data; what matters is a positive feedback loop of information exchange, where culture is the textual and material artefacts we choose to exchange information about. However, society has an overwhelming interest in popular culture, which is sidelining cultural heritage. Leif then went on to describe a semantic ecosystem:

  • Entity services (British Museum, London)
  • Ontology services (building, city, is located in)
  • Data services (British Museum is located in London)
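Leif’s three layers map quite neatly onto RDF-style triples, so here is a small Python sketch of how they might fit together. All the URIs under `example.org` are hypothetical stand-ins for real entity and ontology services; only the `rdf:type` URI is the real W3C one. Entities give things identifiers, the ontology supplies shared classes and relationships, and the data service is just statements that combine the two.

```python
# Hypothetical URIs standing in for the three kinds of service Leif described.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Entity service: stable identifiers for things.
entities = {
    "British Museum": EX + "entity/british-museum",
    "London": EX + "entity/london",
}

# Ontology service: shared classes and relationships.
classes = {"building": EX + "ontology/Building", "city": EX + "ontology/City"}
properties = {"is located in": EX + "ontology/isLocatedIn"}

# Data service: statements combining the two, as (subject, predicate, object) triples.
data = [
    (entities["British Museum"], RDF_TYPE, classes["building"]),
    (entities["London"], RDF_TYPE, classes["city"]),
    (entities["British Museum"], properties["is located in"], entities["London"]),
]


def objects_of(subject_uri: str, predicate_uri: str) -> list[str]:
    """Return every object linked to subject_uri by predicate_uri in the data."""
    return [o for s, p, o in data if s == subject_uri and p == predicate_uri]


print(objects_of(entities["British Museum"], properties["is located in"]))
# ['http://example.org/entity/london']
```

Because anyone can reuse the same entity and ontology URIs, someone else’s data service can add statements about the British Museum that join straight onto these.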

Dominic Oldman spoke about ResearchSpace, which aims to support scholarly research online. VRE (virtual research environment), anyone?

ResearchSpace aims to generate new knowledge through collaboration, by creating a research collaboration and digital publication environment that brings data, collaboration and research tools together in one space: blogging tools, forums and wikis alongside the imported RDF data.

So by the end of it all I was quite confused, and I still don’t know my SPARQL from my CIDOC, which sounds like a Saturday night involving sequins and hiccups to me. But I am glad I went, particularly as I am not alone in not understanding this whole semantic web thing. If more people like John Sheridan and Atanas Kiryakov could hold sessions explaining this pesky, much-talked-about but little-done linked-data-itis, it would go a long way towards solving some of the ‘buy-in’ issues and making people feel less dumbfounded by it all.