Some diagrams to stimulate further thought and discussion on what needs to be done architecturally to realise the power of public information.
Diagram 1 – the ‘traditional approach’
The emphasis of much web development to date has been on the presentation of the data to the public.
The assumption was that a particular website would be the unique interface to a particular set of data.
This meant that little or no thought might have been given to how anyone else would use the data set in question.
Sometimes the data and any analysis of it could be unpicked from such a site, but in many instances this would be extremely difficult.
Diagram 2 – a Power of Information architecture
If we want to realise the power of much public information then we need to start thinking differently about the way we treat data sets.
There is a need for an Access Layer to the data.
This must address all the issues that are necessary to enable use of the data. These typically include technical issues such as file formats, intellectual property issues such as copyright, and commercial issues such as pricing where applicable.
Once access to the data is enabled, multiple players can create their own analyses of it.
There is a need for a further Access Layer to the output of the analysis activity. This must again address any technical, intellectual property and commercial issues.
With the Access Layers in place there is scope for multiple web presentations of the data. Additional value can be generated through the ability to interact with a community around the data.
The power of the information is therefore fully realised only when all layers are in place and the architecture is designed to offer opportunities for interaction.
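The layering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (AccessLayer, PublishedData), the sample spending figures and the licence labels are all invented for the example and do not describe any real system. The point it tries to show is that each layer publishes its output together with its access terms, so that more than one analysis and more than one presentation can be built on the same data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class AccessLayer:
    """Terms of reuse (hypothetical structure, for illustration only)."""
    file_format: str    # technical issue, e.g. "csv" or "xml"
    licence: str        # intellectual property issue
    price: float = 0.0  # commercial issue, where applicable


@dataclass
class PublishedData:
    """A set of records together with the access layer that makes it reusable."""
    records: List[Record]
    access: AccessLayer


def analyse(source: PublishedData,
            fn: Callable[[List[Record]], List[Record]],
            output_access: AccessLayer) -> PublishedData:
    """Run an analysis and publish its output behind its own access layer."""
    return PublishedData(records=fn(source.records), access=output_access)


def add_share(records: List[Record]) -> List[Record]:
    """One possible analysis: each area's share of total spend."""
    total = sum(r["spend"] for r in records)
    return [{**r, "share": r["spend"] / total} for r in records]


def as_table(data: PublishedData) -> str:
    """One presentation of the analysed data."""
    return "\n".join(f"{r['area']}: {r['share']:.0%}" for r in data.records)


def as_headline(data: PublishedData) -> str:
    """A second, independent presentation of the same analysed data."""
    top = max(data.records, key=lambda r: r["share"])
    return f"{top['area']} has the largest share"


# Invented sample data; the licence names are placeholders.
raw = PublishedData(
    records=[{"area": "North", "spend": 120}, {"area": "South", "spend": 80}],
    access=AccessLayer(file_format="csv", licence="open licence (placeholder)"),
)
analysed = analyse(
    raw, add_share,
    AccessLayer(file_format="xml", licence="attribution licence (placeholder)"),
)

print(as_table(analysed))
print(as_headline(analysed))
```

Because the analysis and the presentations depend only on the published records and their access terms, anyone else could swap in a different analysis function or add a further presentation without touching the original data holder's site.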
Diagram 3 – Example – Hansard in the old world
It is helpful to illustrate this transition with the example of Hansard, the record of the UK Parliament’s proceedings.
www.parliament.uk has offered an integrated approach to presenting Hansard for a number of years. The Hansard data is wrapped up with Parliament’s own analysis output and presented to the public on an official website.
The Parliamentary site receives a lot of traffic and has evolved over the years to become more accessible.
But it did not offer all the functionality wanted by civic activists who have been working on alternative solutions to do more with Hansard and other Parliamentary content.
These alternative solutions have produced a more open architecture for this data.
Diagram 4 – Example – Hansard in the new world
An access layer has been created for Hansard with a screen scraper and a Click-Use licence, which address the technical and copyright issues respectively.
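To give a feel for what the screen-scraping half of such an access layer involves, here is a minimal sketch using only the Python standard library. The HTML in SAMPLE_PAGE is invented for the example, as are the class names "speech" and "speaker"; the real Hansard pages are marked up differently, and the scraper actually used by the projects mentioned here is not shown in this post.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for this sketch.
SAMPLE_PAGE = """
<div class="speech"><span class="speaker">A. Member</span><p>I beg to move.</p></div>
<div class="speech"><span class="speaker">B. Member</span><p>I second the motion.</p></div>
"""


class SpeechScraper(HTMLParser):
    """Turn presentation markup back into structured records."""

    def __init__(self):
        super().__init__()
        self.speeches = []
        self._field = None  # which field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "speech":
            # A new speech starts: open an empty record for it.
            self.speeches.append({"speaker": "", "text": ""})
        elif tag == "span" and attrs.get("class") == "speaker":
            self._field = "speaker"
        elif tag == "p":
            self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        # Ignore text outside the fields we care about (e.g. whitespace).
        if self._field and self.speeches:
            self.speeches[-1][self._field] += data.strip()


scraper = SpeechScraper()
scraper.feed(SAMPLE_PAGE)
speeches = scraper.speeches
print(speeches)
```

A scraper of this kind is tied to the page layout and breaks when the layout changes, which is one reason a purpose-built access layer is preferable where the data holder can provide one.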
The scraped data goes through an analysis process at publicwhip.org.uk.
Access to the output of this analysis process is offered by means of XML data under a Creative Commons licence. An API has been produced to make it very easy to get this data.
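As a rough illustration of why XML output with a clear licence lowers the barrier to reuse, the sketch below reads a small XML document and counts the votes in each division. The element and attribute names (divisions, division, vote, licence) and the sample content are assumptions made up for this example; they are not the actual schema or API of the services named here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, invented for this sketch.
SAMPLE_XML = """<divisions licence="Creative Commons (placeholder)">
  <division id="d1" date="2008-06-01">
    <vote member="A. Member" choice="aye"/>
    <vote member="B. Member" choice="no"/>
    <vote member="C. Member" choice="aye"/>
  </division>
</divisions>"""


def tally(xml_text):
    """Return the stated licence and the vote counts for each division."""
    root = ET.fromstring(xml_text)
    results = {}
    for division in root.findall("division"):
        counts = {}
        for vote in division.findall("vote"):
            choice = vote.get("choice")
            counts[choice] = counts.get(choice, 0) + 1
        results[division.get("id")] = counts
    return root.get("licence"), results


licence, results = tally(SAMPLE_XML)
print(licence)
print(results)
```

Because the licence travels with the data, a reuser can check the terms in the same step as reading the content, rather than having to hunt for them on a separate page.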
TheyWorkForYou.com provides a very good and popular presentation layer for this content.
The data as reworked by TheyWorkForYou is also commonly presented in many other places on the web such as MPs’ personal sites.
There is a comment facility built into TheyWorkForYou to provide a layer of interaction around the content.
TheyWorkForYou is also cited in many blogs, which generate their own interaction, and it features in mainstream media, stimulating further discussion.
The new architecture now provides a platform for further innovation around the Hansard data set, with very low barriers to entry.
Richard Allan, Task Force Chair
5 responses to “More Architecture”
How do you convince those managing all three layers that the access layers are beneficial? I could see people actually fearing interface competition. I would love to have access to pure data, create better interfaces, then share them. The fact is it’s hard, and counter to the desires of the interface originator.
As an example, I am in Canada and am making a mini-app that will take your postal code and tell you how useful it would be for you to vote strategically. I have had to create a spider which crawls this interface: http://www.elections.ca/scripts/pss/FindED.aspx
That is instead of using this file, which costs $2500: http://www.statcan.ca/bsolc/english/bsolc?catno=92F0193X
Now Elections Canada has an interest in maintaining control of both the interface and the data. I wonder if we could all create spiders for the public good. Could we share datasets we don’t own? Like a YouTube of data?
Pingback: The Red Thread « Get Shouty
Pingback: Access Layer Components « Power of Information Task Force
Excellent post. I made liberal use of it (with attribution) in the essay: ‘Government Transparency through Open Data and Open Source’, submitted to Open Source Business Resource for publication this February. A version of the essay is available for comment on scribd:
Pingback: Why we need the data, the whole data, and nothing but the data | FollowTheMoney.eu