As a team we have worked on a wide range of energy systems components, from offshore wind farm design optimisation, to satellite imagery for solar forecasting, analysis of consumer tariffs, and more recently granular energy attribute certificates. Creating software and doing data science in these areas has given us a strong appreciation of the energy system's needs and the challenges involved in improving the sector's digitalisation. So it’s been with great interest that we’ve watched and engaged with recent digitalisation programmes run across BEIS, Energy Systems Catapult, UKRI and Ofgem. During this time we have also been developing our thinking on how to effectively and efficiently create a single view of the energy system that links assets and gives real-time visibility, whilst minimising the changes required to existing data infrastructure.
Building off the back of the Modernising Energy Data Access and other flexibility innovation programmes, BEIS has launched the Digital Spine project with the aim of achieving just that. So this seems like the perfect opportunity to give our vision for what we think part of the solution could look like. What we have been working on goes beyond just defining what a digital spine is, and starts to describe the system architecture we think will best deliver this solution.
At Future Energy Associates we believe that there is a huge opportunity around integrating existing datasets and semantically linking the data silos currently operated by market participants. For example, Ofgem, BEIS, National Grid, and Elexon each refer to the same power plants in different ways - forcing data consumers to repeat the same time-consuming matching exercises rather than focusing on generating value and new innovations from that data. We will create a way of linking entities across multiple datasets, allowing users to combine attributes across entities and gain better insights into the energy ecosystem.
But what does this mean in practice?
Let’s say you’ve been tasked with researching the revenues for the nuclear plant Sizewell B and investigating the impact of maintenance periods. You’ll need to combine generation data published by Elexon, capacity market information from National Grid, maintenance and downtime events as published under the EU REMIT scheme, and, for further context, technical details of the site from the International Atomic Energy Agency. The problem is, none of these organisations refers to Sizewell B using the same identifiers, and worse still, they don’t all refer to the same things - some point to the two turbines (Elexon, National Grid, ENTSO-E) whilst others point to the single reactor unit (IAEA). For one plant this might be possible to overcome, but as soon as you have to scale your analysis to tens or hundreds of plants the matching exercise becomes a significant time sink, diverting effort that could be spent on value-adding analysis.
Sizewell B identifiers across different organisations
The energy spine framework we propose would give each of these identifiers its own URL, where the information behind each URL would include “same as” relationships linking it to the equivalent identifiers in the other datasets. A subscriber could then download the JSON-LD data behind each of these URLs and merge them within an RDF knowledge graph. Finally, they could carry out a query to find all IDs related to e.g. IAEA reactor 263, followed by a separate query to identify capacity market auctions, REMIT events, and power generation linked to any of those related IDs.
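To illustrate the idea, here is a minimal sketch of how a subscriber might resolve “same as” links once the records behind each URL have been downloaded. The identifiers and the flat pair structure are illustrative assumptions, not the real Elexon/National Grid/IAEA schemas; a production subscriber would work over a proper RDF graph rather than a hand-rolled adjacency map.

```python
# Sketch: resolving "same as" links between identifiers for one plant.
# The identifier strings below are hypothetical stand-ins for the real
# publisher IDs.
from collections import deque

# "Same as" relationships harvested from each publisher's records
# (stored as directed pairs, treated as symmetric when traversing).
SAME_AS = [
    ("elexon:T_SIZB-1", "ngc:SIZB-1"),
    ("elexon:T_SIZB-2", "ngc:SIZB-2"),
    ("ngc:SIZB-1", "iaea:263"),  # both turbines map to the one reactor
    ("ngc:SIZB-2", "iaea:263"),
]

def related_ids(seed: str) -> set[str]:
    """Return every identifier reachable from `seed` via "same as" links."""
    adjacency: dict[str, set[str]] = {}
    for a, b in SAME_AS:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Starting from the IAEA reactor ID, gather every linked identifier;
# REMIT events, auctions, and generation records keyed by any of these
# IDs can then be selected with a single filter.
ids = related_ids("iaea:263")
```

The same traversal is what a SPARQL query over `owl:sameAs` links would perform inside an RDF store; the point is that the subscriber does the matching once, in code, rather than by hand per dataset.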
A central aim of our framework is to build a spine that is lightweight and flexible, whilst utilising existing open-source technologies and standards rather than introducing additional interfaces for end-users to learn. The digital spine should not require the introduction of a new organisation that has to centralise all of this data - ownership of the spine should stay decentralised across the data stakeholders (Elexon, National Grid, etc). As such this is not a case of developing a whole new platform but instead working with data publishers to evolve their current technology to better enable integration and interoperability. This also leads to the conclusion that the framework should be open source so that all actors can understand and deploy it for their own use cases.
Core Abstract Components
- Definitions of datasets (OpenAPI and Frictionless)
- Definitions of entities (Raw sources defined by the publisher and bespoke subscriber definitions built on top)
- SDKs (software development kits - i.e. clients) which can validate and parse raw publisher data into a database
- SDKs for converting annotated tabular data linked to JSON-LD definitions into an RDF knowledge graph
At the core of the architecture is the use of existing open-source frameworks and tools - for standardisation and validation of metadata we propose using OpenAPI and the Frictionless data standard. We also plan to reuse accepted definitions of entities already developed across the industry to describe data, including the Common Information Model for defining the network and Schema.org for more general entities such as people and companies. This means entities and their interactions are all described in a familiar way that is also machine-readable, enabling automated integration and merging of datasets between different publishers.
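To make the metadata layer concrete, a publisher’s table schema could be expressed in the Frictionless Table Schema style and checked programmatically. The field names below are hypothetical, and the hand-rolled check is only a sketch; in practice the `frictionless` Python library would perform this validation against the published schema.

```python
# Illustrative Frictionless-style Table Schema for a generation dataset.
# Field names are hypothetical; a real publisher would ship this JSON
# alongside the data.
TABLE_SCHEMA = {
    "fields": [
        {"name": "bmu_id", "type": "string"},
        {"name": "settlement_date", "type": "date"},
        {"name": "generation_mwh", "type": "number"},
    ],
    "primaryKey": ["bmu_id", "settlement_date"],
}

# Minimal type checks standing in for a full Frictionless validator.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "date": lambda v: isinstance(v, str) and len(v.split("-")) == 3,
}

def validate_row(row: dict, schema: dict) -> list[str]:
    """Return a list of validation errors for one data row (empty = valid)."""
    errors = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in row:
            errors.append(f"missing field: {name}")
        elif not TYPE_CHECKS[ftype](row[name]):
            errors.append(f"bad type for {name}: expected {ftype}")
    return errors

ok = validate_row(
    {"bmu_id": "T_SIZB-1", "settlement_date": "2023-01-01", "generation_mwh": 594.0},
    TABLE_SCHEMA,
)
bad = validate_row({"bmu_id": "T_SIZB-1"}, TABLE_SCHEMA)
```

Because the schema travels with the data as plain JSON, any subscriber’s tooling can validate a feed before ingesting it, without coordinating with the publisher.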
In our framework there are two distinct roles that interact with the digital spine: publishers of data, who hold primary responsibility for defining schemas, asset metadata, and entity creation; and subscribers, who ingest and transform publisher data into combined SQL databases and/or knowledge graphs. The relationships between entities on the system can be described by either the publisher or the subscriber and take the form of triplets (original entity : relationship : matching entity), where the relationship can be defined as “same as”, “parent”, or “child”, e.g. (power station X : same as : power station Y). This means that publishers can retain their existing view of the system, whilst subscribers can merge and extend those views.
Publishers
- Define raw entity definitions
- Annotate tabular data (API or CSV)
- Provide feeds for dataset updates
Subscribers
- Utilise ingestion SDKs to process data into database and RDF triplet representations
- Define relationships between publisher entities
- Build tools and generate insights from the knowledge graph
Lightweight software development kits (SDKs) will be created for data ingestion, enabling simple handling and conversion of publisher data from JSON-LD (linked data) into other machine and human-readable formats. This will be achieved by:
- Converting entity JSON-LD definitions into SQLModel classes
- Handling saving of raw data into an SQL database via SQLModel
- Converting the tabular representation of the data into a triplet form
- Adding the triplets defined in the entity relationships (from subscriber or publisher)
- Merging the triplet sets into a single knowledge graph
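The last three steps above can be sketched in plain Python. The column names, entity URIs, and relationship pairs here are illustrative assumptions; a real SDK would generate the mapping from the JSON-LD entity definitions and merge into a proper RDF store such as rdflib rather than a set of tuples.

```python
# Sketch of three SDK steps: convert tabular rows into triplets, add the
# subscriber-defined relationship triplets, and merge into one graph.
# All identifiers and columns are hypothetical.
Triple = tuple[str, str, str]

def rows_to_triples(rows: list[dict], subject_col: str) -> set[Triple]:
    """Convert each tabular row into (subject, predicate, value) triplets."""
    triples: set[Triple] = set()
    for row in rows:
        subject = row[subject_col]
        for column, value in row.items():
            if column != subject_col:
                triples.add((subject, column, str(value)))
    return triples

# Tabular data from a (hypothetical) publisher feed.
rows = [
    {"id": "elexon:T_SIZB-1", "fuel": "nuclear", "capacity_mw": 594},
    {"id": "elexon:T_SIZB-2", "fuel": "nuclear", "capacity_mw": 594},
]

# Relationship triplets defined by a subscriber (or the publisher).
relationships = {
    ("elexon:T_SIZB-1", "same as", "ngc:SIZB-1"),
    ("elexon:T_SIZB-2", "same as", "ngc:SIZB-2"),
}

# Merge everything into a single knowledge-graph-like set of triplets.
graph = rows_to_triples(rows, "id") | relationships
```

Keeping each stage a pure transformation over triplet sets means publisher data and subscriber-defined relationships can be merged, or re-merged after an update, without either party changing how they store their own data.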
Data security can be handled much as it is currently, where the publishers create data security policies and processes through the use of registered users, authenticated APIs etc. This framework would enable the publisher to make the metadata open but keep data private as necessary using authentication. The presumed open approach that BEIS and others are following fits well with this framework; however, each use case would require its own consideration of appropriate disclosure in consultation with users. A further benefit is that existing access control solutions such as those proposed by IceBreakerOne can work alongside our data linkage framework.
What has been presented so far is a conceptual framework that we think will deliver the required flexibility across a range of use cases in the energy system. The exact ecosystem of publishers and subscribers will vary depending on the use case. In each, the technical and organisational capacity of stakeholders will determine how each cluster of the digital spine is developed. We have started describing some of these use cases (more details to follow…) and are now beginning to develop this framework and deliver some tangible results.
Exciting times ahead! Please reach out if you are interested in finding out more or would like to participate in a pilot of the framework.
Useful resources:
- Further information on using metadata to markup energy data (OSUKED)
- Earlier thinking about creating a semantic web of energy data (Open Climate Fix)