Semantic: The next good thing you do with data
In a recent article, Gartner master data management (MDM) expert Andrew White asked the question “Why MDM might be the last good thing you do with data”, and explained that “we need to focus less on data and application integration, and less on API design, and focus first more on semantic exposure for interoperability. Once we have a shared language, the effort and methods of data exchange between hubs and applications will become simpler”.
One goal of Semantic and Linked Data technologies, as presented in the recent Atos analysis “Journey 2018”, is to provide such a shared language, along with methods to expose the semantics of data and reduce the burden of data exchange. As these technologies mature, they might become a credible alternative to standard MDM approaches.
Let’s take a very simple example where we need to integrate two databases of people, one using a field called “last_name” and the other “FamilyName”. One solution could be to pick one of the two terms as the ‘reference’, or to introduce a third term such as “lastName”, but this is unsatisfactory because none of these options gives a clear description of the semantics of the field. A better approach is to use a term defined in a kind of “shared language” called an ontology, such as the FOAF vocabulary, which provides a well-tested semantic model for describing people and organizations. This term, uniquely identified by a URI (Uniform Resource Identifier), can replace the original terms; alternatively, we can write a rule stating that the terms are equivalent and use an inference engine to automate the mapping between them. The same kind of process can be used to align the original database models with a more general one that is better defined semantically and based on a graph structure. The process can be incremental: information from other data sources can be integrated continuously by creating similarity links between terms and abstracting common concepts into higher-level enterprise ontologies. Progressively, we can build the enterprise “shared language” that makes “data exchange between hubs and applications become simpler”.
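To make the field-mapping idea concrete, here is a minimal sketch in Python. It is not a real inference engine: a plain dictionary stands in for the equivalence rules, and the record contents and the names EQUIVALENT_TO and align are hypothetical, chosen only for illustration. The only external fact it relies on is the FOAF term familyName.

```python
# Illustrative sketch: align two record layouts onto one shared FOAF term.
# A dictionary lookup stands in for the rule-based inference described above.

# URI of the shared term from the FOAF vocabulary.
FOAF_FAMILY_NAME = "http://xmlns.com/foaf/0.1/familyName"

# Hypothetical equivalence rules: each local field name maps to a shared URI.
EQUIVALENT_TO = {
    "last_name": FOAF_FAMILY_NAME,
    "FamilyName": FOAF_FAMILY_NAME,
}


def align(record):
    """Return a copy of record with known field names replaced by shared URIs.

    Field names without a rule are kept unchanged, so the mapping can grow
    incrementally as new equivalences are declared.
    """
    return {EQUIVALENT_TO.get(key, key): value for key, value in record.items()}


# Two made-up rows, one from each source database.
first_db_row = {"last_name": "Lovelace"}
second_db_row = {"FamilyName": "Lovelace"}

aligned_first = align(first_db_row)
aligned_second = align(second_db_row)
```

After alignment, both rows use the same key, so they can be compared or merged without knowing which database they came from. In a real deployment the equivalences would more likely be expressed in the ontology itself and resolved by a reasoner rather than hard-coded.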
Recent standardized technologies make this approach to data management easier. One of the most interesting is JSON-LD, an extension of the popular JSON data format in which each field in a message can be mapped to a concept in an ontology, and which encourages linking resources through HTTP URLs. The standard was designed to allow a smooth transition from existing JSON APIs, enabling developers to add Linked Data principles to their APIs incrementally.
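As a rough illustration of that transition, the sketch below takes an ordinary JSON message and turns it into a JSON-LD document by adding an “@context” that maps each field to a FOAF term, plus an “@id” that identifies the resource by URL. The message contents, the example.org URLs, and the helper name to_json_ld are assumptions made for this example; only the JSON-LD keywords and the FOAF terms come from the respective specifications.

```python
import json

# A made-up message as an existing JSON API might return it.
plain_message = {
    "last_name": "Lovelace",
    "homepage": "http://example.org/ada",
}

# The context maps each existing field name to a term in the FOAF vocabulary.
# "@type": "@id" tells JSON-LD processors that the value is a link, not a string.
context = {
    "last_name": "http://xmlns.com/foaf/0.1/familyName",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
}


def to_json_ld(message, context, resource_url):
    """Wrap a plain JSON message as JSON-LD without renaming its fields."""
    document = {"@context": context, "@id": resource_url}
    document.update(message)
    return document


# Hypothetical URL under which the API exposes this person.
linked_message = to_json_ld(plain_message, context, "http://example.org/people/ada")
print(json.dumps(linked_message, indent=2))
```

Existing clients that ignore the “@context” and “@id” keys still see the original fields unchanged, which is what makes the migration gradual.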
Semantic Web techniques have already been used successfully to integrate complex and heterogeneous datasets such as those found in healthcare. It is likely that the emergence of these new standards will significantly narrow the gap between information and knowledge in many other sectors. It could be the next good thing you do with data.