The Semantic Web has a branding problem: It was built to manage data, not semantics. Somewhere along the line, insiders renamed it the “Data Web”. That was a great move for Web researchers, but what will the semantics crowd do with the name? Just as “semantics” was misplaced in the Data Web, “web” is misplaced in our vision of a global semantic network. The Semantic Web won’t act like a web at all.
The reason is that form must follow function and “web” is the wrong form for semantics. Do you remember why you stopped using the Yahoo Directory and switched to Google? Both provide lists of Web pages organized by categories. The difference is that search engines involve you in the creation of those categories through your queries. When search engines became comparable to the directories in assembling relevant lists, there was no going back. The form of a directory, as a largely static structure, is incompatible with the function of search.
Similarly with semantics. The Data Web, a “giant, global graph”, implies a persistent data structure. Most semantic data cannot be represented this way. Semantic data is highly personal. Like a search engine query, it doesn’t exist without context provided by consumers. As a result, the difference in the quality and scale of the data is as dramatic as the difference between a directory and a search engine. One is bounded, one is not.
Data structures can provide an historical snapshot of semantics. But even if we wanted to tombstone semantics like a directory, how would we encode such an immense data structure? Our industry is aware of the scalability challenges of globally linked data, but we’ve yet to come to terms with the scale of semantic data. For a very rough estimate, multiply all the content that’s available on the Internet by every individual consumer by every individual perspective they might have on that content. Semantics is data at a scale that will dwarf the Web of today.
So as not to raise the ire of the Semantic (er, Data) Web community, let me say I’m a fan. As a standard for data exchange, it’s brilliant. We use it here at Primal Fusion and will only extend our use of it over time. But semantics requires more fluid and probabilistic models. We need a different organizing basis for it, one that’s inclusive of both the data structures and the dynamic processes that generate them (most notably, affording personal perspective).
Perhaps the Semantic Web isn’t so much a misnomer as a composite of two terms: Semantic Engines and the Data Web. Semantic Engines will provide a contextual and ever-changing flow of semantic data onto the network. This may be encoded using Data Web standards, but I don’t think consumers will experience it as a web. It’ll be in a state of constant churn, fuzzy and calculated. As with Google, we’ll be very aware of the presence and value of these services, but not as a static network of data to traverse. A truly Semantic Web will emerge around each consumer.
Thanks to the team at Primal Fusion for their help with this post.
The “Data Web” or “Linked Data Web” has been decoupled from the “Semantic Web” because the “Semantic Web” describes an innovation continuum that incorporates structured linked data, inferencing / reasoning, entailment and beyond.
The “Linked Data Web” isn’t about static data; it is about links between data objects in a mesh that increases in link density ad infinitum.
Today, we have exponentially growing semi-structured data on the Web courtesy of Web 2.0 activity. What’s coming next is exponential growth across two vectors: structure and linkage.
I think the Data Web is the bridge leading from the document web to the Semantic Web. Furthermore, in order to realize the Semantic Web, we must first have the Data Web in place. Nova’s article “The future of desktop” is very helpful for understanding this situation, based on this blog post of Peter’s.
I think the basic claim that semantics are a question of perspective is correct. But I think we already have one of these “more fluid and probabilistic models”: our society.
Semantics are given by us – groups of people – to things we use, think and communicate about. Our social mechanisms are by definition fluid and probabilistic; we just need to make sure we reflect them properly in our technology.
From what I’ve seen so far, these mechanisms are reflected in the core ideas of URIs, RDF and OWL. This is still not a perfect system, but we have to remember that we’re basically trying to imitate thousands of years of human and social evolution, so it’s a pretty tough job.
We first try to imitate our basic language skills – that’s RDF for you: pointing at something and making statements about it. We then move on to making abstractions on things – that’s basic RDFS and OWL.
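To make the two steps concrete, here is a minimal sketch in plain Python. It doesn't use a real RDF library: tuples stand in for triples, and every `ex:` identifier is a made-up example. It only illustrates "making statements about a thing" and then "abstracting over things" through a subclass chain.

```python
# Plain tuples stand in for RDF triples: (subject, predicate, object).
# All "ex:" names are hypothetical examples, not a real vocabulary.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    # Step 1: point at something and make statements about it.
    ("ex:Rex", TYPE, "ex:Dog"),
    ("ex:Rex", "ex:name", "Rex"),
    # Step 2: make abstractions over things.
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(subject, graph):
    """Return the stated types of subject plus those implied by subclassing."""
    direct = {o for s, p, o in graph if s == subject and p == TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred
```

Asking for `types_of("ex:Rex", triples)` returns the stated type `ex:Dog` along with `ex:Mammal` and `ex:Animal`, which nobody stated directly about Rex. That is the abstraction step at its simplest.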
We now have to move to being able to negotiate meaning, like we did with our spoken language. These social dynamics of negotiating meaning have to be somehow copied to the web in order for the semantic web to be realized.
Basically, I think the concept of “hubs” and “authorities” can be implemented here as well (something like “OntologyRank”, as opposed to Google’s “PageRank”).
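As a rough illustration, “hubs” and “authorities” come from the HITS link-analysis algorithm, and the sketch below applies it to a toy graph of vocabularies that reference ontology terms. The “OntologyRank” idea is only a suggestion here, not an existing system, and the function name, the link data and the `ex:` identifiers are all assumptions made up for the example.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a directed graph.

    links maps each source node to the list of nodes it points at.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A node's authority is the sum of the hub scores pointing at it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # A node's hub score is the sum of the authorities it points at.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical vocabularies (sources) referencing ontology terms (targets).
links = {
    "ex:pets": ["ex:Dog", "ex:Cat"],
    "ex:zoo": ["ex:Dog", "ex:Lion"],
    "ex:farm": ["ex:Dog"],
}
hub, auth = hits(links)
```

In this toy graph `ex:Dog` ends up with the highest authority score because every vocabulary references it. Note that the ranking is still computed over a fixed set of links, so it says nothing yet about how those meanings were negotiated.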
What do you say?
Thanks for the comments. I certainly wouldn’t trivialize any of the activities conducted under the SemWeb banner. But does a model of exponentially growing data and links, perhaps with hubs and authorities, capture our deeply personal semantics? I agree strongly that people and societies provide highly relevant examples to emulate. But as we’re designing these solutions, it’s important to remember that these “social dynamics of negotiating meaning” aren’t a post hoc activity of connecting meanings. Rather, they’re intrinsic to the process of creating meaning. Put another way, we can’t encode semantics, let alone link them, until we negotiate with individuals to establish those meanings in the first place. The analogy to search engines, where the data structure is calculated after the perspective of the individual is submitted, is a simple illustration of this type of conversational model.