What if the World Wide Web were one giant database, linking both human
readable documents and machine readable data in a way useful to both mankind and machine?
It would be the future of the Web espoused by Tim Berners-Lee,
father of the World Wide Web and director of the World Wide Web Consortium
(W3C). Since Berners-Lee and a few other leaders at the W3C first mentioned it
in May 2001, that vision has increasingly become a leading focus of the W3C’s work. They call it the Semantic Web.
Speaking before Britain’s scientifically minded Royal Society Monday,
Berners-Lee attempted to explain the vision of the Semantic Web, and why he
believes it will reinvent the existing Web for both end users and
businesses.
“It’s so difficult to explain to people who are used to the Web why, before
the Web, it was so difficult to explain to people what the Web was all
about,” Berners-Lee said.
He explained that the words and concepts needed to explain the little
documentation system he began creating in the late 1980s, during his tenure
at CERN, the European particle physics lab, just hadn’t existed. But once
people saw the Web and what it could do, it seemed so simple, he said.
The Semantic Web faces much the same conundrum.
Simply put, the idea behind the Semantic Web is to give data more meaning
through the use of metadata, information that
describes how and when and by whom a particular set of data was collected,
and how the data is formatted. By adding metadata to the existing Web, the
Semantic Web will allow both humans and machines to find and make use of
data in ways that previously haven’t been possible.
“The Semantic Web is just mechanical data,” Berners-Lee said. “It’s like a
great big database.”
For instance, he explained, consider an event listing on the Web for a
lecture. It would include data like the location, start time, end time, the
speaker, a phone number to call for more information and so on. But the
data is fairly static. It can be read by humans, but not by machines.
However, metadata could be applied to those datapoints to identify to
machines what they are. Then an interested party could click to attend the
event, and whatever calendaring application that person uses could
immediately schedule the event in the planner, denoting where it is, what
time it will start and what time it will end, and who will be speaking. It
could provide a map to get the person to that event, and supply information
about the speaker.
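
To make the lecture example concrete, here is a minimal sketch in Python of what a labeled event listing might look like to a program. Every name, field and value in it is hypothetical and chosen only for illustration; it does not use any real Semantic Web vocabulary or W3C format. It simply shows that once each datapoint carries a label, a calendaring application could pick out the pieces it needs without a human reading the page.

```python
from datetime import datetime

# Hypothetical event listing. Each value is paired with a label (the
# metadata) that tells a program what the value means.
event = {
    "type": "Lecture",
    "speaker": "A. Speaker",        # placeholder name
    "location": "Example Hall",     # placeholder venue
    "start": "2030-01-15T18:00",    # made-up date and time
    "end": "2030-01-15T19:30",
    "phone": "+00 0000 000000",     # placeholder number
}

def to_calendar_entry(listing):
    """Turn a labeled event listing into a simple calendar entry."""
    start = datetime.fromisoformat(listing["start"])
    end = datetime.fromisoformat(listing["end"])
    return {
        "title": f'{listing["type"]}: {listing["speaker"]}',
        "where": listing["location"],
        "start": start,
        "end": end,
        "minutes": int((end - start).total_seconds() // 60),
    }

entry = to_calendar_entry(event)
print(entry["title"], "at", entry["where"], "for", entry["minutes"], "minutes")
```

The same labels that feed the calendar could in principle be handed to a mapping service or a speaker lookup, which is the kind of reuse the example in the text describes.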
It is certainly possible to do this sort of thing without the Semantic Web,
Berners-Lee said. A person could just cut and paste from the Web site
listing to his or her calendaring application, then click on
multiple other links to get the rest of the data. But that, Berners-Lee
said, is not making use of the Web to its fullest extent.
“When it comes to the data in our lives, we are pre-Web,” he said. “It’s
silly for us to do things which the computer could do for us.”
And of course, he noted, the possibilities extend far beyond calendaring
functions. An important focus is Enterprise Application Integration (EAI).
“Wherever there is a connection of common concepts between different
applications, then it becomes interesting to connect those applications
together, to break them out of their boxes,” Berners-Lee said. “The
Semantic Web starts to connect them together.”
One of the numerous foundational specifications for the Semantic Web is the
Resource Description Framework (RDF), which provides
interoperability between applications that exchange machine-understandable
information on the Web.
By implementing products based on RDF as an EAI “hub,” companies can link
together documents and data stored in disparate databases, and pull
related concepts together when analyzing the information. That sort of
thing can be done with XML Web services today, but it can be a laborious
task, Berners-Lee explained. For instance, you might have information in
three XML documents that you want to merge. But each document uses its own
schema. To
merge the data in the three documents, Berners-Lee said, a person might have
to interview the people who created the schemas and then write a new
schema that can take the data in the documents and express it as a new
document.
“RDF just concatenates the documents,” Berners-Lee said.
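
A rough way to picture that remark: RDF describes data as statements of the form subject, predicate, object, so combining documents amounts to pooling their statements. The sketch below models this with plain Python sets rather than a real RDF library, and the `example.org` identifiers and values are invented for illustration. It also assumes the three documents already share the same identifiers for the same things, and it ignores details such as blank nodes that a real RDF merge has to handle.

```python
# Hypothetical namespace used only for this illustration.
EX = "http://example.org/"

# Each "document" is a set of (subject, predicate, object) statements.
doc_a = {(EX + "lecture1", EX + "speaker", "A. Speaker")}
doc_b = {(EX + "lecture1", EX + "location", "Example Hall")}
doc_c = {
    (EX + "lecture1", EX + "speaker", "A. Speaker"),   # repeats doc_a
    (EX + "lecture1", EX + "startTime", "18:00"),
}

# Merging is a set union: no new schema is written, and statements that
# appear in more than one document collapse into a single statement.
merged = doc_a | doc_b | doc_c

def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}

for predicate, value in sorted(describe(merged, EX + "lecture1").items()):
    print(predicate, "->", value)
```

The contrast with the XML case is that nothing here required a fourth, hand-written schema: the merged set answers questions about the lecture using statements that originally lived in three separate documents.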
It could also have tremendous applications in scientific fields, he noted.
For instance, researchers studying weather phenomena would be able to
identify which weather balloons supplied particular datapoints, who
manufactured the balloons and where the materials came from. In the case of
corrupted data, that could allow them to identify faulty balloons and even
discard the particular data supplied by those balloons.
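
The weather balloon scenario can be sketched the same way. In the hypothetical Python example below, each reading records which balloon produced it and each balloon records its manufacturer; the identifiers, manufacturer names and values are all made up. With that provenance attached, dropping the readings from suspect balloons becomes a simple filter.

```python
# Hypothetical readings, each tagged with the balloon that supplied it.
readings = [
    {"value": 12.1, "balloon": "B1"},
    {"value": 97.4, "balloon": "B2"},
    {"value": 11.8, "balloon": "B3"},
]

# Hypothetical provenance: who manufactured each balloon.
balloons = {
    "B1": {"manufacturer": "MakerOne"},
    "B2": {"manufacturer": "MakerTwo"},
    "B3": {"manufacturer": "MakerOne"},
}

def trusted_readings(readings, balloons, suspect_makers):
    """Keep only readings whose balloon was not built by a suspect maker."""
    return [
        r for r in readings
        if balloons[r["balloon"]]["manufacturer"] not in suspect_makers
    ]

# Suppose balloons from "MakerTwo" are found to be faulty.
clean = trusted_readings(readings, balloons, {"MakerTwo"})
print([r["value"] for r in clean])
```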
“You don’t just want to look at data,” he said. “You want to look at
documents and see where they came from.”
Of course, there are still obstacles and potential problems that the
Semantic Web faces, even if you put aside the difficulties of explaining
why people should be excited about it, Berners-Lee said.
There are many specifications that still need to be delivered. Web Ontology
Language (which the W3C has given the acronym OWL, solely because it sounds
better than WOL) is one of those. There are about 20 more potential
standards to sift through. Berners-Lee said the whole thing could be a
failure if the Semantic Web is not compatible with the existing Web. And
companies could try to derail the whole process by claiming patents on the
technology the W3C is developing.
“When it comes to the infrastructure standards, we have to keep it like
HTTP: royalty-free,” he said.