Wednesday, April 18, 2007

(required) Discussion Topic for the blog: Critique the following interview with Tim Berners-Lee on the Semantic Web.


Here is a fairly high-level interview on the Semantic Web that Tim Berners-Lee gave this week to Business Week.
Critique this interview (agree or disagree) in the context of the class discussion and your own understanding.
Post your comments on the blog.


CEO Guide to Technology April 9, 2007, 12:01AM EST

Q&A with Tim Berners-Lee

The inventor of the Web explains how the new Semantic Web could have profound effects on the growth of knowledge and innovation

Tim Berners-Lee is far from finished with the World Wide Web. Having invented the Web in 1989, he's now working on ways to make it a whole lot smarter.

For the last decade or so, as director of the World Wide Web Consortium (W3C), Berners-Lee has been working on an effort he's dubbed the "Semantic Web." At the heart of the Semantic Web is technology that makes it easier for people to find and correlate the information they need, whether that data resides on a Web site, in a corporate database, or in desktop software.

The Semantic Web, as Berners-Lee envisions it, represents a change so profound that it's not always easy for others to grasp. This isn't the first time he's encountered that problem. "It was really hard explaining the Web before people just got used to it because they didn't even have words like click and jump and page," Berners-Lee says. In a recent conversation with writer Rachael King, Berners-Lee discussed his vision for the Semantic Web and how it can alter the way companies operate. Edited excerpts follow.

It seems one of the problems the Semantic Web can solve is helping unlock information in various silos, in different software applications, and different places that currently cannot be connected easily.

Exactly. When you use the word "silos," that's the word we hear when somebody in the enterprise talks about the stovepipe problem. Different words for the same problem: that business information inside the company is managed by different sorts of software, and you have to go to a different person and learn a different program to see it. Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.

Even outside data can be integrated, as I understand it.

Absolutely. Anybody making real decisions uses data from many sources, produced by many sorts of organizations, and we're stymied. We tend to have to use backs of envelopes to do this and people have to put data in spreadsheets, which they painfully prepare. In a way, the Semantic Web is a bit like having all the databases out there as one big database. It's difficult to imagine the power that you're going to have when so many different sorts of data are available.

It seems to me that we're overwhelmed with data and this might be a good way to help us find the data we need.

When you can treat something as data, your querying can be much more powerful.

In your speech at Princeton last year, you said that maybe you had made a mistake in naming it the Semantic Web. Do you think the name confuses some people?

I don't think it's a very good name but we're stuck with it now. The word semantics is used by different groups to mean different things. But now people understand that the Semantic Web is the Data Web. I think we could have called it the Data Web. It would have been simpler. I got in a lot of trouble for calling the World Wide Web "www" because it was so long and difficult to pronounce. At the end, when people understand what it is, they understand that it connects all applications together or gives them access to data across the company when they see a few general Semantic Web applications.

Some of the early work with the Semantic Web seems to have been done by government agencies such as the Defense Advanced Research Projects Agency and the National Aeronautics & Space Administration. Why do you think the government has been an early adopter of this technology?

I understand that DARPA had its own serious problems with huge amounts of data from all different sources about all sorts of things. So, they saw the Semantic Web rightly as something that was aimed directly at solving the problems they had on a large scale. I know that DARPA then funded some of the early development.

You have touched on the idea that the Semantic Web will make it easier to discover cures for diseases. How will it do that?

Well, when a drug company looks at a disease, they take the specific symptoms that are connected with specific proteins inside a human cell which might lead to those symptoms. So the art of finding the drug is to find the chemical that will interfere with the bad things happening and encourage the good things happening inside the cell, which involves understanding the genetics and all the connections between the proteins and the symptoms of the disease.

It also requires looking at all the other connections, whether there are federal regulations about the use of the protein and how it's been used before. We've got government regulatory information, clinical trial data, the genomics data, and the proteomics data that are all in different departments and different pieces of software. A scientist who is going through that creative process of brainstorming to find something that could possibly solve the disease has to somehow keep everything in their head at the same time or be able to explore all these different axes in a connected way. The Semantic Web is a technology designed to specifically do that—to open up the boundaries between the silos, to allow scientists to explore hypotheses, to look at how things connect in new combinations that have never before been dreamt of.

The Semantic Web makes it so much easier to find and correlate information about nearly anything, including people. What happens if that information gets into the wrong hands? Is there anything that can be done to safeguard privacy?

Here at [MIT], we are doing research and building systems that are aware of the social issues. They are aware of privacy constraints, of the appropriate uses of information. We think it's important to build systems that help you do the right thing, but also we're building systems that, when they take data from many, many sources and combine it and allow you to come to a conclusion, are transparent in the sense that you can ask them what they based their decision on and they can go back and you can check if these are things that are appropriate to use and that you feel are trustworthy.

Developing Semantic Web standards has taken years. Has it taken a long time because the Semantic Web is so complex?

The Semantic Web isn't inherently complex. The Semantic Web language, at its heart, is very, very simple. It's just about the relationships between things.


Yang Qin said...

After reading this interview, the first thing that comes to my mind is that the "semantic web" is the "semantic web", not the "data web".
Admittedly, the semantic web is a kind of standard organization and representation of data which makes sharing and transmitting data much easier. However, in my opinion, the most important feature of the semantic web is not how it represents data, but the "semantic part", i.e., the logically described "background knowledge". Without this part, data makes no sense at all and nobody can leverage it. Moreover, the "semantic part" is the very point that shows the elegance of the semantic web.
About the medical stuff the guy mentioned, I think it's just for fun. What he's saying is just the same thing as "data is good for curing diseases", which actually provides no information at all.

feelingbird said...

Two things in the conversation interest me: one is its definition of 'Semantic Web', and the other is the difference between 'Semantic Web' and 'Data Web'. In Lee's opinion, the Semantic Web integrates a bunch of techniques to construct connections among all the data, whereas the Data Web is a collection of data sources. I think that's the reason Lee prefers 'Data Web' over 'Semantic Web'. 'Semantic' is really a good idea, but it's one way to help people retrieve information efficiently (I don't know if it's the only one). However, common users care more about the data (or information) itself, not how the retrieval process is implemented. I guess that is one reason there is a long way to go for the semantic web.

Zheshen(Jessie) said...

I agree with what Yang Qin said. In AI, knowledge, and the ability to learn and make use of it, is the essential problem. I think that, to some extent, the “Semantic Web” is trying to find an easier way to deal with this problem. It tries to convert all kinds of knowledge into data with a uniform “format” or, in other words, “grammar”. Then “making use of knowledge” becomes “querying”. I don’t believe that, with a semantic architecture, the web can be intelligent enough to do something like discovering cures for diseases the way a professional doctor does, since the way the human brain learns, integrates, and makes use of knowledge involves so many subtle, creative, emotional factors that are too difficult to model and compute. However, it is definitely true that with the “Semantic Web”, data from various sources can be much more accessible than ever before. Unfortunately, the current problem with the “Semantic Web” seems to be more social than technical.
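
As a rough illustration of the idea in this comment (knowledge stored in one uniform format, so that using it becomes querying), here is a minimal Python sketch. The triples, the drug and protein names, and the `match` helper are all invented for illustration; none of them come from the interview or from any Semantic Web library.

```python
# Hypothetical facts in one uniform shape: (subject, predicate, object).
TRIPLES = [
    ("drugA", "inhibits", "proteinX"),
    ("proteinX", "associatedWith", "symptomY"),
    ("drugB", "inhibits", "proteinZ"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Query: which drugs inhibit a protein associated with symptomY?
proteins = {s for s, _, _ in match((None, "associatedWith", "symptomY"))}
drugs = sorted(s for s, _, o in match((None, "inhibits", None)) if o in proteins)
print(drugs)
```

Because every fact has the same three-part shape, the question is answered by chaining two pattern matches rather than by writing code specific to each data source.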

kartik talamadupula said...

I think there were a couple of things that stood out starkly as far as I can see (of course, the interview is pretty short as well):

1. The fact that DARPA and NASA seem to be the first ones to jump onto the Semantic Web bandwagon as well. Can't help but draw parallels with the www and how it was pioneered by the military and then brought into the larger public realm (with some obvious differences like the Semantic Web is already in use by the public, eg Jim Hendler's UMD page)

2. Early on, they talk about how the semantic web can unlock information in various "silos" (and the whole discussion about "Data Web" etc). To me, this makes it a prime candidate to harness the kind of backend databases that (I am sure) Google has built up. In light of this, I don't quite think privacy is being looked at seriously enough -- the part about addressing privacy concerns by ".. being aware of social issues" seems like sweet words, nothing more.

3. I felt the example of the potential of the Semantic Web (finding cures for diseases) could have been a bit more general and not such a specific thing, but maybe that's just CS speaking.

Raju said...

As many have mentioned, the disease-curing example is just one application, but it is a catchy example for the public. As was mentioned in class, the biggest challenge in realizing the semantic web is getting people to conform to the standard, so creating a buzz is necessary, and I think it is the right example to use.

The other interesting piece is the privacy issue, which is associated with any mass information-integration effort. If the semantic web becomes popular enough, it is almost certain that enough researchers will jump into this area and that we will have many methods. Even with the advent of the Internet and the WWW, when protecting confidentiality and privacy became more difficult than in previous eras, confidential information still exists in the world, though we had to invent new tools for preserving privacy.

Wearing a critic's hat: he does not say much about the problems of realizing the semantic web, not even the most severe one, the problem of getting people to conform to the standard.

Sergei Golitsinski said...

It’s hard to argue about a concept with its creator. Having invented the World Wide Web right after Al Gore invented the Internet, Tim Berners-Lee proceeded to invent the Semantic Web. Deviating from Berners-Lee’s interpretation of the concept of the Semantic Web is like reassigning the meaning of words, not unlike Humpty Dumpty (“‘When I use a word,’ Humpty Dumpty said, in a rather scornful tone, ‘it means just what I choose it to mean, neither more nor less.’”). And yet, I’ll try, for my opinion on the subject is slightly different.

In this interview, Berners-Lee focuses on the idea that the Semantic Web is a means for connecting data sources. It gets to the point that if we replace most instances of the words “Semantic Web” with “XML,” the meaning (or, forgive the bad pun, the ‘semantics’) of the interview will remain almost unchanged. However, this ambiguity (or, rather, the generality of the interview) is easily explained by the publication and its target audience. Business Week is read by MBAs; therefore, the information presented in it should be a) business-oriented (which explains Lee’s focus on data and decision-making support) and b) dirt-simple (explaining the borderline-ridiculous cures-for-diseases analogy).

In my opinion, the Semantic Web is not so much about bridging existing databases on the Web as about bringing meaning to the content of existing web pages, regardless of whether this content comes from a database or was hardcoded into the HTML by the author. In this regard, while any XML-like tagging (or any other type of mark-up) identifies *data*, RDF Schema and OWL bring out the *meaning* (or semantics) of that data. Therefore, I stand by the term “Semantic Web” as opposed to “Data Web.” And since it seems that after having spent 7 years in Iowa, I still cannot escape farm terminology, I’d like to extend Lee’s silos analogy: it’s not so much about connecting the silos; it’s more about bringing meaning to the corn in the corn fields :-)

Shankar said...

I think Zheshen is correct in mentioning that implementing a Semantic Web is much more of a social challenge than a technical one. The Semantic Web tries to tag data from different groups of people with some standard representation. However, the problem here is that the data can "mean" different things to different people. (Ironically, Berners-Lee mentions that the term 'Semantic Web' itself means different things to different people.) Imagine tagging articles about controversial topics like the Iraq war.
However, the idea of the Semantic Web is useful in specific domains where all the information is in the form of sentences about facts that everybody agrees on. This could be the idea behind the Data Web that he mentions. So the goal of a 'Data Web' does seem more modest and plausible than the goal of a Semantic Web.

Sanjay said...

The interview, though short, provides a few key pieces of information and may prompt us to learn more about the Semantic Web. It is interesting to note that DARPA and NASA have been actively involved in promoting the Semantic Web. It is worth mentioning DAML, the DARPA Agent Markup Language, which is used to support the concept of the Semantic Web. XML itself provides no semantics for its tags. A program can be written to assign semantics to a "subProperty" tag, but since those semantics aren't part of the XML spec, other applications can conform to the XML spec and still not make the assertion. Other web languages such as RDFS go a step further than XML and support the example just given, but DAML goes further still. It offers standard ways to state, for example, that "childOf" on an English genealogy site is the same as "enfantDe" on a French site. Thus it allows machines to make the kinds of inferences that human beings do.
As Raju mentioned, I am pretty sure that researchers will come up with novel techniques to protect the privacy of users. Moreover, given the difference between the Semantic Web, which integrates a bunch of techniques to construct connections among all data, and the Data Web, which is a collection of data sources, the name "Data Web" would have been more appropriate, though this does not matter much as long as it is completely understood by the users (as it was with the www).
Medical field, determining cures for diseases: one of the applications that readers find most “pleasing” is the use of the Semantic Web in finding cures for diseases. A technology that connects all the data held in different departments and different pieces of software would be of great help to scientists, and the Semantic Web would be a prime factor in achieving this feat.
The last lines of the interview explain why a simple language or concept is taking so long to implement (and is not yet fully implemented). It is not that easy to establish relations between every thing and every other thing.
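
The "childOf" / "enfantDe" point in this comment can be sketched in a few lines of Python. This is only a toy under stated assumptions: the people, the `equivalent` set, and the `infer` function are hypothetical and do not reproduce DAML's actual vocabulary or any real reasoner.

```python
# Hypothetical facts from two sites that use different property names.
facts = {
    ("alice", "childOf", "carol"),    # English genealogy site
    ("bruno", "enfantDe", "denise"),  # French genealogy site
}

# Assumed schema statement: the two properties mean the same thing.
equivalent = {("childOf", "enfantDe")}


def infer(facts, equivalent):
    """Restate every fact under each property declared equivalent to its own."""
    # Equivalence is symmetric, so consider each pair in both directions.
    pairs = equivalent | {(b, a) for a, b in equivalent}
    inferred = set(facts)
    for s, p, o in facts:
        for a, b in pairs:
            if p == a:
                inferred.add((s, b, o))
    return inferred


all_facts = infer(facts, equivalent)

# A query written only in terms of "childOf" now sees the French data too.
children = sorted(s for s, p, _ in all_facts if p == "childOf")
print(children)
```

Plain XML tags carry no such statement, so a program reading both sites would treat the two properties as unrelated unless someone hard-coded the mapping; declaring the equivalence as data is what lets a generic program draw the inference.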

Newton Alex said...

“The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.” This is an extract from Tim Berners-Lee’s article in Scientific American (May 2001).

After reading the current interview in Business Week, I get the feeling that Lee has lowered his expectations. Here he talks mostly about data integration. He doesn’t say anything about bringing structure to the web or making the web more meaningful. If we are just looking at integrating information sources, then the name “Data Web” is good enough; on the other hand, if we still think about organizing the web and trying to assign some kind of structure and meaning to the data available on it, then it is appropriate to call it the “Semantic Web”. Actually, does it even matter what we call it? If it serves the purpose, it is good :)

Regarding the social issues and privacy, these have always been a part of the development of new technologies. No invention can be absolutely good or bad. The privacy issues are one of the negative aspects of the Semantic Web. But we should weigh the advantages and disadvantages carefully so that we don’t miss the good things about the new technology. When HTTP was invented, people found it very useful, but when they discovered the security and privacy issues involved, they invented HTTPS. Similarly, let us first have the Semantic Web; then we can invent a Secure Semantic Web :).