Guide to the semantic web

Published May 8, 2008

After two days of evaluating my last post about the semantic web, I have come to the conclusion that it is a very bad idea to blog when irritated about something. I’ll take the advice of Rob and Simo and simply not do it from now on. The funny thing is that I’ve tried for a long time to convince people to write semantic mark-up without much luck, so I decided to change tactics. Pissing people off was not the plan, but it sure got a lot of ~~attention~~ heat in both the comments and e-mail. I’m sorry about that.

On the positive side, it did lead to some interesting questions about why we should care about the semantic web. It also showed that this is new territory to a lot of people, which was news to me. In this post I’ll try to answer some of the questions, get beneath the philosophy and give some examples of how to implement it.

What is the semantic web?

This question is better answered by the W3C and here is what they have to say about it:

The Semantic Web is a web of data. There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?

Why not? Because we don't have a web of data. Because data is controlled by applications, and each application keeps it to itself.

The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.

This may be a little abstract or difficult to grasp, but let’s look at some examples where the semantic web can make our lives easier.

The bank example

Josh wrote in his comments:

The vast majority of ASP.NET web applications serve a business purpose. Lots of these apps don't even get public exposure (authentication required). […] For instance, what good would XFN and FOAF do for a banking application?

This is a good question that I get a lot, and it is based on a common misunderstanding about the use of semantic mark-up. As I normally explain it, you have to think of the semantic web as three things: a database, an enabler and some glue.

Let’s start with the semantic database. Whenever you mark up a web page with microformats (I’ll get back to those) you make parts of that page machine readable. It could be contact information or calendar events or some other structured data. A machine can query that data and aggregate that information with thousands of other microformatted web pages. That’s what Yahoo is experimenting with in their new search engine and Google is utilizing in their Social Graph API.

The enabler could be an online banking site. The pages are not public because only authenticated customers have access, so it cannot be used as a database. It can, however, enable you to query relevant information from the semantic database. Here is a scenario I got from a video with Tim Berners-Lee, retold from memory:

Imagine reading your bank statement in your online banking application and seeing a transaction you don’t remember making. Each transaction in the statement has a date and time that could be used to figure out what you did that day. If the bank statement is marked up with semantic meaning about the date, then your browser can recognize it as a date – otherwise it can’t.

This brings us to the last part – the glue. In the bank scenario the glue is the browser or a browser plug-in since no existing browser supports microformats natively yet.

Now that the browser can recognize the dates on the bank statement, it should have no problem looking in your Outlook or Google calendar to see exactly what you did that day and present it to you at the click of a button.

Because your friends are using microformats on their website or profile page, the browser can also tell you exactly who you saw that day. You took some photos as well and uploaded them to Flickr, so now you also have photos associated with that particular day. Some of your friends are tagged on those photos with a link to their photos, so now you can associate their photos of you that day too. All with a click of a button in the browser – the glue.

It sounds very futuristic, but the technology for this has been around for years.
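The calendar lookup in the scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `transaction-date` class name, the statement markup and the `CALENDAR` dictionary (a stand-in for Outlook or Google Calendar) are all hypothetical, and a real browser plug-in would work differently. The only idea borrowed from microformats is putting a machine-readable ISO date in the `title` of an `abbr` element.

```python
from datetime import date
from html.parser import HTMLParser

# Hypothetical bank statement markup: each date carries an ISO 8601 value
# in the title attribute so a machine can read it unambiguously.
STATEMENT_HTML = """
<table>
<tr><td><abbr class="transaction-date" title="2008-05-06">6 May</abbr></td><td>Cafe Example</td></tr>
<tr><td><abbr class="transaction-date" title="2008-05-07">7 May</abbr></td><td>Book shop</td></tr>
</table>
"""

# Hypothetical stand-in for the user's own calendar.
CALENDAR = {
    date(2008, 5, 6): ["Lunch with John Doe"],
}


class StatementDateParser(HTMLParser):
    """Collects every machine-readable transaction date found in the page."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "transaction-date" in classes and attrs.get("title"):
            self.dates.append(date.fromisoformat(attrs["title"]))


def what_did_i_do(statement_html, calendar):
    """Maps each transaction date to the calendar entries for that day."""
    parser = StatementDateParser()
    parser.feed(statement_html)
    return {day: calendar.get(day, []) for day in parser.dates}


for day, entries in what_did_i_do(STATEMENT_HTML, CALENDAR).items():
    print(day.isoformat(), entries)
```

Without the marked-up date, the same code would have to guess what "6 May" means from free text, which is exactly the problem semantic mark-up avoids.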

If browsers don’t support it, why should I?

This is the classic question of the chicken and the egg. If browsers don’t support it why should you, and if you don’t publish semantic mark-up why should the browser vendors waste their time on it? No one takes the first step and we end up getting nowhere. That is why we haven’t seen any killer applications that utilize the semantic web yet.

Lucky for us, this is an exciting time to play around with semantic formats because new services and applications that utilize it are starting to pop up like never before. We’re still waiting for the killer application, but that won’t happen before the database is big enough and there is only one way for that to happen. We need to start marking up our pages. If we don’t start then we stay in limbo and the bank scenario gets pushed further and further into the future. I for one have a hard time ignoring this chicken/egg situation – especially when so little is required to get started.

How to start

The easiest way to start is by choosing one or more microformats that make sense to use on your web application (I'll get back to that in a bit). Let’s take a look at microformats. Simply put, a microformat is a standard naming convention for classes on HTML elements. Here is an example of a very simple hCard microformat marked up in existing HTML. hCard is used to mark up a person and is equivalent to the old vCard standard used by Outlook and other address books.

<div class="vcard">
<span class="fn">John Doe</span>
<a href="http://example.com" class="url">My website</a>
</div>

Notice the class names vcard, fn and url. The names of those classes come from the hCard standard defined at microformats.org. This is basic HTML and that is the whole idea with microformats. It’s easy for humans to implement and it’s easy for machines to read. You don’t have to change the layout of your page and you can use existing HTML elements already there.
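To show what "easy for machines to read" can mean in practice, here is a minimal Python sketch that pulls the name and URL out of the hCard above using only the standard library. It is a simplification, not a full hCard parser: it handles just the `fn` and `url` properties, assumes no nested elements inside `fn`, and does not check that the properties sit inside a `vcard` root. The names `HCardParser` and `parse_hcard` are made up for this example.

```python
from html.parser import HTMLParser

HCARD_HTML = """
<div class="vcard">
<span class="fn">John Doe</span>
<a href="http://example.com" class="url">My website</a>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the fn (formatted name) and url properties of a simple hCard."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._in_fn = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "fn" in classes:
            # The text that follows is the person's name.
            self._in_fn = True
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the fn element.
        self._in_fn = False

    def handle_data(self, data):
        if self._in_fn and data.strip():
            self.card["fn"] = data.strip()


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.card


print(parse_hcard(HCARD_HTML))
```

The parser never looks at the visible layout, only at the class names, which is why adding them to existing pages is enough.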

Microformats are the best way to start because they can easily be added to existing web pages with little effort. Another example is the XFN (XHTML Friends Network) microformat. It is used to describe a person’s relations to other people. It could be family, co-workers, friends or other contacts. This is probably the easiest microformat and it uses the rel attribute of the <a> element like so:

<a href="http://johndoe.com" rel="friend co-worker">John Doe</a>
<a href="http://melissa.com" rel="spouse">Melissa Smith</a>
<a href="http://britney.com" rel="muse">Britney Spears</a>
<a href="http://madskristensen.dk" rel="me">Mads Kristensen</a>

In case you’re wondering, the rel attribute is valid XHTML. Here is a list of valid XFN relations you can use. The purpose is to make social relations machine readable, which would be beneficial to social networks like Facebook and LinkedIn. Imagine signing up for Facebook for the first time, giving them your URL and letting Facebook find your friends from your XFN values and connect you to them automatically.
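As a rough illustration of how a service could read those relations, the following Python sketch collects the rel values of every link on a page. It is an assumption-laden example, not how any social network actually does it: `XFNParser` and `parse_xfn` are made-up names, and the sketch does not check the values against the official XFN list, so non-XFN values such as nofollow would be collected too.

```python
from html.parser import HTMLParser

XFN_HTML = """
<a href="http://johndoe.com" rel="friend co-worker">John Doe</a>
<a href="http://melissa.com" rel="spouse">Melissa Smith</a>
<a href="http://madskristensen.dk" rel="me">Mads Kristensen</a>
"""


class XFNParser(HTMLParser):
    """Maps each linked URL to the list of rel values declared for it."""

    def __init__(self):
        super().__init__()
        self.relations = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href, rel = attrs.get("href"), attrs.get("rel")
        if href and rel:
            # rel holds a space-separated list of relationship values.
            self.relations[href] = rel.split()


def parse_xfn(html):
    parser = XFNParser()
    parser.feed(html)
    return parser.relations


for url, rels in parse_xfn(XFN_HTML).items():
    print(url, rels)
```

A crawler that follows the links marked "me" and repeats the same step on each page is, in essence, how a social graph could be assembled from ordinary web pages.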

FOAF is the next step. Like XFN, it can contain information about your friends and contacts, and that’s why Josh was right. XFN and FOAF (in most cases) are meant for public consumption and thereby contribute to the semantic database. An online bank site is not public and therefore XFN and FOAF aren’t suited for it.

I won’t go into details about FOAF because it deserves a post of its own.

Getting started

This is always the hard part when faced with something new. You saw the simplicity of the hCard and XFN microformats and you can rest assured that the other microformats are just as simple. To make it even easier to get started, I’ve listed different types of web applications and the microformats that you might be able to implement on them. They are listed in order of priority under each type. Just pick your type and follow the links to the implementation guides.

Personal website or blog

hCard
geo
xFolk or hAtom
rel-tag
XFN
XOXO (for the blogroll)
hResume (for your CV)

Company website

Webshop

Calendar and events

A good tip is to use the Operator Toolbar for Firefox when adding microformats to a page. It can show you how it looks as you code along. That way you know if you are doing it correctly.

I hope this will inspire you to get started using semantic mark-up on your existing and new web projects. Another day I’ll get to some other semantic formats and technologies such as FOAF, OWL, SIOC and APML.

Here are some links to earlier posts I’ve written with how-tos and code samples.

Comments

May 8, 2008

Now you've got my full attention! This is starting to make logical sense to me. Thank you for enlightening us.

Josh Stodola

May 8, 2008

Trackback from DotNetKicks.com Guide to the semantic web

DotNetKicks.com

May 8, 2008

This a much better post :) I only used XFN and the rel-tag so far... I should dive into the other microformats as well

Simone

May 9, 2008

Nice post. This gives a very simple and easy to understand explanation of microformats. One question. How would the browser know which data is actually me? Using your example. What if my buddy logs into my computer to check his bank statement, how do we make sure that his info doesn't get posted to my calendar?

jacob

May 9, 2008

Much better post indeed. I have a couple comments: First, I don't think it was the tone of the last blog post that made it so bad, it was your inability to say anything I found useful. I read it and ignored it because I didn't want to up the drama, but I was very much on the metacrap side of things, so your post got my attention due to it's tone, but did a poor job of keeping it by not supporting how microformats can be at all useful to my business-oriented website. This is entirely different. Second, I am still concerned by metacrap. In his paper (http://www.well.com/~doctorow/metacrap.htm), some of Cory's comments are overly pessimistic since the web has already succeeded in many places where he indicates that it cannot, but with regard to meta-information he is very correct. This is because, as of yet, metadata is wholly author-driven and there is not yet an effective way of evaluating the reputation or ranking of such data; which is why most modern search engines almost entirely ignore meta information such as keywords, on even the most highly-reputable websites. In your example, this might not seem like a concern. For instance, whether my bank has supplied my browser with the right date or not, only my bank's reputation is at stake right? Yes and no. No matter whether their date is correct or not, if the information being retrieved by my browser originates only from my trusted local data sources - namely my calendar and emails, then authentic information is not much of a concern. But this describes a 'Semantic Intranet', if I may coin a term. It is not a 'web' of information. the only way I could see such a scenario being worthy of being called Web 3.0, would be if such information was highly discoverable and reliable across all the web. For instance, imagine that upon setting your search preference in your browser, you have also specified your semantic search preference. 
Thus clicking on a toolbar or quick-tag for that date could supply the user with relevant search results relating to that metadata as well. But how can anyone ensure that the semantic data authors by other websites does not spam us right back to Web 2.0? Say the flavor of the month site comes along and starts listing events every hour of every day for the next 100 years. It may not be as difficult a thing as I think it might be for search providers to web out the crap - but I am cautious with my optimism since it seems like a whole new ballgame. One which is not solved purely by PageRank or other similar approaches.

Jason Simone

May 9, 2008

Much better post. One I'm willing to bookmark and come back to when I have more time to think about it instead of the other hyperbole filled post about how ASP.Net developers are holding the web hostage and force feeding it lead paint chips :-)

Nathan

May 9, 2008

Trackback from Community Blogs .NET and Web 3.0

Community Blogs

May 9, 2008

@Jacob, it can know by you telling it explicitly the first time you use it after install. Most likely it will use the XFN value "me" to traverse your sites and verify your claims, just like Google's Social Graph API does. If someone breaks into your computer then you're screwed whether you use the semantic web or not.

Mads Kristensen

May 9, 2008

Mads, seriously, there's nothing wrong with ranting, but the best rants are "here's a problem and here's how I solve it / propose solving it". Yours came over as "here's a solution to a problem and if you don't see the problem then you're part of it".

Paul

May 9, 2008

@mads I figured you would have to give all your credentials for each site you plan on using microformats. Some people might have some issues with this. But maybe this could help push OpenId. Supply the browser(or whatever tool/plug in is used) your OpenId credentials, then every time you sign into a site with OpenId the microformat stuff can do it's thing.

jacob

May 10, 2008

Good article. I will read the next one as it hits the blog.

Enrico

May 10, 2008

Microformat sound microssoft is not open source: big problem

raul lilloy

May 10, 2008

@Raul, Microformats has nothing to do with open source or Microsoft.

Mads Kristensen

May 10, 2008

I just released my APML generator with a couple new controls (C#). I re-design the TagCloud and CatgeoryList to show my APML rating. Not much real use its different. Check out http://13sides.com. The generator won't make much sense in the next verison of BlogEngine.NET (as APML will be in the base) but the controls will still work. I'm currently working on a FOAF explorer, of course in C#. I have sketched out a fun control to view all your FOAF connections in AJAX style. I'll be posting about it soon.

Steve Dunlap

