
Imagine a visitor who enters his website URL into a textbox and clicks the submit button, and you are then able to retrieve all kinds of information about him: his name, company info, online profiles, interests and so on, all from just a URL. It’s actually pretty easy if the website links to FOAF, APML or SIOC documents.

What you have to do is download the HTML from the website and look for <link> elements in the header that match FOAF, APML or SIOC type links. Then retrieve the URLs of those documents from the href attributes and load each one into an XML document. Now you can use XPath to find all the information you need.

Here is what a FOAF link element looks like:

<link type="application/rdf+xml" rel="meta" title="FOAF" href="http://example.com/foaf.xml" />

SIOC and APML links use the same attributes in the same way, so we can use the title attribute to figure out which kind of document it is. All we need is a method that uses regular expressions to retrieve the document URLs from the HTML.

The code

This is a method that finds all the semantic links of a certain type in an HTML string.

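// Matches one <link> element whose title attribute equals the requested type.
// Each element is matched on its own, so a page with several links yields several matches.
// Note: the pattern does not check that the element sits inside <head>.
private const string PATTERN = "<link\\s[^>]*title=\"{0}\"[^>]*>";

// Captures the href value. [^\"]* stops at the closing quote of the href
// instead of running on to the last quote in the tag.
private static readonly Regex HREF = new Regex("href=\"([^\"]*)\"", RegexOptions.IgnoreCase | RegexOptions.Compiled);

/// <summary>
/// Finds semantic links in a given HTML document.
/// </summary>
/// <param name="type">The type of link. Could be foaf, apml or sioc.</param>
/// <param name="html">The HTML to look through.</param>
/// <returns>The absolute URLs of all matching link elements.</returns>
private static Collection<Uri> FindLinks(string type, string html)
{
  // Escape the type so it is treated as literal text inside the pattern.
  string pattern = string.Format(PATTERN, Regex.Escape(type));
  MatchCollection matches = Regex.Matches(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
  Collection<Uri> urls = new Collection<Uri>();

  foreach (Match match in matches)
  {
    Match hrefMatch = HREF.Match(match.Value);

    // Skip link elements without a quoted href attribute.
    if (!hrefMatch.Success)
      continue;

    Uri url;
    string value = hrefMatch.Groups[1].Value;

    // Only absolute URLs are returned; relative ones are ignored.
    if (Uri.TryCreate(value, UriKind.Absolute, out url))
    {
      urls.Add(url);
    }
  }

  return urls;
}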

Example

To find all the FOAF links in a page you can write something like this:

using (WebClient client = new WebClient())
{
  string html = client.DownloadString(txtUrl.Text);
  Collection<Uri> col = FindLinks("foaf", html);

  foreach (Uri url in col)
  {
    XmlDocument doc = new XmlDocument();
    doc.Load(url.ToString());
    Response.Write(Server.HtmlEncode(doc.OuterXml));
  }
}

If you want to search for APML or SIOC then just replace “foaf” with either “apml” or “sioc” in the method parameter. You might also want to take a look at my experimental FOAF parser class.
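Once a FOAF document is loaded, the XPath step mentioned earlier could look roughly like the sketch below. It is only an illustration: the class name FoafXPathSketch and the method ReadName are made up for this example, and it assumes the document uses the standard FOAF namespace and describes its owner in a foaf:Person element with a foaf:name child.

```csharp
using System.Xml;

public static class FoafXPathSketch
{
  // Returns the first foaf:name inside a foaf:Person, or null if there is none.
  // Hypothetical helper for illustration; not part of any library.
  public static string ReadName(XmlDocument doc)
  {
    // The prefix is local to this query; only the namespace URI has to match the document.
    XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
    ns.AddNamespace("foaf", "http://xmlns.com/foaf/0.1/");

    XmlNode name = doc.SelectSingleNode("//foaf:Person/foaf:name", ns);
    return name == null ? null : name.InnerText;
  }
}
```

Inside the foreach loop above you could then call FoafXPathSketch.ReadName(doc) instead of writing out the raw XML.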


After two days of evaluating my last post about the semantic web, I have come to the conclusion that it is a very bad idea to blog when irritated about something. I’ll take the advice of Rob and Simo and simply not do it from now on. The funny thing is that I’ve tried for a long time to convince people to write semantic mark-up without much luck, so I decided to change tactics. Pissing people off was not the plan, but it sure drew a lot of heat, both in the comments and by e-mail. I’m sorry about that.

On the positive side, it did lead to some interesting questions about why we should care about the semantic web. It also illustrated that this is new territory to a lot of people, and that was news to me. In this post I’ll try to answer some of the questions, get beneath the philosophy and give some examples of how to implement it.

What is the semantic web?

This question is better answered by the W3C and here is what they have to say about it:

The Semantic Web is a web of data. There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?

Why not? Because we don't have a web of data. Because data is controlled by applications, and each application keeps it to itself.

The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.

This may be a little abstract or difficult to grasp, but let’s look at some examples where the semantic web can make our lives easier.

The bank example

Josh wrote in his comments:

The vast majority of ASP.NET web applications serve a business purpose. Lots of these apps don't even get public exposure (authentication required). […] For instance, what good would XFN and FOAF do for a banking application?

This is a good question that I get a lot, and it is based on a common misunderstanding of the use of semantic mark-up. As I normally explain it, you have to think of the semantic web as three things: a database, an enabler and some glue.

Let’s start with the semantic database. Whenever you mark up a web page with microformats (I’ll get back to those) you make parts of that page machine readable. It could be contact information or calendar events or some other structured data. A machine can query that data and aggregate that information with thousands of other microformatted web pages. That’s what Yahoo is experimenting with in their new search engine and Google is utilizing in their Social Graph API.

The enabler could be an online banking site. The pages are not public because only authenticated customers have access, so it cannot be used as a database. It can however enable you to query relevant information from the semantic database. Here is a scenario I got from a video with Tim Berners-Lee, retold from memory:

Imagine reading your bank statement in your online banking application and seeing a transaction you don’t remember making. Each transaction in the statement has a date and time that could be used to figure out what you did that day. If the bank statement is marked up with semantic meaning about the date, then your browser can recognize it as a date. Otherwise it can’t.

This brings us to the last part – the glue. In the bank scenario the glue is the browser or a browser plug-in since no existing browser supports microformats natively yet.

Now that the browser can recognize the dates on the bank statement, it should have no problem looking in your Outlook or Google calendar to see exactly what you did that day and present it to you at the click of a button.

Because your friends are using microformats on their website or profile page, the browser can also tell you exactly who you saw that day. You took some photos as well and uploaded them to Flickr, so now you also have photos associated with that particular day. Some of your friends are tagged on those photos with a link to their photos, so now you can associate their photos of you that day too. All with a click of a button in the browser – the glue.

It sounds very futuristic, but the technology for this has been around for years.

If browsers don’t support it, why should I?

This is the classic question of the chicken and the egg. If browsers don’t support it why should you, and if you don’t publish semantic mark-up why should the browser vendors waste their time on it?  No one takes the first step and we end up getting nowhere. That is why we haven’t seen any killer applications that utilize the semantic web yet.

Lucky for us, this is an exciting time to play around with semantic formats because new services and applications that utilize it are starting to pop up like never before. We’re still waiting for the killer application, but that won’t happen before the database is big enough and there is only one way for that to happen. We need to start marking up our pages. If we don’t start then we stay in limbo and the bank scenario gets pushed further and further into the future. I for one have a hard time ignoring this chicken/egg situation – especially when so little is required to get started.

How to start

The easiest way to start is by choosing one or more microformats that make sense to use on your web application (I'll get back to that in a bit). Let’s take a look at microformats. Simply put, a microformat is a standard naming convention for classes on HTML elements. Here is an example of a very simple hCard microformat marked up in existing HTML. hCard is used to mark up a person and is equivalent to the old vCard standard used by Outlook and other address books.

<div class="vcard">
   <span class="fn">John Doe</span>
   <a href="http://example.com" class="url">My website</a>
</div>

Notice the class names vcard, fn and url. The names of those classes come from the hCard standard defined at microformats.org. This is basic HTML and that is the whole idea with microformats. It’s easy for humans to implement and it’s easy for machines to read. You don’t have to change the layout of your page and you can use the HTML elements that are already there.
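To give an idea of what "easy for machines to read" means, here is a minimal sketch that pulls the formatted name out of an hCard. It assumes the mark-up is a well-formed XHTML fragment without a default xmlns declaration, so it can be loaded into an XmlDocument. The class HCardSketch and its methods are hypothetical names used only for this example; a real parser would have to cope with far messier HTML.

```csharp
using System.Xml;

public static class HCardSketch
{
  // Builds an XPath test that matches one class name inside a
  // space-separated class attribute, e.g. class="vcard author".
  private static string HasClass(string name)
  {
    return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
  }

  // Returns the text of the first element with class "fn" inside an element
  // with class "vcard", or null when no hCard is found.
  // Assumes the input is well-formed XML with no default namespace.
  public static string ReadFormattedName(string xhtml)
  {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(xhtml);

    string query = "//*[" + HasClass("vcard") + "]//*[" + HasClass("fn") + "]";
    XmlNode fn = doc.SelectSingleNode(query);
    return fn == null ? null : fn.InnerText;
  }
}
```

Calling HCardSketch.ReadFormattedName with the hCard example above as a string returns the name inside the fn element.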

Microformats are the best way to start because they can easily be added to existing web pages with little effort. Another example is the XFN (XHTML Friends Network) microformat. It is used to describe a person’s relations to other people. It could be family, co-workers, friends or other contacts. This is probably the easiest microformat and it uses the rel attribute of the <a> element like so:

<a href="http://johndoe.com" rel="friend co-worker">John Doe</a>
<a href="http://melissa.com" rel="spouse">Melissa Smith</a>
<a href="http://britney.com" rel="muse">Britney Spears</a>
<a href="http://madskristensen.dk" rel="me">Mads Kristensen</a>

In case you’re wondering, the rel attribute is valid XHTML. Here is a list of valid XFN relations you can use. The purpose is to make social relations machine readable, which would be beneficial to social networks like Facebook and LinkedIn. Imagine signing up on Facebook for the first time and just giving them your URL, then letting Facebook find your friends from your XFN tags and connect you to them automatically.
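A service that wants to read XFN relations could start with something like the following sketch. It is deliberately naive: the class XfnSketch and the method FindRelations are hypothetical, and the regular expression assumes double-quoted attributes with href appearing before rel, just like in the links above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XfnSketch
{
  // Matches an <a> tag with a double-quoted href followed by a double-quoted rel.
  private static readonly Regex ANCHOR = new Regex(
    "<a\\s[^>]*href=\"([^\"]*)\"[^>]*rel=\"([^\"]*)\"[^>]*>",
    RegexOptions.IgnoreCase);

  // Maps each linked URL to the XFN relations declared for it.
  // If the same URL appears twice, the last occurrence wins.
  public static Dictionary<string, string[]> FindRelations(string html)
  {
    Dictionary<string, string[]> relations = new Dictionary<string, string[]>();

    foreach (Match match in ANCHOR.Matches(html))
    {
      string href = match.Groups[1].Value;

      // A rel attribute can hold several space-separated values.
      string[] rels = match.Groups[2].Value.Split(
        new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

      relations[href] = rels;
    }

    return relations;
  }
}
```

Feeding it the four links above gives one entry per URL, with the first one carrying the two relations friend and co-worker.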

FOAF is the next step. Like XFN, it can contain information about your friends and contacts, and that’s why Josh was right. XFN and FOAF (in most cases) are meant for public consumption and thereby contribute to the semantic database. An online bank site is not public and therefore XFN and FOAF aren’t suited for it.

I won’t go into details about FOAF because it deserves a post of its own.

Getting started

This is always the hard part when faced with something new. You saw the simplicity of the hCard and XFN microformats and you can rest assured that the other microformats are just as simple. To make it even easier to get started, I’ve listed different types of web applications and the microformats that you might be able to implement on them. They are listed in order of priority under each type. Just pick your type and follow the links to the implementation guides.

Personal website or blog

  1. XOXO (for the blogroll)
  2. hResume (for your CV)

Company website

Webshop

Calendar and events

A good tip is to use the Operator Toolbar for Firefox when adding microformats to a page. It can show you how it looks as you code along. That way you know if you are doing it correctly.

I hope this will inspire you to get started using semantic mark-up on your existing and new web projects. Another day I’ll get to some other semantic formats and technologies such as FOAF, OWL, SIOC and APML.

Here are some links to earlier posts I’ve written with how-tos and code samples.


The following contains unfair generalizations about developers, rough language and a bad attitude. Dear reader, this is tough love from yours truly.

You’re participating in the stagnation of the World Wide Web and you hold the human race hostage.

Ouch, that was a bit harsh, but it probably got your attention. That’s important because what I want to address is no laughing matter. There seems to be a huge unwillingness among ASP.NET developers to participate in evolving the web. This is not acceptable and surely not understandable, since the web is what puts food on the table for most ASP.NET developers.

The hostage situation

It seems that other developers are much better at pushing the web forward than our camp is. Python, PHP, RoR and even Perl developers seem to be much more ideological in their approach to web development and make a bigger effort to evolve it. That might be because these languages are more common in universities. Meanwhile, ASP.NET devs just sit back and watch it happen without contributing or maybe even caring about it. That is not a flattering attribute by any means.

In case you are wondering, I’m talking about the semantic web or Web 3.0 if you will. Whenever I have written about it, no one really seems to care. Most other .NET bloggers haven’t written about it and the online discussions rarely have ASP.NET devs joining in. As I see it, ASP.NET developers don’t care about the future of the platform they work with every single day and that is a crime.

It’s a crime because you as a developer limit the billions of web users from utilizing all the wonderful possibilities of the semantic web. It’s a crime because you hold them hostage for personal convenience. It’s more convenient to sit back than to learn something new, but this time it’s different. This is not equivalent to learning LINQ or the MVC Framework. This is bigger than you and it’s bigger than the client/company you are working for. This is what will define our online future.

It’s growing

You might think I’m being a bit dramatic about this matter and you are absolutely correct. According to Tim Berners-Lee, the semantic web is growing exponentially these days, but the ASP.NET camp is sleeping through it. The problem is that our absence slows the growth down, and then we are back at the hostage situation. If we don’t help push for native support for these things in the next version of the .NET Framework and Visual Studio, then we become even further alienated and left behind. This matter demands dramatic change in the way we approach web development, so I think it is more than justified to be dramatic.

The semantic web is starting to become useful due to its growth lately, but for it to become truly useful, it has to be widely adopted by big, medium and small websites containing any structured data worth sharing (since it’s on the Internet in the first place, it most likely is worth sharing).

Make a small effort

What puzzles me again and again is that only a fraction of web developers use microformats or any other semantic formats. In the light of how easy and useful it is, it leaves me saddened. It is not a duty to implement semantic mark-up; it is a privilege to help drive the future of the web – and with a minimal effort as an added bonus.

So, if you have a website containing user profiles, calendar events or any other structured data, then please tag them up with the appropriate microformat and XFN tags. If you want to do a little more, then FOAF is a good place to start. Remember, it starts with us, the developers, so wake up! (the lack of the word 'please' is not accidental, although appropriate).

Semantic fun-facts

Did you know that Yahoo has a new search engine in beta that utilizes microformats in a very cool way?
Did you know that Technorati has an hCard microformat parser service?
Did you know that Google has a Social Graph API that takes advantage of microformats and FOAF?
Did you know that LinkedIn is the largest publisher of the hResume microformat?
Did you know that the Operator toolbar for Firefox lets you utilize microformats on any web page?
Did you know that I've written a guide to implementing microformats in ASP.NET?

Update: If you hated this post then try this new friendly version


I’ve been looking deeply into FOAF lately and last week I worked on a parser that could wrap a FOAF document into a strongly typed class for easy consumption by C#. Now I want to share it with anyone interested. It appears there are no FOAF parsers available in C# yet. I wasn’t able to find any.

The parser

The class FoafParser has two public methods, Parse and ParseFriendsAsync, which handle two different approaches to fetching FOAF documents.

The Parse method

This method takes a URL as a parameter which must point to an online FOAF document. It then parses the foaf:Person element which contains fields such as name, birth date, photo, homepage etc. It then looks for people in the FOAF document that are described as someone you know, parses the information available and adds it to a public collection property. The following example parses a FOAF document and binds it to an ASP.NET Repeater.

FoafParser parser = new FoafParser();
parser.Parse(new Uri("http://example.com/foaf.xml"));

if (parser.IsParsed)
{
  ltOwner.Text = parser.Owner.Name;
  repFriends.DataSource = parser.Friends;
  repFriends.DataBind();
}

The ParseFriendsAsync method

When you list people you know in a FOAF document, it is common to provide only very little information, such as the full name. When that is the case, it is also common to provide a link to that person’s FOAF document using the rdfs:seeAlso tag. We can then use that link to retrieve all your friends’ FOAF documents and parse them as well.

It can take a long time to retrieve all those documents one at a time, but if we do it asynchronously then we can speed things up substantially. When you have called the Parse method on the FoafParser class, you can then call ParseFriendsAsync. It loops through all the friends that the Parse method found in search of the rdfs:seeAlso tag. When it finds one, a web request is made to retrieve the FOAF document of that friend, which is then parsed for information.

When the ParseFriendsAsync method is finished, it triggers an event you can listen to. All the friends with FOAF documents have now been updated in the collection.

private void RetrieveAsync()
{
  FoafParser parser = new FoafParser();
  parser.Parse(new Uri("http://example.com/foaf.xml"));

  if (parser.IsParsed)
  {
    parser.FriendParsingComplete += new EventHandler<EventArgs>(parser_FriendParsingComplete);
    parser.ParseFriendsAsync();
  }
}

void parser_FriendParsingComplete(object sender, EventArgs e)
{
  // Now the Friends collection has been updated asynchronously.
  FoafParser parser = (FoafParser)sender;
  Response.Write(parser.Friends.Count);
}
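If you are curious what the rdfs:seeAlso lookup involves, the sketch below shows one way to collect those links with XPath. It is not the code inside FoafParser. The class SeeAlsoSketch and the method FindFriendDocuments are hypothetical, and it assumes friends are written as foaf:knows/foaf:Person elements whose rdfs:seeAlso carries an rdf:resource attribute, which is a common but not the only way to write it.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SeeAlsoSketch
{
  // Collects the absolute URLs of the FOAF documents that the friends
  // in the given document point to through rdfs:seeAlso.
  public static List<Uri> FindFriendDocuments(XmlDocument doc)
  {
    XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
    ns.AddNamespace("foaf", "http://xmlns.com/foaf/0.1/");
    ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    ns.AddNamespace("rdfs", "http://www.w3.org/2000/01/rdf-schema#");

    List<Uri> urls = new List<Uri>();
    string query = "//foaf:knows/foaf:Person/rdfs:seeAlso/@rdf:resource";

    foreach (XmlNode attribute in doc.SelectNodes(query, ns))
    {
      Uri url;

      // Relative or malformed references are skipped.
      if (Uri.TryCreate(attribute.Value, UriKind.Absolute, out url))
      {
        urls.Add(url);
      }
    }

    return urls;
  }
}
```

Each returned URL could then be fetched and parsed on its own, which is what makes the asynchronous approach worthwhile when there are many friends.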

Download

In the zip file below you’ll find the FoafParser.cs class as well as an example .aspx page and code-behind file.

FoafTester.zip (3,77 kb)