Recently I needed a method that would look at some text, automatically discover all URLs and turn them into hyperlinks. I’ve done that before, so it was a matter of copy/paste. This time it was a little more complicated, because the resolved URLs could not be longer than 50 characters. That was important because a long URL doesn’t word wrap, so it would bleed out of the design and break it.

So, the challenge was to resolve the URLs and turn them into links while keeping the anchor text at a maximum of 50 characters. Shortening a URL is easy enough; it all comes down to how you want it shortened.

The rules

1. If the URL is longer than 50 characters, remove “http://”.
2. If it is still longer than allowed, compress the folder structure as shown below.

http://www.microsoft.com/windows/server/2003/compare.aspx -> http://www.microsoft.com/.../compare.aspx

3. If the URL is still too long, look for a query string and a fragment and remove them as well.
4. As a last resort, if the URL is still too long, truncate the page name itself.
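
For example, here is how the rules would progressively shorten a long, made-up URL (each line shows the result after the next rule kicks in; the address is purely hypothetical). In this case the fragment disappears together with the query string because it comes after the “?”:

http://www.example.com/products/downloads/2008/setup.aspx?version=2&lang=en#requirements
-> www.example.com/products/downloads/2008/setup.aspx?version=2&lang=en#requirements
-> www.example.com/.../setup.aspx?version=2&lang=en#requirements
-> www.example.com/.../setup.aspx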

The code

[code:c#]

// Matches URLs that start with either http:// or www.
// (requires a using directive for System.Text.RegularExpressions)
private static readonly Regex regex = new Regex("((http://|www\\.)([A-Z0-9.:-]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);

// The anchor template: {0} is an optional http:// prefix, {1} the full URL, {2} the shortened anchor text
private static readonly string link = "<a href=\"{0}{1}\">{2}</a>";

public static string ResolveLinks(string body)
{
  if (string.IsNullOrEmpty(body))
    return body;

  // Let the regex replace each match exactly once. Calling string.Replace in a loop
  // would also rewrite a URL that already sits inside an anchor tag whenever the
  // same URL appears more than once in the text.
  return regex.Replace(body, delegate(Match match)
  {
    // URLs written without a protocol (the "www." form) get an http:// prefix in the href
    string prefix = match.Value.Contains("://") ? string.Empty : "http://";
    return string.Format(link, prefix, match.Value, ShortenUrl(match.Value, 50));
  });
}

private static string ShortenUrl(string url, int max)
{
  if (url.Length <= max)
    return url;

  // Remove the protocol
  int startIndex = url.IndexOf("://");
  if (startIndex > -1)
    url = url.Substring(startIndex + 3);

  if (url.Length <= max)
    return url;

  // Remove the folder structure
  int firstIndex = url.IndexOf("/") + 1;
  int lastIndex = url.LastIndexOf("/");
  if (firstIndex < lastIndex)
    url = url.Replace(url.Substring(firstIndex, lastIndex - firstIndex), "...");

  if (url.Length <= max)
    return url;

  // Remove URL parameters
  int queryIndex = url.IndexOf("?");
  if (queryIndex > -1)
    url = url.Substring(0, queryIndex);

  if (url.Length <= max)
    return url;

  // Remove URL fragment
  int fragmentIndex = url.IndexOf("#");
  if (fragmentIndex > -1)
    url = url.Substring(0, fragmentIndex);

  if (url.Length <= max)
    return url;

  // Shorten the page name itself as a last resort
  firstIndex = url.LastIndexOf("/") + 1;
  lastIndex = url.LastIndexOf(".");
  if (lastIndex - firstIndex > 10)
  {
    string page = url.Substring(firstIndex, lastIndex - firstIndex);
    int length = url.Length - max + 3;

    // Only trim when enough characters can be removed from the page name itself
    if (length < page.Length)
      url = url.Replace(page, "..." + page.Substring(length));
  }

  return url;
}

[/code]

Implementation

To use these methods, just call the ResolveLinks method like so:

[code:c#]

string body = ResolveLinks(txtComment.Text);

[/code]

It works on URLs with or without the http:// protocol prefix. In other words, www.example.com/ and http://www.example.com/ resolve to the same link. This technique is implemented in the comments on this blog. You can test it by writing a comment with a URL in it.
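
To make the end result concrete, here is a small sketch of what the method would produce for a comment containing one long URL. The address and the surrounding text are made up for illustration:

[code:c#]

string body = ResolveLinks("Get it at http://www.example.com/products/downloads/2008/setup.aspx?version=2&lang=en#requirements today.");

// body now holds the full URL in the href and the shortened version as the anchor text:
// Get it at <a href="http://www.example.com/products/downloads/2008/setup.aspx?version=2&lang=en#requirements">www.example.com/.../setup.aspx</a> today.

[/code]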
