Data at the root level is invalid

Apr 16, 2008

A few days ago I needed to write some functionality to fetch an XML document from a URL and load it into an XmlDocument. As usual, I used the WebClient to retrieve a simple document over HTTP, and it looked like this:

// Needs the System.Net and System.Xml namespaces
using (WebClient client = new WebClient())
{
  // Download the whole response body as a string
  string xml = client.DownloadString("http://example.com/doc.xml");
  XmlDocument doc = new XmlDocument();
  doc.LoadXml(xml);
}

I ran the function and got this very informative XmlException message: Data at the root level is invalid. Line 1, position 1. I’ve seen this error before, so I knew immediately what the problem was. The XML document retrieved from the web had three strange characters at the very beginning. It looked like this:

ï»¿<?xml version="1.0" encoding="utf-8"?>

Of course, that results in an invalid XML document, and that’s why the exception was thrown. The three characters are actually the bytes 0xEF 0xBB 0xBF, the preamble of the encoding used by the document (UTF-8 here), decoded as if they were ordinary text.
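To see where the three characters come from without touching the network, here is a minimal sketch. The class name, the helper methods and the tiny <root /> document are made up for the example, and decoding with ISO-8859-1 is only an assumption standing in for whatever single-byte encoding ends up being used on the downloaded bytes.

```csharp
using System;
using System.Text;
using System.Xml;

static class PreambleDemo
{
    // Hypothetical response body: the UTF-8 preamble followed by a tiny XML document.
    public static byte[] BuildPayload()
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // the bytes EF BB BF
        byte[] body = Encoding.UTF8.GetBytes("<?xml version=\"1.0\" encoding=\"utf-8\"?><root />");
        byte[] payload = new byte[preamble.Length + body.Length];
        preamble.CopyTo(payload, 0);
        body.CopyTo(payload, preamble.Length);
        return payload;
    }

    // Returns true when LoadXml rejects the string with an XmlException.
    public static bool FailsToLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    static void Main()
    {
        byte[] payload = BuildPayload();
        Console.WriteLine(BitConverter.ToString(payload, 0, 3)); // EF-BB-BF

        // A single-byte encoding turns each preamble byte into one visible character.
        string xml = Encoding.GetEncoding("iso-8859-1").GetString(payload);
        Console.WriteLine(FailsToLoad(xml)); // True
    }
}
```

The string handed to LoadXml starts with three characters of text before the XML declaration, which is exactly what the parser complains about at line 1, position 1.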

As I said, I knew this error and also an easy way around it that still uses the WebClient. Instead of retrieving the document as a string and loading it into the XmlDocument with its LoadXml method, the easiest way is to retrieve the response stream and pass it to the Load method of the XmlDocument, which works out the encoding from the raw bytes and skips the preamble. It could look like this:

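// Needs the System.IO, System.Net and System.Xml namespaces
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead("http://example.com/doc.xml"))
{
  XmlDocument doc = new XmlDocument();
  // Load reads the raw bytes, so the preamble is handled by the XML reader
  doc.Load(stream);
}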
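If you want to convince yourself that the stream approach copes with the preamble, here is a small sketch that feeds the same kind of bytes to Load through a MemoryStream instead of a real response stream. The class name, the helper and the <root /> document are assumptions made up for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class StreamLoadDemo
{
    // Loads an XmlDocument from raw bytes, the same way the response stream is loaded.
    public static XmlDocument LoadFromBytes(byte[] payload)
    {
        using (Stream stream = new MemoryStream(payload))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(stream);
            return doc;
        }
    }

    static void Main()
    {
        // Hypothetical payload: UTF-8 preamble followed by a tiny XML document.
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<?xml version=\"1.0\" encoding=\"utf-8\"?><root />");
        byte[] payload = new byte[preamble.Length + body.Length];
        preamble.CopyTo(payload, 0);
        body.CopyTo(payload, preamble.Length);

        XmlDocument doc = LoadFromBytes(payload);
        Console.WriteLine(doc.DocumentElement.Name); // root
    }
}
```

No exception this time, because the three bytes never get turned into characters before the XML reader sees them.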

There are often situations where the WebClient isn’t well suited for this, or you might simply prefer to use the WebRequest and WebResponse classes. The solution is still very simple. Here is what it could look like:

// Create is a static method on WebRequest; it returns an HttpWebRequest for http URLs
WebRequest request = WebRequest.Create("http://example.com/doc.xml");
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
{
  XmlDocument doc = new XmlDocument();
  doc.Load(stream);
}

This is something that can give you gray hairs if you haven’t run into it before, so I thought I’d share.

If you have any issues with the three preamble characters when serving, not consuming, XML documents, then check out Rick Strahl's very informative post about it.


Comments (11)

Taras Ukraine
4/17/2008 10:10:31 AM #

This error reminds me of similar trouble I had when working with the HtmlAgilityPack library (which has a basic wrapper over the WebClient class plus some rudimentary caching infrastructure).

Dan United States
4/17/2008 12:32:30 PM #

Those three little characters preceding XML data remind me of another three little characters I see: ï»¿ on the first entry in the CSS of my blog's theme... So, where do they come from?

wwfDev
4/17/2008 3:19:04 PM #

Mads, why don't you mention Byte Order Mark (BOM) in your post, instead of calling it "the three characters" throughout? I mean, you obviously know what it is since you link to Rick's pages. This post would not come up in a BOM Google search (or maybe it will now that my comment mentions it :) ) and that's a pity, since someone stumbling upon a problem with BOMs could have put your solution to use.

Mads Kristensen Denmark
4/17/2008 3:21:56 PM #

@wwfDev,

That is actually on purpose, exactly for SEO reasons. There are many articles and posts about BOM, but people will only search for BOM if they know what it is. People who don't know what it is are more likely to search for "three strange characters".

Wayne United States
4/21/2008 3:31:06 PM #

Yup, no clue what BOM is... I would have typed those 'three strange characters' as well. At least now I know. :)

Mike Hamilton United States
4/21/2008 3:49:33 PM #

I ran into this when posting an XML file to a partner's web service, and this article could have helped me then. Now that we have an internal blog (running some dotnetBlogEngine or something LOL), this is getting posted to it for future reference.

thanks

Wayne United States
4/21/2008 6:45:37 PM #

Sorry Mads, just testing my Gravatar setup... loving BlogEngine!

Wayne United States
4/21/2008 6:46:30 PM #

Would be great if we could zap our comments...now I just feel foolish...

Mads Kristensen Denmark
4/21/2008 6:48:18 PM #

@Wayne,

Don't worry. With that clown nose it's too late anyway :)

Paulo Morgado Portugal
4/22/2008 7:14:11 AM #

Why not just use an XmlReader?

Michael Denmark
5/8/2008 6:17:02 PM #

Hey Mads. I love your blog, but I have a pretty simple question for you about this post. Why don't you just use the "Load" method of the XmlDocument class to load the XML file directly into the XmlDocument instance? The "Load" method has an overload that takes a string as the parameter, and if you pass it the full URL of the XML file, it will load the document even though it's not a local file. I'm just curious, 'cause I don't know if there are any advantages to using the "WebClient" or "WebRequest" classes like you do.

About the author

Mads Kristensen
Program Manager at the Microsoft Web Platform team and founder of BlogEngine.NET.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.