Remove whitespace from your pages

Oct 21, 2007

OK, this is not new, and I’ve written about it a few times in the past. The thing is that removing whitespace is a tricky discipline that differs from site to site. At least, that was what I thought until very recently.

For some unexplained reason, I started working on a simple little method that removes whitespace in a way that works on all websites without breaking any HTML. Maybe not so unexplained, since I’ve written about it so many times that it would seem I’ve got a secret obsession.

Obsession or not, here is the code I ended up with after a few hours of hacking. Just copy the code onto your base page or master page and watch the magic.

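// Requires: using System.Text.RegularExpressions; using System.Web.UI;

// Matches any run of whitespace between two tags.
private static readonly Regex REGEX_BETWEEN_TAGS = new Regex(@">\s+<", RegexOptions.Compiled);
// Matches a line break followed by indentation or further blank lines.
private static readonly Regex REGEX_LINE_BREAKS = new Regex(@"\n\s+", RegexOptions.Compiled);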
 
/// <summary>
/// Initializes the <see cref="T:System.Web.UI.HtmlTextWriter"></see> object and calls on the child
/// controls of the <see cref="T:System.Web.UI.Page"></see> to render.
/// </summary>
/// <param name="writer">The <see cref="T:System.Web.UI.HtmlTextWriter"></see> that receives the page content.</param>
protected override void Render(HtmlTextWriter writer)
{
  using (HtmlTextWriter htmlwriter = new HtmlTextWriter(new System.IO.StringWriter()))
  {
    base.Render(htmlwriter);
    string html = htmlwriter.InnerWriter.ToString();
 
    html = REGEX_BETWEEN_TAGS.Replace(html, "> <");
    html = REGEX_LINE_BREAKS.Replace(html, string.Empty);
 
    writer.Write(html.Trim());
  }
}

Remember that whitespace removal speeds up rendering, especially in IE, and reduces the overall weight of your page.
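To make the effect of the two expressions concrete, here is a minimal sketch that applies them to a plain string outside of ASP.NET. The class name WhitespaceDemo and the sample markup in the comments are made up for illustration; only the two patterns and the final Trim come from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Same patterns as in the Render override above.
    private static readonly Regex BetweenTags = new Regex(@">\s+<", RegexOptions.Compiled);
    private static readonly Regex LineBreaks = new Regex(@"\n\s+", RegexOptions.Compiled);

    // Applies the two replacements to an already rendered HTML string.
    public static string Strip(string html)
    {
        // Whitespace between two tags collapses to a single space.
        html = BetweenTags.Replace(html, "> <");
        // A line break plus the whitespace that follows it is removed.
        html = LineBreaks.Replace(html, string.Empty);
        return html.Trim();
    }
}

// WhitespaceDemo.Strip("<ul>\n    <li>One</li>\n</ul>")
// returns "<ul> <li>One</li> </ul>"
```

Note that the second pattern removes the line break together with the indentation, so text that wraps onto an indented line is joined without a space. It is worth checking your own markup for that case before relying on it.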


Comments (17) -

Miron Abramson Israel
10/21/2007 11:20:02 PM #

Hi Mads,
Good for us that you have such obsessions ;)
Thanks for sharing.
Why don't you put this code in one of the modules that are already in the site?

Miron

Mark kemper Australia
10/22/2007 12:50:27 AM #

I concur with Miron.

Fredrik Norway
10/22/2007 6:13:05 AM #

I like the idea, but would love to see some stats on this - how does the increased rendering time server-side compare to the time saved client-side?

NinjaCross Italy
10/22/2007 7:07:43 AM #

Thanks for sharing, Mads. I was waiting for a 360° solution like this :)

Michel France
10/22/2007 7:25:39 AM #

I use this sometimes, but I think you need to be careful about spaces inside TEXTAREA.

Mads Kristensen Denmark
10/22/2007 7:36:56 AM #

@Michel, this technique does not change anything inside a TEXTAREA. I've also had that problem before so this version will not break things like that.

spybot Czech Republic
10/22/2007 9:58:59 AM #

I'm using:

System.IO.StringWriter stringWriter = new System.IO.StringWriter();
System.Web.UI.HtmlTextWriter htmlWriter = new System.Web.UI.HtmlTextWriter(stringWriter);
base.Render(htmlWriter);
System.Text.StringBuilder htmlData = new System.Text.StringBuilder(stringWriter.ToString());

.. move postback controls down the page

// remove whitespace (StringBuilder.Replace modifies htmlData in place)
htmlData.Replace("  ", String.Empty);
htmlData.Replace("\t", String.Empty);
htmlData.Replace("\r\n", String.Empty);

writer.Write(htmlData.ToString());

..

Do you think Regex is faster than string replace?

Brian United States
10/22/2007 5:09:17 PM #

@spybot

I think Regex is faster.

michael United States
10/22/2007 7:06:14 PM #

Having issues with AJAX postback. For now I put in a condition to not do anything for Request["HTTP_X_MICROSOFTAJAX"] == null and seems to work. Might be nice to do the same for AJAX returns though - prolly some regex magic needed.
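The condition described here can be isolated into a small helper, sketched below. It assumes that ASP.NET AJAX partial postbacks carry an X-MicrosoftAjax request header, which is what the HTTP_X_MICROSOFTAJAX server variable quoted above corresponds to. The class and method names are hypothetical. As far as I know, the partial-postback response is a length-prefixed text format rather than plain HTML, so changing its content after rendering tends to invalidate it, and skipping it entirely is the conservative choice.

```csharp
using System;
using System.Collections.Specialized;

public static class AjaxRequestCheck
{
    // Assumed header name, derived from the HTTP_X_MICROSOFTAJAX server variable.
    private const string AjaxHeader = "X-MicrosoftAjax";

    // Returns true when the request headers mark an ASP.NET AJAX partial postback.
    public static bool IsPartialPostBack(NameValueCollection headers)
    {
        return headers[AjaxHeader] != null;
    }
}

// Possible use at the top of the Render override (Request.Headers is a NameValueCollection):
//   if (AjaxRequestCheck.IsPartialPostBack(Request.Headers))
//   {
//       base.Render(writer);   // leave partial postback output untouched
//       return;
//   }
```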

Dactivo Spain
10/23/2007 7:07:31 AM #

This piece of code isn't working for me in textareas where you include more than one line break:

private static readonly Regex REGEX_LINE_BREAKS = new Regex(@"\n\s+", RegexOptions.Compiled);

Right now I am using only the first regex, but it would be great to have a piece of code that solves this without affecting the content in textareas.

huobazi People's Republic of China
11/16/2007 12:02:37 AM #

But what happens when my page contains a JavaScript line comment, such as

<script type="javascript">
// here is  a line comment.
var myComment = "a line comment";
alert(myComment);
</script>

When the "\n" is removed, it is changed to

<script type="javascript">// here is  a line comment. var myComment = "a line comment"; alert(myComment);</script>

so the result is a JavaScript error.

How can this be avoided?
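One way to sidestep both the textarea and the script problems raised in this thread is to leave certain elements alone and compress only the markup around them. The sketch below is not the post's method. It is a variation that assumes the protected elements are not nested and that their closing tags do not appear inside their own content. The class name and the list of tag names (textarea, pre, script) are assumptions and can be extended.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SafeWhitespaceStripper
{
    // Hypothetical list of elements whose content is copied through unchanged.
    private static readonly Regex ProtectedBlocks = new Regex(
        @"<(textarea|pre|script)\b[^>]*>.*?</\1\s*>",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // The two patterns from the post.
    private static readonly Regex BetweenTags = new Regex(@">\s+<", RegexOptions.Compiled);
    private static readonly Regex LineBreaks = new Regex(@"\n\s+", RegexOptions.Compiled);

    public static string Strip(string html)
    {
        StringBuilder result = new StringBuilder(html.Length);
        int position = 0;

        foreach (Match block in ProtectedBlocks.Matches(html))
        {
            // Compress only the markup that precedes the protected block.
            result.Append(Collapse(html.Substring(position, block.Index - position)));
            // Copy the protected block verbatim, line breaks included.
            result.Append(block.Value);
            position = block.Index + block.Length;
        }

        // Compress whatever follows the last protected block.
        result.Append(Collapse(html.Substring(position)));
        return result.ToString().Trim();
    }

    private static string Collapse(string fragment)
    {
        fragment = BetweenTags.Replace(fragment, "> <");
        return LineBreaks.Replace(fragment, string.Empty);
    }
}
```

Because each fragment is compressed on its own, whitespace that directly touches a protected block is not always removed. That costs a few bytes but keeps line comments and multi-line textarea values intact.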

sharona Dominican Republic
3/24/2008 11:14:05 AM #

Converted to VB.net and works perfect! Thank you!!

alex Mexico
4/3/2008 7:34:34 PM #

Hi,

I would really like to use this, but I have one problem, and since I think you are the guru I will explain it to you:

I need to remove tabs, whitespace, and line breaks, but only for the text outside the <report></report> tags (this is non-standard, I think, but we use it on a specific platform to generate reports).

So the content:


< p align = "center" > Hello              World!!< /p >
Follows is the report :
<report>
Hello everybody the balance is         : 250.00 usd

You can use it until tomorrow

        Regards        
</report>


This should finish like this:

<p align="center">Hello World!!</p>Follows is the report:<report>
Hello everybody the balance is         : 250.00 usd

You can use it until tomorrow

        Regards        </report>


I hope that you, Mads, or someone else with expertise can answer me.

Thank you very much!

Alex Belarus
7/17/2008 10:12:34 AM #

The idea is good and worth trying. But check twice before going live. This method ruins your ajax and javascript if you use any.

Nino Canada
9/24/2008 12:44:06 PM #

Have you been able to get this to work with AJAX? It doesn't work for postbacks.

Al United States
10/27/2008 2:21:02 PM #

I'll second Nino's comment: has anyone gotten a whitespace removal solution to work with ASP.NET AJAX postbacks?

Yes, there are ways to make whitespace removal work for the standard requests and be disabled for AJAX requests, but I've been unable to find a solution that will successfully trim the whitespace in AJAX postbacks.

Barry Jones United States
2/8/2009 10:50:20 PM #

Thanks for the great article, it has been really useful in the sites that I have developed.

I struggled on the asp.net AJAX post-back for a while until I looked at the source code for the page and noticed that the function __doPostBack was surrounded by JavaScript comments //.

//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>


After the whitespace compression has removed tabs, newlines, etc., the JavaScript above appears on one line. The leading // then turns the whole thing into a comment, which is why you get problems with postbacks in .NET when using AJAX.


The solution I wrote for my sites is:

    html = REGEX_BETWEEN_TAGS.Replace(html, "> <");
    html = REGEX_LINE_BREAKS.Replace(html, string.Empty);
    html = html.Replace("\r", "");
    html = html.Replace("//<![CDATA[", "");
    html = html.Replace("//]]>", "");
    html = html.Replace("\n", "");



About the author

Mads Kristensen
Program Manager at the Microsoft Web Platform team and founder of BlogEngine.NET.


Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.