A whitespace removal HTTP module for ASP.NET 2.0

Oct 2, 2006

I’ve written about whitespace removal before, but I think this is the best solution so far. It is a plug n’ play HTTP module: you add the class to the App_Code folder and register it in web.config. It uses regular expressions to find and remove unnecessary whitespace from the output of the current .aspx page. The overhead from this module is almost too small to measure.

Implementation

Download WhitespaceModule.cs below and put it in the App_Code folder. Then add this to the <system.web> section of web.config.

<httpModules>
  <add type="WhitespaceModule" name="WhitespaceModule"/>
</httpModules>

The cool thing about HTTP modules like this is that you can add them to any ASP.NET project without changing existing code. You can reduce the weight of your web pages even further by using an HTTP compression module, which can be implemented the same way as this module.

Download

WhitespaceModule.zip (0,97 KB)

Comments (27) -

Eber Irigoyen
10/3/2006 8:30:03 PM #

so put the .cs file in each project where you need it?

why not just create an assembly?

Mads Kristensen
10/3/2006 8:42:10 PM #

Sure you could do that if you like that better. For these small isolated types of classes I prefer to use the App_code folder. Mainly because it allows me to customize the classes for each project if I choose. If there's no need for customization that can't be handled through properties, an assembly would probably be the right way to go.

Michel
10/4/2006 5:06:59 AM #

It's exactly what I was looking for. I'm not good enough with regular expressions to tell, but do you take care of spaces inside a textarea?

Bryan Peters
10/6/2006 2:08:59 AM #

I threw it up on a site of mine and the homepage output went from 11.71k to 9.94k.  Wow.  I had no idea whitespace took up so much... space.

Thanks for sharing!

Manu
10/6/2006 7:29:21 AM #

Nice nice nice!
Our application has a bunch of grids (built dynamically) and details (quite hard coded). For the grids it didn't really help a lot (+- 2%), but for the details... Wow! Around 42%!
I added a configuration parameter to be able to switch it on/off.
I will now try to change if (app.Request.RawUrl.Contains(".aspx")) to if (app.Request.RawUrl.Contains(".aspx") || app.Request.RawUrl.Contains(".js")) to see what it gives with our javascript files.
Thx for sharing!

Manu.

michael
10/26/2006 4:59:01 PM #

Bad things happen with this and the ASP.NET AJAX 1.0 beta. Prolly need to tweak the RegEx but not sure what it should be.

Sachman Bhatti
11/30/2006 10:29:12 PM #

Is there any way to combine this with the CompressionModule?  I tried creating a stream

Stream strWhiteSpace = new WhitespaceFilter(app.Response.Filter);

and then passing strWhiteSpace to the constructors for GZipStream and DeflateStream but I get a lot of funk in my output

Mads Kristensen
11/30/2006 10:33:38 PM #

Yes there is. Make sure that the whitespace filter is written above the http compression module in the web.config. Also make sure that Buffer=True in either the web.config or at page level. It is true by default, so if you've changed it, you have to change it back.

Sachman Bhatti
11/30/2006 10:51:01 PM #

That didn't work for me, and it might have something to do with the heavy amount of javascript, because that's where it seems to have problems.

I did find a solution, by combining them in a different way but because I can't put HTML code here I'm going to link to the combination:

http://www.sachmanbhatti.com/compressandtrim.txt

Adrian Roman Romania
7/27/2007 12:54:48 PM #

Very nice and very useful. We have implemented it on our site Blocks4.NET and it works great. The html size was reduced by more than 20%. Thank you!

Miguel Spain
8/20/2007 6:59:24 PM #

I find your module very good, but I cannot use it because there is one case (which comes up often when designing pages) where the replacement changes the layout. For example, if you create one link followed by another and leave whitespace between them, that whitespace disappears and the two links run together:

  <a href="/x.aspx">first text</a>
  <a href="/y.aspx">second text</a>
The result would be: first textsecond text.

i don't know really what to change from this line in your code:
  html = reg.Replace(html, string.Empty);

perhaps:
  html = reg.Replace(html, "  ");//two spaces
In that case you save a lot of space and you don't have any incident related to this issue
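The trade-off Miguel describes can be sketched as follows. This is a minimal illustration, not the module's code: the class name `InterTagWhitespace` and the pattern `(?<=>)\s+(?=<)` are assumptions made for the example and are simpler than the module's regex. The sketch also collapses to one space rather than the two Miguel suggests, since browsers render a run of whitespace between inline elements as a single space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class InterTagWhitespace
{
    // Assumed, simplified pattern: any run of whitespace sitting
    // directly between a '>' and a '<'.
    private static readonly Regex BetweenTags =
        new Regex(@"(?<=>)\s+(?=<)", RegexOptions.Compiled);

    // Removes the whitespace entirely; adjacent inline elements
    // such as links will then run together when rendered.
    public static string Remove(string html)
    {
        return BetweenTags.Replace(html, string.Empty);
    }

    // Collapses the whitespace to a single space, which keeps the
    // visual gap between inline elements.
    public static string Collapse(string html)
    {
        return BetweenTags.Replace(html, " ");
    }
}
```

Calling `Collapse` on the two links above keeps one space between `</a>` and `<a`, while `Remove` joins them. Collapsing costs one extra byte per gap compared with removing.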

Mufaddal United States
8/21/2007 10:49:05 AM #

This is test comment

Usama Nada Egypt
8/23/2007 9:28:25 AM #

thanks man for sharing your work with the community
i will try to include your work in my own httpCompression module that i was working on last week, and i hope it works
good work

Mac United States
10/15/2007 3:21:15 PM #

Thanks for your solution. But I have an issue with pages that have Microsoft AJAX enabled. If I have a ScriptManager on the page and use an UpdatePanel, the above solution doesn't seem to work. It seems the regular expression needs to change, but I don't know how. Can you please help me with this?

I also tried your compression module, but it has a check for app.Request["HTTP_X_MICROSOFTAJAX"], so it seems it won't work for this scenario either. I would appreciate any help to make this work, as my html page size is 640 kb (not kidding) and the business won't allow me to redesign because they want everything on the same page.

mcbeev United States
1/8/2008 5:50:54 PM #

Very nice work, just wondering if the issues with ajax.net extensions have been looked at recently or not ?

Mike Australia
1/26/2008 12:19:58 AM #

Great work with this module.

As people above have mentioned, your regular expression does not always work with Microsoft AJAX and may mess up formatting in certain circumstances.  To overcome this, the following regular expression is safer while remaining at least 95% as effective:

private static Regex reg = new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

This matches whitespace from the start of the line only, which is where the vast majority appears anyway. It also removes blank lines. It leaves other whitespace alone.  As this expression is going to be applied multiple times, note the use of the RegexOptions.Compiled for efficiency.

The only places I can think of where this would not always work as desired would be in preformatted sections such as <pre> and <textarea> tags where leading whitespace is required to be displayed by the browser. If this situation does not arise in your project then I think the above expression should be 100% safe.
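A small sketch of what Mike's expression does to a string, including the <pre> caveat he mentions. The wrapper class `LeadingWhitespaceDemo` is a hypothetical name used only for this example; the regex is the one from his comment.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public static class LeadingWhitespaceDemo
{
    // The expression from the comment above: whitespace at the start of
    // each line. Because \s also matches line breaks, a match that
    // starts on a blank line swallows that line as well.
    private static readonly Regex Reg =
        new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

    public static string Strip(string html)
    {
        return Reg.Replace(html, string.Empty);
    }
}
```

For the input `"<div>\n    <p>Hi</p>\n\n</div>"`, `Strip` returns `"<div>\n<p>Hi</p>\n</div>"`: the indentation and the blank line are gone, and whitespace inside a line is untouched. The same call turns `"<pre>\n  indented\n</pre>"` into `"<pre>\nindented\n</pre>"`, which is the preformatted-text case where the result changes what the browser displays.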

Daniel Clarke United Kingdom
1/28/2008 10:53:22 PM #

Mads,
Great module, very easy to implement.  I can get a huge reduction - from 71k to 62k, or from 107k to 80k.  Thanks.

I too had the formatting issue.


Mike from Australia,
Thanks for the new regex.  It fixes the formatting issue.  The performance is this:  Mads' regex: 71k to 62k.  Yours: 71k to 63k.  Very comparable, but with better formatting.



Thanks all.

Daniel Clarke

Mike Australia
1/29/2008 11:53:04 AM #

Hi again Daniel,

I noticed that it still does not work 100% with MS AJAX partial rendering.  An easy way around this is to change the context_BeginRequest method to the following:

void context_BeginRequest(object sender, EventArgs e)
{
  HttpApplication app = sender as HttpApplication;
  if(app.Request.RawUrl.Contains(".aspx") && app.Request.Headers["X-MicrosoftAjax"] != "Delta=true")
  {
    app.Response.Filter = new WhitespaceFilter(app.Response.Filter);
  }
}

It just checks if the MS AJAX is set to Delta=true, and if so, does not strip any whitespace.  It would be good to know what exactly is breaking it.  I looked for ten minutes before giving up - maybe someone smarter than me can work out how to strip whitespace from a delta update.

Malek chtioui Tunisia
5/13/2008 5:50:06 PM #

hello,
Great,
But just not working with ajax postbacks and partial rendering.

JC United States
5/27/2008 2:10:15 PM #

Here is the VB version


#Region "Using"

Imports System
Imports System.IO
Imports System.Web
Imports System.IO.Compression
Imports System.Text.RegularExpressions

#End Region

''' <summary>
''' Removes whitespace from the webpage.
''' </summary>
Public Class WhitespaceModule
    Implements IHttpModule

#Region "IHttpModule Members"

    Private Sub Dispose() Implements IHttpModule.Dispose
        ' Nothing to dispose;
    End Sub

    Private Sub Init(ByVal context As HttpApplication) Implements IHttpModule.Init
        AddHandler context.BeginRequest, AddressOf context_BeginRequest
    End Sub

#End Region

    Private Sub context_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
        Dim app As HttpApplication = TryCast(sender, HttpApplication)
        If app.Request.RawUrl.Contains(".aspx") Then
            app.Response.Filter = New WhitespaceFilter(app.Response.Filter)
        End If
    End Sub

#Region "Stream filter"

    Private Class WhitespaceFilter
        Inherits Stream

        Public Sub New(ByVal sink As Stream)
            _sink = sink
        End Sub

        Private _sink As Stream
        Private Shared reg As New Regex("(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}")

#Region "Properties"

        Public Overloads Overrides ReadOnly Property CanRead() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides ReadOnly Property CanSeek() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides ReadOnly Property CanWrite() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides Sub Flush()
            _sink.Flush()
        End Sub

        Public Overloads Overrides ReadOnly Property Length() As Long
            Get
                Return 0
            End Get
        End Property

        Private _position As Long
        Public Overloads Overrides Property Position() As Long
            Get
                Return _position
            End Get
            Set(ByVal value As Long)
                _position = value
            End Set
        End Property

#End Region

#Region "Methods"

        Public Overloads Overrides Function Read(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As Integer
            Return _sink.Read(buffer, offset, count)
        End Function

        Public Overloads Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
            Return _sink.Seek(offset, origin)
        End Function

        Public Overloads Overrides Sub SetLength(ByVal value As Long)
            _sink.SetLength(value)
        End Sub

        Public Overloads Overrides Sub Close()
            _sink.Close()
        End Sub

        Public Overloads Overrides Sub Write(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer)
            Dim data As Byte() = New Byte(count - 1) {}
            System.Buffer.BlockCopy(buffer, offset, data, 0, count)
            ' Decode only the copied slice; decoding the whole buffer would ignore offset and count.
            Dim html As String = System.Text.Encoding.[Default].GetString(data)

            html = reg.Replace(html, String.Empty)

            Dim outdata As Byte() = System.Text.Encoding.[Default].GetBytes(html)
            _sink.Write(outdata, 0, outdata.GetLength(0))
        End Sub

#End Region

    End Class

#End Region

End Class

Vale
11/16/2008 3:08:07 PM #

Not working with AJAX.
A regex that excludes the scripts from the HTML would help.
This regex matches from the start to the end of a script tag:
<script[^>]*>[\w|\t|\r|\W]*?</script>
How do I write the inverse, to get just the HTML with the script tags excluded?

Vale
11/17/2008 4:36:26 PM #

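Here is my solution. It removes whitespace and line breaks in .NET 2.0 with AJAX 1.0. It pulls the script blocks out, strips whitespace from the remaining HTML, and puts the scripts back just before </body>. Partial (delta) AJAX updates are left untouched.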

protected override void Render(HtmlTextWriter writer)
        {
            if (this.Request.Headers["X-MicrosoftAjax"] != "Delta=true")
            {
                Regex reg = new Regex(@"<script[^>]*>[\w|\t|\r|\W]*?</script>");
                StringBuilder sb = new StringBuilder();
                StringWriter sw = new StringWriter(sb);
                HtmlTextWriter hw = new HtmlTextWriter(sw);
                base.Render(hw);
                string html = sb.ToString();
                MatchCollection mymatch = reg.Matches(html);
                html = reg.Replace(html, string.Empty);
                reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}|(?=[\r])\s{2,}");
                html = reg.Replace(html, string.Empty);                
                reg = new Regex(@"</body>");
                string str = string.Empty;
                foreach (Match match in mymatch)
                {
                    str += match.ToString();
                }
                html = reg.Replace(html, str + "</body>");
                writer.Write(html);
            }
            else
                base.Render(writer);
        }

Vesa Vainio Finland
1/13/2009 1:30:32 PM #

I find it a bit of a problem that the decision about using the filter is made already in BeginRequest. At that point of the page life cycle the application logic has not yet had a chance to set the Response.ContentType to match the real content type of the response.

I have a situation where the user can load PDF files from .aspx pages, and the PDF files were broken by this filtering.

I solved my issue by setting the Response.Filter = null in the Page_Load event of the pages concerned (and only for postbacks).

However, a preferable solution would be to be able to set the Response.Filter only after most of the page life cycle has been completed, but before actual render (or direct writes to Response) is done.

Any ideas on how to neatly solve this?

I do have a custom super class for all of my pages in this application and probably I could manage it there, but that solution cannot be part of this module itself.

Cheers,
Vesa

Vesa Vainio Finland
2/13/2009 8:11:04 AM #

The original regex has the formatting issue. The one posted by Mike doesn't. I mean this one:

private static Regex reg = new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

However, this does not work entirely. When a page of any reasonable size gets written through the module, there will be several calls to Write. In each call the regex is applied separately. And each time Write gets called, the start and end of the buffer can be at any point in the HTML code. Now imagine that somewhere in the middle you get a call where the beginning of the buffer happens to be in the middle of a tag and the first character just happens to be a space. That space gets trimmed away!

I happened to notice this in a case where a nice <span class="approved"> loses the space and becomes <spanclass="approved"> and breaks formatting on the page.

I verified by debugging that this is really so.

So basically, if you use this thing, you will sometimes get all kinds of funny errors in your pages. They will be rare, but they WILL happen. I'm not sure if the original regex has this problem or not.

I think the solution might be to look for the last line feed in the buffer and cut the buffer so that the regex is only applied to whole lines. Then save the unused tail part to be used in the next call to Write. And then finally write out the last saved data on a Flush.

I don't think this splitting and joining buffers would increase the overhead noticeably. I will probably code this thing when I have some extra time...
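The buffering idea Vesa outlines could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the module: the class name `LineBufferedWhitespaceFilter` is hypothetical, it uses Mike's `^\s+` expression, and it assumes an ASCII-compatible encoding such as UTF-8, where the byte 0x0A only ever means a line feed, so cutting the byte buffer there never splits a character. One deliberate difference from the proposal: the saved tail is written out when the stream is closed rather than on Flush, on the assumption that Flush may be called more than once per response, which would otherwise reintroduce mid-line cuts.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: applies the regex to whole lines only and carries
// the unfinished tail of each Write call over to the next one.
public class LineBufferedWhitespaceFilter : Stream
{
    private static readonly Regex Reg =
        new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

    private readonly Stream _sink;
    private readonly Encoding _encoding;
    private readonly MemoryStream _pending = new MemoryStream();

    public LineBufferedWhitespaceFilter(Stream sink, Encoding encoding)
    {
        _sink = sink;
        _encoding = encoding;
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    // Flush only forwards to the sink; the pending tail is kept until Close.
    public override void Flush() { _sink.Flush(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _pending.Write(buffer, offset, count);
        byte[] all = _pending.ToArray();
        int lastNewline = Array.LastIndexOf(all, (byte)'\n');
        if (lastNewline < 0) return; // no complete line yet, keep buffering

        // Process everything up to and including the last line feed.
        Emit(all, lastNewline + 1);

        // Keep the incomplete last line for the next call.
        _pending.SetLength(0);
        _pending.Write(all, lastNewline + 1, all.Length - lastNewline - 1);
    }

    private void Emit(byte[] bytes, int count)
    {
        string html = _encoding.GetString(bytes, 0, count);
        byte[] outdata = _encoding.GetBytes(Reg.Replace(html, string.Empty));
        _sink.Write(outdata, 0, outdata.Length);
    }

    // Close() ends up here; write the saved tail, then close the sink.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            byte[] tail = _pending.ToArray();
            _pending.SetLength(0);
            if (tail.Length > 0) Emit(tail, tail.Length);
            _sink.Close();
        }
        base.Dispose(disposing);
    }
}
```

With this filter, a first Write ending in `<span` and a second starting with ` class="approved">` produce `<span class="approved">`, because the second chunk is joined to the saved tail before the regex runs. The cost is one extra copy of the pending bytes per Write call.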

Dalibor Canada
3/16/2009 9:15:07 PM #

Can this be used for specific pages only, so instead of using the HttpModule, to use the HttpHandler? If so, how could I do this and what modifications do I need to make?

Krishan Murari India
4/3/2009 2:10:53 AM #

Thanks for the great article.

do you have any alternative or solution for removing the extra spaces for pages created with 1.1 framework and visual studio 2003?

John Grinder Germany
4/9/2009 2:43:24 AM #

Hi Mads,

one question regarding this module:
does it also remove the linebreaks in ASP.NET 2.0 output?

Would be cool if you could drop me a line!

Regards

John

About the author

Mads Kristensen
Program Manager at the Microsoft Web Platform team and founder of BlogEngine.NET.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.