A whitespace removal HTTP module for ASP.NET 2.0

by Mads Kristensen 3. October 2006 06:00

I’ve written about whitespace removal before, but I think this is the best solution so far. It is a plug n’ play HTTP module: drop the class into the App_Code folder, register it in the web.config, and it works. It uses regular expressions to identify and remove unnecessary whitespace from the output of the current .aspx page. The overhead of the module is almost too small to measure.

Implementation

Download WhitespaceModule.cs below and put it in the App_Code folder. Then add this to the <system.web> section of the web.config.

<httpModules>
  <add type="WhitespaceModule" name="WhitespaceModule"/>
</httpModules>

The cool thing about HTTP modules like this is that you can add them to any ASP.NET project without changing existing code. You can reduce the weight of your web pages even further by using an HTTP compression module, which can be implemented the same way as this module.

Download

WhitespaceModule.zip (0,97 KB)

Tags:

ASP.NET

Comments

10/4/2006 5:30:03 AM #

 Eber Irigoyen

so put the .cs file in each project where you need it?

why not just create an assembly?

Eber Irigoyen |

10/4/2006 5:42:10 AM #

Mads Kristensen

Sure, you could do that if you like that better. For these small, isolated classes I prefer to use the App_Code folder, mainly because it allows me to customize the classes for each project if I choose. If there's no need for customization that can't be handled through properties, an assembly would probably be the right way to go.

Mads Kristensen |

10/4/2006 2:06:59 PM #

Michel

It's exactly what I was looking for. I'm not good enough with regular expressions to tell, but do you preserve the whitespace inside a textarea?

Michel |

10/6/2006 11:08:59 AM #

 Bryan Peters

I threw it up on a site of mine and the homepage output went from 11.71k to 9.94k.  Wow.  I had no idea whitespace took up so much... space.

Thanks for sharing!

Bryan Peters |

10/6/2006 4:29:21 PM #

 Manu

Nice nice nice!
Our application has a bunch of grids (built dynamically) and details pages (quite hard-coded). For the grids it didn't really help a lot (+- 2%), but for the details... Wow! Around 42%!
I added a configuration parameter to be able to switch it on/off.
I will now try to change if (app.Request.RawUrl.Contains(".aspx")) to if (app.Request.RawUrl.Contains(".aspx") || app.Request.RawUrl.Contains(".js")) to see what it gives with our javascript files.
Thx for sharing!

Manu.

Manu |

10/27/2006 1:59:01 AM #

 michael

Bad things happen with this and the ASP.NET AJAX 1.0 beta. Prolly need to tweak the RegEx but not sure what it should be.

michael |

12/1/2006 7:29:12 AM #

 Sachman Bhatti

Is there any way to combine this with the CompressionModule?  I tried creating a stream

Stream strWhiteSpace = new WhitespaceFilter(app.Response.Filter);

and then passing strWhiteSpace to the constructors for GZipStream and DeflateStream but I get a lot of funk in my output

Sachman Bhatti |

12/1/2006 7:33:38 AM #

Mads Kristensen

Yes, there is. Make sure that the whitespace module is listed above the HTTP compression module in the web.config. Also make sure that Buffer=True in either the web.config or at page level. It is true by default, so if you've changed it, you have to change it back.

Mads Kristensen |

12/1/2006 7:51:01 AM #

 Sachman Bhatti

That didn't work for me, and it might have something to do with the heavy amount of JavaScript, because that's where it seems to have problems.

I did find a solution, by combining them in a different way but because I can't put HTML code here I'm going to link to the combination:

http://www.sachmanbhatti.com/compressandtrim.txt

Sachman Bhatti |

7/27/2007 9:54:48 PM #

Adrian Roman

Very nice and very useful. We have implemented it on our site Blocks4.NET and it works great. The HTML size was reduced by more than 20%. Thank you!

Adrian Roman Romania |

8/21/2007 3:59:24 AM #

Miguel

I think your module is very good, but I cannot use it because there is one case (which comes up often in page design) where the replacement changes the layout. For example, if you create two links with whitespace between them, the whitespace disappears and the two links run together:

  <a href="/x.aspx">first text</a>
  <a href="/y.aspx">second text</a>
The result would be: first textsecond text.

I don't really know what to change in this line of your code:
  html = reg.Replace(html, string.Empty);

perhaps:
  html = reg.Replace(html, "  ");//two spaces
That way you still save a lot of space and you avoid this issue.
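
To make the difference concrete, here is a small sketch. The regex is a simplified stand-in for the module's expression (an assumption for illustration, not the real one), and the class and method names are made up. Since browsers collapse a run of whitespace in inline content to one space, a single space is used as the replacement instead of two.

```csharp
using System.Text.RegularExpressions;

public static class InterTagWhitespaceDemo
{
    // Simplified stand-in for the module's regex (an assumption for this demo):
    // it matches runs of two or more whitespace characters between ">" and "<".
    private static readonly Regex BetweenTags = new Regex(@"(?<=>)\s{2,}(?=<)");

    // Behavior described above: the whitespace run is removed entirely,
    // so two adjacent inline elements end up touching.
    public static string Remove(string html)
    {
        return BetweenTags.Replace(html, string.Empty);
    }

    // Variation on the suggestion above: collapse the run to one space,
    // which keeps the visual gap between inline elements.
    public static string Collapse(string html)
    {
        return BetweenTags.Replace(html, " ");
    }
}
```

For the two links above, Remove leaves `</a><a` with nothing in between, while Collapse leaves `</a> <a`, so the link texts stay separated.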

Miguel Spain |

8/23/2007 6:28:25 PM #

Usama Nada

Thanks, man, for sharing your work with the community.
I will try to include your work in my own httpCompression module that I was working on last week, and I hope it works.
Good work.

Usama Nada Egypt |

10/16/2007 12:21:15 AM #

Mac

Thanks for your solution, but I have an issue with pages that have Microsoft AJAX enabled. If I have a ScriptManager on the page and use an UpdatePanel, the above solution doesn't seem to work. It seems the regular expression needs to change, but I don't know how. Can you please help me with this?

Also, I tried using your compression module, but it has a check for app.Request["HTTP_X_MICROSOFTAJAX"], so it seems it won't work for the scenario above either. I'd appreciate any help making this work, as my HTML page size is 640 KB (not kidding) and the business won't allow me to redesign because they want everything on the same page.

Mac United States |

1/9/2008 2:50:54 AM #

mcbeev

Very nice work, just wondering if the issues with ajax.net extensions have been looked at recently or not ?

mcbeev United States |

1/26/2008 9:19:58 AM #

Mike

Great work with this module.

As people above have mentioned, your regular expression does not always work with Microsoft AJAX and may mess up formatting in certain circumstances.  To overcome this, the following regular expression is safer while remaining at least 95% as effective:

private static Regex reg = new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

This matches whitespace from the start of the line only, which is where the vast majority appears anyway. It also removes blank lines. It leaves other whitespace alone.  As this expression is going to be applied multiple times, note the use of the RegexOptions.Compiled for efficiency.

The only places I can think of where this would not always work as desired are preformatted sections such as <pre> and <textarea> tags, where leading whitespace must be displayed by the browser. If this situation does not arise in your project, then I think the above expression should be 100% safe.
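
As a quick illustration of what the expression does, here is a standalone sketch that uses the pattern and options exactly as given above. The class and method names are only for the demo.

```csharp
using System.Text.RegularExpressions;

public static class LeadingWhitespaceDemo
{
    // Pattern and options copied from the comment above.
    private static readonly Regex Reg =
        new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

    // Removes indentation and blank lines. The line break that ends each
    // non-blank line is kept, so inline elements on separate lines stay apart.
    public static string Strip(string html)
    {
        return Reg.Replace(html, string.Empty);
    }
}
```

Calling Strip on two indented links separated by a blank line returns the same links with one line break between them and no indentation. Calling it on "<pre>\n    indented\n</pre>" returns "<pre>\nindented\n</pre>", which is the preformatted-text caveat mentioned above.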

Mike Australia |

1/29/2008 7:53:22 AM #

Daniel Clarke

Mads,
Great module, very easy to implement.  I can get a huge reduction - from 71k to 62k, or from 107k to 80k.  Thanks.

I too had the formatting issue.


Mike from Australia,
Thanks for the new regex.  It fixes the formatting issue.  The performance is this:  Mads' regex: 71k to 62k.  Yours: 71k to 63k.  Very comparable, but with better formatting.



Thanks all.

Daniel Clarke

Daniel Clarke United Kingdom |

1/29/2008 8:53:04 PM #

Mike

Hi again Daniel,

I noticed that it still does not work 100% with MS AJAX partial rendering.  An easy way around this is to change the context_BeginRequest method to the following:

void context_BeginRequest(object sender, EventArgs e)
{
  HttpApplication app = sender as HttpApplication;
  if(app.Request.RawUrl.Contains(".aspx") && app.Request.Headers["X-MicrosoftAjax"] != "Delta=true")
  {
    app.Response.Filter = new WhitespaceFilter(app.Response.Filter);
  }
}

It just checks if the MS AJAX is set to Delta=true, and if so, does not strip any whitespace.  It would be good to know what exactly is breaking it.  I looked for ten minutes before giving up - maybe someone smarter than me can work out how to strip whitespace from a delta update.

Mike Australia |

4/24/2008 4:40:11 AM #

pingback

Pingback from pimp.webdevelopernews.com

Removing Whitespace From Your Pages With ASP.NET

pimp.webdevelopernews.com |

5/14/2008 2:50:06 AM #

Malek chtioui

hello,
Great,
But just not working with ajax postbacks and partial rendering.

Malek chtioui Tunisia |

5/27/2008 11:10:15 PM #

JC

Here is the VB version


#Region "Using"

Imports System
Imports System.IO
Imports System.Web
Imports System.IO.Compression
Imports System.Text.RegularExpressions

#End Region

''' <summary>
''' Removes whitespace from the webpage.
''' </summary>
Public Class WhitespaceModule
    Implements IHttpModule

#Region "IHttpModule Members"

    Private Sub Dispose() Implements IHttpModule.Dispose
        ' Nothing to dispose;
    End Sub

    Private Sub Init(ByVal context As HttpApplication) Implements IHttpModule.Init
        AddHandler context.BeginRequest, AddressOf context_BeginRequest
    End Sub

#End Region

    Private Sub context_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
        Dim app As HttpApplication = TryCast(sender, HttpApplication)
        If app.Request.RawUrl.Contains(".aspx") Then
            app.Response.Filter = New WhitespaceFilter(app.Response.Filter)
        End If
    End Sub

#Region "Stream filter"

    Private Class WhitespaceFilter
        Inherits Stream

        Public Sub New(ByVal sink As Stream)
            _sink = sink
        End Sub

        Private _sink As Stream
        Private Shared reg As New Regex("(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}")

#Region "Properties"

        Public Overloads Overrides ReadOnly Property CanRead() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides ReadOnly Property CanSeek() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides ReadOnly Property CanWrite() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overloads Overrides Sub Flush()
            _sink.Flush()
        End Sub

        Public Overloads Overrides ReadOnly Property Length() As Long
            Get
                Return 0
            End Get
        End Property

        Private _position As Long
        Public Overloads Overrides Property Position() As Long
            Get
                Return _position
            End Get
            Set(ByVal value As Long)
                _position = value
            End Set
        End Property

#End Region

#Region "Methods"

        Public Overloads Overrides Function Read(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As Integer
            Return _sink.Read(buffer, offset, count)
        End Function

        Public Overloads Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
            Return _sink.Seek(offset, origin)
        End Function

        Public Overloads Overrides Sub SetLength(ByVal value As Long)
            _sink.SetLength(value)
        End Sub

        Public Overloads Overrides Sub Close()
            _sink.Close()
        End Sub

        Public Overloads Overrides Sub Write(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer)
            ' Copy only the bytes that belong to this call; the buffer may be larger than count.
            Dim data As Byte() = New Byte(count - 1) {}
            System.Buffer.BlockCopy(buffer, offset, data, 0, count)
            ' Decode the copied slice (not the whole buffer) so stale bytes are never written out.
            Dim html As String = System.Text.Encoding.[Default].GetString(data)

            ' Strip the whitespace matched by the regex.
            html = reg.Replace(html, String.Empty)

            Dim outdata As Byte() = System.Text.Encoding.[Default].GetBytes(html)
            _sink.Write(outdata, 0, outdata.GetLength(0))
        End Sub

#End Region

    End Class

#End Region

End Class

JC United States |

11/17/2008 12:08:07 AM #

Vale

It is not working with AJAX.
The regex should exclude the script blocks from the HTML.
This regex matches a script block from its opening tag to its closing tag:
<script[^>]*>[\w|\t|\r|\W]*?</script>
What I need is the inverse: everything in the HTML except the script blocks.
?????

Vale |

11/18/2008 1:36:26 AM #

Vale

Here is my solution. It removes whitespace and line breaks in .NET 2.0 with AJAX 1.0.

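// Requires: using System.IO; using System.Text; using System.Text.RegularExpressions; using System.Web.UI;
protected override void Render(HtmlTextWriter writer)
{
    if (this.Request.Headers["X-MicrosoftAjax"] != "Delta=true")
    {
        // Render the page into a string so it can be post-processed.
        StringBuilder sb = new StringBuilder();
        StringWriter sw = new StringWriter(sb);
        HtmlTextWriter hw = new HtmlTextWriter(sw);
        base.Render(hw);
        string html = sb.ToString();

        // Pull the script blocks out so the whitespace regex cannot touch them.
        Regex reg = new Regex(@"<script[^>]*>[\w|\t|\r|\W]*?</script>");
        MatchCollection mymatch = reg.Matches(html);
        html = reg.Replace(html, string.Empty);

        // Strip whitespace from the remaining markup.
        reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}|(?=[\r])\s{2,}");
        html = reg.Replace(html, string.Empty);

        // Re-insert the untouched script blocks just before </body>.
        StringBuilder scripts = new StringBuilder();
        foreach (Match match in mymatch)
        {
            scripts.Append(match.Value);
        }
        // string.Replace is used here so that "$" sequences inside the scripts
        // are not interpreted as regex substitution patterns.
        html = html.Replace("</body>", scripts.ToString() + "</body>");
        writer.Write(html);
    }
    else
    {
        // Partial-rendering (Delta=true) responses are left untouched.
        base.Render(writer);
    }
}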

Vale |

1/7/2009 1:44:34 AM #

pingback

Pingback from adrianroman.ro

  ASP.NET 2.0: Site optimization using HTTP Modules (Part 2 - Whitespace removal HTTP Module) by Adrian Roman

adrianroman.ro |

1/13/2009 10:30:32 PM #

Vesa Vainio

I find it a bit of a problem that the decision about using the filter is made already in BeginRequest. At that point of the page life cycle the application logic has not yet had a chance to set the Response.ContentType to match the real content type of the response.

I have a situation where the user can load PDF files from .aspx pages, and the PDF files were broken by this filtering.

I solved my issue by setting the Response.Filter = null in the Page_Load event of the pages concerned (and only for postbacks).

However, a preferable solution would be to be able to set the Response.Filter only after most of the page life cycle has been completed, but before actual render (or direct writes to Response) is done.

Any ideas on how to neatly solve this?

I do have a custom super class for all of my pages in this application and probably I could manage it there, but that solution cannot be part of this module itself.

Cheers,
Vesa

Vesa Vainio Finland |

2/13/2009 5:11:04 PM #

Vesa Vainio

The original regex has the formatting issue. The one posted by Mike doesn't. I mean this one:

private static Regex reg = new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

However, this does not work entirely. When a page of any reasonable size gets written through the module, there will be several calls to Write. In each call the regex is applied separately. And each time Write gets called, the start and end of the buffer can be at any point in the HTML code. Now imagine that somewhere in the middle you get a call where the beginning of the buffer happens to be in the middle of a tag and the first character just happens to be a space. That space gets trimmed away!

I happened to notice this in a case where a nice <span class="approved"> loses the space and becomes <spanclass="approved"> and breaks formatting on the page.

I verified by debugging that this is really so.

So basically, if you use this thing, you will sometimes get all kinds of funny errors in your pages. They will be rare, but they WILL happen. I'm not sure if the original regex has this problem or not.

I think the solution might be to look for the last line feed in the buffer and cut the buffer so that the regex is only applied to whole lines. Then save the unused tail part to be used in the next call to Write. And then finally write out the last saved data on a Flush.

I don't think this splitting and joining buffers would increase the overhead noticeably. I will probably code this thing when I have some extra time...
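
A minimal sketch of that idea follows; it has not been tested inside ASP.NET. The class name, the encoding parameter and the _midLine flag are hypothetical and not part of the original module. It uses the line-start regex quoted above, applies it to whole lines only, and holds back the unfinished last line until the next Write or a Flush. A Decoder is used so that a multi-byte character split across two buffers is not corrupted.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public class LineBufferedWhitespaceFilter : Stream
{
    private static readonly Regex Reg =
        new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled);

    private readonly Stream _sink;
    private readonly Encoding _encoding;
    private readonly Decoder _decoder;   // keeps split multi-byte characters intact
    private string _tail = string.Empty; // unfinished last line from the previous Write
    private bool _midLine;               // true after Flush has emitted part of a line

    public LineBufferedWhitespaceFilter(Stream sink, Encoding encoding)
    {
        _sink = sink;
        _encoding = encoding;
        _decoder = encoding.GetDecoder();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        char[] chars = new char[_encoding.GetMaxCharCount(count)];
        int n = _decoder.GetChars(buffer, offset, count, chars, 0);
        string text = _tail + new string(chars, 0, n);
        _tail = string.Empty;

        if (_midLine)
        {
            // This text continues a line that Flush already started,
            // so pass it through untouched up to the next line feed.
            int eol = text.IndexOf('\n');
            if (eol < 0) { WriteRaw(text); return; }
            WriteRaw(text.Substring(0, eol + 1));
            text = text.Substring(eol + 1);
            _midLine = false;
        }

        int cut = text.LastIndexOf('\n') + 1; // 0 when there is no line feed
        _tail = text.Substring(cut);          // hold back the unfinished line
        WriteRaw(Reg.Replace(text.Substring(0, cut), string.Empty));
    }

    public override void Flush()
    {
        if (_tail.Length > 0)
        {
            // The tail always begins at a line start, so the regex is safe here.
            string rest = Reg.Replace(_tail, string.Empty);
            _tail = string.Empty;
            _midLine = rest.Length > 0;
            WriteRaw(rest);
        }
        _sink.Flush();
    }

    public override void Close()
    {
        Flush();
        _sink.Close();
        base.Close();
    }

    private void WriteRaw(string s)
    {
        if (s.Length == 0) return;
        byte[] bytes = _encoding.GetBytes(s);
        _sink.Write(bytes, 0, bytes.Length);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

With this filter, writing "<div>\n  <span" followed by " class=\"approved\">ok</span>\n</div>" keeps the space inside the tag, because the second buffer is joined to the held-back tail before the regex runs. The _midLine flag covers the case where a Flush arrives in the middle of a line.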

Vesa Vainio Finland |

3/17/2009 6:15:07 AM #

Dalibor

Can this be used for specific pages only, by using an HttpHandler instead of the HttpModule? If so, how could I do this, and what modifications do I need to make?

Dalibor Canada |

4/3/2009 11:10:53 AM #

Krishan Murari

Thanks for the great article.

Do you have an alternative or a solution for removing the extra whitespace from pages created with the 1.1 framework and Visual Studio 2003?

Krishan Murari India |

4/9/2009 11:43:24 AM #

John Grinder

Hi Mads,

one question regarding this module:
does it also remove the line breaks in ASP.NET 2.0 output?

Would be cool if you could drop me a line!

Regards

John

John Grinder Germany |

5/7/2009 11:56:16 AM #

trackback

Creating an ActionResult to optimize our web pages.

Il blog di ugo lattanzi |

7/10/2010 8:40:01 PM #

pingback

Pingback from blog.josemanuelperez.es

Website optimization checklist | JMPerez Blog

blog.josemanuelperez.es |

