August 12, 2010 / lawrencebarsanti

ColdFusion sanitize HTML

Jeff Atwood posted a white-list approach to sanitizing HTML output on RefactorMyCode (http://refactormycode.com/codes/333-sanitize-html). I ported the code to ColdFusion and decided to share it with the community.

<cfcomponent>
	<!--- reTags matches any tag-like run: "<" up to the next ">" or the end of the string.
	      The three whitelist patterns use (?x), so the spaces inside them are ignored. --->
	<cfset variables.reTags = '<[^>]*(>|$)'>
	<cfset variables.reWhitelist = '(?x) ^</?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$ | ^<(b|h)r\s?/?>$'>
	<cfset variables.reWhitelistLinks = '(?x) ^<a\s href="(\##\d+|(https?|ftp)://[-a-z0-9+&@##/%?=~_|!:,.;\(\)]+)" (\stitle="[^"<>]+")?\s?>$ | ^</a>$'>
	<cfset variables.reWhitelistImages = '(?x) ^<img\s src="https?://[-a-z0-9+&@##/%?=~_|!:,.;\(\)]+" (\swidth="\d{1,3}")? (\sheight="\d{1,3}")? (\salt="[^"<>]*")? (\stitle="[^"<>]*")? \s?/?>$'>
		
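	<!--- Returns every match of regex in text as an array of structs holding the matched
	      text, its 1-based start index, and its length. --->
	<cffunction name="findAll" returntype="array" output="no" access="private">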
		<cfargument name="regex" type="string" required="yes">
		<cfargument name="text" type="string" required="yes">

		<cfset var L = structNew()>
		<cfset L.result = []>
		<cfset L.offset = 1>		
		<cfloop condition="true">
			<cfset L.match = reFind(arguments.regex, arguments.text, L.offset, true)>

			<cfif L.match.len[1] GT 0>
				<cfset L.details = {
					text = Mid(arguments.text, L.match.pos[1], L.match.len[1]),
					index = L.match.pos[1],
					length = L.match.len[1]
				}>		
				<cfset arrayAppend(L.result, L.details)>
				<cfset L.offset = L.details.index + L.details.length>
			<cfelse>
				<cfbreak>
			</cfif>
		</cfloop>
		<cfreturn L.result>
	</cffunction>
	
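	<!--- Removes every tag in html that fails all three whitelist patterns.
	      Text between tags is left untouched. --->
	<cffunction name="sanatize" output="no" returntype="string" access="public">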
		<cfargument name="html" required="yes" type="string">
		<cfset var L = structNew()>
		<cfset L.result = arguments.html>		
		<cfif len(arguments.html) GT 0>
			<cfset L.tags = findAll(variables.reTags, arguments.html)>
			<!--- Walk the tags from last to first so that removing one does not shift the positions of the earlier ones. --->
			<cfloop from='#ArrayLen(L.tags)#' to='1' index='L.i' step='-1'>
				<cfset L.tagname = lcase(L.tags[L.i].text)>
				<cfset L.allowTag = reFind(variables.reWhitelist, L.tagname) GT 0
												OR reFind(variables.reWhitelistLinks, L.tagname) GT 0
												OR reFind(variables.reWhitelistImages, L.tagname) GT 0>
				<cfif NOT L.allowTag>
					<cfset L.result = RemoveChars(L.result, L.tags[L.i].index, L.tags[L.i].length)>
				</cfif>
			</cfloop>
		</cfif>
		<cfreturn L.result>
	</cffunction>
</cfcomponent>
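
To see what the whitelist does to a mixed input, here is a minimal sketch. It assumes the component above is saved as Sanitizer.cfc next to the calling page; the file name and the variable names `unsafe` and `safe` are only illustrative.

```cfml
<!--- Assumption: the component above is saved as Sanitizer.cfc in the same directory. --->
<cfset sanitizer = createObject("component", "Sanitizer")>

<!--- <b> is on the whitelist; <script> is not. --->
<cfset unsafe = '<b>bold</b><script>alert(1)</script>'>
<cfset safe = sanitizer.sanatize(unsafe)>

<!--- The script tags are stripped, but the text between them stays in the result.
      htmlEditFormat escapes the result so the surviving markup is shown as text. --->
<cfoutput>#htmlEditFormat(safe)#</cfoutput>
```

Note that the component judges each tag on its own and does not balance tags: if an opening tag is rejected (for example, a link with an extra attribute), its closing tag can still pass the whitelist and remain in the output.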

4 Comments

  1. Mohamad / Dec 10 2010 11:48 pm

    This is great, Lawrence. Thanks for giving it a go. Can I ask you where the Regex argument comes from?

    I’m not really sure if I understand the code correctly. I am under the impression that a single function will handle the sanitization process. All I have to do is pass it a string. Your component has a couple of functions and I’m not sure what the first one does, or where the “regex” argument is supposed to come from… can you please clarify some of this stuff?

    I would appreciate it.

    Thanks again!

  2. lawrencebarsanti / Dec 11 2010 2:23 pm

    Mohamad,
    If you save the code in a file called Sanitizer.cfc then you can use it as follows.

    <cfset sanitizer = CreateObject("component", "Sanitizer")>
    <cfset safe = sanitizer.sanatize("some unsafe string")>

    • Mohamad / Dec 11 2010 6:48 pm

      Lawrence, I got it to work really nice. Many thanks–I really needed something identical and this was perfect.

  3. prabhakar pittala / Oct 17 2012 2:37 am

    Great, it's working for me. Many thanks.
