ColdFusion sanitize HTML
August 12, 2010
Jeff Atwood posted a white-list approach to sanitizing HTML output on RefactorMyCode (http://refactormycode.com/codes/333-sanitize-html). I ported the code to ColdFusion and decided to share it with the community.
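<cfcomponent>
    <!--- Matches anything that looks like a tag, including an unclosed one at the end of the string --->
    <cfset variables.reTags = '<[^>]*(>|$)'>
    <!--- Whitelisted plain tags (no attributes allowed), plus br and hr --->
    <cfset variables.reWhitelist = '(?x) ^</?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$ | ^<(b|h)r\s?/?>$'>
    <!--- Whitelisted anchors: an http, https, ftp or numeric in-page href, with an optional title --->
    <cfset variables.reWhitelistLinks = '(?x) ^<a\s href="(\##\d+|(https?|ftp)://[-a-z0-9+&@##/%?=~_|!:,.;\(\)]+)" (\stitle="[^"<>]+")?\s?>$ | ^</a>$'>
    <!--- Whitelisted images: an http(s) src with optional width, height, alt and title, in that order --->
    <cfset variables.reWhitelistImages = '(?x) ^<img\s src="https?://[-a-z0-9+&@##/%?=~_|!:,.;\(\)]+" (\swidth="\d{1,3}")? (\sheight="\d{1,3}")? (\salt="[^"<>]*")? (\stitle="[^"<>]*")? \s?/?>$'>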
    <!--- Returns every match of regex in text as an array of {text, index, length} structs --->
    <cffunction name="findAll" returntype="array" output="no" access="private">
        <cfargument name="regex" type="string" required="yes">
        <cfargument name="text" type="string" required="yes">
        <cfset var L = structNew()>
        <cfset L.result = []>
        <cfset L.offset = 1>
        <cfloop condition="true">
            <!--- returnsubexpressions=true yields a struct with pos and len arrays --->
            <cfset L.match = reFind(arguments.regex, arguments.text, L.offset, true)>
            <cfif L.match.len[1] GT 0>
                <cfset L.details = {
                    text = Mid(arguments.text, L.match.pos[1], L.match.len[1]),
                    index = L.match.pos[1],
                    length = L.match.len[1]
                }>
                <cfset arrayAppend(L.result, L.details)>
                <!--- Continue searching just past the end of this match --->
                <cfset L.offset = L.details.index + L.details.length>
            <cfelse>
                <cfbreak>
            </cfif>
        </cfloop>
        <cfreturn L.result>
    </cffunction>
    <!--- Removes every tag from html that does not match one of the whitelists --->
    <cffunction name="sanatize" output="no" returntype="string" access="public">
        <cfargument name="html" required="yes" type="string">
        <cfset var L = structNew()>
        <cfset L.result = arguments.html>
        <cfif len(arguments.html) GT 0>
            <cfset L.tags = findAll(variables.reTags, arguments.html)>
            <!--- Walk the tags from last to first so earlier indexes stay valid after a removal --->
            <cfloop from='#ArrayLen(L.tags)#' to='1' index='L.i' step='-1'>
                <!--- The whitelist patterns are lowercase only, so compare against a lowercased copy --->
                <cfset L.tagname = lcase(L.tags[L.i].text)>
                <cfset L.allowTag = reFind(variables.reWhitelist, L.tagname) GT 0
                    OR reFind(variables.reWhitelistLinks, L.tagname) GT 0
                    OR reFind(variables.reWhitelistImages, L.tagname) GT 0>
                <cfif NOT L.allowTag>
                    <cfset L.result = RemoveChars(L.result, L.tags[L.i].index, L.tags[L.i].length)>
                </cfif>
            </cfloop>
        </cfif>
        <cfreturn L.result>
    </cffunction>
</cfcomponent>
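As a rough illustration of what the component does, here is a minimal usage sketch. It assumes the code above is saved as Sanitizer.cfc next to the calling page; the variable names and the input string are made up for the example. Reading the code, only the non-whitelisted tags themselves are removed, so the text that sat between them stays in the result.

```cfml
<!--- Assumption: the component above is saved as Sanitizer.cfc in the same directory as this page --->
<cfset sanitizer = createObject("component", "Sanitizer")>

<!--- Hypothetical input: one whitelisted tag pair and one non-whitelisted pair --->
<cfset unsafe = '<b>bold</b> <script>alert(1)</script>'>
<cfset safe = sanitizer.sanatize(unsafe)>

<!--- htmlEditFormat() escapes the markup so the surviving tags are visible in the browser --->
<cfoutput>#htmlEditFormat(safe)#</cfoutput>
```

With this input the b tags should survive and the script tags should be stripped, leaving "alert(1)" behind as plain text.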
December 10, 2010 at 11:48 pm
This is great, Lawrence. Thanks for giving it a go. Can I ask you where the Regex argument comes from?
I’m not really sure if I understand the code correctly. I was under the impression that a single function would handle the sanitization, so that all I have to do is pass it a string. Your component has a couple of functions, and I’m not sure what the first one does or where the “regex” argument is supposed to come from. Can you please clarify some of this?
I would appreciate it.
Thanks again!
December 11, 2010 at 2:23 pm
Mohamad,
If you save the code in a file called Sanitizer.cfc then you can use it as follows.
<cfset sanitizer = CreateObject("component", "Sanitizer")>
<cfset safe = sanitizer.sanatize("some unsafe string")>
December 11, 2010 at 6:48 pm
Lawrence, I got it to work really nice. Many thanks–I really needed something identical and this was perfect.