URL Filtering: make text url-safe

June 25, 2013

URL filtering is important: everyone from marketers to search engines loves nice, text-rich URLs. If you want to create keyword-rich URLs based on your dynamic content, you need to clean and filter the source text first. Here is a handy function that generates text very much like WordPress slugs. You can use it pretty much anywhere. I’ve used it mostly in tagging systems, and in code that creates permalinks based on post titles or taxonomy terms.

The code

function filter($value, $force_lower_case = true) {
   if ($force_lower_case) {
      $value = strtolower($value);
   }
   // remove everything except A-Z, a-z, 0-9, hyphens, and whitespace
   // (the hyphen sits last in the class so it is always read as a literal)
   $value = preg_replace( "/[^a-zA-Z0-9\s-]/", '', $value );
   // convert each whitespace character to a hyphen
   $value = preg_replace( "/\s/", '-', $value );
   // replace multiple hyphens with a single hyphen
   $value = preg_replace( "/[-]+/", '-', $value );
   return $value;
}
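
To see what each step contributes, here is a rough trace of the same three replacements applied one at a time. The sample title is made up for illustration, and the intermediate variable names are not part of the function above.

```php
<?php
// Made-up sample input, lower-cased first as the function does by default.
$value = strtolower("Hello, World!  It's PHP -- 5.4");
// "hello, world!  it's php -- 5.4"

// Step 1: drop everything outside the white list.
$step1 = preg_replace('/[^a-zA-Z0-9\s-]/', '', $value);
// "hello world  its php -- 54"

// Step 2: each whitespace character becomes a hyphen.
$step2 = preg_replace('/\s/', '-', $step1);
// "hello-world--its-php----54"

// Step 3: runs of hyphens collapse to a single hyphen.
$step3 = preg_replace('/-+/', '-', $step2);
// "hello-world-its-php-54"

echo $step3, "\n";
```

Note that the last step is what cleans up the doubled hyphens left behind by consecutive spaces.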

Note: this is a white list solution. In other words, it tightly controls which characters are allowed to pass the URL filtering. In my experience this is a secure approach, though by default it rejects some characters that are valid in URLs. I believe the solution here is a good basic one, but if you do want to allow other characters you should do some research before adding them to the white list.

Bear in mind that the three regular expressions here carry a performance cost. Because of that, this code is best suited to running once when writing to the database, and is not ideal for running on (for example) each page load.
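
Coming back to the white list: if you do decide to allow more characters, one possible shape is sketched below. The function name filter_with_extras and its $extra parameter are hypothetical, not part of the original function. The underscore, period, and tilde used in the examples are "unreserved" characters in RFC 3986, which makes them reasonable candidates, but check your own requirements first.

```php
<?php
// Hypothetical variant: $extra lists additional characters to let through.
function filter_with_extras($value, $extra = '_', $force_lower_case = true) {
    if ($force_lower_case) {
        $value = strtolower($value);
    }
    // preg_quote escapes anything regex-special in $extra; '/' is our delimiter.
    $allowed = preg_quote($extra, '/');
    // Same white list as before, plus the extra characters.
    $value = preg_replace('/[^a-zA-Z0-9\s' . $allowed . '-]/', '', $value);
    $value = preg_replace('/\s/', '-', $value);
    return preg_replace('/-+/', '-', $value);
}

echo filter_with_extras("snake_case Title!"), "\n";         // snake_case-title
echo filter_with_extras("Release v1.2 notes", '._~'), "\n"; // release-v1.2-notes
```

Passing the extras through preg_quote keeps a stray character such as ] or ^ from changing the meaning of the character class.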

If you want faster URL filtering, this code could be reduced to two (or possibly even one) preg_replace statements. I think the second preg_replace could alternatively be converted to a str_replace call, which should be a little faster, although str_replace matches literal strings, so each whitespace character you care about would have to be listed explicitly. See some quantification of the speed differences here, but note that the tests there are for 100,000 iterations. A single pass (or up to a few dozen) is not likely to be noticeably different.
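
As a minimal sketch of the two-statement idea, the whitespace conversion and the hyphen collapsing can be folded into one pattern that matches any run of whitespace and hyphens. The name filter_two_pass is hypothetical, and I have not benchmarked it against the original.

```php
<?php
// Hypothetical two-pass variant of the filter.
function filter_two_pass($value, $force_lower_case = true) {
    if ($force_lower_case) {
        $value = strtolower($value);
    }
    // Pass 1: drop everything outside the white list.
    $value = preg_replace('/[^a-zA-Z0-9\s-]/', '', $value);
    // Pass 2: turn each run of whitespace and/or hyphens into one hyphen.
    return preg_replace('/[\s-]+/', '-', $value);
}

echo filter_two_pass("Hello, World!  It's PHP -- 5.4"), "\n"; // hello-world-its-php-54
```

This should give the same result as the three-step version, since a run of mixed spaces and hyphens ends up as a single hyphen either way.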

To measure the actual speed difference, try my GLPTimer PHP performance script.
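
If you just want a quick number, a rough timing loop built on microtime(true) is sketched below. This is not the GLPTimer script; the helper name time_callable and the sample string are assumptions for illustration, and the results will vary by machine and PHP version.

```php
<?php
// Hypothetical helper: run $fn repeatedly and return the elapsed seconds.
function time_callable($fn, $iterations = 100000) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$sample = "Hello,   World -- again";

// Regex-based whitespace conversion.
$with_regex = time_callable(function () use ($sample) {
    return preg_replace('/\s/', '-', $sample);
});

// Literal replacement; only handles the plain space character.
$with_str = time_callable(function () use ($sample) {
    return str_replace(' ', '-', $sample);
});

printf("preg_replace: %.4fs, str_replace: %.4fs\n", $with_regex, $with_str);
```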

If you have any comments or questions about this code please use the comment form. Cheers!
