Building a Complete CodeIgniter Application: Part 3

Jim O'Halloran • October 24, 2007

php codeigniter feedignition

I left you at the end of part 2 with the news that there was a large security hole in the work we'd done so far. Readers who've done a bit of web development in the past should recognise the vulnerability as cross site scripting (XSS) and might understand the problems XSS can create. In this part I want to discuss some common security problems, and the steps we need to take to eliminate those.

Understand that security is not a product but a process. We can't buy security, we can't develop our code and "bolt on" some security later. Effective security needs to be built into the product/project from the time it's first written, and ongoing care and attention needs to be paid to making sure that every new line of code doesn't compromise our security in some way. If you need evidence of that, there's any number of Open Source CMS or forum products out there which were put together and released, and have struggled for many many releases (often with little success) to properly secure themselves against attack. Security in the applications we write is the result of education, awareness, care and attention to detail in every piece of code we write, secure code should be the result of the process we use to write our code, not an afterthought.

In the first two parts I've done a couple of things already which were security related, so lets first loop back and explain what we did and why. Then settle in while I explain cross site scripting (XSS) and we look at the HTML Purifier tool then apply it to the problem at hand. Finally I'll talk about handling user logins and secure storage of passwords.

Source Code Leakage / Direct Access to Included Files

Anything that's placed in underneath the document root on your web server will be accessible to all users of that web server (unless you employ some sort of .htaccess style login controls). This means that all fo your PHP scripts, images, etc can be accessed by a user with a standard web browser, if they know (or can guess) the correct URL. If the file has a .php extenstion the server will treat it as a PHP script and run it through the PHP interpreter first then return the results to the browser. If the PHP script in question was an include file this might lead to unexpected results and possibly errors. If your include files had a .inc extension instead of .php (some developers prefer it that way), the source code itself would be returned to the browser on most PHP installations. CodeIgniter's conventions prevent you from using the .inc extension on your controllers, models, views, libraries, and helpers but it's worth understanding the risk because some 3rd party libraries you might want to hook into CodeIgniter might still use it. Either case (the direct execution of include files, or source code leakage) is undesirable. How do we avoid that? There's two things we can do:

Most CI code modules include a test to ensure the script wasn't called directly, that's the "if (!defined('BASEPATH')) exit('No direct script access allowed');" line at the top of the file. Include this in each of the modules you create and you're mostly there.
The most effective remedy is to store your include files outside the web server's document root. With CI, the entire application is an include ( directly or indirectly) from the index.php file. This emans the "system" directory can be kept out side of the web server's document root, leaving just the index.php file exposed to the user's browser.

Back in part 1, we moved the system directly and edited index.php to allow us to create a situation where the html directory represented our document root, and the system folder was outside it, and therefore not web accessible. You can also use the same idea for protecting images, store them outside the document root and use a PHP script to control access to the image data as required. I described this technique in an earlier blog post.

The major advantage of the second approach is that if a bug is found in the PHP interpreter that causes it to return source code, or a server admin screws up the Apache configuration so that the PHP interpreter no longer works (very easy to do), the files outside the document root remain inaccessible, and an "attacker" can't access their content via specially crafted URLs.

SQL Injection

Together I would describe SQL Injection and XSS as the "big two". These are the things that can cause the most damage to your application, and it's users, but they also take a lot of work to properly eliminate. SQL Injection occurs when we take unvalidated user inout and use it in an SQL string without any form of "escaping" applied. If you have code like this, you're vulnerable to SQL Injection:

$sql = "SELECT * FROM users WHERE username = '".$_POST['username']."' AND password = '".$_POST['password']."'";

So what's wrong with that? Well, to begin with, let's say that my surname is used as the username, what do we get as SQL?

SELECT * FROM users WHERE username = 'O'Halloran' AND password = 'SuperSecret123';

Can you spot the problem? 10 points if you said "that's invalid SQL"! Exactly, the ' in my surname marks the end of the string, then our DB will try and interpret the "Halloran" part as an SQL keyword or identifier and fail. So now we've got a way to generate invalid SQL statements, but is that a security problem? Absolutely, what if our attacker entered "admin'; -- " as their username? Let's see that in SQL:

SELECT * FROM users WHERE username = 'admin'; -- ' AND password = 'SuperSecret123';

That's actually valid SQL because the "--" indicates the start of a comment... Lets trim off the comment text, the problem should be obvious now:

SELECT * FROM users WHERE username = 'admin';

Where'd our password check gone??? If we use the above fragment of PHP code to construct the SQL statement, it becomes trivially easy to bypass the login page. So what do we do about this? The problem occurs because our database uses ' characters to distinguish the start and end of strings. Given that these characters might also legitimately occur in the middle of a string it needs a way to tell the difference between the end of a string and a ' in the data. All databases do this by "escaping" the mid-string 's in some way. In MySQL we can use the sequence "\'" and MySQL knows that whenever we see a backslash followed by a single quote (') it shouldn't treat that single quote as the end of a string. The PHP magic quotes feature was intended to automagically fix this, but it created more problems than it solved. So there's two solutions to the problem, we can call an escaping function on the values before they are used in SQL statements, which in standard PHP would look like this:

$sql = "SELECT * FROM users WHERE username = '".mysql_real_escape_string($_POST['username'])."' AND password = '".mysql_real_escape_string($_POST['password'])."'";

CodeIgniter provides it's own string escaping functions as well. Different databases often have slightly different rules regarding escaping special characters, so we shouldn't use mysql_real_escape_string with a Postgres database. Using the CI escaping functions gives us some portability between database engines. The CI version would look like this:

$rs = $this->db->query("SELECT * FROM users WHERE username = '".$this->db->escape($username)." AND password = ".$this->db->escape($password));

CI also provides a very convenient form of escaping often called "prepared statements" or "query bindings". When using prepared statements you write your SQL statements and put question marks in the SQL string where a variable should go. You then supply an array of variables as the second parameter, and CI will take care of inserting those into the SQL string with proper escaping. Using prepared statements out example becomes:

$rs = $this->db->query("SELECT * FROM users WHERE username = ? AND password = ?", array($username, $password));

I actually find this form of query makes the SQL much easier to read, you can focus on the logic of the SQL without being distracted by the mechanics of constructing the string with escaped values.

In part 2, I posted the FeedItemModel class, which had both load and save methods.for feed items. If you take another look at the save() method now you can clearly see the use of prepared statements, ensuring all data is escaped properly before it's use in the database.

function save() {
    if ($this->_id !== false) {
        $this->db->query('UPDATE items SET link=?, title=?, text=?, updated_time=NOW() WHERE id=?', array($this->link, $this->title, $this->text, $this->_id));
    } else {
        $this->db->query('INSERT INTO items(feed_id, remote_id, link, title, text, created_time, updated_time) VALUES(?, ?, ?, ?, ?, NOW(), NOW())', array($this->feed_id, $this->remote_id, $this->link, $this->title, $this->text));
        $this->_id = $this->db->insert_id();
    }
}

Escaping data in this way doesn't stop an attacker fron trying to get rubbish data inserted into the database (we need to validate and filter for that) but it will prevent them from being able to generate invalid SQL in the process.

Cross Site Scripting (XSS)

SQL Injection vulnerabilities allow an attacker to manipulate the SQL that's used with the database in order to acheive unintended results. Cross Site Scripting on the other hand allows an attacker to manipulate the HTML that's sent to either their own, or another user's browser. Usually the objective would be to insert Javascript code into the HTML and have that executed for their own purposes. The javascript might redirect the user to a different web site, read cookies, or anything else that we can accomplish with javascript. Cross Site Scripting is a real issue for any site which relies on user generated content, a user could creat malicious content which "attacks" other users or site admins. For example, if a forum package was suceptible to XSS, a user on that forum could make a post which automatically redirected anyone who viewed that post to a different web site (maybe a competing forum). As another example, lets say we had a blog package that was vulnerable. An attacker could leave a comment which captured subsequent user's site cookies and submitted them to a remote web site the attacker controlled. Later the blog's owner might view that comment, the attacker captures his/her cookies and uses those to impersonate the site owner. As I said, pretty much any site which displays user generated content is a potential target for XSS attacks. That's pretty much all of Web 2.0!

Like SQL Injection the solution here involves escaping. XSS is often introduced via a script tag in the content we're trying to generate. The objective here is to remove HTML content which might allow the introduction of XSS. This might mean:

Stripping out all HTML tags and attempting to reduce input to plain text.
Limiting users to a small subset of HTML known to be safe from XSS attacks.
Escaping angle brackets and other special HTML characters so they become visible instead of being parsed as HTML by the browser.
Using an alternate markup language like bbCode or textile which doesn't give users the tools they need to create XSS attacks.

The code I posted in part 2 is extremely vulnerable to XSS, we're displaying title, and item content without sanitizing it in any way. This data comes from an RSS feed from some unknown person/company elsewhere on the net and shouldn't be trusted. You can see just how vulnerable we are right now, but running the following SQL statement then going to your FeedIgnition home page.

INSERT INTO `feedignition`.`items` (`id` , `feed_id` , `remote_id` , `title` , `link` , `text`, `updated_time` , `created_time`)
VALUES ( NULL , '1', '12345678901234567890123456789012', 'XSS Example', 'http://www.example.com/', '<script language="javascript">alert("XSS Vulnerable!");</script>If you see an alert box with the words "XSS Vulnerable" you''re vulnerable to XSS exploits.', NOW( ) , NOW( ));

This test displays a simple message box, but remember anything which can be done in javascript could be done at this point.

Filter Input, Escape Output

The basic security mantra for web development is "Filter Input, Escape Output". This means check all of the input to your application to ensure it conforms with your expectations, and use escaping functions on all of your output (e.g. SQL Queries).

Proper input filtering means validating all input to your system to make sure that the data you receive as input is the sort of data you expected to receive. If you expected a numeric value in a particular field, make sure you got one. If there's a select field with 3 possible values, make sure what you got on submission was one of these values. It's extremely easy for an attacker to bypass the HTML form itself and construct a HTTP request with their own data, so don't assume the only thing you'll ever get on form submit is what a user might have been able to enter.

Addressing XSS

So lets take stock for a moment of the things we're displaying in the feed view and asses it's risk.

Pagination - Generated entirely by CI, no user entered data. No risk.
Item Title - From external RSS feed. XSS vulnerable. HTML formatting not needed.
Link URL - From RSS feed, XSS vulnerable if javascript: psuedo-protocol is used. Need to restrict to just http, https, and ftp URLs.
Item Content - From RSS feed, XSS vulnerable. Want to allow an item to contain HTML (e.g. this post) but need to prevent XSS.

The pagination can remain "as is", there's no risk there, but lets look at each of the others in turn and see what we might do about them.

The item's title should be plain text, and even if it isn't we want to treat it as such to preserve the formatting of our header. I'll use the htmlentities() function provided by PHP to convert all "special" HTML characters (e.g. angle brackets) into their HTML representation (e.g. < into <). Edit system/application/views/feed/view.php and change the following line:

<h1><a href="<?=$item['link'];?>"><?=$item['title'];?></a></h1>

... to:

<h1><a href="<?=$item['link'];?>"><?=htmlentities($item['title']);?></a></h1>

We also want to change the update_all method in the controller to decode any entities which were supplied in the RSS feed, thus avoiding encoding them twice. Look for the following line in system/application/controllers/feed.php: ``` $this->FeedItemModel->title = $item->get_title();


 ... and change it to... ```
$this->FeedItemModel->title = htmlentities_decode($item->get_title());

I noted above that the link URLs shouldn't use the javascript Psuedo-protocol ( allowing this allows feeds to introduce javascript directly). We could to a simple string comparison on the URL and see if it uses an invalid protocol, but what I'd rather do is use a whitelist approach, and allow any valid http, https and ftp urls.

We'll decode the URL when we fetch it from the feed, and see if it matches our whitelist of protocols. In the controller's update_all() function, find the following line:

$this->FeedItemModel->link = $item->get_permalink();

Now let's make our changes. Replace the above line with:

$permalink = $item->get_permalink();
$scheme = @parse_url($permalink, PHP_URL_SCHEME);
if ($scheme === FALSE || !in_array($scheme, array('http', 'https', 'ftp'))) { // parse_url returns false for seriously malformed URLs
    $permalink = '';
}
$this->FeedItemModel->link = $permalink;

The final challenge is the item's content. We want to remove the possibility of script (and therefore XSS), while allowing otherwise harmless HTML (e.g. bulleted lists, bold, underlines, headings, etc). CI is of assistance here, there's an xss_clean method available on the input library which strips out some of the tags and other html entities normally used to introduce XSS. However when this was applied to the example we inserted into the database earlier, it left the javascript inside the script tag visible, which isn't ideal. Time to pull out the big guns!

HTML Purifier

HTML Purifier is a PHP library specifically designed to parse HTML, remove any element not on a whitelist, and put the HTML back together in a standards compliant form. This means that you can have a text area, allow the user to type anything they like into it, and HTML Purifier will remove XSS attacks and return W3C valid HTML. I like the idea of a whitelist based approach because it means that HTML Purifier won't need to be updated every time there's a new form of XSS attack found, the whitelist approach means that we remove everything except HTML tags that are known to be safe, rather than removing things which are known to be bad. Sounds like exactly what we want for this job!

Andy Mathijs at Mindloop has done the hard work figuring out how to integrate HTML Purifier and CodeIgniter . As discussed in part 2, its quite easy to integrate 3rd party libraries into CodeIgniter provided they meet a few simple rules.

They're placed in the system/application/libraries directory.
They follow the CodeIgniter file naming convention (at least for the main class file).
The constructor doesn't expect any arguments, as CI won't allow you to supply parameters via the $this->load->library() method. All of these can usually be overcome with minor tweaks to the library code.

Lets start by downloading HTML Purifier , then unzipping it to a temporary directory. I'm not aiming for PHP4 compatibility, so I got the strict version (Andy uses the standard version in the link above, so either should work).

wget http://htmlpurifier.org/releases/htmlpurifier-2.1.2-strict.zip
unzip htmlpurifier-2.1.2-strict.zip /tmp

Change into the temporary directory and pick out the bits that we need. We need both the HTMLPurifier directory and the HTMLPurifier.php file from the "library" directory of the bundle we just unpacked:

cd /tmp/htmlpurifier-2.1.2-strict/library
cp HTMLPurifier.php  /var/www/system/application/libraries/
cp -a HTMLPurifier  /var/www/system/application/libraries/

We also need to add the following line to the top of the HTMLPurifier.php file to add the HTML Purifier modules to the PHP include path.

set_include_path(dirname(FILE) . PATH_SEPARATOR . get_include_path() );

Now we can load HTML Purifier like any other CI library and use it. We'll purify the item content when we update the feeds rather than on output for efficiency reasons (a feed item may be displayed many times, but likely only written to the database once). So lets go back to the update_all function in our controller and add the following near the top, just underneath the block that loads the SimplePie library

$this->load->library('HTMLPurifier');
$purifier_config = HTMLPurifier_Config::createDefault();
$purifier_config->set('Cache', 'SerializerPath', BASEPATH .'cache');

Notice that, like SimplePie, we're reconfiguring the HTML Purifier library to use the CI cache directory for it's entify cache. This makes setting filesystem permissions much easier later when we need to install the app. Now we need to change the following line:

$this->FeedItemModel->text = $item->get_content();

to use HTML purifier:

$this->FeedItemModel->text = $this->htmlpurifier->purify($item->get_content(), $purifier_config);

So now we have a version of the HTML which we can be sure if free of XSS. We don't need to make any changes to the output in this case, just display it as is.

Wrapup

In part 1 , we got CodeIgniter intalled and arranged into a usable form, including .htaccess configuration. Then in part 2 we got a basic River of News style aggregator off the ground, introducing basic CodeIgniter MVC concepts along the way. While there wasn't a whole lot of new code in this part of the series, we've made some very important security enhancements. I hope the brief introduction to SQL Injection and XSS has been useful, and you'll see the techniques introduced here used throughout the project in coming parts.

Sorry this part has taken so long to materialise, it was written some weeks ago, but the code required additional testing before I was prepared to post it. I'm aiming for a fortnightly posting schedule for the remainder of the series, we'll see how we go.