PHP Tips and Tricks with Rasmus Lerdorf

Jim O'Halloran • January 14, 2004

php linuxconfau-2004

After lunch, the next thing on the Linux.Conf.Au agenda for me was Rasmus Lerdorf's "PHP Tips & Tricks" presentation.

Rasmus started off with an introduction to where PHP fits in the programming landscape: what some people are doing with it, and what it's really designed for. Essentially Rasmus sees PHP as a templating system, which he describes as "a mechanism to separate logic from layout". As he says, "PHP is a general purpose templating system". Other templating systems have been built on top of PHP, but they usually end up adding loops and conditionals, and by that point, "Any general purpose templating system will eventually become PHP."

Rasmus briefly demoed a few applications written in PHP. The one that especially caught my eye was Cacti, a nice web-based graphical management and monitoring tool.

He also demoed the gdchart library and extension, creating a line chart in about 8 lines of PHP code. Gdchart is written in C and optimized for performance, with Yahoo!-type scalability in mind.

PHP can generate a Macromedia Flash animation. Tools are also available to decompile Macromedia-authored Flash files into PHP, which can then be rebuilt with dynamic data if required. That's pretty cool, and someone is using this sort of thing to create an online RPG-type game, which is really neat.

PECL is the PHP Extension Community Library. As PHP has grown more extensions, it has become harder to release, because each extension needs to be brought into a releasable state. PECL aims to solve that by removing many of the extensions from the main distribution and putting them into separate PEAR-installable packages.

When setting up PHP with MySQL, make sure that MySQL allows more connections than Apache. Apache defaults to 150 simultaneous connections while MySQL defaults to 100. Most of the time this will work, but when your PHP site gets Slashdotted you'll run out of MySQL connections, and scripts will fail because MySQL starts refusing connections before Apache does.

The PHP "magic quotes" feature automatically escapes quotes, etc., and so automatically prevents most forms of SQL injection attack. Wish I'd known about that a couple of weeks back when I was working on fixing SQL injections in MyHelpDesk.

For busy sites, a reverse proxy like Squid can be used to boost performance dramatically. You can also use the SquidGuard redirector to send different domain names to different Apache instances or to different machines altogether.

$PATH_INFO can be useful for creating friendly URLs. Using an Apache trick you can force a PHP script to be executed and return some results. You can also replace your 404 error page using an Apache configuration option and use a PHP script to redirect to different locations. Of course, if you really want your 404 page to 404, use "header('HTTP/1.0 404 Not Found');". Rasmus also demonstrated a really neat concept for using the 404 page to generate and cache dynamic image files.
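To make the PATH_INFO idea a bit more concrete, here's a rough sketch of my own (not from the slides). For a request like /article.php/2004/01/php-tips, Apache runs article.php and passes the trailing "/2004/01/php-tips" as PATH_INFO. The function name split_path_info and the URL layout are made up for illustration, and the usage line below reads $_SERVER['PATH_INFO'] rather than the bare $PATH_INFO global.

```php
<?php
// Hypothetical helper: split a PATH_INFO string such as "/2004/01/php-tips"
// into its non-empty segments. The web server has already URL-decoded it.
function split_path_info(string $pathInfo): array
{
    $segments = [];
    foreach (explode('/', $pathInfo) as $segment) {
        // Skip the empty pieces produced by leading, trailing or doubled slashes.
        if ($segment !== '') {
            $segments[] = $segment;
        }
    }
    return $segments;
}
```

In a script you'd call it as split_path_info(isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '') and then decide what to show based on the segments. They're still user input, so they need validating like anything else.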

All this talk of using the 404 page to do useful work prompted Rasmus to ask the question "Why should you decide where the information on your site is located, why not leave it to your users?". In other words, why not use the 404 page to try to conjure up some useful content (e.g. a search or something) for whatever URL the user types in? Interesting food for thought.

The "auto_prepend_file" configuration option allows you to specify a file which is automatically prepended to all PHP files. This can be handy for including common code without having to do so explicitly.

There are several options available (safe mode, open_basedir, etc.) for ISPs needing to isolate different PHP users from each other and from their host systems, but none are really 100% effective. When coding scripts, watch out for uninitialised variables, and never ever trust user data. Be paranoid with your validation of anything supplied from the client browser.
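As a small illustration of "be paranoid", here's a sketch of my own for checking a numeric ID before it goes anywhere near a query. The name clean_id and the 1 to 9 digit limit are assumptions for the example, not something from the talk.

```php
<?php
// Hypothetical helper: accept only a plain positive integer ID (1 to 9 digits,
// no leading zero) and return null for anything else.
function clean_id($raw): ?int
{
    // \z anchors at the very end of the string, so "42\n" is rejected too.
    if (!is_string($raw) || !preg_match('/^[1-9][0-9]{0,8}\z/', $raw)) {
        return null;
    }
    return (int) $raw;
}
```

Calling it as clean_id(isset($_GET['id']) ? $_GET['id'] : null) means $id is always explicitly set to either an integer or null, which also covers the uninitialised variable problem.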

The realpath() function will properly resolve a file name, figuring out any "/../"s which might be in use. Then prefix the resolved path with the document root before opening any files and you'll pretty much guarantee that nothing can be opened outside of the document root.
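Here's a sketch of a closely related variant, which is my own take rather than what was on the slides: join the requested name onto the document root first, resolve the result with realpath(), and then check that it still sits under the root. The function name resolve_inside_root is hypothetical.

```php
<?php
// Hypothetical helper: return the resolved path of $requested if it lies
// inside $root, or null if it is missing or escapes the root.
function resolve_inside_root(string $root, string $requested): ?string
{
    $rootReal = realpath($root);
    if ($rootReal === false) {
        return null; // The root itself does not exist.
    }

    // realpath() collapses "/../" and symlinks; it returns false if the
    // target does not exist.
    $full = realpath($rootReal . DIRECTORY_SEPARATOR . $requested);
    if ($full === false) {
        return null;
    }

    // Compare against "root/" so that "/var/www-evil" cannot pass for "/var/www".
    $prefix = rtrim($rootReal, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($full, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $full;
}
```

With a root of /var/www, a request for "../etc/passwd" resolves to /var/etc/passwd (if it exists at all), fails the prefix check, and comes back as null.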

I've seen it suggested that people use .php extensions for their include files instead of .inc for security reasons. However, it seems that .inc may be a better solution as long as Apache is configured not to serve up that file type at all.

If you allow files to be uploaded, be especially paranoid if they're to reside inside the document root. Validate that you've received the file type you expect, including opening up the file to ensure that its contents really do match up with the extension.
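One way to do the "open the file and check" step, sketched under my own assumptions: compare the first few bytes of the upload against the well-known signatures for a handful of image types. The function names and the short list of types are mine and are only illustrative; a real check would cover whatever types you actually accept.

```php
<?php
// Hypothetical helper: guess an image type from the file's leading bytes.
// Returns 'png', 'gif', 'jpg', or null if nothing matches.
function sniff_image_type(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 8);
    fclose($handle);
    if ($head === false) {
        return null;
    }

    // Leading "magic" bytes for each format (illustrative, not exhaustive).
    $signatures = [
        'png' => "\x89PNG\r\n\x1a\n",
        'gif' => 'GIF8',
        'jpg' => "\xff\xd8\xff",
    ];
    foreach ($signatures as $type => $magic) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}

// Hypothetical helper: true only if the sniffed type agrees with the
// extension of the name the client supplied.
function upload_matches_extension(string $path, string $originalName): bool
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if ($ext === 'jpeg') {
        $ext = 'jpg';
    }
    $type = sniff_image_type($path);
    return $type !== null && $type === $ext;
}
```

In an upload handler the two arguments would come from $_FILES, e.g. $_FILES['upload']['tmp_name'] and $_FILES['upload']['name'], where 'upload' is whatever your form field is called.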

Some of the major changes in PHP5 relate to object-oriented features, which I haven't really played with that much in PHP4, so I haven't really noted what's new. There is also a try/catch error handling mechanism, which should simplify the code in error-prone areas like connecting to a database. DOMXML has been improved, with a general cleanup and bug fixes.
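For anyone who hasn't seen the try/catch style before, here's a minimal sketch of how it might look around a database connection. The connect_or_throw function is a made-up stand-in for a real connect call, and the host name is invented; the point is only the shape of the error handling.

```php
<?php
// Hypothetical stand-in for a real database connect call: it throws on
// failure instead of returning false.
function connect_or_throw(string $host): string
{
    if ($host === '') {
        throw new Exception('No database host configured');
    }
    return "connected to $host";
}

function describe_connection(string $host): string
{
    try {
        return connect_or_throw($host);
    } catch (Exception $e) {
        // One place to handle any failure raised inside the try block.
        return 'Database unavailable: ' . $e->getMessage();
    }
}
```

Compared with checking a return value after every call, the failure path ends up in one catch block instead of being scattered through the script.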

PHP5 also introduces a new simple XML parser (SimpleXML), which should make working with XML a lot easier. However, SimpleXML loads the entire file into memory, which might make it unsuitable for processing large files.
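A quick sketch of what working with SimpleXML looks like. The element names, the helper name talk_titles and the sample document in the usage note are all invented for this example.

```php
<?php
// Hypothetical helper: map each <talk> element's speaker attribute to its
// <title> text. Returns an empty array if the XML cannot be parsed.
function talk_titles(string $xml): array
{
    // The whole document is parsed into memory in one go.
    $talks = @simplexml_load_string($xml);
    if ($talks === false) {
        return [];
    }

    $titles = [];
    foreach ($talks->talk as $talk) {
        // Child elements are read as properties, attributes with array syntax.
        $titles[(string) $talk['speaker']] = (string) $talk->title;
    }
    return $titles;
}
```

Given a string like '<talks><talk speaker="Rasmus Lerdorf"><title>PHP Tips and Tricks</title></talk></talks>', the function returns an array keyed by speaker name with the talk title as the value.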

PHP5 also bundles SQLite, which is an embedded SQL database engine that keeps each database in a single flat file. Pretty neat looking stuff too.

Rasmus also shared some hints on optimising PHP code. Essentially you should try to keep includes to a minimum, and use OOP techniques, layers, abstractions, etc. only where appropriate. Opcode caches can dramatically improve performance. Poorly written regular expressions can also slow things down. Finally, if you have plenty of spare CPU and limited bandwidth, try turning on output compression.

There are a few useful techniques for benchmarking PHP applications. First of all, have a look at the average size of the pages you're generating. If they're fairly large you may need to look at kernel buffers. Also run http_load from acme.com for load testing. While http_load is running, use vmstat to check for idle CPU time. If the CPU is idle, it suggests the system is I/O bound somewhere, and you need to improve throughput somehow. A fully utilized CPU suggests that some benefit can be gained by tuning the PHP code itself.

If we need to tune the PHP, then check the include_path and shorten it where possible. Turn off open_basedir if you don't need it. Also remove unused arrays from the variables_order setting in php.ini to prevent PHP from populating $_GET-style superglobal arrays that you never use. Also look into an opcode cache.

The XDebug extension can be used to get stack trace data for profiling. XDebug also has a modified error handler which gives a lot more debug information than the standard error handler. XDebug.org is the home for XDebug.

All in all it was a brilliant presentation, which could probably be renamed "Things every PHP programmer should know but probably doesn't". Rasmus's slides are available from his site.

I do try to keep these things reasonably short, but they seem to be getting longer every day.