Those who know don't talk.
Those who talk don't know.
Sometimes, PHP "as is" simply isn't enough. Although these cases are rare for the average user, professional applications will soon lead PHP to the edge of its capabilities, in terms of either speed or functionality. New functionality cannot always be implemented natively due to language restrictions and inconveniences that arise when having to carry around a huge library of default code appended to every single script, so another method needs to be found for overcoming these eventual lacks in PHP.
As soon as this point is reached, it's time to touch the heart of PHP and take a look at its core, the C code that makes PHP go.
This information is currently rather outdated, parts of it only cover early stages of the ZendEngine 1.0 API as it was used in early versions of PHP 4.
More recent information may be found in the various README files that come with the PHP source and the » Internals section on the Zend website.
"Extending PHP" is easier said than done. PHP has evolved to a full-fledged tool consisting of a few megabytes of source code, and to hack a system like this quite a few things have to be learned and considered. When structuring this chapter, we finally decided on the "learn by doing" approach. This is not the most scientific and professional approach, but the method that's the most fun and gives the best end results. In the following sections, you'll learn quickly how to get the most basic extensions to work almost instantly. After that, you'll learn about Zend's advanced API functionality. The alternative would have been to try to impart the functionality, design, tips, tricks, etc. as a whole, all at once, thus giving a complete look at the big picture before doing anything practical. Although this is the "better" method, as no dirty hacks have to be made, it can be very frustrating as well as energy- and time-consuming, which is why we've decided on the direct approach.
Note that even though this chapter tries to impart as much knowledge as possible about the inner workings of PHP, it's impossible to really give a complete guide to extending PHP that works 100% of the time in all cases. PHP is such a huge and complex package that its inner workings can only be understood if you make yourself familiar with it by practicing, so we encourage you to work with the source.
The name Zend refers to the language engine, PHP's core. The term PHP refers to the complete system as it appears from the outside. This might sound a bit confusing at first, but it's not that complicated ( see below). To implement a Web script interpreter, you need three parts:
The interpreter part analyzes the input code, translates it, and executes it.
The functionality part implements the functionality of the language (its functions, etc.).
The interface part talks to the Web server, etc.
The following sections discuss where PHP can be extended and how it's done.
As shown above, PHP can be extended primarily at three points: external modules, built-in modules, and the Zend engine. The following sections discuss these options.
External modules can be loaded at script runtime using the function dl(). This function loads a shared object from disk and makes its functionality available to the script to which it's being bound. After the script is terminated, the external module is discarded from memory. This method has both advantages and disadvantages, as described in the following table:
|External modules don't require recompiling of PHP.||The shared objects need to be loaded every time a script is being executed (every hit), which is very slow.|
|The size of PHP remains small by "outsourcing" certain functionality.||External additional files clutter up the disk.|
|Every script that wants to use an external module's functionality has to specifically include a call to dl(), or the extension tag in php.ini needs to be modified (which is not always a suitable solution).|
Third parties might consider using the extension tag in php.ini to create additional external modules to PHP. These external modules are completely detached from the main package, which is a very handy feature in commercial environments. Commercial distributors can simply ship disks or archives containing only their additional modules, without the need to create fixed and solid PHP binaries that don't allow other modules to be bound to them.
Built-in modules are compiled directly into PHP and carried around with every PHP process; their functionality is instantly available to every script that's being run. Like external modules, built-in modules have advantages and disadvantages, as described in the following table:
|No need to load the module specifically; the functionality is instantly available.||Changes to built-in modules require recompiling of PHP.|
|No external files clutter up the disk; everything resides in the PHP binary.||The PHP binary grows and consumes more memory.|
Of course, extensions can also be implemented directly in the Zend engine. This strategy is good if you need a change in the language behavior or require special functions to be built directly into the language core. In general, however, modifications to the Zend engine should be avoided. Changes here result in incompatibilities with the rest of the world, and hardly anyone will ever adapt to specially patched Zend engines. Modifications can't be detached from the main PHP sources and are overridden with the next update using the "official" source repositories. Therefore, this method is generally considered bad practice and, due to its rarity, is not covered in this book.
Prior to working through the rest of this chapter, you should retrieve clean, unmodified source trees of your favorite Web server. We're working with Apache (available at » http://httpd.apache.org/) and, of course, with PHP (available at » https://www.php.net/ - does it need to be said?).
Make sure that you can compile a working PHP environment by yourself! We won't go into this issue here, however, as you should already have this most basic ability when studying this chapter.
Before we start discussing code issues, you should familiarize yourself with the source tree to be able to quickly navigate through PHP's files. This is a must-have ability to implement and debug extensions.
The following table describes the contents of the major directories.
|php-src||Main PHP source files and main header files; here you'll find all of PHP's API definitions, macros, etc. (important). Everything else is below this directory.|
|php-src/ext||Repository for dynamic and built-in modules; by default, these are the "official" PHP modules that have been integrated into the main source tree. From PHP 4.0, it's possible to compile these standard extensions as dynamic loadable modules (at least, those that support it).|
|php-src/main||This directory contains the main php macros and definitions. (important)|
|php-src/pear||Directory for the PHP Extension and Application Repository. This directory contains core PEAR files.|
|php-src/sapi||Contains the code for the different server abstraction layers.|
|TSRM||Location of the "Thread Safe Resource Manager" (TSRM) for Zend and PHP.|
|ZendEngine2||Location of the Zend Engine files; here you'll find all of Zend's API definitions, macros, etc. (important).|
Discussing all the files included in the PHP package is beyond the scope of this chapter. However, you should take a close look at the following files:
php-src/main/php.h, located in the main PHP directory. This file contains most of PHP's macro and API definitions.
php-src/Zend/zend.h, located in the main Zend directory. This file contains most of Zend's macros and definitions.
php-src/Zend/zend_API.h, also located in the Zend directory, which defines Zend's API.
Zend is built using certain conventions; to avoid breaking its standards, you should follow the rules described in the following sections.
For almost every important task, Zend ships predefined macros that are extremely handy. The tables and figures in the following sections describe most of the basic functions, structures, and macros. The macro definitions can be found mainly in zend.h and zend_API.h. We suggest that you take a close look at these files after having studied this chapter. (Although you can go ahead and read them now, not everything will make sense to you yet.)
Resource management is a crucial issue, especially in server software. One of the most valuable resources is memory, and memory management should be handled with extreme care. Memory management has been partially abstracted in Zend, and you should stick to this abstraction for obvious reasons: Due to the abstraction, Zend gets full control over all memory allocations. Zend is able to determine whether a block is in use, automatically freeing unused blocks and blocks with lost references, and thus prevent memory leaks. The functions to be used are described in the following table:
|emalloc()||Serves as replacement for malloc().|
|efree()||Serves as replacement for free().|
|estrdup()||Serves as replacement for strdup().|
|estrndup()||Serves as replacement for strndup(). Faster than estrdup() and binary-safe. This is the recommended function to use if you know the string length prior to duplicating it.|
|ecalloc()||Serves as replacement for calloc().|
|erealloc()||Serves as replacement for realloc().|
To allocate resident memory that survives termination of the current script, you can use malloc() and free(). This should only be done with extreme care, however, and only in conjunction with demands of the Zend API; otherwise, you risk memory leaks.
The following directory and file functions should be used in Zend modules. They behave exactly like their C counterparts, but provide virtual working directory support on the thread level.
|Zend Function||Regular C Function|
|V_CHDIR_FILE()||Takes a file path as an argument and changes the current working directory to that file's directory.|
Strings are handled a bit differently by the Zend engine than other values such as integers, Booleans, etc., which don't require additional memory allocation for storing their values. If you want to return a string from a function, introduce a new string variable to the symbol table, or do something similar, you have to make sure that the memory the string will be occupying has previously been allocated, using the aforementioned e*() functions for allocation. (This might not make much sense to you yet; just keep it somewhere in your head for now - we'll get back to it shortly.)
Complex types such as arrays and objects require different treatment. Zend features a single API for these types - they're stored using hash tables.
To reduce complexity in the following source examples, we're only working with simple types such as integers at first. A discussion about creating more advanced types follows later in this chapter.
PHP 4 features an automatic build system that's very flexible. All modules reside in a subdirectory of the ext directory. In addition to its own sources, each module consists of a config.m4 file, for extension configuration. (for example, see » http://www.gnu.org/software/m4/)
All these stub files are generated automatically, along with .cvsignore, by a little shell script named ext_skel that resides in the ext directory. As argument it takes the name of the module that you want to create. The shell script then creates a directory of the same name, along with the appropriate stub files.
Step by step, the process looks like this:
:~/cvs/php4/ext:> ./ext_skel --extname=my_module Creating directory my_module Creating basic files: config.m4 .cvsignore my_module.c php_my_module.h CREDITS EXPERIMENTAL tests/001.phpt my_module.php [done]. To use your new extension, you will have to execute the following steps: 1. $ cd .. 2. $ vi ext/my_module/config.m4 3. $ ./buildconf 4. $ ./configure --[with|enable]-my_module 5. $ make 6. $ ./php -f ext/my_module/my_module.php 7. $ vi ext/my_module/my_module.c 8. $ make Repeat steps 3-6 until you are satisfied with ext/my_module/config.m4 and step 6 confirms that your module is compiled into PHP. Then, start writing code and repeat the last two steps as often as necessary.
The default config.m4 shown in The default config.m4. is a bit more complex:
Example #1 The default config.m4.
dnl config.m4 for extension my_module dnl Comments in this file start with the string 'dnl'. dnl Remove where necessary. This file will not work dnl without editing. dnl If your extension references something external, use with: dnl PHP_ARG_WITH(my_module, for my_module support, dnl Make sure that the comment is aligned: dnl [ --with-my_module Include my_module support]) dnl Otherwise use enable: dnl PHP_ARG_ENABLE(my_module, whether to enable my_module support, dnl Make sure that the comment is aligned: dnl [ --enable-my_module Enable my_module support]) if test "$PHP_MY_MODULE" != "no"; then dnl Write more examples of tests here... dnl # --with-my_module -> check with-path dnl SEARCH_PATH="/usr/local /usr" # you might want to change this dnl SEARCH_FOR="/include/my_module.h" # you most likely want to change this dnl if test -r $PHP_MY_MODULE/; then # path given as parameter dnl MY_MODULE_DIR=$PHP_MY_MODULE dnl else # search default path list dnl AC_MSG_CHECKING([for my_module files in default path]) dnl for i in $SEARCH_PATH ; do dnl if test -r $i/$SEARCH_FOR; then dnl MY_MODULE_DIR=$i dnl AC_MSG_RESULT(found in $i) dnl fi dnl done dnl fi dnl dnl if test -z "$MY_MODULE_DIR"; then dnl AC_MSG_RESULT([not found]) dnl AC_MSG_ERROR([Please reinstall the my_module distribution]) dnl fi dnl # --with-my_module -> add include path dnl PHP_ADD_INCLUDE($MY_MODULE_DIR/include) dnl # --with-my_module -> chech for lib and symbol presence dnl LIBNAME=my_module # you may want to change this dnl LIBSYMBOL=my_module # you most likely want to change this dnl PHP_CHECK_LIBRARY($LIBNAME,$LIBSYMBOL, dnl [ dnl PHP_ADD_LIBRARY_WITH_PATH($LIBNAME, $MY_MODULE_DIR/lib, MY_MODULE_SHARED_LIBADD) dnl AC_DEFINE(HAVE_MY_MODULELIB,1,[ ]) dnl ],[ dnl AC_MSG_ERROR([wrong my_module lib version or lib not found]) dnl ],[ dnl -L$MY_MODULE_DIR/lib -lm -ldl dnl ]) dnl dnl PHP_SUBST(MY_MODULE_SHARED_LIBADD) PHP_NEW_EXTENSION(my_module, my_module.c, $ext_shared) fi
If you're unfamiliar with M4 files (now is certainly a good time to get familiar), this might be a bit confusing at first; but it's actually quite easy.
Note: Everything prefixed with dnl is treated as a comment and is not parsed.
The config.m4 file is responsible for parsing the command-line options passed to configure at configuration time. This means that it has to check for required external files and do similar configuration and setup tasks.
The default file creates two configuration directives in the configure script: --with-my_module and --enable-my_module. Use the first option when referring external files (such as the --with-apache directive that refers to the Apache directory). Use the second option when the user simply has to decide whether to enable your extension. Regardless of which option you use, you should uncomment the other, unnecessary one; that is, if you're using --enable-my_module, you should remove support for --with-my_module, and vice versa.
By default, the config.m4 file created by ext_skel accepts both directives and automatically enables your extension. Enabling the extension is done by using the PHP_EXTENSION macro. To change the default behavior to include your module into the PHP binary when desired by the user (by explicitly specifying --enable-my_module or --with-my_module), change the test for $PHP_MY_MODULE to == "yes":