php’s missing array_usearch

Pure geekery today, kids.

php has a bunch of functions that work on arrays. (In most modern languages arrays are defined by classes or prototypes and have methods built-in. php is not a modern language.) There are functions like sort that take an array and put the elements of that array in order. That’s fine if you have an array of numbers or strings, but for more complex things, how does the code decide which comes first?

For that use, there is another function, usort, that takes an array, and you tell it what code to use to compare any two items.
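To make the comparison idea concrete, here is a minimal sketch using php's built-in usort. The book records and field names are made up for illustration.

```php
<?php
// Hypothetical book records, purely for illustration.
$books = [
    ['title' => 'Moby-Dick', 'year' => 1851],
    ['title' => 'Adventures of Huckleberry Finn', 'year' => 1884],
    ['title' => 'Walden', 'year' => 1854],
];

// The callback returns a negative number, zero, or a positive number,
// depending on whether $a belongs before, alongside, or after $b.
usort($books, function ($a, $b) {
    return $a['year'] <=> $b['year'];
});

// Pull out just the titles, now in order of publication year.
$titles = array_column($books, 'title');
// $titles is now ['Moby-Dick', 'Walden', 'Adventures of Huckleberry Finn']
```

Note that usort sorts the array in place and throws away the original keys.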

php also has a function called array_search which finds whether an array has a particular item in it. As before this works fine for simple items, but becomes less useful as the items in the array grow in complexity, or you want to find something that you don’t already have a full example of. What if you have a list of books and you want to find the one titled Huckleberry Finn?

It seems logical that there would be a search function where, as for usort, you tell the code what defines a “match”, and then off it goes to see what comes up. Logical, but it’s not there (unless it’s tucked away with a terrible name that makes no sense, which is entirely possible in php).

So after about the eleventy-hundredth time writing a little loop to find something in an array I said, “dangit, I’m writing array_usearch.”

function array_usearch(array $array, Closure $test) {
    $found = false;
    $iterator = new ArrayIterator($array);
 
    while ($found === false && $iterator->valid()) {
        if ($test($iterator->current())) {
            $found = $iterator->key();
        }
        $iterator->next();
    }
 
    return $found;
}

All this does is try each element in the array against a function you provide until the function returns true, then it returns the key for that item in the array. If no match is found, it returns false, the same way array_search does. Simple! Using it would look something like this:

// define a type to put into a list
class Thing  {
    public $id;
    public $name;
 
    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }
}
 
// make a list of them, mixed up a bit
$listOfThings = [
    new Thing(1, 'one'),
    new Thing(2, 'two'),
    new Thing(4, 'four'),
    new Thing(3, 'three'),
];
 
// find the index of the item with id = 4
$id4Index = array_usearch($listOfThings, function($thing) {
    return $thing->id === 4;
});
// $id4Index will now be 2

The function will work on all php array types, whether with numeric indices or strings.
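As a quick illustration of the string-key case, here is a sketch with an associative array. The function is restated compactly with foreach so the snippet runs on its own; it is meant to behave the same as the version above. The $capitals data is invented.

```php
<?php
// Compact restatement of array_usearch so this snippet stands alone.
function array_usearch(array $array, Closure $test) {
    foreach ($array as $key => $item) {
        if ($test($item)) {
            return $key;
        }
    }
    return false;
}

// Hypothetical data keyed by strings rather than numbers.
$capitals = [
    'cz' => 'Prague',
    'fr' => 'Paris',
    'jp' => 'Tokyo',
];

// Find the key of the first city with a five-letter name.
$fiveLetterKey = array_usearch($capitals, function ($city) {
    return strlen($city) === 5;
});
// $fiveLetterKey is 'fr': Paris comes before Tokyo in the array

// No city here is called Oslo, so this search comes back false.
$missingKey = array_usearch($capitals, function ($city) {
    return $city === 'Oslo';
});
```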

php purists might object to using the name array_usearch because all the other array_u* functions take a callable for defining the function, while this version uses a Closure. There are a couple of reasons: 1) Closures didn’t exist in php when the array_u* functions were defined, 2) it’s the 21st century now and other languages use closures in this manner for a reason, and 3) closures allow the function that gets passed to array_usearch to be reused with different values. With a little extra setup we can make searching super-clean:

// function that returns an anonymous function that captures the id to search for
$idClosure = function($id) {
    return function($item) use ($id) {
        // compare (===), don't assign (=)
        return $item->id === $id;
    };
};
 
$id4Index = array_usearch($listOfThings, $idClosure(4)); // value will be 2
$id2Index = array_usearch($listOfThings, $idClosure(2)); // value will be 1

Now we can write code compactly that can search for matches of arbitrary complexity, and we can create little factories to produce the search functions themselves, so the complexity is tucked away out of sight. This variation takes an array of key/value pairs and searches for items that match all of those values:

function firstIndexMatching(array $array, array $criteria, bool $useStrict = true) {
 
    if (count($criteria) < 1) {
        return false;
    }
 
    // create a closure that has captured the search criteria
    $testWithCriteria = function($criteria, $useStrict) {
 
        return function($item) use ($criteria, $useStrict) {
 
            foreach($criteria as $key => $value) {
                if (!isset($item->$key)) {
                    return false;
                } else if ($useStrict && $item->$key !== $value) {
                    return false;
                } else if (!$useStrict && $item->$key != $value) {
                    return false;
                }
            }
 
            return true;
        };
    };
 
    return array_usearch($array, $testWithCriteria($criteria, $useStrict));
}

Now if you have an array of people, for instance, you can search for the first match with a given name:

$joeCoolIndex = firstIndexMatching($people, [
    'firstName' => 'Joe',
    'lastName' => 'Cool'
]);

The loop and the comparisons are moved out of the way and all the main part of your code needs to do is supply the criteria for the search.

Ultimately after a search like this, you will want to have the item, not just its index. That’s easy enough, but don’t forget that if no match is found, array_usearch will return false, which php will often conflate with 0, so extra care has to be taken when using the returned index.

$joeCool = $joeCoolIndex !== false ? $people[$joeCoolIndex] ?? null : null;

Obviously this could be added to the firstIndexMatching function if one is never interested in the index itself.
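Here is a rough sketch of what an item-returning variation might look like. The name array_ufind is hypothetical, and the sample people are invented; the helper loops directly rather than calling array_usearch so that the snippet stands alone.

```php
<?php
// Hypothetical helper: like array_usearch, but returns the matching item
// itself, or null when nothing matches.
function array_ufind(array $array, Closure $test) {
    foreach ($array as $item) {
        if ($test($item)) {
            return $item;
        }
    }
    return null;
}

// Invented sample data.
$people = [
    (object) ['firstName' => 'Joe', 'lastName' => 'Cool'],
    (object) ['firstName' => 'Ann', 'lastName' => 'Example'],
];

// Joe sits at index 0, exactly the case where a loose check on the index
// would mistake a hit for a miss.
$joeCool = array_ufind($people, function ($person) {
    return $person->firstName === 'Joe' && $person->lastName === 'Cool';
});

// Nobody matches here, so the result is null.
$nobody = array_ufind($people, function ($person) {
    return $person->firstName === 'Zed';
});
```

The trade-off is that null becomes ambiguous if the array can legitimately contain null items, so the index-returning version is still the safer building block.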

And there you have it! A simple callback-based search function, ready to keep your main code clean and clear.


I’m Doing it Wrong

It is a lovely evening, and I’m enjoying patio life. My employer had a beer bash today, but The Killers are playing and I didn’t reserve a spot in time. So I came home instead, and after proper family greetings I repaired to the patio to do creative stuff. It’s blogtober, after all.

So what creative stuff have I been up to?

Creating a class that extends Event Service Sessions to add calendar server capabilities. (php is about the worst language on the planet for injecting new context-related capabilities into an existing class definition. In other words, php is not friendly to duck punching, or “Monkey Patching” as the kids call it these days.)

The linked Wikipedia article completely misses the most common use-case for this practice, in which I want to get a thing from some service and then augment it. But php doesn’t flex that way, so I just have to deal with it.

Which is to say, I’m doing Friday evening wrong. It is lovely out, my co-workers are chugging down the last of their beers as The Killers wrap up. I am on my patio with my dogs, the air finally starting to cool after an unusually warm day. It is nice. You’d think I could find a better use of this time than wrangling with a programming language.

But apparently you’d think wrong.


Time Not Well-Spent

Here it is, Whiskey-Exemption Thursday, and my weight is on-target so I can even have beer. The purpose of Thursday is to devote an evening to pushing the writing forward, and hang the consequences.

What have I been writing this fine evening? I’ve been trying to come up with the least-objectionable way to emulate Swift’s extensions to Protocols in php. The answer: there is no way.

Begin geek

Coding with php is coding with flint knives and bearskins; the power of php is in its wham-bam-thank-you-ma’am ability to do a quick task and then to go away.

Bless the movers behind php, they’re trying to evolve their language to catch up with the way people are using it these days. If they had known Drupal was coming along, they might not have been so quick-and-dirty before. Drupal might be slightly less awful as a result.

There are design patterns enabled by Swift that I get a little misty contemplating. Being able to add extensions (with executable code!) to protocols is enormously powerful. Having experienced that, I wanted to do the same thing in php, creating a trait “taggable” and having classes that used it automatically injected with the implementation. Injected, not inherited. Ain’t gonna happen.
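For the curious, this is roughly as close as I'd expect php to get, sketched with an interface plus a trait. All the names here (Taggable, TaggableBehavior, Article) are hypothetical, and this is an approximation under those assumptions, not an equivalent of what Swift does.

```php
<?php
// Hypothetical names throughout: Taggable, TaggableBehavior, Article.
interface Taggable {
    public function addTag(string $tag): void;
    public function tags(): array;
}

// The trait carries the implementation a Swift protocol extension would supply.
trait TaggableBehavior {
    private $tags = [];

    public function addTag(string $tag): void {
        $this->tags[] = $tag;
    }

    public function tags(): array {
        return $this->tags;
    }
}

// Each class has to opt in to both halves by hand; declaring the interface
// alone does not bring the implementation along.
class Article implements Taggable {
    use TaggableBehavior;
}

$article = new Article();
$article->addTag('php');
$article->addTag('geekery');
// $article->tags() is ['php', 'geekery']
```

The sticking point is that "use TaggableBehavior" line: every class has to ask for the code explicitly, and a class you didn't write can't be given it after the fact.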

End geek

At least now I’m writing prose about writing the code rather than writing the code itself. Progress, I guess.


Cascading Style Sheets (CSS) and PHP

Often when dealing with Cascading Style Sheets, or CSS, I find myself wishing that the CSS mechanism included variables. This is especially true when dealing with colors, since you want the same color applied to lots of different things. It can be a real pain to go back through an old style sheet and find the code for the color you want. I was quietly surprised that no one making up how CSS worked had addressed something like this.

Then, a while back I was giving a buddy of mine a few exercises to introduce him to the exciting world of Web programming, touching on CSS, HTML, PHP and MySQL. I gave him pretty much no guidance; I just thought up plans that would introduce him to the concepts and gave him a list of my favorite references. (I’ll be posting those exercises here in the nearish future.)

Anyway, without me to tell him how to do things, he went and dug around and one of the first style sheets he sent me for evaluation had a .php extension rather than .css.

Bingo! Once you see it in action, it’s obvious. PHP can be used to generate CSS files just as easily as it can be used to generate HTML files. Now my style sheets can change based on external conditions or can simply define a set of colors that all the style definitions share. Why did it take me so long to figure this out? It seems like this technique should be a lot more common than it is.

Here’s a quick code snippet for those who want to try it for themselves:

<?php
	header('Content-Type: text/css');
 
	$header_back_color = '#dddddd';
?>
 
#corner_table th {
	background-color:<?php echo $header_back_color ?>;
	text-align: center;
}

A couple of notes: the <?php MUST be the very first thing in the file. No empty lines, no spaces. The reason is that the next line, with the header() function, has to be called before the server sends any page content. (Once the server starts sending content back to the browser, it’s too late to be fiddling with the headers. Any whitespace outside the <?php tag will be considered content.) The header line is necessary because you need to tell your browser that what you are sending really is a css file.

In the <head> of the html file, you call the style sheet just like normal, but of course the file you fetch will have a php extension:

<link rel="stylesheet"
      href="http://yourdomain.com/css-tables.php"
      type="text/css"
      media="screen" />

That’s all there is to it. Why have I not done this with every css file?
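For a slightly bigger taste of the shared-colors idea, here is a sketch in which one small palette feeds several rules. The palette keys, the accent color, and the buildCss function are all made up for illustration; only #corner_table and the header_back color come from the snippet above.

```php
<?php
// Hypothetical palette; the keys and the accent color are for illustration only.
$palette = [
    'header_back' => '#dddddd',
    'accent'      => '#336699',
];

// Build the stylesheet as a string so each color is defined in exactly one place.
function buildCss(array $palette): string {
    return <<<CSS
#corner_table th {
    background-color: {$palette['header_back']};
    border-bottom: 2px solid {$palette['accent']};
}
a:hover {
    color: {$palette['accent']};
}

CSS;
}

// Same rule as before: the header goes out before any content does.
header('Content-Type: text/css');
echo buildCss($palette);
```

Change the accent color once in $palette and every rule that uses it follows along.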

Lost in Translation?

Even if you’re not a programmer, take a look at the following lines of code:

public function sendCommunication($oCommunication)
{
    if (self::emailMode != EMAIL_TEST_MODE_NONE) {
        if (self::emailMode == EMAIL_TEST_MODE_LOGGED_IN_ONLY) {
            // DO NOT COMMENT OUT THE FOLLOWING LINES
            // EVER
            // FOR ANY REASON
            // INSTEAD CHECK THE TEST MODE AND SET THE ADDRESS FIELDS ACCORDINGLY
            $oCommunication->to = $oCommunication->from;
            $oCommunication->cc = '';
        }

Now, I ask you, even if you’re not a programmer, you know there’s one thing you would never, ever, do to the above code. Right? Now let’s say you are a programmer, a professional, being paid because of your ability to find solutions to problems and express them in an abstract language.

Now further imagine that changing the above code can lead to the customers of the people paying for this work getting spammed with confusing emails with our client’s name on them.

Yeah, you guessed it.


Haloscan comments to WordPress – the nitty gritty.

As I mentioned in the previous episode, I recently had to move more than 8000 comments from my old comment system, Haloscan, and import them into WordPress. Haloscan served me well back in the day, but they are going away, and all my more recent comments are in the WordPress system anyway. Nice to have them all in one place.

The process turned out to be pretty easy. I found a script for importing comments from a different system, modified it, modified it some more, found a fundamental problem with it, fixed that, and in the end not much of the code remained from the example, except the part where the WordPress logo is displayed on the screen. I assume that part came from the code the guy copied to make the code that I copied.

Along the way I learned a couple of things. PHP is a pretty flexible language, but running a loop that sets up 8500 data structures and runs 25500 database queries exposes PHP’s primary weakness: memory management. The whiz kids who invented PHP designed it for a load/compile/execute/exit-and-clean-up flow. Memory allocated during execution is cleaned up when the program is done running (usually when the Web page is delivered). When you try to do heavy lifting with PHP, you have to start paying attention to getting your memory back before the traditional clean-up time.
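As a rough sketch of what "getting your memory back" can look like in a long loop: the buildRecord function below is a hypothetical stand-in for the real per-comment work, and how much the explicit cleanup helps will depend on the PHP version and on what else is holding references.

```php
<?php
// Hypothetical stand-in for building one comment's data structure.
function buildRecord(int $i): array {
    return ['id' => $i, 'text' => str_repeat('x', 1024)];
}

$processed = 0;
for ($i = 0; $i < 8500; $i++) {
    $record = buildRecord($i);
    // ... save $record somewhere ...
    $processed++;

    // Release the record now rather than waiting for the script to end.
    unset($record);

    // Every so often, ask PHP to collect any reference cycles left behind.
    if ($i % 1000 === 0) {
        gc_collect_cycles();
    }
}

// Peak memory is a handy number to watch while tuning a loop like this.
$peakBytes = memory_get_peak_usage();
```

The bigger win is usually structural: don't pile records up in an ever-growing array if each one can be handled and dropped on its own.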

The code I started with did a direct database query to add the comment to the comments table, but that got things out of sync with other tables. (The posts table keeps track of the number of comments that apply to it, presumably for performance reasons.) I dug into the core WordPress code and found the function they call to post comments, and I made my code call that function. I have no idea what bookkeeping chores that function does, and really I don’t care as long as they get done.

I didn’t worry about performance too much at first (after all, it only has to run once), but one of the database queries I did was really expensive (scanning all the posts for a specific set of characters). Even running on my local server it was slow, and I knew that if I tried something like that on my actual Web host alarms would go off and they’d shut me down for a while. I did a little optimization on that front, and it was enough.

The following script has some Muddle-specific code in it, but it might come in handy for others who need to move Haloscan comments to a new system. The part that parses Haloscan XML is pretty generic and would work for anyone; the part that saves the comments might be useful as a guide as well. The main difference others will have to deal with is where to get the proper post_id based on the thread field in the XML. In my case I had a link in each blog episode back to the Haloscan thread.

The HTML bit in the middle of the file is not essential; but it puts a nice WordPress logo on the screen when the script starts up. I inherited that from the script I started with.

NOTE: While this script has code in it specific to me, I am available to customize it for others who need to move their comments from Haloscan into another environment, or, for that matter, from any structured source into WordPress. Drop me a line!

<?php
 
if (!file_exists('../wp-config.php')) die("There doesn't seem to be a wp-config.php file. You must install WordPress before you import any comments.");
require('../wp-config.php');
 
function saveCommentToWP($comment, $dbRef, &$postThreads) {
    //echo "here's where the comment save happens <br/><br />";
    $thread = $comment['thread'];
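    // use the cached post ID if this thread has been seen before (avoids an undefined-index notice)
    $postID = isset($postThreads[$thread]) ? $postThreads[$thread] : 0;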
    if (!isset($postThreads[$thread])) {
        $query = "SELECT * FROM wp_posts WHERE post_content LIKE '%".$thread."%' AND post_status='publish'";
        $postID = $dbRef->get_var($query, 0);
        $postThreads[$thread] = $postID ? $postID : 0;
        if ($postThreads[$thread] == 0)
            echo ("<br />Thread $thread has no post!");
        else
            echo "<br />Thread $thread";
        flush();       // got to have real-time updates!
    }
 
    if ($postID && $postID != 0) {
        $userId = $comment['email'] == '[email protected]' ? 1 : 0;
 
        //set up the data the way wp_insert_comment expects it.
        $wp_commentData = array();
        $wp_commentData['comment_post_ID'] = (int) $postID;
        $wp_commentData['user_id'] = (int) $userId;
        $wp_commentData['comment_parent'] = 0;
        $wp_commentData['comment_author_IP'] = $comment['ip'];
        $wp_commentData['comment_agent'] = 'Haloscan';
        $wp_commentData['comment_date'] = $comment['datetime'];
        $wp_commentData['comment_date_gmt'] = $comment['datetime'];
        $wp_commentData['comment_approved'] = '1';
        $wp_commentData['comment_content'] = $comment['text'];
        $wp_commentData['comment_author'] = $comment['name'];
        $wp_commentData['comment_author_email'] = $comment['email'];
        $wp_commentData = wp_filter_comment($wp_commentData);
 
        $comment_ID = wp_insert_comment($wp_commentData);
 
        //echo ("<strong>saved comment $comment_ID</strong>");
    }
 
    // try to reclaim some memory
    unset($wp_commentData);
    unset($comment);
}
 
header( 'Content-Type: text/html; charset=utf-8' );
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>WordPress &rsaquo; Import Comments from RSS</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style media="screen" type="text/css">
    body {
        font-family: Georgia, "Times New Roman", Times, serif;
        margin-left: 20%;
        margin-right: 20%;
    }
    #logo {
        margin: 0;
        padding: 0;
        background-image: url(http://wordpress.org/images/logo.png);
        background-repeat: no-repeat;
        height: 60px;
        border-bottom: 4px solid #333;
    }
    #logo a {
        display: block;
        text-decoration: none;
        text-indent: -100em;
        height: 60px;
    }
    p {
        line-height: 140%;
    }
    </style>
</head><body> 
<h1 id="logo"><a href="http://wordpress.org/">WordPress</a></h1> 
 
<?php
 
// Bring in the data
$reader = new XMLReader();
if ($reader->open('export-8.xml')) {
    $postThreads = array();
    $thread = '';
    while ($reader->read()) {
        //echo "<br />read node type: ".$reader->nodeType.';     '.$reader->name.': '.$reader->value;
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'thread') {
            $thread = $reader->getAttribute('id');
        }
        if ($thread) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'comment') {
                // begin building comment
                $comment = array('thread' => $thread);
                $reader->read();
                while ( !($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'comment') ) {
                    if ($reader->nodeType == XMLReader::ELEMENT) {
                        $property = $reader->name;
                        $reader->read(); // assumes text element following element tag has the data
                        $comment[$property] = $reader->value;
                    }
                    $reader->read();
                }
                saveCommentToWP($comment, $wpdb, $postThreads);
            }
        }
    }
    $reader->close();
}
 
?>
 
 
</body>
</html>
