May. 20th, 2010

haggholm: (Default)

My pet peeve, and current candidate for leading cause of bugs that are subtle and difficult to track down:

Poor naming.

It may sound trivial (if it doesn’t, you’re already on my team), but having proper variable names, and especially proper function and method names, is in my opinion critical to having a stable and maintainable system. We’ve all seen and laughed at Daily WTF samples of tables named table47; we’ve all cringed at people who named their variables foo and bar…and these are bad, they impede understanding, but what’s even worse than incomprehensible names are misleading names.

It’s been said before but bears repeating (and repeating, and repeating): The names of entities in code are an extremely important part of your documentation. Code, it’s often said, is never out of date, unlike any other kind of documentation—this isn’t really true, but it should be true. If you write a function called getPersonId(), then it had damned well return a person ID or I will come down on you like the wrath of the heavens.

Of course, if things have entirely the wrong names (e.g. because the author of the code was an idiot), then it tends to be pretty obvious. If you request an object ID but receive a table row, you’ll catch on pretty quickly to the fact that the function does not do what you expect it to do. But hopefully, the code you work on was not written by idiots at all, but by at least reasonably competent developers who named things in a way that reflects what the code actually does. And code does not go out of date. Right?

Here’s the problem: Unless you unit test your code, and do so comprehensively, the code can perfectly well go out of date. Here’s what, in my experience, happens: A developer writes a function to accomplish a task. He names it properly, uses it properly, and if possible slaps a few unit tests on it. Later, he discovers that while the function does what it should, the task isn’t quite what he expected. Perhaps fetchFooEntities(), rather than looking up all Foo entities, was written for a piece of the system that should really look up just the subset of active Foo entities. So he refactors the code accordingly. No other code needs refactoring because his was, to date, the only one that called this function.

And voilà!—the system now has a misleading function name. The code, at least the function name, is out of date, because fetchFooEntities() does the job of a function that should be called fetchActiveFooEntities(). The next unsuspecting developer who comes along will see that there’s a function to fetch Foo entities, and that (since it’s not parameterised) it it fetches them all. The function has a straightforward name, but what it actually does is subtly different—therefore there will be bugs. And because the difference is subtle, the bug will be subtle, too.

Please make sure that you give your functions and your variables appropriate and descriptive names. And please, if you change the semantics of those functions or variables, change the names accordingly.

haggholm: (Default)

PHP, among other problems, is a dynamically and (problematically) weakly typed language. What this means is that variables are cast, willy-nilly, to work in whatever fashion the programmer or the PHP interpreter feels is appropriate for the occasion. For example, a string "1" is equivalent to the integer value 1. Or at least equivalent-ish.

The equality test operator, ==, is defined in PHP for strings as for other built-in types. However, as the official documentation states,

If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically. These rules also apply to the switch statement.

And:

When a string is evaluated in a numeric context, the resulting value and type are determined as follows.

If the string does not contain any of the characters '.', 'e', or 'E' and the numeric value fits into integer type limits (as defined by PHP_INT_MAX), the string will be evaluated as an integer . In all other cases it will be evaluated as a float .

The value is given by the initial portion of the string . If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.

In the typical PHP context, where scripts are expected to deal with form input and so forth, this seems to make a lot of sense—everything arrives as string data, but the string "123" clearly encodes a number. Well, if it all worked properly, maybe it wouldn’t be so bad. But note that little subtlety above, that you might not expect if you hadn’t either seen it or read it in the docs: If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). This means that the following are all true:

"1" == 1
"a" != 1
"a" == 0

Yes—because any string that isn’t a number gets converted to zero, this is what you get. I saw this cause a nasty bug only today. (Personally, I prefer strcmp() et al for string comparisons. It’s clunky, but at least I know what it does, in all cases…I think. This is PHP, so one can never be quite sure.)

Another subtle consequence of the (accurate) definition from the documentation: If you compare a number with a string or the comparison involves numerical strings…, then it performs a numeric conversion. Thus, if at least one operand is an integer, the comparison is numeric; if both operands are numeric strings, the comparison is numeric; if both operands are strings, but only one of them is numeric, then it’s a regular string comparison. This makes sense…sort of…but combined with the 0 quirk above, this means that equality in PHP is not always transitive!

"a" ==  0  // true
  0 == "0" // true
"a" == "0" // false!

Normally you expect equality to be transitive, that is, if A = B and B = C, then obviously A = C. In PHP, though, this is not necessarily true: "a" = 0 and 0 ="0", but "a""0"! This follows the specification presented by the PHP documentation (the last comparison fails because both operands are strings but only one of them is numeric, so the comparison is lexical), but it doesn’t make much mathematical or common sense.

In fact, since a given binary relation ~ on a set A is said to be an equivalence relation if and only if it is reflexive, symmetric and transitive [Wikipedia], the “Equal” operator == in PHP is not, in fact, a valid equivalence relation at all.


This is not the only problem, however. A different problem—and one that, unlike the 0-comparisons above, I do not find mentioned or justified in the documentation, is that integers are parsed differently by the regular parser and the string conversion parser. This baffles me; not only is it stupid and weird, but it’s also strange that they don’t just reuse the same routines. The problem is introduced by the fact that PHP, like many other languages, accept integer literals in base 10 (decimal), base 16 (hexadecimal), and base 8 (octal). Octal integer literals are denoted by prefixing them with a 0, thus 01 means “1 in octal notation” (which equals 1 in decimal notation), 010 means “10 in octal notation” (which equals 9 in decimal notation), and 09 is invalid—it means “9 in octal notation”, which makes no sense.

Well, it turns out that for reasons best known to the PHP developers themselves, the automatic conversion of strings to numbers in PHP is handled by something analogous to the C library function strtod(), whose input format is described as such:

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

In other words, integer literals in PHP accept octal notation, but automatic conversions of strings to integers do not. Thus,

   01 == 1   // true, fine
 "01" == 1   // true, fine
  010 == 10  // false, fine -- it's equal to 8
"010" == 10  // true! The conversion assumes decimal notation
"010" == 010 // false!

This also means that casting a string $s to an integer, $x = (int)$s, is not equivalent to evaling it, eval("\$x = {$s}").

On a side note, octal numbers are handled in a pretty weird way to begin with. As the documentation warns you,

If an invalid digit is given in an octal integer (i.e. 8 or 9), the rest of the number is ignored.

Thus,

"09" == 0  // true
"09" == 09 // false; recall that "09" is decimal

This form of behaviour is why I dislike PHP so intensely. As the Zen of Python reminds us, Explicit is better than implicit, and Errors should never pass silently (Unless explicitly silenced). A language that silently squashes errors and returns 0 or null or some similar “empty-ish” value instead of warning you that something went wrong is a language that is not engineered to help you discover your errors, a language that would rather let you produce incorrect output than crash. (Crashing is way better than incorrect output. At least you know something is wrong. Silent logic errors kill people and crash space probes.)


Keep in mind when you code that in general you don’t know whether some string may be numeric or not—if it’s input (direct user input, data from a database, what have you), then the string might happen to be numeric, and you won’t know unless you check (e.g. with is_numeric()).

If you can’t get away from PHP (always an attractive option), I suggest that you stick with strcmp() and its relatives (strncmp(), strcasecmp(), and so on) if you want to compare strings, and explicit casts to integers (or floats), with validation (cf. is_numeric()), if you want to compare numbers. The bugs that are likely to arise from the inconsistencies above may be rare, but they can be subtle and they can be damnably annoying.


For the sake of completeness, the script that I used to discover and verify the above:

<?php

function run_test($test_string) {
	eval("\$result = ($test_string) ? 'true' : 'false';");
	echo "$test_string => $result\n";
}

$tests = array(
	'      "1" == 1        ',
	'      "a" == 1        ',
	'      "a" == 0        ',
	'      "a" == "0"      ',
	'      "0" == 0        ',
	'     "01" == 1        ',
	'    "010" == 10       ',
	'      010 == 10       ',
	'    "010" == 010      ',
	'   "0x10" == 0x10     ',
	'     "09" == 09       ',
	'      "0" == 09       ',
	'      "0" == "09"     ',
	'      "a" == 1e-1000  ',
	'  1e-1000 == "1e-1000"',
	'"1e-1000" == "0"      ',
	'"1e-1000" == "a"      ',
);

foreach ($tests as $test) {
	run_test($test);
}

$s = "010";
echo "\"$s\" == (int)\"$s\" ? " . ($s==(int)$s ? 'yes' : 'no') . "\n";
eval("\$x = {$s};");
echo "\"$s\" == $s ? " . ($s==$x ? 'yes' : 'no') . "\n";

Profile

haggholm: (Default)
Petter Häggholm

July 2025

S M T W T F S
  12 345
6789101112
13141516171819
20212223242526
2728293031  

Most Popular Tags