haggholm: (Default)

PHP, among other problems, is a dynamically and (problematically) weakly typed language. What this means is that variables are cast, willy-nilly, to work in whatever fashion the programmer or the PHP interpreter feels is appropriate for the occasion. For example, a string "1" is equivalent to the integer value 1. Or at least equivalent-ish.

The equality test operator, ==, is defined in PHP for strings as for other built-in types. However, as the official documentation states,

If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically. These rules also apply to the switch statement.

And:

When a string is evaluated in a numeric context, the resulting value and type are determined as follows.

If the string does not contain any of the characters '.', 'e', or 'E' and the numeric value fits into integer type limits (as defined by PHP_INT_MAX), the string will be evaluated as an integer. In all other cases it will be evaluated as a float.

The value is given by the initial portion of the string. If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.

In the typical PHP context, where scripts are expected to deal with form input and so forth, this seems to make a lot of sense—everything arrives as string data, but the string "123" clearly encodes a number. Well, if it all worked properly, maybe it wouldn’t be so bad. But note that little subtlety above, that you might not expect if you hadn’t either seen it or read it in the docs: If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). This means that the following are all true:

"1" == 1
"a" != 1
"a" == 0

Yes—because any string that isn’t a number gets converted to zero, this is what you get. I saw this cause a nasty bug only today. (Personally, I prefer strcmp() et al for string comparisons. It’s clunky, but at least I know what it does, in all cases…I think. This is PHP, so one can never be quite sure.)
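A minimal sketch of the stricter alternatives mentioned above. The helper name same_string() is hypothetical, not from any library. The example deliberately avoids asserting results of == between strings and numbers, because those depend on the interpreter version: the results listed above are those of the PHP 5 era, and PHP 8 changed the rule so that "a" == 0 is false there.

```php
<?php
// Hypothetical helper: compare two values strictly as strings.
function same_string($a, $b) {
    return strcmp((string)$a, (string)$b) === 0;
}

var_dump("1" === 1);             // bool(false): different types, no conversion
var_dump("a" === "a");           // bool(true)
var_dump(same_string("a", "0")); // bool(false)
var_dump(same_string("1", 1));   // bool(true): compared as the strings "1" and "1"
```

Both === and strcmp() skip the numeric conversion entirely, so their results do not depend on whether a string happens to look like a number.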

Another subtle consequence of the (accurate) definition from the documentation: If you compare a number with a string or the comparison involves numerical strings…, then it performs a numeric conversion. Thus, if at least one operand is an integer, the comparison is numeric; if both operands are numeric strings, the comparison is numeric; if both operands are strings, but only one of them is numeric, then it’s a regular string comparison. This makes sense…sort of…but combined with the 0 quirk above, this means that equality in PHP is not always transitive!

"a" ==  0  // true
  0 == "0" // true
"a" == "0" // false!

Normally you expect equality to be transitive, that is, if A = B and B = C, then obviously A = C. In PHP, though, this is not necessarily true: "a" = 0 and 0 = "0", but "a" ≠ "0"! This follows the specification presented by the PHP documentation (the last comparison fails because both operands are strings but only one of them is numeric, so the comparison is lexical), but it doesn’t make much mathematical or common sense.

In fact, since a given binary relation ~ on a set A is said to be an equivalence relation if and only if it is reflexive, symmetric and transitive [Wikipedia], the “Equal” operator == in PHP is not, in fact, a valid equivalence relation at all.
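The transitivity failure can be checked mechanically. The sketch below uses a hypothetical helper, breaks_transitivity(), and a different triple from the one in the post (null, false, "0"). That choice is an assumption made here so that the result does not depend on how a given PHP version compares non-numeric strings with numbers.

```php
<?php
// Hypothetical helper: returns true if the triple ($a, $b, $c) violates
// transitivity of ==, i.e. $a == $b and $b == $c but not $a == $c.
function breaks_transitivity($a, $b, $c) {
    return ($a == $b) && ($b == $c) && !($a == $c);
}

// null == false and false == "0", yet null != "0".
var_dump(breaks_transitivity(null, false, "0")); // bool(true)

// A well-behaved triple for contrast.
var_dump(breaks_transitivity(1, 1.0, "1"));      // bool(false)
```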


This is not the only problem, however. A different problem (and one that, unlike the 0-comparisons above, I do not find mentioned or justified in the documentation) is that integers are parsed differently by the regular parser and the string conversion parser. This baffles me; not only is it stupid and weird, but it’s also strange that they don’t just reuse the same routines. The problem is introduced by the fact that PHP, like many other languages, accepts integer literals in base 10 (decimal), base 16 (hexadecimal), and base 8 (octal). Octal integer literals are denoted by prefixing them with a 0, thus 01 means “1 in octal notation” (which equals 1 in decimal notation), 010 means “10 in octal notation” (which equals 8 in decimal notation), and 09 is invalid: it means “9 in octal notation”, which makes no sense.

Well, it turns out that for reasons best known to the PHP developers themselves, the automatic conversion of strings to numbers in PHP is handled by something analogous to the C library function strtod(), whose input format is described as such:

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

In other words, integer literals in PHP accept octal notation, but automatic conversions of strings to integers do not. Thus,

   01 == 1   // true, fine
 "01" == 1   // true, fine
  010 == 10  // false, fine -- it's equal to 8
"010" == 10  // true! The conversion assumes decimal notation
"010" == 010 // false!

This also means that casting a string $s to an integer, $x = (int)$s, is not equivalent to evaling it, eval("\$x = {$s}").
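If the string really is meant to be octal, the base has to be stated explicitly. A small sketch using the standard intval() and octdec() functions; the variable name $s is just the one from the sentence above.

```php
<?php
$s = "010";

var_dump((int)$s);        // int(10): the cast reads the string as decimal
var_dump(010);            // int(8): the literal is octal
var_dump(intval($s, 8));  // int(8): base given explicitly
var_dump(octdec($s));     // int(8)
var_dump(intval($s, 0));  // int(8): base 0 infers octal from the leading 0
```

Passing the base explicitly at least makes the intent visible at the call site, instead of leaving it to whichever parser happens to see the digits.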

On a side note, octal numbers are handled in a pretty weird way to begin with. As the documentation warns you,

If an invalid digit is given in an octal integer (i.e. 8 or 9), the rest of the number is ignored.

Thus,

  09 == 0  // true
"09" == 09 // false; recall that "09" is decimal

This form of behaviour is why I dislike PHP so intensely. As the Zen of Python reminds us, Explicit is better than implicit, and Errors should never pass silently (Unless explicitly silenced). A language that silently squashes errors and returns 0 or null or some similar “empty-ish” value instead of warning you that something went wrong is a language that is not engineered to help you discover your errors, a language that would rather let you produce incorrect output than crash. (Crashing is way better than incorrect output. At least you know something is wrong. Silent logic errors kill people and crash space probes.)


Keep in mind when you code that in general you don’t know whether some string may be numeric or not—if it’s input (direct user input, data from a database, what have you), then the string might happen to be numeric, and you won’t know unless you check (e.g. with is_numeric()).

If you can’t get away from PHP (always an attractive option), I suggest that you stick with strcmp() and its relatives (strncmp(), strcasecmp(), and so on) if you want to compare strings, and explicit casts to integers (or floats), with validation (cf. is_numeric()), if you want to compare numbers. The bugs that are likely to arise from the inconsistencies above may be rare, but they can be subtle and they can be damnably annoying.
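As a rough sketch of that advice, the helper below (numbers_equal() is a hypothetical name, not a PHP built-in) validates with is_numeric() and casts explicitly before comparing, and refuses anything non-numeric instead of quietly treating it as zero. It assumes exact float equality is acceptable for the inputs at hand.

```php
<?php
// Hypothetical helper, not from the post: compare two inputs as numbers,
// refusing anything that is_numeric() rejects instead of guessing.
function numbers_equal($a, $b) {
    if (!is_numeric($a) || !is_numeric($b)) {
        throw new InvalidArgumentException("non-numeric input");
    }
    return (float)$a == (float)$b;
}

var_dump(numbers_equal("10", "1e1"));  // bool(true): both are 10.0
var_dump(strcmp("10", "1e1") === 0);   // bool(false): different strings

try {
    numbers_equal("a", 0);
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n"; // rejected: non-numeric input
}
```

The point of the design is that the caller chooses the kind of comparison (string or number) up front, and bad input becomes an exception rather than a zero.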


For the sake of completeness, the script that I used to discover and verify the above:

<?php

function run_test($test_string) {
	eval("\$result = ($test_string) ? 'true' : 'false';");
	echo "$test_string => $result\n";
}

$tests = array(
	'      "1" == 1        ',
	'      "a" == 1        ',
	'      "a" == 0        ',
	'      "a" == "0"      ',
	'      "0" == 0        ',
	'     "01" == 1        ',
	'    "010" == 10       ',
	'      010 == 10       ',
	'    "010" == 010      ',
	'   "0x10" == 0x10     ',
	'     "09" == 09       ',
	'      "0" == 09       ',
	'      "0" == "09"     ',
	'      "a" == 1e-1000  ',
	'  1e-1000 == "1e-1000"',
	'"1e-1000" == "0"      ',
	'"1e-1000" == "a"      ',
);

foreach ($tests as $test) {
	run_test($test);
}

$s = "010";
echo "\"$s\" == (int)\"$s\" ? " . ($s==(int)$s ? 'yes' : 'no') . "\n";
eval("\$x = {$s};");
echo "\"$s\" == $s ? " . ($s==$x ? 'yes' : 'no') . "\n";

From: [Me]
To: [People]
Subject: [something pertaining to Excel spreadsheet problems]

…The issue is that the data stored are not the same as the data displayed. The Excel parser we use does not convert date cells to strings we can parse. And the reason why we've never encountered it before is that we always used CSV files rather than Excel spreadsheets...

However, it DOES have access to the format, e.g. date cells are tagged as type 3, and I managed to find out that Excel stores dates as the number of days since January 1, 1900, so I have modified the parser to convert type-3 cells to formatted datestamps offset from that date. (Actually, it wasn't quite that simple, since PHP for stupid reasons cannot represent the year 1900 in datestamps(!), so I had to use a workaround wherein I used the Unix Epoch as an offset... but the basic principle remains the same.)

I should have this tested, reviewed, and uploaded before lunchtime.

A bit of a weird and frustrating problem, but I love this stuff, deep down. It’s interesting.
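The general idea of the Epoch workaround can be sketched as follows. This is not the code from the actual parser, which the email does not show. The function and constant names are hypothetical, and the sketch assumes the workbook uses Excel's 1900 date system, in which serial number 25569 falls on 1970-01-01.

```php
<?php
// Assumption: the workbook uses the 1900 date system, in which serial
// number 25569 corresponds to 1970-01-01 (the Unix Epoch).
define('EXCEL_EPOCH_OFFSET_DAYS', 25569);
define('SECONDS_PER_DAY', 86400);

// Hypothetical helper: convert an Excel serial date to a Unix timestamp (UTC).
function excel_serial_to_timestamp($serial) {
    return (int) round(($serial - EXCEL_EPOCH_OFFSET_DAYS) * SECONDS_PER_DAY);
}

echo gmdate('Y-m-d', excel_serial_to_timestamp(25569)), "\n"; // 1970-01-01
echo gmdate('Y-m-d', excel_serial_to_timestamp(39814)), "\n"; // 2009-01-01
```

Subtracting the offset first means the year 1900 never has to be represented as a timestamp at all; only dates near or after 1970 pass through gmdate().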


Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM, expecting '}' in /var/www/htdocs/webeval/erez/classes/assignment/assignmentSQLgen.php on line 251

According to Wikipedia,

Paamayim Nekudotayim (פעמיים נקודתיים pronounced [paʔamajim nəkudotajim]) is a name for the Scope Resolution Operator (::) in PHP. It means "twice colon" or "double colon" in Hebrew.

Nekudotayim (נקודתיים) means 'colon'; it comes from nekuda (IPA: [nəkuda]), 'point' or 'dot', and the dual suffix ayim (יים-), hence 'two points'. Similarly, the word paamayim (פעמיים) is derived by attaching the dual suffix to paam (IPA: [paʔam]) ('one time' or 'once'), thus yielding 'twice'.

The name was introduced in the Israeli-developed Zend Engine 0.5 used in PHP 3. Although it has been confusing to many developers, it is still being used in PHP 5.

…Of course.

// 1.
$map[$value] = ($value == $req['value']) ? 1.0 : 0.0;

// 2.
$map[$value] = ($value == $req['value']) ? 1.0 : 0;

Can anyone think of any reason whatsoever why these two statements should behave differently? If you had told me they would, I would have laughed derisively. And yet, PHP 5.2.6† at least thinks that they are not merely different, but critically so: While (2) works, (1) results in a syntax error:

Parse error: syntax error, unexpected T_DNUMBER in [...].php on line 232

Note that

  1. the literal 0.0 is not illegal in general, and
  2. the statement fails with other floating-point literals, too—it may be irrelevant to write 0.0 rather than 0, but I also couldn’t write 0.5 if that were what I needed.

What the hell is this lunacy‽

Update: This must be a bug, not (another) idiotic design feature: It raises a parse error when I run it through Apache/mod_php‡, but not with the CLI version of the PHP interpreter. On the other hand, why on Earth should the two use different parsers…? The mystery only deepens.

petter@petter-office:~/temp$ php --version
PHP 5.2.6-2ubuntu4 with Suhosin-Patch 0.9.6.2 (cli) (built: Oct 14 2008 20:06:32)
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
    with Xdebug v2.0.3, Copyright (c) 2002-2007, by Derick Rethans

‡ I often wonder if it isn’t really mod_intercal. PHP is but a PLEASE and a COME FROM away from being INTERCAL 2.0 (for the Web).


At work, as previously mentioned, I'm a great champion of phpDocumentor, and I take some pride in being the moving (read: nagging) force that resulted in our now having a repository of generated documentation for the core parts of our system. Unfortunately, there's a bug that prevents us from running it on the entire codebase, so we have to run it in pieces on the really interesting bits: run it on our whole codebase and it'll eat up all memory and die horribly. (It even dies on my desktop, which has a respectable 4 GiB of RAM. It really shouldn't. Our codebase isn't huge.)

One solution that has been proposed (as seen on the previously referenced bug page) is to move the data handling part of the documentor to a[n] SQLite database instead of keeping it all in memory. Apparently, PHP's string handling is rather inefficient, and there are a lot of cross references—this is why we use a parsing documentor rather than something that merely reads docblock headers, after all!

So I figure, what the hell, I have some free time to kill, I'm an open source geek, and improving this would make work life easier. At the same time, I can't afford to do this at work—or rather, my company can't afford to spend developer time on external tools rather than our own software—so I figured I might spend some evenings taking a look at this and seeing if I can make any headway.

A quick look at the codebase tells me that…well, the phpDocumentor code references all the memory bits through arrays rather than method calls, and it makes use of a lot of public class members. In fact, if I make all the member variables of one of the classes I suspect would need some surgery protected, the program ceases to work. There are many classes, many with a good ten or twenty public variables, promiscuously referencing the public members of other classes (ohoho), and no data encapsulation API whatsoever to hijack.

I'd like to see this improve. I'm presently a bit uncertain of whether I'd really like to spend my free time on it…


When comparing an expression with a bool, PHP does not widen the bool to the type of the other operand. Instead, it converts the other operand to a bool and compares the two Boolean values.

// true:
print_r((true == 8) ? 'true' : 'false');
// false:
print_r(((int)true == 8) ? 'true' : 'false');
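A few more cases along the same lines, as a sketch of the rule rather than an exhaustive table. The last two lines show that === sidesteps the conversion.

```php
<?php
// With a bool on one side, == converts the other operand to bool first.
var_dump(true == 8);        // bool(true): (bool)8 is true
var_dump(true == -1);       // bool(true): any non-zero integer is true
var_dump(false == 0);       // bool(true)
var_dump(true == "0");      // bool(false): "0" is one of the few falsy strings

// === compares type as well as value, so nothing is converted.
var_dump(true === 8);       // bool(false)
var_dump(true === (bool)8); // bool(true)
```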

I'm feeling very bloggy today.

Anyway, I'm getting more comfortable with PHPUnit, and although I've spoken of it in near-monosyllables here before, I haven't really written a ground-up post talking about what it is, what it does, and why you'd be a fool to write PHP code without it (or something like it; there are other PHP testing frameworks).


Unit testing theory, 101
PHPUnit
Code coverage reports (with PHPUnit)
Wrapping it up (in an XML configuration file)

Once again, the weak typing and checks of PHP drive me to the brink of insanity (or still farther past it, depending on your definitions). Consider the following: I am picking a random element from an array and returning it, optionally removing it, to tweak test parameters. The array is associative, so I need to return a pair of $key, $value. I whip out iteration #1 of the utility function, which just returns array($key => $value). Makes sense, right? Key => value. Now I go on with my main testing function:

$arr = array(...);
list($key, $value) = $this->extractRandomArrayElement($arr);
// $key is null???
// $value is null???

If you've been paying attention, you should be laughing at me (in my lame defence, I haven't had any coffee yet today). Of course the list construct assumes that I'm returning array($key, $value) while I'm returning array($key => $value)—a significant difference, and I'm trying to use list with two elements to extract a single-element array. (list is meant for numeric arrays, anyway.) Of course this code should fail. But it fails silently. The standard modus operandi for PHP when you do something completely nonsensical appears to be not to throw an exception and die (as Python would) or a fatal error and die (as PHP at least does on static syntax errors), but to assign null to everyone concerned and go on as though nothing had happened.

This is not helpful in the least.
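For reference, a version of the helper that list() can unpack might look like the sketch below. It is a hypothetical reconstruction written as a plain function rather than the original method, whose code is not shown. The $remove parameter and the exception on an empty array are assumptions.

```php
<?php
// Hypothetical reconstruction of the helper: returns array($key, $value),
// a numerically indexed pair that list() can unpack.
function extractRandomArrayElement(array &$arr, $remove = false) {
    if (count($arr) === 0) {
        throw new UnderflowException("cannot pick from an empty array");
    }
    $key = array_rand($arr);
    $value = $arr[$key];
    if ($remove) {
        unset($arr[$key]);
    }
    return array($key, $value);
}

$arr = array('a' => 1, 'b' => 2, 'c' => 3);
list($key, $value) = extractRandomArrayElement($arr, true);
var_dump(array_key_exists($key, $arr)); // bool(false): the element was removed
var_dump(count($arr));                  // int(2)
```

Throwing on an empty array is the opposite of the silent null behaviour complained about above: the caller finds out immediately that there was nothing to pick.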


PHP can be kind of tortuous, and the interpreter did crash a lot when I installed the penultimate version of the xdebug module, but now I do have a setup with PHP, PHPUnit, and xdebug. Generating code coverage reports is excruciatingly slow (I need to improve my filters, but hopefully-exhaustive testing that has to go through the nice but sluggish PEAR::MDB2 will be slow whatever I do)—but damn, those reports are nice.


This is why PHP fails to make me happy. —Well, that's not quite accurate: Say rather that this is one of the many ways in which it so fails:

// This gives me one result...
$should_match = ( $desired_value and ($db_value == PREF_TYPE_YES)) or
                (!$desired_value and ($db_value == PREF_TYPE_NO));

// ...This gives me another result. Note the identical logic.
if (( $desired_value and ($db_value == PREF_TYPE_YES)) or
    (!$desired_value and ($db_value == PREF_TYPE_NO)))
{
	$should_match = true;
}
else
{
	$should_match = false;
}

What am I missing?

Update:

David poked at this after I threw my hands up in disgust. It turns out to be a good old precedence issue, since, sensibly enough, or does not have the same precedence as ||, and similarly, and is different from &&. Therefore,

$x = true or false;
// is not equivalent to
$x = true || false;
// but instead to
($x = true) or false;
// rather than the
$x = (true or false);
// that I expected, and that || provides.

I see my mistake, but whoever made this design decision: I hate you.
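The difference is easiest to see with operands chosen so that the two groupings give different answers. A minimal runnable sketch:

```php
<?php
// "or" binds more loosely than "=", so the assignment happens first.
$a = false or true;    // parsed as ($a = false) or true
$b = false || true;    // parsed as $b = (false || true)
$c = (false or true);  // parentheses force the intended grouping

var_dump($a); // bool(false)
var_dump($b); // bool(true)
var_dump($c); // bool(true)
```

In an if condition there is no assignment competing for precedence, which is why the if/else version in the post gave the expected result.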
