haggholm: (Default)

PHP, among other problems, is a dynamically and (problematically) weakly typed language. What this means is that variables are cast, willy-nilly, to work in whatever fashion the programmer or the PHP interpreter feels is appropriate for the occasion. For example, a string "1" is equivalent to the integer value 1. Or at least equivalent-ish.

The equality test operator, ==, is defined in PHP for strings as for other built-in types. However, as the official documentation states,

If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically. These rules also apply to the switch statement.


When a string is evaluated in a numeric context, the resulting value and type are determined as follows.

If the string does not contain any of the characters '.', 'e', or 'E' and the numeric value fits into integer type limits (as defined by PHP_INT_MAX), the string will be evaluated as an integer. In all other cases it will be evaluated as a float.

The value is given by the initial portion of the string. If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.

In the typical PHP context, where scripts are expected to deal with form input and so forth, this seems to make a lot of sense: everything arrives as string data, but the string "123" clearly encodes a number. Well, if it all worked properly, maybe it wouldn’t be so bad. But note the subtlety above, which you might not expect unless you had either run into it or read it in the docs: If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). This means that the following are all true:

"1" == 1
"a" != 1
"a" == 0

Yes—because any string that isn’t a number gets converted to zero, this is what you get. I saw this cause a nasty bug only today. (Personally, I prefer strcmp() et al for string comparisons. It’s clunky, but at least I know what it does, in all cases…I think. This is PHP, so one can never be quite sure.)
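For what it’s worth, a minimal sketch of comparisons that don’t juggle types: the identity operator `===` compares type as well as value, and strcmp() returns 0 only for identical strings. The helper name same_string() is hypothetical, and the sketch assumes PHP 7 or later for the type declarations. (PHP 8.0 later changed loose comparison so that `"a" == 0` is false, so the loose results above depend on the PHP version; the strict forms below do not.)

```php
<?php
// Hypothetical helper: true only if both values are strings with identical bytes.
function same_string($a, $b): bool
{
    return is_string($a) && is_string($b) && strcmp($a, $b) === 0;
}

// Strict identity never converts types, so a string is never identical to an int.
var_dump("1" === 1);    // bool(false)
var_dump("a" === 0);    // bool(false)
var_dump("0" === "0");  // bool(true)

// strcmp() compares bytes, so numeric-looking strings get no special treatment.
var_dump(same_string("a", "0"));     // bool(false)
var_dump(same_string("010", "10"));  // bool(false)
var_dump(same_string("abc", "abc")); // bool(true)
```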

Another subtle consequence of the (accurate) definition from the documentation: If you compare a number with a string or the comparison involves numerical strings…, then it performs a numeric conversion. Thus, if at least one operand is an integer, the comparison is numeric; if both operands are numeric strings, the comparison is numeric; if both operands are strings, but only one of them is numeric, then it’s a regular string comparison. This makes sense…sort of…but combined with the 0 quirk above, this means that equality in PHP is not always transitive!

"a" ==  0  // true
  0 == "0" // true
"a" == "0" // false!

Normally you expect equality to be transitive, that is, if A = B and B = C, then obviously A = C. In PHP, though, this is not necessarily true: "a" == 0 and 0 == "0", but "a" != "0"! This follows the specification presented by the PHP documentation (the last comparison fails because both operands are strings but only one of them is numeric, so the comparison is lexical), but it doesn’t make much mathematical or common sense.

In fact, since a given binary relation ~ on a set A is said to be an equivalence relation if and only if it is reflexive, symmetric and transitive [Wikipedia], the “Equal” operator == in PHP is not, in fact, a valid equivalence relation at all.

This is not the only problem, however. A different problem, and one that, unlike the 0-comparisons above, I do not find mentioned or justified in the documentation, is that integers are parsed differently by the regular parser and the string conversion parser. This baffles me; not only is it stupid and weird, but it’s also strange that they don’t just reuse the same routines. The problem is introduced by the fact that PHP, like many other languages, accepts integer literals in base 10 (decimal), base 16 (hexadecimal), and base 8 (octal). Octal integer literals are denoted by prefixing them with a 0, thus 01 means “1 in octal notation” (which equals 1 in decimal notation), 010 means “10 in octal notation” (which equals 8 in decimal notation), and 09 is invalid: it means “9 in octal notation”, which makes no sense, since 9 is not an octal digit.

Well, it turns out that for reasons best known to the PHP developers themselves, the automatic conversion of strings to numbers in PHP is handled by something analogous to the C library function strtod(), whose input format is described as such:

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

In other words, integer literals in PHP accept octal notation, but automatic conversions of strings to integers do not. Thus,

   01 == 1   // true, fine
 "01" == 1   // true, fine
  010 == 10  // false, fine -- it's equal to 8
"010" == 10  // true! The conversion assumes decimal notation
"010" == 010 // false!

This also means that casting a string $s to an integer, $x = (int)$s, is not equivalent to evaling it, eval("\$x = {$s}").
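If what you actually want is the literal-style reading of a string, without eval(), intval() with a base of 0 chooses the base from the prefix (a leading "0x" means hexadecimal, a leading "0" octal, anything else decimal), while the (int) cast always reads decimal. A minimal sketch, using the string from the example above:

```php
<?php
$s = "010";

// The cast always parses the leading digits as decimal.
var_dump((int) $s);          // int(10)

// With base 0, intval() picks the base from the prefix, like a PHP literal would:
// "0x" => hexadecimal, leading "0" => octal, anything else => decimal.
var_dump(intval($s, 0));     // int(8)
var_dump(intval("0x1A", 0)); // int(26)
var_dump(intval("42", 0));   // int(42)

// The literal 010 is octal, so only the base-0 parse agrees with it.
var_dump(intval($s, 0) === 010); // bool(true)
var_dump((int) $s === 010);      // bool(false)
```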

On a side note, octal numbers are handled in a pretty weird way to begin with. As the documentation warns you,

If an invalid digit is given in an octal integer (i.e. 8 or 9), the rest of the number is ignored.


  09 == 0  // true
"09" == 09 // false; recall that "09" is decimal

This form of behaviour is why I dislike PHP so intensely. As the Zen of Python reminds us, Explicit is better than implicit, and Errors should never pass silently (Unless explicitly silenced). A language that silently squashes errors and returns 0 or null or some similar “empty-ish” value instead of warning you that something went wrong is a language that is not engineered to help you discover your errors, a language that would rather let you produce incorrect output than crash. (Crashing is way better than incorrect output. At least you know something is wrong. Silent logic errors kill people and crash space probes.)

Keep in mind when you code that in general you don’t know whether some string may be numeric or not—if it’s input (direct user input, data from a database, what have you), then the string might happen to be numeric, and you won’t know unless you check (e.g. with is_numeric()).

If you can’t get away from PHP (always an attractive option), I suggest that you stick with strcmp() and its relatives (strncmp(), strcasecmp(), and so on) if you want to compare strings, and explicit casts to integers (or floats), with validation (cf. is_numeric()), if you want to compare numbers. The bugs that are likely to arise from the inconsistencies above may be rare, but they can be subtle and they can be damnably annoying.
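A minimal sketch of “explicit casts with validation” for integer input, assuming integers are all you need and PHP 7.1 or later: is_numeric() also accepts floats and exponent forms such as "1e3", so this uses a stricter regular expression. The function name parse_decimal_int() is hypothetical; it reads leading zeros as decimal, like (int), and returns null rather than 0 for anything that isn’t a plain integer.

```php
<?php
/**
 * Hypothetical helper: parse a plain decimal integer such as "42", "-7" or "010".
 * Returns null for anything else ("a", "12abc", " 5", "1e3", "") and for values
 * outside the int range, instead of silently yielding 0 or a truncated prefix.
 */
function parse_decimal_int(string $s): ?int
{
    // Optional minus sign, then digits only; \z anchors at the true end of the string.
    if (preg_match('/^(-?)0*([0-9]+)\z/', $s, $m) !== 1) {
        return null;
    }
    // Canonical form without leading zeros; "-0" and "000" both become "0".
    $canonical = ($m[2] === '0') ? '0' : $m[1] . $m[2];
    $n = (int) $canonical;
    // If the round trip changes the digits, the value did not fit in an int.
    return ((string) $n === $canonical) ? $n : null;
}

var_dump(parse_decimal_int("010"));   // int(10)
var_dump(parse_decimal_int("a"));     // NULL
var_dump(parse_decimal_int("12abc")); // NULL
```

A comparison such as `parse_decimal_int($input) === 10` then simply evaluates to false for junk input, and the caller can check for null and report the error, instead of quietly treating the junk as 0.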

For the sake of completeness, the script that I used to discover and verify the above:


<?php
// Written for PHP 5. From PHP 7 on, literals such as 09 are a parse error,
// so eval() throws a ParseError for those test strings.
function run_test($test_string) {
	eval("\$result = ($test_string) ? 'true' : 'false';");
	echo "$test_string => $result\n";
}

$tests = array(
	'      "1" == 1        ',
	'      "a" == 1        ',
	'      "a" == 0        ',
	'      "a" == "0"      ',
	'      "0" == 0        ',
	'     "01" == 1        ',
	'    "010" == 10       ',
	'      010 == 10       ',
	'    "010" == 010      ',
	'   "0x10" == 0x10     ',
	'     "09" == 09       ',
	'      "0" == 09       ',
	'      "0" == "09"     ',
	'      "a" == 1e-1000  ',
	'  1e-1000 == "1e-1000"',
	'"1e-1000" == "0"      ',
	'"1e-1000" == "a"      ',
);

foreach ($tests as $test) {
	run_test($test);
}

$s = "010";
echo "\"$s\" == (int)\"$s\" ? " . ($s==(int)$s ? 'yes' : 'no') . "\n";
eval("\$x = {$s};");
echo "\"$s\" == $s ? " . ($s==$x ? 'yes' : 'no') . "\n";
haggholm: (Default)

My pet peeve, and current candidate for leading cause of bugs that are subtle and difficult to track down:

Poor naming.

It may sound trivial (if it doesn’t, you’re already on my team), but having proper variable names, and especially proper function and method names, is in my opinion critical to having a stable and maintainable system. We’ve all seen and laughed at Daily WTF samples of tables named table47; we’ve all cringed at people who named their variables foo and bar…and these are bad, they impede understanding, but what’s even worse than incomprehensible names are misleading names.

It’s been said before but bears repeating (and repeating, and repeating): The names of entities in code are an extremely important part of your documentation. Code, it’s often said, is never out of date, unlike any other kind of documentation; this isn’t really true, but it should be true. If you write a function called getPersonId(), then it had damned well better return a person ID or I will come down on you like the wrath of the heavens.

Of course, if things have entirely the wrong names (e.g. because the author of the code was an idiot), then it tends to be pretty obvious. If you request an object ID but receive a table row, you’ll catch on pretty quickly to the fact that the function does not do what you expect it to do. But hopefully, the code you work on was not written by idiots at all, but by at least reasonably competent developers who named things in a way that reflects what the code actually does. And code does not go out of date. Right?

Here’s the problem: Unless you unit test your code, and do so comprehensively, the code can perfectly well go out of date. Here’s what, in my experience, happens: A developer writes a function to accomplish a task. He names it properly, uses it properly, and if possible slaps a few unit tests on it. Later, he discovers that while the function does what it should, the task isn’t quite what he expected. Perhaps fetchFooEntities(), rather than looking up all Foo entities, was written for a piece of the system that should really look up just the subset of active Foo entities. So he refactors the code accordingly. No other code needs refactoring because his was, to date, the only one that called this function.

And voilà! The system now has a misleading function name. The code, or at least the function name, is out of date, because fetchFooEntities() does the job of a function that should be called fetchActiveFooEntities(). The next unsuspecting developer who comes along will see that there’s a function to fetch Foo entities, and assume that (since it’s not parameterised) it fetches them all. The function has a straightforward name, but what it actually does is subtly different, and therefore there will be bugs. And because the difference is subtle, the bug will be subtle, too.

Please make sure that you give your functions and your variables appropriate and descriptive names. And please, if you change the semantics of those functions or variables, change the names accordingly.

haggholm: (Default)

A horrible bug was causing trouble for our clients. Yesterday, I hunted it down and fixed it (as I thought), which involved changes to both PHP code and Javascript. Today, it seems that the bug is still occurring… Naturally, I am unable to reproduce it.

Hypothesis: The bug fix I created works. If so, any client with the latest code properly deployed should not experience this problem. However, since some of the changes were made to Javascript, I do not and cannot know whether the entire fix was “properly” deployed: some of the users may have stale versions of the Javascript files in their browser (or proxy) caches, and I can’t detect, let alone fix, this problem.

Hack: Rename the Javascript file. This causes no problems (unlike CVS, Subversion will after all keep track of the change history across renames), and clients are forced to reload it (they cannot use a stale, cached copy when the request is ostentatiously for a different resource).
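The same idea can be automated. A common variant, sketched here as an alternative to renaming the file by hand, is to append a version string to the script URL so that each release is, as far as any cache is concerned, a different resource. The helper versioned_url() and the path /js/assignments.js are hypothetical, and the sketch assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: append a version parameter to an asset URL.
 * Caches key on the full URL, so bumping $version makes browsers and proxies
 * treat the file as a new resource instead of reusing a stale copy.
 */
function versioned_url(string $url, string $version): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'v=' . rawurlencode($version);
}

// The version could be a release number or the file's modification time;
// here it is passed in explicitly.
$src = versioned_url('/js/assignments.js', '20090527');
echo '<script type="text/javascript" src="' . htmlspecialchars($src) . '"></script>', "\n";
```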

End result: Who knows? Either my fix was good and this should clean up stale caches, or I was wrong and the fix didn’t address all cases of the problem in the first place. I have no way of knowing until and unless I receive more automatic error reports. Hopefully I won’t receive any, which means I’ll never be quite sure that it’s gone.

I hate bugs I cannot reproduce.

haggholm: (Default)

Good thing: OpenSSH 4.9+ offers Match and ChrootDirectory directives which can filter users by group and chroot-jail them to their home directories.

Weird thing: ChrootDirectory requires that the full path up to and including the home directory be writable by root only. This means that the users must not have write permissions on their own home directories. As far as I can tell, I can only make this useful by creating user-writable subdirectories inside. (This works fine for our purposes, but is, well, sort of bizarre: Home directories that the users cannot write to!)

Bloody annoying thing: RHEL 5 comes with, I believe, OpenSSH 4.3. Versions with ChrootDirectory have been around for years, but naturally RHEL is a few more years behind, so I have to create my own package to get a chroot-capable SSH setup. It’s not hard, but it is annoying and adds a maintenance burden.

haggholm: (Default)

It’s extremely frustrating to have to wait for over ten minutes when you’re ready to commit some new code, just because you have to wait for a big, slow unit test suite to complete. It’s also frustrating when you’re actively addressing a known bug that’s been exposed by unit tests and, having made a change that will hopefully fix it, sit and twiddle your thumbs as the tests re-run. Efficiency matters, even in unit tests.

I’ve spent a few workdays attacking the test suite for the module I’m working on with the proper tools—a profiler and KCacheGrind, a profiling data visualiser. By figuring out where the test suite spent most of its time and optimising the slow parts (largely by caching data that were recomputed superfluously, caching prepared statements, etc.), I cut down the expected running time for company-wide unit tests by an estimated 10% and my own module’s tests by approximately 80%—an improvement by a factor of 5, from 12:31 to 2:40!
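Reading 12:31 and 2:40 as minutes:seconds (751 s and 160 s), the figures work out as follows:

$$\frac{751\ \text{s}}{160\ \text{s}} \approx 4.69 \qquad\text{and}\qquad 1 - \frac{160}{751} = \frac{591}{751} \approx 0.787$$

That is a speedup of about 4.7× and a reduction of about 79%, which round to the factor of 5 and the 80% quoted above.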

Of course this number is going to creep up as the test suite grows, coverage improves, and setup becomes more involved. However, that’s all the more reason to do this, and just means that it may become relevant to do it again at some point in the future.

As a bonus, the majority of the performance improvements were to business code exercised by the unit tests rather than code exclusive to the test framework, so application performance will be improved as well. I should be cautious in my conclusions here, though: While there will be improvements, some of the code exercised very heavily by unit tests is not run very frequently by users.

haggholm: (Default)

Interesting and peculiar. It turns out that Tonya’s way of deleting entries is to just delete everything that is not resubmitted. This should work, but it fails on the last entry. The reason why it doesn’t work is a little bit subtle and weird.

The query in question is

db()->execPrintf('DELETE FROM am_releases_templates WHERE release_id = %i AND id NOT IN %@i', $release_id, array_keys($template_ids));

The question is, what happens when $template_ids is empty? What does printfQuery() do? printfQuery() is mine, of course, so I should know, and what I did was to pass in the tuple (NULL), since SQL considers NULL not equal to anything. So, I thought, for any value x, `x IN (NULL)` should be false—and consequently, `x NOT IN (NULL)` must be true. Stupidly, I didn’t test and verify this.

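It turns out that MySQL returns an empty result set when you compare against the tuple (NULL). That is, `...AND id NOT IN (NULL)` is *not* the complement of `...AND id IN (NULL)`, so the union of the rows matching `x` and the rows matching `not x` is...an empty set, rather than all the rows. This is rather weird, but it follows from SQL’s three-valued logic: `id NOT IN (NULL)` means `NOT (id = NULL)`, any comparison with NULL yields UNKNOWN rather than FALSE, NOT UNKNOWN is still UNKNOWN, and WHERE keeps only the rows for which the condition is TRUE.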

Conclusion: I really don’t like MySQL.

Update: Not just MySQL, but SQL in general, it seems.
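A minimal sketch of a safer way to build such a clause, assuming values are bound separately through `?` placeholders (PDO-style) and PHP 7.1 or later; not_in_clause() is a hypothetical helper, not the actual printfQuery()/execPrintf() code. Instead of padding an empty list with NULL, it returns a predicate that is always true, since excluding nothing should keep every row.

```php
<?php
/**
 * Hypothetical helper: build "column NOT IN (...)" with one ? placeholder per value.
 * An empty list yields a predicate that is always TRUE (nothing is excluded),
 * rather than "NOT IN (NULL)", which is never TRUE for any row.
 * $column is interpolated verbatim, so it must come from trusted code, not user input.
 *
 * @return array{0: string, 1: array} the SQL fragment and the values to bind
 */
function not_in_clause(string $column, array $values): array
{
    if (count($values) === 0) {
        return ['1 = 1', []];
    }
    $placeholders = implode(', ', array_fill(0, count($values), '?'));
    return ["$column NOT IN ($placeholders)", array_values($values)];
}

[$where, $params] = not_in_clause('id', [4, 9]);
echo "DELETE FROM am_releases_templates WHERE release_id = ? AND $where\n";
// DELETE FROM am_releases_templates WHERE release_id = ? AND id NOT IN (?, ?)
```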

haggholm: (Default)

As regular readers here all know, I have a number of issues at work—but mostly, that’s to be expected, inevitable, and not beyond my ability to deal with. I don’t expect any job is always fun and interesting; every job is bound to have times of stress; and in spite of occasional periods of very intense frustration, there’s no co-worker I have to interact with that I never get along with. (True, some I get along with much less frequently than others…)

But there is one thing that always bugs me, and it’s my work environment. In the old office location, I shared an office with two co-workers. When we moved into the new building downtown, the three of us again shared an office, which was even an improvement, as I now had a window. Some re-shuffling occurred, and we moved into yet another room (within the same office), but I still had a window at my back. Then, finally, more space was needed, and support staff needed private offices to reduce noise pollution, as they spend all day on the phone, and we ended up booted out of our office.

I now share a cubicle with one co-worker. Behind me and on my left side are actual walls. To my left and above me is a vent, which blows cold air on me in the winter, but fails to give any impression of fresh air. To my right is a cubicle wall; in front of me is my co-worker, and half a cubicle wall (and a “doorway”). The lighting is fluorescent. I can’t see any windows from my desk. In this little pocket, air flow is poor in spite of the vent that had me shivering in the winter, and I often feel lethargic (even by my afternoon standards); stifled as from lack of oxygen. There are people in this office I might kill for access to an open window…

At home, I tend to keep my curtains drawn to get rid of glare, but that’s still very different from this dank enclosure. I have light curtains; they remove glare, but admit enough light that I have no need for artificial lighting at home during the day. When I do turn on lights, of course, they’re incandescent, or compact fluorescent lamps that emit light more similar to incandescents than to fluorescent ceiling lights. In this cubicle, I’m stuck under unchanging fluorescent light.

When I next look for a job, whenever that may be (certainly not right now! —I’ve a project to finish), I will definitely look at the physical work environment as one criterion. I want an office—I’d like some privacy, but most of all I want real air and real light. This hole is just depressing.

haggholm: (Default)


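somedb=> select date('2009-05-27') + 7;
  ?column?
------------
 2009-06-03
(1 row)


mysql> select date('2009-05-27') + 7;
+------------------------+
| date('2009-05-27') + 7 |
+------------------------+
|               20090534 |
+------------------------+
1 row in set (0.00 sec)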

My current task, which involves date calculations on items in the database, is going to be a bit complicated by the fact that MySQL’s date arithmetic sophistication is such that it thinks that one week from today is May 34.

Update: I can, of course, and probably will use MySQL’s builtin functions (DATE_ADD() et al.), but this forces me to use non-standard functions rather than standard SQL operations. (I will get away with this because, and only because, this module is restricted to MySQL, unlike our core system.) Furthermore, if they implemented proper date arithmetic in functions, I fail to see why they left the arithmetic operators with such an idiotic default.
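For comparison, MySQL’s `DATE_ADD('2009-05-27', INTERVAL 7 DAY)` does return 2009-06-03. Where the arithmetic can happen in application code instead, PHP’s DateTime classes are calendar-aware too; a minimal sketch, assuming PHP 7 or later (add_days() is a hypothetical helper):

```php
<?php
// Hypothetical helper: add a number of days to a Y-m-d date, respecting month lengths.
function add_days(string $ymd, int $days): string
{
    $date = new DateTimeImmutable($ymd, new DateTimeZone('UTC'));
    // sprintf('%+d days', 7) yields "+7 days"; negative values yield e.g. "-1 days".
    return $date->modify(sprintf('%+d days', $days))->format('Y-m-d');
}

echo add_days('2009-05-27', 7), "\n";  // 2009-06-03, not "2009-05-34"
echo add_days('2009-03-01', -1), "\n"; // 2009-02-28
```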

haggholm: (Default)

Chad: Hey everyone,
Chad: Big news
Petter: ?
Chad: We've decided to ditch IE6
Adam: big GOOD news, or big BAD news
Adam: whaaa?
Petter: That's AWESOME news.
Petter: That's the second best possible news, in fact! (--Coming up just behind “You’re all getting a raise”.)
Chad: For the assignments piece, we're going with IE7+. I just got off the phone with Derek and we'll take some heat, but it's not worth our development time...especially if we can't test it well.

haggholm: (Default)

From: [Me]
To: [People]
Subject: [something pertaining to Excel spreadsheet problems]

…The issue is that the data stored are not the same as the data displayed. The Excel parser we use does not convert date cells to strings we can parse. And the reason why we've never encountered it before is that we always used CSV files rather than Excel spreadsheets...

However, it DOES have access to the format, e.g. date cells are tagged as type 3, and I managed to find out that Excel stores dates as the number of days since January 1, 1900, so I have modified the parser to convert type-3 cells to formatted datestamps offset from that date. (Actually, it wasn't quite that simple, since PHP for stupid reasons cannot represent the year 1900 in datestamps, so I had to use a workaround in which I used the Unix Epoch as an offset...but the basic principle remains the same.)

I should have this tested, reviewed, and uploaded before lunchtime.

A bit of a weird and frustrating problem, but I love this stuff, deep down. It’s interesting.
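A minimal sketch of that conversion through the Unix Epoch, assuming the 1900 date system (Excel for Mac historically defaulted to a 1904 system), PHP 7 or later, and a 64-bit build, where timestamps before 1901 are representable. Serial 25569 is Excel’s number for 1970-01-01; that is one day more than a naive count from January 1, 1900 would give, because Excel also counts a non-existent February 29, 1900, so results are only right from serial 61 (March 1, 1900) onward. excel_serial_to_date() is a hypothetical name, not the parser’s actual code.

```php
<?php
/**
 * Convert an Excel serial date (1900 date system) to a Y-m-d string.
 * 25569 is the serial Excel assigns to 1970-01-01, so the conversion can go
 * through Unix timestamps. Only valid for serials >= 61 (1900-03-01).
 */
function excel_serial_to_date(float $serial): string
{
    // Days relative to the Unix Epoch, converted to seconds (fractions are times of day).
    $unixSeconds = (int) round(($serial - 25569) * 86400);
    return gmdate('Y-m-d', $unixSeconds);
}

echo excel_serial_to_date(25569), "\n"; // 1970-01-01
echo excel_serial_to_date(39960), "\n"; // 2009-05-27
```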

haggholm: (Default)

Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM, expecting '}' in /var/www/htdocs/webeval/erez/classes/assignment/assignmentSQLgen.php on line 251

According to Wikipedia,

Paamayim Nekudotayim (פעמיים נקודתיים pronounced [paʔamajim nəkudotajim]) is a name for the Scope Resolution Operator (::) in PHP. It means "twice colon" or "double colon" in Hebrew.

Nekudotayim (נקודתיים) means 'colon'; it comes from nekuda (IPA: [nəkuda]), 'point' or 'dot', and the dual suffix ayim (יים-), hence 'two points'. Similarly, the word paamayim (פעמיים) is derived by attaching the dual suffix to paam (IPA: [paʔam]) ('one time' or 'once'), thus yielding 'twice'.

The name was introduced in the Israeli-developed Zend Engine 0.5 used in PHP 3. Although it has been confusing to many developers, it is still being used in PHP 5.

…Of course.

haggholm: (Default)

When I was hired to write the assignment module for eRezLife, in 2007, it had a fairly different scope than it does now. The core is still the same—we have a set of students and a set of residences; students have preferences about the sort of rooms and roommates they want, and residences have rules (preferences) concerning what sort of students should be assigned to them; and we want to efficiently produce a good mapping between the sets. That core hasn’t changed.

What has changed is everything around it—so that we now want to track history, run multiple concurrent sessions, and so on. This doesn’t entail a lot of fundamental changes, but the algorithm does now have to track the state between subroutines, and a lot of the selection procedures are affected (parallel and consecutive sessions affect room availability). Additionally, the need for an event history necessitates the creation of a whole bunch of new tables, more logic at certain junctures, and so on.

Of course, none of this is very surprising. Projects have a way of growing, and new requirements emerge; additionally, even though we have some pretty solid write-ups by now defining the behaviour, workflow, UI, and so on, there are always minor details where it turns out that my boss had implicitly assumed something that just wasn’t so. This is not a criticism of the development process, requirements gathering process, or any such thing—we started out weak but have worked out a pretty solid process by now; it’s been a very real process of growth and learning for the company.

My current frustration comes from communication issues, mostly from the higher-ups a bit further from the development process, but occasionally from co-workers. Let me preface this by saying that a very big part of this is my own psychology: I’m not trying to assign blame; I’m trying to figure out why it is that I keep feeling irritated in order to see what I can do about it. That said, here’s what happens:

When I started working on this project, alone and with a different scope, I naturally designed it to meet the requirements as they were formulated at the time. Since then, the requirements have changed, both in what they are and in how explicitly and specifically we have laid them out, so that I now work towards a different set of requirements. This is fine and natural and how things are. The problem occurs when someone says something like “We’re going to have to display X. Our system tracks that, right?” or “I assume that the system can do Y.” Quite frequently, I do not in fact track X, and I have not designed the system to do Y. I don’t feel guilty about that, since those were not part of the original requirements as presented to me. However, when you present it in the form of an assumption rather than a question (“Do we track X? Can we do Y?”), I can’t answer “No, but I’ll make it so, Captain”; instead I am put on the defensive: “No, your assumption is flawed; the system can’t do that because it was never specified in such a way.”

This may sound a little over-sensitive, but I think it’s a fairly natural way to feel. Suppose that you and I are flatmates and handle the dishes communally; suppose that on a given day, it’s unclear whose job it is. If I ask you “Have you done the dishes?” you can answer “No”; I can then ask “Could you do them today, please?”, and there needn’t be a problem. If, on the other hand, I come up and say “I assume you’ve done the dishes,” odds are that you’ll be a bit put off by the notion that I’ve made this assumption without your consent.

Additionally, I’m a bit extra sensitive because the backend system of this project is my baby: no one has worked on it but me, and while it’s rife with flaws that I see in retrospect, and doesn’t match the changed requirements, I’m still fairly proud of it. I think it’s fundamentally a good system. Present an idea as a request for something new and I can happily say “I’ve already done that!” or “Sure, I can make it happen.” Present it as an assumption and non-compliance looks like a flaw in what I have created, a criticism of my work, and I’m easily offended by that.

Of course, there is one fundamental problem in expecting communication to work the way I wish—viz, the reason why these things are framed as assumptions is presumably not out of a desire to communicate offensively, but because these things are assumed, which is a problem of knowledge diffusion (inescapable in some cases, as my boss is not an engineer and wouldn’t understand the technical details, nor has any need to). Ultimately, I suppose, the take-home lesson from this would be something like the following: Be conservative in making assumptions, and err on the side of posing questions.

I do not think that the utility of this is limited to communicating with me.

haggholm: (Default)
// 1.
$map[$value] = ($value == $req['value']) ? 1.0 : 0.0;

// 2.
$map[$value] = ($value == $req['value']) ? 1.0 : 0;

Can anyone think of any reason whatsoever why these two statements should behave differently? If you had told me they would, I would have laughed derisively. And yet, PHP 5.2.6† at least thinks that they are not merely different, but critically so: While (2) works, (1) results in a syntax error:

Parse error: syntax error, unexpected T_DNUMBER in [...].php on line 232

Note that

  1. the literal 0.0 is not illegal in general, and
  2. the statement fails with other floating-point literals, too—it may be irrelevant to write 0.0 rather than 0, but I also couldn’t write 0.5 if that were what I needed.

What the hell is this lunacy‽

Update: This must be a bug, not (another) idiotic design feature: It raises a parse error when I run it through Apache/mod_php‡, but not with the CLI version of the PHP interpreter. On the other hand, why on Earth should the two use different parsers…? The mystery only deepens.

petter@petter-office:~/temp$ php --version
PHP 5.2.6-2ubuntu4 with Suhosin-Patch (cli) (built: Oct 14 2008 20:06:32)
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
    with Xdebug v2.0.3, Copyright (c) 2002-2007, by Derick Rethans

‡ I often wonder if it isn’t really mod_intercal. PHP is but a PLEASE and a COME FROM away from being INTERCAL 2.0 (for the Web).

Idle talk

Oct. 1st, 2008 05:27 pm
haggholm: (Default)
(05:19:55 PM) David: another case of PHP informing me of nothing - i'm using file_put_contents only I had the params filename and the filecontent mixed... do I get an error msg... NOOooOooooO
(05:21:06 PM) Petter: Of course not. It might upset you to get error messages.
(05:25:38 PM) David: It should be called 'Delusional Programming'
(05:26:53 PM) Petter: Programming With Never-Ending Denial
(05:26:59 PM) Petter: (note handy acronym)
haggholm: (Default)

…Particularly when you’re never quite sure whether some function or object expects (or returns) local or UTC time, and when your database uses at least three different formats for storing them. (I officially hate timezones, and will let this be a reminder never to use anything but UTC for storing dates in an application, ever.)

Subject: The time box issue/GM date problem

I *think* I finally fixed it -- please test and verify. (I checked the
configuration tab, which seems to behave properly; I verified that the
profile displays matched, and I verified that saving a time returns the
same time rather than an off-by-an-hour one.)

That was an incredibly irritating bug, and I missed two jiu-jitsu
practices over it, but I introduced it by making a bad assumption, so I
suppose that's fair.

My bad assumption was that I could take a time T, take its timezone
offset, and by adding (hours*3600+minutes*60+seconds) and adding the
offset back, I'd get a good time. That was stupid.

The problem is that different timestamps, from different dates, come
back with different timezone offsets. I would suspect this may have to do
with policy changes in things like timezones and DST. Specifically, the
timezone offset from UTC on the same machine differs by 3600 seconds --
one hour -- for timestamps in 1970 and timestamps in 2008, respectively.

On the bright side (and the reason why I made those changes in the first
place), we can now say things like


instead of

	date('G', $time_inseconds)*3600

which I think is worth it in the long run. Or so I tell myself to
console myself for the two missed jiu-jitsu practices.

Petter Häggholm
eRezLife Software
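A minimal sketch of the alternative to adding offsets by hand, assuming PHP 7 or later: ask the timezone database for the offset in force at a given instant, and let DateTime apply it when building a timestamp from a wall-clock time. America/Vancouver is an arbitrary example zone and both helper names are hypothetical; the point is that the two offsets differ by 3600 seconds across a DST change, the same kind of date-dependent difference described in the email.

```php
<?php
/** UTC offset, in seconds, of $zone at the instant $utcWhen (a UTC date-time string). */
function utc_offset_at(string $zone, string $utcWhen): int
{
    $instant = new DateTimeImmutable($utcWhen, new DateTimeZone('UTC'));
    return (new DateTimeZone($zone))->getOffset($instant);
}

/**
 * Unix timestamp for a wall-clock time of day on a given date in $zone.
 * DateTime applies whatever offset is in force on that date, so no offset
 * arithmetic is done by hand.
 */
function timestamp_at(string $date, int $h, int $m, int $s, string $zone): int
{
    return (new DateTimeImmutable($date, new DateTimeZone($zone)))
        ->setTime($h, $m, $s)
        ->getTimestamp();
}

$winter = utc_offset_at('America/Vancouver', '2008-01-15 12:00:00');
$summer = utc_offset_at('America/Vancouver', '2008-07-15 12:00:00');
var_dump($summer - $winter); // int(3600)
```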
haggholm: (Default)

I read an interesting article by Joel Spolsky, In Defense of Not-Invented-Here Syndrome. It’s worth reading, but in case you're feeling lazy, here are the highlights:

When you're working on a really, really good team with great programmers, everybody else's code, frankly, is bug-infested garbage, and nobody else knows how to ship on time. When you're a cordon bleu chef and you need fresh lavender, you grow it yourself instead of buying it in the farmers' market, because sometimes they don't have fresh lavender or they have old lavender which they pass off as fresh.


The best advice I can offer:

If it's a core business function -- do it yourself, no matter what.
Pick your core business competencies and goals, and do those in house. If you're a software company, writing excellent code is how you're going to succeed. Go ahead and outsource the company cafeteria and the CD-ROM duplication. If you're a pharmaceutical company, write software for drug research, but don't write your own accounting package. If you're a web accounting service, write your own accounting package, but don't try to create your own magazine ads. If you have customers, never outsource customer service.

Now, I absolutely agree with his best advice—if it’s a core business function, do it yourself. After all, it’s your business to perform core business functions well, and if you can’t (or don’t) do it better than your competitors, you don’t have a selling point. But to imply even loosely (he does take the edge off it slightly later in his article) that because you write software, all of software is your core business function, is, I think, so hyperbolic as to be completely misleading. The worst part of the article follows:

The only exception to this rule, I suspect, is if your own people are more incompetent than everyone else, so whenever you try to do anything in house, it's botched up. Yes, there are plenty of places like this. If you're in one of them, I can't help you.

How about an exception for when your people have better things to do? We use all kinds of libraries, internal and external: an external DB compatibility layer, various internal tag generation and JavaScript DOM libraries, an internal ORM, an external email library with internal wrappers… By Spolsky’s logic, we should stick with the internal JavaScript library, because our devs know our needs best (true); we should stick with the internal ORM; we might even contemplate our own internal DB layer. This has in fact been brought up, and I think it’s a terrible idea.

Because, let’s face it, DB layers and fancy DOM manipulations are not our core business functions. Our core business functions are things like writing very flexible management systems for medical schools and university residences, with client-defined workflows and customisable forms and reports. This is what we do. This is what we sell. This is what our clients pay us to do, and what they cannot get from anyone else (at competitive prices).

This is why I think we should phase out our internal DOM stuff in favour of jQuery (which we recently adopted), why we should stick with MDB2 for a DB layer, and why I'd be happier if we had a third-party ORM. Is it because I think that the MDB2 people, the jQuery people, and the hypothetical ORM people are smarter than us, or better programmers? Not at all. It’s because that’s not what we’re paid to do, and unless the tools are really bad, it’s going to be vastly more productive to work with the third-party tools, find other third-party tools, or help fix the current tools, in some order of preference.

Maybe Microsoft can afford a team to make a custom compiler for Excel. We sure as hell can’t. We’ve got enough bugs and feature requests to worry about as it is, and almost none of them are ultimately the fault of any third-party library. (I’ve made three bug reports to the MDB2 project, I believe—I can assure you that we have rather more bugs than that in our tracker.)

Now, I’m sure that Joel Spolsky is a smart guy who knows all this, and may have known it before I first used a keyboard (there’s a reason I don’t send this to him as an email). However, given all these ifs, buts, and general caveats, what’s the point of that article? If its real gist, put succinctly and honestly, is “Not-Invented-Here Syndrome is not a valid concern for your company’s core competencies, or when adequate third-party solutions cannot be found or afforded”, then I think this is a job for…

Brief rant

Aug. 27th, 2008 04:23 pm

Dealing with a large volume of bugs in one day is kind of stressful—we got sixteen show-stopper bug reports today, half of which were duplicates, and all of which were initially blamed on my refactorings. Some of them were due to me (so I'm annoyed because I broke a few things), though I've fixed all those now; some of them were due to a bug no less than four years old (so I'm annoyed because I was initially blamed for things that were done years before I even started working here).

But what currently annoys me is that all of them were communicated to me poorly: in two massive emails that quoted tons of irrelevant conversation instead of responding point by point, and all top-posted (the reply at the top, the quoted message being responded to at the bottom).

I realise that top posting is used by a huge number of people—probably the majority of non-geeks. It is nonetheless a terrible practice, because one of the following must be true:

  • The quoted message is not needed for context, and its inclusion is redundant, making the email unnecessarily large. Since I don't know if there's anything important below, I have to scroll down to check just in case.
  • The quoted message is needed for context. I have to scroll down to the middle of the email, read the quoted message, then scroll back up to read the reply. If it is a long email, I may have to scroll back and forth to keep track.

Here's what top-posting looks like. Note that when we start reading, we have no idea what “that” refers to, i.e. what is supposed to be a good idea, unless we send so little email that the referent is obvious.

Yes, I think that's a good idea. It makes it easier for me
to write, and I'm lazy.

-----Original Message-----
From: Petter Häggholm [mailto:petter@fake.com] 
Sent: August 27, 2008 7:12 PM
To: Top Poster
Subject: Something

Do you really think that top-posting is a good idea?
Why or why not?

Here's what inline posting looks like:

Petter Häggholm wrote:
> Do you really think that top-posting is a good idea?

Of course not.

> Why or why not?

Not only does it force scrolling, it also makes it hard to
respond in a concise, point-by-point manner *inline* with
the message.

Honestly, I don't think top-posting is ever a good thing. The only good alternative to bottom-posting (or inline posting) is to reply so comprehensively that the quoted message is not needed for context, and thus should not be included at all (if the original sender wants to refer back to it, he presumably has the power to keep it archived). That means writing an email like a regular letter. I do so on occasion, but that's not the nature of technical communication.

Nota bene

Apr. 30th, 2008 09:28 am

The comment

// @note Good Class :)

is not a proper substitute for actually writing good code.

On a side note, no documentation generator ever used here recognises tags in ‘double-slash’ (//) comments.
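PHP itself draws the same line: only comments that open with /** count as doc comments. Those are what tools like phpDocumentor parse, and what the Reflection API exposes through getDocComment(). A small sketch; the class names are made up for illustration:

```php
<?php
// @note Good Class :)
class LineCommented
{
}

/**
 * A class whose documentation lives in a DocBlock.
 *
 * @todo Describe what this class is actually for.
 */
class DocBlockCommented
{
}

// getDocComment() returns the /** ... */ comment attached to a
// declaration, or false if there is none; // comments never count.
$plain = (new ReflectionClass('LineCommented'))->getDocComment();
$doc   = (new ReflectionClass('DocBlockCommented'))->getDocComment();

var_dump($plain);                          // bool(false)
var_dump(strpos($doc, '@todo') !== false); // bool(true)
```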

haggholm: (Default)

I already deleted it from the codebase, and I can't be bothered to look through Subversion for the exact code (the real deal was unfinished and—thankfully—never invoked), but what I found looked rather like this:

function countChars($str)
{
	$count = 0;
	for ($i = 0; $i < count($str); $i++) {
	}
	return $count;
}
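For the record, PHP already ships what this was reaching for. strlen() counts bytes. For characters in a UTF-8 string, mb_strlen($str, 'UTF-8') is the usual choice when the mbstring extension is installed; the sketch below uses preg_match_all() with the /u modifier instead, since PCRE is always available. And count() is for arrays and Countable objects: given a string it returns 1 (with a warning since PHP 7.2), and since PHP 8.0 it throws a TypeError. The function name is kept from the post; the body is only a sketch.

```php
<?php
// Sketch: count characters (Unicode code points) in a UTF-8 string.
// With mbstring installed, mb_strlen($str, 'UTF-8') does the same job.
function countChars(string $str): int
{
    // '.' with /s also matches newlines; /u makes each match one code point.
    $n = preg_match_all('/./su', $str);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

echo strlen('Häggholm'), "\n";     // 9: 'ä' takes two bytes in UTF-8
echo countChars('Häggholm'), "\n"; // 8
```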
haggholm: (Default)
class someClass
{
    /**
     * This is the class constructor.
     */
    public function __construct()
    {
        // ...
    }

    /**
     * This method takes two parameters, an integer
     * $my_id and a string $some_name.
     * @param int $my_id My ID.
     * @param string $some_name A name to save.
     */
    public function foo($my_id, $some_name)
    {
        // ...
    }
}

I'm glad the comments are there to tell me these things.
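For contrast, a sketch of comments that might earn their keep. The class and its methods (NameRegistry, setName, getName) are hypothetical, invented for illustration. Type declarations already say "integer" and "string", so the DocBlocks spend their words on what a signature cannot say: valid ranges, side effects, and failure modes.

```php
<?php
/**
 * Keeps display names, keyed by resident ID. (Hypothetical example.)
 */
class NameRegistry
{
    /** @var array<int, string> Display names keyed by resident ID. */
    private array $names = [];

    /**
     * Record a display name, replacing any previous one for that ID.
     *
     * @param int    $id   Resident ID; must be positive.
     * @param string $name Display name; surrounding whitespace is trimmed.
     * @throws InvalidArgumentException If $id is not positive or $name is blank.
     */
    public function setName(int $id, string $name): void
    {
        $name = trim($name);
        if ($id <= 0 || $name === '') {
            throw new InvalidArgumentException('Invalid ID or empty name');
        }
        $this->names[$id] = $name;
    }

    /**
     * @return string|null The stored name, or null if none was recorded.
     */
    public function getName(int $id): ?string
    {
        return $this->names[$id] ?? null;
    }
}
```

Typed properties need PHP 7.4 or later; on older versions, drop the array type from $names and keep the @var tag.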

