haggholm: (Default)

Last night I hit the Zone for the first time in a very long time, hacking away at a geeky project sheerly for my own pleasure. (I’m refactoring the Python/mod_wsgi backend for my website, and for a private side project; last night I cleaned it up considerably, added decorators to set proper Content-Types and invariant data, fixed cookie generation, and implemented a login and session management system.) I reached the point where I forgot hunger and thirst, and where it took some effort, at 1:30 am, to finally force myself into bed.

I’ve missed the Zone.

haggholm: (Default)

Do any of you guys know much about Apache and mod_rewrite? I could use some help.

Update: Chutz asked me the rather obvious question: had I tried turning off all the RewriteConds? The rather sad answer is that no, I’d missed that obvious debugging step. When I did, the RewriteRules worked… With a bit of help from a very high log level, it turned out that while mod_rewrite applies the RewriteBase to the URI (here, truncating the directory) when applying a RewriteRule, it does not apply the RewriteBase when matching a RewriteCond. Thus, the solution is to write my rules as below, but to insert the directory name—the same directory name as the RewriteBase!—into the matching conditions, e.g. ^/newsite/\w+.

I’m playing around with some stuff (on my local box, so far, though I’ll be replicating it at Webfaction…if I can get the damned thing to work) with a dynamic website that uses mod_rewrite to take extensionless URIs and turn them into script invocations (mod_wsgi, as it happens, moving away from the largely-deprecated mod_python). This works beautifully when I only have one site. Now, however, I want to have two sites in different <Directory> sections in the same <VirtualHost>, and things aren’t working so smoothly. In fact, as soon as I change my DocumentRoot to something other than the path of the <Directory> the RewriteRules seem to stop working, even without adding a second <Directory> section.

All I get for every request in the mod_rewrite log is a notification that it passed through:

127.0.0.1 - - [03/Apr/2009:21:48:07 --0700] [localhost/sid#217fc08][rid#24dc068/initial] (1) [perdir /var/www/localhost/htdocs/wsgi/newsite/] pass through /var/www/localhost/htdocs/wsgi/newsite/index

(In the Apache error.log, of course, I get the expected error messages about requests for resources that can’t be found.) I’ve tried to add an appropriate RewriteBase, but so far to no avail. My current setup looks like this, and doesn’t work:

Listen 80
LogLevel info
LoadModule wsgi_module /usr/lib64/apache2/modules/mod_wsgi.so

WSGIPythonPath /home/petter/projects/newsite:/home/petter/projects

NameVirtualHost 127.0.0.1:80
<VirtualHost 127.0.0.1:80>
	ServerAdmin webmaster@localhost
	RewriteLog /tmp/rewrite.log
	RewriteLogLevel 2
	
	DocumentRoot /var/www/localhost/htdocs/wsgi
	<Directory "/var/www/localhost/htdocs/wsgi/newsite">
		Options Indexes FollowSymLinks ExecCGI

		AddHandler wsgi-script .wsgi

		Order allow,deny
		Allow from all

		RewriteEngine On
		RewriteBase /newsite

		# Really, really annoying; the trailing slash fixes don't seem
		# to work on the server's document root...
		RewriteCond %{REQUEST_URI} ^$
		RewriteRule ^.*$ test.wsgi?page=index [QSA]

		# Redirect .py files
		RewriteCond %{REQUEST_URI} ^\w+\.py$
		RewriteRule ^(\w+)\.py$ test.wsgi?page=$1 [QSA]

		# Redirect extensionless URLs, unless they're for directories
		RewriteCond %{REQUEST_URI} ^\w+$
		RewriteCond %{REQUEST_FILENAME} !-d
		RewriteRule ^(\w+)$ test.wsgi?page=$1 [QSA]

		<Files *.xml>
			Order Deny,Allow
			Deny from All
		</Files>
	</Directory>
</VirtualHost>
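Given the fix described in the update (RewriteBase is not applied when matching RewriteConds), here is a sketch of the corrected conditions, with the directory name repeated in each pattern. This is a reconstruction, not a tested config:

```apache
# The RewriteBase truncates /newsite/ for RewriteRule matching only;
# REQUEST_URI in a RewriteCond still carries the full path, so the
# directory name has to be repeated there.
RewriteBase /newsite

RewriteCond %{REQUEST_URI} ^/newsite/?$
RewriteRule ^.*$ test.wsgi?page=index [QSA]

RewriteCond %{REQUEST_URI} ^/newsite/\w+\.py$
RewriteRule ^(\w+)\.py$ test.wsgi?page=$1 [QSA]

RewriteCond %{REQUEST_URI} ^/newsite/\w+$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(\w+)$ test.wsgi?page=$1 [QSA]
```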

haggholm: (Default)

Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM, expecting '}' in /var/www/htdocs/webeval/erez/classes/assignment/assignmentSQLgen.php on line 251

According to Wikipedia,

Paamayim Nekudotayim (פעמיים נקודתיים pronounced [paʔamajim nəkudotajim]) is a name for the Scope Resolution Operator (::) in PHP. It means "twice colon" or "double colon" in Hebrew.

Nekudotayim (נקודתיים) means 'colon'; it comes from nekuda (IPA: [nəkuda]), 'point' or 'dot', and the dual suffix ayim (יים-), hence 'two points'. Similarly, the word paamayim (פעמיים) is derived by attaching the dual suffix to paam (IPA: [paʔam]) ('one time' or 'once'), thus yielding 'twice'.

The name was introduced in the Israeli-developed Zend Engine 0.5 used in PHP 3. Although it has been confusing to many developers, it is still being used in PHP 5.

…Of course.
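For anyone hunting this error cold: the parser emits T_PAAMAYIM_NEKUDOTAYIM wherever it finds a :: it did not expect. A contrived illustration (hypothetical code, not from the file in question):

```php
// A typo like this -- '::' where '=>' was meant -- is a classic trigger:
// Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM
foreach ($rows as $key :: $value) {
    // ...
}

// The operator's legitimate use, scope resolution:
echo SomeClass::CONSTANT;
```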

haggholm: (Default)
// 1.
$map[$value] = ($value == $req['value']) ? 1.0 : 0.0;

// 2.
$map[$value] = ($value == $req['value']) ? 1.0 : 0;

Can anyone think of any reason whatsoever why these two statements should behave differently? If you had told me they would, I would have laughed derisively. And yet, PHP 5.2.6† at least thinks that they are not merely different, but critically so: While (2) works, (1) results in a syntax error:

Parse error: syntax error, unexpected T_DNUMBER in [...].php on line 232

Note that

  1. the literal 0.0 is not illegal in general, and
  2. the statement fails with other floating-point literals, too—it may be irrelevant to write 0.0 rather than 0, but I also couldn’t write 0.5 if that were what I needed.

What the hell is this lunacy‽

Update: This must be a bug, not (another) idiotic design feature: It raises a parse error when I run it through Apache/mod_php‡, but not with the CLI version of the PHP interpreter. On the other hand, why on Earth should the two use different parsers…? The mystery only deepens.

†
petter@petter-office:~/temp$ php --version
PHP 5.2.6-2ubuntu4 with Suhosin-Patch 0.9.6.2 (cli) (built: Oct 14 2008 20:06:32)
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
    with Xdebug v2.0.3, Copyright (c) 2002-2007, by Derick Rethans

‡ I often wonder if it isn’t really mod_intercal. PHP is but a PLEASE and a COME FROM away from being INTERCAL 2.0 (for the Web).

Idle talk

Oct. 1st, 2008 05:27 pm
haggholm: (Default)
(05:19:55 PM) David: another case of PHP informing me of nothing - i'm using file_put_contents only I had the params filename and the filecontent mixed... do I get an error msg... NOOooOooooO
(05:21:06 PM) Petter: Of course not. It might upset you to get error messages.
(05:25:38 PM) David: It should be called 'Delusional Programming'
(05:26:53 PM) Petter: Programming With Never-Ending Denial
(05:26:59 PM) Petter: (note handy acronym)
haggholm: (Default)

…Particularly when you’re never quite sure whether some function or object expects (or returns) local or UTC time, and when your database uses at least three different formats for storing them. (I officially hate timezones, and will let this be a reminder never to use anything but UTC for storing dates in an application, ever.)

Subject: The time box issue/GM date problem

I *think* I finally fixed it -- please test and verify. (I checked the
configuration tab, which seems to behave properly; I verified that the
profile displays matched, and I verified that saving a time returns the
same time rather than an off-by-an-hour one.)

That was an incredibly irritating bug, and I missed two jiu-jitsu
practices over it, but I introduced it by making a bad assumption, so I
suppose that's fair.

My bad assumption was that I could take a time T, take its timezone
offset, and by adding (hours*3600+minute*60+seconds) and adding the
offset back, I'd get a good time. That was stupid.

The problem is that different timestamps, from different dates, come
back with different timezone offset. I would suspect this may have to do
with policy changes in things like timezones and DST. Specifically, the
timezone offset from UTC on the same machine differs by 3600 seconds --
one hour -- for timestamps in 1970 and timestamps in 2008, respectively.

On the bright side (and the reason why I made those changes in the first
place), we can now say things like

	$d->toTimestamp()

instead of

	date('G', $time_inseconds)*3600
	         +$minutes_default*60
	         +$seconds_default;

which I think is worth it in the long run. Or so I tell myself to
console myself for the two missed jiu-jitsu practices.

-- 
Petter Häggholm
eRezLife Software
http://www.faqs.org/docs/jargon/T/top-post.html
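The core mistake described in the email, treating the UTC offset as a constant of the machine rather than a property of the instant, is easy to demonstrate. A minimal sketch in Python (the zone name is my assumption; any DST-observing zone shows the same):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The offset belongs to the instant, not the machine: the same zone
# yields different UTC offsets on either side of a DST switch.
tz = ZoneInfo("America/Vancouver")  # assumed zone; any DST zone works
winter = datetime(2008, 1, 1, 12, 0, tzinfo=tz).utcoffset()
summer = datetime(2008, 7, 1, 12, 0, tzinfo=tz).utcoffset()
print(winter, summer)  # the two offsets differ by exactly one hour
```

Which is exactly why storing everything in UTC, and converting only at display time, is the sane policy the email arrives at.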
haggholm: (Default)

Since Internet Explorer is completely incapable of accepting the proper Content-type for XHTML, application/xhtml+xml, and since you should never send XHTML 1.1 as text/html, I decided to change my website from XHTML 1.1 to XHTML 1.0 Strict. I didn’t really have to change anything besides the DOCTYPE, so it was a very small deal—small enough that even I can be bothered to do that much to accommodate Internet Explorer (p.b.u.i).

However, this just meant that I could legitimately send a content-type of text/html, which IE understands. It didn’t mean that my site worked in IE, and in fact, it did not.

It turns out that IE has a really, really stupid parsing bug—I cannot call it anything else. You might easily imagine that anyone writing an X[H]TML parser would have a generic tag-parsing mechanism, where a <short_tag/> is just a <long_tag></long_tag> with an empty (perhaps null) content field. That’s how I do it. Not so, it appears, with IE: IE cannot parse the tag <script src="…"/>. Easy fix, but who the hell would think to look for it? —This is, by the way, a bug not just in IE6, but also in IE7. I can only hope that they’ve fixed it in IE8.
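The easy fix, for the record: give the script element an explicit end tag instead of self-closing it (the filename here is just an example):

```html
<!-- Trips up IE6 and IE7 when served as text/html: -->
<script type="text/javascript" src="example.js"/>

<!-- Parsed correctly everywhere: -->
<script type="text/javascript" src="example.js"></script>
```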

Speaking of IE8, if anyone reading this has a copy of it, I’d be interested to hear how it renders my site as compared to IE7 and/or compared to another browser (Firefox or Opera—I know it works in those). In particular, I’m not sending data as text/html to IE8 at this point; until I know that it’s broken, I’ll give them the benefit of the doubt and treat them as a proper browser capable of dealing with modern web standards.

haggholm: (Default)

I read an interesting article by Joel Spolsky, In Defense of Not-Invented-Here Syndrome. It’s worth reading, but in case you're feeling lazy, here are the highlights:

When you're working on a really, really good team with great programmers, everybody else's code, frankly, is bug-infested garbage, and nobody else knows how to ship on time. When you're a cordon bleu chef and you need fresh lavender, you grow it yourself instead of buying it in the farmers' market, because sometimes they don't have fresh lavender or they have old lavender which they pass off as fresh.

[…]

The best advice I can offer:

If it's a core business function -- do it yourself, no matter what.
Pick your core business competencies and goals, and do those in house. If you're a software company, writing excellent code is how you're going to succeed. Go ahead and outsource the company cafeteria and the CD-ROM duplication. If you're a pharmaceutical company, write software for drug research, but don't write your own accounting package. If you're a web accounting service, write your own accounting package, but don't try to create your own magazine ads. If you have customers, never outsource customer service.

Now, I absolutely agree with his best advice—if it’s a core business function, do it yourself. After all, it’s your business to perform core business functions well, and if you can’t (or don’t) do it better than your competitors, you don’t have a selling point. But to imply even loosely (he does take the edge off it slightly later in his article) that because you write software, all of software is your core business function, is, I think, so hyperbolic as to be completely misleading. The worst part of the article follows:

The only exception to this rule, I suspect, is if your own people are more incompetent than everyone else, so whenever you try to do anything in house, it's botched up. Yes, there are plenty of places like this. If you're in one of them, I can't help you.

How about an exception for when your people have better things to do? We use all kinds of libraries, internal and external. We use an external DB compatibility layer, various internal tag generation and Javascript DOM libraries, an internal ORM, an external email library with internal wrappers… According to the Spolsky logic, here, we should stick with the internal Javascript library, because our devs know our needs best (true); we should stick with the internal ORM; we should possibly even contemplate our own internal DB layer…this has in fact been brought up, and I think it’s a terrible idea.

Because, let’s face it, DB layers and fancy DOM manipulations are not our core business functions. Our core business functions are things like writing very flexible management systems for medical schools and university residences with client-defined workflows and customiseable forms and reports. This is what we do. This is what we sell. This is what our clients pay us to do, and what they cannot get from anyone else (at competitive prices).

This is why I think we should phase out our internal DOM stuff in favour of jQuery (which we recently adopted), why we should stick with MDB2 for a DB layer, and why I'd be happier if we had a third-party ORM. Is it because I think that the MDB2 people, the jQuery people, and the hypothetical ORM people are smarter than us, or better programmers? Not at all. It’s because that’s not what we’re paid to do, and unless the tools are really bad, it’s going to be vastly more productive to work with the third-party tools, find other third-party tools, or help fix the current tools, in some order of preference.

Maybe Microsoft can afford a team to make a custom compiler for Excel. We sure as hell can’t. We’ve got enough bugs and feature requests to worry about as it is, and almost none of them are ultimately the fault of any third-party library. (I’ve made three bug reports to the MDB2 project, I believe—I can assure you that we have rather more bugs than that in our tracker.)

Now, I’m sure that Joel Spolsky is a smart guy who knows all this, and may have known it before I first used a keyboard (there’s a reason I don’t send this as an email to Joel Spolsky). However, given all these ifs, buts, and general caveats, what’s the point of that article? If the real gist of the article, put succinctly and honestly, should be Not-Invented-Here Syndrome is not a valid concern for your company’s core competency, nor if adequate third-party solutions cannot be found or afforded, I think this is a job for…

haggholm: (Default)

Emphases added—I have nothing else to add.

The style I follow is to look at all the things the class should do and test each one of them for any conditions that might cause the class to fail. This is not the same as test every public method, which some programmers advocate. Testing should be risk driven; remember, you are trying to find bugs now or in the future. So I don't test accessors that just read and write a field. Because they are so simple, I'm not likely to find a bug there.

This is important because trying to write too many tests usually leads to not writing enough. I've often read books on testing, and my reaction has been to shy away from the mountain of stuff I have to do to test. This is counterproductive, because it makes you think that to test you have to do a lot of work. You get many benefits from testing even if you do only a little testing. The key is to test the areas that you are most worried about going wrong. That way you get the most benefit for your testing efforts.

It is better to write and run incomplete tests than not to run complete tests.

[…]

When do you stop [adding tests]? I'm sure you have heard many times that you cannot prove a program has no bugs by testing. That's true but does not affect the ability of testing to speed up programming. I've seen various proposals for rules to ensure that you have tested every combination of everything. It's worth taking a look at these, but don't let them get to you. There is a point of diminishing returns with testing, and there is the danger that by trying to write too many tests, you become discouraged and end up not writing any. You should concentrate on where the risk is. Look at the code and see where it becomes complex. Look at the function and consider the likely areas of error. Your tests will not find every bug, but as you refactor you will understand the program better and thus find more bugs. Although I always start refactoring with a test suite, I invariably add to it as I go along.

Martin Fowler, Refactoring

What we really need to do at work, however—or rather, what we need to figure out how to do, which is far from trivial as our bugs tend to rely on huge amounts of database state, some from clients that don't give us easy access to it:

…When I get a bug report, I begin by writing a unit test that causes the bug to surface. I write more than one test if I need to narrow the scope of the bug, or if there may be related failures. I use the unit tests to help pin down the bug and to ensure that a similar bug doesn't get past my unit tests again.

When you get a bug report, start by writing a unit test that exposes the bug.

Ibid.
haggholm: (Default)

I bet you have! I bet, furthermore, that you've wondered how to do it gracefully, especially with XHTML, which doesn't allow document.write(), the only Javascript-based solution that LiveJournal offers out of the box.

Why do I want to use Javascript? Well, I could use a server-side script to extract the posts, but that's precisely the problem—it'll be server-side, and why should I make it so? Ultimately, the transaction is between the client and the LiveJournal server. There's no logical need for my webserver to fetch and then re-send the contents of the blog when the client is perfectly able to fetch it himself. Other solutions, like <iframe>s, are scarcely less unsatisfying: They're ugly and break the page layout.

But how can I gracefully embed the blog in a webpage without document.write()? I thought for a while of various horrible schemes—could I, for instance, create a hidden <iframe> and extract its contents on page load? I'm still not sure if it's possible; it would certainly be cumbersome, and if it is possible it seems to expose the user to cross-site scripting attacks. (If it is possible, and if my <iframe> source referred to anyone else's LiveJournal, I could Ajax it right back to my server, hidden posts and all—because frame solutions use the client's cookies to determine privilege levels.)

The answer is here, it uses JSON, and it is customiseable, although it takes a wee bit of tweaking and (unfortunately) some poking at S2 layers. The credit belongs with [livejournal.com profile] slothman.

One of these days I'll actually stop tweaking the mod_python-based version of my website and launch it…

Nota bene

Apr. 30th, 2008 09:28 am
haggholm: (Default)

The comment

// @note Good Class :)

is not a proper substitute for actually writing good code.

On a side note, no documentation generator ever used here recognises tags in ‘double-slash’ (//) comments.

Threads

Apr. 25th, 2008 12:18 am
haggholm: (Default)

People who actually understand threads are rare and strange and twitch a lot. They tend to wake in a cold sweat in the middle of the night and start raving about race conditions.

Paul Harrison, in a Daily WTF comment
haggholm: (Default)

I already deleted it from the codebase, and I can't be bothered to look through Subversion for the exact code (the real deal was unfinished and—thankfully—never invoked), but what I found looked rather like this:

function countChars($str)
{
	$count = 0;
	for ($i = 0; $i < count($str); $i++)
	{
		$count++;
	}
	return $count;
}
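The punchline, for readers who don't speak PHP: count() counts array elements, and in PHP 5 count() of a plain string is 1, so the loop body runs exactly once and the function returns 1 for every string. The built-in that was actually wanted:

```php
// count("hello") is 1 in PHP 5 (count() is for arrays/Countable,
// not characters), so countChars() above always returns 1.
echo strlen("hello"); // 5
```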
haggholm: (Default)
class someClass
{
    /**
     * This is the class constructor.
     */
    public function __construct()
    {
        // ...
    }

    /**
     * This method takes two parameters, an integer
     * $my_id and a string $some_name.
     *
     * @param int $my_id My ID.
     * @param string $some_name A name to save.
     */
    public function foo($my_id, $some_name)
    {
        // ...
    }
}

I'm glad the comments are there to tell me these things.

haggholm: (Default)

So I want to move my personal website over to mod_python, because Python is neat; but I don't want to break existing URIs. This seems like a pretty simple task—in fact, it is a pretty simple task. I just need to capture requests for .html files and run them through the Python script that does stuff.

<Directory "/my/htdoc/dir">
    AddHandler mod_python .html
    PythonHandler my_handler
</Directory>

Piece of cake! I've got this running on my local Apache server and it works beautifully. The only tiny issue is that for various reasons, I have a bunch of subdirectories over at petterhaggholm.net which could contain, among many other things, static HTML files that I don't want to run through mod_python. This is where things start to get hairy. I imagined that I could do something like this to apply to all subdirectories:

<Directory "/my/htdoc/dir/*">
    AddHandler None .html
</Directory>

The None handler would clear the mod_python handler and restore default behaviour, which is to have Apache send the static HTML. So far, so good—but the rule also applies to the base directory, /my/htdoc/dir! In other words, this second rule overrides the first and completely sabotages the mod_python rule. Worse yet, the <Directory> rules seem to match ordinary files and not just directories; for instance,

<Directory "/my/htdoc/dir/a*">
    # ...
</Directory>

will turn out to match /my/htdoc/dir/aboutme.html. (This is presumably why it seems to apply to the whole directory when I just use *: It matches all files /my/htdoc/dir/*, not just the subdirectories; a trailing slash, by the way, makes no difference.) I find this rather bizarre, and I can't seem to find a way to fix it. I can circumvent it by setting up overriding rules referring to specific subdirectories, but I don't want to have to do that, and in fact I don't want my mod_python handler to apply to any subdirectories—but I do want it to apply to the root. I can do one better and fix it for files in subdirectories:

<Directory "/my/htdoc/dir/*/*">
    AddHandler None .html
</Directory>

This will work properly for /my/htdoc/dir/sub/index.html, but if you go to /my/htdoc/dir/sub/ it won't give you index.html, but a mod_python error message, because the request for sub/index.html within the directory /my/htdoc/dir fails! Why it interprets this as a request for sub/index.html in /my/htdoc/dir rather than a request for index.html within sub is quite beyond me.

Does anyone know a proper solution to my problem? If you help me, I will give you cake¹.

¹ The cake is a lie.
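One avenue worth trying, though I have not verified it on this setup: regex <DirectoryMatch> sections are merged after all non-regex <Directory> sections, so a RemoveHandler there should undo the top-level handler for every subdirectory while leaving the base directory alone. A sketch:

```apache
# Untested sketch: <Directory> sections merge shortest-prefix first,
# and regex <DirectoryMatch> sections merge after all of them, so the
# handler set at the top level is stripped again below it.
<Directory "/my/htdoc/dir">
    AddHandler mod_python .html
    PythonHandler my_handler
</Directory>

<DirectoryMatch "^/my/htdoc/dir/.+">
    RemoveHandler .html
</DirectoryMatch>
```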

haggholm: (Default)

At work, as previously mentioned, I'm a great champion of phpDocumentor, and I take some pride in being the moving (read: nagging) force that resulted in our now having a repository of generated documentation for the core parts of our system. Unfortunately, there's a bug that prevents us from running it on the entire codebase, so we have to run it in pieces on the really interesting bits—run it on our whole codebase and it'll eat up all memory and die horribly. (It even dies on my desktop, which has a respectable 4 GiB of RAM. It really shouldn't. Our codebase isn't huge.)

One solution that has been proposed (as seen on the previously referenced bug page) is to move the data handling part of the documentor to a[n] SQLite database instead of keeping it all in memory. Apparently, PHP's string handling is rather inefficient, and there are a lot of cross references—this is why we use a parsing documentor rather than something that merely reads docblock headers, after all!

So I figure, what the hell, I have some free time to kill, I'm an open source geek, and improving this would make work life easier. At the same time, I can't afford to do this at work—or rather, my company can't afford to spend developer time on external tools rather than our own software—so I figured I might spend some evenings taking a look at this and seeing if I can make any headway.

A quick look at the codebase tells me that…well, the phpDocumentor code references all the in-memory data through arrays rather than method calls, and it makes use of a lot of public class members. In fact, if I make protected all the member variables of one of the classes that I suspect would need surgery, the program ceases to work. There are many classes, many with a good ten or twenty public variables, promiscuously referencing the public members of other classes (ohoho), and no data encapsulation API whatsoever to hijack.

I'd like to see this improve. I'm presently a bit uncertain of whether I'd really like to spend my free time on it…

haggholm: (Default)

If you encounter a particularly nasty bug in your code, one that is difficult to track down, and especially if you're looking for it in module A and end up finding it in module B (for whatever granularity of module you like), this is a sign that module B is not tested sufficiently, and you should immediately add tests to B sufficient to catch this sort of bug.

Unit tests are good for testing correctness, but they are also extremely, critically useful for tracking down bugs and narrowing down the scope of the code you need to inspect to figure them out.

Of course, in a perfect world we would all have comprehensive unit testing anyway; but this is not a perfect world, we don't all practice test-driven development, and even those of us who are so good that they always do it (I'm not) may have to work on codebases missing the tests they would have written. That's not my present situation, though: I got nailed because I didn't write enough tests. My first work of restitution was to immediately slap unit tests not only on the function whence came the problem proper, but also on a related function that I realised was missing a proper test. Having written these and so calmed down a bit, it's time to turn my attention to justifiably removing the Incomplete test flag from them…
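A sketch of what pinning a bug with a test looks like (Python's unittest purely for illustration, with made-up names; at work this would be PHPUnit):

```python
import unittest

def count_words(s):
    # The hypothetical "module B" function a bug was traced to;
    # imagine it used to miscount runs of whitespace.
    return len(s.split())

class TestCountWords(unittest.TestCase):
    def test_collapses_repeated_whitespace(self):
        # Regression test pinning the observed bug, written before the fix
        self.assertEqual(count_words("one   two\tthree"), 3)

    def test_empty_string(self):
        self.assertEqual(count_words(""), 0)

# Run with: python -m unittest <this module>
```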

haggholm: (Default)

I think this has already been my most blog-heavy day yet (five posts!—not counting this one); and when I first created this blog, posting was less than a monthly occurrence. Since today has largely been a list of vented frustrations (apart from that PHPUnit post), let me just say that it's nice to rediscover the sense of accomplishment one gets from hacking together low-level bits of code, now and then. HTTP GET may be recalcitrant, but damn it, my client now handles all three content-length specifications (unless you do something exotic with your Transfer-Encoding; I submit User-Agent: Crude-RSS 0.1 for a reason)…

I can also parse valid and well-formed XML documents (I make no attempts at performing any sophisticated verification: It assumes valid XML and behaviour is undefined in all other cases, though it'll puke with an error message if it's something obvious like mismatched tags), collect the elements in a tree, and…do something with it. All I've done so far is download the Slashdot RSS feed (from http://rss.slashdot.org/Slashdot/slashdot), parse it, and spit out an HTML document based thereon (with a <h3> for every <title>, and so on). I'm toying with the idea of adding an SMTP client to email a version somewhere (with all formatting stripped out). I don't feel like setting up a mailserver, though; I should telnet around a bit and see if I can find an SMTP server that contents itself with the very basics…

I should reassure the reader unfamiliar with this stuff that none of it is as difficult as it sounds; it's a few evenings' after-work hobby programming by someone whose C++ is decent but a tad rusty; it's all under 750 lines (in spite of being rusty C++); and parsing a valid and well-formed XML document is a lot easier than it may sound. I don't want to write an HTML parser…

As a postscript, I notice that I committed a sin that would surely have poor Strunk spinning in his grave (in addition, surely is an adverb): Three paragraphs, all ending in ellipses. Each seemed a good idea at the time, and I am much too tired to alter them. To the reader sensitive to style: My apologies, and good night.

haggholm: (Default)

HTTP 1.1, as per RFC 2616, supports no less than three ways of specifying the length of the response to a simple GET request:

  • In a Content-Length header, as in HTTP 1.0;
  • By means of the header Connection: close, which means content follows and ends only when we close the connection on you;
  • By means of the header Transfer-Encoding: chunked, which means that the message is transferred in chunks, each of which begins with a chunk size—in hex, unlike the decimal representation of Content-Length—and ends with good old CRLFCRLF.

I do not know why this is so, but it annoys me, particularly as it turns out that some servers will ignore you if you specify, in your GET request, HTTP version 1.0 (I'm looking at you, Slashdot—merely the first example I came across).
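Of the three, the chunked encoding is the fiddliest to implement. A minimal decoder sketch (my own, in Python for brevity; not the Crude-RSS code):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an RFC 2616 chunked message body: each chunk is a hex
    size line, CRLF, that many bytes of data, CRLF; a zero-size chunk
    terminates the body (trailers, if any, are ignored here)."""
    body = b""
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)  # strip chunk extensions
        pos = eol + 2
        if size == 0:
            break
        body += raw[pos:pos + size]
        pos += size + 2  # skip chunk data plus its trailing CRLF
    return body

print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```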
