Entries tagged with programming

Crossposts: http://petter-haggholm.livejournal.com/256570.html

Entry tags:

programming

Python IDEs

I’m considering trying an IDE for a change, rather than just gvim, because exploring options is good. I gather the top three IDEs that get mentioned in the context of Django are Wing, PyCharm, and Komodo (none of which are free), or Eclipse with PyDev (which is free, but has the even higher cost of being fucking Eclipse: I think using Eclipse is a much higher price to pay than money).

PyCharm, by the IntelliJ people, looks more full-featured than I knew any Python IDE could be. It’s really quite nice, and its code intelligence is impressive. The only obvious drawbacks are that it’s slow (is it because it’s written in Java? or just because it does so much introspection to provide such good hinting for so dynamic a language as Python?), which might or might not drive me crazy, and that it can be irritating when its code intelligence gets things wrong. For example, it’s wonderful that it warns me if a variable is nowhere declared, but annoying when I know damn well that it gets injected by a decorator.
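To illustrate the kind of injection that trips up static analysis, here is a minimal Python sketch. The decorator name inject and the injected name greeting are hypothetical, not taken from any particular library. The decorator rebuilds the function with extra entries in its globals, so the name exists at call time even though no analyser can see where it is defined.

```python
import functools
import types


def inject(**names):
    """Hypothetical decorator: rebuild func with extra names in its globals."""
    def decorator(func):
        # Copy the module globals and add the injected names on top.
        new_globals = dict(func.__globals__, **names)
        rebuilt = types.FunctionType(
            func.__code__, new_globals, func.__name__,
            func.__defaults__, func.__closure__,
        )
        return functools.wraps(func)(rebuilt)
    return decorator


@inject(greeting="hello")
def greet(name):
    # A static analyser flags `greeting` as undefined, but it exists at runtime.
    return f"{greeting}, {name}"  # noqa: F821


print(greet("world"))  # hello, world
```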

Wing IDE feels much lighter and faster. It has an impressively accurate vim input mode. (PyCharm also has vim emulation; I need to try it.) It also feels like a more native application. In addition, when I open a Django project, the Django support is…I don’t know if it’s superior, but it’s more explicit: I get a Django menu with manage.py actions like validating models, generating SQL, and so on. Its code completion feels slightly more limited, though: import a symbol from a project-local module and PyCharm clearly knows more about it than Wing does.

Both Wing and PyCharm do Django; highlight a template reference in PyCharm and it will go to the template, for instance.

Komodo…I wonder why discussions on IDEs with support for Django mention Komodo. It looks like a great editor, it may even be a great generic IDE, but Django support? My brief search for making Komodo deal intelligently with Django revealed this amazing screencast, which seems to conflate syntax highlighting for Django with supporting Django in the IDE. No: Syntax highlighting is a feature a GUI text editor can have. For an IDE to support a library should mean more. For example, if I open a Django project, it might at least have the decency to figure out what running it should mean (hint: not running the module I’m currently editing). Since it doesn’t do this right off the bat, nor make it trivially discoverable, I infer that even if there exist better Django integration features, Django integration is so far off the priority radar that I can not only discount Komodo as an option, but reiterate my surprise that it even gets brought up.

Currently I’m leaning toward PyCharm as the top choice, but feeling vaguely guilty about it.

Entry tags:

Django vs. TurboGears, first impressions

My website has gone through many, many iterations: Major backend changes are vastly more frequent than major content changes. For a while, I wrote some basic framework and generation stuff of my own, as a learning experience; more recently, I ported all the content to Django.

The only part of my website that makes real use of dynamic features is my book list, where I keep track of books I’ve read, and am starting to add features like automatically linking to Amazon, Chapters, Project Gutenberg, WorldCat, and so on; I’m also going to add my own personal ratings and reviews. (All, of course, because it pleases me to do so. I have no visitors to speak of.)

However, that book list—the only part that needed any tools at all beyond a menu preprocessor—was suffering from rather awful performance. The weak link turns out to be Django’s ORM. The list contains close to 700 books, correlated with authors, series (with volume information), languages, translations and translators… Of course, since pretty much all of the information is needed to display the page, it should be possible to fetch pretty much all the data in one query. Unfortunately, that’s not the case.

The page needs to fetch all Book objects. Ideally, the ORM would simply fetch all the related objects belonging to those books in the same query—Person and Language objects and so forth. Django, as far as I can tell, has no way of doing this automatically (by setting options or properties on the objects). It does expose a method to sort of do it, the select_related() method on the QuerySet…but it turns out to have a glaring weakness: It supports only simple foreign key relationships. There appears to be no way at all to invoke select_related() and fetch objects via many-to-many relationships! Since my database is full of those, this becomes a problem: The ORM made individual calls to fetch related data for each of almost 700 objects; a total of thousands of database calls per request—where only one call should be necessary!
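To show why per-object fetching hurts, here is a minimal sketch of the pattern (commonly called the N+1 query problem) using Python's built-in sqlite3 module rather than Django. The schema is a hypothetical, cut-down version of the book list: books and people joined through a many-to-many table, one author per book. A small wrapper counts the statements actually sent to the database.

```python
import sqlite3


class CountingDB:
    """Wraps a sqlite3 connection and counts the statements sent to it."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db(n_books):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book_author (book_id INTEGER, person_id INTEGER);
    """)
    for i in range(n_books):  # setup statements are not counted
        conn.execute("INSERT INTO book VALUES (?, ?)", (i, f"Book {i}"))
        conn.execute("INSERT INTO person VALUES (?, ?)", (i, f"Author {i}"))
        conn.execute("INSERT INTO book_author VALUES (?, ?)", (i, i))
    return CountingDB(conn)


def authors_per_book_lazy(db):
    """One query for the books, then one more per book: N + 1 queries."""
    result = {}
    for book_id, title in db.execute("SELECT id, title FROM book").fetchall():
        rows = db.execute(
            "SELECT p.name FROM person p"
            " JOIN book_author ba ON ba.person_id = p.id"
            " WHERE ba.book_id = ?", (book_id,)).fetchall()
        result[title] = [name for (name,) in rows]
    return result


def authors_per_book_joined(db):
    """A single JOIN across the many-to-many table: one query."""
    result = {}
    rows = db.execute(
        "SELECT b.title, p.name FROM book b"
        " LEFT JOIN book_author ba ON ba.book_id = b.id"
        " LEFT JOIN person p ON p.id = ba.person_id").fetchall()
    for title, name in rows:
        names = result.setdefault(title, [])
        if name is not None:  # books without authors still get an entry
            names.append(name)
    return result
```

With 700 books the first function sends 701 statements and the second sends one. The single LEFT JOIN is the kind of query an ORM's eager loading (SQLAlchemy's lazy='joined', for example) issues on your behalf.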

Of course, I could easily make a custom query to present the list. However, after running up against the aforementioned obstacle with Django, I decided to have a look at TurboGears. The reason is twofold. First, since it’s the only part of my site that needs a framework at all, it does make sense to pick a framework that works well for that particular page. Second, it’s an excuse to explore a new framework: a learning experience. The latter alone is a sufficient reason for me; together, they were compelling.

Getting set up and porting my application to TurboGears was so easy that it hardly bears speaking of. There was not much logic for most pages, and setting up the URLs and controllers was trivial. I do like the TurboGears style: Somehow it feels more natural than Django.

I first ported the templates from Django to Genshi; then, finding performance problematic, to Mako. I had some brief initial reservations about Mako. A template language that allows raw Python? My first thought was that it would invite worse practices, in that it allows a developer to put non-presentation code amidst presentation. However, in brief practice it felt helpful in that it allowed me to put some Python logic in the template that could not possibly be expressed in a Django template, but was really, truly not concerned with anything but the specific page’s presentation.

The most significant change is, of course, that TurboGears ships by default with SQLAlchemy, which is my poster child for what an ORM should look like. Using the declarative style, it’s simple to do simple things; but importantly, it allows you to accomplish whatever you damn well please. In particular, pulling in related objects—apparently impossible with the Django ORM—is trivial in SQLAlchemy (pass lazy='joined' to the relationship). Thus, loading my page requires one database query rather than thousands.

All is not rosy, however. I’m not enamoured with the style of Django’s documentation (I sometimes find it hard to locate what I need), but it is very comprehensive, and very accurate. With TurboGears, I’m a lot less impressed. I spent a lot of time following and reading information on Routes only to find, in the end, that support is limited and broken. Fortunately the default URL mapper gets me by, but I’m not entirely happy. Worse, following the documentation for customising the admin views, I have so far been unable to get anything working: when I followed the very examples from the docs, nothing happened at all!

At the moment, I have two very similar implementations of my site, in Django and in TurboGears. The TurboGears version has a vast edge in performance: Generating the booklist page is about four times faster on initial load, and five times faster on subsequent access, before caching; accessing other pages, too, is much faster. On the other hand, the admin pages are currently not usable, so I have to rely on the Django admin views, as they share the same DB. Annoying, to be sure, though keeping the Django ORM definitions up to date is not really a large maintenance burden in exchange for a very slick admin interface.

I’m not sure where I want to go from here. The TurboGears version is faster; I prefer SQLAlchemy; and there are style aspects of TurboGears I prefer to Django. I like Mako in that I can put custom display stuff in defs in templates: In Django, where emphasis is placed on making templates presentation-only, I don’t have defs and so must place display-specific logic in separate files! On the other hand, Django seems a bigger and more reliable project, and its documentation is superior. TurboGears has some annoying glitches, as seen in my failure to get Routes and admin views to work (whether those failures are due to bugs proper or to inadequate or erroneous documentation), and the deployment, though now safely automated, was just awful, with strange complaints about setuptools versions that were already in place.

I think I prefer TurboGears, but Django’s reliability, ease of deployment, and admin interface make me hesitate.

Entry tags:

PHP and “string” comparisons

PHP, among other problems, is a dynamically and (problematically) weakly typed language. What this means is that variables are cast, willy-nilly, to work in whatever fashion the programmer or the PHP interpreter feels is appropriate for the occasion. For example, a string "1" is equivalent to the integer value 1. Or at least equivalent-ish.

The equality test operator, ==, is defined in PHP for strings as for other built-in types. However, as the official documentation states,

If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number and the comparison performed numerically. These rules also apply to the switch statement.

And:

When a string is evaluated in a numeric context, the resulting value and type are determined as follows.
If the string does not contain any of the characters '.', 'e', or 'E' and the numeric value fits into integer type limits (as defined by PHP_INT_MAX), the string will be evaluated as an integer. In all other cases it will be evaluated as a float.
The value is given by the initial portion of the string. If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.

In the typical PHP context, where scripts are expected to deal with form input and so forth, this seems to make a lot of sense—everything arrives as string data, but the string "123" clearly encodes a number. Well, if it all worked properly, maybe it wouldn’t be so bad. But note that little subtlety above, that you might not expect if you hadn’t either seen it or read it in the docs: If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). This means that the following are all true:

"1" == 1
"a" != 1
"a" == 0

Yes—because any string that isn’t a number gets converted to zero, this is what you get. I saw this cause a nasty bug only today. (Personally, I prefer strcmp() et al for string comparisons. It’s clunky, but at least I know what it does, in all cases…I think. This is PHP, so one can never be quite sure.)

Another subtle consequence of the (accurate) definition from the documentation: If you compare a number with a string or the comparison involves numerical strings…, then it performs a numeric conversion. Thus, if at least one operand is an integer, the comparison is numeric; if both operands are numeric strings, the comparison is numeric; if both operands are strings, but only one of them is numeric, then it’s a regular string comparison. This makes sense…sort of…but combined with the 0 quirk above, this means that equality in PHP is not always transitive!

"a" ==  0  // true
  0 == "0" // true
"a" == "0" // false!

Normally you expect equality to be transitive, that is, if A = B and B = C, then obviously A = C. In PHP, though, this is not necessarily true: "a" = 0 and 0 = "0", but "a" ≠ "0"! This follows the specification presented by the PHP documentation (the last comparison fails because both operands are strings but only one of them is numeric, so the comparison is lexical), but it doesn’t make much mathematical or common sense.

Since a binary relation ~ on a set A is an equivalence relation if and only if it is reflexive, symmetric, and transitive [Wikipedia], the “Equal” operator == in PHP is not, in fact, a valid equivalence relation at all.
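To make the rules concrete, here is a rough Python model of them, restricted to integers and strings. It is a sketch under simplifying assumptions, not PHP's implementation: it follows the documentation quoted above as it read at the time (PHP 8 later changed how non-numeric strings compare with numbers), accepts leading whitespace, and ignores hexadecimal strings and the PHP_INT_MAX overflow rule. The function names are my own.

```python
import re

# "Valid numeric data" per the quoted docs: optional sign, digits that may
# contain a decimal point, optional exponent. Leading whitespace is allowed.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def to_number(s):
    """Convert a string in numeric context, per the quoted rules."""
    match = _NUMERIC_PREFIX.match(s)
    if match is None:
        return 0  # no leading numeric data: silently 0
    text = match.group(0)
    if any(c in text for c in ".eE"):
        return float(text)
    return int(text)  # always read as decimal, so "010" -> 10


def is_numeric_string(s):
    """True if the entire string is valid numeric data."""
    match = _NUMERIC_PREFIX.match(s)
    return match is not None and match.end() == len(s)


def php_loose_eq(a, b):
    """Rough model of the old PHP == for ints and strings only."""
    if isinstance(a, str) and isinstance(b, str):
        if is_numeric_string(a) and is_numeric_string(b):
            return to_number(a) == to_number(b)
        return a == b  # plain string comparison
    # At least one operand is a number: convert any string operand.
    if isinstance(a, str):
        a = to_number(a)
    if isinstance(b, str):
        b = to_number(b)
    return a == b


print(php_loose_eq("a", 0))    # True
print(php_loose_eq(0, "0"))    # True
print(php_loose_eq("a", "0"))  # False: transitivity fails
```

With these three values the model reproduces the failure of transitivity described above.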

This is not the only problem, however. A different problem (and one that, unlike the 0-comparisons above, I do not find mentioned or justified in the documentation) is that integers are parsed differently by the regular parser and the string conversion parser. This baffles me; not only is it stupid and weird, but it’s also strange that they don’t just reuse the same routines. The problem is introduced by the fact that PHP, like many other languages, accepts integer literals in base 10 (decimal), base 16 (hexadecimal), and base 8 (octal). Octal integer literals are denoted by prefixing them with a 0, thus 01 means “1 in octal notation” (which equals 1 in decimal notation), 010 means “10 in octal notation” (which equals 8 in decimal notation), and 09 is invalid: it means “9 in octal notation”, which makes no sense.

Well, it turns out that for reasons best known to the PHP developers themselves, the automatic conversion of strings to numbers in PHP is handled by something analogous to the C library function strtod(), whose input format is described as such:

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

In other words, integer literals in PHP accept octal notation, but automatic conversions of strings to integers do not. Thus,

   01 == 1   // true, fine
 "01" == 1   // true, fine
  010 == 10  // false, fine -- it's equal to 8
"010" == 10  // true! The conversion assumes decimal notation
"010" == 010 // false!

This also means that casting a string $s to an integer, $x = (int)$s, is not equivalent to evaluating it as a PHP literal, eval("\$x = {$s};").

On a side note, octal numbers are handled in a pretty weird way to begin with. As the documentation warns you,

If an invalid digit is given in an octal integer (i.e. 8 or 9), the rest of the number is ignored.

Thus,

  09 == 0  // true; the literal stops at the invalid digit 9
"09" == 09 // false; recall that "09" is decimal

This form of behaviour is why I dislike PHP so intensely. As the Zen of Python reminds us, “Explicit is better than implicit” and “Errors should never pass silently. Unless explicitly silenced.” A language that silently squashes errors and returns 0 or null or some similar “empty-ish” value instead of warning you that something went wrong is a language that is not engineered to help you discover your errors, a language that would rather let you produce incorrect output than crash. (Crashing is way better than incorrect output. At least you know something is wrong. Silent logic errors kill people and crash space probes.)
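For contrast, this is roughly what the explicit alternative looks like in Python: every conversion names its base, and malformed input raises ValueError rather than quietly becoming 0 or being truncated. The helper rejected() is my own, for illustration only.

```python
def rejected(text, base):
    """Return True if int() refuses to parse text in the given base."""
    try:
        int(text, base)
    except ValueError:
        return True
    return False


print(int("010", 10))  # 10: decimal, because we said so
print(int("010", 8))   # 8: octal, because we said so
print(int("0o10", 0))  # 8: base 0 reads the prefix, like a source literal

print(rejected("09", 8))   # True: 9 is not an octal digit
print(rejected("a", 10))   # True: not a number at all
print(rejected("010", 0))  # True: the ambiguous leading zero is refused
```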

Keep in mind when you code that in general you don’t know whether some string may be numeric or not—if it’s input (direct user input, data from a database, what have you), then the string might happen to be numeric, and you won’t know unless you check (e.g. with is_numeric()).

If you can’t get away from PHP (getting away is always an attractive option), I suggest that you stick with strcmp() and its relatives (strncmp(), strcasecmp(), and so on) if you want to compare strings, and explicit casts to integers (or floats), with validation (cf. is_numeric()), if you want to compare numbers. The bugs that are likely to arise from the inconsistencies above may be rare, but they can be subtle and they can be damnably annoying.

For the sake of completeness, the script that I used to discover and verify the above:

<?php

function run_test($test_string) {
	eval("\$result = ($test_string) ? 'true' : 'false';");
	echo "$test_string => $result\n";
}

$tests = array(
	'      "1" == 1        ',
	'      "a" == 1        ',
	'      "a" == 0        ',
	'      "a" == "0"      ',
	'      "0" == 0        ',
	'     "01" == 1        ',
	'    "010" == 10       ',
	'      010 == 10       ',
	'    "010" == 010      ',
	'   "0x10" == 0x10     ',
	'     "09" == 09       ',
	'      "0" == 09       ',
	'      "0" == "09"     ',
	'      "a" == 1e-1000  ',
	'  1e-1000 == "1e-1000"',
	'"1e-1000" == "0"      ',
	'"1e-1000" == "a"      ',
);

foreach ($tests as $test) {
	run_test($test);
}

$s = "010";
echo "\"$s\" == (int)\"$s\" ? " . ($s==(int)$s ? 'yes' : 'no') . "\n";
eval("\$x = {$s};");
echo "\"$s\" == $s ? " . ($s==$x ? 'yes' : 'no') . "\n";

Entry tags:

programming,
work

The Origin of Defects

My pet peeve, and current candidate for leading cause of bugs that are subtle and difficult to track down:

Poor naming.

It may sound trivial (if it doesn’t, you’re already on my team), but having proper variable names, and especially proper function and method names, is in my opinion critical to having a stable and maintainable system. We’ve all seen and laughed at Daily WTF samples of tables named table47; we’ve all cringed at people who named their variables foo and bar. These are bad because they impede understanding, but even worse than incomprehensible names are misleading names.

It’s been said before but bears repeating (and repeating, and repeating): The names of entities in code are an extremely important part of your documentation. Code, it’s often said, is never out of date, unlike any other kind of documentation; this isn’t really true, but it should be true. If you write a function called getPersonId(), then it had better damned well return a person ID, or I will come down on you like the wrath of the heavens.

Of course, if things have entirely the wrong names (e.g. because the author of the code was an idiot), then it tends to be pretty obvious. If you request an object ID but receive a table row, you’ll catch on pretty quickly to the fact that the function does not do what you expect it to do. But hopefully, the code you work on was not written by idiots at all, but by at least reasonably competent developers who named things in a way that reflects what the code actually does. And code does not go out of date. Right?

Here’s the problem: Unless you unit test your code, and do so comprehensively, the code can perfectly well go out of date. Here’s what, in my experience, happens: A developer writes a function to accomplish a task. He names it properly, uses it properly, and if possible slaps a few unit tests on it. Later, he discovers that while the function does what it should, the task isn’t quite what he expected. Perhaps fetchFooEntities(), rather than looking up all Foo entities, was written for a piece of the system that should really look up just the subset of active Foo entities. So he refactors the code accordingly. No other code needs refactoring because his was, to date, the only one that called this function.

And voilà! The system now has a misleading function name. The code, or at least the function name, is out of date, because fetchFooEntities() does the job of a function that should be called fetchActiveFooEntities(). The next unsuspecting developer who comes along will see that there’s a function to fetch Foo entities and assume that (since it’s not parameterised) it fetches them all. The function has a straightforward name, but what it actually does is subtly different, so there will be bugs. And because the difference is subtle, the bug will be subtle, too.

Please make sure that you give your functions and your variables appropriate and descriptive names. And please, if you change the semantics of those functions or variables, change the names accordingly.
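In Python terms, the remedy might look something like the sketch below: the narrowed function gets its own honest name, and a unit test pins the contract of the old name, so that quietly changing its semantics fails loudly. The names Foo, fetch_foo_entities, and fetch_active_foo_entities are hypothetical snake_case versions of the example above, and the in-memory list stands in for whatever store real code would query.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Foo:
    name: str
    active: bool


STORE = [Foo("a", active=True), Foo("b", active=False)]


def fetch_foo_entities(store):
    """Return all Foo entities, active or not."""
    return list(store)


def fetch_active_foo_entities(store):
    """Return only the active Foo entities."""
    return [foo for foo in store if foo.active]


class FetchContractTest(unittest.TestCase):
    def test_fetch_foo_entities_includes_inactive(self):
        # Fails if someone narrows fetch_foo_entities() without renaming it.
        self.assertEqual(len(fetch_foo_entities(STORE)), 2)

    def test_fetch_active_foo_entities_filters(self):
        names = [foo.name for foo in fetch_active_foo_entities(STORE)]
        self.assertEqual(names, ["a"])
```

Run it with python -m unittest followed by the module name.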

Entry tags:

programming

A none-too-profound musing on requirements mutability

It’s one of the best-known truisms of software development: No matter how carefully you gather your requirements, they’ll likely be wrong and always, always be incomplete. No battle plan survives contact with the enemy; no development plan survives contact with the users. The moment your client sees what you are making, he will realise that it’s not quite what he wants and your requirements will change. This is well-known; it’s why agile methodologies are praised and inflexible methodologies are disparaged.

What I realised biking home from work today is that this is not just true in practice, but in fact necessarily true.

When you design a software system, its purpose is to help users solve a problem. Your model, then, must take into account

The problem domain itself.
The user’s existing tools, which determine input and output for your system, provide opportunities to make use of, and present obstacles and inefficiencies to resolve.

The purpose of your software system is, of course, to address the shortcomings of the user’s tools in solving the problem. But the moment the user begins to use your software system, it becomes part of the toolchain and thereby part of the system you are analysing and building a model of. As soon as it springs into existence, its opportunities, inefficiencies, powers, and foibles become part of the things you need to leverage and to resolve. Your model now needs to take into account

The problem domain itself.
The user’s existing tools.
The model itself.

This obviously means that it is impossible to fully model the system in the absence of the software system that is to integrate with it, because the software system you are designing is part of the system. It’s not merely practically, but logically impossible to fully specify the system without considering your software…and your software cannot be designed and presented for consideration without some set of requirements. That set of requirements, then, is by logical necessity incomplete.

Much like a mechanical problem moves from mathematical certainty to the vagaries of perturbative methods the moment you add a third body, so software design necessarily relies on iterative and perturbative methods the moment your own system enters the picture, as it must, if it is non-trivial. (Maybe biology is a better metaphor: The solution to the chicken-and-egg problem is to become a chicken by degrees, over generations.) Of course you could in principle carry out this iterative design on paper (literally or otherwise), but to my mind this shifts the picture from one in which you might ideally design everything in one fell swoop but are practically constrained to doing iteratively (one way or another), to one where it is only possible to do it iteratively (again, one way or another).

Of course, the practical outcome remains absolutely unchanged, so my little thoughts have no practical consequences. But it amuses me, sometimes, to think of stuff like this.

Entry tags:

programming

Beej’s Guide to Network Programming

Every time I need to go back and refresh myself on socket programming, I avoid my clunky textbooks and go straight to Beej’s Guide to Network Programming. It’s accessible, it goes into sufficient detail without being bogged down in theory (if I want theory I will consult my textbooks)—basically it covers exactly what I need and want, neither more nor less, and does it in a friendly manner.

Since I’m refreshing myself on C/C++, I decided to write a little server/client app, because it nicely forces me to cover a lot of bases, so of course I returned to good old Beej. This time, I discovered that although the online version is still freely available, he’s also published a print version through the POD publisher Lulu.

I’m placing an order, both because I think I will get lots of use out of it, and because the author deserves some material thanks from me after all the times I already have found it useful. If you ever need to refresh your memory on POSIX (or POSIX-like) socket programming, buy one, too!

Entry tags:

programming

Unpleasant C++

I really enjoyed Effective C++ and stand by what I just said about it. However, the book reminded me not only of joys, but also, it must be admitted, of frustrations.

Let’s look at one last typename example, because it’s representative of something you’re going to see in real code. Suppose we’re writing a function template that takes an iterator, and we want to make a local copy, temp, of the object the iterator points to. We can do it like this:

template<typename IterT>
void workWithIterator(IterT iter)
{
  typename std::iterator_traits<IterT>::value_type temp(*iter);
  // ...
}

Don’t let the std::iterator_traits<IterT>::value_type startle you. That’s just a use of a standard traits class…

(…And I think that if “typename std::iterator_traits<IterT>::value_type” is “standard”, you should strive to make your standard simpler, cleaner, and more readable…)

…If you think reading std::iterator_traits<IterT>::value_type is unpleasant, imagine what it’s like to type it. If you’re like most programmers, the thought of typing it more than once is ghastly, so you’ll want to create a typedef. […]

template<typename IterT>
void workWithIterator(IterT iter)
{
  typedef typename std::iterator_traits<IterT>::value_type value_type;
  value_type temp(*iter);
  // ...
}

Many programmers find the “typedef typename” juxtaposition initially jarring, but it’s a logical fallout from the rules for referring to nested dependent type names. You’ll get used to it fairly quickly.

With all due respect, Mr. Meyers, I hope never to have to see such monstrosities often enough to get used to them! Some lines of code should just never be written, should never have to be written, and that’s one of them:

typedef typename std::iterator_traits<IterT>::value_type value_type;

In all fairness to Scott Meyers, who’s a very good writer, you do end up having to read and write code like that if you write enough C++ using the ‘right’ parts of the language. I’ve written similar things—and I’ve written things that were not only uglier, but also worse.

My personal opinion is that C++ can be a useful language, but if you are to use it you should strive to avoid this sort of thing in the first place. Personally, I prefer to use Python for expressive power, or C if I need something truly low-level—at least it’s simple. C++ is certainly powerful and expressive, but when that dog starts waving its tentacles at me, my aesthetic sensibilities are offended.

Then again, at least it’s not PHP…

Entry tags:

programming

Current reading: Effective C++

I haven’t really worked in C++ for a number of years (and never professionally), but when I felt an urge to read some quality programming/software engineering books and went to the local Chapters, I came home not only with Code Complete (2nd ed.), but also Scott Meyers’s Effective C++ (3rd ed.).

There are several reasons why I’m very much enjoying this book: The fact that it’s accessible and well-written, the feeling of going back to my programming roots, and so forth. But I also enjoy reading it because it has a lot to say about resource management: things that I’ve only heard discussed in the context of C++, but that are certainly relevant outside of it.¹

Well, of course good C++ books talk a lot about resource management. After all, C++ offers all the wonderful ways of leaking memory that C has on the menu, plus gotchas like delete vs. delete[] and an exception system to bypass your careful resource deallocation. If you can’t write C++ code that does a very good job of resource management (and this takes both knowledge and discipline), you can’t write good C++ code.

But, you may be thinking, this is irrelevant and uninteresting to me: I use modern languages with garbage collection, so all this complicated stuff about exception-safe resource deallocation is only another reason not to learn or use C++ at all. An understandable thought, if so, but one I disagree with. What people can easily forget is that memory is only one kind of resource, and a program can leak other resources in much the same way as it can leak memory: Database and socket connections or file handles left open, GUI resources left unreleased to the OS, and so on. Some of these are in fact a lot worse than mere memory leaks, since a file your program leaves locked may stay locked even after your program terminates (which frees even leaked memory).

In other words: Because of the omnipresent danger of memory leaks, C++ forces you to be aware of proper resource management—but these lessons are relevant to all manner of other resources. Learning not to leak memory in the face of virtual and non-virtual base class destructors, or in the face of multiple return paths and unexpected function termination via exceptions, may be tedious and seem like busywork, but will instil a solid grasp of all the myriad gotchas of general resource management.
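The same discipline carries over directly to garbage-collected languages. Here is a minimal Python sketch: Handle is a hypothetical stand-in for a file, socket, or database connection, and the counter just makes leaks visible. The leaky version forgets to release on the exception path; the context-manager version releases on every exit path, much as a C++ destructor would.

```python
from contextlib import contextmanager


class Handle:
    """Hypothetical stand-in for a file, socket, or database connection."""
    open_count = 0  # how many handles are currently open

    def __init__(self, name):
        self.name = name
        self.closed = False
        Handle.open_count += 1

    def close(self):
        if not self.closed:
            self.closed = True
            Handle.open_count -= 1


@contextmanager
def acquire(name):
    handle = Handle(name)
    try:
        yield handle
    finally:
        # Runs on normal exit, early return, and exceptions alike.
        handle.close()


def leaky(name, fail):
    handle = Handle(name)
    if fail:
        raise RuntimeError("boom")  # handle.close() is never reached
    handle.close()


def safe(name, fail):
    with acquire(name):
        if fail:
            raise RuntimeError("boom")  # the with block still closes the handle
```

This is the RAII idea in another guise: tie release to leaving a scope, rather than to remembering close() on every path.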

Of course, if you’re working in a web environment you may be relying on the fact that $YOUR_DYNAMIC_LANGUAGE_INTERPRETER will free up all its resources upon termination, i.e. upon completion of each request handled. However, this is not only potentially wrong (locked files, &c.), but should make you feel a little bit dirty and ashamed anyway: That’s no way to program! It’s not hygienic. And heaven help you if you acquire those bad habits and are ever forced to write a long-running program.

¹ Everything I said in this preceding paragraph applies equally well to Herb Sutter’s Exceptional C++ and More Exceptional C++, only Meyers’s book seems to deal with somewhat less mindboggling material. I should explain this remark by saying that when I refer to all three of these books as accessible, I mean that the material is approached in the most accessible fashion I can imagine. The material itself, especially in Sutter’s books, is occasionally rather mindboggling, and the octopus-from-a-dog joke comes to mind.

Entry tags:

Web framework musings: Django time?

Starting to get tired of wrestling with the recurrent bugs in my home-brewed framework that prematurely terminate sessions. I’m not too concerned about them since they do not threaten security (I wrote it in a rather paranoid fashion, and all session- or authentication-related bugs since the first week or so have had to do with premature termination rather than excessive permissiveness), but they do annoy me.

Perhaps it’s time I refactored my website, and maybe a webapp or two, to use Django. I’m sure I could fix these bugs with the help of improved logging, but is it worth the effort? Beyond the “just for fun” reason, I wrote my framework in order to learn about framework development, and to get an inside understanding of session management and security concerns like proper password management, authentication, CSRF protection, and so forth. I did not write it with either a belief or an intention that I would write my own production-worthy system to rival a major project like Django.

I’ve learned a lot of lessons¹, and written some decent code², but if I want to keep working on this framework, I will have to start to Do Things Properly—add unit tests, track down these pesky session termination bugs, and so forth. I consider unit tests, proper logging, and so forth essential to production code, but not to fun exploration projects. I’m rather beginning to think that my framework has reached the point where it should either be made serious, or phased out. The latter sounds more sensible, and less symptomatic of NIH.

Besides, it can’t hurt to learn Django, can it?

¹ Apart from learning about CSRF protection, the most interesting problem I got to solve was probably SQLObjectInherit, which provides (in SQLAlchemy language) single table inheritance for SQLObject, using decorators. It’s not perfect (there are some edge cases where you request an object from some class C and expect a subclass D, but erroneously get the parent class), and I was contemplating switching to SQLAlchemy largely for this reason. It’s also the one thing that makes me question the Django decision, but the lack of single table inheritance is probably a smaller deal, in the long run, than all the myriad problems it solves.

² Decent for a non-unit tested system, which is very different from a production ready or production worthy system.

Entry tags:

programming

This is why programmers don’t want to deal with dates

The subject matter is hideously complex. (On top of that quicksand foundation, many of the implementations are rickety in and of themselves.)

Edit: It occurs to me that I just effectively declared that “programmers don’t like dates”. I feel as though I’m channelling some sort of stereotype…

Entry tags:

Speed in unit tests matters

It’s extremely frustrating to wait for over ten minutes when you’re ready to commit some new code, just because a big, slow unit test suite has to complete first. It’s also frustrating when you’re actively addressing a known bug that’s been exposed by unit tests and, having made a change that will hopefully fix it, sit and twiddle your thumbs as the tests re-run. Efficiency matters, even in unit tests.

I’ve spent a few workdays attacking the test suite for the module I’m working on with the proper tools: a profiler and KCacheGrind, a profiling data visualiser. By figuring out where the test suite spent most of its time and optimising the slow parts (largely by caching data that were superfluously recomputed, caching prepared statements, etc.), I cut the expected running time for company-wide unit tests by an estimated 10% and my own module’s tests by approximately 80%, from 12:31 to 2:40, a speedup of nearly a factor of 5!
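The post doesn’t show the profiling setup or say what language the suite is in, so here is a rough Python analogue of the profile-then-cache approach: `cProfile`/`pstats` stand in for the profiler and KCacheGrind, and `load_fixture` is a hypothetical stand-in for setup data a suite recomputes over and over. It is a sketch of the idea, not the actual change.

```python
import cProfile
import functools
import io
import pstats

call_count = 0  # how many times the expensive setup actually runs


def load_fixture(key):
    """Hypothetical stand-in for setup work a test suite recomputes needlessly."""
    global call_count
    call_count += 1
    total = 0
    for i in range(5_000):
        total += i * i
    return total + key


@functools.lru_cache(maxsize=None)
def load_fixture_cached(key):
    # Same result as load_fixture, but computed only once per distinct key.
    return load_fixture(key)


def run_suite(loader, repetitions=100):
    """Pretend test suite: many tests asking for the same three fixtures."""
    return [loader(key) for _ in range(repetitions) for key in range(3)]


def profiled(func, *args):
    """Run func under cProfile; return its result and a short text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


uncached, report = profiled(run_suite, load_fixture)
print(report)  # load_fixture should dominate the cumulative times
calls_uncached = call_count

call_count = 0
cached, _ = profiled(run_suite, load_fixture_cached)
assert cached == uncached
print(calls_uncached, call_count)  # 300 3
```

Caching like this is only safe for data the tests don't mutate; a cached fixture that one test modifies will leak state into the next.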

Of course this number is going to creep up as the test suite grows, coverage improves, and setup becomes more involved. However, that’s all the more reason to do this, and just means that it may become relevant to do it again at some point in the future.

As a bonus, the majority of the performance improvements were to business code exercised by the unit tests rather than code exclusive to the test framework, so application performance will be improved as well. I should be cautious in my conclusions here, though: While there will be improvements, some of the code exercised very heavily by unit tests is not run very frequently by users.

Entry tags:

SQLObjectInherit

I just threw a little code snippet onto my website: SQLObjectInherit, to add inheritance without foreign key relationships to SQLObject. Follow yonder link if you are curious (download link available here).
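The snippet itself isn’t reproduced here. As a hedged illustration of the general idea (single table inheritance: every subclass is stored in one table and told apart by a discriminator column, rather than by foreign keys to per-class tables), here is a plain-Python sketch using `sqlite3` and a class decorator. It does not use SQLObject’s API; `register_kind`, `Entry`, `DraftEntry`, and the `entries` table are made up for the example.

```python
import sqlite3

_KINDS = {}  # discriminator value -> Python class


def register_kind(kind):
    """Class decorator: store instances of cls in the shared table under `kind`."""
    def decorate(cls):
        cls.kind = kind
        _KINDS[kind] = cls
        return cls
    return decorate


@register_kind("entry")
class Entry:
    def __init__(self, row_id, title):
        self.id = row_id
        self.title = title


@register_kind("draft")
class DraftEntry(Entry):
    """A subclass stored in the same table; no foreign key or extra table."""


def save(conn, obj):
    conn.execute(
        "INSERT INTO entries (id, kind, title) VALUES (?, ?, ?)",
        (obj.id, obj.kind, obj.title),
    )


def load(conn, row_id):
    """Instantiate the most specific class recorded for this row."""
    kind, title = conn.execute(
        "SELECT kind, title FROM entries WHERE id = ?", (row_id,)
    ).fetchone()
    return _KINDS[kind](row_id, title)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, title TEXT)")
save(conn, Entry(1, "Published"))
save(conn, DraftEntry(2, "Unfinished"))
print(type(load(conn, 1)).__name__, type(load(conn, 2)).__name__)  # Entry DraftEntry
```

Because `load()` dispatches on the stored `kind` rather than on the class the caller asked for, a row saved as a subclass always comes back as that subclass.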

Entry tags:

Today I give thanks

…To whoever came up with the decorator module for Python. (“Whoever”? It seems to be some guy named Michele Simionato.)

My page generation library uses a lot of decorators. For instance, if a method is invoked to generate a page, it is decorated with @makepage, and voilà! the proper methods for generating the page as a whole are invoked, and though the method only returns some content to go in the content <div>, it will be a proper XHTML document with menus, etc. A method that generates Javascript? @makejs and it returns it with the proper MIME type. Need to check permissions? @checkPerm('admin') ensures that mere users cannot delete what they should not be able to delete even if they craft their POST requests to target methods they shouldn’t.
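The framework’s own decorators aren’t shown, so the following is only a sketch of how something like `@checkPerm('admin')` and `@makejs` might be written in Python. `check_perm`, `make_js`, `PermissionDenied`, `EntryPage`, and the `user_perms` attribute are all hypothetical names.

```python
import functools


class PermissionDenied(Exception):
    """Raised when the current user lacks a required permission."""


def check_perm(required):
    """Decorator factory: refuse to run the method unless the user holds `required`."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if required not in self.user_perms:
                raise PermissionDenied(f"{method.__name__} requires {required!r}")
            return method(self, *args, **kwargs)
        return wrapper
    return decorate


def make_js(method):
    """Decorator: pair a method's output with a JavaScript MIME type."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        return "text/javascript", method(*args, **kwargs)
    return wrapper


class EntryPage:
    def __init__(self, user_perms):
        self.user_perms = frozenset(user_perms)

    @check_perm("admin")
    def delete_entry(self, entry_id):
        return f"deleted {entry_id}"

    @make_js
    def widget_script(self):
        return "console.log('hi');"


print(EntryPage({"admin"}).delete_entry(7))  # deleted 7
```

The check runs on the server no matter how the request was crafted, which is the point: a forged POST naming `delete_entry` still hits `PermissionDenied` for a non-admin.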

The problem is that this interferes with another mechanism my pages use. POST data are used for various parameters: Some special variables determine call type and authentication; some are passed as __init__() parameters to set up page objects; others are used as arguments to the methods subsequently invoked. To work out what should go where, the framework relies on inspect.getargspec() to discover the parameters a method accepts. (Currently it can’t handle methods that take *args and/or **kwargs; if I ever need that, I’ll add it.) The problem is that when you write general decorators, the decorated functions tend to end up with the signature (*args, **kwargs)… so my framework’s getargspec() call cannot figure out which POST variables should be passed in and, consequently, passes in no arguments.
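A minimal reproduction of the lost-signature problem. `inspect.getargspec()` was removed in Python 3.11, so this sketch uses `inspect.getfullargspec()`, which reports the same information and, like `getargspec()`, looks only at the wrapper itself. `naive`, `delete_entry`, and the `post` dict are made-up names.

```python
import inspect


def naive(func):
    """A general-purpose decorator written the obvious way."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def delete_entry(entry_id, confirm=False):
    return entry_id, confirm


wrapped = naive(delete_entry)
print(inspect.getfullargspec(delete_entry).args)  # ['entry_id', 'confirm']

spec = inspect.getfullargspec(wrapped)
print(spec.args, spec.varargs, spec.varkw)  # [] args kwargs

# A framework that maps POST fields onto spec.args now finds nothing to pass.
post = {"entry_id": "42", "confirm": "yes"}
passed = {name: post[name] for name in spec.args if name in post}
print(passed)  # {}
```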

Fortunately, it turns out that someone else had recognised that this was a general problem, and the decorator module is written precisely to solve the problem of decorated functions losing their signatures. The module page describes both problem and solution in greater detail. Go forth, enjoy, and stay Pythonic!
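The decorator module’s wrappers preserve the original signature. A lighter standard-library alternative, swapped in here purely for illustration, is `functools.wraps`: it records the original function as `__wrapped__`, which `inspect.signature()` follows by default (note that `getfullargspec()` does not, so a framework would need to switch to `signature()`). The `call_with_post` helper is a hypothetical sketch of mapping POST fields onto parameters, not the framework’s actual code.

```python
import functools
import inspect


def logged(func):
    """A general decorator; functools.wraps records func as wrapper.__wrapped__."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@logged
def delete_entry(entry_id, confirm=False):
    return entry_id, confirm


# Only ordinary named parameters can be filled from POST fields.
NAMED = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def call_with_post(func, post):
    """Pass only the POST fields that match named parameters of func."""
    params = inspect.signature(func).parameters  # follows __wrapped__
    kwargs = {
        name: post[name]
        for name, param in params.items()
        if param.kind in NAMED and name in post
    }
    return func(**kwargs)


post = {"entry_id": "42", "confirm": "yes", "csrf_token": "abc123"}
print(inspect.signature(delete_entry))     # (entry_id, confirm=False)
print(call_with_post(delete_entry, post))  # ('42', 'yes')
```

Extra fields such as the CSRF token are simply ignored, since they match no parameter name.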

Entry tags:

programming

Truth of the day

PHP Must Die

Entry tags:

Well, that describes me rather well

Sometimes, The Better You Program, The Worse You Communicate.

Entry tags:

programming,
work

Schrödinger’s logic: Neither IN nor NOT IN a tuple

Interesting and peculiar. It turns out that Tonya’s way of deleting entries is to delete everything that is not resubmitted. This should work, but it fails when the last remaining entry is deleted. The reason why is a little bit subtle and weird. The query in question is

db()->execPrintf('DELETE FROM am_releases_templates WHERE release_id = %i AND id NOT IN %@i', $release_id, array_keys($template_ids));

The question is, what happens when $template_ids is empty? What does printfQuery() do? printfQuery() is mine, of course, so I should know, and what I did was to pass in the tuple (NULL), since SQL considers NULL not equal to anything. So, I thought, for any value x, `x IN (NULL)` should be false, and consequently `x NOT IN (NULL)` must be true. Stupidly, I didn’t test and verify this. It turns out that MySQL returns an empty result set when you compare against the tuple (NULL). That is, `...AND id NOT IN (NULL)` is *not* the complement of `...AND id IN (NULL)`: the rows matching `x` together with the rows matching `not x` make up an empty set rather than all the rows. This is rather weird.
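One way to avoid the trap is never to emit an empty NOT IN list at all. Here is a hedged Python/`sqlite3` sketch of what a query builder could do instead; `not_in_clause` and `delete_unsubmitted` are hypothetical, and only the table name comes from the query above. When nothing is resubmitted, "delete everything not resubmitted" just means "delete everything for this release", so the helper emits an always-true condition.

```python
import sqlite3


def not_in_clause(column, values):
    """Return (sql_fragment, params) for `column NOT IN (...)`.

    With an empty list, NOT IN (NULL) would be UNKNOWN for every row and
    match nothing, so emit a condition that is always true instead.
    `column` is interpolated directly and must be a trusted identifier.
    """
    if not values:
        return "1 = 1", []
    placeholders = ", ".join("?" for _ in values)
    return f"{column} NOT IN ({placeholders})", list(values)


def delete_unsubmitted(conn, release_id, kept_ids):
    clause, params = not_in_clause("id", kept_ids)
    conn.execute(
        f"DELETE FROM am_releases_templates WHERE release_id = ? AND {clause}",
        [release_id, *params],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE am_releases_templates (id INTEGER PRIMARY KEY, release_id INTEGER)")
conn.executemany(
    "INSERT INTO am_releases_templates VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)]
)

delete_unsubmitted(conn, 10, [2])  # keeps template 2, deletes template 1
delete_unsubmitted(conn, 10, [])   # last entry removed: template 2 goes too
remaining = conn.execute("SELECT id FROM am_releases_templates ORDER BY id").fetchall()
print(remaining)  # [(3,)]
```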

Conclusion: I really don’t like MySQL.

Update: Not just MySQL, but SQL in general, it seems.
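The update is right that this is standard SQL behaviour rather than a MySQL quirk. SQL uses three-valued logic: comparing anything with NULL yields UNKNOWN rather than false, and a WHERE clause keeps only rows whose condition is TRUE. Since `x IN (NULL)` and `x NOT IN (NULL)` are both UNKNOWN, neither one matches. SQLite, used here only because it ships with Python, behaves the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

results = {}
for expr in ("1 = NULL", "1 IN (NULL)", "1 NOT IN (NULL)"):
    (value,) = conn.execute(f"SELECT {expr}").fetchone()
    results[expr] = value
    print(expr, "->", value)  # None each time: SQL NULL, i.e. UNKNOWN

rows = conn.execute(
    "SELECT x FROM (SELECT 1 AS x UNION SELECT 2) "
    "WHERE x IN (NULL) OR x NOT IN (NULL)"
).fetchall()
print(rows)  # []: neither the condition nor its "complement" is ever TRUE
```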

Entry tags:

Why I dislike MySQL: An example

Postgres:

somedb=> select date('2009-05-27') + 7;
  ?column?  
------------
 2009-06-03
(1 row)

MySQL:

mysql> select date('2009-05-27') + 7;
+------------------------+
| date('2009-05-27') + 7 |
+------------------------+
|               20090534 | 
+------------------------+
1 row in set (0.00 sec)

My current task, which involves date calculations on items in the database, is going to be a bit complicated by the fact that MySQL’s date arithmetic sophistication is such that it thinks that one week from today is May 34.
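A quick sanity check in Python of what each database is doing. The MySQL line is my reading of the output above: in a numeric context MySQL appears to treat the DATE as the number YYYYMMDD and add 7 to that, rather than doing calendar arithmetic.

```python
from datetime import date, timedelta

start = date(2009, 5, 27)

# Calendar arithmetic, which is what PostgreSQL does for date + integer.
print(start + timedelta(days=7))  # 2009-06-03

# Apparent MySQL behaviour: read the date as the number YYYYMMDD, then add.
as_number = int(start.strftime("%Y%m%d"))
print(as_number + 7)  # 20090534
```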

Update: I can, of course, and probably will, use MySQL’s builtin functions (DATE_ADD() et al.), but this forces me to use non-standard functions rather than standard SQL operations. (I will get away with this because, and only because, this module is restricted to MySQL, unlike our core system.) Furthermore, if they have implemented proper date arithmetic in functions, I fail to see why they left the operators with a completely idiotic default.

Entry tags:

Securing webapp credentials

This seems to be an oddly neglected topic on which I can’t find much useful information: How do you secure your application’s credentials? I don’t mean user credentials; you can find any number of articles explaining why secure hashes salted with nonces are the only way to go, and so on. I mean something more fundamental: My application sits on a server somewhere (a shared server, to be specific), and it has to connect to the database where all these deliciously salted and secure passwords are stored. All the user authentication in the world won’t save me if anybody with an account on the same server can read the config files where the application’s own credentials are stored, and since those files have to be readable by the webserver (user apache or group www-data or whatever the local case may be), odds are that this is indeed possible.

I realise that this is, of course, highly dependent on the environment. My own environment of interest is a Linux server running apache 2.0.52 (or so) with a custom Python framework running on mod_wsgi. I am primarily curious about people’s solutions within that sort of context, but I am also generally curious: How do you manage your application credentials?
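There is no complete answer on a shared host, but one small piece of the puzzle is to refuse to start if the credentials file is readable by arbitrary local users. This sketch assumes the credentials live in a JSON file with `user` and `password` keys (a made-up format), owned by your account and group-readable by the webserver’s group. It does nothing against other users who can run code as the webserver user itself, which on many shared hosts is the real hole.

```python
import json
import os
import stat


class InsecureConfigError(RuntimeError):
    """Raised when a credentials file is accessible to arbitrary local users."""


def load_db_credentials(path):
    """Read DB credentials from a JSON file, refusing world-accessible files.

    Group bits are allowed, so the file can be owned by you and readable by
    the web server's group (e.g. mode 0640).
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IRWXO:
        raise InsecureConfigError(
            f"{path} has mode {stat.filemode(mode)}; remove the 'other' permissions"
        )
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    return creds["user"], creds["password"]
```

Failing loudly at startup is the design choice here: a misconfigured deployment stops instead of quietly running with exposed credentials.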

Entry tags:

Do any of you guys use PostgreSQL?

I could use some help.

Entry tags:

Email snippet

From: [Me]
To: [People]
Subject: [something pertaining to Excel spreadsheet problems]

…The issue is that the data stored are not the same as the data displayed. The Excel parser we use does not convert date cells to strings we can parse. And the reason why we've never encountered it before is that we always used CSV files rather than Excel spreadsheets... However, it DOES have access to the format, e.g. date cells are tagged as type 3, and I managed to find out that Excel stores dates as the number of days since January 1, 1900, so I have modified the parser to convert type-3 cells to formatted datestamps offset from that date. (Actually, it wasn't quite that simple since PHP for stupid reasons cannot represent the year 1900 in datestamps!, so I had to use a workaround wherein I used the Unix Epoch as an offset...but the basic principle remains the same.) I should have this tested, reviewed, and uploaded before lunchtime.

A bit of a weird and frustrating problem, but I love this stuff, deep down. It’s interesting.
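For the curious, a Python sketch of the conversion described in the email. Two details the email glosses over: in Excel’s default 1900 date system serial 1 is 1900-01-01, and Excel treats 1900 as a leap year, so serial 60 is a nonexistent 1900-02-29 and later serials are most easily counted from 1899-12-30. The Unix-epoch workaround corresponds to subtracting 25569, the serial number of 1970-01-01. The function names are hypothetical, and workbooks using the 1904 date system would need a different base.

```python
from datetime import date, timedelta

EXCEL_BASE = date(1899, 12, 30)     # correct base for serials >= 61
EXCEL_SERIAL_OF_UNIX_EPOCH = 25569  # serial number of 1970-01-01


def excel_serial_to_date(serial):
    """Convert an Excel 1900-system serial number to a date (time of day dropped)."""
    serial = int(serial)
    if serial < 1:
        raise ValueError("serials below 1 have no real date")
    if serial == 60:
        raise ValueError("serial 60 is Excel's fictitious 1900-02-29")
    if serial < 60:
        # Before the phantom leap day, serial 1 is 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    return EXCEL_BASE + timedelta(days=serial)


def excel_serial_to_unix(serial):
    """Seconds since the Unix epoch, in the spirit of the workaround above."""
    return round((serial - EXCEL_SERIAL_OF_UNIX_EPOCH) * 86400)


print(excel_serial_to_date(25569))  # 1970-01-01
```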