November 06, 2004

Programming

Today was the ACM programming contest!

What. A. Day.

So, we met at SMC at 7am to hop in a van and drive down to Urbana. When we got there, the usual introductory stuff happened, and we started doing the usual things like opening the problem packets, going to make photocopies, and so on. It was a little bit more disorganised this year, because the grad student who basically ran it last year has since graduated, and they lost a lot of their institutional memory. Eventually, we got everything copied, and went down to get acquainted with the judges' room.

There, the UIUC coordinator was trying to set up a program called PC-squared, which they used last year to run the judging of the contest, with great success. Unfortunately, there were some problems setting it up this year, and the initial diagnosis---which never really made sense to me---was that the whole problem was simply due to line terminations in some files being DOS and some being Unix. John (Dooley, of Knox) spent the next hour helping Ari (the coordinator) set it up. At some point we thought we were close, so we let the students into their lab (this marked 15 minutes before the start of competition).

We also had a discussion over whether to make them do I/O from what is called "standard input" or to use files. Although we preferred standard in, the problem specifications---which were assembled by the regional coordinator---all had filenames on them, so we thought it might be less confusing if we just went with that instead.

Let me reiterate that: the regional said it was up to us, but they had made up problem handouts that indicated file-based I/O. This will be important later.

PC2 was being recalcitrant and not dealing well with files, so we decided to just check the output by hand. The program automatically downloaded their code to the judging machine, compiled it, and ran it there, so we could go into the directory where it was and look at the output file that was generated. I wrote a little script to automate the process, and as the contest got started we checked everything by hand to make sure the script wasn't messing up; everything checked out perfectly.

It seemed a little strange that all the submissions were correct, actually. Usually at least some teams submit something that generates incorrect output, and we have to mark it wrong and send it back. But it's not that out of the ordinary for everyone to get the easy problems correct.

After about two hours of the five-hour contest, we got a program that took longer than the stated one-minute maximum. An error! We felt a lot better after that. We shouldn't have.

Going into hour four, it was seeming increasingly bizarre that the only two errors were timeouts. Had nobody made any logic errors among the thirty or so submissions so far? We checked the next few by hand, and they really were correct.

One of the teams that had only submitted two problems (out of seven) then got a third in, and it was one of the hard ones. Good for them! That must be why they had stalled. Five minutes later, they submitted another correct one. Huh, we said, they must have been debugging on printouts---which led to a good ten-minute discussion on the virtues of debugging code without constantly recompiling.

When this team got yet another one, someone pointed out that it was only a two person team. Where were they getting all this? And where, by the way, less than a half-hour from the end, were the teams that should by now be desperately submitting not-very-working problems?

This wunder-team submitted a seventh problem of seven about fifteen minutes before the end, and something definitely seemed very, very wrong. A few minutes later, one of them came and wanted to let us know that solutions three through six were just printing the sample output for the problem, and the seventh was even more bogus than that: a thousand repetitions of the phrase "EVERYBODY WANG CHUNG TONIGHT!"

Fuuuuuuuuuuuck.

So it was that at about 5:40, I started madly going through to find the source of the problem, which was likely to invalidate all the results from this site. I was barking orders for people to bring me paper, for the site coordinator to email or call the regional director to not, under any circumstances, publish final results for the region just yet, and in fact to pull the webpage if possible. About three minutes before the end of the contest, I found the source of the problem.

PC2 was failing to handle file-based I/O so completely that after the program ran (and generated an output file called, say, triangle.out), PC2 would then copy into the same directory the known good output file (also named, say, triangle.out), overwriting the program's own output, with the intent of comparing it with the standard output that it had redirected to a file named stdout.txt.

There is, for the record, no good reason for this; they could just as easily have compared stdout.txt with the version of triangle.out that was already sitting in the judges' known-good directory.

But what was normally a pointless but relatively harmless design decision turned into a nightmare for us, because the "program output" that we had been comparing---automatically or by hand---with the known-good output was in fact always strictly exactly a precise copy of the known-good output.

And it checked out perfectly.

So at this point I had to change my script to delete the existing triangle.out file, re-run the program, and compare the new triangle.out file to the known-good one. Then I had to, one-by-one, redownload all 56 submissions and run this script on them. The Java ones had different calling conventions, so I had to re-tool the script for them. And the Java ones that used the ACM-provided (!) I/O library actually sent to standard output anyway, so in the end we had to deal with that too.
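For the curious, the corrected procedure looks roughly like the sketch below. This is a reconstruction in Python, not the script I actually ran on the day; the function name `rejudge`, its parameters, and the result strings are all made up for illustration, and none of it is PC2's own interface. The important bit is that any stale output file gets deleted before the run, and the known-good file is read from a separate directory.

```python
import subprocess
from pathlib import Path


def rejudge(cmd, workdir, output_name, expected_path, timeout=60):
    """Re-run one submission and compare its fresh output file to the known-good one.

    Hypothetical helper for illustration only; not part of PC2.
    """
    workdir = Path(workdir)
    produced = workdir / output_name

    # Remove any stale copy first, so a file left behind by the judging
    # software can never be mistaken for the program's own output.
    produced.unlink(missing_ok=True)

    try:
        # Run the submission inside its working directory, with a time limit.
        subprocess.run(cmd, cwd=workdir, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return "timeout"

    # If the program never wrote its output file, there is nothing to compare.
    if not produced.exists():
        return "no output"

    # The known-good file lives outside the working directory.
    expected = Path(expected_path).read_bytes()
    return "correct" if produced.read_bytes() == expected else "incorrect"
```

A submission that writes to standard output instead of a file would come back as "no output" here, which is more or less the situation with the Java I/O library: those runs need their standard output captured and compared instead.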

Remember how the ACM regional said it was up to us whether to take standard output or file-based output? And then provided problem definitions that assumed file-based output? And provided Java support code that assumed standard output? Thanks a lot, dickheads.

So there we are, trying to placate the students, who by now have noticed that strangely, none of them seem to have gotten any incorrect submissions, and the regional coordinator, who wants to know what the hell is going on and why our site has so many of the "best" teams in the region. And we're trying to re-run every single submission to make sure that this time, they're all right.

Every time we saw another one come up incorrect it was like getting stuck with a dagger, because that team then totally stopped working on that problem (thinking they had it), but of course really couldn't get credit for it. Especially grim were the "presentation errors", where the logic was basically 100% correct, but they had too many spaces or a misspelled word---technically wrong, but easy to fix, and they hadn't fixed it because they thought it was right.
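To make the distinction concrete, here is a minimal sketch of how output might be sorted into those buckets, assuming a "presentation error" means the output matches once whitespace differences are ignored. The function `classify` and its labels are hypothetical, not how PC2 or our judges actually decided things, and a misspelled word would still land in "wrong answer" under this simple rule, so a human would have to make that call.

```python
def classify(produced: str, expected: str) -> str:
    """Classify a submission's output against the known-good output."""
    # Byte-for-byte identical text is simply correct.
    if produced == expected:
        return "correct"
    # str.split() with no arguments splits on any run of whitespace, so this
    # comparison ignores extra spaces, trailing blanks, and blank lines.
    if produced.split() == expected.split():
        return "presentation error"
    return "wrong answer"
```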

By about 6:30, I was physically sick to my stomach, both from lack of food (thank goodness I'd eaten a cookie just before the final rush) and from the stress of the whole situation, since I'd basically taken charge and become point man for the whole re-evaluation operation. Fortunately, I had two people behind me, one checking my results and one writing them down, and almost everyone else knowing not to bother me with things I didn't care about or couldn't help. Finally we completed the audit, updated all the local scores, and sent them off to the regional.

In the final tally, excepting the last desperate fifteen submissions from the last twenty minutes or so (all wrong), there had turned out to be only about eight judging errors, reasonably spread out among all the teams. Unfortunately, the errors of the two top teams (one from UIUC, one from Rose-Hulman) were both of the "presentation error" variety; while logic errors have no guarantee of ever being caught, it's pretty clear that those teams would have been able to correct those problems.

At about this point, having mailed off the results and awaiting a response from the regional, I went out and got some slices of the Papa Del's that was slowly getting cold out in the atrium. Lots of questions from the students (who still hadn't quite figured out what happened), all met with "no comment"s from me. I brought the food back to the judges' room and was immediately sat down to send off a detailed narrative of what happened and what went wrong (purportedly because I'm a "linguist", though it's not clear to me why that was more relevant than the fact that I discovered and belatedly solved the problem). This I did, and most of the judges went off to explain what happened and at least award ribbons for the intra-site post-correction results.

After I sent off the email, I went out there, but they were wrapping up so I returned and sent off a brief clarification. One team came in to ask what they'd done wrong, so I re-ran their problem (it turns out they'd terminated early on a short input). I'm not sure exactly what the coordinators told the students about what happened, but every one of them was really conciliatory and "ok, I'm not going to dispute that, I was just curious!". Nobody was pissed off, which was gratifying, because they had pretty much every right to be.

The final response from the regional was that the post-correction results had to stand as-is. Even after losing one problem, UIUC's A team still ended up in second place in the whole region, because they're just that cool, and so they'll probably still go to worlds (in Shanghai!). Rose-Hulman's A team (numbered "Two" for reasons not worth elaborating) ended up sixth in the final placing, but as I mentioned earlier, one of their errors was of a variety that was dead easy to correct, and indeed fairly easy to accidentally judge as correct. Getting credit for that problem would have put them at the top of the regional rankings. Hopefully they'll be able to get a wild-card slot out of this (isn't this sort of thing what wild-cards are for, after all?), but the initial response from the regional was that if our region got allotted a wild-card it would go to the team that placed third in the region, not to Rose-Hulman. Which would suck.

Knox, for its part, made an excellent showing. One of our teams solved four of seven problems and placed 24th in the region, about the same as last year, and the highest-placing liberal arts college. Our other team solved three and placed 40th, also comparable to last year's performance, and quite a good showing. Cheers to our six competitors for their hard work!

So, that was my day. It was a long day. It seems clear that at least one and possibly all three of us here at Knox should become experts on the setup, configuration, and administration of PC2 sometime before next year's ACM contest. And I found myself seriously wondering if there would be any way I could leverage a position as site coordinator for next year, but at UIUC. Or maybe we could host one at Knox; the Cat Lab should hold eight teams pretty easily.

"Liberals once lost elections for supporting civil rights as well and now look back on those losses as badges of honor. Eventually, since young people are far more tolerant of homosexuality than their parents, gay marriage will stop hurting Democrats at the polls." --Peter Beinart

Posted by blahedo at 11:25pm on 6 Nov 2004
Comments
Wow. Sounds exciting ::grimace:: Sorry I couldn't be on the team this year. Posted by Aaron at 8:50pm on 11 Nov 2004