The course was pretty okay for the first half: new raccoon (Refactoring Guru > Tom Nook), a relatively tame assignment (verbose and frustrating to work through, and far too long for what it was, an intro to object-oriented programming, but easy enough to shrug off), occasionally unnecessarily long lab tasks, etc. I could get past all that, but the course went to shit when the project happened. I'll also add that the labs that ran during the project were really long, so whatever they were meant to teach was lost on students amid how badly the project turned out. In retrospect they only really served as study material for the exam, and they were often out of sync with the lectures at the time.
For context, we were told that automarking (which wasn't a thing in the previous offering of the course) was needed to test the correctness of students' projects with greater breadth, which in turn awards fairer marks, particularly to those who completed more work. The only problem with these intentions (which I fully support, as they make logical sense) was that the execution was mind-bogglingly poor and achieved neither objective (correctness nor fair marks), both of which I'll address below. There are other factors I suspect are partially responsible for the poor execution, but I won't go into them in depth because they aren't as pertinent to the course itself. Touching on them briefly, though: it often felt like there could have been more hands-on support from course administration, especially when the course was visibly going awry, but for whatever reason (extra workload, other commitments, etc.) there wasn't. Nitpicking slightly, the announcements were sometimes inconsistent (i.e. "we won't give you X input" / "we won't test you on Y case", and then exactly those things happened).
But anyway, the main spiel:
From the start, the timeline should've rung alarm bells. Two weeks per milestone is not bad, though more time is preferable. But when the assignment is split the way it is, with the second "half" depending hugely on the first (the whole point of the second is how well your design from the first adapts to new criteria; to quote the project specification: "60% of your [Milestone 3] automark will come from testing a completed interface which includes all the requirements in Milestone 2, and incorporation of the following new requirements (1.1 to 1.4)."), it's imperative that students get feedback really quickly. There are two weeks between the two due dates, and as such two lab sessions. However, due to the structure of the course, we demonstrate our product to our tutors in the lab session immediately following the first due date and receive feedback in the next. Depending on when your session falls (or whether your tutor gives feedback outside lab time), the time left to act on that feedback for the final product varies anywhere between 4 and 7 days.

That's particularly nitpicky, but it certainly isn't the worst part; that title is reserved for the various shenanigans automarking created. I have no words for automarking other than genuine shit, because a) as stated, the execution was awful; b) the process to remedy it was equally awful if not worse; and c) the automarks were released really damn late, five days before the milestone 3 deadline. They genuinely could have been released earlier, unless the autotesting suite itself wasn't ready before the automarks went out, which would point to admin unpreparedness. This course already carries an implicitly high workload, and the late results made us scramble even harder (unnecessarily so, IMHO, since it was in no way our fault), especially because few of the errors the autotests raised actually pointed at real flaws in groups' programs. It was genuinely enraging at the time, and even in hindsight, staying somewhat level-headed, it's impossible to describe as anything other than a complete shocker.

The flow-on effect of the late release, and of automarking failing its stated rationale, was that students, through no fault of their own, had close to no time to fix these non-errors in milestone 2 before the looming milestone 3 due date. It became a dilemma: work on milestone 3, which relied on the "buggy" milestone 2, or maximise the milestone 2 marks and sacrifice milestone 3. (For context, you were likely to fail autotests in milestone 3 similar to the ones you failed in milestone 2.) In the end, many groups had no choice but to take the latter option because of that hanging threat.
Now, addressing the remarking process (i.e. "b) the process to remedy it was equally awful if not worse"): the initial remark was slated to be returned on the Saturday before the Monday due date, IIRC, which to a student is absolutely outrageous; the organisational disarray would have been ridiculous. We had no dry runs prior to submission for Milestones 1 + 2, i.e. nothing, not even the most basic checks to ensure we wouldn't fail on a technicality rather than on incorrectness. A dry run would have prevented a lot of the problems that arose. The official(?) reason for not providing one was that it would give away the testing suite, which seemed weird then and remains weird now. A LOT of groups failed on dumb technicalities, and even a remark wouldn't have solved this, because there were so many technicalities that a single remark might fix one only for your group to uncover another. Despite this being in no way the students' fault, it was made out as if it were. We weren't allowed to "debug", but many groups just wanted to fix technical errors rather than logic errors, i.e. the failures the autotests wouldn't help diagnose and that weren't even wrong in the first place. In the end, dry runs were released for milestone 3 (anything short of the actual testing suite would have been fine for milestone 2 too), but they were provided two days after the automarks were released and were lacklustre at best: just the most basic milestone 2 tests, reused.
Other issues related to remarking include but aren't limited to:
- A marking cap was used to allow for small incremental errors/differences between the tests and groups' work; the automark was capped at 80-90%, i.e. you didn't need to pass every test for full marks. This initiative failed for multiple reasons: given how the autotests ended up running, the cap came off as an implicit admission of a poor specification rather than an allowance for assumption variation, and it wasn't particularly helpful at first anyway, since a lot of groups initially scored way below it. (I will concede something below.)
- There was a remarking penalty for "non-atomic changes", which were often unavoidable for some groups because the set of changes classed as atomic was (somewhat) objectively narrow. This penalty was kept in place even after the shitshow this turned into, which I personally thought was ridiculous (it wasn't even reduced, though I'd like to think it was adjusted slightly behind the scenes, since the max 20% penalty remained a thing).
I will concede, though, that this whole process would have been acceptable had the autotests worked as intended (with a provided dry run, of course); but as they didn't, it just made everything a whole lot worse. Another concession: you did keep the highest mark across all the remarks, but that, I think, pales in comparison to how bad automarking ended up being.
The last point (i.e. "a) as stated, the execution was awful"): the biggest problem was that a lot of the project was open to interpretation, which many of the autotests did not factor in. There was good breadth in testing, but the tests also went into too much depth, by definition making assumptions that in many cases conflicted with perfectly valid assumptions made by students. We were told to make assumptions where necessary, and were encouraged to, and then we essentially got screwed for doing exactly that: points the specification didn't clear up, where an assumption was fair enough that no forum question was warranted, were causing autotests to fail, and we didn't know what these "errors" even were. We were also told the autotests would cover "lower level / general stuff" and NO edge cases, but this was broadly untrue (some tests fell squarely under the umbrella of "edge case"; others tested higher-level behaviour where, by definition, students' interpretation comes into play). A phrase I saw another student use that encapsulates this whole saga rather well: "you're allowed to make assumptions, as long as they're also the ones we make", which is frankly ridiculous. If passing the autotests required assumption X, that should have been explicitly stated in every case, not just in a select few (for which I'll give *some* credit) and vaguely elsewhere. I also saw a student say something along the lines of "the project uses design by contract but essentially expects us to defensively program". It's just a shame because autotesting is worth 14% of your OVERALL grade; for some rather extreme context, getting 0 for automarking in total drops you from 100 to 86, almost down to a Distinction. It's even more of a shocker that the autotests didn't do their job properly, and more so again when you realise autotesting was worth more than design in what is fundamentally a software design course (1.33x more, if I recall correctly).
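On that design-by-contract point, here's a minimal sketch of the distinction that student was getting at (my own illustration; the class and method names are hypothetical, not from the project spec):

```java
// Hypothetical dungeon-movement code, written both ways.
class Dungeon {
    static final int WIDTH = 10, HEIGHT = 10;
    private int playerX = 0, playerY = 0;

    // Design by contract: the spec is the contract. The caller guarantees
    // the precondition (an in-bounds, adjacent square), so the method
    // trusts its input and does no checking of its own.
    void moveTo(int x, int y) {
        playerX = x;
        playerY = y;
    }

    // Defensive programming: trust nothing, validate everything. This is
    // what the autotests effectively demanded, since they probed inputs
    // the spec left open to interpretation.
    void moveToDefensive(int x, int y) {
        if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT)
            throw new IllegalArgumentException("out of bounds: " + x + "," + y);
        if (Math.abs(x - playerX) + Math.abs(y - playerY) != 1)
            throw new IllegalArgumentException("non-adjacent move");
        playerX = x;
        playerY = y;
    }
}
```

If the spec is a contract, moveTo is correct as written and invalid input is the caller's problem; a test suite that feeds in unspecified input and expects one particular reaction is really demanding moveToDefensive without ever saying so.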
An example of a really bad test that was actually given:
For context, we made a dungeon crawler game. A particular enemy can spawn and has a chance of spawning with armour; that chance is arbitrarily decided by your group. However, there was a test in the automarking suite that you'd fail if NONE of the first ten of that enemy spawned with armour, i.e. if your group assumed a 10% armour chance, you'd fail this test roughly a third of the time (0.9^10 ≈ 0.35). The test was purely luck-based, and statistically favours groups that arbitrarily chose a higher armour chance. This particular test wasn't worth a lot on its own (given the number of tests in the suite), but when this sort of thing crops up multiple times across the suite, you can imagine the fury of the students. How this particular test was ever a good idea, I'll never know.
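For anyone who wants to check that maths, here's a quick standalone sketch (my own; the 10% figure is the assumed armour chance from above, not anything from the spec):

```java
import java.util.Random;

// Sanity check of the "roughly 1/3" claim: how often does a group with a
// 10% armour chance see zero armoured enemies in the first ten spawns?
public class ArmourTestOdds {
    public static void main(String[] args) {
        // Exact probability that none of ten spawns has armour:
        // 0.9^10 ≈ 0.349, i.e. the autotest fails about one run in three.
        System.out.printf("P(fail) = %.3f%n", Math.pow(0.9, 10));

        // Simulate the autotest 100,000 times to confirm.
        Random rng = new Random();
        int fails = 0, trials = 100_000;
        for (int t = 0; t < trials; t++) {
            boolean anyArmour = false;
            for (int i = 0; i < 10; i++) {
                anyArmour |= rng.nextDouble() < 0.10;  // one spawn roll
            }
            if (!anyArmour) fails++;  // this run would fail the autotest
        }
        System.out.printf("Simulated fail rate = %.3f%n", (double) fails / trials);
    }
}
```

Seeding or injecting the randomness used for spawning is the usual way to make behaviour like this deterministically testable; going by how luck-based the test was, the suite evidently did nothing of the sort.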
Other pertinent points:
- The response to criticism was passive and slow. Some feedback ran along the lines of "go read the spec", "don't worry about it", etc. There was also a ~15-minute window where the course forum temporarily disabled public posting/commenting, which seemed really strange given the timing (at the peak of the complaints and student anger). Marks took long enough to arrive, but responses to criticism of the automarking process took even longer. In short: a lack of transparency, stability and communication.
- I personally found it weird that no deadline extension was ever on the table (even though many students had made it clear in private circles that an extension wouldn't have fixed things). The only one afforded to us was the 5-hour extension for a 5-hour GitLab outage during the first submission. I can guarantee that outage slowed students down by a lot more than 5 hours, even if a longer extension would have just prolonged the pain.
- Groups with bigger issues that couldn't be resolved by a remarked automark received manual marking, but at scale this was unfeasible. It felt really selective, and I can imagine that a) some groups couldn't be bothered anymore and b) many had bigger issues. Given the problems this course has carried over from previous offerings, it would have been better to execute properly the first time. Success after manual marking just felt bittersweet; it felt really damn wrong to have to blunder through all this bureaucratic BS just to get assessed correctly.
- If code coverage was high enough, it's worth wondering whether marking against each group's own testing suite might actually have been fine, but that's a point for another time.
It's a shame, because this course genuinely has potential; OOP as a concept is pretty interesting, but like many other courses (especially certain ones I've taken previously), off-the-mark administration ruins the student experience. I was only taking two courses and was still fully occupied, i.e. a disproportionate workload. It's hard to believe I was considering taking a third at the start of term, and I couldn't be happier that I didn't, given how this turned out. I should also reiterate that this is NOT in any way an attack on the course staff; they clearly had the right intentions and the right rationale for their changes. It just so happens that the final product was a devastatingly poor student experience. And I might add: the project is worth 35% of your total grade while the labs are a portion of 10%, yet I've taken more away from the labs, given how panic-inducing this project was. I've never seen an effort-to-marks ratio this disproportionate, even in some parts of HSC English.