I don't think how far a candidate gets on this particular test determines if they're OK or if they're great. I posit that a great one could finish 2-3 steps and an OK one could finish 4-5.
Because, like many people have stated, this is just a hand-holding test. You're basically seeing if they can code. So you're testing how fast someone can take a list of 5 steps someone else gives them and translate it into code in under 30 minutes.
And I'll tell you, I've never, in my life, worked on code with a series of 5 requirements and submitted it for review after 30 minutes. That would be bonkers, because I'd probably spend at least an hour sussing out all the ambiguities and contradictions.
"Basically seeing if they can code" is the goal, yes.
> I don't think how far a candidate gets on this particular test determines if they're OK or if they're great.
It doesn't do so with certainty, but measurement of this kind is always subject to some degree of error. The point of interviewing (especially at the early stage where this interview is applied) is signal, not certainty, and we do seem to be picking up on signal:
-- # of steps completed on the coding task is the strongest single signal of success in later, final interview rounds. Candidates who have gotten offers (not from us, to be clear, so this is uncorrelated error) average ~1.1 more steps completed on our coding problem than candidates who don't. That's after we filtered out a lot of the weaker results, so the sample is biased in slower candidates' favor. That's a larger offer/non-offer gap than we see for any of the other nineteen individual internal scores we record for each interview, iirc.
-- # of steps completed correlates with results on the rest of the interview in a way that aligns with reasonable psychometric measures of quality, same as most of the other parts of the interview (see e.g. [1]).
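To make those two bullets concrete, here's a rough sketch of the comparison I'm describing (Python, with made-up field names like "steps_completed" and "got_offer", and toy numbers - our actual pipeline looks nothing like this): for each score we record, take the mean among candidates who later got offers minus the mean among those who didn't, and separately check how steps completed correlates with another part of the interview.

    # Illustrative sketch only; field names, scores, and numbers are invented.
    # Requires Python 3.10+ for statistics.correlation.
    from statistics import correlation, mean

    def offer_gap(candidates, score_name):
        # Mean score among candidates who later got an offer (elsewhere),
        # minus the mean among candidates who didn't.
        got = [c[score_name] for c in candidates if c["got_offer"]]
        not_got = [c[score_name] for c in candidates if not c["got_offer"]]
        return mean(got) - mean(not_got)

    # Hypothetical records: one dict per candidate with a few of the
    # internal scores, plus whether they eventually received an offer.
    candidates = [
        {"steps_completed": 5, "debugging": 4, "communication": 4, "got_offer": True},
        {"steps_completed": 4, "debugging": 3, "communication": 3, "got_offer": True},
        {"steps_completed": 3, "debugging": 3, "communication": 4, "got_offer": False},
        {"steps_completed": 3, "debugging": 4, "communication": 3, "got_offer": False},
    ]

    # First bullet: which score shows the largest offer/non-offer gap?
    scores = ["steps_completed", "debugging", "communication"]
    gaps = {s: offer_gap(candidates, s) for s in scores}
    print(max(gaps, key=lambda s: abs(gaps[s])), gaps)

    # Second bullet: does steps completed move with the rest of the interview?
    print(correlation([c["steps_completed"] for c in candidates],
                      [c["communication"] for c in candidates]))

The real version runs over every recorded score rather than three, but the shape of the argument is the same: one score showing a much larger gap than the other nineteen is what "strongest single signal" means here.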
If your point here is just that interviews involve somewhat artificial work and are prone to error - well, yes. Everyone involved in testing as a field already knows that and is working around it.
To use your concrete example, creating a problem that involves "sussing out all the ambiguities and contradictions" for an hour would be asking a lot more of candidates. If we did that, I bet we'd have people complaining about the workload of doing the interview (or simply not doing it, which would be fatal to us as a business). I would also worry that that's a much fuzzier skill to measure, particularly in a cross-organizational way. It's not that you could never, in isolation, create an interview to test ambiguity resolution; it's that, at least in my judgment, it would be impractical for us to do so given what we are trying to do.
And finally, to this:
> I posit that a great one could finish 2-3 steps and an OK one could finish 4-5.
The data above suggests the opposite: in our sample, the candidates who go on to get offers are the ones completing more steps, not fewer.