Fun with an Interview Question

Fun with an Interview Question

The other day on Twitter

Most candidates cannot solve this interview problem:

Input: "aaaabbbcca"

Output: [("a", 4), ("b", 3), ("c", 2), ("a", 1)] Write a function that converts the input to the output I ask it in the screening interview and give it 25 minutes How would you solve it?

I thought about it for a few minutes while I was doing other things, came up with the answer, congratulated myself (yay me!) and went on to other things.

Then it was posted on a tech forum. How would you answer the question? Lots of folks weighed in, all with seemingly (to me) long-winded coding solutions. Surely they didn't realize the genius of  my solution in F#! So I posted it.

In F#, t looks to me like “aaaabbbcca” |> Seq.countBy id;;
I don’t know how to make it any simpler than that.

Once again, I congratulated myself (yay me!) and moved on. Later that day, I logged on again and a commenter was nice enough to give me my comeuppance.

Unfortunately, this doesn’t return the expected output. That last “a” is what throws a wrench in solutions like this.

Looking back at that tweet, holy shit! I missed that second "a"! Well hell, this is a completely different kind of problem than I first thought. I'll just copy the input string from the website and bang something out in order to save my honor. A wrong must be righted.

I spent the next hour or two staring at the code and wondering what the hell is going on. Twenty lines of code and it doesn't compile. Gives me something like:

error FS0010: Unexpected character '“' in interaction

Did I cast something the wrong way? I tried recasting things. Did I hose up my environment? I reloaded. Is my code structure off in subtle ways? Nothing I did seemed to work. Finally, out of desperation, I started a new project and re-typed everything in by hand.

It worked.

The problem? When I copied and pasted the input code, it was using typographic quotes, not programming quotes. It was a copy-and-paste problem, of the freaking input data, that had defeated me all of that time.

Once that was out of the way, I kicked out the solution rather quickly, first coding it up as a strongly-typed function and then removing the type restrictions to make it one day a generic library function. As another commenter pointed out, this is classic Run Length Encoding, so that's what I named the function.


let runLengthEncode<'a when 'a:equality> inputSeq =
  let rec rleInnerLoop inputSeq acc =
    if inputSeq|>Seq.isEmpty then acc else
    let itemWereWatching=inputSeq |>Seq.head
    let chunk=inputSeq |> Seq.takeWhile itemWereWatching.Equals
    let chunkLength=chunk|>Seq.length
    let newInputSeq = inputSeq |> Seq.skip chunkLength
    let newItem=seq{itemWereWatching, chunkLength}
    rleInnerLoop newInputSeq newItem |> Seq.append acc
  rleInnerLoop inputSeq Seq.empty

The only type constraint I had was that whatever stream of stuff I'm processing, it has to support equality, or a=b, otherwise how could I remove duplicates? I'm also quite taken back by the size of the code. Now my code was the big honking solution. Yikes!

Could I shorten it? Well sure, since it's pure FP and written in an idiomatic style, the whole thing could collapse to one line of code. But why do that? All of those labels in there are for my benefit, in case I need to debug. Until I've tested the hell out of it, I need the labels. (Plus the labels and size shouldn't matter to the compiler anyway, if I can collapse it, the compiler should be able to collapse it)

But something about this feels odd to me, as if it were one of those situations where non-idiomatic code might be better for performance reasons. That's what happens under-the-hood with some F# library functions. For now, though, I'll let the library guys worry about optimization and I'm stay happy with a solved problem. yay me

Of course I have to write the inverse function, Run Length Decode:

let runLengthDecode (inputSeq:seq<'a*int>) =
  seq {yield! inputSeq|>Seq.collect(fun x->Seq.replicate (snd x) (fst x))}

That's a lot shorter and makes me happy. When I see functions like the first one I worry that I've gone far afield of the best solution, perhaps by reinventing the wheel. When I see functions like the second one, I am reassured that this is a common problem. Many others have been here before me. (This goes back to my statement that "I need the labels until I've tested the hell out of it" Other people have done the testing work. Woot!)

What lessons were learned? Well, I really suck at coding in my head. I knew already but continue to be amazed when it happens with only the simplest problems. One of the major points of Info-Ops II is that our belief in our understanding of the code, no matter what it is, is much more unfounded than founded. Even with "perfect" code, battle-tested and hardened, the most you can say is that it works, it has no bugs, and you can trust it. You can't say you understand all that's going on. Modern programming if far, far too complex for that ever to be true.

We're all like that, even non-coders. As much as some of us think we can code in our head, it doesn't work that way, even many times with the most trivial of problems. That doesn't stop us, though. There are people who think they can talk at length about deep issues like foreign policy, all the while maintaining a logical and coherent set of symbols. They're being logical. They're running the code they've created in their head. Other folks, though, are not. (Other folks see this the same way, only with the players reversed)

This is true in policy discussion for the exact same reasons it's true with coding. There's simply too much depth and nuance in language for it to work like code, and even if it did work like code, we'd think we understood far, far more than we actually do. We lose either way.

Secondly, test cases don't always save you. Yes, my initial mistake was not having test cases, just doing it in my head, but once I had test cases and started coding it out, the major part of my hang-up was dealing with some kind of IDE/OS/Unicode/UTF-8 crap, not the code itself. Doesn't matter, I still would have taken more than the allotted 25 minutes. I do not get the job.

But I had fun. That made it worth the effort.

P.S. For those interested in how this all would collapse into one line, here's a hint:

let runLengthEncode2<'a when 'a:equality> inputSeq =
  let rec rleInnerLoop inputSeq acc = match inputSeq|>Seq.isEmpty with |true->acc|false->rleInnerLoop (inputSeq |> Seq.skip ((inputSeq |> Seq.takeWhile (inputSeq |>Seq.head).Equals)|>Seq.length)) (seq{(inputSeq |>Seq.head), ((inputSeq |> Seq.takeWhile (inputSeq |>Seq.head).Equals)|>Seq.length)}) |> Seq.append acc
  rleInnerLoop inputSeq Seq.empty