10:27pm, 17 September 2017
I think one of the most commonly repeated and most useful pieces of advice given to us in Computer Science education was ‘optimise for learning, because learning compounds’.
The way I heard it said goes something like this:
“Everything that you learn makes the next related concept easier to learn. For instance, if you know one programming language, then, assuming you don’t jump paradigms, the second will be slightly easier to learn. Every nth programming language makes the (n+1)th easier to learn. Therefore learn as much as you can when you are young, so that it becomes easier to learn new things later when you don’t have as much time. Or just do so because you want to achieve the best possible outcomes for yourself.”
Professor John Ousterhout reformulated this into the pithy ‘A little bit of slope makes up for a lot of y-intercept’, which is nice. I know students who took the advice and pasted it into their GitHub profiles – you see “Optimise For Learning” under their profile pictures – which is a pretty strong geek endorsement in my world.
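Read literally, Ousterhout’s line compares two straight lines of the form y = intercept + slope × time. As a rough illustration in my own notation (the symbols b and m are not Ousterhout’s): let A start ahead with intercept b_A and slope m_A, and let B start behind, b_B < b_A, but learn faster, m_B > m_A. B catches up where the two lines cross:

```latex
\begin{aligned}
y_A(t) &= b_A + m_A\,t, \qquad y_B(t) = b_B + m_B\,t \\
y_A(t^*) = y_B(t^*) &\iff t^* = \frac{b_A - b_B}{m_B - m_A} \qquad (m_B > m_A)
\end{aligned}
```

With made-up numbers b_A = 10, m_A = 1, b_B = 0 and m_B = 2, the crossover is at t* = 10, where both lines are at 20; after that B stays ahead for good. However large the head start, a positive difference in slope makes t* finite.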
With some distance, though, I think working out all the implications of this idea can actually get pretty interesting. Let’s take a closer look.
If you believe ‘learning compounds’, then you probably also have to believe that ideas are what matters in software, not the technologies themselves.
Say you learn for-loops in C, which is as good a place to learn for-loops as any. Later, if you start learning Java, you’ll find that you already understand for-loops there as well. Sure, you’d have to learn objects and public static void main(String[] args) and interfaces and the JVM and the whole shebang that is Java, but you get for-loops pretty much immediately. The cost of learning for-loops once is therefore amortised across all the languages with for-loops.
When people say that ‘learning compounds’, this is the effect they mean. You learn an idea once in some technology, and then you get that idea for free when you encounter it in another context. This allows you to learn the second thing more quickly.
This principle seems to apply all the way up. Once you learn decoupling in software design, you can – with some thought – apply decoupling to any software project you see. And once you understand the tradeoff between threads and events, you can quickly evaluate Node against Rails, or nginx against Apache.
Combine this with the observation that most technologies reimplement old ideas from Computer Science, and you reach a new conclusion: keeping up with the software industry no longer has to feel like keeping up with a daunting, forever-changing world. I’ve seen resumes by candidates here in Saigon that list a long, long table of technologies they’ve used in the past, as if this makes them more valuable. Reading the list is actually rather tiring. Worse, learning all those technologies over the span of a career without extracting the conceptually reusable bits is positively exhausting, or so I would imagine.
A traditional CS education seems to suggest another way: that you can survive in tech by keeping a smaller, unchanging set of ideas. These ideas would get reimplemented by new technologies once every few years. Technologies change; ideas don’t.
So the focus on ‘learning compounds’ directly implies ‘learn ideas from technologies, not just the technologies themselves’. Again, this is pretty obvious: when you learn MVC in Rails, you quickly recognise MVC in pretty much all the other major web frameworks. Understanding MVC in Rails even helps when you are trying to understand the nested MVC model of Android and iOS applications; it also maps roughly to .NET’s MVVM model.
We can go deeper! An idea like for-loops is great because lots of programming languages have for-loops. That means the cost of learning for-loops is lower after amortising across all those languages.
This suggests that the more general the idea is, the more valuable it is. If the idea is used in lots of places, then learning it gives you the best return on your time. This implies that – all else being equal – you should optimise for learning general ideas.
That sounds nice in theory. Let’s see what that actually looks like in practice.
One of the best programming talks ever given (according to Hacker News) is Rich Hickey’s Simple Made Easy. Hickey’s talk is one of the more general technical presentations I’ve watched – it isn’t about any one technology or paradigm per se; instead it covers this abstract idea of simplicity vs ease-of-use. The idea spans databases and programming languages, object-oriented and functional paradigms, systems design and software practice. It’s a very deep idea that’s applicable to pretty much all programming activity.
It also happens that I don’t know how to use much of it.
Rich Hickey has spent decades writing large systems, after which he created the Clojure programming language. Without the scaffolding of Hickey’s experience, the ideas in Simple Made Easy become difficult to apply to daily practice.
I wish there were a way to plug my brain and his brain into a central repository of knowledge, where he clicks ‘upload mental model’ and I click ‘download mental model’, and then I become Hickey-level smart. Until we get the Matrix, though, I’ll have to make do with watching videos of him speaking.
And I have to do more than that! I have to see if I can put his ideas into practice. Because Hickey’s idea is so general, I find myself having to do the hard work of translating his insight to whatever domain I’m working on. This is very inefficient. A more efficient method would be to use his idea as a yardstick when learning competing approaches to a software problem. The value of Hickey’s talk, therefore, isn’t so much that it is practical, but that I can use his ideas as a guiding principle for taste.
This difficulty doesn’t stop there, though. If I apply Hickey’s insight wrongly, or if I apply it to a domain where it doesn’t work, I’d get a bad outcome. So I can’t apply it blindly. We have a name for people who blindly apply general ideas to things, regardless of outcomes: we call them ideologues. My other name for ideologue is: ‘a stupid person’.
(I mean, okay, the Soviets weren’t stupid people, but when they insisted Communism worked, and that insistence led to everyone starving to death, then I don’t know what we should call that – mass delusion? Hmm.)
If you want to use Rich Hickey’s ideas, you must be prepared to constantly evaluate their applicability. And this is true for all general ideas: their usefulness lies in the nuanced experience you had to accumulate before you could learn the idea, and when you teach it to someone else that nuance gets stripped in transit. The idea becomes significantly less useful, at least until the other person reconstructs its nuances and limits for themselves.
This leads me gently to my next point.
I think the more general the idea, the more difficult it is to teach and learn.
Are there ideas that are unteachable? Well, yes! Try teaching a child to ride a bicycle. You can’t really teach them what to do. You have to show them an example and then push them off a hill and pray they get it on the way down. (Or at least that was how I learnt. It’s a wonder I’m still alive.)
General ideas are difficult to teach because they rely on a scaffolding of experience. Riding a bicycle is in a category of easier things to teach because the scaffolding can be gained via — I don’t know — repeated pushes off a hill. But in nearly all other domains, the general idea isn’t clear until you’ve done enough to be able to spot the patterns.
(You see how difficult talking about this is? Your grasp of those general concepts is likely to be different from mine, because you extracted them from different experiences, so you’re either nodding your head or you think I’m crazy. The Haskell community recognised this problem a long time ago; see Brent Yorgey’s wonderful blog post on the ‘monad tutorial fallacy’, the one with the burritos, as evidence.)
I think the difficulty of teaching general ideas is why people respect those who are good at explaining them. Martin Fowler’s primary claim to fame is that he took the programming habits of a bunch of Smalltalk neckbeards, gave them names, and put them in a book called Refactoring. The techniques he describes apply to any modern programming language, in nearly all software domains. Fowler is consequently more respected than a Joe Random Rubyist who writes a Rails tutorial.
Refactoring has changed the face of programming as we know it. I don’t think the original group was even conscious of this set of habits. They probably had abstract mental models that told them ‘eww’ or ‘nice!’ when they programmed, and it took a guy like Fowler, who was watching, to name those habits, which in turn gave us the ability to talk about them as individual techniques. Learning refactoring techniques is, in effect, rebuilding the mental models that told the original Smalltalk hackers ‘eww’ or ‘nice!’. That we can even do this is a testament to Fowler’s skill as a teacher and communicator.
But there are probably deeper, more general, more valuable mental abstractions that live in the brains of programmers like Linus Torvalds and Rich Hickey and Rob Pike; we can only get hints of them by reading what they’ve written. Most people aren’t good at articulating their mental abstractions. So you have to watch, as Fowler did, and learn them from observation.
Perhaps this is one reason that pair programming is considered so useful. I almost always learn something new when I have the opportunity to pair. But, again, it’s possible to do it badly: in the same way programmers can learn lots of technologies without extracting general ideas, so too can you walk away from a pair programming session with just a collection of new terminal hacks.
If you’re following me so far, then you should believe that the truly valuable learning opportunity in pair programming is the opportunity to learn the mental abstractions of your pair. And the only way you can do this is to learn by observation.
We can go deeper!
When I was in university, I was close to a friend named Div. Div was the first person I met while queuing up for my student card at the matriculation fair, which is the first of many stupid administrative things you have to do as a new student in Singapore. Div was also in my Discrete Math class in freshman year, where he either didn’t turn up much or, when he did (and I have no idea why I remember this), was usually rushing an essay for another class – some Freshman Seminar about Geishas and Samurai or something of the sort.
Div was very intelligent. Before university, my idea of intelligence was the superhero archetype like Tony Stark, the kind who gets himself captured and then puts together a suit of armour from bits of cutlery to kill all his captors.
Div was different. He didn’t express his intelligence so much as express his confusion when people didn’t understand ideas as quickly as he did. I think he was genuinely perplexed when people couldn’t get some complex new thing immediately after class.
I’m exaggerating a little, but the effect was real. Much of the Computer Science curriculum in university felt like squeezing water out of a stone for me. The only fields where it felt intuitive and fun were programming languages and distributed systems (and even then Div did better whenever we took a course together).
I started to think that intelligence is the rate at which one can master new ideas, which meant, in turn, that it was how quickly you could add things to your set of ideas. The better you got at isolating a general idea, the more easily you could add it. Div was masterfully good at this. In my imagination, once he identified a general idea, it became straightforward to integrate into the rest of his knowledge, and the new idea then turned into the scaffolding on which the next set of ideas was hung. Part of intelligence is genetic, and many of my friends felt like overclocked CPUs in this regard. But the observation itself seemed pretty useful.
I suspected that general ideas of this type existed in a lot of things. My clue was that I was fairly far along the writing skill tree. Writing for me feels like squeezing a terrain of ideas into a smaller container … except that the container has, urm, movement and rhythm. (That makes no sense to you, and it shouldn’t; it’s my mental model of writing, and it’s built of god knows what.) But the skill tree itself is fairly straightforward: there are low-level skills like grammar, spelling, and vocabulary, and above them tricks like the one for writing long sentences without losing clarity. After you master those, the higher levels – to me – are all about structuring thought, and I learnt mostly from studying the thought-structures of other writers. These aren’t easily describable at all. Writing coaches tend to teach the lowest rungs well, but their lessons on the higher ones sound like the instructions of a zen monk on acid. Suffice it to say, the rest you have to build yourself.
If we bring this back to software: once you get past the basic levels, I think that broadly speaking there are two ways to grow your set of ideas. The first way is to extract as many general ideas as possible from each technology you encounter. The second way is to learn as many new technologies as possible, isolating ideas through sheer volume of sampling.
The best people in school were good at both, I thought. They were both more efficient at picking out the good ideas per technology, and quicker at learning new technologies. But the former method requires you to recognise that literally unspeakable, general ideas exist. If you don’t recognise that such mental abstractions exist, then you cannot make them the object of your pursuit. Assuming an equal amount of intelligence, you cannot learn as quickly as a person who does.
I think that’s a valuable idea. When you next meet someone who is good at something, it might pay to ask yourself: what unspeakable mental model does she have that I don’t? What deep, general thing does she know? Can I learn it? And if so, how?
It’s interesting that we ended up here from “optimise for learning”, but I tend to find that working out all the implications of simple ideas can have rewarding ends. Optimising for learning works because learning compounds, and learning compounds because ideas help to unlock related ideas. The most valuable ideas are the general ones. And the most general ideas can’t be spoken of at all.
I write an essay a week on topics loosely connected to building a technology company in Asia.