I was at the lake last Summer. I picked a mossy old bench to sit, when a group of adolescents gathered to throw stones into the water. For an hour they busied around, making loud gestures and arguing over whose stone made the bigger splash. They would try every angle, and then bicker about which one spattered out the widest, or which one sent the water flying the highest, and how could they agree on what they’d seen if in a moment it was gone? Finally they approached me hoping I would settle their dispute, but I’d not seen the water splash– they scoffed, frustrated, and went on their way, and I was free to continue watching the ripples that now adorned the surface of the lake.
While we’re on the subject of sorting things online, we might as well talk about Google: the 93-billion dollar company whose main export is taking all the things ever and putting them in the right order. If there’s one thing Google knows best, it’s sorting stuff.
I was curious how it all works, and it turned out really interesting, plus I got to learn a bit about Markov chains. It all starts with an algorithm called PageRank1. According to Wikipedia,
Pagerank uses a model of a random surfer who gets bored after several
clicks and switches to a random page. It can be understood as a Markov chain in
which the states are pages, and the transitions are the links between pages.
When calculating PageRank, pages with no outbound links are assumed to link
out to all other pages in the collection (the random surfer chooses another
page at random).
The PageRank values are the entries of the dominant eigenvector of the
modified adjacency matrix.
In this post I’ll try to break that down and provide some of the background
necessary to understand Google PageRank.
At Functional Imperative we’re building the new CanLII Connects website (a social portal for Canada’s largest database of legal cases), and this week I was given the task of figuring out a sensible way of sorting posts.
Figuring out how to sort user-generated content is a common problem that many social websites face.
Here’s Reddit’s scoring equation for ‘Best’ (source and explanation):
Not all scoring equations are that hairy, here are a few more.
Interestingly enough, Reddit’s ‘Hot’ scoring function (explained in link above):
As multicore computing becomes the norm (even my phone is dual core!), it’s important to understand the benefits and also the limitations of concurrency. Amdahl’s Law addresses the latter.
Let’s imagine a simple program. It prints “Hello World” 100 times, then quits.
Our first version of the program is written as a single sequential task: it prints one “Hello World”, then another, then another, 100 times, then quits. This program takes some unit of time, to execute.
Now say we have a dual-core machine at hand. (My phone, perhaps).
Cool, now we can spawn two tasks that print “Hello World” 50 times each. And, because our magical imaginary computer experiences no overhead, it takes us exactly units of time to run our second program.
So we keep adding more and more processors, until we have 100 concurrent threads printing one “Hello World” each, and our program runs 100 times faster.
At this point we stop: “Ah, the trend is clear: more processors equals more speed! No point in continuing this experiment.”
A naive (wrong) first guess: Given processors executing a program, the maximum boost in speed is . (That is, we can get our program to run times faster).
Cool! This means that, given enough processors, we could make any program run almost instantly. Right?
Of course this is not the case! Enough daydreaming. Let’s figure out a more Continue reading
Here is a simple way to share the same
.bash_profile across multiple computers, while still retaining unique settings in between computers.
Suppose you want some special setting to apply only to your laptop.
First, create an empty file called
$ touch ~/.setup_00
Next, in your
rc file, add the following
if statement. Anything inside that
if statement will only apply to your laptop:
if [ -f '.setup_00' ]; then echo "This message only shows on my laptop!" fi
You can use this method to run any shell script uniquely on computers that contain the
That’s it. It’s not fancy, but it works.
Rice’s theorem can be stated thus:
Every non-trivial semantic property of a program is undecidable.
Before we prove the theorem, let’s break down that statement: Continue reading