We know from basic physics that not everything scales up so simply. Just because a small model of a bridge doesn't fall down does not mean that an exact replica of the design, built a hundred times bigger, will stand. It won't. It's just that simple. Yet whenever we talk about scaling the architecture of an online service from a small number of users to a much larger one, the choices seem limited to two endlessly debated options: scale-up or scale-out.
To scale-out (or horizontally) means to put more nodes into the system. If four machines running web servers can't handle the traffic, then add four more and balance the load across all eight. If the bridge is full of traffic, build another bridge right beside the first and split the load. The extra capacity is simple to implement (just copy exactly what you already have), which is the appeal, but it also comes at a price. Two bridges cost (almost) twice as much to build as one.
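To make the copy-and-balance idea concrete, here is a minimal sketch in Java of a round-robin balancer spreading requests across a pool of identical nodes. The host names and pool size are purely illustrative, not taken from any particular setup.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Scale-out in miniature: more capacity just means a longer list of
// identical nodes; the balancing logic itself does not change.
public class RoundRobinBalancer {
    private final List<String> nodes;                 // e.g. "web1:8080"
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    public String nextNode() {
        // Rotate through the pool, one request at a time.
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        // Going from four web servers to eight is just a longer list.
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("web1:8080", "web2:8080", "web3:8080", "web4:8080"));
        for (int i = 0; i < 8; i++) {
            System.out.println("request " + i + " -> " + lb.nextNode());
        }
    }
}
```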
To scale-up (or vertically) means to put more resources into a single node. This happens when we throw more CPUs, memory, or server instances on a box. It's like building a double-decker bridge. The attraction is that the costs don't double, because so much more of the infrastructure is shared. There's just a limit to how many decks you can reasonably build into a bridge.
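In software terms, scale-up only pays off if the code can actually use the extra resources. A small, hedged sketch: size a worker pool to however many cores the (bigger) box reports, so adding CPUs translates into more parallel work. The printed task is just a stand-in for real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Scale-up in miniature: the same program discovers a bigger box and
// sizes its worker pool accordingly.
public class ScaleUpWorkers {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < 16; i++) {
            final int task = i;
            pool.submit(() -> System.out.println(
                    "task " + task + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();  // let queued tasks finish, then exit
    }
}
```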
The problem here is that the scale-up and scale-out approaches increase not only capacity but also complexity and cost. Twice the CPUs (regardless of which boxes they're in) mean more power, more heat, and more cost. Scale-up and scale-out do not help to build a bridge at half the cost, nor one that spans twice as far. To achieve those goals one has to focus on making stronger building materials and investigating alternative construction techniques. There is no consensus on what to call some of these more innovative approaches, but I like to call them scale-in and scale-down.
To scale-in means to put the nodes closer together. When two computers communicate in an interdependent way, the latency between them really matters, particularly when they have to exchange messages several times in order to complete a transaction. Financial institutions co-locate their program-trading engines inside the stock exchange's datacenter just to reduce the speed-of-light propagation delay between the nodes. Same machines, same software, approximately the same cost, but millions of dollars of difference in performance. The proliferation of caches, data grids, and the practice of pushing content to the edges of content delivery networks are all examples of the scaling-in of data. Think about what Yahoo! would have to do to build your my.yahoo.com page if every single piece of personalized data (news, blogs, stock quotes, and weather) required another query between their various servers instead of a high-speed lookup in the web/app servers' local memory. Scaling in also happens regularly in the semiconductor industry: 90 nm wafer fabrication is replaced with the smaller 65 nm process, and then 45 nm, placing the on-chip blocks of logic closer together. Each new, smaller generation is more efficient and more powerful.
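Here is a minimal sketch of that local-memory lookup idea: a per-process cache that pays the remote latency once and then serves the fragment straight from memory. The loader standing in for the remote service is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Scale-in for data: keep hot personalized fragments in the web/app
// server's own memory so page assembly is a lookup, not a network hop.
public class LocalFragmentCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup;

    public LocalFragmentCache(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    public String get(String key) {
        // First call pays the round-trip; every later call is local memory.
        return cache.computeIfAbsent(key, remoteLookup);
    }

    public static void main(String[] args) {
        LocalFragmentCache weather =
                new LocalFragmentCache(city -> "forecast for " + city); // pretend remote call
        System.out.println(weather.get("Boston"));  // remote fetch
        System.out.println(weather.get("Boston"));  // served from local memory
    }
}
```

A real cache would also handle expiry and invalidation, which is where most of the complexity (and the operational cost) actually lives.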
To scale-down means to replace the higher-level logic of a node with lower-level logic. At first this sounds like a contradiction in terms. Higher level means better and more powerful, right? Let me explain what I mean by higher and lower. Computations today happen at mind-boggling layers of abstraction. A simple A=B comparison can be running in a scripting language, inside a plug-in, inside an integration server, inside an EJB, inside an app server, inside a Java virtual machine, inside a guest operating system, inside a virtual machine, inside a host operating system, running on a CPU, inside a server, with remote storage (holding A) and distributed memory (holding B), across two networks, spanning data centers, running VLANs, across multiple switches, and on, and on. A=B is something that can consume an enormous amount of resources, or it can run in the simplest low-level instruction set built into a digital watch. Don't get me wrong, I'm not saying we should all go back to programming assembler code for better efficiency. These layers of abstraction add value too, or they wouldn't be so popular. They help scale in the all-important dimensions of time-to-market, ease-of-use, and the number of available programmers. However, there are all kinds of examples of highly repetitive logic executed at the highest layers of the abstraction stack that could be done more efficiently down lower, scaled-down. For example, do we really need SOAP-based WS-Notification messages to tell us "Status OK" when HTTP response codes (à la REST) work fine?
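To illustrate that "Status OK" point, here is a hedged sketch of a health check that answers with a bare HTTP 200 instead of a SOAP envelope, using the JDK's built-in HttpServer. The /health path and port are illustrative choices, not from any particular system.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

// Scale-down in the protocol stack: the HTTP status line itself carries
// the whole message, so 200 means "Status OK" with no envelope to parse.
public class StatusEndpoint {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health", exchange -> {
            exchange.sendResponseHeaders(200, -1); // -1 = no response body
            exchange.close();
        });
        server.start();
    }
}
```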
The absolute biggest "bang for the buck" scale-down situation is when highly predictable and repeatable functions that are currently done in software, on general-purpose CPUs, can be shrunk down into the hardware layer in the form of FPGAs, ASICs, or even, in some cases, the extended instruction sets of the next generation of CPUs. Do we need to waste general-purpose CPUs running Java-based load-balancing algorithms when a perfectly good F5 box can do the same thing orders of magnitude faster? Imagine if IP routers had remained, as they started, entirely written in software. How many of the machines in Google's datacenter would be dedicated to forwarding IP packets back and forth to the rest of the cloud? The cloud could not exist as we now know it without the tremendous advances in scaling-down software logic into the purpose-built chipsets that power the high-speed and cost-effective network equipment we all take for granted.