Software is the bottleneck


In my last article, I made the case that Apple's supposed problem with professional users has nothing to do with the kind of hardware they are making and everything to do with price. The reason Apple owned the creative professional market ten years ago was that the total cost of ownership (hardware and software) of their systems was significantly lower than that of PCs. Now that cost advantage has evaporated, and many creatives are looking at switching to one of the many high-powered Windows systems being advertised for content creation. However, amid all the synthetic benchmarks and chest-thumping, no one is asking the most important question: does buying faster hardware actually speed up creative workflows?

Throughout the short history of computing, nerds have lusted after bigger and faster hardware. It’s an easy obsession to understand. As physical beings, we are drawn to physical objects - things we can see and touch. Hardware manufacturers have played on this basic psychology by designing computers as beautiful objects, and Apple has mastered the art. We could easily fill a glossy, expensive coffee table book with pictures of their hardware (oh look, they already have).

This is the part where technophiles inevitably say something along the lines of, “That’s why Apple makes so much money… marketing, form over function, blah blah blah.” This is a common refrain, but it’s dead wrong. Yes, Apple makes beautiful designs, but that just gets you in the door. It’s software that actually sells systems, because software is what we interact with. Apple understands this even though consumers don’t consciously place any value on software - some don’t even understand that there is a difference between software and hardware. If Apple sold the same hardware but shipped Windows (or Android) as the software interface, they wouldn’t be in business.

The right software actually sells hardware all on its own. The tech press refers to this as a “killer app” - as in “the killer app for the PC is Microsoft Office.” It’s an annoying phrase, but the sentiment is right. Great software enables us to accomplish more, be more creative, and communicate faster. At the same time, bad or out-of-date software causes more problems than it fixes. This is the biggest challenge creative professionals in every field are facing, as the software they use is buckling under increasingly complex workloads. My contention is that it is the software, not the hardware, that is the biggest bottleneck in content creation today. There is a solution - but before I get to that, here is a little history.


Math is hard


Early version of Adobe Premiere running on a Macintosh Quadra


Not so long ago, computers were extremely slow. You may have missed the dark ages of computing, so I’ll try to put things in perspective. The U.S. government had a law forbidding the export of supercomputers to various hostile nations, and about twenty years ago a supercomputer was defined as any system with at least 1 teraflop of computational power (one trillion floating-point operations per second). We now have pocket-sized devices with processing speed measured in multiple teraflops. My first computer, on the other hand - an Apple Performa 550 - didn’t even have a floating-point execution unit! It could handle floating-point operations only by emulating them on the integer unit, which was extremely slow. Doing complex tasks like image manipulation wasn’t easy. Doing it in real time (i.e. while playing video) was impossible without many thousands of dollars in specialty hardware. This was a time when the kind of hardware you ran really did matter, because basic computer systems were unequal to the task.

Computer graphics applications process video in up to 32-bit floating-point color, meaning they use 32 bits to describe each color channel of a pixel. Video compression usually uses 8 to 12 bits per channel, so even a 16-bit color space should be enough to process video in. Having an extra 16 bits of headroom means there is more than enough precision to transform colors accurately, without the rounding errors that can cause visual artifacts. However, even with today’s very fast CPUs, there are still far too many pixels in 4K (or 8K!) video to process using the CPU’s floating-point unit alone. Special image processing hardware had to be devised to ensure that real-time image processing and effects don’t slow the computer to a crawl. Here is a list of these special processing units, presented from slowest to fastest:

  1. Integer Emulation (software)
  2. Dedicated FPU (hardware)
  3. Specialized execution units (hardware vector)
  4. GPU Compute (massively parallel hardware vector)
  5. FPGA (field-programmable gate array)
  6. ASIC (application specific integrated circuit)

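To make the rounding-error point concrete, here is a toy sketch (my own illustration, not code from any real imaging pipeline): applying a gain of one-third and then three to an 8-bit value loses precision at every step when the intermediate result is an integer, while a floating-point intermediate rounds only once at the very end.

```python
# Toy illustration of why processing video in a wider, floating-point
# color space avoids visual artifacts. Values here are 8-bit (0-255).

def roundtrip_int(v):
    # 8-bit integer intermediate: precision is thrown away at each step
    dimmed = v // 3
    return min(255, dimmed * 3)

def roundtrip_float(v):
    # float intermediate: full precision kept, rounded once at the end
    return min(255, round((v / 3.0) * 3.0))

int_errors = [abs(v - roundtrip_int(v)) for v in range(256)]
float_errors = [abs(v - roundtrip_float(v)) for v in range(256)]

print(max(int_errors))    # -> 2 (up to 2 levels of banding error)
print(max(float_errors))  # -> 0 (no visible artifacts)
```

Two levels of error per operation doesn't sound like much, but a real effects chain applies dozens of such transforms in a row, and the errors accumulate into visible banding - which is exactly why the extra bits of processing headroom matter.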
A full explanation of these specialty execution units is beyond the scope of this article, but we need some background to understand the problem. Everything in the list above past number 2 has to be specifically supported in software. Today that is accomplished through high-level APIs that abstract the hardware details from the application layer (making it trivial to support new hardware). Twenty years ago the hardware was so slow that the software had to eke out every drop of speed. You couldn’t use APIs (even if any had been developed at that early stage), so developers had to support specific dedicated processing hardware in their applications to get workable speeds out of their code. This meant that every new add-on card or updated CPU required the application to be re-written to support its hardware.
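Here is a minimal sketch of that abstraction idea, with entirely hypothetical function names: the application calls a single `blur()`, and the API layer dispatches to whichever backend the hardware probe reports is available. Supporting a new execution unit then means adding one backend entry, not rewriting the application.

```python
# Hypothetical sketch of a hardware-abstraction layer for an image effect.
# The names (blur_scalar, blur_vector, vector_unit_available) are invented
# for illustration; real APIs like Metal or OpenCL do this at a lower level.

def blur_scalar(frame):
    # Naive CPU fallback: 3-tap box blur over a 1-D "scanline" of pixels
    out = []
    for i in range(len(frame)):
        window = frame[max(0, i - 1): i + 2]
        out.append(sum(window) // len(window))
    return out

def blur_vector(frame):
    # Stand-in for a SIMD/GPU path; same result, faster implementation
    return blur_scalar(frame)

def vector_unit_available():
    return False  # pretend this probes the hardware at startup

# Backends listed fastest-first; the first available one wins
BACKENDS = [(vector_unit_available, blur_vector),
            (lambda: True, blur_scalar)]

def blur(frame):
    for available, implementation in BACKENDS:
        if available():
            return implementation(frame)

print(blur([0, 0, 255, 0, 0]))  # -> [0, 85, 85, 85, 0]
```

The application never needs to know which unit did the work - which is exactly why modern software can pick up new hardware almost for free, and why software written before these APIs existed cannot.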

To make matters worse, high-level languages weren’t generally used because the code they produced ran slower than low-level languages. For code that needed to be extra responsive, even C was too slow, so programmers turned to assembly languages specific to the processor the code was intended to run on. This produced the fastest code but made programs extremely difficult to port to other platforms. Steve Jobs’ NeXT Inc. was trying to solve exactly this problem in the 90s with the NeXTstep/OpenStep operating system and the Objective-C programming language. The goal was to abstract the code enough that the only thing required to run an application on different hardware was a re-compile. Sun’s Java programming language took the idea even further: compiling to an intermediate form that is translated at runtime enabled the same code to run on multiple types of hardware with no extra work from the developer. The problem is that the more you abstract the code, the less efficient it becomes (this is a generalization/simplification but mostly holds true) and the more processing power is lost to the abstraction layer. In the end, any code that had to be fast/real-time was written at the lowest level possible and re-written when ported between different hardware systems.

Re-writing basic functions for every new piece of hardware is the perfect recipe for a completely unmanageable code base. There are more opportunities for bugs to crop up, and it takes up valuable resources that could be spent creating new features. With every new advance in computing power, the code has to be revised. In the long run this becomes untenable and leads to a bloated, complex codebase that is next to impossible to bring up to date. The other way to make old applications compatible with new hardware is to use the extra speed of the new hardware to run a translation or compatibility layer. This solves the problem of having to rewrite core code with every new platform, but the drawback is that it leads to even less new code development, because all the coding effort goes into the compatibility layer instead of the application itself. Imagine the difficulty of having a conversation, through an interpreter, with someone who only speaks a foreign language. You can make yourself understood, but it takes a lot longer. In older software there can be multiple translation layers on top of the normal driver-OS-API layers present in every modern system. When you see a major difference in processing time between different software packages, either a lot more work is being done or the code is massively less efficient.

Ultimately, the proper solution to aging code is to throw it out and start over from scratch. Building on a modern, high-level language allows the new code to be much simpler, more efficient, and much easier to port to new hardware. However, in the case of large, professional applications this requires a very large investment of time and money, and the new version usually takes a long time to reach feature parity with the old one as well. This is what Apple did with Final Cut Pro X starting in 2009, and it has taken them several years and many versions to get the program back in fighting shape. In contrast, Adobe Premiere, Photoshop, After Effects, etc. all have legacy code bases that are holding them back (sometimes severely), and Adobe is trying to modernize them a piece at a time. This strategy will keep the programs operational, but could lead to leaner, more focused competitors pulling the rug out from under them.


Real-world tests


We’ve made the case that old code is holding back many professional applications, but how much difference does it actually make to a given workflow? Here is an example comparing Final Cut Pro X, Premiere Pro, and the new version of DaVinci Resolve on both an iMac and a MacBook Pro:

Ignoring the stabilization in Final Cut (which is very much the curve breaker at 20x faster than Premiere or Resolve), you can see that Premiere is 2-10x slower across the board on the same hardware. The heavier the workload, the slower Premiere performs. Premiere is not taking advantage of the specialty hardware that is made for processing images (items 3-6 on the list above) and is trying to process everything using the most generic CPU and GPU functions. In contrast, Final Cut and Resolve take advantage of those special execution units to make sure they are processing as fast as possible. Here’s another example that really highlights what’s going on:


In this video, a challenge was laid down to try to edit 4K video on a laptop. The YouTube channel "Linus Tech Tips" decided to see if a top-end PC ultrabook (a thin-and-light portable, not a bulky desktop-replacement laptop) could edit 4K video in Premiere. The answer they came to was yes, but only if the footage was first transcoded to an easier-to-edit codec like CineForm - a process that required a beefy desktop to finish in a reasonable amount of time. Jonathan from "TLD Today" took up the same challenge but used a 12" MacBook with a 1.1GHz Core M processor (a tablet-sized laptop with no fan and no discrete graphics card), which is multiple times slower than the ultrabook used for the Premiere test. With Final Cut, not only could he edit 4K straight out of the camera, but the rendering finished in less than half the time and the video was almost twice as long (that’s around 4-5x faster for everybody keeping track). He then issued a counter-challenge, saying he would be able to shoot and edit an entire video on the 12" MacBook in Final Cut before Linus could do the same thing in Premiere on his 36-core server. There were no takers.

This poor showing by Premiere reinforces the argument that it is not taking advantage of all of the specialty processing hardware it could. Depending on your workflow, render times may or may not be a large part of the day’s work, but timeline speed is definitely important. Especially on a lower-end system like a laptop, Final Cut and the new version of Resolve will be easily usable where Premiere probably won’t. You can certainly build a workstation powerful enough to edit any footage in Premiere, but you shouldn’t need to spend thousands of dollars on a 16-core monster machine to get simple work done. The point of all of this is to show that today’s hardware is plenty fast enough for the work we are doing on it; for the most part, it is professional software that hasn’t kept up with the speed of the hardware. And to drive the point home, let’s take a look at editing 4K on the iPad Pro.

You read correctly… you can’t edit 4K on an ultrabook in Premiere, but you can with an iPad… Of course there are some caveats. The video has to be in H.264 format to play in real time on the iPad (the iPad’s chip has a dedicated ASIC for H.264 encoding and decoding), but the point still stands. With the right software (software that can take advantage of today’s efficient hardware features), creative professionals can keep up with today’s workloads without buying expensive, bulky, power-hungry systems.

In my next article, I will detail different workflows and show how to minimize performance bottlenecks.

-Mario Colangelo