Over the summer, I’ve heard several protégés bring up Linux. The conversations seem to go something like this:
THEM: It seems like a lot of people are using Linux in my research team.
ME: Hmmmm, do you have any skill with Linux?
THEM: No, not really, but should I? So many people seem to be using it …
ME: Maybe, but the truth is you are probably already using it and didn’t even know. You should probably dig in and start learning …
A Brief History of Linux
The history of the GNU/Linux operating system traces a great deal of its lineage back to the Unix operating system, originally developed in the early 1970s at AT&T’s Bell Laboratories to support multi-user, multi-process computers. Though Linux and Unix share many constructs, tools and philosophies, they are distinct and different. But the story of the operating system now dominating global Internet servers, serving the very bytes that you are reading on your screen now, is the story of something much deeper. It is fundamentally a story about a tension of cultures. One culture, largely academic, free, experimental and research-based, tried to hold on to a philosophy of computing that embraced sharing and openness (indeed, some of the same values that brought Unix and computing to the point they had reached); the other was a culture based on business outcomes, bottom lines, scalability, efficiency and the wildly profitable computing platforms growing out of the innovative pre-public Internet systems of the 1980s.
In part, Linux grew out of a kind of rebellion, provoked in the early 1980s by the likes of AT&T, HP, IBM and DEC (Digital Equipment Corporation), who began building proprietary Unix-based operating systems to run their most profitable scientific workstations and business servers. Companies were developing their own variants of Unix, tuned (and licensed) just so to their hardware; indeed, by the end of the 1980s there were quite a few notable proprietary Unix-based operating systems, from Sun Microsystems’ SunOS (later Solaris) to IBM’s AIX, HP’s HP-UX and DEC’s Ultrix. All of these efforts gave rise to an era of proprietary Unix distributions, often with subtle differences that established barriers to interoperability with one another.
One movement took shape in 1983 that challenged the concept of proprietary software and ultimately gave life to what would become Linux. It was led by Richard Stallman, an uncompromising but exceedingly bright programmer from MIT’s then Artificial Intelligence Lab. Stallman set out to emancipate Unix (and perhaps all software) from the commercial and proprietary bondage it was experiencing by heading an effort called GNU, a recursive acronym for GNU’s Not Unix. GNU, under Stallman’s leadership, would rewrite much of the core Unix code and utilities under new licenses that kept the software free and open to modification, sharing and “hacking” (not the kind we think of today as malicious, but the kind that involves what Stallman calls “clever playfulness”). He also wanted to transmit a set of values about “freedom” through “free” software that allowed users to escape the steep licensing fees of Unix. “Free” was not about cost but an ideology of permissiveness and freedom: Stallman and the Free Software Foundation that he founded did not discourage charging money for software services; rather, they prohibited restricting users from doing what they wanted with the software once they obtained it. A quick read of the GNU Manifesto should be all that is necessary to give you insight into the mind and philosophy of Stallman and the FSF.
Through the 1980s many of the important tools of Unix were rewritten for GNU, but what was still missing by the late 1980s was the kernel: the critical part of an operating system that sits between the hardware resources of the computer (memory, CPU, device I/O, etc.) and the rest of the higher-level operating system programs (services, utilities, compilers, GUIs, etc.).
A Unix-like clone called Minix had been developed by computer scientist Andrew Tanenbaum as a teaching tool for operating systems concepts. Minix had a kernel that might have worked with GNU, though it was incomplete and ultimately lacked the robustness to match what GNU had already developed. Furthermore, as microprocessors were getting cheaper and cheaper, the commodity electronics race was beginning, and everyone wanted to see what could be done with all the high-powered, low-cost electronics during what turned out to be the explosion of the desktop PC era of the 1980s. Minix, by contrast, was largely an academic exercise designed for lower-end hardware and education.
In 1991, a young Finnish graduate student at the University of Helsinki, Linus Torvalds, was inspired to develop a Unix-like operating system for the next generation of high-powered personal computer CPUs, such as Intel’s high-performance desktop 386 processor. Up to that point, PC operating systems, while fragmented, were beginning to be dominated by Microsoft’s proprietary and decidedly non-Unix-like MS-DOS (Microsoft Disk Operating System). Minix on many fronts fit the criteria for running a stable and capable operating system on these machines, yet there were architectural differences of opinion about how best to develop the low-level kernel for the 386. Furthermore, the academic nature of Minix and its lack of critical features disqualified it from prime-time readiness. Torvalds thus set out to write a Minix-inspired kernel for the 386 architecture. He released the first beta version in the summer of 1991, and in 1994 the first production version was released (with an order of magnitude more code than the first beta), integrating numerous GNU utilities and other community contributions. The code was released under the copyleft GNU General Public License, and thus an earnest rebellion took hold, cementing the “freedom” of what has now become the world’s most widely used open and “free” Unix-like operating system: Linux.
For some light reading, see Torvalds’ Master’s thesis, published in 1997 and titled “Linux: a Portable Operating System”, which describes the development of the operating system and the innovations it was built upon.
Linux and Scientific Computing
Today’s Linux, many millions of lines of code later, is heavily used in desktop and server computing environments. It runs on nearly all generations of hardware, from low-cost SBCs (Single Board Computers) like the Raspberry Pi to supercomputers sporting hundreds of thousands of cores. It is, in fact, the operating system of choice for all of the world’s top 500 fastest supercomputers because it is open, free, permissive of the customizations required in high-performance computing environments and, above all, Unix-like. These are the same supercomputers that are the engines of 21st-century science, the same ones running climate models, finding dark matter and enabling next-generation AI. Contrast this with 20 years ago, when Unix ran over 96% of the top 500 supercomputers.
While Linux is an ideal choice for scientific computing because of the customizations that can be made to it without violating licensing agreements, it is also an exceedingly friendly platform for the multitude of programming languages found in scientific research labs around the world. The most robust compilers and interpreters for common languages like C, C++, FORTRAN, Java, Python and R are easily and readily available on Linux. Highly efficient open source datastores such as MariaDB, PostgreSQL, MongoDB, GraphDB and more are all available on Linux and install easily. Most (if not all) raw scientific data formats (think NetCDF) can be processed on Linux, and its rock-solid reliability is a necessity for computing environments with long-running processes.
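To see this availability for yourself, a short shell loop can report which of these toolchains are already installed on a Linux machine. This is a minimal sketch; the particular tools listed are just illustrative examples, and package names vary by distribution.

```shell
#!/bin/sh
# Report which common scientific-computing tools are on the PATH.
# The list below is illustrative, not exhaustive.
for tool in gcc g++ gfortran python3 R psql; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: installed at $(command -v "$tool")"
    else
        echo "$tool: not found (add it with your package manager)"
    fi
done
```

On Debian- and Ubuntu-family systems a missing compiler can typically be added with something like `sudo apt install gcc gfortran`; Red Hat-family systems use `dnf` instead.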
Finally, the GNU General Public License (aka GPL) which is the backbone of Linux, while contentious for some, philosophically aligns with scientific inquiry and discovery, where we are encouraged, even expected, to be collaborative, open, transparent and free with our findings, results, workflows, code and data.
The truth is that if you’ve worked on a supercomputer, you’ve already been working with Linux. If you’ve worked with a cloud compute instance in your work, you’ve already been working with Linux. Whether you decide to use it on your desktop is a matter of personal choice and productivity, but as you begin your graduate career (and beyond), you’ll likely have more contact with Linux.
The democratization of software and hardware continues to grow rapidly, and Linux will remain an important and potent force in those developments and in your scientific computing future. Embrace it. Learn about it. Play around with it: most distributions can be booted from a USB drive without installing or disrupting anything already on your hard drive. You might even decide to migrate your entire system to it; either way, it will be necessary to know your way around it in a basic way, and the more you grow your skills now, the easier things will be in the future.