New languages, and improvements on existing ones, are mushrooming throughout the development landscape. Mozilla's Rust, Apple's Swift, JetBrains's Kotlin, and many other languages offer developers a new range of choices for speed, safety, convenience, portability, and power.
Why now? One big reason is new tools for building languages—specifically, compilers. And chief among them is LLVM (Low-Level Virtual Machine), an open source project originally developed by Swift language creator Chris Lattner as a research project at the University of Illinois.
LLVM makes it easier not only to create new languages but also to enhance the development of existing ones. It provides tools for automating many of the most thankless parts of language creation: building a compiler, porting the generated code to multiple platforms and architectures, and writing code to handle common language idioms like exceptions. Its liberal licensing means it can be freely reused as a software component or deployed as a service.
The roster of languages making use of LLVM has many familiar names. Apple's Swift uses LLVM as its compiler framework, and Rust uses LLVM as a core component of its tool chain. Many compilers also have an LLVM edition, such as Clang, the C/C++ compiler (thus the name, "C-lang"), itself a project closely allied with LLVM. And Kotlin, nominally a JVM language, is developing a version called Kotlin Native that uses LLVM to compile to machine-native code.
At its heart, LLVM is a library for programmatically creating machine-native code. A developer uses the API to generate instructions in a format called an intermediate representation, or IR. LLVM can then compile the IR into a standalone binary, or perform a JIT (just-in-time) compilation on the code to run in the context of another program, such as an interpreter for the language.
LLVM’s APIs provide primitives for developing many common structures and patterns found in programming languages. For example, almost every language has the concept of a function and of a global variable. LLVM has functions and global variables as standard elements in its IR, so instead of spending time and energy reinventing those particular wheels, you can just use LLVM’s implementations and focus on the parts of your language that need the attention.
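To make that concrete, here is a hand-written sketch of the textual form of LLVM IR for a module containing one global variable and one function (the names `@counter` and `@add` are invented for illustration):

```llvm
; a global 32-bit integer, initialized to zero
@counter = global i32 0

; a function taking two 32-bit integers and returning their sum
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
```

A front end normally builds these constructs through LLVM's APIs rather than by writing IR text directly, but the textual form is what you see when inspecting or debugging compiler output.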
One way to think of LLVM is through an observation often made about the C programming language: C is sometimes described as a portable, high-level assembly language, because it has constructs that map closely to system hardware and it has been ported to almost every system architecture there is. But C works as a portable assembly language only as a side effect of its design; portability was never really one of its goals.
By contrast, LLVM’s IR was designed from the beginning to be a portable assembly. One way it accomplishes this portability is by offering primitives independent of any particular machine architecture. For example, integer types aren’t confined to the maximum bit width of the underlying hardware (such as 32 or 64 bits). You can create primitive integer types using as many bits as needed, like a 128-bit integer. You also don’t have to worry about crafting output to match a specific processor’s instruction set; LLVM takes care of that for you too.
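For instance, a front end can declare a 128-bit integer type directly, and LLVM lowers the arithmetic to whatever instruction sequences the target supports. A hand-written sketch (the function name is invented for illustration):

```llvm
; multiply two 128-bit integers; on a 64-bit target, LLVM expands
; this into a sequence of native multiply/add instructions for you
define i128 @mul128(i128 %x, i128 %y) {
entry:
  %product = mul i128 %x, %y
  ret i128 %product
}
```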
The most common use case for LLVM is as an ahead-of-time (AOT) compiler for a language. But LLVM makes other things possible as well.
Some situations require code to be generated on the fly at runtime, rather than compiled ahead of time. The Julia language, for example, JIT-compiles its code, because it needs to run fast and interact with the user via a REPL (read-eval-print loop) or interactive prompt. Mono, the .NET implementation, has an option to compile to native code by way of an LLVM back end.
Numba, a math-acceleration package for Python, JIT-compiles selected Python functions to machine code. Numba can also compile decorated code ahead of time, but Python, like Julia, favors rapid development as an interpreted language, and JIT compilation complements that interactive workflow better than ahead-of-time compilation does.
Others are experimenting with unorthodox ways to use LLVM as a JIT compiler, such as compiling PostgreSQL queries, yielding up to a fivefold increase in performance.
LLVM doesn’t just compile the IR to native machine code. You can also programmatically direct it to optimize the code with a high degree of granularity, all the way through the linking process. The optimizations can be quite aggressive, including things like inlining functions, eliminating dead code (including unused type declarations and function arguments), and unrolling loops.
Again, the power is in not having to implement all this yourself. LLVM can handle these optimizations for you, or you can direct it to toggle them off individually as needed. For example, if you want smaller binaries at the cost of some performance, you could have your compiler front end tell LLVM to disable loop unrolling.
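As a rough sketch of what one such optimization does, here is a function containing a computation whose result is never used, shown before and after dead-code elimination (hand-written IR, two separate views of the same function rather than one module):

```llvm
; before: %unused is computed but never read
define i32 @square(i32 %x) {
entry:
  %unused = add i32 %x, 42
  %sq = mul i32 %x, %x
  ret i32 %sq
}

; after dead-code elimination, the wasted instruction is gone
define i32 @square(i32 %x) {
entry:
  %sq = mul i32 %x, %x
  ret i32 %sq
}
```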
LLVM has been used to produce compilers for many general-purpose languages, but it’s also useful for producing languages that are highly vertical or exclusive to a problem domain. In some ways, this is where LLVM shines brightest, because it removes a lot of the drudgery in creating such a language and makes it perform well.
Another way LLVM can be used is to add domain-specific extensions to an existing language. Nvidia used LLVM to create the Nvidia CUDA Compiler , which lets languages add native support for CUDA that compiles as part of the native code you’re generating, instead of being invoked through a library shipped with it.
The typical way to work with LLVM is via code in a language you’re comfortable with (and that has support for LLVM’s libraries, of course).
Two common language choices are C and C++. Many LLVM developers default to one of those two for several good reasons: LLVM itself is written in C++, its core APIs are native to C++ (with C bindings available), and a great deal of compiler development happens in C or C++ anyway.
Still, those two languages are not the only choices. Many languages can call natively into C libraries, so it's theoretically possible to perform LLVM development with any such language. But it helps to have an actual library in the language that elegantly wraps LLVM's APIs. Fortunately, many languages and language runtimes have such libraries, including C#/.NET/Mono, Rust, Haskell, OCaml, Node.js, Go, and Python.
One caveat is that some of the language bindings to LLVM are less complete than others. With Python, for example, there are many choices, but each varies in its completeness and utility: llvmlite, developed by the team behind Numba, deliberately wraps only the subset of LLVM most useful for JIT work, while older bindings such as llvmpy are no longer maintained.
If you're curious about how to use LLVM libraries to build a language, LLVM's own creators have a tutorial, using either C++ or OCaml, that steps you through creating a simple language called Kaleidoscope. It has since been ported to other languages as well.
With all that LLVM does provide, it’s useful to also know what it doesn’t do.
For instance, LLVM does not parse a language's grammar. Many tools already do that job, like lex/yacc, flex/bison, and ANTLR. Parsing is meant to be decoupled from compilation anyway, so it's not surprising LLVM doesn't try to address any of this.
LLVM also does not directly address the larger culture of software around a given language. Installing the compiler's binaries, managing packages in an installation, and upgrading the tool chain are all things you need to handle on your own.
Finally, and most important, there are still common parts of languages that LLVM doesn't provide primitives for. Many languages have some manner of garbage-collected memory management, either as the main way to manage memory or as an adjunct to strategies like RAII (which C++ and Rust use). LLVM doesn't give you a garbage collector, but it does provide tools for building one, allowing code to be marked with metadata that tells a collector where to find the object references it must track.
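As a rough sketch of what that metadata looks like, a function can declare a garbage-collection strategy and mark stack slots as roots via the `llvm.gcroot` intrinsic (the "shadow-stack" strategy shown here is one of LLVM's built-in examples; exact details vary by LLVM version, and the function name is invented):

```llvm
declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)

; "shadow-stack" names a built-in GC strategy; the gcroot call tells
; the collector that the stack slot %obj holds a reference to track
define void @make_object() gc "shadow-stack" {
entry:
  %obj = alloca i8*
  call void @llvm.gcroot(i8** %obj, i8* null)
  ; ... allocate a heap object and store its address into %obj ...
  ret void
}
```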
None of this, though, rules out the possibility that LLVM might eventually add native mechanisms for implementing garbage collection. LLVM is developing quickly, with a major release every six months or so. And the pace of development is likely to only pick up thanks to the way many current languages have put LLVM at the heart of their development process.
This story, "What is LLVM? The power behind Swift, Rust, Clang, and more" was originally published by InfoWorld.