Software is an idea ultimately represented in specific hardware configurations. The immediate medium of this representation, from the programmer's perspective, is the programming language in which the idea is written down. Programming languages have so far been set aside when examining which sensual aspects of source code resulted in what could be deemed a "beautiful" program text. And yet, the relationship between semantics (deep-structure) and its syntactic representation (surface-structure) is framed by programming languages, as they define the legal organization of form.
This section examines the influence of programming languages on the aesthetic manifestations of source code. To do so, we first go over a broad description of programming languages, focusing on what makes a programming language expressive. Second, we touch upon the problem of semantics in programming languages, and how it might differ from a human understanding of semantics. We then assess their fit as an artistic, expressive system by introducing notions of style and idiomaticity in programming language communities. In so doing, we highlight a couple of computing-specific concepts that are made accessible by programming languages, discussing how different linguistic interfaces propose different representations.
Programming languages
We start by recalling the historical and technical developments of programming languages, relocating them as an interface between hardware and software. This better technical understanding will allow us to pinpoint the overlaps and differences between human semantics and machine semantics.
History and developments
A programming language is a strictly-defined set of syntactic rules and symbols for describing instructions to be executed by the processor. The history of programming languages is, in a sense, the history of decoupling the means of creating software from hardware. The earliest programming languages were embedded in hardware itself, such as piano rolls and punched cards for Jacquard looms (author, year). Operating on similar principles, the first electronic computers, such as the ENIAC, the UNIVAC or the MUC, still required manual re-wiring in order to implement any change in the algorithm being computed. This process then gave way to programming through stacks of cards fed into the machine, a more modular process which nonetheless retained a definite material aspect. It is with the shift to the stored-program model, at the dawn of the 1950s, that programs could be written, stored, recalled and executed in their electro(-mecha)nical form, essentially freeing the software result from any immediately physical representation.
This tendency to have software gradually separate from hardware saw a parallel in the development of programming languages themselves. Ultimately, any software instruction needs to execute one of the built-in, hardwired instructions of the processor. Also called machine language, this instruction set describes the specific implementation of the most common operations executed by a computer (e.g. add, move, read, load, etc.), and is part of the oldest and most direct semantic interface to the hardware. These operations are ultimately represented as binary numbers to the processing unit. To represent these binary combinations, a first layer, a family of languages called Assembly, provides a syntax which is loosely based on English. Before it can be executed by the CPU, each of these Assembly mnemonics is converted into its binary representation by an assembler [165]. Considered today as some of the most low-level code one can write, Assembly languages are machine-dependent, featuring a one-to-one translation from English keywords to the kind of instruction sets known to the processor they are expected to interface with. As such, a program written for a particular computer architecture (e.g. x86 or ARM) cannot be executed on another architecture without modification.
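The one-to-one translation from mnemonics to binary described above can be sketched as a toy assembler. This is only an illustration under loud assumptions: the 8-bit instruction format, the opcode values and the names OPCODES, assemble_line and assemble are all hypothetical and correspond to no real processor.

```python
# Hypothetical 8-bit instruction format: a 4-bit opcode followed by a 4-bit operand.
# The opcode values below are invented for illustration; they match no real machine.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "MOVE": 0b0011, "READ": 0b0100}


def assemble_line(line):
    """Translate one 'MNEMONIC operand' line into an 8-character binary string."""
    mnemonic, operand = line.split()
    opcode = OPCODES[mnemonic.upper()]  # one mnemonic maps to exactly one opcode
    value = int(operand)
    if not 0 <= value < 16:
        raise ValueError("operand must fit in 4 bits")
    return f"{opcode:04b}{value:04b}"


def assemble(source):
    """Translate a whole program text, one instruction per line."""
    return [assemble_line(line) for line in source.strip().splitlines()]


program = """
LOAD 7
ADD 3
"""
machine_code = assemble(program)
```

Because the table is tied to one invented instruction set, the same program text would need a different table, and possibly different mnemonics, for any other machine, which is the machine-dependence the paragraph describes.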
The first widely acknowledged high-level language which allowed for a complete decoupling of hardware and software is FORTRAN [166]. At this point, programmers no longer needed to care about the specifics of the machine that they were running on, and found more freedom in their exploration of what could be done in writing software, expanding beyond scientific and military applications into the commercial world (see The practices of programmers). Moving away from hand-crafted and platform-specific Assembly code also implied a certain sense of looseness incompatible with the extension of its application domain: widening the problem domain demanded tightening the specification of such languages. As such, FORTRAN [167], and the subsequent COBOL, Lisp and ALGOL 58, also started being concerned with the specific definition of their syntax in a non-ambiguous manner to ensure reliability. Using Backus-Naur Form notation, it became possible to formalize their syntactic rules in order to prevent any unexpected behaviour and support rigorous reasoning for the implementation and research of current and subsequent languages. With such specifications, and with the decoupling from hardware, programming languages became, in a way, context-free.
The context-free grammatical basis for programming allowed for the further development of compilers and interpreters, binary programs which, given a syntactically-valid program text, output its machine-code representation. Such a machine-code representation can then be executed by the processor [168]. At this point, a defining aspect of programming languages is their theoretical lack of ambiguity. This need for disambiguation was reflected both in the engineering roots of computation [169] and in its formal mathematical notation [170], and was thus a requirement for the further development of functional software engineering.
Nowadays, most programming languages are Turing-complete: that is, their design allows for the implementation of a Turing machine and therefore for the simulation of any possible aspect of computation. This means that any programming language that is Turing-complete is functionally equivalent to any other Turing-complete programming language, creating essentially a chain of equivalency between all programming languages. And yet, programming language history is full of the rise and fall of languages, of hypes and disappointments, of self-proclaimed beautiful ones and criticized ugly ones, from COBOL to Ada, Delphi and C. This is because, given such a wide, quasi-universal problem set, the decision space requires creative constraints: individual programmers resort to different approaches to writing computational procedures, echoing what Gilles-Gaston Granger understands as style, a formal way to approach the production and communication of aesthetic, linguistic and scientific works (author, year). We have already seen one example of such a difference in approaching the domain of computation: compilation vs. interpretation. While the inputs and outputs are the same [171], there are pros and cons [172] to each approach, which in turn allows programmers to bestow value judgments on which one they consider better than the other. Ultimately, all programming languages need to address the same basic components of computation, but they can do so in the way they want. Such basic components are, according to Milner (author, year):
data
primitive operations
sequence control
data control
storage management
operating environment
This decision to change the way of doing something while retaining the same goal is particularly salient in the emergence of programming paradigms. A programming paradigm is an approach to programming based on a coherent set of principles, sometimes involving mathematical theory or a specific domain of application. Some of these concepts include encapsulation and interfaces (in object-oriented programming), pure functions and the absence of side effects (in functional programming), or mathematical logic (in declarative programming). Each paradigm supports a set of concepts that makes it the best for a certain kind of problem (author, year). These concepts in turn act as stances which influence how to approach, represent and prioritize the computational concepts mentioned above, and as tools to operate on their problem domain.
Along with programming paradigms, programming languages also present syntactic affordances for engaging with computational concepts. Nonetheless, this is only one part of the picture: the interpretation of syntax necessarily involves semantics. Machine semantics, as we will see, operate a delicate balance between computational operations and human assumptions.
Machine semantics and human semantics
One of the reasons behind the formal approach to programming languages is, according to the designers of ALGOL 58, the dissatisfaction with the fact that subtle semantic questions remained unanswered due to a lack of clear description (author, year). If the goal of a program text is to produce a functional and deterministic execution, then programming languages must be syntactically unambiguous, and the compiler must be given a framework to interpret this syntax. The very requirement for semantic representation in programming language design is first and foremost due to the fact that:
The first and most obvious point is that whenever someone writes a program, it is a program about something. (author, year)
The issue that he points out in the rest of his work is that humans and computers do not have the same understanding of what a program text is about. In general, semantics has the properties of aboutness and directedness (it points towards something external to it), and syntax has the properties of (local) consistency and combination (it functions as a mostly closed system). Looking at programming languages as applied mathematics, in the sense that it is the art and science of constituting complex systems through the manipulation of formal tokens, tokens which in turn represent elements in the world of some kind, we arrive at the issue of defining semantics in strictly computer-understandable terms.
In attempting to develop early forms of artificial intelligence in the 1970s, Terry Winograd and Fernando Flores developed a framework for machine cognition as related to human cognition, through the analysis of language-based meaning-making (Winograd and Flores, 1986, Understanding Computers and Cognition: A New Foundation for Design). In short, they consider meaning as created by a process of active reading, in which the linguistic form enables interpretation, rather than exclusively conveying information. They further state that interpretation happens through grounding, essentially contextualizing information in order to interpret it and extract meaning. They identify three different kinds of grounding: experiential, formal, and social. Experiential grounding, in which verification is made by direct observation, relates to the role of the senses in the constitution of the conceptual structures that enable our understanding of the world, also known as the material implementation of knowledge. Formal grounding relies on logical statements to deduce meaning from previous, given statements that are known, which we can see at play in mathematical reasoning. Finally, social grounding relies on a community of individuals sharing similar conceptual structures in order for meaning to be confirmed. Of these three groundings, programming languages rely on the second.
The reason for the bypassing of experiential and social grounding can be found in one of the foundations of computer science, as well as information science: Claude Shannon's mathematical theory of communication. In it, he postulates the separation of meaning from information, making only the distinction between signal and noise. Only formal manipulation of signal can then reconstitute meaning [173]. We think of computers as digital machines, but they can also be seen as only the digital implementation of the phenomenon of computation. Indeed, according to Brian Cantwell Smith, computing is meaning mechanically realized, due to the fact that the discipline has both mechanical and non-mechanical lineages (Smith, 2016, AoS V1·C0: Introduction). It is therefore through formal logic that one can recreate meaning through the exclusive use of the computer.
This machine meaning is also represented through several layers. A computer is a collection of layers, each defining different levels of machines, with different semantic capabilities. First, it is a physical machine, dealing with voltage differences. These voltage differences are then quantized into binary symbols, in order to become manipulable by a logical machine. From this logical machine is built an abstract machine, which uses logical grounding in order to execute specific, pre-determined commands. The interpretation of which commands to execute, however, leaves none of the semantic room for error that humans exhibit (particularly in hermeneutics). It is a strictly defined mapping of an input to an output, whose first manifestation can be found in the symbols table in Turing's seminal paper (Turing, 1936, On Computable Numbers, with an Application to the Entscheidungsproblem). The abstract machine, in turn, allows for high-level machines (or, more precisely, high-level languages which can implement any other abstract machine). These languages themselves have linguistic constructs which allow the development of representational schemes for data (i.e. data structures such as structs, lists, tuples, objects, etc.). Finally, the last frontier, so to speak, is the problem domain: the thing(s) that the programmer is talking about and intends to act upon. Going back down the ladder of abstractions, these entities in the problem domain are then represented in data structures, manipulated through high-level languages, processed by an abstract machine and executed by a logical machine which turns these pre-established commands into voltage variations.
The problem domain is akin to a semantic domain, a set of related meaningful entities, operating within a specific context, and which a particular syntax refers to. Yet, there is only one context which the computer provides: itself. Within this unique context, semantics still holds a place in any programming language textbook, and is addressed regularly in programming language research. Concretely, semantics in computer programming focuses on how variables and functions should behave in relation to one another (author, year). Given the statement l := j + p, the goal of programming language semantics is to deduce the correct way to process such a statement; there will be different ways to do so depending on the value and the type of the j and p variables. If they are strings, then the value of l will be their concatenation, putting one next to the other. If they are numbers, it will be their addition, and so on.
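The dependence of a single statement on the types of its operands can be observed directly in a dynamically-typed language. The following is a minimal sketch, assuming Python (which writes assignment as = rather than :=); the function name combine and the sample values are hypothetical and only serve the illustration.

```python
def combine(j, p):
    """Return j + p; which operation runs depends on the types of j and p."""
    return j + p


l_numbers = combine(2, 3)         # both operands are numbers: addition
l_strings = combine("jou", "er")  # both operands are strings: concatenation

# Mixing the two types has no defined meaning for + in Python,
# so the language refuses to guess and raises an error instead.
try:
    combine("jouer", 1)
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False
```

The same surface form thus maps to different operations, and the case in which no operation applies is settled by the language definition rather than by the programmer's intent.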
This problem of determining which operation should take place given a particular type of variables requires the reconciliation of the names of entities, tokens in source code, with the entities themselves, composed of a value and a type. The way this is achieved is actually quite similar to how syntax is dealt with. The compiler (or interpreter), after lexical analysis, constructs an abstract syntax tree (AST) representation of the statement, separating it, in the above case, into the tokens l, :=, j, + and p. Among these, := and + are operators whose meaning is already fixed by the language, while the values of the names l, j and p still need to be determined. A second pass, the so-called semantic analysis, then decorates this first tree, assigning specific values (attributes) and types to those names, given the working environment (e.g. production, development, test). This process is called binding, as it associates (binds) the name of a variable with its value and its type.
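The construction of a tree and the subsequent binding of names can be sketched with Python's built-in ast module. This is only an illustration under several assumptions: Python writes assignment as = rather than :=, the dictionary env and the helper evaluate are hypothetical names introduced here, and a real compiler performs these steps in separate passes rather than in one recursive walk.

```python
import ast

# Syntactic step: parse the statement into a tree, without knowing any values.
tree = ast.parse("l = j + p")
assign = tree.body[0]   # the assignment statement
binop = assign.value    # its right-hand side, j + p

node_kinds = (type(assign).__name__, type(binop).__name__, type(binop.op).__name__)
names = [binop.left.id, binop.right.id]


def evaluate(node, env):
    """Semantic step: bind each name to its value in env, then apply the operator."""
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    raise NotImplementedError(type(node).__name__)


# The environment plays the role of the binding between names and values.
env = {"j": "jou", "p": "er"}
env[assign.targets[0].id] = evaluate(binop, env)
```

Nothing in the tree itself says whether + adds or concatenates; that only becomes determined once the names are bound to values of a given type.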
Semantics is thus the decoration of parsed ASTs, evaluating attributes, which can be either synthesized or inherited. Since decoration is the addition of a new layer (a semantic layer) on top of a base layer (a syntactic one), but of a similar tree form, this leads to the use of what can be described as a meta-syntax tree.
Regarding when the values are being bound, there are multiple different binding times, such as language-design time (when the meaning of + is defined), compile time, linker time, and program-writing time. It is only during the last of these that the programmer inserts their own interpretation of a particular meaning (e.g. j := "jouer", meaning one of the four possible actions to be taken from the start screen of a hypothetical video game). Such a specific meaning is then shadowed by its literal representation (the five consecutive characters which form the string) and its pre-defined type (here, it would be the string type, although different languages have different terms to refer to the same consecutive list of alphanumeric characters).
Ultimately, this process shows that the meaning of a formal expression can, with significant difficulty and clumsiness, nonetheless be explained; but the conceptual content still eludes the computer, varying from the mundane (e.g. a simple counter) to the almost-esoteric (e.g. a playful activity). Even the most human-beautiful code cannot force the computer to deal with new environments in which meaning has, imperceptibly, changed. Indeed,
In programming languages, variables are truly variable, whereas variables in mathematics are actually constant (Wirth, 2003, The Essence of Programming Languages).
This implies that the content of the variables, when set during program-writing time, might throw off the whole interpretative process of the computer. In turn, this would transform a functional program into a buggy one, defeating the very purpose of the program. While programming languages are rigorously specified, they are nonetheless designed in a way that leaves space for the programmer's expressivity.
At this point, the only thing that the computer does know that the programmer does not is how the code is represented in an AST, and where in physical memory the data required to give meaning to that tree is located (author, year). We might hypothesize that beautiful code, from the computer's perspective, is code which is tailored to its physical architecture, a feat which might only be realistically available when writing in Assembly, with deep knowledge of the hardware architecture being worked on.
Just as some human concepts are complicated to make the computer grasp on its own terms, there are also computer concepts that are hard for humans to grasp. As we have seen with software patterns, what matters about programming languages is not just their design, but also their situated use:
It must be a pleasure and a joy to work with a language, at least for the orderly mind. The language is the primary, daily tool. If the programmer cannot love his tool, he cannot love his work, and he cannot identify himself with it. (Wirth, 2003)
While there is only one version of how the computer interprets instructions, it is through programming languages that both form and content, syntax and semantics are made accessible to the programmer. Within computation as a whole, a plethora of programming languages exist, designed by humans for humans, differentiating themselves by how the representations they afford guide the programmer in reading and writing source code.
Qualities of programming languages
All programming languages stem from and relate to a single commonality, Turing-completeness and data processing, and yet these linguistic interfaces offer many approaches to performing computation, including diverse and reliable functional affordances and stylistic phrasings. Since diversity within equivalence supports qualified preference, we can now examine what makes a programming language good (i.e. one which receives a positive value judgment) before turning to the question of the extent to which a good programming language enables the writing of good program texts.
Every programming language of practical use takes a particular approach to those basic components, sometimes backed by an extended rationale (e.g. ALGOL 68), and sometimes not (e.g. JavaScript). In the case in which one is circumscribed to context-free grammars, it would be possible to optimize a particular language for a quantifiable standard (e.g. compile time, time use, cycles used). And still, as computers exist to solve problems beyond their own technical specifications, such problems are diverse in nature and therefore necessitate different approaches [174]. These different approaches to the problem domain have in turn influenced the development of different programming languages and paradigms, since a problem domain might have different data representations (e.g. objects, text strings, formal rules, dynamic models, etc.) or data flows (e.g. sequential, parallel, non-deterministic). For instance, two of the early programming languages, COBOL and Lisp, addressed two very different problem domains: the accounting needs of businesses and the development of formal rules for artificial intelligence, respectively. Within programming languages, there is room to distinguish better ones from worse ones, based on particular qualities, and given standards.
What makes a good programming language is a matter which has been discussed amongst computer scientists at least since the GOTO statement was publicly considered harmful (Dijkstra, 1968, Letters to the editor: Go to statement considered harmful), or since the BASIC language was said to be damaging to one's cognitive abilities [175]. These discussions include both subjective arguments over preferred languages and objective arguments related to performance and ease-of-use (Gannon and Horning, 1975, The impact of language design on the production of reliable software). According to Pratt and Zelkowitz:
The differences among programming languages are not quantitative differences in what can be done, but only qualitative differences in how elegantly, easily and effectively things can be done. (author, year)
As a concrete example, one can turn to Brian Kernighan's discussion of his preferences between the languages PASCAL and C (Kernighan, 1981, Why Pascal is Not My Favorite Programming Language). Going through the generic features of a programming language, he comments on the approaches taken by each of the two languages. He professes his preference for the C language, based on their shared inclination for strong typing [176], explicit control flow, cosmetic annoyances and his dislike for an environment in which "considerable pains must be taken to simulate sensible input" (Kernighan, 1981). Nonetheless, he acknowledges that PASCAL can be a toy language suitable for teaching, thus pointing again to the context-dependence of value judgments in programming.
While this example reveals that individual preferences for programming languages can be based on objective criteria when compared to what an ideal language should be able to achieve, Turing-completeness offers an interesting challenge to the Sapir-Whorf hypothesis: if natural languages might only weakly affect the kinds of cognitive structures speakers of those languages can construct, programming languages are claimed to do so to a large extent. For instance, Alan Perlis's famous Epigrams on Programming mentions that "A language that doesn't affect the way you think about programming, is not worth knowing." (Perlis, 1982, Special Feature: Epigrams on programming). These differences in ways of doing illustrate how different programming languages are applicable to different domains and different styles of approaching those domains. They do so through different kinds of notations (different aesthetic features) when it comes to realizing the same task.
hello-ruby
puts "hello"
- A terse example of writing a string to an output in Ruby.
- A verbose approach to writing a string to an output in Java.
Of the two programs presented in these listings, the first is written in Ruby, designed by Yukihiro Matsumoto, and the second is written in Java, designed by James Gosling, both in the mid-1990s. While Ruby is a dynamically-typed, interpreted language, Java is a statically-typed, compiled one, and both include garbage collection and object-orientation. These two snippets are obviously quite dissimilar at first glance, as the Ruby listing only includes one reserved keyword [177], puts, while the Java listing involves a lot more lexical scaffolding, including class and function declaration.
From a language design perspective, Robert Sebesta suggests three main features which programming languages need in order to be considered good: abstraction, simplicity and orthogonality (author, year). From the two snippets, we now explore some of the most important criteria in programming language design, and how they could underpin the writing of good programs.
Abstraction
Abstraction is the ability of the language to allow for the essential idea of a statement to be expressed without being encumbered by specifics which do not relate directly to the matter at hand, or to any matter at all. Programming languages which facilitate abstraction can lead to more succinct code, and tend to hide complexity (of the machine, and of the language) from the programmer, allowing her to move between different levels of reasoning. For instance, the Java snippet (the verbose approach to writing a string to an output) explicitly states the usage of the System object, in order to access its out attribute, and then call its println() method. While a lot of code here might seem verbose, or superfluous, it is in part due to it being based on an object-oriented paradigm. However, the out object itself might seem to go particularly contrary to the requirement of programming languages to abstract out unnecessary details: println() is a call whose purpose is to write something on the screen, and therefore already implicitly relates to the output; one shouldn't have to specify it explicitly.
In contrast, Ruby entirely abstracts away the system component of the print call, by taking advantage of its status as an interpreted language: the runtime already provides such standard features of the language. Printing, in Java, does not abstract away the machine, while printing, in Ruby, hides it in order to focus on the actual appearance of the message. Another abstraction is that of the language name itself from the import statements. When we write in Java, we (hopefully) know that we write in Java, and therefore probably assume that the default imports come from the Java ecosystem; there shouldn't be any need to explicitly redeclare it. For instance, System.out.println() isn't written java.lang.System.out.println(). Meanwhile, the Ruby listing makes implicit the necessary declaration of require ".../lib/ruby/3.1.0", allowing the programmer to focus, through visual clarity, on the real problem at hand, which the logic of the program being written is supposed to address. In this direction, languages which provide more abstraction (such as Ruby), or which handle errors in an abstract way (such as Perl) tend to allow for greater readability by focusing on the most important tokens, rather than aggregating system-related and operational visual clutter, also called verbosity.
Related to abstraction is the approach to typing, the process of specifying the type of a variable or of a return value (such as integer, string, vector, etc.). A strictly-typed language such as C++ might end up being harder to read because of its verbosity, while a type-free language might be simpler to read and write, but might not provide guarantees of reliability when executed. The tradeoff here is again between being explicit and reliable, and being implicit, subtle, and dangerous (such as JavaScript's very liberal understanding of typing). In some instances, types can be inferred from typographical details: Python's boolean values are capitalized (True, False), and the difference between string and rune in Go is represented by the use of double-quotes for the former and single-quotes for the latter. In the case above, explicitly having to mention that greeting is of type String is again redundant, since it is already hinted at by the double-quotes. Ruby does not force programmers to explicitly declare variable types (they can, if they want to), but in this case lets the computer do the heavy lifting of specifying something that is already obvious to the programmer, through a process called dynamic typing.
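The tradeoff between implicit and explicit typing can be sketched in Python, where types are carried by values at runtime and annotations are optional. This is a minimal illustration, not a claim about the listings discussed above: the variable is_ready and the function shout are hypothetical, and greeting simply reuses the name mentioned in the text.

```python
from typing import get_type_hints

# No declarations: the literal forms already signal the types to the reader.
greeting = "hello"  # double quotes: a string
is_ready = True     # capitalized literal: a boolean


def shout(message: str) -> str:
    """The annotations document intent; Python does not enforce them at runtime."""
    return message.upper()


# The runtime, not the programmer, attaches a type to each value.
inferred = (type(greeting).__name__, type(is_ready).__name__)

# The explicit annotations remain available to readers and to external tools.
hints = get_type_hints(shout)
loud = shout(greeting)
```

The unannotated assignments are shorter to read, while the annotated function states its expectations up front, which is the explicitness-versus-succinctness balance described above.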
A particularly noteworthy example of an elegant solution to the tradeoff between guarantee of functionality (safety) and readability can be found in some programming languages' handling of values returned by functions, such as in the following Go listing.
- Go proposes an elegant way of ignoring certain variables, with the use of the underscore token.
The _ character which we see on the first line is the choice made by Go's designers to force the user to both acknowledge and ignore the value that is returned by calling the function getNumber(). This particular character, acting as an empty line, represents absence, not cluttering the layout of the source, while subtly reminding the reader of the potential of this particular statement to go wrong and crash the program. Conversely, the functionally equivalent code written in JavaScript, shown in the listing below, does not have this semantic feature (a variable named _ is still a valid name), and thus requires additional steps to reach the same result.
multiple-returns-js
// Return three values at once, packed into an array.
const getNumbers = () => {
  return [1, 2.0, 3]
}

// Keep the first and last values; the middle one is fetched but never used.
const numbers = getNumbers()
const first = numbers[0]
const second = numbers[2]
- JavaScript does not have any built-in syntax to ignore certain variables, resulting in more cumbersome code.
Abstraction in programming languages is therefore a tradeoff between explicitly highlighting the computer concern (how to operate practically on some data or statement), and hiding anything but the human concern (whether or not that operation is of immediate concern to the problem at hand at all). As such, languages which offer powerful abstractions tend not to stand in the way of the thinking process of the programmer. The way in which Go deals with unneeded values is a good example of its designers' explicit stylistic choice.
Orthogonality
Orthogonality is the affordance for a language to offer a small set of simple syntactic constructs which can be recombined in order to achieve greater complexity, while remaining independent from each other [178]. A direct consequence of such a feature is the ease with which the programmer can familiarize themselves with the small number of constructs in the language, and therefore their ease in using them without resorting to the language's reference, or to external program texts in the form of packages, libraries, etc. The orthogonality of a language offers a simple but powerful solution to the complexity of understanding software. Importantly, an orthogonal programming language must make sure that there are no unintended side-effects, such that the action of each program token is independent from that of the others. The functionality of a statement thus comes not just from the individual keywords, but also from their combination.
For instance, the language Lisp treats both data and functions in a similar way, essentially allowing the same construct to be recombined in powerful and elegant ways. To the beginner, however, it might prove confusing to express whole problem domains exclusively with lists. Conversely, the Ruby language makes every data type (themselves abstracted away) an object, therefore making each building block a slightly different version of each other, providing less orthogonality. The silver lining from Ruby's design choice is that it allows for greater creativity in writing code, since everything is an object, which elicits a feeling of familiarity. In turn, this makes the language more habitable, if more uncertain179.
Orthogonality implies both independence, since all constructs operate distinctly from each other while remaining related, and cooperation, because their functional restrictions require that they be used in conjunction with one another. This offers a solution to the cognitive burden of programs, in which data can end up being tangled in a non-linear execution, and become ungraspable. This unreadability is triggered, not by verbosity, but by the uncertainty of, and confusion about, the potential side-effects caused by any statement. Doing one thing, and doing it well, is a generally-accepted measure of quality in software development practices.
Such independence in programming constructs also presents a kind of symmetry (a well-accepted aesthetic feature of any artefact), in that each construct is similar, not in its functionality, but in the fact that it is a self-contained part of an orthogonal system, and therefore shares the same quality. This similarity eases the cognitive friction in writing and reading code since an orthogonal language allows the programmer to rely on the fact that everything behaves as stated, without having to keep track of a collection of quirks and arbitrary decisions180.
Finally, one of the consequences of different amounts of orthogonality is the shift from computer semantic interpretation to human interpretation. Non-orthogonality implies that the compiler (as a procedural representation of the language) has the final say in what can be expressed, reifying seemingly arbitrary design choices, and requiring cognitive effort from the programmer to identify these unwanted interactions. Orthogonal languages, in contrast, leave more leeway to the writer in focusing on the interaction of all programming constructs used, rather than on a subset of those interactions which does not relate to the program's intent.
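A minimal sketch of this idea, written in Python and under the assumption that functions-as-values and lists count as two independent constructs: neither knows anything about the other, and yet they combine without any special rule. The names normalize and steps are hypothetical, chosen only for illustration.

```python
# Two independent constructs: plain functions and plain lists.
steps = [str.strip, str.lower, str.split]


def normalize(text, steps):
    """Apply each function in `steps` to the result of the previous one."""
    result = text
    for step in steps:
        result = step(result)
    return result


words = normalize("  The Hungry Dog  ", steps)
print(words)  # ['the', 'hungry', 'dog']
```

Adding, removing or reordering a step requires no change to normalize, which is one way of reading the independence described above.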
Simplicity
Both of these features, abstraction and orthogonality, ultimately relate to simplicity. As Ryan Stansifer puts it:
Simplicity enters in four guises: uniformity (rules are few and simple), generality (a small number of general functions provide as special cases a host of more specialized functions, orthogonality), familiarity (familiar symbols and usages are adopted whenever possible), and brevity (economy of expression is sought). (
author, year)
The point of a simple programming language is to not stand in the way of the program being written, or of the problem being addressed. From a language design perspective, simplicity is achieved by letting the programmer do more (or as much) with less, recalling definitions of elegance. This means that the set of syntactical tokens exposed to the writer and reader combine in sufficient ways to enable the desired expressiveness, thus relating back to orthogonality181.
Moving away from broad language design, and towards more specific applications, the goal of simplicity is also achieved by having accurate conceptual mappings between computer expression semantics and human semantics (refer to The psychology of programming for a discussion of mappings). If one is to write, in C, a program related to an interactive fiction in which sentences are being input and output, then the apparently simple data structure char of the language reveals itself to be cumbersome and complex, since each word and sentence that the programmer wants to deal with must be present not as sentences nor words, but as series of char182. A simple language does not mean that it is easy183. By making things simple, but not too simple (Biancuzzi & Warden, 2009), it remains a means to an end, akin to any other tool or instrument184.
Masterminds Of Programming by Federico Biancuzzi, Shane Warden, 2009.
A proper combination of orthogonality, abstraction and simplicity results, once more, in elegance. Mobilizing the architectural domain, the language designer Bruce McLennan further presses the point:
There are other reasons that elegance is relevant to a well-engineered programming language. The programming language is something the professional programmer will live with - even live in. It should feel comfortable and safe, like a well-designed home or office; in this way it can contribute to the quality of the activities that take place within it. Would you work better in an oriental garden or a sweatshop? (
author, year)
Programming languages are thus both tools and environments, and moreover eminently symbolic, manipulating and shaping symbolic matter. Looking at these languages from a Goodmanian perspective provides a backdrop to examine their communicative and expressive power. From the perspective of the computer, programming languages are unambiguous insofar as any expression or statement will ultimately result in an unambiguous execution by the CPU (if any ambiguity remains, the program does not compile, the ambiguity gets resolved by the compiler, or the program crashes during execution). They are also syntactically disjointed (i.e. clearly distinguishable from one another), but not semantically: two programming tokens can have the same effect under different appearances. The use of formal specifications aims at resolving any possible ambiguity in the syntax of the language in a very clear fashion, but fashionable equivalence can come back as a desire of the language designer. The semantics of programming languages, as we will see below, also aim at being somewhat disjointed: a variable cannot be of multiple types at the exact same time, even though a function might have multiple signatures in some languages. Finally, programming languages are also differentiated systems since no symbol can refer to two things at the same time.
The tension arises when it comes to the criterion of unambiguity, from a human perspective. The most natural-language-like components of programs, the variable and function names, always have the potential of being ambiguous185. We consider this ambiguity both a productive opportunity for creativity, and a hindrance for program reliability. If programming languages are aesthetic symbol systems, then they can allow for expressiveness, first and foremost of computational concepts. It is in the handling of particularly complex concepts that programming languages also differentiate themselves in value. The differences in programming language design and use thus amount to differences in style. In the words of Niklaus Wirth:
Stylistic arguments may appear to many as irrelevant in a technical environment, because they seem to be merely a matter of taste. I oppose this view, and on the contrary claim that stylistic elements are the most visible parts of a language. They mirror the mind and spirit of the designer very directly, and they are reflected in every program written. (
Wirth, 2003)
The Essence of Programming Languages by Niklaus Wirth, 2003.
Idiosyncratic implementations
Software, as an abstract artifact, can be understood at the physical, design and intentional levels (author, year). With modern programming languages allowing us to safely ignore the hardware level, it is at the interaction of the design (programming) and intentional (human) level that things get complicated; all programming languages can do the same thing, but they all do it in a slightly different way186. In order to illustrate the expressivity of programming languages, we highlight three programming concepts which are innate to any modern computing environment, and yet relatively complex to deal with for humans: iterating, referencing and threading.
The first and the most straightforward example is iteration, or the process of counting through the items of a list. Since, ultimately, all program text is organized as a continuous series of binary encodings, going through such a list is a fundamental operation in programming. Different implementations of such an operation are shown in iterating-c for the C language and in iterating-py for Python.
iterating-c
#include <stdio.h>

int main(void)
{
    int my_list[] = {2046, 2047, 2048, 2049, 2050};
    int max_count = 5; // the length of the list must be known beforehand

    for (int i = 0; i < max_count; i++)
    {
        printf("%d\n", my_list[i]);
    }

    return 0;
}
- Iterating in C involves keeping track of an iterating counter and knowing the length of the list beforehand.
iterating-py
my_list = [2046, 2047, 2048, 2049, 2050]

for item in my_list:
    print(item)
- Iterating in Python is done through a specific syntax which abstracts away the details of the process.
This comparison shows how a similar function can be performed via different syntaxes. Particularly, we can see how the Python listing implies a more human-readable syntax, getting rid of machine-required punctuation, and thus making it easier to read out loud. In contrast, the C listing states the parts of the loop in an order that is not intuitive to human comprehension. Read out loud, the C listing would be equivalent to "for an index named i starting at 0, and while i is less than a value named max_count, increase i by one on each iteration", which focuses more on the index management than on the list itself, while the Python listing would read "for an item in my list", which is much more concise and expressive.
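The difference can also be observed within a single language. As a sketch, the following Python snippet writes the loop twice: once in the index-managing manner of the C listing, once in the item-focused manner of the Python listing. Both collect the same values; only the amount of bookkeeping exposed to the reader changes. The variable names by_index and by_item are illustrative.

```python
my_list = [2046, 2047, 2048, 2049, 2050]

# C-like: the reader must track the counter and the bound.
by_index = []
i = 0
while i < len(my_list):
    by_index.append(my_list[i])
    i += 1

# Idiomatic: the loop speaks only about the items.
by_item = []
for item in my_list:
    by_item.append(item)

print(by_index == by_item)  # True
```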
Referencing is a more complex problem than iterating187. It is a surface-level consequence of the use-mention problem referred to above, the separation between a name and its value, with the two being bound together by the address of the physical location in memory. As somewhat independent entities, it is possible to manipulate them separately, with consequences that are not intuitive to grasp. For instance, when one sees the name of a variable in a program text, is the name referencing the value of the variable, or the location at which this value is stored? Here, we need a mark which allows the programmer to tell the difference. Programming language notation attempts to remedy those issues by offering symbols to represent these differences, as we can see in references-c.
references-c
#include <iostream>

int main() {
    int date = 2046;      // `date` refers to the literal value of the number 2046
    int *pointer = &date; // `pointer` refers to the address where the value of `date` is stored, e.g. 0x5621
    *pointer = 1996;      // this accesses the value located at the memory address held by `pointer` (0x5621) and sets it to 1996
    std::cout << date;    // prints the value of `date`, stored at the address 0x5621: 1996
}
- Pointers involve a non-straightforward way to reason about values.
The characters * and & are used to signal that one is dealing with a variable of type pointer, and that one is accessing the address at which a variable is stored, respectively. The statement *pointer = 1996 in the snippet above relies on an operation called dereferencing, a neologism which is perhaps indicative of the lack of existing words for referring to that concept. In turn, this hints at a lack of conventional conceptual structures to which we can map such a phenomenon, showing some of the limits of metaphorical tools to think through concepts.
Meanwhile, Ruby syntax does not allow the programmer to directly manipulate pointers, so two variables assigned from one another actually refer to the same data. The design decision here is not to let the programmer distinguish between a reference and an actual value, and instead to prefer that the programmer construct programs which, on one side, might be less memory-efficient but are, on the other side, easier to read and write, since variable manipulation only ever occurs in one single way: through reference.
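Python makes a choice comparable to the one described here for Ruby: names are bound to objects, and there is no syntax for manipulating addresses. The following sketch, with illustrative variable names, shows the consequence: two names can designate the same list, so a change made through one is visible through the other, while an explicit copy is needed to obtain independent data.

```python
dates = [2046, 2047]
alias = dates          # a second name for the same list, not a copy
alias.append(2048)

print(dates)           # [2046, 2047, 2048]
print(alias is dates)  # True: both names refer to one object

snapshot = list(dates)  # an explicit copy creates independent data
snapshot.append(2049)

print(dates)            # [2046, 2047, 2048], unchanged by the copy
```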
Notation does not exclusively operate at the surface level. Some programming languages signify, by their use of the above characters, that they allow for this direct manipulation, through something called pointer arithmetic188. Indeed, the possibility to add and subtract memory locations independent of the values held in these locations, as well as the ability to do arithmetic operations between an address and its value, isn't a process whose meaning comes from a purely experiential or social perspective, but rather exists meaningfully for humans only through logical grounding, by understanding the theoretical architecture of the computer. What also transpires from these operations is another dimension of the non-linearity of programming languages, demanding complex mental models to be constructed and updated to anticipate what the program will ultimately result in when executed.
Threading is the ability to do multiple things at the same time, in parallel. The concept itself is simple, to the point that we take it for granted in modern computer applications since the advent of time-sharing systems: we can have a text editor take input and scan that input for typos at the same time, as well as scanning for updates in a linked bibliography file. However, the proper handling of threading when writing and reading software is quite a complex task189.
First, every program is executed as a process. Such a process can then create child subprocesses for which it is responsible. From the hardware standpoint, unpredictability arises from the fact that CPU cores will run different threads of the same process, and yet, as they are under different loads, some processes will get done faster at times and later at other times. The task of the programmer involves figuring out how the child processes communicate information back to the parent process, how they communicate between each other, and how the parent process makes sure all the child processes have exited before exiting itself.
This involves the ability to multiply the behaviour of routines (whose execution is already non-linear) in order to keep track of what could be going on at any point in the execution of the program, including the use and modification of shared resources, the scheduling of thread start and end, as well as the synchronization needed to handle race conditions (e.g. if two things happen at the same time, which one happens first, such that the consistency of the global state is preserved?).
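To make these concerns concrete, here is a small sketch in Python, chosen for brevity rather than because it is one of the languages compared on this point. It assumes a hypothetical shared counter which several threads increment. The lock protects the shared resource, and the join calls make the parent wait for every thread before reading the final state.

```python
import threading

counter = 0                      # shared resource
counter_lock = threading.Lock()  # guards every modification of `counter`


def increment(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread may update at a time
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()                # the parent waits for each child to finish

print(counter)  # 40000
```

Without the lock, the read-modify-write on counter could in principle interleave between threads, and the final value would no longer be guaranteed.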
For instance, we can look at printing numbers at a random interval. In the non-threaded example, the output is somewhat deterministic, since we know that 2045 will always print before 2046.
- A sequential execution of a Go program, with random timeouts. The order of the output is guaranteed, but not its timing.
In the threaded equivalent, shown in threading-go, neither the order nor the timing of the output is guaranteed.
threading-go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// recall prints a date after a random delay of one to five seconds.
func recall(date int, wg *sync.WaitGroup) {
	defer wg.Done() // signal completion to the main goroutine
	random_delay := (rand.Int() % 5) + 1
	time.Sleep(time.Second * time.Duration(random_delay))
	fmt.Println(date)
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go recall(2045, &wg)
	go recall(2046, &wg)
	wg.Wait() // without waiting, main would exit before either date is printed
	fmt.Println("We're done!")
}

/*
-- possible output #1:
2045
2046
We're done!
-- possible output #2:
2046
2045
We're done!
*/
- A concurrent execution of a Go program, with random timeouts. Neither the order nor the timing of the output is guaranteed. The go keyword, used when calling the functions, instructs the program to run them in parallel.
Nonetheless, the threading syntax in Go allows the programmer to keep their mental model of a function execution, while the threading syntax in C creates a lot more cognitive overhead, by declaring specific types, calling a specific function with unknown arguments, and then manually closing the thread afterwards.
- In C, the syntax to write a thread, and the representation of the concept, is more verbose, as it forces separate variable declaration, separate creation and join, and specific positional arguments.
Threading shows how the complexity of a deep-structure benefits from being adequately represented at the surface. Once again, aesthetically-satisfying (simple, concise, expressive) notation can help programmers in understanding what is going on in a multi-threaded program, by removing the additional cognitive overload generated by verbosity.
Here, we see how the abstraction provided by some language constructs in Go results in a simpler and more expressive program text. In this case, the non-essential properties of the thread are abstracted away from programmer concern. The double meaning embedded in the go keyword even uses a sensual evocation of moving away (from the main thread) in order to stimulate implicit understanding of what is going on. Meanwhile, the version written in C includes the necessary headers at the top of the file, the explicit type declaration when starting the thread, the call to pthread_create, without a clear idea of what the p stands for, as well as the final join call in order to make sure that the parallel thread returns to the main process, and does not create a memory leak in the program once it exits. While both behaviours are the same, the syntax of Go allows for a cleaner and simpler representation.
Programming languages aim at helping programmers solve semantic issues in the problem domain through elegant syntactical means while reducing unnecessary interactions with the underlying technical system. These styles also have a functional component, as we have seen how languages differ in the ways in which they enable the programmer's access to and manipulation of computational actions. Beyond a language designer's perspective, there also exists a social influence on how a source code should be written according to its linguistic community.
Styles and idioms in programming
Concrete use of programming languages operates on a different level of formality: if programming paradigms are top-down strategies specified by the language designers, they are also complemented by the bottom-up tactics of software developers. Such practices crystallize, for instance, in idiomatic writing. Idiomaticity refers, in traditional linguistics, to the realized way in which a given language is used, in contrast with its possible, syntactically-correct and semantically-equivalent, alternatives. For instance, it is idiomatic to say "The hungry dog" in English, but not "The hungered dog" (a correct sentence, whose equivalent is idiomatic in French and German)190. It therefore refers to the way in which a language is a social, experiential construct, relying on intersubjective communication (author, year). Idiomaticity is therefore not a purely theoretical feature, but first and foremost a social one. This social component in programming languages is therefore related to how one writes a language "properly". In this sense, programming language communities are akin to hobbyist clubs, with their names191, meetups, mascots, conferences and inside jokes. Writing in a particular language can be due to external requirements, but also to personal preference:
I think a programming language should have a philosophy of helping our thinking, and so Ruby's focus is on productivity and the joy of programming. Other programming languages, for example, focus instead on simplicity, performance, or something like that. Each programming language has a different philosophy and design. If you feel comfortable with Ruby's philosophy, that means Ruby is your language. (
Matsumoto, 2019)
Yukihiro Matsumoto: "Ruby is designed for humans, not machines" by Yukihiro Matsumoto, 2019.
So an idiom in a programming language depends on the social interpretation of the formal programming paradigms192. Such an interpretation is also manifested in community-created and community-owned documents.
PEP 20 is one such document. Informally titled The Zen of Python, it shows how the philosophy of a programming language relates to the practice of programming in it193 (Peters, 1999). Without particular explicit directives, it nonetheless highlights attitudes that one should keep in mind and exhibit when writing Python code. Such a document sets the mood and the priorities of the Python community at large (being included in its official guidelines in 2004), and highlights a particular perspective on the priorities of theoretical language design. For instance, the first Zen clearly states the priorities of idiomatic Python:
Beautiful is better than ugly. (
Peters, 2004)
PEP 20 – The Zen of Python (peps.python.org) by Tim Peters, 2004.
This epigram sets the focus on a specific feature of the code, rather than on a specific implementation. With such broad statements, it also contributes to strengthening the community bonds by creating shared values as folk knowledge. In practice, writing idiomatic code requires not only the awareness of the community standards around such an idiomaticity, but also knowledge of the language constructs themselves, which differentiate it from other programming languages. In the case of PEP 20 quoted above, one can even include it inside the program text with import this, showing the tight coupling between abstract statements and concrete code. For instance, in range-operator, distinct syntactical operators are semantically equivalent but only the first example is considered idiomatic Python, partly because it is specific to Python, and partly because it performs better than the second example, due to the desire of the developers of Python to encourage idiomaticity; that is, what they consider good Python to be.
range-operator
# idiomatic
for i in range(5):
    print(i)

# generic
for i in [0, 1, 2, 3, 4]:
    print(i)
- These two loops are semantically equivalent in Python, but the first is more idiomatic than the second.
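One way to see why range may be preferred, beyond convention, is that it describes the sequence instead of storing it. The sketch below relies on sys.getsizeof, which reports the size of the container object itself; the exact byte counts depend on the Python implementation, so only the comparison is meaningful. The names lazy and eager are illustrative.

```python
import sys

lazy = range(1_000_000)         # describes the numbers without storing them
eager = list(range(1_000_000))  # stores every number in memory

# Both produce the same values when iterated over...
same_values = list(lazy) == eager

# ...but the range object stays small however long the sequence is.
print(same_values, sys.getsizeof(lazy) < sys.getsizeof(eager))
```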
Beautiful code, then, seems to be a function of knowledge: not just knowledge of what the intent of the programmer is, but knowledge of the language itself as a social endeavour. We can see in fibonacci a more complex example of beautiful, because idiomatic, Python code (Schmitz, 2015).
fibonacci
from functools import lru_cache

@lru_cache(3)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
- The lru_cache decorator offers an idiomatic way to calculate the terms of the Fibonacci sequence.
What makes some code "beautiful"? by Brian Schmitz, 2015.
This function calculates the terms of the Fibonacci sequence (a classic exercise in computer programming), but makes an idiomatic (and clever) use of decorators in Python. The @lru_cache(3) line keeps the 3 most recently used results in a cache, closely mirroring the fact that the Fibonacci sequence only ever needs the terms n, n-1 and n-2, and thereby reducing computational complexity. Through this, the programmer uses a key, advanced feature of the language in order to make the final program more terse, more precise, and more faithful to the problem than other implementations, at the expense of readability for non-Pythonistas.
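The effect of the decorator can be observed through the cache_info() method which functools.lru_cache attaches to the decorated function. The sketch below reuses the function above with its import made explicit; the choice of n = 30 is arbitrary.

```python
from functools import lru_cache


@lru_cache(3)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
info = fib.cache_info()

print(result)                  # 832040
print(info.misses, info.hits)  # computed calls vs. calls answered by the cache
```

Each term from 0 to 30 is computed only once, whereas the undecorated recursion would recompute the same terms an exponentially growing number of times.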
Idiomaticity reflects the social and aesthetic intent of the language designers and implementers. Notation matters, and designers want to encourage good practices through good notations, assuming that programmers would gravitate towards what is both the most efficient and the best-looking solution.
And it's not hard to "prove" it: If two people write code to solve the same problem and one makes a terrible spaghetti monster in COBOL while the other goes for super-elegant and highly abstracted solution in Haskell, does it really matter to the computer? As long as the two are compiled to the same machine code, the machine does not care. All the clever constructs used, all the elegance, they are there only to guide our intuition about the code. (
Sustrik, 2021)
On the Nature of Programming Languages by Martin Sustrik, 2021.
Another way to encourage writing good code is through the addition of syntactic sugar. Syntactic sugar describes the aesthetic features of a language which are variants of a similar computational feature, and where the only difference between them is their appearance, i.e. visual, semantic shortcuts. The looping examples above are good instances of syntactic sugar, albeit with performance differences. The Ruby language is riddled with syntactic sugar, and highlights how syntactic sugar can "sweeten" the reading process, aiming for more clarity, conciseness, and proximity to natural languages. In Ruby, to access a boolean value on an attribute of an object, one would write it as in any other language. The added syntactic sugar in Ruby comes in the form of the question mark in control flow statements, as shown in ruby-alive, or of the exclamation mark in method calls, to highlight the destructive nature of the method.
ruby-alive
if Being.alive
  puts "and well"
end

if Being.alive?
  puts "and well"
end
- Ruby features a lot of syntactic sugar. For instance, one can add a question mark at the end of a method name in order to signify more clearly the boolean nature of the return value. Other languages tend to disallow the use of special characters in method names.
In C, syntactic sugar includes my_array[i] to access the i-th element of the array my_array, rather than the more cryptic *(my_array + i). In Python, opening a file could be written as f = open("notes.md"), but the language also proposes the syntactic sugar of with open("notes.md") as f:, which consists in a block which both opens the file, and implicitly closes it at the end of the block.
There are absolutely no functional differences between the two Ruby statements above, and the question mark is just there to make the code seem more natural and intuitive to humans. Checking for a boolean (or non-nil value) in an if statement is, in the end, the equivalent of asking a question about that value. Here, Ruby makes that explicit, therefore making it easier to read with the most minimal amount of additional visual noise (i.e. one character).
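Returning to the Python example of file handling, the with form can be read as a shorthand for an explicit try/finally. The sketch below writes both versions against a temporary file and checks that the file ends up closed in each case. The file name notes.md is taken from the text; the temporary directory and the variable names are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on an existing one.
path = os.path.join(tempfile.mkdtemp(), "notes.md")
with open(path, "w") as f:
    f.write("simple, but not too simple\n")

# Explicit version: open, use, and remember to close.
f_explicit = open(path)
try:
    explicit_text = f_explicit.read()
finally:
    f_explicit.close()

# Sugared version: the block closes the file on exit.
with open(path) as f_sugared:
    sugared_text = f_sugared.read()

print(explicit_text == sugared_text)           # True
print(f_explicit.closed and f_sugared.closed)  # True
```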
We have seen how programming languages can be subjected to aesthetic judgment, but those aesthetic criteria are only there to ultimately support the writing of good (i.e. functional and beautiful) code. Such a support exists via design choices (abstraction, orthogonality, simplicity), but also through the practical uses of programming languages, notably in terms of idiomaticity and of syntactic sugar, allowing some languages more readability than others. Like all tools, it is their (knowledgeable) use which matters, rather than their design, and it is the problems that they are used to deal with, and the way in which they are dealt with which ultimately informs whether or not a program text in that language will exhibit aesthetic features.
This concept of appropriateness also relates to material honesty. As seen in Material knowledge, the fact that a programmer tends to identify their practice with craft implies that they work with tools and materials. Programming languages being their tools, and computation the material, one can extend the concept of material honesty to source code (author, year). In this case, working with, and in respect of, the material and tools at hand is a display of excellence in the community of practitioners, and results in an artefact which is in harmony with, and well-adapted to, the technical environment which allowed it to be. Source code written in accordance with the principles and the affordances of its programming language is therefore more prone to receive a positive aesthetic judgment. Furthermore, idiomatic writing is accompanied by a language-independent, but group-dependent feature: that of programming style.
Fundamentally, the problem of style might be that "the practical existence of humanity is absorbed in the struggle between individuality and generality" (Simmel, 1991). Simmel's investigation of the topic originally focuses on the dichotomy between works of fine art and mass-produced works of applied arts. Indeed, Simmel draws a distinction between the former, as idiosyncratic objects displaying the subjectivity of their maker, and the latter, as industrially produced and replicated, in which the copy cannot be told apart from the original. The work of fine art, according to him, is a world unto itself, is its own end, symbolizing by its very frame that it refuses any participation in the movements of a practical life beyond itself, while the work of applied arts only exists beyond this individuality, first and foremost as a practical object.
The Problem of Style by Georg Simmel, 1991.
As these two kinds of work exist at the opposite extremes of a single continuum, we can insert a third approach: that of the crafted object. It exists in-between, as a repeated display of its maker's subjectivity, destined for active use rather than passive contemplation (
author, year)
. So while style can be seen as a general principle which either mixes with, replaces or displaces individuality, style in programming does not stand neatly at either extreme. The work of Gilles-Gaston Granger, and his focus on style as a structuring practice, can help us better apprehend style as a relationship between individual taste and structural organization (
author, year)
. Granger posits style in scientific endeavours, which is a component of programming practice, as a mode of knowing at the scale of the group. Abiding by a particular style, the writer and reader can implicitly agree on the fundamental values underpinning a given text, and thus facilitate expectations in further readings of a given program text.
Concretely, programming styles exist as dynamic documents, with both social and technical components. On the social side, they are only useful if unconditionally adopted by all members working on a particular code-base, since "all code in any code-base should look like a single person typed it, no matter how many people contributed." (
Waldron, 2020)
Idiomatic.js/readme.md at master · rwaldron/idiomatic.js by Rick Waldron, 2020. [link]
; personal style is usually frowned upon by software developers as an indicator of individual preferences over group coordination.
In the strict sense, style guides are therefore reference documents which should provide an answer to the question of what is the preferred way of writing a particular statement (e.g. var vs. let, or camelCase vs. snake_case). Beyond aesthetic preferences aimed at optimizing the clarity of a given source code, style guides also include a technical component which aims at reducing programming errors by catching erroneous patterns in a given codebase (e.g. variable declaration before initialization, loose reference to the function-calling context).
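To make the technical side of such a preference concrete, the following is a minimal sketch, in JavaScript, of why a guide might favour let over var. The function names are hypothetical and chosen only for illustration. A var declaration is hoisted and initialized to undefined, so a premature read passes silently, whereas the same read with let raises an error.

```javascript
// Hypothetical function names, for illustration only.

// With `var`, the declaration is hoisted to the top of the function and
// initialized to undefined, so reading it too early does not fail.
function readBeforeInitWithVar() {
  const seen = counter; // no error: the value is undefined
  var counter = 1;
  return seen;
}

// With `let`, the binding exists but cannot be read before its declaration
// has been evaluated, so the same mistake surfaces as a ReferenceError.
function readBeforeInitWithLet() {
  try {
    const seen = counter; // throws before the declaration below is reached
    let counter = 1;
    return seen;
  } catch (err) {
    return err.name;
  }
}
```

Under these assumptions, the stylistic choice between the two keywords is also a choice about whether this class of error stays hidden or becomes visible to the programmer.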
Programming style also exhibits the particular property that it is not just enforced by convention, but also by computational procedure: linters and formatters are programs whose main function is to check, and formally rearrange, the appearance of lines of code according to preset rules. This constitutes an additional socio-technical context which further enmeshes human writing and machine writing (
Depaz, 2022)
Discursive Strategies in Style Guides Negotiation on GitHub by Pierre Depaz, 2022. [link]
. Essentially, this means that source code will be judged not just on how it functions technically, but also how it exists stylistically—that is, within a social contract which can be implemented through technical, automated means.
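As an illustration of how a stylistic convention can be turned into a computational procedure, here is a deliberately naive sketch of a linter and a formatter in JavaScript. The rule identifiers, the rules array, and the lint and fixVar functions are assumptions made for this example and do not reproduce any existing tool. Production linters work on a parsed syntax tree rather than on regular expressions.

```javascript
// Hypothetical rules, written for illustration only.
const rules = [
  {
    id: "no-var",
    pattern: /\bvar\s+[A-Za-z_$]/,
    message: "prefer let or const over var",
  },
  {
    id: "camelcase",
    pattern: /\b(?:let|const|var)\s+[a-z]+_[a-z_]+\b/,
    message: "prefer camelCase over snake_case",
  },
];

// Linting: report every line of the source that matches a rule.
function lint(source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, rule: rule.id, message: rule.message });
      }
    }
  });
  return findings;
}

// Formatting: rewrite the source so that it conforms to one rule.
// This is a purely textual substitution, so it would also rewrite matches
// inside strings and comments, and it can change behavior where code
// relies on the function-level scope of `var`.
function fixVar(source) {
  return source.replace(/\bvar\s+/g, "let ");
}
```

Calling lint on the two-line text "var user_name = 1;" followed by "let total = 2;" reports two findings, both on the first line, while fixVar rewrites the declaration keyword without any human intervention.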
-
In conclusion, programming languages, as symbol systems subject to aesthetic judgment, are an important factor in allowing aesthetic properties to emerge during the process of writing program texts. They present affordances for the abstraction and combination of otherwise-complex programming concepts, for the development of familiarity through their idiomatic uses, and for ease of readability, to the point that the language itself might become transparent to experienced readers. Yet, one must keep in mind that there is a difference between considering a programming language good or beautiful in itself, considering the quality of programs in relation to the programming language they are written in, and considering more general aesthetic features. In the next section, we look at some of those aesthetic features which can be transposed across languages.