How does Casper compare to Ouroboros?

Differences between the proposed Ethereum protocols and Cardano’s consensus algorithm

9 August 2018 Prof Aggelos Kiayias 13 mins read


TL;DR In response to recent discussions in social media, we give a brief comparison of the Ouroboros and Casper proof-of-stake protocols.

Ouroboros is a formally specified and analysed protocol with mathematically proven security guarantees based on clearly specified assumptions. The protocol description, models and proofs are all public. Hence, the underlying assumptions, the target protocol properties, and the respective correctness proofs can be publicly scrutinised. Ouroboros offers stake-based finality with the strongest possible guarantees in terms of the amount of stake backing up honest operation. It also provides a solid foundation over which services such as near instant finality of transactions can be offered in optimistic network conditions.

Regarding Casper, we are not aware of any currently published source that sufficiently describes the protocol's mode of operation, nor of any provable guarantees about it. Still, from what has been presented about Casper so far, as compared to Ouroboros, we can safely conclude that Casper provides much weaker guarantees in terms of how much stake the adversary needs to control in order to disrupt the protocol. Below, we compare the two protocols along several dimensions; for lack of proper documentation, many properties of Casper have to be assumed to the best of our knowledge.


In response to a discussion here and here, we give a brief comparison of the Ouroboros proof-of-stake (PoS) protocol and Casper PoS. For Ouroboros, we refer to the original version underlying the Cardano Settlement Layer (published at Crypto 2017), however most of our comments apply to later versions Ouroboros Praos and Ouroboros Genesis as well. For Casper, we primarily refer to the Casper Friendly Finality Gadget (FFG) as described in the white paper, being the most recent Casper proposal that is sufficiently descriptive to draw a full comparison (other references include Ethereum Mauve, Casper+Sharding v2.1, FFG-RPJ, Casper TBG/CBC).

Any PoS ledger consensus protocol should satisfy two fundamental properties: persistence and liveness. The first ensures that the ledger is final and immutable. The second ensures that transactions broadcasted by honest parties are eventually included in the (immutable) ledger. Such properties, typically, cannot be proven unconditionally: they will rely on certain conditions, some of them cryptographic, e.g., that digital signatures cannot be forged, while others are related to the behaviour of the participants, e.g., that the players who follow the protocol control a majority of the stake. There are other desirable properties that a PoS protocol should satisfy (such as that executing the protocol as prescribed is the only rational strategy for the participants), but persistence and liveness as defined above constitute the bare minimum pair of fundamental properties necessary for ledger consensus.

Let us now discuss some of the differences between the two protocols and their analyses.

Execution Model and Falsifiability of Claims

The Ouroboros protocol is analyzed in a model that is fully described: it unambiguously defines all the participants’ programs, their execution and interactions, their communication – including network properties – and the potential corruption by an adversarial entity of any set of parties controlling a minority of the stake. Such a model allows the formulation of mathematically precise security guarantees satisfied by any execution, such as the persistence and liveness properties proven for Ouroboros. In particular, the formal modeling of Ouroboros permits precise, quantitative statements about stake bounds and settlement times; see below. This makes all the claims we make about Ouroboros entirely concrete; there is nothing left up to interpretation or reader perspective. Without such a model (notably missing in the Casper FFG white paper or in any other available sources related to Casper), it is impossible to prove the correctness of any claims about the protocol. Consensus protocols, in general, are complex objects; designing them without the development of rigorous mathematical arguments that establish the required properties can prove to be precarious as prior practice in secure systems design has shown. Good design intuition and best effort are just not sufficient when a ledger consensus protocol is supposed to carry assets worth billions.

A comprehensive solution to PoS ledger consensus

Given the above, it is important to appreciate that the Ouroboros protocol is proven to provide persistence and liveness under clearly defined assumptions such as honest stake majority which is the bare minimum assumption needed in the PoS setting. On the other hand, Casper FFG, as described in the white paper, is an enhancement on top of a pre-existing “block proposal mechanism”, e.g., a PoW blockchain (namely Ethereum); in particular, its security guarantees as a ledger consensus protocol depend on the security of this proposal mechanism. As the authors of Casper FFG observe, “a wholly compromised block proposal mechanism will prevent Casper from finalizing new blocks”, hence the honest-majority-of-hashing power assumption is still necessary for Casper FFG’s liveness. Similarly, other versions of the Casper protocol, such as Casper FFG-RPJ, are incomplete and/or not accompanied by any proofs of security.

Stake Assumptions

Ouroboros is proven to achieve persistence and liveness under the assumption of honest majority of all stake in the system, even in the case that some significant portions of stakeholders are not participating in the protocol (see e.g., Theorem 1 in the Ouroboros Genesis paper for the most comprehensive statement on Ouroboros security). In contrast, Casper requires a ⅔-fraction of deposited stake to be controlled by honest parties (see section 2.1 of the white paper). Since the deposited stake is blocked and cannot be used for other purposes in the meantime, it is reasonable to assume that the deposited stake will be a small fraction of the total stake in the system. Naturally, larger amounts of stake are more difficult to control so that basing security on the total stake in the system, as in Ouroboros, is a more prudent choice. As a concrete example, in the current sharded version of Ethereum (Ethereum Mauve paper or Casper+Sharding chain v2.1), a minimum of 32 ETH per validator is required with 100-128 validators per shard depending on the reference, without any other restriction. It follows that if the total deposited stake among all prospective validators turns out to be minimal and is not otherwise restricted then just a few thousand ETH would be enough to register a set of sybil validators that could disrupt the ledger consensus security properties.

Finality

Though the notion is not formally defined in the Casper FFG white paper, it is easy to see that the property of “stake-based finality” is subsumed by persistence, the property that ensures that transactions become permanently part of the public immutable ledger; the stake-based adjective on finality used in Casper FFG refers to the fact that the condition under which finality is to be attained is based on stake as opposed to, e.g., a hashing power assumption. As mentioned above, no protocol can be deemed to solve the ledger consensus problem without providing persistence (and hence finality). In fact, all PoS protocols provide such properties only with a high probability – if for no other reason, cryptography can always fail with (very) small probability (for example, someone may guess your key). We do in fact know that Bitcoin and (pre-Casper) Ethereum provide finality (shown by the works of GKL15, GKL17 and PSS17, assuming honest majority of computational power), and so does Ouroboros, assuming honest majority of stake as shown in KRDO17, DGKR18, BGKRZ18.

Put simply, Ouroboros provides stake-based finality and it does so with the strongest possible guarantee in terms of stake: against a malicious coalition controlling any amount of the total stake existing in the system as long as it is bounded below 50%. In the Casper FFG white paper, where Casper operates over the Ethereum blockchain, stake-based finality is provided every 100 blocks under the assumption that ⅔ of the deposited stake is honest. As a concrete example, in the same window of time, which is a little over half an hour in our current deployment, we can derive from our formal analysis that Ouroboros will offer finality against, say, a 10% stake adversary with probability of error less than 2^(-44). This is less than 1/10000000000000, one over ten trillion. To appreciate such small numbers, consider that it is expected to have one large asteroid hit the earth once every 100 million years (Scientific American). Thus, it is 10 thousand times more likely that a big asteroid will hit the earth next month than that Ouroboros will reorganise its chain to drop a particular transaction after it has been included in the ledger for about half an hour.
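
To give a rough feel for how such figures scale with settlement depth, here is a small illustrative Haskell sketch. It is emphatically not the Ouroboros security analysis (the actual bounds come from the papers cited above); it uses the textbook random-walk ("private chain") model, in which an adversary holding a fraction q of the resource who is k blocks behind ever catches up with probability roughly (q/(1-q))^k. The name catchUp is ours.

-- Illustration only: a naive random-walk settlement bound, not the formal
-- Ouroboros bound derived in the cited papers.
--
-- Probability that an adversary controlling a fraction q of the resource
-- ever overtakes an honest chain that is k blocks ahead of it.
catchUp :: Double -> Int -> Double
catchUp q k
  | q >= 0.5  = 1                    -- a majority adversary always wins
  | otherwise = (q / (1 - q)) ^ k    -- exponentially small in the depth k

-- ghci> catchUp 0.1 30
-- roughly 2e-29: in this simplified model, a 10% adversary has a vanishing
-- chance of reverting a transaction buried 30 blocks deep.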

Eventual Consensus vs. (near-)Instant finality

Blockchain protocols like Bitcoin and Ouroboros are called eventual-consensus since they ensure that the irreversibility of a block increases gradually with the number of blocks that are added on top of it. This means that finality is more nuanced than just a true or false value, and is quantified by the probability of reverting a transaction as a function of the strength of the adversary and the length of time that has passed since the block containing that transaction was added. This design enables these protocols to work in the strongest possible adversarial settings and still be very efficient in terms of the number of messages that need to be exchanged; furthermore, they have the feature that the recipient of a transaction can decide for herself how important a transaction is and adjust her own notions of stability per transaction. Their downside is that they do not provide near-instant finality, or in other words, a fast assurance that the transaction will be finalised. This may be a potential advantage of classical BFT protocols that have inspired the design of Casper FFG as well as other protocols in the space including Algorand.

However, near-instant finality typically also comes with significant downsides in terms of the security model such as a much higher requirement of honest stake or, perhaps more importantly, a high degree of guaranteed online presence that must be offered by the participants following the protocol. This hurts the dynamic availability of the participants (see below) which is one of the hallmarks of the bitcoin era of consensus protocols. On the other hand, near-instant finality can be built as a service on top of Ouroboros and this is something that we will be releasing in due course. Moreover, we can argue that this is the best possible way forward: use the Ouroboros eventual consensus protocol which is secure under the strongest possible stake-based guarantees as the solid foundation over which services such as near-instant settlement in optimistic network conditions can be safely built.

Incentives and dynamic availability

Casper FFG is inspired by pre-Bitcoin era standard BFT consensus protocols and as such it cannot handle uncertainty in terms of the number of participating entities once the set of validators becomes fixed. This means that the protocol cannot operate in the “sleepy setting” or “dynamic availability” setting, where a significant number of parties that are supposed to act in the protocol are unavailable due to network conditions, hardware failure or simply lack of interest. This is a significant concern in a decentralized setting where the execution of the protocol is not meant to be left in the hands of a few centralized-power actors, but is rather distributed proportionally among a great number of smaller players. The Casper FFG white paper acknowledges this as the “Catastrophic Crash” scenario and observes that in this case “no future checkpoints can be finalized”. The authors propose a mitigation in the form of the so-called “inactivity leak.” This idea is only described informally as draining “the deposit of any validator that does not vote for checkpoints, until eventually its deposit sizes decrease low enough that the validators who are voting are a supermajority.” Unfortunately, this modification would in turn negate any potential advantage Casper can claim in the face of network splits, as the authors also recognise: “The inactivity leak introduces the possibility of two conflicting checkpoints being finalized without any validator getting slashed.” This also affects the incentives for running the protocol. Ouroboros allows for a natural and incentive-driven aggregation of stake into stake pools that will be performed over a period of time using our stake pool reward mechanism, without forcing the behaviour of stakeholders onto a predetermined structure, while Casper has to impose preset numbers of block validators.

Randomness

While the original Ouroboros protocol does not use VRFs to generate protocol randomness (instead it uses a guaranteed-output-delivery coin-tossing protocol based on verifiable secret-sharing), the follow-up versions Praos and Genesis do so for performance gains. The VRFs proposed for use in Ouroboros Praos and Genesis are proven secure under standard cryptographic assumptions (such as the Computational Diffie Hellman assumption) while the security analysis we have performed ensures Ouroboros’ resilience to randomness manipulation (see Ouroboros Praos and Ouroboros Genesis).

Network Assumptions

Ouroboros is analysed in the “partially synchronous” setting where messages are delivered to the majority of the parties executing the protocol within a time window upper bounded by a network delay Δ which is unknown to the parties. The order of messages is adversarial and it is not guaranteed that two honest parties will receive messages in the same order. The adversary is allowed to inject arbitrary messages selectively to any of the parties. Casper makes no explicit claims about the network setting it operates in, nevertheless, when describing defenses against long range revisions it alludes to a similar type of model.

Sharding

This property refers to the ability of a database or ledger consensus protocol to scale its processing power as more nodes (or processing capacity) enter the system, ideally with a linear speedup in the number of nodes added. Ouroboros Hydra, the scalable version of Ouroboros, is in development and will be released in due time following our usual mode of discourse, i.e., the release of a full paper containing complete mathematical formulations of the problem that we solve, a full description of our protocol solution, as well as concrete statements about the protocol’s properties that are accompanied by all necessary proofs. At present, the version of Casper that enables sharding (Casper+Sharding v2.1) is incomplete even in terms of protocol description, and as such, it cannot allow any proof of security.

Learn more about Ouroboros.

Team effort is a hallmark of IOHK research and this blog post is no exception. I am grateful to Christian Badertscher, Matthias Fitzi, Peter Gaži, Alexander Russell, Jeremy Wood, and Vassilis Zikas for various suggestions, comments, and corrections to the above text.


From free algebras to free monads

7 August 2018 Marcin Szamotulski 29 mins read


In universal algebra, freeness is a well-defined algebraic property. We will explore equational theories, which are tightly connected to free algebras, and consider free monoids. Then we'll explain how monads can be brought into the picture in the context of monoidal categories. This will lead to a precise definition of a free monad as a free monoid.

This post requires familiarity with some very basic Category Theory and does not assume any knowledge of universal algebra. Most mathematical notions will be introduced, but you might want to dig into the literature for some more examples; though most of the books are quite lengthy and not suited for non-mathematicians - you've been warned ;). Knowing this, I tried to bring in all the required definitions, together with some very basic examples. As you read you may want to read about semigroups, monoids, groups, $G$-sets, lattices, Boolean or Heyting algebras in Wikipedia articles, or try to find info on nCatLab (though this is a heavy resource, written mostly from a higher categorical point of view, so probably better suited for more experienced readers).

Preliminaries

We will need some preliminary definitions. Let's begin with the definition of an algebra. For a set $A$ we will denote by $A^n$ the $n$-fold cartesian product of $A$, i.e. $A^2=A\times A$.

Definition Algebra
An algebra $\underline{A}$ is a set $A$ together with a finite set of operations $f_i^{\underline{A}}:A^{j_i}\rightarrow A$ ($i=1,\ldots,n$), usually simply written as $\underline{A}=(A, (f_i^{\underline{A}})_{i=1,\ldots,n})$. For an operation $f_i^{\underline{A}}$ the natural number $j_i$ is called its arity. The set $A$ is usually called the universe of the algebra $\underline{A}$. The set of symbols $f_i$ is called the type of the algebra $\underline{A}$.

Examples include many classical algebraic structures, like semigroups, where there is only a single operation of arity 2, and monoids, which in addition have one operation of arity $0$ - the unit element of multiplication. Other sources of examples are Boolean algebras, with two 2-ary operations $\wedge$ and $\vee$, or more generally lattices and Heyting algebras. Also rings, modules, fields, vector spaces and countless other structures. Universal algebra has a very general theory describing common concepts, but also deals with very special cases of some more esoteric algebras.

Definition Homomorphism
A homomorphism between two algebras $\underline{A}=(A, (f_i^{\underline{A}})_{i=1,\ldots,n})$ and $\underline{B}=(B, (f_i^{\underline{B}})_{i=1,\ldots,n})$ (of the same type) is a map $h:A\rightarrow B$ with the property that for every $i$: $$ h(f^{\underline{A}}_i(a_1,\ldots, a_{j_i})) = f^{\underline{B}}_i(h(a_1),\ldots,h(a_{j_i}))$$

This means that homomorphisms preserve operations. For example, a homomorphism of monoids is a map that preserves the multiplication and the unit. For Boolean algebras, it means that a homomorphism preserves the $\vee$ (also called join) and $\wedge$ (usually called meet) operations, etc.
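
As a small Haskell illustration (a sketch; the name listLength is ours, not the post's): the length function, viewed as a map from the monoid of lists under concatenation to the monoid of integers under addition, is a monoid homomorphism, because it preserves the unit and the multiplication.

import Data.Monoid (Sum (..))

-- length, packaged as a map between the two monoids
listLength :: [a] -> Sum Int
listLength = Sum . length

-- the homomorphism property, spelled out (checked by hand, not by the compiler):
--   listLength []         == Sum 0                            -- preserves the unit
--   listLength (xs ++ ys) == listLength xs <> listLength ys   -- preserves multiplication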

It is an easy observation that homomorphisms are closed under composition, and since the identity map is always a homomorphism this leads to well-defined categories, e.g. the category of monoids, the category of Boolean algebras, the category of rings, ...

Free algebras and Equational theories

Definition Free Algebra
An algebra $\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\dots,n})$ is free in a class of algebras $\mathcal{C}$ over a subset $S\subset A$ if for every $\underline{B}=(B,(f_i^{\underline{B}})_{i=1,\dots,n})\in\mathcal{C}$ every map $S\rightarrow B$ uniquely extends to a homomorphism $\underline{A}\rightarrow \underline{B}$. The set $S$ is called the set of generators.

As you can see, the definition of a free algebra requires a context; this is interesting in its own right! There are free monoids in the class of all monoids and there are free commutative monoids in the class of commutative monoids (i.e. monoids in which $m\cdot n=n\cdot m$ for all elements $m,n$).

Many theories allow free algebras. Let's see some other examples:

  • The monoid of natural numbers $\mathbb{N}$ with addition and $0$ as its unit element is a free monoid generated by $\{1\}$. It is both free in the class of all monoids and in the class of commutative ones. The $n$-th cartesian product $\mathbb{N}^n$ is a free commutative monoid generated by the set $\{(1,0,\ldots,0),(0,1,0,\ldots,0),\ldots,(0,\ldots,0,1)\}$, but it's not a free monoid in the class of all monoids.

  • The additive group of integers $\mathbb{Z}$ is a free group with one generator, it is also free in the class of commutative groups. As in monoids: $\mathbb{Z}^n$ is a free commutative group with $n$ generators.

  • A free group with two generators can be pictured as the Cayley graph (which is a fractal) (note that its first quarter is the free monoid with two generators).

  • Every vector space is free, since every vector space admits a basis.

  • In the class of $G$-sets, the free $G$-set over a set of generators $X$ is $G\times X$ (a disjoint union of copies of $G$), with $G$ acting on the first coordinate.

  • In the class of rings, polynomial rings with integer coefficients, usually denoted by $\mathbb{Z}[X]$ (or $\mathbb{Z}[X_1,\dots,X_n]$ for polynomials in many variables), are free (you likely have learned quite a lot about them in school, you just haven't been told the really interesting part ;)). This example was the motivation for terms, their algebra and term functions, which we will discover next.

    This is also true for semirings. You might have used this fact when using the purescript validation library. A free semiring generated by a type `a` has type `[[a]]`; for example `[[()]]` is isomorphic to $\mathbb{N}[X]$, since (please excuse mixing Haskell and mathematical notation): $$[[()]]\simeq[\mathbb{N}]\simeq\mathbb{N}[X]$$

Free algebras play an essential role in the proof of the beautiful and outstanding Birkhoff theorem. It states that a class of algebras $\mathcal{C}$ is an equational theory if and only if the class is closed under cartesian products, homomorphic images and subalgebras. Equational theories are classes of algebras which satisfy a set of equations; examples include semigroups, monoids, groups, Boolean or Heyting algebras, but also commutative (abelian) semigroups / monoids / groups, and many other classical algebraic structures.

We need a little more precise language to speak about equational theories in the full generality of universal algebra, which we are now going to introduce.

Terms, term functions and their algebra

Definition Term

Let’s consider an algebra type $(f_i)_{i=1,\ldots,n}$. Then the set of terms on a set $X$ (set of variables) is inductively defined as:

  • each $x\in X$ is a term (of arity $0$)

  • each $f_i(x_1,\dots ,x_{j_i})$ is a term of arity $j_i$ for $x_1,\dots ,x_{j_i}\in X$

  • if $g_1,\dots g_n$ are terms of arities $j_1$ to $j_n$ respectively, and $g$ is a term of arity $n$ then $g(g_1(x_{11},\dots,x_{1j_1}),\dots, g_n(x_{n1},\dots,x_{nj_n}))$ is a term of arity $j_1+\dots+j_n$ with $x_{kl}\in X$.

We will denote the set of terms on $X$ by $\mathsf{T}^{(f_i)_{i=1,\dots,n}}(X)$ or simply $\mathsf{T}(X)$.

For example in groups: $x^{-1}\cdot x$, $x\cdot y$ and $1$ (the unit of the group) are terms. Terms are just abstract expressions that one can build using the algebraic operations supported by the algebra type. Each term $t$ defines a term function on every algebra of the given type. In groups the following terms are distinct but they define equal term functions: $x^{-1}\cdot x$ and $1$; similarly the two (distinct) terms $(x\cdot y)\cdot z$ and $x\cdot (y\cdot z)$ define equal term functions. The two terms $x\cdot y$ and $y\cdot x$ define distinct term functions (on non-commutative groups or monoids). Another example comes from Boolean algebras (or more broadly lattice theory), where the two terms $x\wedge (y\vee z)$ and $(x\wedge y)\vee(x\wedge z)$ define equal term functions on Boolean algebras (or more generally distributive lattices). If $t$ is a term then we denote the associated term function on an algebra $\underline{A}$ by $\tilde{t}^{\underline{A}}$. Term functions are the natural way to express equalities within a theory. Now we are ready to formally define equational classes of algebras.

Definition Equational Theory
A class of algebras $\mathcal{C}$ is an equational theory if and only if there exists a set of pairs of terms $\mathbf{E}\subset\mathsf{T}(X)^2$ such that the class consists exactly of algebras $\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\dots,n})$ for which the following condition is satisfied: for each pair of terms $(t, s)\in \mathbf{E}$ two corresponding term functions $\tilde{t}^{\underline{A}}$ and $\tilde{s}^{\underline{A}}$ are equal.

For example the class of monoids is an equational theory for $$\mathbf{E}=\bigl\{(1\cdot x,\, x),\; (x\cdot 1,\, x),\; \bigl((x\cdot y)\cdot z,\, x\cdot (y\cdot z)\bigr)\bigr\}$$ i.e. all the algebras with two operations: one of arity 0 (the unit) and one of arity 2 (the multiplication), such that the $1$ is the unit for multiplication $\cdot $ and multiplication is associative. The class of commutative monoids is also an equational theory with one additional equation $(x\cdot y,\, y\cdot x)$. Groups, Boolean or Heyting algebras, lattices are also equational theories.

Coming back to free algebras: it turns out that the set of terms $\mathsf{T}^{(f_i)}(X)$ on a given set of variables $X$ has an algebra structure of type $(f_i)_{i=1,\dots,n}$: it is given by the inductive step in the definition of terms: if $t_1,\dots,t_{j_i}\in \mathsf{T}^{(f_i)}(X)$ then $$ f_i^{\underline{\mathsf{T}^{(f_i)}(X)}}(t_1,\ldots,t_{j_i}) := f_i(t_1,\ldots,t_{j_i})\in \mathsf{T}(X) $$ Furthermore $\underline{\mathsf{T}^{(f_i)}(X)}$ is a free algebra over $X$ in the class of all algebras of the given type $(f_i)_{i=1,\dots,n}$. The extension of a map $h:X\rightarrow A$ to an algebra $\underline{A}=(A,(f_i^{\underline{A}})_{i=1,\ldots,n})$ can be built inductively following the definition of terms and using the homomorphism property: $$ h(f_i(t_1,\ldots,t_{j_i})) := f_i^{\underline{A}}(h(t_1),\ldots,h(t_{j_i})) $$ The map $h$ is indeed a homomorphism: $$ \begin{array}{ll} h\bigl(f_i^{\underline{\mathsf{T}(X)}}(t_1,\ldots,t_{j_i})\bigr) & = h(f_i(t_1,\ldots, t_{j_i})) \\\\ & = f_i^{\underline{A}}(h(t_1),\ldots, h(t_{j_i})) \\\\ \end{array} $$ Note that the class of algebras of the same type is usually very broad, but this is a first approximation to building free algebras in an equational theory. This is just the equational theory for the empty set of equations $\mathbf{E}$.

Let’s see this in an example: consider algebras of the same type as a monoid, with one nullary operation (the unit $1$, or `mempty` if you like) and one 2-ary operation (multiplication / `mappend`). Let $X$ be a set of variables. Then $1$ is a valid term; if $t_1$ and $t_2$ are terms on $X$ then $t_1\cdot t_2$ is a term, but also $t_1\cdot 1$ and $1\cdot t_2$ are valid and distinct terms. $\mathsf{T}(X)$ resembles a monoid but it isn't one: multiplication of terms is not associative, and the unit law fails since $t\cdot 1\neq t\neq 1\cdot t$ as terms. We still need a way to enforce the laws. But note that if you have a map $f:X\rightarrow M$ to a monoid $M$ which you'd like to extend to a homomorphism $\mathsf{T}(X)\rightarrow M$ that preserves $1$ (which is not a unit, yet) and multiplication (even though it is not associative), you don’t have much choice: $t_1\cdot t_2$ must be mapped to $f(t_1)\cdot f(t_2)\in M$.
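
To make this concrete, here is a minimal Haskell sketch (the names MTerm and extend are hypothetical, not from the post) of the term algebra for the monoid signature, together with the forced extension of a map on variables into any monoid, exactly as described above:

-- Hypothetical names: the term algebra T(X) for the monoid signature,
-- i.e. terms before quotienting by the monoid laws.
data MTerm x
  = Var x                    -- a variable from X
  | One                      -- the nullary operation symbol 1
  | MTerm x :*: MTerm x      -- the formal (not yet associative) multiplication

-- A map on variables extends in exactly one way to a map on terms that
-- sends 1 to mempty and the formal multiplication to the monoid's one.
extend :: Monoid m => (x -> m) -> MTerm x -> m
extend f (Var x)   = f x
extend _ One       = mempty
extend f (s :*: t) = extend f s <> extend f t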

We need a tool to enforce term equations. For that one can use

Definition Congruence relation
Let $\underline{A}=(A,(f^A_i)_{i=1,\dots,n})$ be an algebra and let $\sim$ be an equivalence relation on $A$, i.e. a subset of $A\times A$ which is:
  • reflexive: for each $a\in A$: $a\sim a$
  • symmetric: for each $a,b\in A$: if $a\sim b$ then $b\sim a$
  • transitive: for each $a,b,c\in A$: if $a\sim b$ and $b\sim c$ then $a\sim c$
An equivalence relation is a congruence relation if for all operations $f_i$ and any $a_1,\dots,a_{j_i}\in A$ and $b_1,\dots,b_{j_i}\in A$ the following implication holds: $$ a_1\sim b_1,\dots,a_{j_i}\sim b_{j_i}\Rightarrow f_i^{\underline{A}}(a_1,\dots,a_{j_i})\sim f_i^{\underline{A}}(b_1,\dots,b_{j_i}) $$
If you have an equivalence relation $\sim$ on a set $A$ then you can always construct the quotient set $A/\sim$. The equivalence class of $a\in A$ is the set $[a]:=\{x\in A:\; x\sim a\}$, and $A/\sim$ is just the set of equivalence classes. However, if you have a congruence then the quotient $A/\sim$ carries an algebra structure which turns the quotient map $A\rightarrow A/\sim$ into a homomorphism.

Equivalence relations and congruences form complete lattices (partial orders which have all suprema and infima, including infinite ones). If you have two equivalence relations (congruences) then their intersection (as subsets of $A^2$) is an equivalence relation (congruence).

The set of equations that defines the class of monoids generates a congruence relation on the term algebra $\underline{\mathsf{T}^{(f_i)}(X)}$ (i.e. an equivalence relation which is compatible with the operations: if $x_1\sim y_1$ and $x_2\sim y_2$ then $(x_1\cdot x_2) \sim (y_1\cdot y_2)$). One can define it as the smallest congruence relation which contains the set $\mathbf{E}$. An equivalence relation on a set $A$ is just a subset of the cartesian product $A\times A$ (which satisfies certain axioms), so it all fits together! One can describe this congruence more precisely, but we'll be happy with the fact that it exists. To show that, first one needs to observe that an intersection of congruences is a congruence; then the smallest congruence containing the set $\mathbf{E}$ is the intersection of all congruences that contain $\mathbf{E}$. This family is non-empty since the set $A\times A$ is itself a congruence relation.

The key point now is that if we take the term algebra and take a quotient by the smallest congruence that contains all the pairs of terms which belong to the set $\mathbf{E}$ we will obtain a free algebra in the equational class defined by $\mathbf{E}$. We will leave the proof to a curious reader.

Free monoids

Let’s take a look at the free monoid that we can build this way. First let us consider the free algebra $\underline{\mathsf{T}(X)}$ for algebras of the same type as monoids (which includes non-associative 'monoids' whose unit need not behave like a unit), and let $\sim$ be the smallest congruence that forces $\mathsf{T}(X)/\sim$ to be a monoid.

Since monoids are associative, every element in $\underline{\mathsf{T}(X)}/\sim$ can be represented as $x_1\cdot( x_2\cdot (x_3\cdot\ldots \cdot x_n))$ (where we group brackets to the right). Multiplication of $x_1\cdot( x_2\cdot (x_3\cdot\ldots \cdot x_n))$ and $y_1\cdot( y_2\cdot (y_3\cdot\ldots \cdot y_m))$ is just $x_1\cdot (x_2\cdot (\ldots\cdot(x_n\cdot (y_1\cdot (y_2\cdot (\ldots\cdot y_m))))\ldots))$. In Haskell, if you represent the set $X$ as a type $a$ then the free monoid is just the list type $[a]$, with list concatenation as multiplication and the empty list as the unit element. Just think of

-- A set with `n` elements corresponds
-- to a type with `n` constructors:
data X = X_1|⋯|X_n
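
Continuing the hypothetical MTerm sketch from the term-algebra section above, the quotient map from the term algebra to this free monoid can be realised as a normalisation function into lists:

-- Reusing the hypothetical MTerm type from the sketch above: the quotient
-- map from the term algebra to the free monoid, realised as normalisation
-- into a list of variables.
normalize :: MTerm x -> [x]
normalize (Var x)   = [x]
normalize One       = []
normalize (s :*: t) = normalize s ++ normalize t

-- Two terms are identified by the congruence generated by the monoid laws
-- exactly when they normalise to the same list, e.g.
--   normalize ((Var 'a' :*: One) :*: Var 'b')  == "ab"
--   normalize (Var 'a' :*: (Var 'b' :*: One))  == "ab"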

Free Monads

It turns out that monads in $\mathcal{Hask}$ are also an equational theory. It's just that the terms are higher-kinded: of kind $*\rightarrow*$ rather than $*$ as in monoids. The same construction of a free algebra works in the land of monads, but we need to look at them from another perspective. Let us first take a mathematical point of view on monads.

Definition Monad
A monad is an (endo) functor `m` with two natural transformations:
class Monad m where
  return :: a -> m a
  join   :: m (m a) -> m a

which is unital and associative, i.e. the following laws hold:

-- | associativity
join . join == join . fmap join
-- | unitality
join . return == id == join . fmap return

These axioms are easier to understand as commutative diagrams; the associativity and unitality diagrams are omitted here.

It is a basic lemma that this definition of a monad is equivalent to what we are used to in Haskell:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

Having `join` one defines `>>=` as

ma >>= f = join $ f <$> ma

and the other way, having `>>=` then

join = (>>= id)

Not only are these two constructions inverse to each other, they also translate the monad laws correctly.
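
As a small sanity check (a sketch, not part of the original post), here is how one of the standard laws for `>>=` follows from the join-based unitality law under the translation above:

-- Left unit law for (>>=), derived from the join laws:
--
--   return a >>= f
-- == join (f <$> return a)    -- definition of (>>=) in terms of join
-- == join (return (f a))      -- naturality of return: fmap g . return == return . g
-- == f a                      -- unitality: join . return == id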

Monoids in monoidal categories

To define a monoid $M$ in the category $\mathcal{Set}$ (of sets) one needs the product $M\times M$. Abstraction of this structure leads to monoidal categories.

Definition Monoidal Category
A category $\mathcal{C}$ with a bifunctor $-\otimes-:\mathcal{C}\times\mathcal{C}\rightarrow\mathcal{C}$ is called a strict monoidal category if $\otimes$ is associative and unital, i.e. for all $a,b,c\in\mathcal{C}$: $(a\otimes b)\otimes c = a\otimes (b\otimes c)$, and there exists a unit object $1$ such that $1\otimes a=a=a\otimes 1$.

Most examples of monoidal categories are not strict but are associative and unital up to a natural transformation. Think of $(A\times B)\times C\simeq A\times(B\times C)$ in $\mathcal{Set}$ (or any category with (finite) products, like $\mathcal{Hask}$). Let me just stress that since $\otimes$ is a bifunctor, for any two maps $f:\;a_1\rightarrow b_1$ and $g:\;a_2\rightarrow b_2$ we have a map $f\otimes g: a_1\otimes a_2\rightarrow b_1\otimes b_2$, and moreover it behaves nicely with respect to composition: $(f_1\otimes g_1) \cdot (f_2\otimes g_2) = (f_1\cdot f_2)\otimes(g_1\cdot g_2)$ for composable pairs of arrows $f_1,\;f_2$ and $g_1,\;g_2$.

Now we can generalise the definition of a monoid to such categories:

Definition Monoid in a Monoidal Category
A monoid in a monoidal category $\mathcal{C}$ with monoidal product $-\otimes-$ and a unit $1$ is an object $m$ with a pair of morphisms $$ \mathrm{mappend}:\;m\otimes m\rightarrow m\quad\mathrm{mempty}:\;1\rightarrow m $$ such that the corresponding associativity and unitality diagrams commute (diagrams omitted here).

The main point of this section is that these diagrams have exactly the same shape as associativity and unitality for monads. Indeed, a monoid in the category of endofunctors, with functor composition as the monoidal product $\otimes$ and the identity functor as unit, is a monad. In category theory this category is strict monoidal; if you try to encode this in Haskell you will end up with a non-strict monoidal structure, where you will need to prove the pentagon equation.
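
To see the "monoid in the category of endofunctors" slogan concretely, here is a minimal sketch (the names unitNT and multNT are ours; Compose and Identity come from base): the two natural transformations of a monad are exactly mempty and mappend for the monoidal product given by functor composition.

import Control.Monad (join)
import Data.Functor.Compose  (Compose (..))
import Data.Functor.Identity (Identity (..))

-- mempty for the monoid object m: the unit object (Identity) maps into m
unitNT :: Monad m => Identity a -> m a
unitNT = return . runIdentity

-- mappend for the monoid object m: m composed with itself maps into m
multNT :: Monad m => Compose m m a -> m a
multNT = join . getCompose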

These considerations suggest that we should be able to build a free monad using our algebraic approach to free algebras, and this is what we will do in the next section.

Free monads in $\mathcal{Hask}$

Firstly, what should replace the set of generators $X$ in $\mathsf{T}(X)/\sim$? We generalised from the category of sets $\mathcal{Set}$ to a monoidal category $(\mathcal{C},\otimes, 1)$: it's clear that we should just pick an object of the category $\mathcal{C}$. Now, since our category is the category of (endo)functors of $\mathcal{Hask}$, the set of generators is just a functor. So let's pick a functor `f`.

To get a free monad we need to decipher $\mathsf{T}(f)/\sim$ in the context of a monoid in the monoidal category of endofunctors. Note that here $\mathsf{T}(f)$ and $\mathsf{T}(f)/\sim$ are functors! To simplify the notation, let $\mathsf{Free}(f):=\mathsf{T}(f)/\sim$. So what is a term in this setting? It should be an expression of the Haskell type: $$ \begin{equation} \begin{array}{c} \bigl(\mathsf{Free}(f)\otimes\mathsf{Free}(f)\otimes\ldots\otimes \mathsf{Free}(f)\bigr)(a) \\\\ \quad\quad = \mathsf{Free}(f)\bigl(\mathsf{Free}(f)\bigl(\ldots \mathsf{Free}(f)(a)\ldots\bigr)\bigr) \end{array} \end{equation} $$ In our setup the monoidal product $-\otimes-$ is just functor composition, thus $\mathsf{Free}(f)(a)$ must be a type whose (Haskell) terms are of the Haskell types:

a, f a, f (f a), f (f (f a)), ...

The monadic `join` will take something of type $\mathsf{Free}(f)\;(\mathsf{Free}(f)\;(a))$, e.g. $f^n(b)=f\,(f\,(\dots f\,(b)\dots))$ (abusing the notation $f^n$) where $b$ has type $f^m(a)=f\,(f\,(\dots f\,(a)\dots))$, and return something of type $\mathsf{Free}(f)(a)$; it should be quite clear how to do that: just take the obvious element of type $f^{n+m}(a)$. Altogether, this has the right shape for a monad, so let us translate this into a concrete Haskell type:

data Free f a
  = Return a
  -- ^ the terms of type a
  | Free (f (Free f a))
  -- ^ recursive definition which embraces
  -- `f a`, `f (f a)` and so on

instance Functor f => Functor (Free f) where
  fmap f (Return a) = Return (f a)
  fmap f (Free  ff) = Free (fmap (fmap f) ff)

`Free f` is just a tree shaped by the functor `f`. This type indeed embraces all the terms of types: `a, f a, f (f a), ...` into a single type. Now the monad instance:

instance Functor f => Monad (Free f) where
  return = Return
  join (Return ma) = ma
  -- ^ stitch a tree of trees into a tree
  join (Free fma)  = Free $ join <$> fma
  -- ^ recurse to the leaves

As you can see, join takes a tree of trees and outputs a bigger tree: the Return case simply unwraps the inner tree, while the Free case recurses towards the leaves.

Before formulating the next result let's describe morphisms between monads. Let `m` and `n` be two monads; then a natural transformation `f :: forall a. m a -> n a` is a homomorphism of monads iff the following two conditions are satisfied:

f . return == return
f . join   == join . f . fmap f

Note that these two conditions are satisfied iff f is a monoid homomorphism in the category of (endo)functors of $\mathcal{Hask}$.

Proposition
Let `f` be a functor. Then there exists a map:
foldFree :: (Functor f, Monad m) => (forall x. f x -> m x) -> (Free f a -> m a)

which restricts to an isomorphism between natural transformations on the left hand side and monad homomorphisms on the right hand side, and thus Free f is rightly called a free monad.

Proof
Let's start with the definition of `foldFree`:
foldFree :: (Functor f, Monad m) => (forall x. f x -> m x) -> (Free f a -> m a)
foldFree _ (Return a) = return a
foldFree f (Free ff)  = join $ f $ foldFree f <$> ff

Its inverse is:

liftF :: Functor f => (forall x. Free f x -> m x) -> (f a -> m a)
liftF f fa = f $ Free $ Return <$> fa

First let's check that foldFree f is a morphism of monads:

foldFree f (Return a)
-- | by definition of (foldFree f)
= return a
  
foldFree f (join (Return a))
= foldFree f a
-- | by monad unitality axiom
= join $ return $ foldFree f $ a
-- | by definition of (foldFree f)
= join $ foldFree f (Return $ foldFree f a)
-- | by definition of functor instance of (Free f)
= join $ foldFree f $ fmap (foldFree f) $ Return a

foldFree f (join (Free ff))
-- | by definition of join for (Free f)
= foldFree f (Free $ fmap join $ ff)
-- | by definition of foldFree
= join $ f $ fmap (foldFree f) $ fmap join $ ff
= join $ f $ fmap (foldFree f . join) $ ff
-- | by induction hypothesis
= join $ f $ fmap (join . foldFree f . fmap (foldFree f)) $ ff
= join $ f $ fmap join $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | f is natural transformation
= join $ fmap join $ f $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | monad associativity
= join $ join $ f $ fmap (foldFree f)
$ fmap (fmap (foldFree f)) $ ff
-- | by definition of (foldFree f)
= join $ foldFree f $ Free
$ fmap (fmap (foldFree f)) $ ff
-- | by functor instance of (Free f)
= join $ foldFree f $ fmap (foldFree f) $ Free ff

And we have

foldFree . liftF :: (forall x. Free f x -> m x) -> (Free f a -> m a)
(foldFree . liftF $ f) (Return x)
-- ^ where f is a morphism of monads
= foldFree (liftF f) (Return x)
= return x
= f (Return x) -- since f is assumed to be a morphism of monads

(foldFree . liftF $ f) (Free ff)
-- ^ where f is a morphism of monads
= foldFree (liftF f) (Free ff)
= join $ liftF f $ fmap (foldFree (liftF f)) $ ff
-- | by induction hypothesis
= join $ liftF f $ fmap f $ ff
-- | by definition of (liftF f)
= join $ f $ Free $ fmap Return $ fmap f $ ff
-- | by functor instance of (Free f)
= join $ f $ fmap f $ Free (fmap Return ff)
-- | since f is a morphism of monads
= f $ join $ Free (fmap Return ff)
= f $ Free ff

liftF . foldFree :: (forall x. f x -> m x) -> (f a -> m a)
(liftF . foldFree $ f) fa
-- ^ where f is a natural transformation
= liftF (foldFree f) $ fa
-- | by definition of liftF
= (foldFree f) $ Free $ fmap Return $ fa
-- | by definition of (foldFree f)
= join $ f $ fmap (foldFree f)  $ fmap Return $ fa
= join $ f $ fmap (foldFree f . Return) $ fa
-- | by definition of (foldFree f)
= join $ f $ fmap return  $ fa
-- | since f is a natural transformation
= join $ fmap return $ f fa
-- | by monad unitality axiom 
= f fa

foldFree corresponds to foldMap, which is defined in a very similar way:

foldMap :: Monoid m => (a -> m) -> [a] -> m
foldMap _ [] = mempty
foldMap f (a : as) = mappend (f a) (foldMap f as)

Note that foldMap is an isomorphism onto monoid homomorphisms with an inverse

g :: Monoid m => ([a] -> m) -> a -> m
g f a = f [a]

Furthermore, if we had polymorphic functions over monoidal categories in our type system, `foldMap` and `foldFree` would be specialisations of the same function!

Some examples of free monads

Let us study some simple examples of free monads
  • First let us consider the constant functor:
data Const a b = Const a

Then Free (Const a) is isomorphic to Either a

toEither :: Free (Const a) b -> Either a b
toEither (Return b) = Right b
toEither (Free (Const a)) = Left a
    
fromEither :: Either a b -> Free (Const a) b
fromEither (Right b) = Return b
fromEither (Left a)  = Free (Const a)

Since Either () is isomorphic to Maybe, Maybe is also a free monad.

  • `Free Identity` is isomorphic to `Writer Nat`, where

    data Nat = Zero | Succ Nat

    newtype Writer m a = Writer { runWriter :: (m, a) }
      deriving Functor

    toFree1 :: Free Identity a -> Writer Nat a
    toFree1 (Return a)           = Writer (Zero, a)
    toFree1 (Free (Identity fa)) = case toFree1 fa of
      Writer (n, a) -> Writer (Succ n, a)

    fromFree1 :: Writer Nat a -> Free Identity a
    fromFree1 (Writer (Zero,   a))
      = Return a
    fromFree1 (Writer (Succ n, a))
      = Free (Identity (fromFree1 (Writer (n, a))))

    Note that Nat is the free monoid with one generator (Nat $\simeq$ [()]) in the category $\mathcal{Hask}$, and so is Free Identity, but in the monoidal category of endofunctors of $\mathcal{Hask}$!

  • If you take a functor with two constructors

    data F2 a = FX a | FY a
      deriving Functor

    then we have

    data S2 = SX | SY

    toFree2 :: Free F2 a -> Writer [S2] a
    toFree2 (Return a)     = Writer ([], a)
    toFree2 (Free (FX fa)) = case toFree2 fa of
      Writer (m, a) -> Writer (SX : m, a)
    toFree2 (Free (FY fa)) = case toFree2 fa of
      Writer (m, a) -> Writer (SY : m, a)

    fromFree2 :: Writer [S2] a -> Free F2 a
    fromFree2 (Writer ([], a))
      = Return a
    fromFree2 (Writer (SX : xs, a))
      = Free (FX (fromFree2 $ Writer (xs, a)))
    fromFree2 (Writer (SY : xs, a))
      = Free (FY (fromFree2 $ Writer (xs, a)))

    toFree2 and fromFree2 are isomorphisms. I think you see the pattern: if you take a functor with $n$ constructors you will end up with a writer monad over a free monoid with $n$ generators. You might ask whether all monads are then free. The answer is no: take a non-free monoid m; then the monad Writer m is a non-free monad. You can prove yourself that the writer monad Writer m is free if and only if the monoid m is a free monoid in $\mathcal{Hask}$.

Final remarks

I hope I convinced you that monads are algebraic constructs and that you'll find the universal algebra approach useful. In many cases we are dealing with algebraic structures which we require to satisfy certain equations. Very often they fit into equational theories, which have a very clear description and which admit free objects. Freeness is the property that lets one easily interpret the free object in any other object of the same type. In the monad setting, free monads are really useful when writing DSLs, since you will be able to interpret them in any monad, like IO or some pure monad.
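
To illustrate this last point, here is a minimal sketch of a tiny DSL built on the Free and foldFree defined above. The names Log, program, runLogIO and logLines are hypothetical, not from this post, and the pure interpretation assumes the Monoid w => Monad ((,) w) instance from base together with the standard join from Control.Monad.

{-# LANGUAGE DeriveFunctor #-}

-- the generating functor: a single command that emits a message and continues
data Log k = Log String k
  deriving Functor

-- a tiny program, written directly with the constructors of Free
program :: Free Log ()
program = Free (Log "start" (Free (Log "done" (Return ()))))

-- interpret into IO: every Log command becomes a putStrLn
runLogIO :: Free Log a -> IO a
runLogIO = foldFree (\(Log msg k) -> putStrLn msg >> pure k)

-- interpret purely: collect the messages in the writer-like pair monad
logLines :: Free Log a -> ([String], a)
logLines = foldFree (\(Log msg k) -> ([msg], k))

Running runLogIO program prints the two messages, while logLines program returns (["start","done"], ()) - the same program interpreted in two different monads.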



Note: This post originally appeared here.


Cardano smart contracts testnet IELE launches

Developers can run programs with increased confidence

30 July 2018 Gerard Moroney 4 mins read


Today we launch the second Cardano testnet, which is for the IELE virtual machine (VM) and follows our recent launch of the KEVM testnet. The technology is an important step not only on the Cardano roadmap, but also for the industry – in offering robust and reliable financial infrastructure. Developers now have the opportunity to explore the smart contracts technology that will be offered as part of Cardano, and to give us their feedback, which we look forward to receiving over the coming months.

Why smart contracts?

In many business processes that involve the exchange of value (including money, property or shares), intermediaries are involved in checking that the terms of the agreements are complete and unambiguous as well as satisfied before the exchange can take place. These intermediaries add to the cost of a transaction. The technology of smart contracts (also known as self-executing contracts or blockchain contracts) has emerged as a way of addressing the need for this verification by reducing the time, third-party involvement, and cost of reliably executing an agreement.

Smart contracts are software programs that are immutably stored on the blockchain. They are executed by virtual machines and store their data in that same immutable infrastructure. Smart contracts offer great benefits to businesses looking to optimize their operations. Many industries – including automotive, supply chain, real estate and healthcare – are investing in research to understand how this technology can make them more competitive.

What smart contracts technology is currently available?

There are a few players on the market that offer smart contract capabilities, including Hyperledger, NEO and Ethereum. The technology is evolving to meet the market’s demand for platforms that are fast, secure, accurate and can be trusted. Many businesses have tried to deploy broad-scale applications on these platforms and have run into problems (the DAO hack, Parity bug and POWH coin to name a few) with these evolving platforms. Despite widespread publicity, the most serious bugs continue to reappear in smart contracts. There is a lot of room for innovation here and IOHK is working hard to become a leader in this technology.

What is IELE?

IELE (pronounced YELL-eh) is a virtual machine, with an attendant low-level language, designed to execute smart contracts on the Cardano blockchain. It has been developed by Runtime Verification in partnership with IOHK, which provided funding for the project. The word IELE refers to nymphs in Romanian mythology.

How does IELE improve on smart contracts platforms?

IELE is designed to meet the evolving needs of the market for smart contracts by:

• Serving as a uniform, lower-level platform for translating and executing smart contracts from higher-level languages. It supports compilation from Solidity, and many more languages are set to come.

• Providing a uniform gas model across all languages.

• Making it easier to write secure smart contracts. IELE is 'correct by construction', so many errors discovered after the fact (during code execution) in other VMs are not possible in IELE.

• Using a register-based as opposed to stack-based architecture.

What can I do with IELE that I could not do before?

IELE contains two parts: a correct-by-construction VM designed using the K framework, and a correct-by-construction Solidity-to-IELE compiler, also designed using the K framework. When you write your Solidity program and try to compile it using the Solidity-to-IELE compiler, it will catch many of the errors that previously would have been missed and that have caused many smart contracts to fail or be exploited.

In addition, as IELE development progresses, we plan to deliver 'surface languages' allowing programmers proficient in Javascript, Python and other languages to have an easy way to integrate smart contracts into their applications.

What do I do next?

The IELE language and its VM are complete. It is now in the process of being integrated into Cardano, which will provide a blockchain to store and retrieve data. While the integration is taking place, developers have the opportunity to use the IELE VM along with the Mallet and Remix tools to create and execute smart contracts on the IOHK testnet site.

You can also start getting a feel for the capabilities of both IELE and its VM – and even learn to write IELE code directly!

Join the conversation!

     



Cardano in focus at major international event

Smart contracts tech wins attention as computer science innovation

11 July 2018 Jeremy Wood 5 mins read


As a third-generation blockchain, Cardano incorporates state-of-the-art technology that attracts the interest of computer scientists on the worldwide stage. In the past year, papers describing the consensus algorithm of Cardano have been presented at the leading cryptography conferences, and this month it was the turn of its smart contracts technology to be in the spotlight. Grigore Rosu, a professor in computer science at the University of Illinois at Urbana-Champaign, and his startup Runtime Verification have been working with IOHK since June 2017 to develop new technology based on formal semantics for Cardano, including a new virtual machine. He and his colleague Everett Hildenbrandt came to the UK last week to give presentations at the seventh Federated Logic Conference (FLoC), which this year is in the city of Oxford and runs from July 6-19 with about 2000 attendees. This major conference is held about every four years in various locations around the world, and under its umbrella brings together nine major international computer science conferences. These cover topics such as formal methods, logic and programming languages. Prominent figures from these worlds come to take part, and keynote speeches this year are from Shafi Goldwasser and Silvio Micali, the cryptographers and Turing Award winners, and mathematician Georges Gonthier.

Grigore Rosu

On Saturday, Grigore had the distinction of giving his first FLoC invited talk, at the "Logical Frameworks and Meta-Languages: Theory and Practice" workshop, and his talk was about the K framework. It was a technical presentation, going into detail about the logical formalism underlying K, and matching logic, a first-order logic variant for specifying and reasoning about structure by means of patterns and pattern matching.

This technology, created by Grigore and his start-up Runtime Verification, has been developed over the past 15 years and is used in mission-critical software that cannot afford to fail. To this end, Runtime Verification has worked with companies including NASA, Boeing and Toyota, and many others. His collaboration with IOHK began after he was contacted by CEO Charles Hoskinson, who had spotted that the software vulnerabilities that had resulted in a number of hacks on blockchains, and the draining of hundreds of millions of dollars, could have been prevented using the formal methods techniques developed by Grigore and his team.

The K framework was used to formally model the semantics of the Ethereum Virtual Machine, and the knowledge gained from this process was employed to design IELE, the virtual machine for Cardano that will be released in a test format in a few weeks' time. This is the first time this technology has been deployed within the blockchain industry. Importantly, K is a means to formally verify the code of smart contracts, so they can be automatically checked for the types of flaws that have led to catastrophic financial loss, and must be avoided at all costs.

Grigore said: "We designed IELE from scratch as a formal specification using K and generated the VM automatically from the specification. Therefore, there is nothing to prove about the VM in terms of correctness, because it is correct-by-construction." He added: "We retrospectively analysed the EVM specification which we defined previously, and looked at all our attempts to verify smart contracts with it and then stepped back to think how should have a virtual machine been designed to overcome all those problems. We came up with IELE. This is an LLVM-like VM for the blockchain. For me as the designer of K, this is a hugely important milestone, and is the first time a real language has been defined in K and its implementation automatically generated."

On Wednesday afternoon, Grigore will give a second invited talk at FLoC, in the International Conference on Formal Structures for Computation and Deduction (FSCD), about the importance of formal analysis and verification for blockchain and virtual machines. The presentation will be a little less technical than his first talk, and will cover Cardano, and how the tools developed by Runtime Verification allowed the automatic generation of a correct-by-construction virtual machine, IELE, from a formal specification.

Runtime Verification and IOHK

And on Tuesday at the 31st IEEE Computer Security Foundations Symposium, Everett will present on how he and the team developed KEVM. Everett said: "KEVM is a formal specification of the Ethereum Virtual Machine (EVM) in K, which generates not only a VM but also a symbolic execution engine as well as a deductive program verifier for Ethereum smart contracts. There was a big need for such a complete formal EVM specification, because the previous semantics were either too informal or incomplete. Without a formal semantics of EVM the problem of verifying smart contract is meaningless."

With three talks in total at FLoC in Oxford, it's great to see this technology recognised at such a major international event. Security is of the utmost importance in designing and producing blockchains and we at IOHK put formal methods at the heart of our design approach. K has already been used to formalise C, Java, JavaScript, PHP, Python, Rust, and using these techniques we also plan to make IELE open to developers of many languages, not just Solidity. We will also offer our own languages, Plutus and Marlowe, which are currently in development with computer scientists including Philip Wadler and Simon Thompson.

Readers who would like to experiment with the KEVM testnet can do so through our Cardano testnets website. We are set to release the IELE testnet on this same site in just a few weeks from now. Stay tuned for more updates.


Self Organisation in Coin Selection

3 July 2018 Edsko de Vries 18 mins read


The term "self organisation" refers to the emergence of complex behaviour (typically in biological systems) from simple rules and random fluctuations. In this blog post we will see how we can take advantage of self organisation to design a simple yet effective coin selection algorithm. Coin selection is the process of selecting unspent outputs in a user's wallet to satisfy a particular payment request (for a recap of UTxO style accounting, see section "Background: UTxO-style Accounting" of my previous blog post). As Jameson Lopp points out in his blog post The Challenges of Optimizing Unspent Output Selection, coin selection is a thorny problem. Moreover, there is a surprising lack of academic publications on the topic; indeed, the only scientific study of coin selection appears to be Mark Erhardt's master's thesis An Evaluation of Coin Selection Strategies.

Note: by a slight abuse of nomenclature, throughout this blog post we will refer to a user's set of unspent outputs as that user's UTxO.

Dust

An obvious strategy that many coin selection algorithms use in some form or other is "try to get as close to the requested value as possible". The problem with such an approach is that it tends to create a lot of dust: small unspent outputs that remain unused in the user's wallet because they're not particularly useful. For example, consider the "largest first" algorithm: a simple algorithm which considers all unspent outputs of the wallet in order of size, adding them to a running total until it has covered the requested amount. Here's an animation of the effect of this algorithm:

Figure 1. Simulation of largest-first coin selection. Main histogram shows UTxO entries; inset graph shows UTxO balance in blue and UTxO size in red, histogram top-right shows number of inputs per transaction, graph bottom right shows the change:payment ratio (more on that below). Graph at the bottom shows the distribution of deposits (blue, left axis) versus payments (red, right axis). In this case, both are normally distributed with a mean of 1000 and 3000 respectively, and we have a deposit:payment ratio of 3:1; modelling a situation where we have frequent smaller deposits, and less frequent but larger payments (withdrawals). The wallet starts with an initial balance of 1M.

    There are various things to see in this animation, but for now we want to focus on the UTxO histogram and its size. Note that as time passes, the size of the UTxO increases and increases, up to about 60k entries after about 1M cycles (with 3 deposits and 1 payment per cycle). A wallet with 60k entries is huge, and looking at the UTxO histogram we can see why this happens: virtually all of these entries are dust. We get more and more small outputs, and those small outputs are getting smaller and smaller.
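
    To make the strategy concrete, here is a minimal sketch of largest-first coin selection in Haskell. The Coin type and the function name are ours, chosen purely for illustration; the wallet's real implementation also has to deal with fees and with outputs carrying addresses.

        -- Illustrative sketch only; not the wallet's production code.
        import Data.List (sortBy)
        import Data.Ord (Down (..), comparing)

        type Coin = Integer

        -- Pick unspent outputs in decreasing order of value until the requested
        -- amount is covered; return the selected outputs and the change, or
        -- Nothing if the wallet balance is insufficient.
        selectLargestFirst :: [Coin] -> Coin -> Maybe ([Coin], Coin)
        selectLargestFirst utxo payment = go (sortBy (comparing Down) utxo) [] 0
          where
            go _ selected total
              | total >= payment = Just (selected, total - payment)
            go []     _        _     = Nothing
            go (o:os) selected total = go os (o : selected) (total + o)

        main :: IO ()
        main = print (selectLargestFirst [3000, 500, 200, 100] 3200)

    Because outputs are considered strictly in order of decreasing size, small outputs are rarely reached and tend to remain unused; this is the dust build-up visible in Figure 1.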

    Cleaning up

    Erhardt makes the following very astute observation:

    If 90% of the UTxO is dust, then if we pick an unspent output randomly, we have a 90% chance of picking a dust output.

    He concludes that this means that a coin selection algorithm that simply picks unspent outputs at random might be pretty effective; in particular, effective at collecting dust. Indeed, it is. Consider the following simulation:

    Figure 2. Same distribution and ratio as in Figure 1; we run the largest-first algorithm for 1M cycles, and then random coin selection for another 150k cycles.

    Notice quite how rapidly the random coin selection reduces the size of the UTxO once it kicks in. If you watch the inputs-per-transaction histogram, you can see that when the random input selection takes over, it first creates a bunch of transactions with 10 inputs (we limited transaction size to 10 for this simulation), rapidly collecting dust. Once the dust is gone, the number of inputs shrinks to about 3 or 4, which makes perfect sense given the 3:1 ratio of deposits and withdrawals.

    We will restate Erhardt's observation as our first self organisation principle:

    Self organisation principle 1. Random selection has a high probability of picking dust outputs precisely when there is a lot of dust in the UTxO.

    It provides a very promising starting point for an effective coin selection algorithm, but there are some improvements we can make.
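
    Before we get to those improvements, here is a minimal sketch of the basic "pick randomly until we reach the target value" policy. As before, the names and the use of IO for randomness are ours, for illustration only.

        -- Illustrative sketch only; not the wallet's production code.
        import System.Random (randomRIO)

        type Coin = Integer

        -- Remove and return the element at the given index.
        pickAt :: Int -> [a] -> (a, [a])
        pickAt i xs = let (before, x : after) = splitAt i xs in (x, before ++ after)

        -- Randomly pick outputs (without replacement) until the payment is
        -- covered; return the selected outputs and the change, or Nothing if
        -- the balance is insufficient.
        selectRandom :: [Coin] -> Coin -> IO (Maybe ([Coin], Coin))
        selectRandom utxo payment = go utxo [] 0
          where
            go remaining selected total
              | total >= payment = pure (Just (selected, total - payment))
              | null remaining   = pure Nothing
              | otherwise        = do
                  i <- randomRIO (0, length remaining - 1)
                  let (o, remaining') = pickAt i remaining
                  go remaining' (o : selected) (total + o)

        main :: IO ()
        main = selectRandom [3000, 500, 200, 100] 3200 >>= print

    This is essentially step (1) of the Random-Improve algorithm we arrive at below.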

    Active UTxO management

    Consider the following simulation of a pure "select randomly until we reach the target value" coin selection algorithm:

    Figure 3. Random-until-value-reached, for a 1:1 ratio of deposits and withdrawals, both drawn from a normal distribution with mean 1000.

    The first observation is that this algorithm is doing much better than the largest-first policy in terms of the size of the UTxO, which is about 2 orders of magnitude smaller: a dramatic improvement. However, if we look at the UTxO histogram, we can see that there is room for improvement: although this algorithm is good at collecting dust, it's also still generating quite a bit of dust. The UTxO histogram has two peaks. The first is approximately normally distributed around 1000: these are the deposits being made. The second is near 0: these are the dust outputs being created.

    This brings us to the topic of active UTxO management. In an ideal case, coin selection algorithms should, over time, create a UTxO that has "useful" outputs; that is, outputs that allow us to process future payments with a minimum number of inputs. We can take advantage of self organisation again:

    Self organisation principle 2. If for each payment request for value x we create a change output roughly of the same value x, then we will end up with a lot of change outputs in our UTxO of size x precisely when we have a lot of payment requests of size x.

    We will consider some details of how to achieve this in the next section. For now see what the effect of this is on the UTxO:

    Figure 4. Same deposits and withdrawals as in Figure 3, but now using the "pick randomly until we have a change output roughly equal to the payment" algorithm.

    The graph at the bottom right, which we've ignored so far, records the change:payment ratio. A value near zero means a very small change output (i.e., dust); a very high value would be the result of using a huge UTxO entry for a much smaller payment. A value around 1 is perfect, and means that we are generating change outputs equal in value to the payments.

    Note that the UTxO now follows precisely the distribution of payment requests, and we're not generating dust anymore. One advantage of this is that because we have no dust, the average number of inputs per transaction can be lower than in the basic algorithm.

    Just to illustrate this again, here is the result of the algorithm but now with a 3:1 ratio of deposits and withdrawals:

    Figure 5. Same algorithm as in Figure 4, but now with 3:1 deposits:payments (i.e., many small deposits, fewer but larger payments).

    We have two bumps now: one normally distributed around 1000, corresponding to the deposits, and one normally distributed around 3000, corresponding to the payment requests that are being made. As before, the median change:payment ratio is a satisfying round 1.0.

    The "Random-Improve" algorithm

    We are now ready to present the coin selection algorithm we propose. The basic idea is simple: we will randomly pick UTxO entries until we have reached the required value, and then continue randomly picking UTxO entries to try to reach a total value such that the change value is roughly equal to the payment.

    This presents a dilemma though. Suppose we have already covered the minimum value required, and we're trying to improve the change output. We pick an output from the UTxO, and it turns out to be huge. What do we do? One option is to discard it and continue searching, but this would result in coin selection frequently traversing the entire UTxO, resulting in poor performance.

    Fortunately, self organisation comes to the rescue again. We can set an upper bound on the size of the change output we still consider acceptable (we will set it to twice the payment value). Then we take advantage of the following property.

    Self organisation principle 3. Searching the UTxO for additional entries to improve our change output is only useful if the UTxO contains entries that are sufficiently small. But precisely when the UTxO contains many small entries, it is less likely that a randomly chosen UTxO entry will push the total above the upper bound we set.

    In other words, our answer to "what do we do when we happen to pick a huge UTxO entry?" is "we stop trying to improve our selection". We can now describe the algorithm:

    1. Randomly select outputs from the UTxO until the payment value is covered. (In the rare case that this fails because the maximum number of transaction inputs has been exceeded, fall back on the largest-first algorithm for this step.)
    2. Randomly select outputs from the UTxO, considering for each output whether it is an improvement. If it is, add it to the transaction, and keep going. An output is considered an improvement when:
      a. It doesn't exceed the specified upper limit.
      b. Adding the new output gets us closer to the ideal change value.
      c. It doesn't exceed the maximum number of transaction inputs.

    Figure 6. The Random-Improve algorithm. Side note for point (2a): we use twice the value of the payment as the upper limit. Side note for point (2b): it might be that without the new output we are slightly below the ideal value, and with the new output we are slightly above; that is fine, as long as the absolute distance decreases. A sketch of the algorithm in code is given below.
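
    The sketch below is a minimal, self-contained rendering of Figure 6 in Haskell. The names (cover, improve, selectRandomImprove), the fixed input limit, and the use of IO for randomness are all ours, chosen for illustration; fees, multiple payment outputs, and the largest-first fallback in step (1) are omitted.

        -- Illustrative sketch of Random-Improve; not the wallet's production code.
        import System.Random (randomRIO)

        type Coin = Integer

        maxInputs :: Int
        maxInputs = 10  -- illustrative bound on the number of transaction inputs

        -- Remove and return the element at the given index.
        pickAt :: Int -> [a] -> (a, [a])
        pickAt i xs = let (before, x : after) = splitAt i xs in (x, before ++ after)

        -- Step 1: randomly pick outputs until the payment value is covered.
        -- (The fallback to largest-first when this fails is omitted here.)
        cover :: [Coin] -> Coin -> IO (Maybe ([Coin], [Coin], Coin))
        cover utxo payment = go utxo [] 0
          where
            go remaining selected total
              | total >= payment = pure (Just (selected, remaining, total))
              | null remaining || length selected >= maxInputs = pure Nothing
              | otherwise = do
                  i <- randomRIO (0, length remaining - 1)
                  let (o, remaining') = pickAt i remaining
                  go remaining' (o : selected) (total + o)

        -- Step 2: keep picking random outputs while they improve the change,
        -- stopping at the first output that is not an improvement.
        improve :: Coin -> ([Coin], [Coin], Coin) -> IO ([Coin], Coin)
        improve payment (selected, remaining, total)
          | null remaining = pure (selected, total - payment)
          | otherwise = do
              i <- randomRIO (0, length remaining - 1)
              let (o, remaining') = pickAt i remaining
                  total'          = total + o
                  isImprovement   = total' - payment <= 2 * payment             -- (2a) change at most twice the payment
                                 && abs (ideal - total') < abs (ideal - total)  -- (2b) closer to the ideal change value
                                 && length selected + 1 <= maxInputs            -- (2c) within the input limit
              if isImprovement
                then improve payment (o : selected, remaining', total')
                else pure (selected, total - payment)
          where
            ideal = 2 * payment  -- selected total for which the change equals the payment

        selectRandomImprove :: [Coin] -> Coin -> IO (Maybe ([Coin], Coin))
        selectRandomImprove utxo payment =
          cover utxo payment >>= traverse (improve payment)

        main :: IO ()
        main = selectRandomImprove [3000, 500, 200, 100, 800] 1000 >>= print

    Note how the stopping rule implements self organisation principle 3: the moment a randomly picked entry fails to improve the change, we give up rather than search any further.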

    Evaluation

    The algorithm from Figure 6 is deceptively simple. Do the self organisation principles we isolated really mean that order will emerge from chaos? Simulations suggest that it does. We already mentioned how random input selection does a great job at cleaning up dust in Figure 2; what we didn't emphasise in that section is that the algorithm we simulated there is actually our Random-Improve algorithm. Notice how the median change:payment ratio is initially very low (indicative of a coin selection algorithm that is generating a lot of dust outputs), but climbs rapidly back to 1 as soon as Random-Improve kicks in. We already observed that it does indeed do an excellent job at cleaning up the dust, quickly reducing the size of the UTxO. The simulations in Figures 4 and 5 are also the result of the Random-Improve algorithm.

    That said, of course the long term effects of a coin selection algorithm can depend strongly on the nature of the distribution of deposits and payments. It is therefore important that we evaluate the algorithm against a number of different distributions.

    Normal distribution, 10:1 deposit:payment ratio

    We already evaluated "Random-Improve" against normally distributed payments and deposits with a 1:1 ratio and a 3:1 ratio; perhaps more typical for exchange nodes might be even higher ratios. Here is a 10:1 ratio:

    Figure 7. Random-Improve with normally distributed deposits and payments and a 10:1 deposit:payment ratio.

    We see a picture very similar to Figure 5. Since the deposits and payments are randomly drawn (from normal distributions), the UTxO balance naturally fluctuates up and down. What is satisfying to see, however, is that the size of the UTxO tracks the balance rather precisely; this is about as good as we can hope for. Notice also that the change:payment ratio is a nice round 1, and the average number of transaction inputs hovers around 10 or 11, which is what we'd expect for a 10:1 ratio of deposits:payments.

    Exponential distribution, 1:1 deposit:payment ratio

    What if the payments and deposits are not normally distributed? Here is Random-Improve on exponentially distributed inputs:

    Figure 8. Random-Improve, 1:1 deposit:payment ratio, deposits and payments both drawn from an exponential distribution with scale 1000.

    In an exponential distribution we have a lot of values near 0; for such values it will be hard to achieve a "good" change output, as we are likely to overshoot the range. Partly for this reason the algorithm doesn't quite achieve a 1.0 change:payment ratio, but at about 1.5 it is still generating useful change outputs. Furthermore, we can see that the size of the UTxO tracks the UTxO balance nicely, and the average number of transaction inputs is low, with roughly 53% of transactions having just one input.

    Moreover, when we increase the deposit:payment ratio to 3:1 and then 10:1, the change:payment ratio drops to about 1.1 and then to 1.0 (graphs omitted).

    Erlang

    The exponential distribution results in many very small deposits and payments. The algorithm does better on slightly more realistic distributions such as the Erlang-k distributions (for k > 1). Here we show the animation for the 3:1 deposit:payment ratio using the Erlang-3 distribution; the results for other ratios (including 1:1) and other values of k (we additionally simulated for k = 2 and k = 10) are similar.

    Figure 9. Random-Improve, 3:1 deposit:payment ratio, deposits drawn from an Erlang-3 distribution with scale 1000 and payments drawn from an Erlang-3 distribution with scale 3000.

    More payments than deposits

    We have been focusing on the case where we have more deposits and fewer (but larger) payments. What happens if the ratio is reversed?

    Figure 10. Random-Improve, 1:10 deposit:payment ratio, deposits and payments drawn from a normal distribution with mean 10k and 1k, respectively. 1M cycles.

    In this case we are unable to achieve that perfect 1.0 change:payment ratio, but this is expected: when we have large deposits, we frequently have no choice but to use them, leading to large change outputs. We can see this more clearly when we slow things right down and remove any source of randomness; here is the same 1:10 ratio again, but now only the first 100 cycles, and all deposits exactly 10k and all payments exactly 1k:

    Figure 11. Random-Improve, 1:10 deposit:payment ratio, all deposits exactly 10k, all payments exactly 1k (no randomness). First 100 cycles only.

    We can see the large value being deposited, and then shifting to the left in the histogram as it is getting used for payments, each time decreasing that large output by 1k. Indeed, this takes 10 payments, which makes sense given the 1:10 deposit:payment ratio; moreover, the average value of the "large output" over such a 10-payment cycle is about 5k, explaining why we are getting a change:payment ratio of around 5.0.

    The algorithm, however, is not creating dust outputs; the 1k change outputs it is generating are getting used, and the size of the UTxO is perfectly stable. Indeed, back in Figure 10 we can see that the size of the UTxO tracks the balance perfectly; moreover, the vast majority of transactions only use a single input, which is what we'd expect for a 1:10 deposit:payment ratio.

    Real data

    MoneyPot.com

    Ideally, of course, we would run the simulation against real event streams from existing wallets. Unfortunately, such data is hard to come by. Erhardt was able to find one such dataset, provided by MoneyPot.com. When we run our algorithm on this dataset we get:

    Figure 12. Random-Improve, using the MoneyPot data set. There is a roughly 2:1 deposit:payment ratio. Values have been scaled. NOTE: log scale on the x-axis.

    A few observations are in order here. First, there are quite a few deposits and payments close to 0, just like in an exponential distribution. Moreover, although we have many values close to 0, we also have some huge outliers; the payments are closely clustered together, but there is a factor of 10^9 between the smallest and largest deposit, and moreover a factor of 10^5 between the largest deposit and the largest payment. It is therefore not surprising that we end up with a relatively large change:payment ratio. Nonetheless, the algorithm is behaving well: the size of the UTxO tracks the balance nicely, with an average UTxO size of 130 entries. The average number of inputs per transaction is 3.03, with 50% of transactions using just one input, and 90% using 6 or fewer.

    Cardano Exchange

    One of the large Cardano exchange nodes has also helped us with some anonymised data (deposits and payments), similar in nature to the MoneyPot dataset albeit significantly larger. Coming from an exchange node, however, this dataset is very much skewed towards deposits, with a deposit:payment ratio of roughly 30:1. Under these circumstances, of course, coin selection alone cannot keep the UTxO size small.

    Figure 13. Random-Improve, using a data set from a large Cardano exchange. There is a roughly 30:1 deposit:payment ratio. Values have been scaled. NOTE: log scale on the x-axis.

    Conclusions

    The choice of coin selection algorithm has far reaching consequences on the long term behaviour of a cryptocurrency wallet. To a large extent the coin selection algorithm determines, over time, the shape of the UTxO. Moreover, the performance of the algorithm can be of crucial importance to high-traffic wallets such as exchange nodes.

    In his thesis, Erhardt proposes "Branch and Bound" as his preferred coin selection algorithm. Branch and Bound is in essence a limited backtracking algorithm that tries to find an exact match, so that no change output needs to be generated at all. When the backtracking cannot find an exact match within its bounds, the algorithm falls back on random selection. It does not, however, attempt our "improvement" step; instead it just attempts to reach a minimum but fixed change size, to avoid generating dust. It is hard to compare the two algorithms directly, but on the MoneyPot dataset at least the results are comparable: Erhardt ends up with a slightly smaller average UTxO (109 versus our 130) and a slightly smaller average number of inputs (2.7 versus our 3.0). In principle we could modify our Random-Improve algorithm to start with bounded backtracking to find an exact match, just as Erhardt does; we have not done this, however, because it adds complexity to the algorithm and reduces performance. Erhardt reports that his algorithm is able to find exact matches about 30% of the time. This is very high, and at least partly explains why his average UTxO size and average number of inputs are lower; in the Cardano blockchain, we would not expect exact matches to exist anywhere near that often (never mind finding them).
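
    For comparison, the core of such an approach is a bounded backtracking search for an exact match. The sketch below is our own simplification for illustration only, not a faithful reproduction of Erhardt's algorithm (which also bounds the acceptable value range, orders the search, and accounts for fees); when the search gives up, the algorithm would fall back on random selection as described above.

        -- Illustrative sketch only; a simplification of the exact-match search.
        type Coin = Integer

        -- Depth-first search for a subset of the UTxO whose total exactly
        -- matches the payment, giving up after a fixed number of steps.
        exactMatch :: Int -> [Coin] -> Coin -> Maybe [Coin]
        exactMatch maxSteps utxo payment = snd (go maxSteps utxo payment [])
          where
            go steps _      0         acc = (steps, Just (reverse acc))
            go steps []     _         _   = (steps, Nothing)
            go 0     _      _         _   = (0, Nothing)
            go steps (o:os) remaining acc
              | o > remaining = go (steps - 1) os remaining acc             -- skip outputs that overshoot
              | otherwise     =
                  case go (steps - 1) os (remaining - o) (o : acc) of       -- try including this output
                    (steps', Just match) -> (steps', Just match)
                    (steps', Nothing)    -> go steps' os remaining acc      -- backtrack and exclude it

        main :: IO ()
        main = print (exactMatch 1000 [5, 9, 13, 7, 3] 16)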

    Our proposed Random-Improve, by contrast, does no search at all, relying purely on self organisation principles: the first was stated by Erhardt, and the other two we identified as part of this research. Although in the absence of more real data it is hard to evaluate any coin selection algorithm definitively, we have shown that the algorithm performs well across a large variety of distributions and deposit:payment ratios. Moreover, it is straightforward to implement and has high performance.

    One improvement we may wish to consider is that when there are very large deposits, we could occasionally issue a "reorganisation transaction" that splits those large deposits into smaller chunks. This would bring the change:payment ratio down, which would improve the evolution of the UTxO over time and is beneficial also for other, more technical reasons (it reduces the need for what we call "dependent transactions" in the wallet specification). Such reorganisation is largely orthogonal to this algorithm, however, and can be implemented independently.