Cardano reaches development milestone
Engineers conclude Byron phase as Shelley work progresses
22 March 2019 6 mins read
The release of Cardano 1.5 marks the start of the shift from the Cardano Byron era to the Shelley era and is an excellent opportunity to describe the ongoing work for Shelley and how the transition will happen. Roughly six months ago, we switched almost all our development efforts to the Shelley code base, and work has been progressing quickly ever since. The last major work in the Byron code base was completed for Cardano 1.4, and for 1.5 we limited work to only those changes required for a smooth transition to Shelley.
The Shelley code base is not just an extension to the Byron code base, but an entirely new foundation. For the Shelley era, we have taken the opportunity to strip back and rebuild the system, as well as including the new staking and delegation functionality. As a result, we have been able to remedy a number of architectural limitations in the Byron code, as well as engage in the semi-formal software development approach that I keep discussing in my videos.
In particular, we now have formal mathematical specifications of the validation rules for the Byron and Shelley blockchain, and will present these specifications at the IOHK summit in April. When development is complete, we will be able to provide evidence that the code correctly implements our specifications. This is an exciting step-change in system quality and will be a first for our industry.
A seamless transition
We must, of course, manage the transition from Byron to Shelley very carefully. It is not just a significant change in the rules, but also a migration from one code base to another. We have gone to great lengths to ensure that the transition process will be as smooth as possible.
While you might expect the Shelly transition to involve a single hard fork, it will actually comprise two. It is worth emphasizing that while these are technically hard forks, they will not be disruptive in the way hard forks often are. The changes have been designed to use our existing update system and be minimally disruptive. For Daedalus users, it will be very much like any other update.
For both hard forks, we will deploy an update which includes the rules of the new era in an unactivated state, to be activated several weeks later. This is key to avoiding disruption at the hard fork: no software is updated at the moment of the hard fork itself. The software update happens earlier, and once everyone is ready we can smoothly activate the change.
The only difference between a hard fork and a regular update is that updating is compulsory between the software release and the hard fork activation. For Daedalus users, this happens via the standard software update system. Exchanges will have to upgrade manually, but they have several weeks to do so.
Why two hard forks?
For technical reasons, the transition from Byron to Shelley is more straightforward if we go via an intermediate transitional era. There is one hard fork to enter the transitional era and then a second one to begin the Shelley era proper. The Byron era uses Ouroboros Classic, and the Shelley era uses Ouroboros Genesis (which is an extension of Ouroboros Praos). Both of these are complex protocols. For a single implementation of a full node to manage a hard fork smoothly it is necessary for it to implement the rules both before and after the hard fork. A direct hard fork from Byron to Shelley would require a single implementation to understand Ouroboros Classic, Ouroboros Genesis, and all of the other validity rules – which is a very complicated prospect indeed.
Not only that, but the Byron version of Ouroboros Classic has some additional complexity that would need to be replicated in a new implementation to preserve perfect consensus. Instead, we are using Ouroboros BFT, a simple variant of Ouroboros, for the transitional era. This means that the Byron code base only has to understand Ouroboros Classic and Ouroboros BFT, while the Shelley code base only has to understand Ouroboros BFT and Ouroboros Genesis. Neither one has to understand both Ouroboros Classic and Ouroboros Genesis. In particular, this means that the new Shelley code base does not need to replicate every detail of the Byron implementation of Ouroboros Classic, achieving a genuine reduction in complexity – and in software development, complexity is the enemy.
A transitional era
So this explains what the Cardano 1.5 release is really for: it is the release in which the Byron code base begins to understands Ouroboros BFT, allowing us to complete the first managed hard fork in a few weeks' time. After the hard fork, we will be in the transitional era using Ouroboros BFT and will be able to start deploying the new code base over time as it is developed. This is the new code base that will be used for the Shelley releases later, but is initially still using Ouroboros BFT for perfect compatibility during the transition.
During this transitional period, we will also run a testnet for delegation and staking. Initially, this testnet will use a subset of the Shelley rules, but we will update it over time until the full Shelley rules are implemented and any other issues uncovered by the testnet resolved.
Once we are satisfied with the full implementation of the Shelley rules, then we will deploy an update of the new code base on mainnet. A few weeks later we will activate the hard fork and then we are finally in the Shelley era on mainnet!
In summary, the Cardano 1.5 release is exciting not because of any major features, or the numerous incremental improvements in Daedalus, but because it is the milestone that marks the beginning of the end for Cardano Byron and the start of the transition process into Cardano Shelley.
The Symphony of Blockchains project comes to Bristol
The groundbreaking immersive blockchain experience was presented at the Arnolfini Gallery, Bristol
20 February 2019 3 mins read
The Symphony of Blockchains project had its public debut last week at a packed event in Bristol’s Arnolfini Gallery. Over a hundred visitors took the chance to explore the bitcoin blockchain in full virtual reality (VR), at a joint presentation from IOHK creative partners Kuva and IOHK's creative team. The Symphony project is an educational initiative, with the goal of creating new and innovative ways to teach people about blockchain technology. Andy Buchan, founder and creative director of Kuva, described the project as ‘an unprecedented opportunity to visualize new technology, and to establish a vocabulary for the blockchain industry,’ while IOHK creative director Richard Wild spoke of how the project had 'enabled creatives to engage with hard concepts in new ways.’
The exhibition itself spanned two rooms: one with a large, towering monolith in the centre and two VR headsets, and a second room with rows of augmented reality posters and a projector attached to a VR setup.
In the first room, visitors queued to put on VR headsets and watch the monolith come to life, while a voiceover explains the fundamentals of blockchain technology. As you listen, bright streaks of light — transactions happening on bitcoin's blockchain in real time — shoot past, gravitating towards the heart of the monolith. There they blend into a glowing ball, a melting pot of transactions that viewers can manipulate, expanding the data into the room around them to explore in 3D.
In a second room, another VR experience — single-handedly created by IOHK creative coder Scott Darby — saw visitors dive deeper into the raw, living data of the bitcoin blockchain. Here you can traverse the intricate structure of Merkle trees, surrounded by bitcoin transaction crystals which glow and hum as you explore. A more in-depth experience, this part of the VR exhibition allowed visitors to travel all the way back to the bitcoin genesis block, generating dynamic audio in real-time as users explored the virtual space.
Creating the Symphony project isn't just a stunning visual achievement for IOHK and Kuva, but a technical one too. Running the VR simulations requires simultaneously managing and syncing multiple data streams, from the audiovisual cues for the headsets to the lidar mapping the room — and of course, live data from the bitcoin blockchain itself.
Everyone who attended the event seemed impressed, describing the experience as ‘mind-blowing.’ The opportunity to explore the constructs underpinning bitcoin in a tactile, physical way allowed even those with no blockchain experience to grasp fundamental concepts quickly, and visitors as young as twelve expressed a greater understanding of blockchain as a result.
The event in Bristol was just the start for the Symphony project: there'll be an even bigger exhibit at the IOHK Summit in Miami next month, before Symphony goes on tour to educate people around the world about blockchain technology.
How we use Nix at IOHK
6 February 2019 10 mins read
IOHK has a history of putting great research into practice. This includes the use of functional programming languages, adoption of formal methods, and — of course — implementing our own peer-reviewed research into blockchain consensus algorithms. We approach the problem of software deployment in similar vein, using ideas which have emerged from recent academic work. This system applies the principles of functional programming to the domain of software package management and system administration.
In this short technical article I will describe how we leverage Nix at IOHK, and share some of our plans for the future.
What are Nix, Nixpkgs, and NixOS?
NixOS is a Linux distribution built on the Nix package manager.
What I personally like about the Nix package manager is its ability to install multiple versions of the same package, or easily apply a source code modification to a package. For Haskell development in particular, there are binary packages for an impressive number of compilers, tools, and Hackage packages available.
What sets NixOS apart is its configuration system. It includes a large collection of high-quality modules which are configured declaratively in a uniform way. This is a boon for system administrators who need to apply the same configuration to many systems, or integrate a wide variety of software into one system.
Underneath the Nix package manager and NixOS is the Nix language — a domain-specific, purely functional, lazily-evaluated language for software configuration and deployment.
The paper NixOS: A purely functional Linux distribution is an excellent introduction. It clearly sets out the problems which NixOS attempts to solve, many of which will be familiar to technical users.
Who is behind the NixOS project?
NixOS was initially created in 2003 at Utrecht University as the research project of Eelco Dolstra, forming the main part of his doctoral thesis. Fifteen years later, the NixOS project is run on GitHub with hundreds of contributors involved, and the code base is currently improving at a rate of about 500 git commits per week.
The community also communicate through an active IRC channel on Freenode, a discussion board, a (roughly) annual conference, and regular meetups.
Moreover, several companies provide Nix consulting services, such as Tweag I/O and Mayflower.
Why Nix?
There are many existing systems for software configuration management, some with far more users than Nix. However, we believe that Nix has the best available implementation of 'Infrastructure as Code', not only in terms of features, but also in its design and potential.
If you treat your deployment as a program written in a functional language, then you gain the advantages of functional programming like referential transparency, safe code reuse, and the ability to unit test your deployment. At its heart, the Nix 'derivation' abstracts details away and allows software modules to be built and used in more or less the same way, regardless of their source language.
A derivation is a function which takes source code, build tools, and library dependencies, and produces an immutable software package; because Nix is pure (in the functional programming sense), if any of the inputs change, then the result is a different software package. Alternatively, if a derivation with exactly the same inputs has already been built elsewhere (by continuous integration (CI), for example), Nix will just download that package, rather than building the derivation itself, reducing duplication of work.
The way Nix calculates whether a derivation input has changed is by taking its cryptographic hash. Dependencies are recursive, so this hash is actually a hash of hashes — a powerful concept instantly recognizable to blockchain enthusiasts.
How we use Nix
Here are some examples of how we are using Nix at IOHK.
1. Deployment of the Cardano SL network
We run around 200 compute nodes in AWS for the various testnets, the Cardano SL Byron mainnet, and the Cardano SL staging deployment (an environment used to test upgrades).
These are all NixOS systems deployed with NixOps.
We have written a tool called iohk-ops
, which generates a NixOps network from an abstract topology description. The topology description describes the number of Cardano SL relay nodes, which data center they are hosted in, and static routes between peers. iohk-ops
calls on NixOps to execute the deployment.
The same tool allows deployment of smaller 'developer clusters' which are used for testing activities. These clusters share the same deployment code, and identical software versions, but are smaller and can be started and stopped as necessary.
The rollback features of NixOS and NixOps have already proved invaluable several times in our staging deployments: the whole network can be redeployed with a single command, or just specific nodes.
With NixOps we have the ability to apply configuration tweaks and code patches, then rebuild at deploy-time, confident that the result will be identical to what comes from CI, and can be undone without leaving a mess behind.
2. Reproducible deployment
Extreme measures such as deploy-time code modification are usually not required, because the deployment environment can be reproduced and tested locally, ensuring any problems are caught well in advance.
As mentioned earlier, a Nix derivation is a pure function. The code can be run locally with exactly the same environment and libraries (up to the Linux kernel) as are used in production.
3. Fast rebuilds of Docker images
Docker's strengths include its immutability, encapsulation, and reproducibility, traits shared by Nix. However, it is severely limited by its data model and restricted DSL.
The structure of software dependencies is a tree, not a sequence of layers, as modelled by Docker. A typical annoyance when building Docker images with a Dockerfile
is that if a layer near the bottom changes, then everything above must be rebuilt. Furthermore, unless steps are taken to use multiple build stages, intermediate build products from lower layers will be included in the final image.
In contrast, a Nix package will only be rebuilt if its dependencies (or their dependencies) have changed. The minimum closure of a Nix package contains only what is necessary to run that software.
Thus we can generate Docker images from the closure of Nix derivations which are minimal and fast to rebuild. The build process can also be parameterized by defining it as a function in the Nix language.
4. Declarative deployment of macOS build slaves
To produce the macOS installer packages of Daedalus, we maintain a small collection of macOS systems (hosted by MacinCloud). These systems were quite tedious to maintain until we started using nix-darwin. Nix-darwin allows declarative configuration of macOS systems in a similar way to NixOS.
We wrote a script to set up the Mac through SSH. All things are managed through Nix code, including the build environment, monitoring, and SSH access.
5. Cross-compiling Cardano SL for Windows
To produce the Windows installer packages of Daedalus, we needed to rely on resource-limited build instances of CI providers, or maintain a small collection of Windows systems. The Windows CI infrastructure has been a constant source of problems for us.
However, we are now able to cross-compile Haskell code for Windows on Linux! The next major version of Daedalus for Windows will ship with a cardano-node.exe
that was built with Nix on Linux.
Due to the size of Cardano SL, this is quite an achievement, and the result of several months of hard work.
6. Development environments
With nix-shell
, we can provide development environments containing all tools and libraries necessary to build a project, pinned to precisely defined versions.
In cardano-sl
, the Nix shell supports either the stack
workflow, or cabal new-build
with Haskell pre-built dependencies downloaded from the IOHK binary cache.
In daedalus
, the Nix shell provides yarn
, and a method of running a Cardano SL blockchain and wallet locally.
Developers and users who are not familiar with build tools from other languages such as Cabal, Stack (Haskell), NPM, Yarn (Javascript), or Cargo (Rust), are still able to rebuild (or download a cached build from CI), just by running nix build
.
There are other tools for providing reproducible development environments, such as Apache Maven, or Docker, but these are not easily composable. Because Nix is a functional language, development environments are the application of functions, which can be composed. So with Nix you can write a function that combines development environments, or just the parts that you need.
Future directions
2019 will be an exciting year for IOHK, with several ambitious projects in the pipeline. We anticipate that Nix will play a large part in the success of these projects. Some of the things we have planned are:
Pure Daedalus build
We currently build the Daedalus installers for Windows and macOS using a Haskell script. This is slower than necessary and needs to run on a Windows or macOS system: we would prefer to build on Linux with Nix so that software components can be cached as necessary.
iohk-nix
IOHK is splitting the Cardano SL code base into independent modules, which must be integrated into a final product build. We have started a repository of common Nix code iohk-nix
that can be re-used by each Cardano SL subproject.
As with all languages, Nix code can be written in a beautiful or an awful way — and because Nix does not have a static type system, the awful code is really bad. So part of this project will be an effort to improve the legibility and documentation of our Nix code so that it's easy to understand and maintain.
Tools for deploying a node or stake pool
In 2019, Cardano SL will be decentralized. We would like to make it super easy for technical users to deploy one or more Cardano SL nodes, or operate a stake pool.
This will include:
- Smoothing the rough corners of
iohk-ops
to make it ready for public use. - Providing a variety of build options such as Docker images, or fully statically-linked executables.
- Using the tools available in NixOS to provide a turn-key monitoring setup for all the necessary application-level metrics.
nix-tools Haskell Builder
The Nix Haskell build infrastructure needed some significant rework in order to cross-compile Haskell to target Windows, mostly to track more information about compiler flags and dependencies.
The result is a more powerful Haskell builder, and a new implementation of stack2nix
which runs faster, among other things. This Haskell build infrastructure is called nix-tools
. We aim to fill in the missing parts and upstream it to Nixpkgs in the coming year.
This subject is so interesting that it deserves its own blog post, which we are looking forward to posting soon.
Conclusion
This was a quick run-through of how we use Nix to help develop and deploy Cardano.
It's not all excellent of course. Users find that Nix has a steep learning curve, and that the community is relatively small. We also have our own wish-list of features and bug fixes for Nix that are not implemented yet.
If you are interested in learning more about Nix, then visit the NixOS website. If you like coding or system administration, it's very easy to start contributing to the NixOS project. It takes a little while to get oriented with the Nix language, but then you'll find that pretty much everything you need to know is in a single git repository.
At IOHK we regularly contribute back to the NixOS project. These contributions include improvements to packages and NixOS modules, and contracting developers to make larger improvements. Additionally, simply by using NixOS at a large scale in research and commercial projects, we contribute to the ecosystem — by publishing our source code under a free licence, submitting bug reports, or sharing our experiences with the community.
We are hiring
The IOHK team is growing, and we are looking for the right people to join us. If helping develop a blockchain platform more advanced than any other is something that interests you, then check out our job listings and get in touch!
Artwork, Mike BeepleCardano is secure against fake stake attacks
Peer-reviewed design means Ouroboros is free from a flaw affecting many proof-of-stake blockchains
29 January 2019 6 mins read
Ada is not among the 26 cryptocurrencies identified by US researchers last week as being vulnerable to ‘fake stake’ attacks.
1 The Cardano blockchain underlying Ada is based on proof-of-stake (PoS), but its Ouroboros protocol uses no bitcoin code and is not affected by the PoSv3 problem.2 This is not just good luck, but a consequence of the thorough, formally-verified approach taken during Cardano’s development.The vulnerability
The vulnerability is explained very well in the original article. In order to understand why Cardano is not affected by it, we will summarise the essence of the vulnerability here.
All the vulnerable systems are using PoSv3, a modification of the bitcoin code that aims to replace hashing power with stake for the purpose of determining who is eligible to create a block. In the original bitcoin code, the decision of who gets to create the next block is based purely on hashing power: whoever manages to find a suitable random number, and thus get a correct hash first, wins. PoSv3, however, adds an additional variable, to simulate the notion of stake.
In a PoS system, the likelihood of getting to create a block is proportional to how much stake a user has in the system: the more stake a user has, the more likely it is that they get to create the next block. To mimic this functionality, PoSv3 allows users to add additional information to their candidate block, in the form of a ‘staking transaction’. The more tokens they have available to use in their staking transaction, the easier it becomes for them to get a correct hash, and thus earn the right to create the next block.
Whilst PoSv3 does successfully tie block creation rights to stake in this way, it also makes block validation more difficult. Not only does the hash of the block itself need to be verified (as in bitcoin), but so does a user’s staking transaction: that is, did the user actually own the tokens they used in their staking transaction? To verify this information, a blockchain node has to be able to refer to the ledger, and – if a block does not simply extend the current chain but introduces a fork – also the history of the ledger. Since that is neither cached nor cheap to calculate, blocks in PoSv3 systems are not validated immediately, but are rather (at least partially) stored in memory or on disk when they pass some heuristics.
The vulnerabilities discussed in the original article can be exploited in a number of ways, but ultimately involve fooling those heuristics and presenting lots of invalid blocks to a node, such that the node runs out of memory and crashes before it can correctly identify that the blocks are invalid.
Why Cardano is different
For Cardano, IOHK took a different approach. Instead of finding a minimal variation of bitcoin, we relied on world-leading academics and researchers to create a new protocol and codebase from scratch, with the requirement that it should provide equivalent (or better) security guarantees than bitcoin, but rely entirely on stake. The result is the Ouroboros protocol
The basic design of Ouroboros is remarkably simple: time is divided into discrete increments, called slots, and slots are grouped into longer periods, called epochs. At the start of each epoch, a lottery determines who gets to create a block for every slot. Instead of this lottery being implicit, ie whoever gets a right hash first wins, the lottery is explicit: a generated random number determines a slot leader for each slot, and the chances of winning for any given slot are proportional to the stake one controls
In this protocol, validating that a block has been signed by the right stakeholder is also simple: it requires only the leader schedule for the current epoch (which will not change in case of a temporary fork), and the checking of a signature. This can and will be done by each node once they get the block header, in contrast to the PoSv3 systems that are vulnerable to fake stake attacks.
In short: Cardano is secure against fake stake attacks because it’s based on a fundamentally different system. PoSv3 cryptocurrencies run on proof-of-work (PoW) systems, modified to take stake into account in the implicit leader election, and the vulnerability in question is a result of that modification, and the additional complexities it involves.
Not only does Cardano have a fundamentally different foundation, but that foundation is the result of multiple peer-reviewed academic papers, and an unprecedented collaboration between researchers and developers. The formal and semi-formal methods involved in creating the upcoming Shelley release of Cardano ensure that its construction at code level evidently matches the protocol described in the peer-reviewed research papers, building in reliability and security by design – and avoiding the problems of PoSv3, which have arisen as a result of modifying an existing protocol instead of creating a thoroughly proven, bespoke protocol like Ouroboros.
Footnotes
1. ‘“Fake Stake” attacks on chain-based Proof-of-Stake cryptocurrencies’ by Sanket Kanjalkar, Yunqi Li, Yuguang Chen, Joseph Kuo, and Andrew Miller of the Decentralized Systems Lab at the University of Illinois at Urbana-Champaign.↩
2. To be precise, the following discussion is targeted at the upcoming Shelley release of Cardano. The currently deployed Byron release is running in a federated setting, and thereby operationally protected from this kind of attack anyway.↩
3. There are by now a number of variations of the Ouroboros protocol. We describe only the classic version of Ouroboros here, but the general argument holds for all variants – in particular for Ouroboros Praos, which will be the protocol used in the Shelley release.↩
4. To be precise, leader election for a given epoch uses the stake distribution at a point in time before the epoch starts, to prevent grinding attacks and a re-calculation of the schedule in case of a temporary fork at the epoch boundary.↩
An in-depth look at quickcheck-state-machine
28 January 2019 46 mins read
Please note: this post originally appeared on the Well-Typed blog. Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.
In this blog post we will take an in-depth look at quickcheck-state-machine
, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for quickcheck-state-machine tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than we present here.
We will not show the full development in this blog post, and instead focus on explaining the underlying concepts. If you want to follow along, the code is available for download. We will assume version 0.6 of quickcheck-state-machine, which was recently released. If you are using an older version, it is recommended to upgrade, since the newer version includes some important bug fixes, especially in the shrinker.
Introducing the running example
Our running example will be the development of a simple mock file system; the intention is that its behaviour is identical to the real file system, within the confines of what it needs to support. We will represent the state of the file system as
data Mock = M {
dirs :: Set Dir
, files :: Map File String
, open :: Map MHandle File
, next :: MHandle
}
type MHandle = Int
emptyMock :: Mock
emptyMock = M (Set.singleton (Dir [])) Map.empty Map.empty 0
We record which directories (folders) exist, the contents of all files on the system, the currently open handles (where mock handles are just integers), and the next available mock handle. To avoid confusion between files and directories we do not use FilePath but instead use
data Dir = Dir [String]
data File = File {dir :: Dir, name :: String}
As one example, here is the mock equivalent of readFile:
type MockOp a = Mock -> (Either Err a, Mock)
mRead :: File -> MockOp String
mRead f m@(M _ fs hs _)
| alreadyOpen = (Left Busy , m)
| Just s <- Map.lookup f fs = (Right s , m)
| otherwise = (Left DoesNotExist , m)
where
alreadyOpen = f `List.elem` Map.elems hs
We first check if there is an open handle to the file; if so, we disallow reading this file (“resource busy”); if the file exists, we return its content; otherwise we report a “does not exist” error. The implementation of the other mock functions is similar; the full API is
mMkDir :: Dir -> MockOp ()
mOpen :: File -> MockOp MHandle
mWrite :: MHandle -> String -> MockOp ()
mClose :: MHandle -> MockOp ()
mRead :: File -> MockOp String
Finally, we should briefly talk about errors; the errors that the mock file system can report are given by
data Err = AlreadyExists | DoesNotExist | HandleClosed | Busy
and they capture a subset of the IO exceptions
fromIOError :: IOError -> Maybe Err
fromIOError e =
case ioeGetErrorType e of
GHC.AlreadyExists -> Just AlreadyExists
GHC.NoSuchThing -> Just DoesNotExist
GHC.ResourceBusy -> Just Busy
GHC.IllegalOperation -> Just HandleClosed
_otherwise -> Nothing
Testing
Typically we are developing some stateful code, and then write a pure (mock) implementation of the same thing to test it, making sure that the stateful implementation and the simpler pure model compute the same things. Here we are doing the opposite: we are adjusting the model (the mock file system) to match what the real file system does. Either way, the process is the same: we write tests, execute them both in the real thing and in the model, and compare results.
If we were writing unit tests, we might write tests such as
- Write to two different files
- Write to a file and then read it
- etc.
However, as John Hughes of QuviQ – and one of the original authors of QuickCheck – likes to point out, as systems grow, it becomes impossible to write unit tests that test the interaction between all the features of the system. So, don’t write unit tests. Instead, generate tests, and verify properties.
To generate tests for our mock file system, we have to generate sequences of calls into the API; “open this file, open that file, write to the first file we opened, …”. We then execute this sequence both against the mock file system and against the real thing, and compare results. If something goes wrong, we end up with a test case that we can inspect. Ideally, we should then try to reduce this test to try and construct a minimal test case that illustrates the bug. We have to be careful when shrinking: for example, when we remove a call to open from a test case, then any subsequent writes that used that file handle must also be removed. A library such as quickcheck-state-machine
can be used both to help with generating such sequences and, importantly, with shrinking them.
Reifying the API
It is important that we generate the test before executing it. In other words, the test generation should not depend on any values that we only get when we run the test. Such a dependency makes it impossible to re-run the same test multiple times (no reproducible test cases) or shrink tests to obtain minimal examples. In order to do this, we need to reify the API: we need to define a data type whose constructors correspond to the API calls:
data Cmd h =
MkDir Dir
| Open File
| Write h String
| Close h
| Read File
Cmd
is polymorphic in the type of handles h; this is important, because we should be able to execute commands both against the mock file system and against the real file system:
runMock :: Cmd MHandle -> Mock -> (.. Mock)
runIO :: .. -> Cmd IO.Handle -> IO ..
What should the return type of these functions be? After all, different functions return different things: Open returns a new handle, Read returns a string, the other functions return unit. To solve this problem we will simply introduce a union type
data Success h = Unit () | Handle h | String String
A response is then either a succesful response or an error:
newtype Resp h = Resp (Either Err (Success h))
It is now easy to implement runMock: we just map all the constructors in Cmd to the corresponding API calls, and wrap the result in the appropriate constructor of Success:
runMock :: Cmd MHandle -> Mock -> (Resp MHandle, Mock)
runMock (MkDir d) = first (Resp . fmap Unit) . mMkDir d
runMock (Open f) = first (Resp . fmap Handle) . mOpen f
runMock (Write h s) = first (Resp . fmap Unit) . mWrite h s
runMock (Close h) = first (Resp . fmap Unit) . mClose h
runMock (Read f) = first (Resp . fmap String) . mRead f
where first :: (a -> b) -> (a, x) -> (b, x)
comes from Data.Bifunctor
.
We can write a similar interpreter for IO; it will take a FilePath as an additional argument that it will use as a prefix for all paths; we will use this to run the IO test in some temporary directory.
runIO :: FilePath -> Cmd IO.Handle -> IO (Resp IO.Handle)
References
Our interpreter for IO takes real IO handles as argument; but will not have any real handles until we actually run the test. We need a way to generate commands that run in IO but don’t use real handles (yet). Here is where we see the first bit of infrastructure provided by quickcheck-state-machine
, references:
data Reference a r = Reference (r a)
where we will instantiate that r
parameter to either Symbolic
or Concrete
:
data Symbolic a = Symbolic Var
data Concrete a = Concrete a
In other words, a Reference a Concrete
is really just a wrapper around an a
; indeed, quickcheck-state-machine
provides
reference :: a -> Reference a Concrete
concrete :: Reference a Concrete -> a
However, a Reference a Symbolic
is a variable:
newtype Var = Var Int
An example of a program using symbolic references is
openThenWrite :: [Cmd (Reference IO.Handle Symbolic)]
openThenWrite = [
Open (File (Dir []) "a")
, Open (File (Dir []) "b")
, Write (Reference (Symbolic (Var 0))) "Hi"
]
This program corresponds precisely to our example from earlier: “open this file, open that file, then write to the first file we opened”. Commands can return as many symbolic references in their result values as they want
Open
creates a new reference, and so Var 0
returns to the handle returned by the first call to Open
.When we execute the test, those variables will be instantiated to their real values, turning symbolic references into concrete references. We will of course not write programs with symbolic references in them by hand; as we will see later, quickcheck-state-machine
provides infrastructure for doing so.
Since we will frequently need to instantiate Cmd
and Resp
with references to handles, we will introduce some special syntax for this:
newtype At f r = At (f (Reference IO.Handle r))
type f :@ r = At f r
For example, here is a wrapper around runIO
that we will need that executes a command with concrete references:
semantics :: FilePath -> Cmd :@ Concrete -> IO (Resp :@ Concrete)
semantics root (At c) = (At . fmap reference) <$>
runIO root (concrete <$> c)
This is really just a call to runIO
, with some type wrapping and unwrapping.
Relating the two implementations
When we run our tests, we will execute the same set of commands against the mock implementation and in real IO, and compare the responses we get after each command. In order to compare, say, a command “write to this MHandle
” against the mock file system to a command “write to this IOHandle
” in IO, we need to know the relation between the mock handles and the IO handles. As it turns out, the most convenient way to store this mapping is as a mapping from references to the IO handles (either concrete or symbolic) to the corresponding mock handles.
type HandleRefs r = [(Reference IO.Handle r, MHandle)]
(!) :: Eq k => [(k, a)] -> k -> a
env ! r = fromJust (lookup r env)
Then to compare the responses from the mock file system to the responses from IO we need to keep track of the state of the mock file system and this mapping; we will refer to this as the model for our test:
data Model r = Model Mock (HandleRefs r)
initModel :: Model r
initModel = Model emptyMock []
The model must be polymorphic in r
: during test generation we will instantiate r
to Symbolic
, and during test execution we will instantiate r
to Concrete
.
Stepping the model
We want to work towards a function
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
to step the model; we will gradually build up towards this. First, we can use the model to translate from commands or responses in terms of references to the corresponding commands or responses against the mock file system:
toMock :: (Functor f, Eq1 r) => Model r -> f :@ r -> f MHandle
toMock (Model _ hs) (At fr) = (hs !) <$> fr
Specifically, this can be instantiated to
toMock :: Eq1 r => Model r -> Cmd :@ r -> Cmd MHandle
which means that if we have a command in terms of references, we can translate that command to the corresponding command for the mock file system and execute it:
step :: Eq1 r => Model r -> Cmd :@ r -> (Resp MHandle, Mock)
step m@(Model mock _) c = runMock (toMock m c) mock
In order to construct the full new model however we also need to know how to extend the handle mapping. We can compute this by comparing the response we get from the “real” semantics (Resp :@ r
) to the response we get from the mock semantics (from step
), and simply zip the handles from both responses together to obtain the new mapping. We wrap all this up into an event:
data Event r = Event {
before :: Model r
, cmd :: Cmd :@ r
, after :: Model r
, mockResp :: Resp MHandle
}
and we construct an event from a model, the command we executed, and the response we got from the real implementation:
lockstep :: Eq1 r
=> Model r
-> Cmd :@ r
-> Resp :@ r
-> Event r
lockstep m@(Model _ hs) c (At resp) = Event {
before = m
, cmd = c
, after = Model mock' (hs <> hs')
, mockResp = resp'
}
where
(resp', mock') = step m c
hs' = zip (toList resp) (toList resp')
The function we mentioned at the start of this section is now easily derived:
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
transition m c = after . lockstep m c
as well as a function that compares the mock response and the response from the real file system and checks that they are the same:
postcondition :: Model Concrete
-> Cmd :@ Concrete
-> Resp :@ Concrete
-> Logic
postcondition m c r = toMock (after e) r .== mockResp e
where
e = lockstep m c r
We will pass this function to quickcheck-state-machine
to be run after every command it executes to make sure that the model and the real system do indeed return the same responses; it therefore does not need to be polymorphic in r
. (Logic
is a type introduced by quickcheck-state-machine
; think of it as a boolean with some additional information, somewhat similar to QuickCheck’s Property
type.)
Events will also be very useful when we label our tests; more on that later.
Constructing symbolic responses
We mentioned above that we will not write programs with symbolic references in it by hand. Instead what will happen is that we execute commands in the mock file system, and then replace any of the generated handles by new variables. Most of this happens behind the scenes by quickcheck-state-machine
, but we do need to give it this function to construct symbolic responses:
symbolicResp :: Model Symbolic
-> Cmd :@ Symbolic
-> GenSym (Resp :@ Symbolic)
symbolicResp m c = At <$> traverse (const genSym) resp
where
(resp, _mock') = step m c
This function does what we just described: we use step to execute the command in the mock model, and then traverse the response, constructing a new (fresh) variable for each handle. GenSym
is a monad defined in quickcheck-state-machine
for the sole purpose of generating variables; we won’t use it anywhere else except in this function.
Generating commands
To generate commands, quickcheck-state-machine
requires a function that produces the next command given the current model; this function will be a standard QuickCheck
generator. For our running example, the generator is easy to write:
generator :: Model Symbolic -> Maybe (Gen (Cmd :@ Symbolic))
generator (Model _ hs) = Just $ QC.oneof $ concat [
withoutHandle
, if null hs then [] else withHandle
]
where
withoutHandle :: [Gen (Cmd :@ Symbolic)]
withoutHandle = [
fmap At $ MkDir <$> genDir
, fmap At $ Open <$> genFile
, fmap At $ Read <$> genFile
]
withHandle :: [Gen (Cmd :@ Symbolic)]
withHandle = [
fmap At $ Write <$> genHandle <*> genString
, fmap At $ Close <$> genHandle
]
genDir :: Gen Dir
genDir = do
n <- QC.choose (0, 3)
Dir <$> replicateM n (QC.elements ["x", "y", "z"])
genFile :: Gen File
genFile = File <$> genDir <*> QC.elements ["a", "b", "c"]
genString :: Gen String
genString = QC.sized $ \n -> replicateM n (QC.elements "ABC")
genHandle :: Gen (Reference IO.Handle Symbolic)
genHandle = QC.elements (map fst hs)
A few comments on the generator:
- When we generate paths, we choose from a very small set of directory and file names. We are not really interested in testing that, for example, our implementation is Unicode-safe here; by picking from a small known set we generate tests that are more likely to contain multiple references to the same file, without having to add special provisions to do so.
- We cannot, of course, generate handles out of nowhere; fortunately, the model tells us which handles we have available, and so
genHandle
just picks one at random from that. We do not limit the generator to picking open handles: it is important to also test the behaviour of the system in the error case where a program attempts to write to a closed handle. - The elements function from
QuickCheck
, which picks a random element from a list, is partial: it will throw an error if the list is empty. We must be careful not to generate any commands requiring handles when we have no handles yet. - The reason for the
Maybe
in the type signature is that in some circumstances a generator might be unable to generate any commands at all given a particular model state. We don’t need to take advantage of this feature and therefore always return Just a generator.
We can also define a shrinker for commands, but we can start with a shrinker that says that individual commands cannot be shrunk:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []
Putting it all together
In order to actually start generating and running tests, quickcheck-state-machine wants us to bundle all of this functionality up in a single record:
sm :: FilePath -> StateMachine Model (At Cmd) IO (At Resp)
sm root = StateMachine {
initModel = initModel
, transition = transition
, precondition = precondition
, postcondition = postcondition
, invariant = Nothing
, generator = generator
, distribution = Nothing
, shrinker = shrinker
, semantics = semantics root
, mock = symbolicResp
}
We have defined all of these functions above, with the exception of precondition
. When quickcheck-state-machine
finds a failing test, it will try to shrink it to produce a smaller test case. It does this by trying to remove commands from the program and then checking if the resulting program still “makes sense”: does the precondition of all commands still hold?
The precondition should be as liberal as possible; for instance, the precondition should not require that a file handle is open before writing to it, because we should also check that errors are handled correctly. Typically, the only thing the precondition should verify is that commands do not refer to non-existent handles (i.e., if we remove a call to open, then subsequent uses of the handle returned by that call to open simply cannot be executed anymore). Thus, we will define:
precondition :: Model Symbolic -> Cmd :@ Symbolic -> Logic
precondition (Model _ hs) (At c) =
forall (toList c) (`elem` map fst hs)
Aside. Although we do not need it for our running example, one other thing a precondition might do is rule out test cases that we explicitly don’t want to test. For example, if our mock file system throws an error in some cases because a particular combination of arguments to a call is simply not supported, then we don’t want to flag this as a bug. A clean way to implement this is to extend the error type with a field that marks an error as an intentional limitation; then the precondition can simply rule out any commands that (in the model) would return an error with this mark. This keeps the definition of the precondition itself simple, and the logic of what is and what isn’t an intentional limitation lives in the implementation of the model itself.
Running the tests
Apart from some additional type class instances (see the code), we are now ready to define and execute the actual test:
prop_sequential :: FilePath -> QC.Property
prop_sequential tmpDir =
forAllCommands (sm rootUnused) Nothing $ \cmds ->
QC.monadicIO $ do
tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
$ checkCommandNames cmds
$ res QC.=== Ok
rootUnused :: FilePath
rootUnused = error "root not used during command generation"
All functions prefixed QC
are standard QuickCheck
functions. The others come from quickcheck-state-machine
:
forAllCommands
usesQuickCheck
’sforAllShrink
and instantiates it with the command generation and shrinking infrastructure fromquickcheck-state-machine
.runCommands
then executes the generated commands, validating the postcondition at every step.prettyCommands
renders those commands in case of a test failure.checkCommandNames
adds some statistics about the distribution of commands in the generated tests.
We can run the test in ghci
:
> quickCheck (prop_sequential "./tmp")
+++ OK, passed 100 tests:
56% coverage
3% [("MkDir",1),("Read",1)]
3% []
2% [("MkDir",1),("Open",1),("Read",1)]
2% [("MkDir",1)]
1% [("Close",1),("MkDir",1),("Open",5),("Read",4),("Write",2)]
...
It tells us that all tests passed, and gives us some statistics about the tests that were executed: in 3% of the cases, they contained a single MkDir
and a single Read
, 3% were completely empty, 2% contained one call to MkDir
, one call to Open
, one call to Read
, and so on.
Labelling
At this point you might conclude that you’re done. We have the real implementation, we have the mock implementation, they return identical results for all tests, what else could we want?
Let’s think back to those unit tests we were on the verge of writing, but stopped just in time because we remembered that we should generate unit tests, not write them:
- Write to two different files
- Write to a file and then read it
How do we know that our generated tests include these two cases (and all the other unit tests that we would have written)? We get some statistics from quickcheck-state-machine
, but it’s not terribly helpful. For example, the first line above tells us that 3% of our test cases contain one call to MkDir
and one call to Read
; but we know that that call to Read
must fail, because if these are the only two commands we execute, there aren’t any files for that Read
to read.
The solution is to label tests. For example, we might introduce labels, or tags, that correspond to the two unit tests above:
data Tag = OpenTwo | SuccessfulRead
We then need to write a function that looks at a particular test case and checks which labels apply. It is important to realize that this does not mean we are bringing back the same unit tests under a different guise: programs that will be labelled OpenTwo
must write to two different files, but may also do a whole bunch of other things.
We can use use the foldl
package to write such a labelling function in a convenient manner. The labelling function will look at the Events
and produce Tags
. To check if we open (at least) two different files, we keep track of all the successful calls to open; for SucessfulRead
we simply remember if we have seen a call to Read
with a non-error result:
tag :: [Event Symbolic] -> [Tag]
tag = Foldl.fold $ catMaybes <$> sequenceA [
openTwo
, successfulRead
]
where
openTwo :: Fold (Event Symbolic) (Maybe Tag)
openTwo = Fold update Set.empty extract
where
update :: Set File -> Event Symbolic -> Set File
update opened ev =
case (cmd ev, mockResp ev) of
(At (Open f), Resp (Right _)) ->
Set.insert f opened
_otherwise ->
opened
extract :: Set File -> Maybe Tag
extract opened = do
guard (Set.size opened >= 2)
return $ OpenTwo
successfulRead :: Fold (Event Symbolic) (Maybe Tag)
successfulRead = Fold update False extract
where
update :: Bool -> Event Symbolic -> Bool
update didRead ev = didRead ||
case (cmd ev, mockResp ev) of
(At (Read _), Resp (Right _)) ->
True
_otherwise ->
False
extract :: Bool -> Maybe Tag
extract didRead = do
guard didRead
return SuccessfulRead
(For a read to be successful, we must have created the file first – this is a property of the semantics, we don’t need to enforce this in the labeller.)
The commands we get back from the forAllCommands
function of quickcheck-state-machine
are of type Commands
. This is a simple wrapper around a list of Command
; Command
in turn bundles a command (Cmd
) along with its response (and the variables in that response). We can therefore easily turn this into a list of events:
execCmd :: Model Symbolic
-> Command (At Cmd) (At Resp)
-> Event Symbolic
execCmd model (Command cmd resp _vars) =
lockstep model cmd resp
execCmds :: Commands (At Cmd) (At Resp) -> [Event Symbolic]
execCmds = \(Commands cs) -> go initModel cs
where
go :: Model Symbolic
-> [Command (At Cmd) (At Resp)]
-> [Event Symbolic]
go _ [] = []
go m (c : cs) = e : go (after e) cs
where
e = execCmd m c
We can then define a variant on prop_sequential
above that replaces checkCommandNames
with our own labelling function (you could also use both, if you wanted to):
prop_sequential' :: FilePath -> QC.Property
prop_sequential' tmpDir =
forAllCommands (sm rootUnused) Nothing $ \cmds ->
QC.monadicIO $ do
tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
$ QC.tabulate "Tags" (map show $ tag (execCmds cmds))
$ res QC.=== Ok
Here tabulate
is the standard QuickCheck
function for adding multiple labels to a test case. If we now re-run our tests, we get
> quickCheck (prop_sequential' "./tmp")
+++ OK, passed 100 tests.
Tags (57 in total):
63% OpenTwo
37% SuccessfulRead
The numbers here are a bit awkward to interpret: it says that across all tests a total of 57 labels were found, and of those 57 labels, 63% were OpenTwo
and 37% were SuccessfulRead
. In other words, 63% 57 = 36 tags were OpenTwo
(36 tests out of 100 were labelled as OpenTwo
), and 37% 57 = 21 tags were SuccessfulRead
(21 tests out of 100 were labelled as SuccessfulRead
). Note that it is perfectly possible for these two sets of tests to overlap (i.e., a single program can be tagged as both OpenTwo
and SuccessfulRead
).
Inspecting the labelling function
So we have a generator, and we have labels for the unit tests that we didn’t write; have we covered our bases now? Well, how do we know that our labelling function is correct? We might seem to descent into infinite regress here, testing the tests that tests the tests… but it would be good to at least have a look at some test case examples and how they are labelled. Fortunately, this is functionality that QuickCheck
provides out of the box through a function called labelledExamplesWith
. We can define a simple wrapper around it that gives us the possibility to specify a particular seed (which will be useful once we start working on shrinking):
showLabelledExamples :: Maybe Int -> IO ()
showLabelledExamples mReplay = do
replaySeed <- case mReplay of
Nothing -> getStdRandom $ randomR (1, 999999)
Just seed -> return seed
putStrLn $ "Using replaySeed " ++ show replaySeed
let args = QC.stdArgs {
QC.maxSuccess = 10000
, QC.replay = Just (QC.mkQCGen replaySeed, 0)
}
QC.labelledExamplesWith args $
forAllCommands (sm rootUnused) Nothing $ \cmds ->
repeatedly QC.collect (tag . execCmds $ cmds) $
QC.property True
Instead of tabulate
we must use collect
to label examples when constructing examples; collect
takes a single label at a time though, so we use my all-time favourite combinator to repeatedly call collect
for all tags:
repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
repeatedly = flip . List.foldl' . flip
(I wonder if I can patent this function?)
When we run this, we see something like
> showLabelledExamples Nothing
Using replaySeed 288187
*** Found example of OpenTwo
Commands [
.. Open (File (Dir []) "b")
, .. Open (File (Dir []) "a")
]
*** Found example of SuccessfulRead
Commands [
.. Open (File (Dir []) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "b")
]
(I’ve tidied up the output a bit and removed some redundant information.)
This is looking good: both of these look like the kind of examples we would have written ourselves for these tags.
Standard shrinking
In fact, if we look at those test cases at the end of the previous section, you might be thinking that those examples look surprisingly good: not only are they indeed instances of those tags, but they are very small test cases: they are pretty much the unit tests that we would have written if we were so misguided as to think we need to write unit tests. Were we simply lucky?
No, we were not. QuickCheck
’s labelledExamples
not only searches for labelled test cases, it also tries to shrink them when it finds one, and continues to shrink them until it can shrink them no further without losing the label. The shrinker itself is implemented in quickcheck-state-machine
; if we disable it altogether and re-run the search for labelled examples, we might find an example such as the following for SuccessfulRead
, where for clarity I’ve marked all failing commands with an asterisk (*):
Commands [
.. Open (File (Dir ["z", "x", "y"]) "c") (*)
, .. MkDir (Dir ["z", "x") (*)
, .. Read (File (Dir ["z", "y", "y"]) "c") (*)
, .. Read (File (Dir []) "c") (*)
, .. MkDir (Dir [ "x" , "x" ]) (*)
, .. Read (File (Dir ["x", "z", "x"]) "b") (*)
, .. MkDir (Dir ["y"])
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["z"]) "b") (*)
, .. Open (File (Dir []) "a")
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["x"]) "b") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["x", "y"]) "b") (*)
, .. Open (File (Dir []) "b")
, .. MkDir (Dir ["y"]) (*)
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Open (File (Dir ["z" , "z" , "y"]) "a") (*)
, .. Open (File (Dir ["x" , "x" , "z"]) "c") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["y", "y"]) "b") (*)
, .. Read (File (Dir ["x", "y", "x"]) "a") (*)
]
This is looking significantly less ideal! If there was a bug in read
, then this would certainly not be a very good minimal test case, and not something you would want to debug. So how does quickcheck-state-machine
shrink tests? The basic approach is quite simple: it simply removes commands from the program. If the resulting program contains commands whose precondition is not satisfied (remember, for our running example this means that those commands would use handles that are no longer created) then it discards it as a possible shrink candidate; otherwise, it reruns the test, and if it still fails, repeats the process.
The large percentage of commands that are unsuccessful can easily be removed by quickcheck-state-machine
:
Commands [
.. MkDir (Dir ["y"])
, .. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
]
Both calls to MkDir
can easily be removed, and the resulting program would still be tagged as SuccessfulRead
; and the same is true for repeated closing of the same handle:
Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 1)))
]
At this point the shrinker cannot remove the second call to Open
because the second call to Close
depends on it, but it can first remove that second call to Close
and then remove that second call to Open
, and we end up with the minimal test case that we saw in the previous section:
Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "a")
]
Perfect.
Improving shrinking
Unfortunately, this does not mean that we can depend on quickcheck-state-machine
to solve all our shrinking problems. Consider this run of showLabelledExamples
:
> showLabelledExamples (Just 166205)
Using replaySeed 166205
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"])
, .. Open (File (Dir []) "c")
, .. Open (File (Dir ["x"]) "a")
]
This is indeed an example of a program in which we open at least two files; however, why is that call to MkDir
still there? After all, if there was a bug in opening more than one file, and the “minimal” test case would include a call to MkDir
, that would be a red herring which might send the person debugging the problem down the wrong path.
The reason that quickcheck-state-machine
did not remove the call to MkDir
, of course, is that without it the second call to Open
would fail: it tries to create a file in directory x
, and if that directory does not exist, it would fail. To fix this, we need to tell quickcheck-state-machine
how to shrink individual commands; so far we have been using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []
which says that individual commands cannot be shrunk at all. So how might we shrink an Open
call? One idea might be to shrink the directory, replacing for instance /x/y/a
by /x/a
or indeed just /a
. We can implement this using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir d) f) ->
[At $ Open (File (Dir d') f) | d' <- QC.shrink d]
_otherwise ->
[]
If we use this shrinker and run showLabelledExamples
a number of times, we will find that all the examples of OpenTwo
are now indeed minimal… until it finds an example that isn’t:
> showLabelledExamples (Just 980402)
Using replaySeed 980402
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
In this example we cannot shrink the directory to [] because the resulting program would try to open the same file twice, which is not allowed (“resource busy”). We need a better way to shrink this program.
What we want to implement is “try to replace the file path by some file in the (local) root. It is important to realize however that a shrinker such as
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open _ -> -- BAD EXAMPLE
[At $ Open (File (Dir []) f') | f' <- ["t1", "t2"]]
_otherwise ->
[]
is a bad idea. This shrinker tries to replace any file path with either /t1
or /t2
. However, this means that shrinking now never stops:
Open (File (Dir [] "t1"))
can be shrunk to
Open (File (Dir [] "t2"))
which can then be shrunk back to
Open (File (Dir [] "t1"))
and QuickCheck
will loop when trying to shrink the test case. It is important that there is a clear direction to shrinking.
An approach that does work is the following: any file path that doesn’t start with a t
can be replaced by path /t100
; moreover, any /tN
(for some number N) can be replaced by /tN'
for some number N’ < N:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir []) ('t' : n)) ->
[openTemp n' | n' <- QC.shrink (read n)]
Open _ ->
[openTemp 100]
_otherwise ->
[]
where
openTemp :: Int -> Cmd :@ Symbolic
openTemp n = At $ Open (File (Dir []) ('t' : show n))
Now
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
can shrink to
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir []) "t100")
]
at which point the call to MkDir
can be removed; this will eventually shrink down to
Commands [
.. Open (File (Dir []) "t0")
, .. Open (File (Dir []) "t1")
]
Dependencies between commands
It’s been a long road, but we are almost there. The last thing we need to discuss is how to shrink programs with dependencies. The “open at least two” example above was relatively easy to shrink because we could shrink one command at the time. Sometimes however there are dependencies between commands. For example, consider this “minimal” example of “successful read”:
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir ["z"]) "b")
]
As before, we have a call to MkDir
which should ideally be removed. However, if we tried to change the path in the call to Open
, the call to Read
would fail: both of these commands should refer to the same path. But shrinking can only change one command at a time, and this is important to keep the computational complexity (runtime cost) of shrinking down. What to do?
The problem is that we want that call to Read
to use the same path as the call to Open
, but we have no way to express this. The solution is to make this expressible. After all, we can already express “the handle returned by that call to Open
”; all we need to do is introduce a second kind of reference: a reference to a path.
The language of commands changes to
data Cmd fp h = Read (Expr fp) | -- rest as before
Cmd
gets an additional type argument fp
to record the types of paths, and instead of a File
, Read
now takes an Expr
as argument:
data Expr fp = Val File | Var fp
We can of course still use a concrete path, as before, but we can also use a variable: “use the same path as used in that call to open”. This means that Open
must return that reference, so Success
and Resp
get adjusted accordingly:
data Success fp h = Handle fp h | -- rest as before
newtype Resp fp h = Resp (Either Err (Success fp h))
Just like was the case for handles, when we actually run code all variables have been resolved, so the interpreter isn’t any more difficult:
runMock :: Cmd File MHandle -> Mock -> (Resp File MHandle, Mock)
runMock (Open f) = first (Resp . fmap (Handle f)) . mOpen f
runMock (Read e) = first (Resp . fmap String) . mRead (eval e)
-- rest as before
eval :: Expr File -> File
eval (Val f) = f
eval (Var f) = f
The IO interpreter is modified similarly. Most of the rest of the changes to the code are minor and mostly automatic. For example, At
must now instantiate both parameters
newtype At f r = At (f (Reference File r) (Reference IO.Handle r))
the model must record the mapping from file references now too
type FileRefs r = [(Reference File r, File)]
data Model r = Model Mock (FileRefs r) (HandleRefs r)
See Version2.hs
in the example package for details.
Crucially, we can now take advantage of this in the shrinker: when we see a call to Read
with a file path that we have seen before, we can shrink that to use a variable instead:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker (Model _ fs _) (At cmd) =
case cmd of
Read (Val f) ->
[At $ Read (Var r) | r <- mapMaybe (matches f) fs]
-- other cases as before
where
matches :: File -> (r, File) -> Maybe r
matches f (r, f') | f == f' = Just r
| otherwise = Nothing
This means that the problematic example
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Val (File (Dir ["z"]) "b"))
]
can now shrink to
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
At this point the shrinking for Open
that we defined above can kick in, replacing z/b
with /t100
, making the call to MkDir
redundant, and the example can ultimately shrink to
Commands [
.. Open (File (Dir []) "t0")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
Conclusions
Writing unit tests by hand is problematic for at least two reasons. First, as the system grows in complexity the number of interactions between the various features will be impossible to capture in hand written unit tests. Second, unit tests can only exercise the system in ways that the author of the unit tests foresees; often, bugs arise when the system is exercised in ways that were not foreseen. It is therefore important not to write unit tests, but rather generate them. We have seen how a library such as quickcheck-state-machine
can assist in generating and shrinking meaningful sequences of API calls.
We have also seen why it is important to label test cases, how to inspect the labelling function, and how to use labelling to improve shrinking. Without writing a shrinker for individual commands, the standard quickcheck-state-machine
shrinker already does a really good job at removing redundant commands. However, if we are serious about producing minimal test cases without red herrings that might lead debugging efforts astray (and we should be serious about that), then we also need to put some thought into shrinking individual commands.
Finally, we have seen how we can improve shrinking by making dependencies between commands explicit. This also serves as an example of a language with multiple types of references; the approach we put forth in this blog post essentially scales to an arbitrary number of types of references without much difficulty.
We have not talked about running parallel tests at all in this blog post. This is an interesting topic in its own right, but affects only the very outermost level of the test: prop_sequential
would get an analogue prop_parallel
, and nothing else would be affected; the quickcheck-state-machine
documentation shows how to do this; the original paper describing the approach (in Erlang) by John Hughes and others is also well worth a read. Finally, quickcheck-state-machine
is not the only library providing this functionality in Haskell; in particular, hedgehog, an alternative to QuickCheck
, does also.
The way we set up the tests in this blog post is not the only way possible, but is one that we believe leads to a clean design. The running example in this blog post is a simplified version of a mock file system that we use in the new Ouroboros consensus layer, where we use it to simulate file system errors when testing a blockchain database. The tests for that blockchain database in turn also use the design patterns laid out in this blog post, and we have used those design patterns also in a test we added to quickcheck-state-machine
itself to test the shrinker. Of course, being Haskellers, we would prefer to turn design patterns into actual code; indeed, we believe this to be possible, but it may require some more advanced type system features. This is something we want to investigate further.
Footnotes
1. Mapping from `IllegalOperation` to `HandleClosed` is bit too coarse, but suffices for the sake of this blog post.↩
2. This has a somewhat unpleasant untyped feel to it. However, if the design patterns proposed in this blog post are used, this barely affects any code that we write: we never (or almost never) have to pattern match on that `Success` type.↩
3. I’m eliding a `Typeable` constraint here.↩
4. Technically, `quickcheck-state-machine` uses a function `Resp :@ Symbolic -> [Var]` to find all the references “bound” by a command.↩
5. The `Close` call now refers to `Var 1` instead of `Var 0`, because `Open` now creates _two_ variables: the reference to the path and the reference to the handle.↩
Recent posts
2021: the year robots, and graffiti came to a decentralized, smarter Cardano by Anthony Quinn
27 December 2021
Cardano education in 2021: the year of the pioneers by Niamh Ahern
23 December 2021
Cardano at Christmas (and what to say if anyone asks…) by Fernando Sanchez
21 December 2021