Improving Haskell’s big numbers support
Work by IOHK engineers is part of the latest release of the Glasgow compiler
28 July 2020 9 mins read
Haskell is vital to IOHK’s work in ensuring Cardano is a secure blockchain for the future of decentralized finance. As part of this, we use the language to develop the Plutus smart contract platform, and you may have read about the training courses run by our engineers, including a Haskell course in Mongolia this month. Smart contract applications are a powerful way for a distributed network to generate value, allowing individuals and businesses to agree to conditions and automatically execute exchanges of information and wealth, without relying on third parties. Plutus contracts contain a substantial amount of code that is run off the chain, on users’ computers. To make it easy to create portable executables for these, we want to compile the off-chain Haskell code into JavaScript or WebAssembly. To reach that goal, we take part in the development of the Glasgow Haskell Compiler (GHC), GHCJS (Haskell to JavaScript compiler) and Asterius (Haskell to WebAssembly compiler).
Recently we have been working on improving GHC’s support for big numbers, ie, numbers larger than 64 bits (both the Integer and Natural types in Haskell). We have developed an implementation of big number operations in Haskell that is faster than the previous one (integer-simple). We have also improved the way GHC deals with the different implementations, making it more robust and easier to evolve. These contributions are part of the latest GHC release, version 9.0.
Background
Every Haskell compiler has to support arbitrary-precision integers (see Haskell 98 and Haskell 2010 reports, section 6.4). GHC is no exception and as such it provides the ubiquitous Integer and Natural types (‘big numbers’ for the rest of this post).
Until now, GHC could use one of two packages for this support:
- integer-gmp: uses the GNU MP library (GMP) via FFI; LGPL-licensed; good performance.
- integer-simple: a pure Haskell implementation; BSD3-licensed; poor performance.
Choosing one or the other depended on license, performance, and cross-compilation considerations. In some cases, GMP’s LGPL license can be problematic, especially if you use static linking, which is required on some platforms, including Windows. When it comes to performance, integer-simple is sometimes several orders of magnitude slower than integer-gmp, as discussed below. And with cross-compilation, some target platforms, such as JavaScript, may not support GMP.
The situation was already unfortunate but there were additional issues:
- Each implementation had its own way of representing big numbers (array of unboxed words or list of boxed words). GHC was aware of the selected implementation and produced different code for each. It could lead to bugs – even ‘insanity’! – when big numbers were exchanged between processes compiled with different implementations. Moreover, it made the compiler code more complicated because it had to deal with this discrepancy (eg, when inspecting heap objects at runtime in GHCi).
- Similarly, because of the different internal representations, there are at least 71 packages on Hackage (among them some widely used ones such as bytestring and text) that explicitly depend on the integer-gmp package or that provide a flag to depend on either integer-gmp or integer-simple. This is a maintenance burden because each code path can have its own bugs and must be tested on CI.
All this meant that every new big number implementation was a daunting task. First, the interface to implement was very large (Integer and Natural types and all their operations). Then, we needed to ensure that GHC’s rewrite rules (constant folding) still worked for the implementation, the packages mentioned above needed to be fixed, and new Cabal flags added (by the way, Cabal flags are Boolean so they can’t be easily extended to support more than two options). Finally, GHC’s build system needed to be modified. No wonder it never happened.
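To give a flavour of what such rewrite rules look like, here is a toy RULES pragma of our own (purely illustrative; GHC’s real constant-folding rules for Integer live inside the compiler and are more involved):

```haskell
module RulesDemo where

-- A stand-in for a bignum operation; NOINLINE keeps the name visible
-- to the rule matcher instead of being inlined away.
integerAdd :: Integer -> Integer -> Integer
integerAdd = (+)
{-# NOINLINE integerAdd #-}

zeroInteger :: Integer
zeroInteger = 0
{-# NOINLINE zeroInteger #-}

-- Rewrite 'x + 0' to 'x' at compile time: a tiny example of
-- constant folding.
{-# RULES
"integerAdd/zero" forall x. integerAdd x zeroInteger = x
  #-}
```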
Fortunately, most of these issues are now fixed in the latest release, GHC 9.0.
The ghc-bignum package
Starting from GHC 9.0, big numbers support is provided by a single package: ghc-bignum. This provides a Haskell implementation of big numbers (native-backend) that is faster than integer-simple’s (performance figures are given below), is also BSD3-licensed, and uses the same representation of big numbers as integer-gmp.
The different big number implementations are now internal backends of the ghc-bignum library, and there should be no observable difference between backends except performance. To enforce this, we even have a meta-backend, used during tests, that performs every operation with two backends (native-backend and another selected one) and checks that their results are the same.
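A minimal sketch of this idea (our own formulation, not ghc-bignum’s actual test harness):

```haskell
-- Differential testing sketch: run an operation through two backends
-- and fail loudly if they ever disagree. 'native' and 'other' are any
-- two implementations of the same operation.
checked :: (Eq r, Show r) => String -> (a -> r) -> (a -> r) -> a -> r
checked name native other x
  | rN == rO  = rN
  | otherwise = error ("backend mismatch in " ++ name ++ ": "
                       ++ show rN ++ " /= " ++ show rO)
  where
    rN = native x
    rO = other x
```

For a binary operation such as addition, both implementations would be wrapped with uncurry and compared on pairs of operands.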
A pure Haskell implementation can’t really expect to beat the performance of heavily optimized libraries such as GMP, hence integer-gmp has been integrated into ghc-bignum as a backend (gmp-backend).
Adding a big numbers backend is now much easier. The interface to implement is minimal and documented. A new backend doesn’t have to provide the whole implementation up front: operations provided by native-backend can be used to fill the holes while another backend is being developed. The test framework doesn’t have to be rewritten for each backend, and results can be checked automatically against native-backend with the meta-backend mentioned above. We hope backends using other libraries, such as OpenSSL’s libcrypto integers or BSDNT, will be developed in the near future.
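The ‘fill the holes’ pattern can be sketched with toy types of our own (ghc-bignum’s real internal interface uses unboxed word arrays and different names):

```haskell
import Data.Word (Word64)

-- Toy stand-in for the word-array representation.
type Limbs = [Word64]

-- A backend is just a record of primitive operations.
data Backend = Backend
  { bnAdd :: Limbs -> Limbs -> Limbs
  , bnMul :: Limbs -> Limbs -> Limbs
  }

-- The portable Haskell implementations (bodies elided in this sketch).
nativeBackend :: Backend
nativeBackend = Backend
  { bnAdd = \_ _ -> undefined  -- portable limb-by-limb addition
  , bnMul = \_ _ -> undefined  -- portable schoolbook multiplication
  }

-- A new backend under development: it only accelerates multiplication
-- so far and inherits everything else from the native backend.
myBackend :: Backend
myBackend = nativeBackend { bnMul = \_ _ -> undefined {- eg, an FFI call -} }
```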
The ghc-bignum package also has a third backend, ffi-backend, which doesn’t provide an implementation per se but performs an FFI call for each operation. So when this backend is selected, ghc-bignum must be linked with an object providing the implementation, or the compiler must replace these calls with platform-specific operations (eg, JavaScript/CLR/JVM big number operations). This is similar to what the Asterius compiler was doing – replacing GMP FFI calls with JavaScript BigInt calls – but in a cleaner way, because GMP isn’t involved any more.
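The shape of such an import might look like this (the symbol name and signature below are illustrative assumptions, not ghc-bignum’s actual ABI):

```haskell
import Data.Word (Word)
import Foreign.Ptr (Ptr)

-- The Haskell side only declares the operation; whoever links the
-- program (or a compiler such as Asterius targeting JavaScript BigInt)
-- must provide the 'example_bignat_add' symbol.
foreign import ccall unsafe "example_bignat_add"
  c_bignat_add
    :: Ptr Word  -- result limb buffer
    -> Ptr Word  -- first operand limbs
    -> Int       -- first operand length in limbs
    -> Ptr Word  -- second operand limbs
    -> Int       -- second operand length in limbs
    -> IO Int    -- number of limbs written to the result
```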
A major advantage of ghc-bignum is that all the backends use the same representation for big numbers: an array of words stored in little-endian order. This representation is also used by most big number libraries. Formerly, integer-simple was a notorious exception because it used a Haskell list to store the words, partly explaining its poor performance. Now, any package wanting to access the representation of big numbers just has to depend on ghc-bignum. Cabal flags and CPP code are no longer required to deal with the different implementations. However, conditional code may be needed during the transition from the integer-* packages to ghc-bignum.
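For instance, code that inspects the representation can now match on the constructors directly. A minimal sketch, assuming the constructor names exported by GHC.Num.Integer in GHC 9.0 (and a build-depends on ghc-bignum):

```haskell
import GHC.Num.Integer (Integer (IS, IP, IN))

-- IS carries a machine-word-sized integer; IP and IN carry the
-- little-endian word array for big positive and negative values.
describe :: Integer -> String
describe (IS _) = "small: fits in a machine word"
describe (IP _) = "big positive: little-endian array of words"
describe (IN _) = "big negative: little-endian array of words"
```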
To make the transition easier, the integer-gmp-1.1 package is still provided, but it has been rewritten to depend on ghc-bignum and to provide some backward-compatible functions and pattern synonyms. Note, however, that some functions that were only available in integer-gmp-1.0.* (eg, the prime number test and extended GCD) have been removed in integer-gmp-1.1. We expect these very specific functions to be exported by packages such as hgmp instead. Alternatively, someone could implement Haskell versions of these functions in native-backend.
GHC code has been simplified and made faster. Big numbers types and constructors are now known to the compiler (‘wired-in’), in the same way as other literals, so GHC doesn’t have to read interface files each time it wants to generate code using them. The unified representation avoids any need for two code paths, which makes the code more robust and easier to test.
Performance
We have compared the performance of native-backend against the latest integer-simple and integer-gmp implementations (Figure 1), measuring the time needed to compute basic operations. The benchmark setup was as follows:
- Platform: Linux 5.5.4 on an Intel Core i7-9700K CPU running at 3.60GHz.
- The three GHC bindists were built with Hadrian using the ‘perf’ flavor.
- The integer-gmp and integer-simple bindists were built from commit 84dd96105.
- The native-backend bindist was built from the ghc-bignum branch rebased on commit 9d094111.
- Computations were performed with positive integers of the following sizes: small – 1 word (64-bit); medium – 8 words (512-bit); big – 76 words (4,864-bit).
Figure 1. Native-backend and integer-gmp are faster than integer-simple in almost all cases (note the logarithmic scale)
Figure 1 shows that native-backend and integer-gmp are almost always faster than integer-simple. The only exceptions are when we add or subtract a small number (1 word) from a big one, as these operations are particularly well suited to integer-simple’s bignum representation (a list): the tail of the list remains unchanged and is shared between the big numbers before and after the operation. With an array representation, the tail has to be duplicated in memory.
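To see why, here is an illustrative toy version of the list representation (ours, not integer-simple’s actual code): adding a one-word number rewrites only the leading cons cells, and once the carry dies out, the rest of the original list is shared.

```haskell
import Data.Word (Word64)

-- Least-significant limb first.
type Limbs = [Word64]

addSmall :: Word64 -> Limbs -> Limbs
addSmall x []     = [x]
addSmall x (l:ls)
  | s >= l    = s : ls             -- no overflow: the tail 'ls' is shared
  | otherwise = s : addSmall 1 ls  -- overflow: propagate the carry
  where s = l + x                  -- wraps modulo 2^64 on overflow
```

With a 76-word array, by contrast, all 76 words must be copied even when only the first one changes.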
Division with integer-simple performs so badly that native-backend is 40 times faster in all the tested cases.
Sometimes, native-backend is even faster than integer-gmp (eg, addition/subtraction with a small/medium number). The GMP library is probably at least as good as native-backend, but the latter avoids FFI calls, which may explain the better performance. Otherwise, native-backend is still slower than integer-gmp, but these results are expected because it only implements basic algorithms and doesn’t use vectorised or otherwise optimised processor instructions.
When it comes to GHC performance tests, we took GHC HEAD compiled with integer-gmp as our baseline and compared the results with ghc-bignum’s gmp-backend. There are no regressions, and noticeable improvements in memory use can be seen in Table 1.
Next, we compared metrics obtained with native-backend to those obtained with GHC HEAD built with integer-simple. The new Haskell implementation results in noticeable changes (Table 2). Note that the first four tests were disabled with integer-simple because they took too long or failed to complete (heap overflow), but they all pass with native-backend. Also, T10678, the final test in the table, performs a lot of additions of a small number to a big number. As we saw above, this is the only case where the integer-simple representation is better: in most iterations, only the head of the list is modified, without duplicating the tail. That is reflected again in this result.
Finally, we compared native-backend with gmp-backend: this tells us how far our Haskell implementation is from our best backend. Noticeable changes are reported in Table 3.
Conclusion
We are very proud to have made this contribution to GHC. IOHK and the whole community now benefit from the improved performance and robustness of the compiler. Programs that use the new Haskell implementation of big numbers (native-backend) should get a performance boost when they switch from integer-simple.
We are also eager to see what the community will do with this work. In particular, it should now be easier to create backends based on other libraries, and there is a lot of room for optimization in native-backend. We encourage everyone to test this new implementation and to report any issues on GHC’s bug tracker.
Cardano’s path to decentralization
Three mini-protocols are vital to the network’s operation
9 July 2020 6 mins read
The next releases of Cardano and the Ouroboros protocol contain changes that guide us towards decentralization and the Shelley era. This Deep Dive post explains how we are approaching this phase. With the release of the Praos algorithm for Shelley, which comes with the staking process, stake pools can be set up so ada owners can delegate their stake. The networking team is now focused on two features that will enable us to run a fully decentralized system. Let me first briefly describe how the networking is designed and engineered and give an overview of where we are at the moment. This post will start at the top of our abstractions and go down the stack. Hopefully, this will be an interesting journey through our design.
Typed protocols
At the very top of the stack is IOHK’s typed-protocols framework, which allows us to design application-level protocols. The top-level goal of protocol design for Ouroboros is to distribute chains of blocks and transactions among participants in the network, and that is achieved by three mini-protocols:
- chain-sync is used to efficiently sync a chain of headers;
- block-fetch allows us to pull blocks;
- tx-submission is used to submit transactions.
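To give a feel for the approach, here is a heavily simplified sketch in the spirit of typed-protocols (our own toy types, not the library’s actual API): protocol states are promoted to the type level, and the message GADT only permits the transitions the protocol allows, so an ill-behaved peer implementation fails to type-check.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

type Header = String  -- stand-in for a real block header type

-- Chain-sync-like states: the state determines who may speak next.
data St = StIdle | StBusy | StDone

data Message (from :: St) (to :: St) where
  MsgRequestNext :: Message 'StIdle 'StBusy            -- consumer asks for the next header
  MsgRollForward :: Header -> Message 'StBusy 'StIdle  -- producer replies
  MsgDone        :: Message 'StIdle 'StDone            -- consumer terminates
```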
All three mini-protocols were carefully designed after considering the threats that can arise when running a decentralized system. This is very important, because cyber attacks are common, especially against targets that offer strong incentives to attackers. There is a range of possible attacks at this level that we need to defend against, and one type we were particularly careful about is resource-consumption attacks. To defend against these, the protocols allow the consumer side to stay in control of how much data it will receive, and ultimately to keep its use of resources (eg, memory, CPU, and open file descriptors) below a certain level.
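One common way to get this property, shown here in a toy form (our illustration, not the actual protocol messages), is to make every transfer consumer-driven: the producer may never send more than the consumer has explicitly asked for, so the consumer’s memory use is bounded by its own requests.

```haskell
-- The consumer grants a window; the producer must stay within it.
newtype Window = Window Int

data ConsumerMsg   = RequestItems Window  -- "send me at most n items"
data ProducerMsg a = ReplyItems [a]       -- length must not exceed the grant
```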
If you are interested in more details about typed-protocols, we gave talks and ran workshops at Haskell events last year, and these were very well received by the engineering community. In particular, see Duncan Coutts’ talk at Haskell eXchange and the workshop I ran at Monadic Party.
Role of the multiplexer
TCP/IP protocols form the most ubiquitous protocol suite deployed on the internet. They are also some of the most studied protocols and are available on almost every operating system and computer architecture, so are a good first choice for our purposes. TCP/IP gives us access to a two-way communication channel between servers on the internet. The only high-level requirement of typed-protocols is an ordered delivery of network packets, which is guaranteed by the TCP protocol.
Operating systems limit the number of connections at any one time. For example, Linux, by default, can open 1,024 connections per process, but on macOS the limit is just 256. To avoid excessive use of resources we use a multiplexer. This allows us to combine communication channels into a single one, so we can run all three of our mini-protocols on a single TCP connection. Another way to save resources is to use the bi-directionality of TCP: this means that one can send and receive messages at both ends simultaneously. We haven't used that feature in the Byron Reboot era, but we do want to take advantage of it in the decentralized Shelley era.
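In a toy form (our illustration, not the real multiplexer code), multiplexing just means tagging each segment on the single TCP bearer with the mini-protocol it belongs to:

```haskell
import qualified Data.ByteString as BS

-- The three logical channels sharing one TCP connection.
data MiniProtocol = ChainSync | BlockFetch | TxSubmission
  deriving (Show, Eq)

-- Each segment carries a tag plus a bounded chunk of that channel's
-- byte stream; the receiving side routes on the tag.
data Segment = Segment
  { segProtocol :: MiniProtocol
  , segPayload  :: BS.ByteString
  }

demux :: Segment -> (MiniProtocol, BS.ByteString)
demux (Segment p bs) = (p, bs)
```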
The peer-to-peer governor
We want to use bi-directional connections, running all three mini-protocols in both directions, so we need to have a component that is aware which connections are currently running. When a node connects to a new peer, we can first check if it already has an open connection with that peer, which would be the case if the peer had connected to it already. But this is only one part of connection management that we will need.
Another requirement comes from the peer-to-peer governor. This part of the system is responsible for finding peers and choosing which of them to connect to. Making a connection takes some time, depending on factors such as the quality of the network connection and the physical distance. Ouroboros is a real-time system, so it helps to hide some of that latency. It wouldn't be good if the system were under pressure and yet still needed to connect to new peers; it's much better if the system maintains a handful of spare connections that are ready to take on any new task. A node should be able to make an educated decision about which existing connections to promote to get the best performance. For this reason we decided to have three types of peer:
- cold peers are known to the node, but there is no established network connection;
- warm peers have a connection, but it is only used for network measurements and none of the node-to-node mini-protocols is used;
- hot peers have a connection, which is being used by all three node-to-node mini-protocols.
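A hypothetical sketch of how these states, and the governor’s moves between them, might be modelled (the names are ours, not the actual ouroboros-network API):

```haskell
data PeerTemp = Cold | Warm | Hot
  deriving (Show, Eq, Ord)

-- The governor moves peers between temperatures one step at a time.
data GovernorAction
  = PromoteColdToWarm  -- establish a connection, start measuring it
  | PromoteWarmToHot   -- start the three node-to-node mini-protocols
  | DemoteHotToWarm    -- stop the mini-protocols, keep the connection
  | DemoteWarmToCold   -- close the connection, remember the peer
  deriving (Show, Eq)
```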
A node can potentially know about thousands of cold peers, maintain up to hundreds of warm peers, and have tens of hot peers (20 seems a reasonable figure at the moment). There are interesting and challenging questions around the design of the policies that will drive the peer-to-peer governor's decisions. The choice of policies will affect network topology and alter the performance characteristics of the network, including performance under load or malicious action, and will shape how quickly blocks (whose diffusion time depends on block size) and transactions are distributed. Since running such a system has many unknowns, we'd like to roll it out in two phases. For the first phase, which will be released in a few weeks (probably shortly after Praos, also known as the Shelley release), we want to be ready with all the peer-to-peer components but still running in a federated mode. In addition, we will deliver the connection manager together with a server that accepts connections, and integrate both with the peer-to-peer governor. In this phase, the peer-to-peer governor will be used as a subscription mechanism. Running various private and public testnets, together with our extensive testing, should give us enough confidence before releasing this to mainnet.
In the second phase, we will extend the mini-protocols with a gossip protocol. This will allow exchange of information about peers, finalize network quality measures, and plug them into the block-fetch logic (which decides from whom to download a block) as well as the peer-to-peer governor. At this stage, we would like to design and run some experiments to discover how peer-to-peer policies shape the network, and check how they recover from any topologies that are suboptimal (or adversarial).
I hope this gives you a good sense of where we are with the design and implementation of decentralization for Cardano, and our roadmap towards the Shelley era. You can follow further progress in our weekly reports.
This is the third of the Developer Deep Dive technical posts from our software engineering teams.
Cardano Virtual Summit 2020: Shelley Edition
Bringing the global community together under one virtual roof.
3 July 2020 5 mins read
Day One of the Cardano Virtual Summit 2020: Shelley Edition started with a splash. IOHK CEO, Charles Hoskinson, took the stage to welcome almost 10k registered attendees from around the world to the digital conference space. Each participant was met with exhibition areas, lots of virtual hangout space, and most importantly, session programming across five digital stages, covering everything from governance and community, to science and enterprise. With so much content to enjoy, we can’t hope to summarize it all here. So take a look for yourself. Register now to enjoy all of today’s sessions either live or on-demand right after the summit closes this evening.
To bring you up to speed, here are the key takeaways from Thursday.
Shelley is here
The summit comes right after a significant milestone in the Shelley rollout. The pace of delivery has been ramping up considerably over the last couple of months, leading to the June 30th deployment of the first Shelley-complete node. The stage is now set for the continued rollout, with a high degree of confidence in the dates we have laid out, with delegation and staking due on mainnet in August. However, the confirmation of Shelley’s successful deployment was just the first of many announcements at the summit.
Goguen and Voltaire are coming
Within 120 days, we’ll be rolling out the native asset standard for Cardano. Within 150 days, we’ll see the roll-out of Plutus Foundations, the first point of entry for smart contracts on Cardano. Sustaining a decentralized project means more than simply allowing users to run the protocol and build on the platform. They must also decide what the platform does and where it goes. Project Catalyst and Voltaire seek to give that power to the community through decentralized democracy. A pool of funds and a voting protocol will allow the Cardano community to make their voices heard on proposed advancements to the blockchain. Ada holders will also be able to submit improvement proposals to help shape the direction of the ecosystem. Improvement proposals might include software updates, technical development options, funding decisions, and choices about the long-term plans of the ecosystem. Once delivered, the Cardano network will become a self-governed and user-run platform.
Atala PRISM
Day One continued on a successful note with the announcement of PRISM, a decentralized identity solution that enables people to own their personal data and interact with organizations seamlessly, privately, and securely. It will encourage better practice in consumer data privacy and security by offering users ‘self-sovereign’ digital identities, without Big Tech intermediaries accessing, storing or sharing personal data.
PRISM also promises to open up access to a blockchain marketplace of financial and social services for millions of users who might not previously have had access to banking and financial services. This will enable low-income populations to store and share personal information like credentials, land deeds, and health records. Through PRISM, users can also access financial products like loans and insurance.
Atala PRISM is built for compatibility with other blockchains and legacy systems to make it accessible worldwide. A pilot currently underway in Georgia will give employers the ability to verify graduate qualifications instantly. But PRISM is only one part of a suite of enterprise-grade ‘layer 2’ software solutions designed to bring advanced, highly flexible functionality to blockchains. Other elements of the evolving Atala suite of products include traceability for supply chains and the ability to monitor the movement of goods globally.
In an impressive demo, the team showcased the platform and mobile app with a smart city walkthrough. The app will work with Android and iOS devices, while the platform also supports paper wallets and smart cards.
cFund
Last but not least, we announced a landmark partnership with Wave Financial Group. The $10 million pledged to the development of Cardano’s ecosystem represents the first venture capital to support IOHK. Together, the two companies will create a $20 million incubator fund to help startups that want to build on the blockchain. This provides several years of financial runway to target seed and early-stage opportunities. Wave Financial is looking to invest in global companies created on Cardano, with funding of up to $500k each. The move aligns both firms’ interests in driving Wave to become the industry’s leading digital asset management group while ensuring IOHK becomes the vanguard technology provider in the space.
We’ll revisit all these stories over the weeks ahead to give you a deeper dive into our plans and what they mean. But, as excited as we are by these opportunities for future adoption and growth, Day One was also a day of reflection.
Because we would not be where we are without the incredible support of the Cardano community, which helped to bring us here and joined from all around the world to celebrate Shelley. Connections were made between companies, platforms... and people. Day One was full of big announcements, and boosted by some fascinating speaker sessions with world-class scientists, technologists, and thought leaders sharing their vision of the future. It was also full of optimism, of ideas, buzzing with community contribution and opportunity. Thanks to everyone who helped make it such a special day. And see you back at the summit later today!
Documenting Cardano: a single source of developer truth, fit for the future
1 July 2020 3 mins read
With yesterday’s release of the first Shelley-complete node on the Cardano mainnet, a new era has begun. Between this, the launch today of a brand new Cardano website and brand identity, and tomorrow’s Cardano Virtual Summit, the dawn of Shelley is casting fresh light on IOHK's future. It has also highlighted the need for a documentation framework fit for our rapidly growing ecosystem.
We have written and gathered a substantial amount of technical material on Cardano over the years. Inevitably, this has developed organically and has been of variable quality and utility; not all of it remains accurate or useful. The deployment of Shelley, the growth in developer interest, and the evolution of Cardano have moved us from project to product, creating the need to take a fresh look at how we explain Cardano.
Users and consumers of Cardano need to have clear, concise, and relevant material that matches the quality of the code on which the blockchain is built. Documentation needs to be correct and useful, and well organised and focused. In addition, we have identified the need to map categories of information to each user group, or persona. We're building a best-in-class blockchain, and this necessitates high-quality supporting documentation. The old Cardano docs site was in serious need of re-organization and updating, and our technical writers have been busy doing just that. Welcome to the new Cardano docs.
Documentation for the future
Cardano is a complex product with complex technological underpinnings. However, for the developer community and wider audience to embrace it, we need them to understand how it can provide rewarding experiences. They need to know what Cardano can do and how it can solve problems. To ensure that we have the right information for all users, we are reorganising the documentation suite and mapping what each audience needs.
Over the past few months, IOHK has been engaged in a company-wide effort to revamp all the content about the Cardano project. Our technical writing team has been immersed in transforming our previous repository, working closely with our developer teams to create a single source of technical truth: a resource of effective, informative, and up-to-date information that provides value to a broad set of audiences, from exchange partners, stake pools, and experienced blockchain developers to skilled crypto enthusiasts and those who simply want to understand Cardano from a technical perspective.
So today, we’re launching a new documentation site. Instead of a static repository, the site is now a living, breathing source of information on Cardano, plugged into the heart of our development methodology. Just like Cardano, this is an ever-evolving entity. Components are added regularly as we strive to improve on the content we are delivering. New functionality is implemented all the time. We also intend to publish a lot more documentation based on our plans and community feedback. We need to build this out based on what works for our users, whether they sit in the technology or commercial space.
With the virtual summit and our continuing momentum bringing a host of new faces into the ecosystem, we now have a jumping-off point for a single source of developer truth that we can build on.