Towards performance modeling of FastFabric

First published: 28 Feb 2019; Last updated: 28 Feb 2019

In our recent work [1], we presented a stochastic performance model for Hyperledger Fabric V1. Recently, the FastFabric architecture was presented [2], in which the authors propose improvements to the internal architecture of each peer and transaction phase, resulting in an overall throughput improvement of more than 6x. In this blog post, I would like to share my thoughts on the changes our Fabric performance model would need in order to accommodate the recommendations proposed in FastFabric. I have organized the discussion by the transaction phases/nodes involved:

Client
For a system with a single client peer running at high throughput, the client peer needs to process multiple transactions in parallel (see transition \text{T}_\text{Pr} in Fig. 2 of [1]), limited by the number of available hardware threads.
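
A rough way to capture this in the model's marking-dependent style is to scale the firing rate of \text{T}_\text{Pr} with the number of queued transactions, capped by the hardware threads. The notation below is illustrative rather than copied from [1]:

\text{rate}(\text{T}_\text{Pr} \mid m) = \min\big(\#\text{P}_\text{Pr}(m),\, N_\text{HT}\big) \cdot \mu_\text{Pr}

where \#\text{P}_\text{Pr}(m) is the number of transactions waiting at the client in marking m, N_\text{HT} is the number of hardware threads, and \mu_\text{Pr} is the processing rate of a single thread.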

Endorsing
As recommended in [2], a single endorsing peer should endorse multiple transactions in parallel. Our analysis led to a similar recommendation, except that we suggested having multiple endorsing peers per org., where any peer could endorse a transaction on behalf of the org. (denoted by Org.member in the endorsement policy). We should not confuse this with an endorsement policy such as OR(Org1.member, Org2.member, …), where a peer from any of the orgs could have endorsed the transaction.

Transitions \text{T}_\text{En0} and \text{T}_\text{En1} in our model capture this easily using marking-dependent firing rates (e.g., see transitions \text{T}_\text{Pr} and \text{T}_\text{VSCC}). One limitation of our model is that we implicitly assumed the endorsing and validating roles each have dedicated resources: \text{CPU}_\text{max} in \text{T}_\text{VSCC} refers to the max. hardware threads available for VSCC validation, even though one hardware thread was in practice also used by the endorser. Since [2] proposes dedicated resources for endorsers, our current model would capture the FastFabric architecture accurately.
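
Following the same capped, marking-dependent form as the \text{T}_\text{Pr} sketch above, the endorsement and VSCC rates would then draw on separate resource pools (again, illustrative notation rather than the exact symbols of [1]):

\text{rate}(\text{T}_\text{En} \mid m) = \min\big(\#\text{P}_\text{En}(m),\, N_\text{En}\big) \cdot \mu_\text{En}, \qquad \text{rate}(\text{T}_\text{VSCC} \mid m) = \min\big(\#\text{P}_\text{VSCC}(m),\, \text{CPU}_\text{max}\big) \cdot \mu_\text{VSCC}

where N_\text{En} denotes the hardware threads dedicated to endorsement and \text{CPU}_\text{max} those dedicated to VSCC validation, with no thread shared between the two pools.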

Ordering
a) Transmission from Client to the Ordering service (Transition \text{T}_\text{Tx})
From our model and measurements, we found that this step was the performance bottleneck. We measured it as the time between a client sending the transaction and the Kafka node writing it. From the observations in [2], it seems that significant delays were added because 1) the orderer was not processing txs. in parallel, and 2) there was communication overhead between the orderer and the Kafka node.

From a modeling perspective, we might need to consider a marking-dependent firing rate for \text{T}_\text{Tx} (or perhaps an additional immediate transition and place). With the improvements described in [2], the firing-time characteristics for \text{T}_\text{Tx} would be much better than those from our setup.
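
To get a feel for why parallel transaction processing at the orderer matters, here is a toy Monte Carlo sketch, with made-up rates that are not measurements from [1] or [2], comparing a sequential orderer against one that serves several transactions concurrently:

```python
import random

def mean_sojourn(arrival_rate, service_rate, servers, num_tx=50_000, seed=1):
    """Crude FCFS multi-server queue standing in for the ordering stage.

    Returns the mean time a transaction spends between reaching the
    orderer and being written out (queueing + service), in seconds.
    """
    random.seed(seed)
    t = 0.0
    free_at = [0.0] * servers            # time at which each worker becomes free
    total = 0.0
    for _ in range(num_tx):
        t += random.expovariate(arrival_rate)          # Poisson arrivals
        k = min(range(servers), key=free_at.__getitem__)
        start = max(t, free_at[k])                     # wait for the earliest free worker
        free_at[k] = start + random.expovariate(service_rate)
        total += free_at[k] - t
    return total / num_tx

# Illustrative numbers only: 900 tx/s offered load, 1000 tx/s per worker.
print(mean_sojourn(900.0, 1000.0, servers=1))   # sequential processing
print(mean_sojourn(900.0, 1000.0, servers=8))   # parallel processing
```

With a single worker the mean sojourn time sits near 10 ms (the queue is heavily loaded), while with several workers it drops close to the bare 1 ms service time.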

b) Block transmission from ordering service to validating peers
It is interesting that the throughput peaks at a particular block size (around 100 in their case). In our paper, we had only analyzed the latency, which increases linearly with the block size (as expected).
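
A back-of-the-envelope way to see the linear growth (my own rough approximation, not a result from [1] or [2]): with transactions arriving at rate \lambda and a block cut after B transactions (ignoring the block timeout), a transaction waits on average for about half a block's worth of further arrivals before its block is cut,

\mathbb{E}[\text{block-cut wait}] \approx \frac{B-1}{2\lambda},

so this component of the end-to-end latency grows linearly with B.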

Committer (Validator)
It is interesting that the committer throughput peaks at a particular block size (around 100 in their case). In our paper, we had only analyzed the latency, which increases linearly with the block size (as expected).

From our model's perspective, the most important change is that the ledger write (both block storage and the LevelDB update) can be offloaded to separate nodes. In our model, this time-consuming step is captured by transition \text{T}_\text{ledger}. It was the performance bottleneck of the committing peer in our setup (perhaps more so because our machines used HDDs rather than SSDs). To reflect the proposed changes, we can remove this transition (and \text{P}_\text{Ledger}) altogether and measure the throughput at \text{T}_\text{MVCC} instead.
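
In SRN terms, the committer throughput in the revised model is then simply the steady-state expected firing rate of \text{T}_\text{MVCC} (a standard SRN output measure; the notation is generic rather than copied from [1]):

X_\text{commit} = \sum_{m \,:\, \text{T}_\text{MVCC}\ \text{enabled in}\ m} \text{rate}(\text{T}_\text{MVCC} \mid m)\, \pi(m)

where \pi(m) is the steady-state probability of marking m.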

The rest of the changes in the committer would improve the firing-time characteristics of the relevant transitions, given that the peer's resources are now largely freed up.

Comments about their experimental setup

  1. All Fabric nodes were running natively rather than in Docker containers, which might not be the case in the real world [4].
  2. Server hardware – Pretty solid hardware, with 24 hardware threads, comparable to that in [3], which I would recommend. (Our setup with four hardware threads at each peer was somewhat lame.)
  3. Workload – Transaction complexity is similar to ours (two key-value reads/writes). All transactions are valid. There are no comments about the transaction arrival characteristics (tx. arrivals in our setup followed a Poisson arrival process).
  4. Workload generator – No details (was it Caliper?)
  5. Endorsement – Five peers are involved, but there are no details of the endorsement policy. Given such good performance, I am assuming it is OR(Org1.member, Org2.member, …, Org5.member).
  6. Time duration of experiments – The expts. were run with 100k transactions and repeated 1000 times. Given that the transaction throughput was reaching 10k to 30k tps, each expt. probably lasted only a few seconds. I am not sure this was enough to warm up the setup. Perhaps the authors could have run the expts. for a longer “fixed” duration and clipped the first and last few seconds as ramp-up and ramp-down (see the small sketch after this list).
  7. Orderer – Only one orderer is used. We had the same limitation, since Caliper supported only one orderer at that time (and perhaps still does). However, the newly proposed parallelism in processing client transactions seems to help significantly.
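
Regarding point 6, a simple way to report steady-state throughput from a longer run is to clip ramp-up and ramp-down windows from the per-second throughput series. A minimal sketch of what I have in mind; the function name and window lengths are hypothetical, not taken from either paper:

```python
def steady_state_tps(tx_counts_per_sec, ramp_up_s=10, ramp_down_s=10):
    """Average committed-transaction throughput over the steady-state window.

    tx_counts_per_sec: transactions committed in each second of the run.
    ramp_up_s / ramp_down_s: seconds clipped at the start and end (hypothetical
    defaults; choose them by inspecting the throughput curve).
    """
    window = tx_counts_per_sec[ramp_up_s:len(tx_counts_per_sec) - ramp_down_s]
    if not window:
        raise ValueError("run is shorter than ramp-up + ramp-down")
    return sum(window) / len(window)
```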

References

  1. H. Sukhwani et al., “Performance Modeling of Hyperledger Fabric (Permissioned Blockchain Network)”. In IEEE International Symposium on Network Computing and Applications (NCA), 2018.
  2. C. Gorenflo et al., “FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions per Second”. arXiv:1901.00910, 2019. https://arxiv.org/abs/1901.00910
  3. P. Thakkar et al., “Performance Benchmarking and Optimizing Hyperledger Fabric Blockchain Platform”. In IEEE MASCOTS, 2018.
  4. https://stackoverflow.com/questions/48070380/does-hyperledger-fabric-need-docker