juicebox-systems/juicebox-hsm-realm

GitHub: juicebox-systems/juicebox-hsm-realm

Stars: 19 | Forks: 5

# Juicebox HSM Realm This repo contains the source code for Juicebox's HSM-backed realms. An HSM-backed realm can be accessed via the [Juicebox SDK] alongside other realm implementations, such as [Juicebox software realms]. The HSM-backed realm is relatively complex to understand and operate. It's intended for use with programmable [Hardware Security Modules]. These can provide better privacy properties than commodity hardware, but they impose some constraints and complications on the architecture of the HSM realm. For development and testing purposes, you can run an HSM realm on commodity hardware. It's much more complicated to operate than using [Juicebox software realms] and doesn't offer any better security, so doing so is only useful for pre-production environments. ## Architecture Overview An HSM realm consists of _load balancers_, _agents_, _HSMs_, _cluster managers_, and it relies on [Google Cloud Bigtable], [Google Cloud Pub/sub] and [Google Cloud Secret Manager]. The architecture is designed to scale up to hundreds of HSMs in one realm and tolerates the failure (by stopping) of some of those HSMs. This section provides a brief overview of the architecture. The architecture assumes an adversary could inspect and control all components and the network, except that the adversary does not have visibility or control over HSM internals. Such an adversary could cause permanent denial of service in many ways, but it is our goal that the adversary cannot learn a PIN-protected secret without guessing the PIN in the limited number of attempts set by the user. Clients communicate with _load balancers_ over HTTPS. Client requests include an auth token and encapsulate an HSM request that is encrypted with a [Noise protocol]. The load balancer checks the auth token (using a key from Secret Manager) and produces a _record ID_ from its claims. The load balancer routes the request to a particular _agent_ based on this record ID. The _agent_ is a host-side daemon that is paired with each HSM. The system is designed for use with PCIe-attached HSMs, where an agent runs on each physical server that has an HSM. The agent receives the request from the load balancer and generally forwards it to the HSM. The HSMs are programmable and run portions of the code in this repo. Depending on context, we use the term _HSM_ to mean either the physical device or our code running on it. Through an initialization process, each HSM in the realm is given a copy of secret cryptographic keys. This allows the HSM to decrypt the client's Noise messages and process requests. The HSMs in the realm use a consensus protocol to provide strong consistency (linearizability) while tolerating the failure of some HSMs. The secrets-management application requires strong consistency to prevent an adversary from obtaining more guesses by causing a network partition, for example. The HSMs use an _authenticated consensus protocol_ rather than a traditional [consensus protocol]. In the authenticated consensus protocol, the HSMs add a layer of validating freshness and authenticity on top of a traditional consensus protocol that runs on commodity hardware. This limits the amount of state that the HSMs need to maintain and persist, and it lets the realm implementation leverage existing systems. The implementation uses [Google Cloud Bigtable] as an external storage system providing scalability and traditional consensus (Bigtable provides atomic operations within a single row). The HSMs are organized into replication groups, which use majority quorums and are leader-based. Each replication group should have a small number of HSMs and can tolerate the failure of any minority of those; for example, a group with 5 HSMs can tolerate the failure of any 2. The overall realm throughput scales up with more groups. Requests for a group are routed to a _leader_ HSM. Other HSMs in the group are called _witnesses_. Ideally, there should be one leader at any given time. When a leader fails, a witness can be promoted to be a leader. Each group maintains a _log_ for its operations. The HSMs create and process log entries, but they only persistently store metadata about a single recent log entry. Each group is responsible for serving up to one contiguous range of record IDs. The records for that range are organized in a _Merkle tree_. When a leader HSM modifies a Merkle tree, it creates a new _log entry_, which contains the root hash of the new Merkle tree. Both the logs and the Merkle trees are stored in Bigtable. Each replication group has fixed membership. New replication groups can be created dynamically, and ownership of a range of record IDs can be transferred from one group to another with an _ownership transfer protocol_. The replication groups themselves are the source of truth for what range of record IDs they serve. Agents register themselves with a service discovery table (stored in Bigtable), and load balancers query the agents to learn how to route requests. ## Code Structure This repo depends on a few crates from the [Juicebox SDK] repo (which is included as a Git submodule at the path `sdk`). This is a diagram of the current crates, showing their local dependencies: ![Dependency Graph](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/ba022368f3234127.png) Entrust is an HSM vendor, and the Entrust-specific code is named accordingly. Some of it is automatically generated from their C headers. Note that the HSM code and its dependencies do not use the Rust standard library. The standard library makes assumptions about the operating system that may not be valid on all HSMs. These crates depend only on the `core` and `alloc` crates, not `std`. There are also various tools in this repository: - `cluster_bench` runs a client benchmark against existing realm(s). - `cluster_cli` is used to manage realms from the command-line, including initializing realms and transferring ownership. - `codegen` generates bindings from Google's Protocol Buffers definitions. - `entrust_init` is used to set up Entrust HSMs before they can participate in a realm. - `entrust_ops` is used to manage Entrust HSMs more safely and conveniently. - `src/bin/demo_runner` runs a large realm on localhost and, by default, runs the demo against it. - `src/bin/hsm_bench` runs a small realm on localhost and, by default, runs a benchmark against it. ## Builds System dependencies: - [Rust](https://rustup.rs/) - [Protocol Buffers compiler](https://github.com/protocolbuffers/protobuf#protocol-compiler-installation), as located and used by [prost-build](https://docs.rs/prost-build/latest/prost_build/#sourcing-protoc). This is needed for `opentelemetry-otlp` and can also be used to regenerate the Google Cloud API messages. On Debian, `apt install protobuf-compiler` is sufficient. - See the section on the Bigtable emulator below, which you'll also need. Then: - `cargo test --all` to run the tests - to run the client rust demo: cd sdk cargo build -p juicebox_demo_cli cd .. cargo run --bin demo_runner -- --demo sdk/target/debug/demo - to run the swift demo: cd sdk/swift ./ffi.sh cd demo swift build cd ../../.. cargo run --bin demo_runner -- --demo sdk/swift/demo/.build/debug/demo ### Cross Compile This section describes how to cross-compile the HSM code to PowerPC so it can run on an Entrust HSM. The PowerPC CPU in the HSM doesn't support Altivec features, but Rust's prebuilt libraries for PowerPC assume those are available. We need to build Rust's standard library crates ourselves. This normally [requires the nightly toolchain](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std), but we use a `RUSTC_BOOTSTRAP=1` hack to enable unstable features on the stable toolchain (the [Linux kernel](https://github.com/torvalds/linux/blob/706a741595047797872e669b3101429ab8d378ef/Makefile#L608) does this too). Install pre-requisites: rustup target add powerpc-unknown-linux-gnu rustup component add rust-src sudo apt install qemu-user qemu-user-binfmt gcc-12-powerpc-linux-gnu The `build-ppc.sh` and `test-ppc.sh` scripts build and test the PowerPC version. In addition to the options set in the scripts, the `.cargo/config.toml` file is used to set the linker and CPU target. ## Local Bigtable emulator You'll need the Bigtable emulator to run offline. You may also want the Cloud Bigtable CLI tool, which works with the emulator and is installed the same way. You can install these either using the hefty `gcloud` SDK or using the Go compiler. ### Option 1: Install using the `gcloud` SDK: Run: gcloud components update beta gcloud components install cbt And start the emulator: gcloud beta emulators bigtable start --host-port localhost:9000 ### Option 2: Install using Go: Run: go install cloud.google.com/go/bigtable/cmd/emulator@latest go install cloud.google.com/go/cbt@latest And start the emulator: emulator -host localhost -port 9000 ### Using cbt `cbt` is a Cloud Bigtable CLI tool. We'll make an alias for using it with the local emulator, then create a table. alias lbt='BIGTABLE_EMULATOR_HOST=localhost:9000 cbt -creds /dev/null -project prj -instance inst' List tables: lbt ls You can create tables like this: lbt createtable tab families=fam ## Local Pub/Sub emulator You'll need the Google pub/sub emulator to run offline. By default the tests run this in a docker container. You can optionally install the google cloud SDK and emulator and run it directly instead. For docker, pull the image first, then run tests as normal. docker pull gcr.io/google.com/cloudsdktool/google-cloud-cli:emulators To use the SDK directly, install the pubsub-emulator component. gcloud components install pubsub-emulator Then set the `PUBSUB_JAR` environment variable to the path containing the emulator jar. This is typically `~/google-cloud-sdk/platform/pubsub-emulator/lib/cloud-pubsub-emulator-0.8.6.jar` but can vary based on how the cloud SDK was originally installed. If you need to run the emulator manually, you can do docker run -i --init -p 9091:8085 gcr.io/google.com/cloudsdktool/google-cloud-cli:emulators gcloud beta emulators pubsub start --host-port=0.0.0.0:8085 or gcloud beta emulators pubsub start --project=prj --host-port 0.0.0.0:9091 ## OpenTelemetry traces The code sends OpenTelemetry traces over OTLP (GRPC) to `http://localhost:4317`. You can run a Jaeger instance, for example, to receive these and view them: Run: COLLECTOR_OTLP_ENABLED=true ./jaeger-1.42.0-linux-amd64/jaeger-all-in-one --collector.otlp.grpc.host-port=:4317 Open . ## Datadog otlp_config: receiver: protocols: grpc: endpoint: 0.0.0.0:4317 ## TLS Certificates The load balancer requires connections to use TLS. The load_balancer process takes cmdline arguments to specify where the key & cert files are. Demo and hsm_bench will generate a self signed cert for use during these runs. This requires you to have `openssl` in your `$PATH`. ## Further Reading - Blog post: [Running a Juicebox hardware realm](https://juicebox.xyz/blog/running-a-hsm-realm) - Whitepaper: [Juicebox Merkle-Radix Tree](https://juicebox.xyz/assets/whitepapers/merkleradix_revision1_20230629.pdf) - Whitepaper: [Juicebox protocol](https://juicebox.xyz/assets/whitepapers/juiceboxprotocol_revision7_20230807.pdf) - Blog Post: [Key to Simplicity: Squeezing the hassle out of encryption key recovery](https://juicebox.xyz/blog/key-to-simplicity-squeezing-the-hassle-out-of-encryption-key-recovery)
标签:通知系统