Introduction

The Nimbus Nim-Beacon-Chain (NBC) project is an implementation of the Ethereum 2 Beacon Chain specification in the Nim programming language.

The Auditors' Handbook aims to be provide a comprehensive introduction to:

  • The Nim programming language, as used in the project.
  • The NBC project.
  • The dependencies of the project.

A particular focus will be given to features related to safety, correctness, error handling, testing, fuzzing, or inspecting Nim code.

One of the major highlights of Nim is that it compiles to C or C++ before compiling to native code. All techniques available to audit C code can be used to audit Nim.

The dependencies NBC rely on are detailed per audit phase in the build system and dependencies section.

The Nim Programming Language

The Nim programming language is a compiled language, with strong static typing.

The rest of the Handbook will assume that Nim-by-example was read.

Nim compilation process is in 2 phases, first lowering the Nim code to C, C++ or Javascript. Then for machine code, rely on the C/C++ compiler to produce the final code.

Nim can target any combination of C compiler, host OS and hardware architecture as long as the C compiler supports it.

Installing

Nim can be installed via:

Nim Vagrant https://github.com/status-im/nim-vagrant is unmaintained but might help setting up your own virtualized environment.

We target Nim 1.2.2 and should be compatible with the latest stable, Nim 1.2.4

Casings

Nim has unusual partial case insensitivity for identifiers. The rationales being:

  • Preventing bugs when using SDL_QUIT instead of SDL_Quit.
  • Having consistent casing in a codebase even when relying on external dependencies with different casing.

The convention used in Nim-Beacon-Chain is:

  • snake_case for fields and procedures names from the Ethereum spec
  • MACRO_CASE for Ethereum spec constants
  • PascalCase for all types (Ethereum or additional)
  • camelCase for our own additional code
  • PascalCase for our additional constants

In summary, we respect the Ethereum spec for Ethereum specified identifiers and use Nim NEP-1 for the rest.

Checking the C code

By default the intermediate C code produced by the Nim compiler is available at

  • $HOME/.nim/compiled_project_d on UNIX systems
  • $HOME/nimcache/compiled_project_d on Windows

The suffix _d indicates a debug build, the suffix _d indicates a release build

Compiler options

At the time of writing, NBC targets Nim v1.2.2 compiler. The build system is at https://github.com/status-im/nimbus-build-system No patching is done at the moment on the Nim compiler, we use vanilla v1.2.2 upstream.

Nim compiler offers debug, release with -d:release and danger with -d:danger flag.

The debug and -d:release build differ by, the verbosity of stacktraces and passing -O3 or equivalent to the C compiler.

Runtime checks (overflow, array bounds checks, nil checks, ...) are still included in -d:release build. We also choose to have verbose stacktraces in NBC.

A danger build optimizes away all runtime checks and debugging help like stackframes. This might have a significant impact on performance as it may enable optimizations that were not possible like optimizing tail calls. This is not used in NBC.

References

Further resources are collected at:

Compiler configuration

Style Guide

Nim Routines

Nim offers several kinds of "routines" that:

  • do computation
  • produce side-effect
  • generate code

Those are:

  • proc and func
  • method
  • converter
  • iterator
  • template
  • macro

proc and func

proc and func are the most basic routines.

At the moment, Nim requires forward declaration of proc and func. Also it prevents circular dependencies, this means that a procedure is always coming from one of the imports.

Additionally, all dependencies are submodules and a proc can be found by greping procname*, the * being the export marker.

The only exception being the standard library. Procedures from the standard library are all listed in "The Index".

Function call syntax

Nim provides flexible call syntax, the following are possible:

prof foo(a: int) =
  discard

foo(a)
foo a
a.foo()
a.foo

Additionally this is also possible for strings:

let a = fromHex"0x12345" # Without spaces

Nim doesn't enforce namespacing by default but is an option

let a = byteutils.fromhex"0x12345"

Parameters

Mutable parameters must be tagged with var

TODO

Symbol resolution

If 2 procedures are visible in the same module (a module is a file) and have the same name the compiler will infer which to call from the arguments signatures. In case both are applicable, the compiler will throw an "ambiguous call" compile-time error.

Note that a procedure specialized to a concrete type has priority over a generic procedure, for example a procedure for int64 over a procedure for all number types.

func and side effect analysis

In Nim a proc is considered to have side-effect if it accesses a global variable. Mutating a declared function parameter is not considered a side-effect as there is no access to a global variable. Printing to the standard output or reading the standard input is considered a sideeffect.

func are syntactic sugar for proc without sideeffects. In particular this means that func behaviors are fully determined by their input parameters.

In the codebase, logging at the trace level are not considered a sideeffect.

Additionally some logging statements and metrics statement may be in an explicit {.noSideEffect.}: code-block.

Returning values

There are 3 syntaxes to return a value from a procedure:

  1. The return statement
  2. The implicit result variable
  3. The "last statement as expression"
proc add1(x: int): int =
  return x + 1

proc add2(x: int): int =
  result = x + 2

proc add3(x: int): int =
  x + 3

The main differences are:

  1. return allows early returns, in particular from a loop.
  2. result offers Return Value Optimization and Copy Elision which is particularly valuable for array types.
  3. Requires the last statement to be a valid expression. This is particularly interesting for conditional return values as forgetting to set the value in a branch will be a compile-time error, for example:
    proc select(ctl: bool, a, b: int): int =
      if ctl:
        echo "heavy processing"
        a
      else:
        echo "heavy processing"
        b
    
    Omitting a or b will be a compiletime error, unlike
    proc select(ctl: bool, a, b: int): int =
      if ctl:
        echo "heavy processing"
        return a
      else:
        echo "heavy processing"
        # Forgot to return b
    
    proc select(ctl: bool, a, b: int): int =
      if ctl:
        echo "heavy processing"
        result = a
      else:
        echo "heavy processing"
        # Forgot to result = b
    

Due to the differences we prefer using the "last statement as expression" unless

  • copying the type is expensive (SHA256 hash for example)
  • or we need early returns

Ignoring return values

Unlike C, return values MUST be used or explicitly discarded.

Mutable return values

TODO

At a low-level

Argument passing

Nim passes arguments by value if they take less than 3*sizeof(pointer) (i.e. 24 bytes on 64-bit OS) and passes them by pointer with the C backend or reference with the C++ backend if they are bigger. Mutable arguments are always passed by pointer.

This behavior can be changed on a type-by-type bases by tagging them {.bycopy.} or {.byref.}. This is only used for interfacing with non-Nim code.

Stacktraces

With --stacktrace:on, Nim create a stackframe on proc entry and destroys it on exit. This is used for reporting stacktraces.

NBC is always compiled with --stacktraces:on

NBC uses libbacktrace to have less costly stacktraces.

Name in the C code or Assembly

proc and func are materialized in the produced C code with name-mangling appended at the end. For the purpose of building Nim libraries, the name can be controlled by:

  • {.exportc.} so that the generated C name is the same as Nim
  • `{.exportc: "specific_name".} to generate a specific name

method

methods are used for dynamic dispatch when an object has an inherited subtype only known at runtime.

method are dispatched using a dispatch tree in the C code instead of a VTable.

There might be some cases where method were used not for their intended purpose

converter

Converters are procedures that are implicitly called on a value to change its type.

For example with a fictional option type that automatically extracts the boxed type.

type Option[T] = object
  case hasValue: bool
  of true:
    value: T
  else:
    discard

converter get[T](x: Option[T]): T =
  x.value

let x = Option[int](hasValue: true, value: 1)
let y = Option[int](hasValue: true, value: 2)

let z = x + y

Even though the + operator is not defined for Option[int] it is defined for int and Nim implicitly calls the converter.

converter are seldom used in the codebase as we prefer explicit over implicit.

Note that in case an operation is defined on both the convertible and the converted type, the operation without conversion should be preferred however the compiler might throw an ambiguous call instead.

Iterators

Iterators are construct that transforms a for loop.

For example to iterate on a custom array collection

const MaxSize = 7

type SmallVec[T] = object
    buffer*: array[MaxSize, T]
    len*: int

iterator items*[T](a: SmallVec[T]): T =
  for i in 0 ..< a.len:
    yield a.data[i]

Now iterating becomes

for value in a.items():
  echo a

A singly-linked list forward iterator could be implemented as

iterator items[T](head: ref T): ref T =
  ## Singly-linked list iterator
  assert: not head.isNil
  var cur = head
  while true:
    let next = cur.next
    yield cur
    cur = next
    if cur.isNil:
      break

a doubly-linked list backward iterator as

iterator backward[T](tail: ptr T): ptr T =
  var cur = tail
  while not cur.isNil:
    let prev = cur.prev
    yield cur
    cur = prev

an iterator to unpack individual bits from a byte as:

iterator unpack(scalarByte: byte): bool =
  yield bool((scalarByte and 0b10000000) shr 7)
  yield bool((scalarByte and 0b01000000) shr 6)
  yield bool((scalarByte and 0b00100000) shr 5)
  yield bool((scalarByte and 0b00010000) shr 4)
  yield bool((scalarByte and 0b00001000) shr 3)
  yield bool((scalarByte and 0b00000100) shr 2)
  yield bool((scalarByte and 0b00000010) shr 1)
  yield bool( scalarByte and 0b00000001)

In all cases, the syntax to iterate on the collection remains:

for value in a.items():
  echo a

for value in b.backward():
  echo b

for bit in s.unpack():
  echo s

The echo is inlined at "yield".

Iterators are not present in the produced C code, they are always inlined at the callsite.

Iterators are prone to code bloat, for example

iterator iterate[T](s: seq[T], backward: bool): T =
  if backward:
    for i in s.len-1 .. 0:
      yield s[i]
  else:
    for i in 0 ..< s.len:
      yield s[i]

for value in s.iterate(backward = false):
  ## Long-series of operations
  echo value

The long series of operation will be duplicated.

items and pairs

The items and pairs iterator are special cased and implictly call if there is respectively one and two iteration variables hence:

for x in collection:
  echo x

will automatically call the items proc defined for the collection (or error)

for x, y in collection:
  echo x
  echo y

will automatically call the pairs proc defined for the collection (or error)

fields and fieldPairs

fields and fieldsPairs are iterator-like magic, that allow "iterating" on an object field. Note that those are unrolled at compile-time.

Closures and closure iterators

Will be covered in a dedicated section.

They are the backbone of Chronos, our async/await framework and also have a major potential for memory leaks.

template

templates in Nim allows raw code substitution.

templates are hygienic and typechecked unlike the C preprocessor. Also they create their own scope unless tagged with the {.dirty.} pragma.

A major issue with templates is that as they "copy-paste" code, it is very easy to misuse them and do a computation twice.

For instance

proc foo(): int =
  echo "launch missile"
  return 1

template doSomething(a: int) =
  process(a)
  log(a)

This would be transformed to:

process(foo())
log(foo())

and triggers the "launch missile" side-effect twice.

Another issue with templates is that they may not generate stacktraces properly as they are not materialized in the C code.

Symbol visibility and {.inject.}

TODO

macro

TODO

The do notation

TODO

Operators and bit manipulation

Operators

A set of symbol and keywords can be used as infix operators Nim supports operator overloading.

Those symbols are:

=     +     -     *     /     <     >
@     $     ~     &     %     |
!     ?     ^     .     :     \

The keyword operators are

and or not xor shl shr div mod in notin is isnot of as

In-particular:

  • bitwise and, or, flip, xor are denoted and, or, not, xor instead of using a symbol
  • shift left and right are shl and shr
  • division and modulo are div and mod

Implementation-defined behaviors

mod is defined as the mathematical remainder, like C. With signed integers a mod b has the same sign as a

shr of a signed integer will not preserve the sign bit. ashr can be used for arithmetic right shift.

This distinction was introduced recently and may lead to subtle bugs, also ashr relies on the C compiler actually lowering >> to SAR for signed integer. This is specified for GCC and Clang (https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation) but implementation defined in general.

Operator precedence

Operator precedence is specified described in the manual:

https://nim-lang.org/docs/manual.html#syntax-precedence

Additional system operators

Nim system exports additional operators with a % like +% and -%. Those cast the signed integer operand to unsigned and cast the result back to signed. This is intended to use with the ByteAddress type for pointer arithmetic

Bit manipulation

https://github.com/status-im/nim-stew/blob/master/stew/bitops2.nim

Closures and closures iterators

TODO

At a low-level

Closures and closures iterators are implemented via a pointer + an environment that stores the captured state necessary to execute the function.

The Nim compiler has a limited form of borrow checking and prevents capturing mutable variable or openArray (pointer+length pair).

It otherwise copies the capture variables in case of objects with value semantics or increment the reference count in case of ref object.

The Chronos library needs to generate a closure iterator for all async proc instantiated which may lead to long-lived ref objects and unreclaimable memory.

This may also extend to other resources like socket connections or file descriptors and a significant effort is underway to track memory usage and detect such scenarios:

Nim data types

Builtin types

Numerical types

For integers, Nim offers uint8, uint16, uint32 and uint64 types as well as int8, int16, int32, int64 types.

Operations on signed integers are overflowed and underflowed checked unless the flag --overflowChecks:off is used.

Nim also offers int and uint types, which have the same size as the platform word size, so 4 bytes when compiled for 32-bit OS or 8 bytes for a 64-bit OS.

Conversion between int and int32 or int64 must be explicit except for string literals.

Integer literals default to int.

float32 and float64 maps to C float and double. float is an alias to float64 whether on a 32-bit or 64-bit platform

Binary blobs

Nim has a specific byte type which behaves like uint8.

It is the preferred type to represent binary blobs, i.e. we use seq[byte] over string, seq[char] or seq[uint8] for binary data.

Range types

Nim allows defining ranges of valid value which will be runtime checked everytime the value changes for example Nim defines by default type Natural = range[0 .. high(int)]. If the value of a Natural becomes less than 0 an RangeError exception will be thrown.

This is valuable to catch / prevent underflows.

Sequences

Sequences are heap-allocated containers similar to C++ vectors.

They have value-semantics and are copied on assignments.

Sequences have a data pointer, reserved memory and current length

Strings

Sequences are heap-allocated containers.

They have value-semantics and are copied on assignments.

Strings have a data pointer, reserved memory and current length.

The data is terminated by \0.

Nim strings automatically decays without copy to C strings in FFI calls

The representation is the same as a seq[byte] or seq[char] except for the terminating nul-byte, including within the GC tracking data structures Consequently, strings can be cast to seq[byte] but inversely casting seq[byte] to string will lead to non nul-terminated string in C FFI and buffer overflows.

Record / structs

Nim has 3 kinds of record types

  • Value

    type Foo = object
      field0: int
      field1: float
    
  • Reference

    type Foo = ref object
      field0: int
      field1: float
    
  • Pointer

    type Foo = ptr object
      field0: int
      field1: float
    

Value

A value object is allocated on the stack (unless if nested in heap-allocated types).

The equality check is structural.

Copy is done by copyMem (C memcpy)

Object variants do not have an equality operator set by default

Reference

A reference object is allocated on the heap and managed by the GC.

They are not thread-safe.

The equality check is by reference

Copy is done by copying the reference and increment the reference count

Pointer

A pointer object is manually managed, it can be on the heap (malloc) or on the stack (alloca)

Casting and low-level memory representation

Conversions

Casting to a signed integer will lead to a range check. Conversion to an unsigned integer even from a negative signed integer will NOT lead to a range check (https://github.com/nim-lang/RFCs/issues/175) https://nim-lang.org/docs/manual.html#statements-and-expressions-type-conversions

Casting integers

Unsigned integer casts behave like C.

Upcasting will lead to zero extension Downcasting will lead to truncation

Signed integer ranges will not be checked with a cast.

Casting to/from other types

Casting to or from any other types will be done through:

  • union casts by default
  • or C conversion if the type is a pointer.

In practice this means that the bit-pattern will be reinterpreted as the new type, similar to C++ reinterpret cast.

Nim memory management

Nim memory management is on a per-type basis.

Plain objects and char and numerical types are allocated on the stack.

Sequences and strings are allocated on the heap but have value semantics. They are copied on assignment

Ref types are allocated on the heap and have reference semantics, i.e. an unique instance can be held by multiple variables and only when all those variables go out-of-scope is the ref type discarded.

By default Nim uses a deferred reference counting GC. Additionally, if the type can lead to cycles, Nim will add "mark-and-sweep" passes to collect them.

Destructors

TODO

Nim allocator

Nim GCs are backed by a TLSF allocator which allows Nim to provide soft real-time guarantees if needed.

Analyzing memory leaks

Nim can be compiled with -d:useMalloc to bypass the TLSF allocator and directly use malloc/free

References

Generics and Static types

Nim types can be parametrized by types (generics) or compile-time values (static)

For example

type
  MySeq[T] = object
    len, reserved: int
    data: ptr UncheckedArray[T]

The generics can be restricted

type
  MySeq[T: int32 or int64] = object
    len, reserved: int
    data: ptr UncheckedArray[T]

With static types

type
  SmallSeq[MaxLen: static int, T] = object
    len: int
    data: array[MaxLen, T]

Arrays, openarrays, varargs

Arrays

TODO

Openarrays

Openarray are a parameter-only type that represent a (pointer, length) pair. In other languages they are also known as slices, ranges, views, spans.

The name openArray is inherited from Pascal, Oberon and Modula 2

Arrays and sequences are implictily converted to openArray.

The compiler has a limited form of escape analysis to prevent capturing openarrays in closures or returning them.

UncheckedArrays

TODO

Correctness, distinct, mutability, effects, exceptions

The Nim compiler provides several constraints that can be used to enforce proper usage of variables, types and error handling at compile-time.

One was already mentioned in previous paragraphs:

  • Side-Effect analysis via using func or {.noSideEffect.} (in the routines chapter)

Note that range types currently work at runtime only

Side-Effect

As mentioned in the routines chapter, using a func or a proc tagged with {.noSideEffect.} will prevent reading or writing to global variables (i.e. variables that are neither parameters or locals).

Note that side-effect analysis cannot analyse raw emitted C code

Additionally, allocating a sequence or a string, even if they technically access a global memory allocator, is not considered a side-effect.

The compiler will ignore statements in a {.noSIdeEffect.} block for the purpose of side-effect analysis. At the moment this is only used for trace and some debug logs, as writing to stdout/stderr is considered writing to a global variables and so a side-effect.

not nil

The compiler exposes a "not nil" annotation for reference and pointer types. This enforces that parameters are proven always initialized in tagged:

  • procedures
  • types

This is not used in the codebase as a more powerful prover is required for our application.

Currently, the compiler warns when it cannot prove that a result reference is not nil.

distinct types

A distinct type is a type that has the same representation as a base type at a low-level but cannot be used in its stead.

type Miles = distinct float32
type Meters = distinct float32

Procedures accepting:

  • float32 will not accept Miles or Meters
  • Miles will not accept float32 or Meters
  • Meters will not accept float32 or Miles

distinct type can reuse the base type procedures and fields via the borrow annotation as described in the manual (https://nim-lang.org/docs/manual.html#types-distinct-type)

Enforcing exception handling

The codebase uses a mix of Result and Exceptions for error handling with option types and bool in some cases.

As an effort to sanitize error handling and ensure that all exception paths are handled, we use the effect tracking system the following way:

proc foo() {.raises: [].} =
  discard

The procedure above will refuse to compile if its body can throw an unhandled exception.

proc foo() {.raises: [ValueError].} =
  discard

The procedure above will refuse to compile if its body can throw an exception besides a ValueError.

In particular Nim distinguishes between Defects, which are non-recoverable, and Exceptions, which we should recover from.

For our purposes, we allow all procedures to throw a Defect (for example an assertion broken), this is done by adding {.push raises:[Defect]} at the start of a file

{.push raises:[Defect]}

proc foo1() =
  discard

proc foo2() =
  discard

{.pop.}

Mutability

Only variables declared with var and var parameters are mutable.

Note that mutability analysis is not deep for reference types or pointer types. You can always mutate through a pointer.

Future

Significant improvements are in-progress planned to improve Nim safety:

however it is too early to use them in a production codebase.

Debugging Nim

Reference article: https://nim-lang.org/blog/2017/10/02/documenting-profiling-and-debugging-nim-code.html

GDB / LLDB

Nim can be instrumented with sourcemaps:

  • by passing --debugger:native to the compiler, so that the stacktraces in gdb shows the Nim source code
  • by passing --passC:"-g" to the compiler, so that the stacktraces in gdb shows the C source code

Sanitizers & Valgrind

LLVM and GCC sanitizers can be used with

nim c --cc:clang -r -d:release --debugger:native \
  --passC:"-fsanitize=address" --passL:"-fsanitize=address" \
  --outdir:build target_application.nim

Note on deactivating Nim memory allocator:

As mentioned in the memory management section, Nim has

  • a garbage collector, by default deferred reference counting + cycle detection via mark-and-sweep if the types can have cycles (and is not tagged {.acyclic.})
  • an allocator based on TLSF

Instead of Nim custom allocators, the sys malloc/free can be used by passing -d:useMalloc in the command-line

Some GC/versions might not properly accept the flag, this is a Nim bug and we can patch upstream and our own fork in that case

Fuzzers

TODO

Internal fuzzing

We are currently adding fuzzing to our repositories via libFuzzer and AFL.

External "Consensus" fuzzing

Sigma Prima is fuzzing all Ethereum 2 clients on the spec (Eth2 core/Validator core)

We provide them with a simple C API

That we implement:

And compile the nim code as a shared or static library.

On usage, the only extra limitation compared to a C library is the need to call NimMain() before calling any Nim function to initialize the Nim runtime.

Inspecting the generated intermediate code and objects

The intermediate C code and object files are stored in

  • $HOME/.cache/nim/compiled_target_d or $HOME/.cache/nim/compiled_target_r on UNIX (d for debug, r for release)
  • $HOME/nimcache/compiled_target_d or $HOME/.cache/nim/compiled_target_r on Windows

The cache directory can be set with the --nimcache=<targetdir> compiler flag.

Repositories vendoring nimbus-build-system do not use the default setting but generate directly in the (gitignored) nimcache folder of the repo.

Foreign language interop

Wrapping C

Using shared library

Example: secp256k1

https://github.com/status-im/nim-secp256k1/blob/master/secp256k1_abi.nim

Compiling directly the C files

Example: Apache Milagro Crypto

https://github.com/status-im/nim-blscurve/blob/master/blscurve/milagro.nim

Wrapping C++

Beyond the syntax for wrapping C, Nim offers a flexible syntax for wrapping C++, for example for vectors:

type
  CppVector* {.importcpp"std::vector", header: "<vector>", byref.} [T] = object

proc newCppVector*[T](): CppVector[T] {.importcpp: "std::vector<'*0>()", header: "<vector>", constructor.}
proc newCppVector*[T](size: int): CppVector[T] {.importcpp: "std::vector<'*0>(#)", header: "<vector>", constructor.}
proc len*(v: CppVector): int {.importcpp: "#.size()", header: "<vector>".}
proc add*[T](v: var CppVector[T], elem: T){.importcpp: "#.push_back(#)", header: "<vector>".}
proc `[]`*[T](v: CppVector[T], idx: int): T{.importcpp: "#[#]", header: "<vector>".}
proc `[]`*[T](v: var CppVector[T], idx: int): var T{.importcpp: "#[#]", header: "<vector>".}
proc `[]=`*[T](v: var CppVector[T], idx: int, value: T) {.importcpp: "#[#]=#", header: "<vector>".}

Example: ttmath

https://github.com/status-im/nim-ttmath/blob/master/src/ttmath.nim

Exporting

See "Fuzzing" chapter for exporting a C API for fuzzing

References

Nim threat model

Nim and its standard library are not the focus of the audits.

In particular the codebase intentionally limits reliance on the standard library so that it is not tied to Nim release schedule, instead of the standard library we use stew most of the time: https://github.com/status-im/nim-stew.

Nim standard library is implemented here:

We target Nim v1.2.2

Nim FAQ

Nimbus Beacon Chain

https://github.com/status-im/nimbus-eth2

Nimbus Beacon Chain (NBC) is an implementation of an Ethereum 2 client.

Audit scope

Network Core (leveraging the libp2p framework)

Sub-topic
Discovery Protocol (discv5)
Publish/Subscribe protocol
Eth2 Request/Response protocol
SSZ - (De)serialization & tree hashing
Wire encryption

ETH2 Specification core

Sub-topic
State transition logic
Signature verification
Epoch finalisation and justification
Reward processing
Eth1 data processing
Fork choice logic
Block processing and production
Attestation processing and production
Block synchronization
Peer pool management

Validator core and user experience

Sub-topic
Block/attestation signing
Slash-prevention mechanisms
RPC API
Accounts management & key storage
Command Line Interface (CLI)

High-level view of the stack

https://miro.com/app/board/o9J_kvfytDI=/

Diagram

TODO

Specifications

We target v1.0.1 phase0 of https://github.com/ethereum/consensus-specs

The p2p-interface specs in particular describe the subset of libp2p spec that are used to implement Ethereum 2

Resources

Ethereum 2.0 Ask Me Anything:

Nimbus build system & dependencies

Build system

NBC repo

nimbus-eth2 uses a build system based on makefiles. Usage is described in the README.

In particular a env.sh scripts is provided that setups the environment variable expected.

Libraries

Library repositories uses the official package manager called nimble for building and testing.

In particular, we use by convention the following command to run the test suite via nimble task system:

nimble test

The details are implemented in <package name>.nimble at the root of the repository.

The continuous integration setup for Travis, Azure Pipelines and/or Appveyor are also setting up a Nim environment from scratch.

Package dependencies

NBC repo

For the nimbus-eth2 application, all dependencies are vendored in the vendor folder.

With regards to the audit scope we have the following dependencies:

All

  • Async
    • asynctools
    • nim-chronos
  • Logging
    • jswebsockets
    • karax
    • nim-chronicles
    • nim-chronicles-tail (devtool, out-of-audit-scope)
    • nim-libbacktrace
    • websocket.nim
    • nim-protobuf-serialization
    • nim-rocksdb
  • CLI interface
    • nim-confutils
    • nim-prompt
    • nim-unicodedb
  • Metrics
    • nim-metrics (the server endpoint is out-of-audit-scope)
  • Helpers
    • nim-stew
    • nim-testutils (fuzzing)
    • nimYAML (should be test only)

Network core

  • Cryptography
    • nim-bearssl
    • nim-secp256k1
    • nimcrypto
  • Networking & Discovery
    • nim-eth
    • nim-libp2p
    • nim-nat-traversal
  • Serialization
    • nim-faststreams
    • nim-serialization
    • nim-snappy
  • BigInt
    • nim-stint (Note: one of the main use is 256-bit bloomfilter, a dedicated library should be used instead)

ETH2 Specification core

  • Cryptography
    • nim-blscurve
  • Database
    • nim-sqlite3-abi
  • Eth1 monitoring
    • nim-web3
  • Build system
    • nimbus-build-system

Validator core

  • Cryptography
    • nim-blscurve
  • RPC
    • news
    • nim-http-utils
    • nim-json-rpc
    • nim-json-serialization

NBC Threat model

NBC primarily targets resource restricted devices like Raspberry Pi or smartphones to desktop computers.

We focus on:

  • remote attacks

This includes but not limited to:

  • denial-of-services
  • retrieving a secret key
    • via software defect or misuse of libraries (cryptography, networking)
    • or poor key management practices or UI or documentation confusing the user
  • collusion of peers (eclipse, surround voting, ...)
  • loss of funds

In particular, we do not defend against a malicious cloud provider which would offer a “Rent-a-Raspberry-Pi” service for validators and would subsequently engage in attacks against those validators, the implest being, taking the validator offline when it's their duty time.

Sensitive data:

  1. Signing key: each validator has a secret signing key used to sign attestation and blocks with 32 ETH at stake (about 7500 USD at the current rate) Each beacon node instance can have dozens to hundreds of validators attached. Leaking the signing key would allow an attacker to double-vote with the key, leading to slashing and ultimately ejecting the validator once its stake reaches below 16 ETH. The signing key must be present in memory as a validator may have signing duties every 6 seconds
  2. Withdrawal key: a withdrawal key allows withdrawing the stake. A withdrawal key is unnecessary during day-to-day operation and can be stored in cold storage, possibly hardware wallet
  3. IP address: if a validator IP address leaks, they might be subject to target DoS attacks taking their node offline and making them miss their duties. In a regular setting, the network will have "relayer"/"listener" nodes and validator nodes that shouldn't be distinguishable.
  4. Random number generation:
    • Strong recommendations to generate the withdrawal key offline.
    • The RNG is used in particular to select which peer to connect to from a pool of candidates.

Cryptographic libraries

A review in in progress to select BLS signature backend.

See https://notes.status.im/nim-bls-curve-backends#Threat-model

Resources

Surround vote detection (not in audit scope): https://github.com/protolambda/eth2-surround

Serialization

Async/Await with the Chronos library

Cryptography

Ethereum Networking