Detect real-time threats on OP Stack compatible blockchains

  • By Base
  • Last update: Aug 13, 2023
  • Comments: 15

pessimism

Because you can't always be optimistic

Pessimism is a public-good monitoring service that continuously assesses OP Stack and EVM compatible blockchains for real-time threats using custom, user-defined heuristic rule sets. To learn about Pessimism's architecture, please refer to the documentation.


Warning: Pessimism is currently experimental and very much in development. This means Pessimism is currently unstable: code will change and builds can break over the coming months. If you come across problems, it would help greatly to open issues so that we can fix them as quickly as possible.

Setup

To set up the project, complete the following steps:

  1. Create a local config file (config.env) to store all necessary environment variables. An example config.env.template in the repo stores the default env vars.

  2. Download or upgrade to golang 1.19.

  3. Install all project golang dependencies by running go mod download.

To Run

  1. Compile Pessimism to a machine binary by running the following project-level command(s):

    • Using Make: make build-app
  2. To run the compiled binary, you can use the following project level command(s):

    • Using Make: make run-app
    • Direct Call: ./bin/pessimism

Docker

  1. Ensure Docker is installed on your machine.

  2. Pull the latest image from the GitHub Container Registry (ghcr) via docker pull ghcr.io/base-org/pessimism:latest

  3. Make sure you have followed the above instructions to create a local config file (config.env) using config.env.template

  4. Run the following:

    • Without genesis.json:
    docker run -p 8080:8080 -p 7300:7300 --env-file=config.env -it ghcr.io/base-org/pessimism:latest
    • With genesis.json:
    docker run -p 8080:8080 -p 7300:7300 --env-file=config.env -it -v ${PWD}/genesis.json:/app/genesis.json ghcr.io/base-org/pessimism:latest

Note: If you want to bootstrap the application and run specific heuristics/pipelines on startup, update the BOOTSTRAP_PATH value in config.env to the location of your genesis.json file and then run the docker run command above that mounts genesis.json.

Building and Running New Images

  • Run make docker-build at the root of the repository to build a new docker image.

  • Run make docker-run at the root of the repository to run the new docker image.

Linting

golangci-lint is used to perform code linting. Configurations are defined in .golangci.yml. It can be run using the following project-level command(s):

  • Using Make: make lint
  • Direct Call: golangci-lint run

Testing

Unit Tests

Unit tests are written using the native go test library, with test mocks generated using gomock (via mockgen). These tests live throughout the project's /internal directory and are named with the suffix _test.go. A hedged example of what such a test can look like follows the commands below.

Unit tests can be run using the following project level command(s):

  • Using Make: make test
  • Direct Call: go test ./...
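
For illustration, here is a minimal sketch of a unit test that exercises a mockgen-generated mock. The EthClient interface name is mentioned elsewhere in this repo, but the mocks import path and the HeaderByNumber method shape are assumptions made for this example, not the project's actual API:

    package oracle_test

    import (
        "context"
        "math/big"
        "testing"

        "github.com/ethereum/go-ethereum/core/types"
        "github.com/golang/mock/gomock"

        // Hypothetical path; in practice the mockgen output lives somewhere under /internal.
        mocks "github.com/base-org/pessimism/internal/mocks"
    )

    // TestLatestHeader shows the usual gomock flow: create a controller, set an
    // expectation on the generated mock, and assert on the value it returns.
    func TestLatestHeader(t *testing.T) {
        ctrl := gomock.NewController(t)
        defer ctrl.Finish()

        client := mocks.NewMockEthClient(ctrl)
        client.EXPECT().
            HeaderByNumber(gomock.Any(), gomock.Nil()).
            Return(&types.Header{Number: big.NewInt(17_065_860)}, nil)

        header, err := client.HeaderByNumber(context.Background(), nil)
        if err != nil {
            t.Fatalf("unexpected error: %v", err)
        }
        if header.Number.Uint64() != 17_065_860 {
            t.Errorf("unexpected block height: %d", header.Number.Uint64())
        }
    }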

Integration Tests

Integration tests leverage the existing op-e2e testing framework to spin up pieces of the Bedrock system. Additionally, the httptest library is used to mock downstream alerting services (e.g. Slack's webhook API). These tests live in the project's /e2e directory.

Integration tests can be run using the following project level command(s):

  • Using Make: make e2e-test
  • Direct Call: go test ./e2e/...

Bootstrap Config

A bootstrap config file is used to define the initial state of the Pessimism service. The file must be JSON formatted, with its path defined in the BOOTSTRAP_PATH env var (e.g. BOOTSTRAP_PATH=./genesis.json). A sketch of how these entries might map onto Go types follows the example below.

Example File

[
    {
        "network": "layer1",
        "pipeline_type": "live",
        "type": "contract_event", 
        "start_height": null,
        "alerting_params": {
            "message": "",
            "destination": "slack"
        },
        "heuristic_params": {
            "address": "0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e",
            "args": ["Transfer(address, address, uint256)"]
        }
    },
    {
        "network": "layer1",
        "pipeline_type": "live",
        "type": "balance_enforcement", 
        "start_height": null,
        "alerting_params": {
            "message": "",
            "destination": "slack"
        },
        "heuristic_params": {
            "address": "0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e",
            "lower": 1,
            "upper": 2
       }
    }
]
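
For illustration only, the following hedged Go sketch shows one way these entries could be decoded; the struct and field names here are assumptions chosen to mirror the example above, not the service's actual types:

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // BootstrapEntry mirrors one object in the bootstrap JSON array above.
    // Field and type names are illustrative assumptions.
    type BootstrapEntry struct {
        Network         string          `json:"network"`
        PipelineType    string          `json:"pipeline_type"`
        Type            string          `json:"type"`
        StartHeight     *uint64         `json:"start_height"` // nil when start_height is null
        AlertingParams  AlertingParams  `json:"alerting_params"`
        HeuristicParams json.RawMessage `json:"heuristic_params"` // shape varies per heuristic type
    }

    type AlertingParams struct {
        Message     string `json:"message"`
        Destination string `json:"destination"`
    }

    func main() {
        // In practice the path would come from the BOOTSTRAP_PATH env var.
        raw, err := os.ReadFile("genesis.json")
        if err != nil {
            panic(err)
        }

        var entries []BootstrapEntry
        if err := json.Unmarshal(raw, &entries); err != nil {
            panic(err)
        }

        for _, e := range entries {
            fmt.Printf("bootstrapping %s heuristic on %s (%s pipeline)\n", e.Type, e.Network, e.PipelineType)
        }
    }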

Spawning a heuristic session

To learn about the currently supported heuristics and how to spawn them, please refer to the heuristics documentation.


Comments (15)

  • 1

    Import testify/mock replaced with gomock

    Closes #26

    Hi, I have replaced testify/mock with gomock. EthClient is autogenerated with mockgen. Please let me know if the tests are rewritten correctly.

  • 2

    State Key Representation is Insecure

    Problem

    The current state key representation is very insecure, given that a struct is used with no pointer references by the respective stateful component definitions. Currently, this results in data loss of the state keys held in the component's structure once higher-level callers garbage collect the value. Because of this, active components fail to look up the stateful values necessary for secure operation.

    This bug currently only affects the account_balance oracle implementation, resulting in the existing balance_enforcement invariant not working.

    There are some band-aid fixes that we could easily apply to remediate this in the short term: I. Make the balance_oracle store a reference to the stateKey type instead of a direct value

    However, the fundamental representation of the key itself is flawed, as by nature it should be a low-memory primitive type. Additionally, a lower-level representation would ensure there'd be no need to have a string representation in memory.

    Problem Solution - 1 (32 byte array)

    Update the state key representation to be consistent with the existing ID representations already used across the application. A fixed 32-byte array that encodes the necessary metadata for stateful lookups would be ideal, using the following 256-bit representation:

    type StateKey [32]byte
    

    With a byte encoding schema like:

    0       1        2        3                      22           32
    |-------|--------|--------|----------------------|--------------|
    
    nested    prefix  register     (optional) address         PUUID
                         byte
    

    This would avoid the need for pointers when referencing keys. Additionally, chars in Go occupy 4 bytes, meaning this representation stored as a string would be 128 bytes versus 32.
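
    For illustration, a hedged Go sketch of what such a packing could look like; the offsets mirror the diagram above, while the constructor name and exact field widths are assumptions rather than a settled design:

    // StateKey packs lookup metadata into a single 32-byte value following the
    // layout above: nested flag, prefix, register byte, the optional address
    // slot, and the trailing bytes reserved for the PUUID. (Type as proposed above.)
    type StateKey [32]byte

    // newStateKey is an illustrative constructor, not the project's actual API.
    func newStateKey(nested bool, prefix, register byte, address, puuid []byte) StateKey {
        var k StateKey
        if nested {
            k[0] = 1
        }
        k[1] = prefix
        k[2] = register
        copy(k[3:22], address) // optional address slot
        copy(k[22:], puuid)    // PUUID occupies the remaining bytes
        return k
    }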

    Problem Solution - 2 (57 byte struct)

    Store references to state keys in the oracle metadata. This would require keeping the existing state key struct types as is; i.e.:

    // StateKey ... Represents a key in the state store
    type StateKey struct {
    	Nested bool // Indicates whether the key is nested
    	Prefix uint8
    	Key    string
    }
    
    func (sk StateKey) IsNested() bool {
    	return sk.Nested
    }
    
    // WithPUUID ... Adds a pipeline UUID to the state key prefix and returns a new state key
    func (sk StateKey) WithPUUID(pUUID PipelineUUID) StateKey {
    	return StateKey{
    		sk.Nested,
    		sk.Prefix,
    		pUUID.String() + ":" + sk.Key,
    	}
    }
    

    NOTE: We should probably update the struct definition to store the PUUID instead of doing a string concatenation inside a clone method
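
    For illustration, one hedged sketch of what that could look like (the PUUID field and method body are assumptions, not a finalized design):

    // StateKey ... Alternative sketch that carries the PUUID as a field rather
    // than concatenating it into the Key string.
    type StateKey struct {
        Nested bool
        Prefix uint8
        Key    string
        PUUID  PipelineUUID // zero value until the key is bound to a pipeline
    }

    // WithPUUID ... Returns a copy of the key bound to the given pipeline UUID
    func (sk StateKey) WithPUUID(pUUID PipelineUUID) StateKey {
        sk.PUUID = pUUID
        return sk
    }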

    Versus solution - 1

    In comparison to solution 1, solution 2 will occupy more space, as the Nested value would be a boolean (occupying 4 bytes instead of 1) and it would require holding 25 bytes for the entire PUUID. However, the other data fields (i.e. nested, prefix, registerType, address) would occupy the same space as they do in solution 1. It's important to note that this solution is more intuitive for developers and for readability purposes.

  • 3

    Bump github.com/urfave/cli from 1.22.9 to 1.22.14

    Bumps github.com/urfave/cli from 1.22.9 to 1.22.14.

    Release notes

    Sourced from github.com/urfave/cli's releases.

    v1.22.14

    What's Changed

    Full Changelog: https://github.com/urfave/cli/compare/v1.22.13...v1.22.14

    v1.22.13

    What's Changed

    Full Changelog: https://github.com/urfave/cli/compare/v1.22.12...v1.22.13

    v1.22.12

    What's Changed

    Full Changelog: https://github.com/urfave/cli/compare/v1.22.11...v1.22.12

    v1.22.11

    What's Changed

    Full Changelog: https://github.com/urfave/cli/compare/v1.22.10...v1.22.11

    v1.22.10

    What's Changed

    Full Changelog: https://github.com/urfave/cli/compare/v1.22.9...v1.22.10

    Commits
    • f5ca62f Merge pull request #1748 from urfave/v1-update-deps
    • 8b495cc Update dependencies for v1
    • ed683a7 Merge pull request #1712 from urfave/v1-update-deps
    • 3133e8d Update dependencies for v1-maint
    • f0d71d7 Merge pull request #1693 from urfave/v1-bump-go-versions
    • e44ce3a Shift tested Go versions in v1-maint
    • 49f9838 Merge pull request #1654 from urfave/v1-ci-updates
    • a5c98d3 Do not run the toc command on windows
    • 9b65c47 Update toc command for windows compat
    • d391605 Compare file contents ignoring Windows-specific line endings
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • 4

    `Pipeline` Lacks Introspective Capabilities into its Components

    Problem

    Existing Pipeline struct receiver logic has no support for an event loop routine that can read existing component states to understand when:
    I. A component crashes or stops
    II. An oracle component finishes backfilling: syncing --> live
    III. (STRETCH) Inter-component communication latency is below some threshold

    This logic is critical for the etlManager abstraction to be able to:
    I. Perform merging operations when a syncing pipeline has become live and there exist >1 identical pipelines with the same ID and live state
    II. Understand when pipelines have successfully shut down
    III. Understand when pipelines have crashed in real time

    Proposed Solution

    There are currently two possible solution paths, both of which are open for discussion:

    Solution 1: Higher Order Event Polling

    This event loop should run in the following fashion to understand when key changes to component (and consequently pipeline) state occur, using some interval-based polling algorithm like this:

    
    # Pseudocode: poll component states on a fixed interval (x seconds) and
    # emit pipeline-level state changes when they occur
    while True:
        time.sleep(x)

        # the first component is the oracle; its state drives syncing --> live transitions
        oracle_state = pipeline[0].state
        if pipeline.state != PipelineState(oracle_state):
            pipeline.state = PipelineState(oracle_state)
            emit(pipeline.state)

        # any non-active downstream component state becomes the pipeline state
        for i, state in enumerate([c.state for c in pipeline]):
            if i == 0:
                continue

            if state != "active":
                pipeline.state = state
                emit(pipeline.state)
    

    The issues with this approach are:
    I. Polling increases computational load
    II. Increased latency, given time discrepancies between when a component state change happens and when the poller performs a pipeline read

    Solution 2: Event Based

    Alternatively, a more event-based (listener/subscriber) model could be leveraged where the pipeline actively listens for components to emit activationState events; this would let the pipeline react to state changes as they occur rather than waiting on a polling interval (a minimal sketch follows the issues below).

    The issues with this approach are:
    I. (Abstraction leak) Components need higher-order knowledge (i.e. a Go channel) of the greater Pipeline
    II. Increased concurrency management
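
    For illustration only, a minimal Go sketch of the listener/subscriber idea; the ActivityState values, channel wiring, and Pipeline shape here are assumptions for the sake of the example:

    // ActivityState is an assumed enum of component/pipeline states.
    type ActivityState int

    const (
        Syncing ActivityState = iota
        Live
        Terminated
        Crashed
    )

    // stateChange is emitted by a component when its state transitions.
    type stateChange struct {
        componentID string
        state       ActivityState
    }

    // Pipeline is reduced to the single field needed for this sketch.
    type Pipeline struct {
        state ActivityState
    }

    // eventLoop listens for component state changes and surfaces pipeline-level
    // transitions to the manager via the emit callback.
    func (p *Pipeline) eventLoop(changes <-chan stateChange, emit func(ActivityState)) {
        for change := range changes {
            // a crashed or terminated component takes the whole pipeline with it
            if change.state == Crashed || change.state == Terminated {
                p.state = change.state
                emit(p.state)
                continue
            }
            // an oracle finishing its backfill flips the pipeline from syncing to live
            if change.state == Live && p.state == Syncing {
                p.state = Live
                emit(p.state)
            }
        }
    }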

    NOTE

    These proposed solutions don't take failure management into consideration. For example, in the instance of a failed pipeline, we could enact some retry procedure to re-attempt running it N times. These fail-safe or recovery procedures should be explored in a subsequent issue.

  • 5

    Execution Timing Metric & Code Comments

    Fixes Issue

    Fixes #

    Changes proposed

    • Added execution timing metric for risk engine
    • Added execution timing

    Screenshots (Optional)

    Note to reviewers

  • 6

    Bump github.com/go-chi/chi from 1.5.4 to 4.1.2+incompatible

    Bumps github.com/go-chi/chi from 1.5.4 to 4.1.2+incompatible.

    Release notes

    Sourced from github.com/go-chi/chi's releases.

    v4.1.2

    v4.1.1

    v4.1.0

    • middleware.LogEntry: Write method on interface now passes the response header and an extra interface type useful for custom logger implementations.
    • middleware.WrapResponseWriter: minor fix
    • middleware.Recoverer: a bit prettier
    • History of changes: see https://github.com/go-chi/chi/compare/v4.0.4...v4.1.0

    v4.0.4

    v4.0.3

    v4.0.2

    minor fixes. see https://github.com/go-chi/chi/compare/v4.0.1...v4.0.2

    v4.0.1

    Fixes issue with compress middleware: #382 #385

    v4.0.0

    • chi v4 requires Go 1.10.3+ (or Go 1.9.7+) - we have deprecated support for Go 1.7 and 1.8
    • router: respond with 404 on router with no routes (#362)
    • router: additional check to ensure wildcard is at the end of a url pattern (#333)
    • middleware: deprecate use of http.CloseNotifier (#347)
    • middleware: fix RedirectSlashes to include query params on redirect (#334)
    • History of changes: see https://github.com/go-chi/chi/compare/v3.3.4...v4.0.0

    v3.3.4

    Minor middleware improvements. No changes to core library/router. Moving v3 into its own branch as a version of chi for Go 1.7, 1.8, 1.9, 1.10, 1.11

    History of changes: https://github.com/go-chi/chi/compare/v3.3.3...v3.3.4

    Master will switch into v4, where we will only support Go versions inline with Go's own policy, https://golang.org/doc/devel/release.html#policy (aka, last 2 versions)

    ... (truncated)

    Changelog

    Sourced from github.com/go-chi/chi's changelog.

    Changelog

    v5.0.8 (2022-12-07)

    v5.0.7 (2021-11-18)

    v5.0.6 (2021-11-15)

    v5.0.5 (2021-10-27)

    v5.0.4 (2021-08-29)

    v5.0.3 (2021-04-29)

    v5.0.2 (2021-03-25)

    v5.0.1 (2021-03-10)

    v5.0.0 (2021-02-27)

    • chi v5, github.com/go-chi/chi/v5 introduces the adoption of Go's SIV to adhere to the current state-of-the-tools in Go.
    • chi v1.5.x did not work out as planned, as the Go tooling is too powerful and chi's adoption is too wide. The most responsible thing to do for everyone's benefit is to just release v5 with SIV, so I present to you all, chi v5 at github.com/go-chi/chi/v5. I hope someday the developer experience and ergonomics I've been seeking will still come to fruition in some form, see golang/go#44550

    ... (truncated)

    Commits
    • 86f9a6e Release v4.1.2
    • fdba45d cosmetic, move public methods to top of source file
    • 234035e README, add note on go-chi/httprate middleware
    • e7728c6 Replace wildcards correctly in RoutePattern (#515)
    • 5704d7e fix: handle methodnotallowed with path variables (#512)
    • ccb4c33 README
    • 2333c5c Trying Github sponsor thing
    • 1fafc30 Release v4.1.1
    • 99ad97f Route recursive-search regexp patterns (#506)
    • 23b8ec2 middleware.RouteHeaders: new middleware to allow header based routing flows (...
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • 7

    DoS vector w/ Infinite Go Routines

    Risk Vector

    Currently, the /v0/invariant endpoint can be exploited by an attacker to create infinite goroutines on an application instance of Pessimism. While deduplication policies do exist within the ETL implementation, an attacker could meticulously request invariant deployments where deduplication would not occur for some time (e.g. backfill from the L1 genesis block). Eventually, the machine running the Pessimism app would exhaust computational resources (i.e. CPU, RAM) to the point where the app could no longer function.

    Mitigation(s)

    • Introduce concurrency rate limiting within the Pessimism application logic (i.e. MAX_GO_ROUTINES), as sketched below, OR
    • Introduce a max number of active pipelines that can exist at once (i.e. MAX_PIPELINES)
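
    As a sketch of the first mitigation (illustrative only; the limiter shape and where it hooks into the API handler are assumptions), a buffered channel can act as a counting semaphore that caps concurrent pipeline goroutines:

    // routineLimiter caps concurrent pipeline goroutines; maxRoutines would be
    // sourced from a MAX_GO_ROUTINES style configuration value.
    type routineLimiter struct {
        slots chan struct{}
    }

    func newRoutineLimiter(maxRoutines int) *routineLimiter {
        return &routineLimiter{slots: make(chan struct{}, maxRoutines)}
    }

    // Go runs fn in a new goroutine only if a slot is free; it reports false when
    // the cap is hit so the /v0/invariant handler can reject the request instead
    // of spawning unbounded work.
    func (l *routineLimiter) Go(fn func()) bool {
        select {
        case l.slots <- struct{}{}:
            go func() {
                defer func() { <-l.slots }()
                fn()
            }()
            return true
        default:
            return false
        }
    }
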
  • 8

    ETL DAG & Pipeline Support

    Fixes Issue

    Closes https://github.com/base-org/pessimism/issues/7

    Changes proposed

    Added higher level component management abstractions for handling component connectivity, routing, and introspection:

    • manager - Used for managing ETL pipelines and the pipeline DAG, deploying new pipelines provided some higher-order config.
    • graph - Used for managing and representing ETL pipeline components as graph nodes.
    • pipeline - Used for storing pipeline metadata; i.e. components, IDs, internal activity states.
    • Changed conduit to etl; seems more intuitive that way ;)

    Extended existing component level logic with constructs to better support component modularity and seamless inter-connectivity:

    • Added a metaData struct that's inherited by all component types to store component-agnostic field data
    • Added an ingress struct that tracks all component entrypoint (register_type-->channel) information. Useful for algorithmic connectivity in the pipeline DAG and for supporting future multi-data-source imports per component (i.e. https://github.com/base-org/pessimism/issues/12)
    • Added localized ID schemas for components and pipelines that are both represented as encoded byte arrays.

    Moved away from the common models data package to a more generalized core module that stores subsystem-agnostic (ETL, Risk Engine, API) constructs.

    Screenshots (Optional)

    2023-04-17T03:40:24.950-0700	INFO	component/pipe.go:69	Starting event loop	{"ID": "layer1:live:pipe:blackhole.tx"}
    2023/04/17 03:40:24 ===============================================
    2023/04/17 03:40:24 Reading layer 1 EVM blockchain for live contract creation txs
    2023/04/17 03:40:24 ===============================================
    2023/04/17 03:40:25 17065860
    2023-04-17T03:40:25.779-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:25.779-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:25.779-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:26 17065861
    2023-04-17T03:40:26.315-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:26.315-0700	DEBUG	component/pipe.go:90	Received output data	{"ID": "layer1:live:pipe:contract.create.tx", "Length": 1}
    2023-04-17T03:40:26.315-0700	DEBUG	component/pipe.go:98	Sending data batch	{"ID": "layer1:live:pipe:contract.create.tx", "Type": "contract.create.tx"}
    2023/04/17 03:40:26 ===============================================
    2023/04/17 03:40:26 Received Contract Creation (CREATE) Transaction {Timestamp:2023-04-17 03:40:26.315047 -0700 PDT m=+2.379844187 Type:contract.create.tx Value:0xc0001a5960}
    2023/04/17 03:40:26 ===============================================
    2023/04/17 03:40:26 As parsed transaction &{inner:0xc0003b3400 time:{wall:13909198564892146232 ext:2379696577 loc:0x10066e5e0} hash:{v:<nil>} size:{v:<nil>} from:{v:{signer:0xc000029e80 from:[173 113 73 21 42 101 230 236 151 173 215 177 177 249 23 220 175 207 155 33]}} rollupGas:{v:<nil>}}
    2023-04-17T03:40:26.315-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:27 17065862
    2023-04-17T03:40:27.172-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:27.172-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:27.172-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:28 17065863
    2023-04-17T03:40:28.212-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:28.212-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:28.212-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:29 17065864
    2023-04-17T03:40:29.170-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:29.170-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:29.170-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:30 17065865
    2023-04-17T03:40:30.172-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:30.172-0700	DEBUG	component/pipe.go:90	Received output data	{"ID": "layer1:live:pipe:blackhole.tx", "Length": 1}
    2023-04-17T03:40:30.172-0700	DEBUG	component/pipe.go:98	Sending data batch	{"ID": "layer1:live:pipe:blackhole.tx", "Type": "blackhole.tx"}
    2023-04-17T03:40:30.172-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:30 ===============================================
    2023/04/17 03:40:30 Received Blackhole (NULL) Transaction {Timestamp:2023-04-17 03:40:30.172014 -0700 PDT m=+6.236718216 Type:blackhole.tx Value:0xc00029e540}
    2023/04/17 03:40:30 ===============================================
    2023/04/17 03:40:30 As parsed transaction &{inner:0xc0002c6300 time:{wall:13909198569043363528 ext:6235853559 loc:0x10066e5e0} hash:{v:<nil>} size:{v:<nil>} from:{v:{signer:0xc000029800 from:[200 38 218 174 175 182 13 85 114 124 151 228 44 197 44 255 113 201 173 220]}} rollupGas:{v:<nil>}}
    2023/04/17 03:40:31 17065866
    2023-04-17T03:40:31.342-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:31.342-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:31.342-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:32 17065867
    2023-04-17T03:40:32.161-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:32.161-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:32.161-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:33 17065868
    2023-04-17T03:40:33.237-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:33.237-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:33.237-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023/04/17 03:40:34 17065869
    2023-04-17T03:40:34.190-0700	DEBUG	component/oracle.go:102	Sending data	{"ID": "layer1:live:oracle:geth.block"}
    2023-04-17T03:40:34.190-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    2023-04-17T03:40:34.190-0700	DEBUG	component/pipe.go:94	Received output data of length 0
    

    Next Steps

    1. Explore and threat model pipeline edge cases to ensure a robust & resilient system
    2. Begin developing open-source architectural documentation to showcase the overarching Pessimism architecture
  • 9

    Changed `invariant` Abstraction to `Heuristic`

    Fixes Issue

    Fixes #

    Changes proposed

    The term invariant is rather unorthodox and difficult for people to reason about, especially given that it's a more formal mathematical term. Because of this, we have opted to use a cleaner, more concise abstraction. Internal conversations polled for the term Heuristic to be used.

    Screenshots (Optional)

    Note to reviewers

  • 10

    Refactored Invariant Registry & Added Fault Detector Implementation

    Fixes Issues

    • https://github.com/base-org/pessimism/issues/94
    • https://github.com/base-org/pessimism/issues/76

    Fixes #

    Changes proposed

    • Added a CI action to upload test coverage percentages in each PR
    • Added an implementation of fault detection
    • Refactored invariant registry implementation and added a lot of unit tests

    Screenshots (Optional)

    Note to reviewers

  • 11

    Bump github.com/stretchr/testify from 1.8.2 to 1.8.4

    Bumps github.com/stretchr/testify from 1.8.2 to 1.8.4.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • 12

    [RFC] Native Bridge Supply Monitoring

    Heuristic Description

    A heuristic should exist that monitors for discrepancies between the locked ETH amount on the L1 OptimismPortal and the unlocked amount on L2. This allows for deep introspection into the safety of the native bridge and allows chain operators to detect real-time anomalies.

    Consideration(s)

    1. ETH can be burnt arbitrarily on L2 by either sending it to the 0x000 blackhole address or immobilizing it via SELFDESTRUCT operations that specify the calling contract as the beneficiary. This could result in potential discrepancies between the circulating supply on L2 and the locked supply on L1. Burns can also be triggered by anyone on the L2ToL1MessagePasser contract to delete initiated withdrawal funds.
    2. This implementation requires having access to all historical L1/L2 bridge events from genesis of the L2 chain (first L2 block, first deposit transaction block on L1). Unfortunately, Pessimism has no way to persist this data nor does it have efficient backfilling capabilities. Integrating with the OP Indexer is required for seamless unblocking.
    3. There are a few ways to calculate variations of the L1 supply:
       • prospective - Compute using initialized withdrawals from L2
       • intermittent - Compute using proven withdrawals on L1
       • literal - Compute using accredited withdrawals on L1
       Each of these can be subtracted from either the deposit sum or the OptimismPortal contract ETH value to represent an L1 supply.

    Pseudocode

    The following pseudocode demonstrates a high level implementation for how to compute different native bridge supplies on both L1 and L2. Additionally, it showcases a lightweight heuristic analysis example.

    
    # Get L1 deposit supply by traversing all op_portal deposit events
    def get_l1_deposit_supply() -> int:
        deposit_events = get_deposit_events(op_portal)
        return sum([e.value for e in deposit_events])

    # Get L2 deposit supply via summating all deposit tx type values
    def get_l2_deposit_supply() -> int:
        all_deposits = get_deposit_txs()
        deposit_sum = sum([e.value for e in all_deposits])
        return deposit_sum

    # Get L2 withdrawal supply by iterating through proven withdrawals
    # on L1 and finding the associated message event on L2 that holds
    # the ETH value, summating when found
    def get_l2_withdrawal_supply() -> int:
        proven_withdrawals = get_l1_proven_withdrawals()
        amt = 0
        for withdrawal in proven_withdrawals:
            l2_message = get_l2_message(withdrawal.from, withdrawal.to)
            amt += l2_message.value

        return amt

    # Gets the total amount of ETH burnt via the L2ToL1MessagePasser on L2
    def get_l2_burn_supply(message_passer) -> int:
        amt = 0
        burns = get_burn_events(message_passer)
        for event in burns:
            amt += event.value

        # let's assume any ETH locked in the contract will inevitably be burnt
        amt += message_passer.value()
        return amt

    # Compute L1 supply by subtracting the withdrawn amount from the total amount deposited
    # Note - there will be a slight discrepancy given that the l2_withdrawal_supply will likely
    # include withdrawals that haven't been accredited/proven on L1 yet
    prospective_l1_supply = get_l1_deposit_supply() - get_l2_withdrawal_supply()

    # Compute L1 supply by subtracting the amount in proven withdrawals from the total amount deposited
    # Note - This will require correlating a withdrawal hash on L1 to a sentMessage on L2
    l1_supply_waiting = get_l1_deposit_supply() - get_l1_proven_withdrawal_amount()

    # Get the actual L1 supply by reading the ETH balance of the op_portal
    actual_l1_supply = optimism_portal.balance()

    # Compute L2 supply by subtracting the amount burnt via the L2ToL1MessagePasser
    # contract from the amount deposited
    l2_supply = get_l2_deposit_supply() - get_l2_burn_supply(message_passer)

    ## Run invariant analysis
    # NOTE - Assume x0..x2 are user-defined float inputs, one per invariant

    # I0
    if percent_diff(actual_l1_supply, l1_supply_waiting) > x0:
        ALERT()

    # I1
    if percent_diff(actual_l1_supply, prospective_l1_supply) > x1:
        ALERT()

    # I2
    if percent_diff(prospective_l1_supply, l2_supply) > x2:
        ALERT()
    
    
  • 13

    [RFC] Event/Transaction Frequency Heuristic

    Heuristic Description

    NOTE - Some of the provided use cases could be redundant with existing telemetry that's already leveraged by OP Stack chain operators

    Context

    Some system contract events emitted by OP Stack contracts occur on semi-deterministic time intervals. The same applies to some protocol transactions as well. For example, proposer submissions to the L2OutputOracle typically occur around every hour on Base (see Etherscan). If this event were to not occur on its expected interval and be missed, it could be indicative of potential sequencer/protocol liveness failures.

    Example Use Cases

    Some further examples of other potential liveness failures that could be caught via this heuristic:

    1. Anomalous batch submission failure where the BatchInbox is not being posted to by the op-batcher
    2. Proposer failure where OutputProposal events fail to post to the L2OutputOracle contract via the op-proposer
    3. Anomalous (deposit, withdrawal) bridge transaction/event frequencies could be indicative of either protocol failure (e.g. paused withdrawals on L1) or integration failure (e.g. crashed entry-point UIs that most users use to bridge funds to layer 2).

    Technical Details

    There should exist calculation policies, configurable via heuristic_params, for how an event/transaction time delta is computed; e.g.:

    • static: Keep track of the last time a specific event (address, sig) or transaction (to, from) was emitted/executed. If the time delta from the last event goes above a user-defined threshold, then alert. This is useful for monitoring frequencies that are expected to be constant (e.g. sequencer/batcher L1 submission times).
    • dynamic: Compute the moving average (MA) of the n prior time deltas for some (address, event) or (to, from) pair. After computing MA, calculate the percent_diff between MA and the most recent time delta. If percent_diff falls above some user-defined threshold, then alert. This could be useful for monitoring repetitive events with differing frequencies based on entropic factors like user usage (e.g. bridge deposits and withdrawals).

    Additionally, a type enum should be supported that takes either transaction or contract_event. A hedged sketch of the dynamic policy is included below.
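
    A hedged Go sketch of the dynamic policy (illustrative only; window size, threshold semantics, and function names are assumptions):

    // movingAverageAlert reports whether the newest event/transaction gap deviates
    // from the moving average of the n prior gaps by more than thresholdPct percent.
    func movingAverageAlert(priorGaps []float64, newestGap, thresholdPct float64) bool {
        if len(priorGaps) == 0 {
            return false
        }
        var sum float64
        for _, g := range priorGaps {
            sum += g
        }
        ma := sum / float64(len(priorGaps))
        if ma == 0 {
            return false
        }
        percentDiff := ((newestGap - ma) / ma) * 100
        return percentDiff > thresholdPct
    }

    Here priorGaps would hold the last n time deltas between, say, OutputProposal events, and thresholdPct would come from the user-supplied heuristic_params.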

  • 14

    Bump github.com/libp2p/go-libp2p from 0.25.1 to 0.27.8

    Bumps github.com/libp2p/go-libp2p from 0.25.1 to 0.27.8.

    Release notes

    Sourced from github.com/libp2p/go-libp2p's releases.

    v0.27.8

    This patch release contains backports of:

    Note that in order to be protected against the DoS attack making use of large RSA keys, it's necessary to update to this patch release AND to use the updated Go compiler (1.20.7 or 1.19.12, respectively).

    Full Changelog: https://github.com/libp2p/go-libp2p/compare/v0.27.7...v0.27.8

    v0.27.7

    What's Changed

    • fix: in the swarm move Connectedness emit after releasing conns #2373
    • identify: set stream deadlines for Identify and Identify Push streams #2382

    Full Changelog: https://github.com/libp2p/go-libp2p/compare/v0.27.6...v0.27.7

    v0.27.6

    What's Changed

    • Clean up stream scope in case of error

    Full Changelog: https://github.com/libp2p/go-libp2p/compare/v0.27.5...v0.27.6

    v0.27.5

    What's Changed

    Full Changelog: https://github.com/libp2p/go-libp2p/compare/v0.27.3...v0.27.5

    v0.27.4

    What's Changed

    • identify
      • Fixed an issue where we now avoid spuriously triggering pushes
      • Fixed an issue where signed peer records weren’t rejected if the signature didn’t match
    • swarm
      • Fixed duplicate tracking in dial worker loop

    v0.27.3

    This patch release contains a fix for a rare panic that occurs on Windows systems (backport of libp2p/go-libp2p#2276).

    Full Changelog: https://github.com/libp2p/go-libp2p/compare/v0.27.1...v0.27.3

    v0.27.2

    What's Changed

    • quic: fix race condition when generating random holepunch packet (libp2p/go-libp2p#2263)
    • webtransport: initialize the certmanager when creating the transport (libp2p/go-libp2p#2268)

    ... (truncated)

    Changelog

    Sourced from github.com/libp2p/go-libp2p's changelog.

    Table Of Contents

    v0.28.0

    🔦 Highlights

    Smart Dialing

    This release introduces smart dialing logic. Currently, libp2p dials all addresses of a remote peer in parallel, and aborts all outstanding dials as soon as the first one succeeds. Dialing many addresses in parallel creates a lot of churn on the client side, and unnecessary load on the network and on the server side, and is heavily discouraged by the networking community (see RFC 8305 for example).

    When connecting to a peer we first determine the order to dial its addresses. This ranking logic considers a number of corner cases described in detail in the documentation of the swarm package (swarm.DefaultDialRanker). At a high level, this is what happens:

    • If a peer offers a WebTransport and a QUIC address (on the same IP:port), the QUIC address is preferred.
    • If a peer has a QUIC and a TCP address, the QUIC address is dialed first. Only if the connection attempt doesn't succeed within 250ms, a TCP connection is started.

    Our measurements on the IPFS network show that for >90% of established libp2p connections, the first connection attempt succeeds, leading to a dramatic decrease in the number of aborted connection attempts.

    We also added new metrics to the swarm Grafana dashboard, showing:

    • The number of connection attempts it took to establish a connection
    • The delay introduced by the ranking logic

    This feature should be safe to enable for nodes running in data centers and for most nodes in home networks. However, there are some (mostly home and corporate networks) that block all UDP traffic. If enabled, the current implementation of the smart dialing logic will lead to a regression, since it prefers QUIC addresses over TCP addresses. Nodes would still be able to connect, but connection establishment of the TCP connection would be delayed by 250ms.

    In a future release (see #1605 for details), we will introduce a feature called blackhole detection. By observing the outcome of QUIC connection attempts, we can determine if UDP traffic is blocked (namely, if all QUIC connection attempts fail), and stop dialing QUIC in this case altogether. Once this detection logic is in place, smart dialing will be enabled by default.

    More Metrics!

    Since the last release, we've added metrics for:

    WebTransport

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the Security Alerts page.
  • 15

    Configurable Alerting Definition

    Problem

    Currently, the severity routing configuration is hardcoded. While this works for some opinionated use cases, it fails to generalize to many unique use cases. Additionally, the service only supports integration with a single Slack webhook and two PagerDuty integration keys. Ideally, a user could have arbitrarily many Slack webhooks and PagerDuty services to alert.

    Problem Solution

    Support an alerting definition JSON that allows each severity to be routed to multiple downstream dependencies. This will be a global definition that the application ingests and uses to understand where to alert.

    This would look something like:

    {
        "sev_low": [
            {
                "destination": "slack",
                "config": {}
            }
        ],
        "sev_mid": [
            {
                "destination": "slack",
                "config": {}
            },
            {
                "destination": "pagerduty",
                "config": {}
            }
        ]
    }
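
    For context on how the application might ingest such a definition, a hedged Go sketch (type and field names are assumptions, not the service's actual alerting types):

    package alerting

    import (
        "encoding/json"
        "os"
    )

    // AlertRoute describes one downstream destination for a given severity.
    type AlertRoute struct {
        Destination string          `json:"destination"` // e.g. "slack" or "pagerduty"
        Config      json.RawMessage `json:"config"`      // destination-specific settings
    }

    // AlertRoutingTable maps a severity label ("sev_low", "sev_mid", ...) to the
    // destinations that should receive alerts at that severity.
    type AlertRoutingTable map[string][]AlertRoute

    // loadRoutingTable reads and parses the global alerting definition file.
    func loadRoutingTable(path string) (AlertRoutingTable, error) {
        raw, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }

        var table AlertRoutingTable
        if err := json.Unmarshal(raw, &table); err != nil {
            return nil, err
        }
        return table, nil
    }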