A lightweight event collection system with some spice.

  • By silverton
  • Last update: Dec 6, 2022
  • Comments: 8

Buz

Event Collection, Validation, and Delivery.

Buz is a system for collecting events from various sources, validating data quality, and delivering them to where they need to bee.

Quickstart

Quickstart documentation for setting up an end-to-end streaming analytics stack with Buz, Redpanda, Materialize, and Kowl can be found here.

Documentation

Documentation can be found here.

Download

buz.zip

Comments(8)

  • 1

    Short-circuit and return 400 when snowplow requests are invalid

    Right now, sending an unexpected payload can sometimes cause a panic and a 500 response:

    import requests
    payload = {
      "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
      "data": {
        "schema": "iglu:com.my_company/viewed_product/jsonschema/1-0-0",
        "data": {
          "product_id": "ASO01043",
          "price": 49.95
        }
      }
    }
    requests.post('http://localhost:8080/com.snowplowanalytics.snowplow/tp2', json=payload)
    
    2022/09/06 23:08:07 [Recovery] 2022/09/06 - 23:08:07 panic recovered:
    POST /com.snowplowanalytics.snowplow/tp2 HTTP/1.1
    Host: localhost:8080
    Accept: */*
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Content-Length: 208
    Content-Type: application/json
    User-Agent: python-requests/2.27.1
    
    
    runtime error: invalid memory address or nil pointer dereference
    /usr/local/Cellar/go/1.19/libexec/src/runtime/panic.go:260 (0x103b9e4)
            panicmem: panic(memoryError)
    /usr/local/Cellar/go/1.19/libexec/src/runtime/signal_unix.go:835 (0x1053a1c)
            sigpanic: panicmem()
    /Users/jpcassil/buz/pkg/snowplow/eventBuilder.go:169 (0x1bf16f9)
            setTstamps: e.DvceCreatedTstamp = *getTimeParam(params, "dtm")
    /Users/jpcassil/buz/pkg/snowplow/eventBuilder.go:355 (0x1bf3ee7)
            BuildEventFromMappedParams: setTstamps(&event, params)
    /Users/jpcassil/buz/pkg/envelope/builderSnowplow.go:106 (0x1bf70fe)
            BuildSnowplowEnvelopesFromRequest: spEvent := snowplow.BuildEventFromMappedParams(c, event.Value().(map[string]interface{}), *conf)
    /Users/jpcassil/buz/pkg/handler/snowplow.go:21 (0x2d9da2e)
            SnowplowHandler.func1: envelopes := envelope.BuildSnowplowEnvelopesFromRequest(c, h.Config, h.CollectorMeta)
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:168 (0x181145c)
            (*Context).Next: c.handlers[c.index](c)
    /Users/jpcassil/buz/pkg/middleware/identity.go:50 (0x2da8e5c)
            Identity.func1: c.Next()
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:168 (0x181145c)
            (*Context).Next: c.handlers[c.index](c)
    /Users/jpcassil/buz/pkg/middleware/cors.go:28 (0x2da899a)
            CORS.func1: c.Next()
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:168 (0x181145c)
            (*Context).Next: c.handlers[c.index](c)
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:99 (0x18220d7)
            CustomRecoveryWithWriter.func1: c.Next()
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:168 (0x181145c)
            (*Context).Next: c.handlers[c.index](c)
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:555 (0x181fe93)
            (*Engine).handleHTTPRequest: c.Next()
    /Users/jpcassil/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:511 (0x181f964)
            (*Engine).ServeHTTP: engine.handleHTTPRequest(c)
    /usr/local/Cellar/go/1.19/libexec/src/net/http/server.go:2947 (0x14ad873)
            serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
    /usr/local/Cellar/go/1.19/libexec/src/net/http/server.go:1991 (0x14a73db)
            (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
    /usr/local/Cellar/go/1.19/libexec/src/runtime/asm_amd64.s:1594 (0x10705a0)
            goexit: BYTE    $0x90   // NOP
    

    We should consider building in some nice mechanisms for graceful failures and helpful responses, where applicable.

    I'm not sure what about this payload in particular caused a panic.
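
The trace above points at `setTstamps` dereferencing the result of `getTimeParam(params, "dtm")`, and the example payload carries no `dtm` field, so a missing timestamp parameter looks like the likely trigger. The short-circuit idea can be sketched as below. This is only an illustration, written in Python to match the reproduction script rather than the project's Go code; `REQUIRED_PARAMS`, `validate_params`, and `handle_request` are hypothetical names, and the set of required fields is an assumption.

```python
from __future__ import annotations

from typing import Any

# Assumption: parameters the event builder cannot do without.
# "dtm" is taken from the trace; "e" (event type) is illustrative only.
REQUIRED_PARAMS = ("e", "dtm")


def validate_params(params: Any) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(params, dict):
        return ["payload must be a JSON object"]
    return [
        f"missing required parameter: {name}"
        for name in REQUIRED_PARAMS
        if params.get(name) is None
    ]


def handle_request(params: Any) -> tuple[int, dict]:
    """Short-circuit with a 400 instead of letting the builder fail later."""
    problems = validate_params(params)
    if problems:
        return 400, {"errors": problems}
    # Event building and sink delivery would happen here (not shown).
    return 200, {"status": "ok"}
```

Validating before the event is built keeps the failure at the edge of the handler, and the response body tells the caller which fields were missing instead of returning a bare 500.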

  • 2

    Snowplow example not working.

    Perhaps I misunderstood this section: https://github.com/silverton-io/honeypot/blob/main/website/docs/examples/quickstart.md#4-send-events-to-honeypot

    It indicates that the page served on port 8080 has Snowplow tracking integrated for easy testing, but I don't think that's the case. I just get "404 page not found", and I also get 404s when I POST or GET anything, both on localhost and on my current GCP deployment. I'm not sure whether I'm doing something wrong.

  • 3

    Add configurable sink-level retries

    Right now sinks do not retry if they fail. Having that level of guarantee is important.

    Sinks should be independently configurable to retry or not, and it would be cool to configure the retry strategy. Exponential backoff with min/max? Something else?
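
One possible shape for a per-sink retry setting is a small config object plus an exponential delay schedule bounded by a minimum and a maximum. The sketch below is written in Python for brevity and is not taken from the project; `RetryConfig`, `backoff_delays`, and `send_with_retry` are hypothetical names, and the default values are arbitrary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class RetryConfig:
    """Hypothetical per-sink retry settings."""
    enabled: bool = True
    max_attempts: int = 5      # total attempts, including the first one
    min_delay: float = 0.1     # seconds before the first retry
    max_delay: float = 5.0     # upper bound on any single wait
    multiplier: float = 2.0    # growth factor between retries


def backoff_delays(cfg: RetryConfig) -> Iterator[float]:
    """Yield the wait before each retry, growing geometrically up to max_delay."""
    delay = cfg.min_delay
    for _ in range(cfg.max_attempts - 1):
        yield min(delay, cfg.max_delay)
        delay *= cfg.multiplier


def send_with_retry(
    send: Callable[[], None],
    cfg: RetryConfig,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds and return the number of attempts used.

    Re-raises the last error when retries are disabled or exhausted.
    """
    delays = backoff_delays(cfg) if cfg.enabled else iter(())
    attempts = 0
    while True:
        attempts += 1
        try:
            send()
            return attempts
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting. A production version would probably add jitter and restrict which errors are retried.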

  • 4

    Make application secrets more flexible

    Currently there's only one way to provide secrets to honeypot: a file named config.yml in the same directory as the binary. This is obviously not ideal.

    Options (probably all of them are worth doing):

    1. Add env var specifying config file location override
    2. Add env-var-based config overrides: LEVEL1_LEVEL2_VARNAME
    3. Add default config.yml to the docker build
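
Options 1 and 2 above can be sketched as follows. This is an illustration in Python, not the project's implementation; the variable name `BUZ_CONFIG_PATH` and the functions `config_path` and `apply_env_overrides` are hypothetical, and the sketch assumes the YAML file has already been parsed into nested dictionaries.

```python
import os
from typing import Mapping

# Hypothetical env var naming the config file location (option 1).
CONFIG_PATH_VAR = "BUZ_CONFIG_PATH"


def config_path(env: Mapping[str, str] = os.environ, default: str = "config.yml") -> str:
    """Return the config file path, honoring the override when it is set."""
    return env.get(CONFIG_PATH_VAR, default)


def apply_env_overrides(config: dict, env: Mapping[str, str], prefix: tuple = ()) -> dict:
    """Return a copy of config with leaves replaced by LEVEL1_LEVEL2_VARNAME (option 2).

    Override values arrive as strings; type coercion is left out of this sketch.
    """
    result = {}
    for key, value in config.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            result[key] = apply_env_overrides(value, env, path)
        else:
            env_name = "_".join(path).upper()
            result[key] = env.get(env_name, value)
    return result
```

With a config of `{"app": {"port": 8080}}`, setting `APP_PORT=9090` in the environment would replace the port with the string `"9090"` while leaving every other key untouched.
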
  • 5

    Add tf outputs, add system/env flexibility

    This PR:

    1. Shuffles the terraform directory under deploy
    2. Adds system and env vars so multiple environments can be rapidly provisioned using the same code
    3. Shuffles variable building from main to locals
    4. Autogenerates cloud run service revision names
    5. Adds outputs
    6. Makes bucket naming user-flexible (admittedly this might be a step backward, since using project IDs in bucket names to keep them globally unique is 🔥)
  • 6

    391: GCP Terraform

    Issue 391

    Terraform for the GCP deploy as outlined by the GCP documentation steps. This is my first ever Terraform for GCP, so it may not follow all best practices. The main hurdle is that it has to be applied in two steps because uploading the image is a manual process. Otherwise it seems to work.

  • 7

    Make sink config simpler

    It would be cool if this thing split events into tables by default, without being configured to do so, while still allowing an opt-out setting that says "no, don't do that".
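
A split-by-default behavior with an opt-out might look like the sketch below. It is written in Python purely for illustration; the table naming scheme, the `split` flag, and the function names `table_name` and `split_by_table` are all assumptions rather than anything the project defines.

```python
import re
from collections import defaultdict


def table_name(schema: str) -> str:
    """Derive a table name from an Iglu-style schema URI (hypothetical scheme).

    "iglu:com.my_company/viewed_product/jsonschema/1-0-0"
    becomes "com_my_company_viewed_product_1".
    """
    vendor, name, _format, version = schema.removeprefix("iglu:").split("/")
    major = version.split("-")[0]
    return re.sub(r"[^a-z0-9]+", "_", f"{vendor}_{name}_{major}".lower())


def split_by_table(events: list, split: bool = True, default_table: str = "events") -> dict:
    """Group events per derived table, or into one table when split is False."""
    tables = defaultdict(list)
    for event in events:
        key = table_name(event["schema"]) if split else default_table
        tables[key].append(event)
    return dict(tables)
```

Keying tables on the major schema version is one design choice among several; it keeps breaking schema changes in separate tables while letting minor revisions share one.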

  • 8

    Create swagger documentation for endpoints

    Buz endpoints such as /stats and /schemas are currently "need to know". Ideally we would have auto-generated documentation for them, describing what they do and what to expect back when we hit them.