A monitoring and troubleshooting tool for microservice architectures.

  • By Coroot
  • Last update: Dec 29, 2022
  • Comments: 12

Coroot is a monitoring and troubleshooting tool for microservice architectures.


Features

eBPF-based service mapping

Thanks to eBPF, Coroot shows you a comprehensive map of your services without any code changes.

Log analysis without storage costs

Node-agent turns terabytes of logs into just a few dozen metrics by extracting repeated patterns right on the node. Using these metrics allows you to quickly and cost-effectively find the errors relevant to a particular outage.
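
As an example of using these pattern metrics, here is a minimal PromQL sketch (the metric name follows the node-agent documentation; the exact label set is an assumption, so check the metric reference before relying on it):

    # error messages per second, grouped by the extracted repeated pattern
    sum by (level, pattern_hash) (
      rate(container_log_messages_total{level="error"}[5m])
    )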

Cloud topology awareness

Coroot uses cloud metadata to show which regions and availability zones each application runs in. This is important to know because:

  • Network latency between availability zones within the same region can be higher than within one particular zone.
  • Data transfer between availability zones in the same region incurs charges, while data transfer within a zone is free.

Advanced Postgres observability

Coroot makes troubleshooting Postgres-related issues easier not only for experienced DBAs but also for engineers not specialized in databases.

Integration into your existing monitoring stack

Coroot uses Prometheus as a Time-Series Database (TSDB):

  • The agents are Prometheus-compatible exporters
  • Coroot itself is a Prometheus client (like Grafana)
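
Because the agents are ordinary exporters, you can inspect what they expose with nothing but curl. A quick sketch (the node address is a placeholder; port 80 matches the agent examples later on this page):

    # list a few of the container_* series the node-agent exports
    curl -s http://<node-ip>:80/metrics | grep '^container_' | head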

Built-in Prometheus cache

The built-in Prometheus cache allows Coroot to provide you with a blazing fast UI without overloading your Prometheus.

Installation

You can run Coroot as a Docker container or deploy it into any Kubernetes cluster. Check out the Installation guide.
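
For a quick single-node trial, something like the following should work (the published port is an assumption here; see the Installation guide for the exact, current command):

    # the UI should then be reachable on http://localhost:8080
    docker run -d --name coroot -p 8080:8080 ghcr.io/coroot/coroot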

Documentation

The Coroot documentation is available at coroot.com/docs/coroot-community-edition.

License

Coroot is licensed under the Apache License, Version 2.0.

Download

coroot.zip

Comments (12)

  • 1

    Coroot UI doesn't see pg-agent metrics

    Hello! Thank you for this wonderful and convenient instrument! Unfortunately, I ran into some trouble with pg-agent.

    After installation I saw the relevant metrics in Prometheus, but my Coroot UI doesn't see them (though it registered the pg-agent instance).


    I started pg-agent with:

    docker run -d --name coroot-pg-agent -p <port>:80 \
      --env DSN="postgresql://<user>:<password>@<ip>:5432/postgres?connect_timeout=1&statement_timeout=30000" \
      ghcr.io/coroot/coroot-pg-agent

    But I got this in the logs of the Coroot UI container: couldn't find actual instance for "postgres", initial instance is "[email protected]" (map[]) (yes, I have the postgres_exporter role in PostgreSQL, but I use another role for the Coroot pg-agent). What have I done wrong?

    Also, I got none of these metrics: pg_lock_awaiting_queries, pg_wal_receiver_status, pg_wal_replay_paused, pg_wal_receive_lsn, pg_wal_reply_lsn. Maybe this happened because I'm using PostgreSQL 11?
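
    For reference, this is how I checked which pg_* metrics the agent actually exports (same placeholder port as in the docker run command above):

    # list the pg_* series the agent exposes
    curl -s http://localhost:<port>/metrics | grep '^pg_'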

  • 2

    Not getting service maps

    Coroot 0.2.2
    Coroot-node-agent 1.0.19
    k3s v1.24.4+k3s1

    Status reports everything is ok:

    prometheus: ok
    coroot-node-agent: 7 nodes found
    kube-state-metrics: 186 applications found
    

    As an example, I took Loki in a distributed setup.

    In the app search, I can see them.

    But if I open the app details, I can't see the inter-communication between components.

    Am I missing something? Maybe specific labels are needed?

  • 3

    Node agent is configured but Coroot is not detecting it

    I have a problem: even with node-agent configured, Coroot is not detecting it and not collecting node metrics. Below are the YAML files. Basically I deployed everything with the defaults. I'm also using kube-prometheus-stack, but I don't use the podSelector configuration.

    The configuration with Prometheus shows OK.

    The node-agent DaemonSet pods are apparently healthy too, but the Coroot web UI always shows coroot-node-agent: no agent installed.

    I've followed the document below to install node-agent

    https://coroot.com/docs/metric-exporters/node-agent/overview

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    
    ---
    
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        app: coroot-node-agent
      name: coroot-node-agent
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: coroot-node-agent
      template:
        metadata:
          labels:
            app: coroot-node-agent
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '80'
        spec:
          tolerations:
            - operator: Exists
          hostPID: true
          containers:
            - name: coroot-node-agent
              image: ghcr.io/coroot/coroot-node-agent:latest
              args: ["--cgroupfs-root", "/host/sys/fs/cgroup"]
              ports:
                - containerPort: 80
                  name: http
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: /host/sys/fs/cgroup
                  name: cgroupfs
                  readOnly: true
                - mountPath: /sys/kernel/debug
                  name: debugfs
                  readOnly: false
          volumes:
            - hostPath:
                path: /sys/fs/cgroup
              name: cgroupfs
            - hostPath:
                path: /sys/kernel/debug
              name: debugfs
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: coroot-node-agent
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: coroot-node-agent
      podMetricsEndpoints:
        - port: http
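
    In case it helps, this is the kind of check I ran against the agent directly (the pod name is just an example):

    kubectl -n monitoring get pods -l app=coroot-node-agent
    kubectl -n monitoring port-forward coroot-node-agent-xxxxx 8080:80
    # in another terminal:
    curl -s http://localhost:8080/metrics | head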
    
  • 4

    microk8s failed to inspect container

    I'm having trouble getting service maps to work. I installed Coroot into a single-node microk8s cluster; all applications show external endpoints, and no CPU/memory data is picked up.

    I used Helm, which installed the following Coroot versions:

    $ helm install --namespace coroot --create-namespace coroot coroot/coroot
    
       image: ghcr.io/coroot/coroot-node-agent:1.6.1
       image: ghcr.io/coroot/coroot:0.11.0
    
    $ microk8s version
    MicroK8s v1.25.4 revision 4221
    
    $ uname -a
    Linux micro.k8s 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    

    The node-agent logs are showing "failed to inspect container" errors:

    + coroot-node-agent-cc898 › node-agent
    coroot-node-agent-cc898 node-agent W1228 12:01:01.468515 2813613 registry.go:323] failed to inspect container 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7: Error: No such container: 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7
    coroot-node-agent-cc898 node-agent W1228 12:01:01.468993 2813613 registry.go:332] failed to inspect container 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7: container "3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7" in namespace "k8s.io": not found
    coroot-node-agent-cc898 node-agent W1228 12:01:01.469017 2813613 registry.go:244] failed to get container metadata for pid 2814392 -> /kubepods/burstable/podb2eadec9-fe40-4534-b14f-c4f4fbe695fc/3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7: failed to interact with dockerd (Error: No such container: 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7) or with containerd (container "3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7" in namespace "k8s.io": not found)
    coroot-node-agent-cc898 node-agent W1228 12:01:01.639019 2813613 registry.go:323] failed to inspect container c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: Error: No such container: c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0
    coroot-node-agent-cc898 node-agent W1228 12:01:01.639488 2813613 registry.go:332] failed to inspect container c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: container "c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0" in namespace "k8s.io": not found
    coroot-node-agent-cc898 node-agent W1228 12:01:01.639507 2813613 registry.go:244] failed to get container metadata for pid 169108 -> /kubepods/besteffort/pod6cfd3792-d0be-4835-905e-11415fab06bb/c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: failed to interact with dockerd (Error: No such container: c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0) or with containerd (container "c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0" in namespace "k8s.io": not found)
    coroot-node-agent-cc898 node-agent I1228 12:01:01.639514 2813613 registry.go:197] TCP connection error from unknown container {connection-error none 169108 10.1.93.101:33846 192.168.1.197:8126 8 0 <nil>}
    coroot-node-agent-cc898 node-agent W1228 12:01:01.639827 2813613 registry.go:323] failed to inspect container c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: Error: No such container: c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0
    coroot-node-agent-cc898 node-agent W1228 12:01:01.640051 2813613 registry.go:332] failed to inspect container c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: container "c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0" in namespace "k8s.io": not found
    coroot-node-agent-cc898 node-agent W1228 12:01:01.640063 2813613 registry.go:244] failed to get container metadata for pid 168714 -> /kubepods/besteffort/pod6cfd3792-d0be-4835-905e-11415fab06bb/c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0: failed to interact with dockerd (Error: No such container: c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0) or with containerd (container "c14e5fc91c78fd05debecb9e5a805403bb7a6d4d2a75750ad4b5ba25a3b249c0" in namespace "k8s.io": not found)
    coroot-node-agent-cc898 node-agent I1228 12:01:01.640067 2813613 registry.go:197] TCP connection error from unknown container {connection-error none 168714 10.1.93.101:33858 192.168.1.197:8126 8 0 <nil>}
    coroot-node-agent-cc898 node-agent W1228 12:01:01.960432 2813613 registry.go:323] failed to inspect container 9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84: Error: No such container: 9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84
    coroot-node-agent-cc898 node-agent W1228 12:01:01.960823 2813613 registry.go:332] failed to inspect container 9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84: container "9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84" in namespace "k8s.io": not found
    coroot-node-agent-cc898 node-agent W1228 12:01:01.960978 2813613 registry.go:244] failed to get container metadata for pid 3742007 -> /kubepods/burstable/pod3faa3f58-e25e-4dac-9a6e-6a59a20d6cdb/9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84: failed to interact with dockerd (Error: No such container: 9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84) or with containerd (container "9352666a52fae4b94047fd1f07620b4ef23826c079ff36064cc025ac24877f84" in namespace "k8s.io": not found)
    coroot-node-agent-cc898 node-agent I1228 12:01:01.960994 2813613 registry.go:191] TCP connection from unknown container {connection-open none 3742007 127.0.0.1:45680 127.0.0.1:8080 14 1338699945448561 <nil>}
    coroot-node-agent-cc898 node-agent W1228 12:01:02.317373 2813613 registry.go:323] failed to inspect container 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7: Error: No such container: 3525641293dd8d8f2974aa8e1dd605f291f233356a9a5a9a931535c8ebc13df7
    
    


  • 5

    CPU throttling calculation

    Hi guys!

    I see in your documentation that you count throttling. I see Coroot just has the metric container_resources_cpu_throttled_seconds_total, without a period counter.

    Coroot uses the [container_resources_cpu_throttled_seconds_total](https://coroot.com/docs/metric-exporters/node-agent/metrics#container_resources_cpu_throttled_seconds_total) metric to find out how long each container has been throttled for. If this metric of related containers is correlating with the application SLIs (Service Level Indicators), that means the lack of CPU time is caused by throttling.

    I searched your open-source code but didn't find what formula you use to calculate the percentage of throttling in a container.
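
    My best guess in PromQL, from the metric names alone (this is my assumption, not necessarily Coroot's actual formula; it also assumes the companion metric container_resources_cpu_usage_seconds_total from the same exporter):

    # share of CPU time lost to throttling over the last 5 minutes
    rate(container_resources_cpu_throttled_seconds_total[5m])
      / (rate(container_resources_cpu_throttled_seconds_total[5m])
         + rate(container_resources_cpu_usage_seconds_total[5m]))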

  • 6

    [feature request] allow customization of monitoring-related namespaces

    Hello, it seems that the namespaces related to monitoring are hardcoded: https://github.com/coroot/coroot/blob/5e461f4dafe3935e9d610825282f1bed227c55f5/model/application.go#L119 It would be nice if they could be customized. For example, I have Parca in the namespace parca, which is also related to monitoring, and I bet other people have different software in different namespaces too.

  • 7

    [feature request] allow blacklisting of certain services

    Hello! I'm not sure if this should be part of the node-agent or the UI, so I'm putting it here.

    I want to be able to blacklist certain services from the UI. As an example, I have iscsid running on the host nodes for Longhorn (it's also used by OpenEBS Jiva, and maybe something else too). It creates useless links on the map (it runs on every node and connects to all instance managers with a local replica, so there will be tons of them). Hardcoding them all into the Coroot code seems pointless, so maybe just add a setting to hide those nodes from the UI?

    On the other hand, blacklisting them in the agent (maybe with an argument like "--ignore-services=iscsid,haproxy"?) might reduce cardinality in Prometheus, assuming those services would be dropped at probe time.

  • 8

    Support for ARM64 architectures

    Hello there,

    I am trying Coroot on an in-house Kubernetes cluster deployed on Raspberry Pi 4 machines. I followed the installation guide, but I am getting exec /opt/coroot/coroot: exec format error. This tells me that the arm64 arch is not supported. Considering that cloud providers like AWS, GCP, Oracle, and now Azure all have arm64 offerings, it would be great to get arm64 support.

  • 9

    Random crashes

    Hello! I got this error; it appears randomly. It does not crash the Coroot pod, but the UI stops responding.

    2022/10/18 00:32:49 http: panic serving 10.251.3.145:50394: runtime error: integer divide by zero
    goroutine 60260 [running]:
    net/http.(*conn).serve.func1()
    	/usr/local/go/src/net/http/server.go:1825 +0xbf
    panic({0xaefee0, 0x1253250})
    	/usr/local/go/src/runtime/panic.go:844 +0x258
    github.com/coroot/coroot/auditor.(*appAuditor).logs(0xc001afebb8)
    	/tmp/src/auditor/logs.go:92 +0x86a
    github.com/coroot/coroot/auditor.Audit(0xc0124584d0)
    	/tmp/src/auditor/auditor.go:32 +0x159
    github.com/coroot/coroot/api/views/overview.Render(0xc0124584d0)
    

    Another one:

    2022/10/18 00:35:14 http: panic serving 10.251.6.173:57792: runtime error: integer divide by zero
    goroutine 62487 [running]:
    net/http.(*conn).serve.func1()
    	/usr/local/go/src/net/http/server.go:1825 +0xbf
    panic({0xaefee0, 0x1253250})
    	/usr/local/go/src/runtime/panic.go:844 +0x258
    github.com/coroot/coroot/auditor.(*appAuditor).logs(0xc01cbd6bb8)
    	/tmp/src/auditor/logs.go:92 +0x86a
    github.com/coroot/coroot/auditor.Audit(0xc01459a000)
    	/tmp/src/auditor/auditor.go:32 +0x159
    github.com/coroot/coroot/api/views/overview.Render(0xc01459a000)
    	/tmp/src/api/views/overview/overview.go:41 +0x9d
    github.com/coroot/coroot/api/views.Overview(...)
    	/tmp/src/api/views/views.go:20
    
  • 10

    Bump vuetify from 2.6.9 to 2.6.10 in /front

    Bumps vuetify from 2.6.9 to 2.6.10.

    Release notes

    Sourced from vuetify's releases.

    v2.6.10

    :wrench: Bug Fixes

    • VCalendar: prevent XSS from eventName function (ade1434), closes #15757
    • VDialog: don't try to focus tabindex="-1" or hidden inputs (89e3850), closes #15745
    • VMenu: disable activatorFixed when attach is enabled (#15709) (464529a), closes #14922
    • VTextField: only show clear icon on hover or when focused (7a51ad0)
    • VTextField: prevent tabbing to clear button (f8ee680), closes #11202
    • web-types: add support for VDataTable pattern slots (#15694) (ac45c98)

    :microscope: Code Refactoring

    • VSelect: render highlight with vnodes instead of innerHTML (4468e3c)

    BREAKING CHANGES

    • VCalendar: eventName function can no longer render arbitrary HTML, convert to VNodes instead. eventSummary can no longer be used with v-html, replace with <component :is="{ render: eventSummary }" />
    Commits
    • fdfb6fc chore(release): publish v2.6.10
    • cd193e4 fix(VSelectList): correct mask class
    • 89e3850 fix(VDialog): don't try to focus tabindex="-1" or hidden inputs
    • 4468e3c refactor(VSelect): render highlight with vnodes instead of innerHTML
    • ade1434 fix(VCalendar): prevent XSS from eventName function
    • 464529a fix(VMenu): disabled activatorFixed when attach is enabled (#15709)
    • 7a51ad0 fix(VTextField): only show clear icon on hover or when focused
    • f8ee680 fix(VTextField): prevent tabbing to clear button
    • See full diff in compare view


  • 11

    Node agent is crashing with `netlink receive: no such file or directory`

    I was trying out Coroot on a local minikube setup and I see that the node agent is going into CrashLoopBackOff with this error:

    $ kubectl -n coroot logs -f coroot-node-agent-fgwbg
    I1206 16:28:32.657690   15296 main.go:76] agent version: 1.4.1
    I1206 16:28:32.657822   15296 main.go:82] hostname: minikube
    I1206 16:28:32.657831   15296 main.go:83] kernel version: 5.15.0-56-generic
    I1206 16:28:32.657916   15296 main.go:69] machine-id:  845c2b4a10104ec4926fec08a1d703fc
    I1206 16:28:32.658081   15296 metadata.go:63] cloud provider: 
    I1206 16:28:32.658091   15296 collector.go:152] instance metadata: <nil>
    F1206 16:28:32.658338   15296 main.go:103] netlink receive: no such file or directory
    

    minikube version:

    $ minikube version
    minikube version: v1.20.0
    commit: c61663e942ec43b20e8e70839dcca52e44cd85ae
    

    kubectl version

    $ kubectl version
    WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
    Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v4.5.7
    Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
    WARNING: version difference between client (1.25) and server (1.20) exceeds the supported minor version skew of +/-1
    
  • 12

    kube-state-metrics not showing in project configuration

    I've configured Coroot in a Kubernetes cluster, but kube-state-metrics, even though it is deployed, is not showing on the project configuration page.

    prometheus: ok
    coroot-node-agent: 5 nodes found

    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    kube-state-metrics   1/1     1            1           127m
    

    I just followed these installation instructions: https://coroot.com/docs/metric-exporters/kube-state-metrics/installation Am I missing something?
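
    One check that might help: query Prometheus directly for a kube-state-metrics series (the Prometheus address is a placeholder; kube_pod_info is a standard kube-state-metrics metric):

    # a non-empty result means Prometheus is scraping kube-state-metrics
    curl -s 'http://<prometheus>:9090/api/v1/query' --data-urlencode 'query=count(kube_pod_info)'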