Tortoise: Shell-Shockingly Good Kubernetes "Truly-Automated" Autoscaling (not production-ready)

  • By Mercari, Inc.
  • Last update: Apr 11, 2023
  • Comments: 13

Tortoise

Tortoise, they are living in the Kubernetes cluster.

Tortoise, you need to feed only very few parameters to them.

Tortoise, they will soon start to eat historical usage data of Pods.

Tortoise, once you start to live with them, you no longer need to configure autoscaling by yourself.

Install

Tortoise, you cannot get it from the breeder.

Tortoise, you need to get it from GitHub instead.

# Install CRDs into the K8s cluster specified in ~/.kube/config.
make install
# Deploy controller to the K8s cluster specified in ~/.kube/config.
make deploy

Tortoise, you don't need a rearing cage, but you do need VPA in your Kubernetes cluster before installing it.

Documentation

API definition

Contribution

Before implementing any feature changes as Pull Requests, please raise the Issue and discuss what you propose with maintainers.

Also, please read the CLA carefully before submitting your contribution to Mercari. Under any circumstances, by submitting your contribution, you are deemed to accept and agree to be bound by the terms and conditions of the CLA.

https://www.mercari.com/cla/

LICENSE

Copyright 2023 Mercari, Inc.

Licensed under the MIT License.

Comments(13)

  • 1

    enable the container registry via GitHub Packages

    We need to push the image to a registry so that the cluster can pull it: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry

  • 2

    update tortoise status before updating HPA and VPA

    What this PR does / why we need it:

    We need to update the tortoise status before updating HPA and VPA. That way, if updating the tortoise status fails, the recommendation recorded on the tortoise cannot drift from the actual parameters on HPA/VPA.

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:
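
The ordering described in comment 2 can be sketched as follows. This is a minimal illustration, not the controller's actual code: `FakeCluster`, `StatusUpdateError`, and `apply_recommendation` are hypothetical names, and the in-memory cluster only records the order of calls.

```python
from dataclasses import dataclass, field


class StatusUpdateError(Exception):
    """Raised when the tortoise status could not be persisted."""


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the API server."""

    fail_status_update: bool = False
    calls: list = field(default_factory=list)

    def update_tortoise_status(self, recommendation: dict) -> None:
        if self.fail_status_update:
            raise StatusUpdateError("status update failed")
        self.calls.append(("status", recommendation))

    def update_hpa(self, recommendation: dict) -> None:
        self.calls.append(("hpa", recommendation))

    def update_vpa(self, recommendation: dict) -> None:
        self.calls.append(("vpa", recommendation))


def apply_recommendation(cluster: FakeCluster, recommendation: dict) -> bool:
    """Persist the recommendation on the tortoise first, then on HPA/VPA.

    Returns False without touching HPA/VPA if the status update fails,
    so the tortoise status and the autoscalers cannot disagree.
    """
    try:
        cluster.update_tortoise_status(recommendation)
    except StatusUpdateError:
        return False
    cluster.update_hpa(recommendation)
    cluster.update_vpa(recommendation)
    return True
```

If the status update raises, neither autoscaler is touched, and the next reconciliation can retry from a consistent state.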

  • 3

    handle emergency tortoise which isn't handled by the controller as soon as possible

    What this PR does / why we need it:

    When starting the reconciliation for a tortoise, the controller checks when that tortoise was last handled and determines whether to reconcile it now.

    As an exception, emergency tortoises are handled immediately because of the emergency situation. So far, every tortoise with .spec.updateMode == Emergency has been handled without checking the last update time. This PR refines that logic: among emergency tortoises, the controller needs to prioritize the ones it has not handled yet, and it does not need to rush the reconciliation of emergency tortoises it has already handled (that is, whose minReplicas it has already increased).

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:
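
The decision described in comment 3 might look roughly like the sketch below. The `Tortoise` dataclass, the `emergency_handled` flag, the `"Auto"` mode value, and the `interval` parameter are assumptions made for illustration; only the `Emergency` update mode and the last-handled check come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Tortoise:
    update_mode: str                  # "Emergency" or another mode (assumed: "Auto")
    emergency_handled: bool           # hypothetical flag: minReplicas already raised
    last_handled: Optional[datetime]  # None if the tortoise was never reconciled


def should_reconcile_now(t: Tortoise, now: datetime, interval: timedelta) -> bool:
    """Decide whether the controller should reconcile this tortoise now."""
    # An emergency tortoise that has not been handled yet skips the time check.
    if t.update_mode == "Emergency" and not t.emergency_handled:
        return True
    # Everything else, including already-handled emergency tortoises,
    # waits until the regular interval has passed.
    if t.last_handled is None:
        return True
    return now - t.last_handled >= interval
```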

  • 4

    add rbac kubebuilder comment to generate rbac for deployment, hpa, and vpa

    What this PR does / why we need it:

    add rbac kubebuilder comment to generate rbac for deployment, hpa, and vpa

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

  • 5

    fix mutating webhook to initialize the autoscaling policy of every container

    What this PR does / why we need it:

    fix mutating webhook to initialize the autoscaling policy of every container.

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:
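
One way to picture the defaulting in comment 5 is the sketch below. The field names (`containerName`, `policy`) and the default values in `DEFAULT_POLICY` are assumptions for illustration; the description above does not say what the webhook actually fills in.

```python
import copy
from typing import Dict, List

# Assumed default; the source does not state which policy the webhook applies.
DEFAULT_POLICY = {"cpu": "Horizontal", "memory": "Vertical"}


def default_autoscaling_policies(
    container_names: List[str], policies: List[Dict]
) -> List[Dict]:
    """Return one policy entry per container, filling in missing ones."""
    by_name = {p["containerName"]: p for p in policies}
    result = []
    for name in container_names:
        if name in by_name:
            # Keep whatever the user already specified for this container.
            result.append(copy.deepcopy(by_name[name]))
        else:
            # Containers without an entry get the (assumed) default policy.
            result.append({"containerName": name, "policy": dict(DEFAULT_POLICY)})
    return result
```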

  • 6

    implement mutating webhook for HPA

    What this PR does / why we need it:

    implement mutating webhook for HPA. Users may update the HPA themselves, so when they apply a change to the HPA, we need to overwrite it with the current recommendation values from tortoise.

    Which issue(s) this PR fixes:

    Fixes #6

    Special notes for your reviewer:
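
The behaviour described in comment 6 can be sketched as a pure function over the incoming object. `mutate_hpa` and the set of overridden fields are hypothetical; which HPA fields the real webhook rewrites is not stated above.

```python
import copy


def mutate_hpa(incoming_hpa: dict, recommendation: dict) -> dict:
    """Return a copy of the user-applied HPA with tortoise's values enforced."""
    mutated = copy.deepcopy(incoming_hpa)
    spec = mutated.setdefault("spec", {})
    # Assumed set of managed fields; anything else the user set is kept as-is.
    for key in ("minReplicas", "maxReplicas"):
        if key in recommendation:
            spec[key] = recommendation[key]
    return mutated
```

Returning a modified copy rather than editing in place keeps the user's original request available for comparison or logging.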

  • 7

    implement the multiple container pod's container resizing for Horizontal

    What this PR does / why we need it:

    implement the multiple container pod's container resizing for Horizontal.

    Which issue(s) this PR fixes:

    Fixes #24

    Special notes for your reviewer:

  • 8

    add the documentation to describe how to create a new release

    What this PR does / why we need it:

    add the documentation to describe how to create a new release

    Which issue(s) this PR fixes:

    Fixes #3

    Special notes for your reviewer:

  • 9

    enable the container registry via GitHub Packages

    What this PR does / why we need it:

    enable the container registry via GitHub Packages

    Which issue(s) this PR fixes:

    Fixes #4

    Special notes for your reviewer:

  • 10

    VPAs and HPA created by the controller should be deleted after tortoise gets deleted

    https://github.com/mercari/tortoise/pull/57/files#r1162381665

    • delete VPAs managed by tortoises when a tortoise gets deleted.
      • we can use owner references, or delete them in the reconciliation.
    • delete HPA managed by tortoises if HPA is created by the controller.
      • If .spec.targetRefs.HorizontalPodAutoscalerName is not nil, we shouldn't delete HPA as that's created by users.
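
The cleanup rule in comment 10 (always delete the VPAs, delete the HPA only when the controller created it) can be sketched like this. The function and parameter names are hypothetical; `user_hpa_name` stands for `.spec.targetRefs.HorizontalPodAutoscalerName`.

```python
from typing import List, Optional, Tuple


def resources_to_delete(
    vpa_names: List[str],
    controller_hpa_name: str,
    user_hpa_name: Optional[str],
) -> List[Tuple[str, str]]:
    """List the (kind, name) pairs to remove when a tortoise is deleted."""
    # VPAs managed by the tortoise are always removed.
    targets = [("VerticalPodAutoscaler", name) for name in vpa_names]
    # The HPA is removed only if the user did not supply their own.
    if user_hpa_name is None:
        targets.append(("HorizontalPodAutoscaler", controller_hpa_name))
    return targets
```
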
  • 11

    add EmergencyPhase test case

    What this PR does / why we need it:

    Test case for Emergency mode. The difference here is that minReplicas == maxReplicas and TargetUtilization is 90 (same as the default).

    Which issue(s) this PR fixes:

    Fixes # https://github.com/mercari/tortoise/issues/2

    Special notes for your reviewer:

  • 12

    The feature: scheduled scaling up

    Sometimes we can predict an increase in resource consumption before it actually happens (for example, TV coverage or an in-app push notification). This feature lets people schedule a scale-up ahead of the increase. They configure when to scale up and for how long, so that the workload returns to normal afterward.
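
The "when" and "how long" inputs from comment 12 could drive something like the sketch below. This is only an illustration of the idea: `scheduled_min_replicas` and its parameters are hypothetical, and the feature may end up adjusting something other than a replica floor.

```python
from datetime import datetime, timedelta


def scheduled_min_replicas(
    now: datetime,
    start: datetime,
    duration: timedelta,
    normal_min: int,
    scaled_min: int,
) -> int:
    """Return the replica floor to use at `now`."""
    # Inside the window [start, start + duration) use the raised floor.
    if start <= now < start + duration:
        return scaled_min
    # Before the window, and once it ends, fall back to the normal value.
    return normal_min
```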

  • 13

    The integration test for the controller and webhook

    Currently, most of the large functions have enough unit tests, but we don't have integration tests for the controller package.

    This issue is about integration tests (not e2e tests). We don't need to spin up clusters (kind, minikube, etc.), since waiting for them to start is too tedious. Let's just use envtest: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest

    • [x] test for the reconcile with the tortoise TortoisePhaseWorking https://github.com/mercari/tortoise/pull/22
    • [ ] test for the reconcile with the tortoise TortoisePhaseEmergency
    • [ ] test for the webhook