tortoise
Tortoise, they are living in the Kubernetes cluster.
Tortoise, you need to feed only very few parameters to them.
Tortoise, they will soon start to eat historical usage data of Pods.
Tortoise, once you start to live with them, you no longer need to configure autoscaling by yourself.
Install
Tortoise, you cannot get it from the breeder.
Tortoise, you need to get it from GitHub instead.
# Install CRDs into the K8s cluster specified in ~/.kube/config.
make install
# Deploy controller to the K8s cluster specified in ~/.kube/config.
make deploy
Tortoise, you don't need a rearing cage, but need VPA in your Kubernetes cluster before installing it.
Documentation
- Concept: gives a brief overview of Tortoise.
- Horizontal scaling: describes how Tortoise performs horizontal autoscaling.
- Vertical scaling: describes how Tortoise performs vertical autoscaling.
- The emergency mode: describes the emergency mode.
- Flag configurations for admin: describes how the cluster admin can configure the global behavior via flags.
- Technical details: describes the technical details of Tortoise (mostly for contributors).
API definition
Contribution
Before implementing any feature changes as Pull Requests, please raise an Issue and discuss your proposal with the maintainers.
Also, please read the CLA carefully before submitting your contribution to Mercari. Under any circumstances, by submitting your contribution, you are deemed to accept and agree to be bound by the terms and conditions of the CLA.
LICENSE
Copyright 2023 Mercari, Inc.
Licensed under the MIT License.
enable the container registry via GitHub Packages
We need to push the image somewhere so that we can pull it in the cluster: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry
update tortoise status before updating HPA and VPA
What this PR does / why we need it:
We need to update the tortoise status before updating HPA and VPA. If the status update fails after HPA/VPA have already been changed, the recommendation recorded on the tortoise no longer matches the actual parameters on HPA/VPA; updating the status first prevents that difference.
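The ordering can be illustrated with a minimal Go sketch. The `Client` interface, `Recommendation` type, and `fakeClient` are hypothetical stand-ins invented for this illustration; the real controller talks to the Kubernetes API and its types will differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Recommendation is a hypothetical stand-in for the values tortoise computes.
type Recommendation struct {
	MinReplicas int32
	MaxReplicas int32
}

// Client is a hypothetical interface used only for this sketch.
type Client interface {
	UpdateTortoiseStatus(r Recommendation) error
	UpdateHPA(r Recommendation) error
	UpdateVPA(r Recommendation) error
}

// apply records the recommendation on the tortoise status first and only
// then touches HPA and VPA, so a failed status update leaves them unchanged.
func apply(c Client, r Recommendation) error {
	if err := c.UpdateTortoiseStatus(r); err != nil {
		return fmt.Errorf("update tortoise status: %w", err)
	}
	if err := c.UpdateHPA(r); err != nil {
		return fmt.Errorf("update HPA: %w", err)
	}
	if err := c.UpdateVPA(r); err != nil {
		return fmt.Errorf("update VPA: %w", err)
	}
	return nil
}

// fakeClient records the calls it receives and can fail the status update.
type fakeClient struct {
	failStatus bool
	calls      []string
}

func (f *fakeClient) UpdateTortoiseStatus(Recommendation) error {
	f.calls = append(f.calls, "status")
	if f.failStatus {
		return errors.New("conflict")
	}
	return nil
}

func (f *fakeClient) UpdateHPA(Recommendation) error {
	f.calls = append(f.calls, "hpa")
	return nil
}

func (f *fakeClient) UpdateVPA(Recommendation) error {
	f.calls = append(f.calls, "vpa")
	return nil
}

func main() {
	c := &fakeClient{failStatus: true}
	err := apply(c, Recommendation{MinReplicas: 3, MaxReplicas: 10})
	// With a failing status update, only the status call is attempted.
	fmt.Println(err != nil, c.calls)
}
```

When the status update fails, `apply` returns before HPA or VPA is modified, which is the property this PR is after.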
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
handle emergency tortoises that the controller has not handled yet as soon as possible
What this PR does / why we need it:
When starting reconciliation for a tortoise, the controller checks when that tortoise was last handled and determines whether to reconcile it now.
As an exception, emergency tortoises are handled immediately because of the emergency situation. So far, all .spec.updateMode == Emergency tortoises have been handled without checking the last update time. This PR improves that logic: for emergency tortoises, we need to focus on those the controller has not handled yet, and we don't need to rush reconciliation of emergency tortoises that the controller has already handled (minReplicas increased).
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
add rbac kubebuilder comment to generate rbac for deployment, hpa, and vpa
What this PR does / why we need it:
add rbac kubebuilder comment to generate rbac for deployment, hpa, and vpa
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
fix mutating webhook to initialize all containers' autoscaling policies
What this PR does / why we need it:
fix mutating webhook to initialize all containers' autoscaling policies.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
implement mutating webhook for HPA
What this PR does / why we need it:
implement mutating webhook for HPA. HPA may be updated by users, and when they apply a new change to the HPA, we need to overwrite it with the current recommendation values from tortoise.
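A rough Go sketch of the mutation step. The simplified `HPA` and `Recommendation` structs and the `mutateHPA` function are hypothetical; a real webhook would operate on the Kubernetes HorizontalPodAutoscaler type, and which fields tortoise owns (here: min/max replicas and per-container target utilization) is an assumption made for this illustration.

```go
package main

import "fmt"

// HPA is a simplified, hypothetical view of the object the user applies.
type HPA struct {
	Name              string
	MinReplicas       int32
	MaxReplicas       int32
	TargetUtilization map[string]int32 // container name -> target utilization (%)
}

// Recommendation is a hypothetical view of tortoise's current recommendation.
type Recommendation struct {
	MinReplicas       int32
	MaxReplicas       int32
	TargetUtilization map[string]int32
}

// mutateHPA returns a copy of the incoming HPA in which the fields assumed to
// be managed by tortoise are replaced with the recommended values. Fields the
// recommendation does not cover are kept as the user applied them.
func mutateHPA(incoming HPA, rec Recommendation) HPA {
	out := incoming
	out.MinReplicas = rec.MinReplicas
	out.MaxReplicas = rec.MaxReplicas

	// Copy the map so the caller's object is not modified.
	out.TargetUtilization = make(map[string]int32, len(incoming.TargetUtilization))
	for name, v := range incoming.TargetUtilization {
		out.TargetUtilization[name] = v
	}
	for name, v := range rec.TargetUtilization {
		out.TargetUtilization[name] = v
	}
	return out
}

func main() {
	applied := HPA{
		Name:              "app",
		MinReplicas:       1,
		MaxReplicas:       5,
		TargetUtilization: map[string]int32{"app": 50},
	}
	rec := Recommendation{
		MinReplicas:       3,
		MaxReplicas:       20,
		TargetUtilization: map[string]int32{"app": 75},
	}
	fmt.Printf("%+v\n", mutateHPA(applied, rec))
}
```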
Which issue(s) this PR fixes:
Fixes #6
Special notes for your reviewer:
implement container resizing for multi-container Pods for Horizontal
What this PR does / why we need it:
implement container resizing for multi-container Pods for Horizontal.
Which issue(s) this PR fixes:
Fixes #24
Special notes for your reviewer:
add the documentation to describe how to create a new release
What this PR does / why we need it:
add the documentation to describe how to create a new release
Which issue(s) this PR fixes:
Fixes #3
Special notes for your reviewer:
enable the container registry via GitHub Packages
What this PR does / why we need it:
enable the container registry via GitHub Packages
Which issue(s) this PR fixes:
Fixes #4
Special notes for your reviewer:
VPAs and HPA created by the controller should be deleted after tortoise gets deleted
https://github.com/mercari/tortoise/pull/57/files#r1162381665
If .spec.targetRefs.HorizontalPodAutoscalerName is not nil, we shouldn't delete the HPA, as it was created by users.
add EmergencyPhase test case
What this PR does / why we need it:
Test case for Emergency mode. The difference here is that minReplicas == maxReplicas and TargetUtilization is 90 (same as the default).
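The expectation can be sketched in Go as follows. `HPASpec` and `toEmergency` are hypothetical names for this illustration, and the sketch assumes, based on the description above, that emergency mode raises minReplicas to maxReplicas while leaving the target utilization as it was.

```go
package main

import "fmt"

// HPASpec is a simplified, hypothetical view of the fields the test compares.
type HPASpec struct {
	MinReplicas       int32
	MaxReplicas       int32
	TargetUtilization int32
}

// toEmergency returns the spec expected in emergency mode under the
// assumption stated above: minReplicas is raised to maxReplicas and the
// target utilization is left unchanged.
func toEmergency(s HPASpec) HPASpec {
	s.MinReplicas = s.MaxReplicas
	return s
}

func main() {
	normal := HPASpec{MinReplicas: 3, MaxReplicas: 20, TargetUtilization: 90}
	fmt.Printf("%+v\n", toEmergency(normal))
}
```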
Which issue(s) this PR fixes:
Fixes # https://github.com/mercari/tortoise/issues/2
Special notes for your reviewer:
The feature: scheduled scaling up
Sometimes we can predict an increase in resource consumption before it actually happens (e.g., TV, push notifications in the app). This feature allows people to schedule scaling up ahead of time. They configure when to scale up and for how long, so that it returns to normal afterward.
The integration test for the controller and webhook
Currently, the large functions mostly have enough unit tests, but we don't have integration tests for the controller package.
This issue is about integration tests (not e2e tests). We don't need to spin up clusters (kind, minikube, etc.), as waiting for them to start is too tedious. Let's just use envtest. https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest
- TortoisePhaseWorking https://github.com/mercari/tortoise/pull/22
- TortoisePhaseEmergency