Here is how you can partially reconstruct the deleted manifests so that your provider keeps monitoring them and withdrawing from their leases, and so that users can re-send their deployment manifest (without redeploying) to restore it and their ingresses (URIs). This avoids a Pod restart and hence keeps the ephemeral data on the Pod.
You can also reconstruct the manifests even when the deployments are gone (but the bid/lease is still active/open); this requires the tenants to re-submit their manifests.
This reconstruction is based on the existing namespaces; there is no way to entirely reconstruct the manifests unless they were manually backed up.
NOTE: look at the comment section if you do not even have the `ns` (namespace) and still want to recover the active/open bid/lease.
MANIFEST_TEMPLATE='{"apiVersion":"akash.network/v2beta2","kind":"Manifest","metadata":{"generation":2,"labels":{"akash.network":"true","akash.network/lease.id.dseq":"$dseq","akash.network/lease.id.gseq":"$gseq","akash.network/lease.id.oseq":"$oseq","akash.network/lease.id.owner":"$owner","akash.network/lease.id.provider":"$provider","akash.network/namespace":"$ns"},"name":"$ns","namespace":"lease"},"spec":{"lease_id":{"dseq":"$dseq","gseq":$gseq,"oseq":$oseq,"owner":"$owner","provider":"$provider"}}}'
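Optionally, before creating anything, you can render the template once with placeholder values to check that the substitution produces valid JSON. This is only a hedged sketch; the owner/provider addresses, sequence numbers and namespace below are made-up placeholders.
# Dry-run: render the Manifest template with placeholder values and pretty-print it
echo "$MANIFEST_TEMPLATE" \
  | owner=akash1tenantexampleaddr dseq=1234567 gseq=1 oseq=1 \
    provider=akash1providerexampleaddr ns=examplens envsubst \
  | jq .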
# For every remaining namespace labeled akash.network=true, extract the lease ID
# from the namespace labels and recreate the corresponding Manifest resource.
kubectl get ns -A -l akash.network=true -o json \
| jq --arg lid 'akash.network/lease.id' -r '.items[].metadata.labels | select(.[$lid+".dseq"] != null) | [.[$lid+".owner", $lid+".dseq", $lid+".gseq", $lid+".oseq", $lid+".provider"]] | @tsv' \
| while read -r owner dseq gseq oseq provider; do
  # derive the lease namespace name, then render the template and create the Manifest
  ns=$(akash provider show-cluster-ns --owner $owner --dseq $dseq --gseq $gseq --oseq $oseq --provider $provider)
  echo "$MANIFEST_TEMPLATE" | owner=$owner dseq=$dseq gseq=$gseq oseq=$oseq provider=$provider ns=$ns envsubst | kubectl create -f -
done
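To confirm the reconstruction worked, you can list the recreated Manifest resources. This assumes the provider's CRDs are installed as usual and the Manifest objects live in the lease namespace, as in the template above; the exact resource name can differ between provider versions.
# List the reconstructed Manifest resources
kubectl -n lease get manifests.akash.network

# Spot-check that the lease IDs look right
kubectl -n lease get manifests.akash.network -o custom-columns=NAME:.metadata.name,OWNER:.spec.lease_id.owner,DSEQ:.spec.lease_id.dseq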
kubectl -n akash-services delete pods -l app=akash-provider
Now akash-provider should keep track of the partially reconstructed manifests, withdraw from the leases, check deployment status, etc.
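To verify this after the restart, you can watch the provider pod come back and skim its logs; the namespace and label selector below match the delete command above, and the grep pattern is only an illustrative filter.
# Wait for the akash-provider pod to come back up
kubectl -n akash-services get pods -l app=akash-provider -w

# Skim the logs for manifest / withdrawal activity
kubectl -n akash-services logs -l app=akash-provider --tail=200 | grep -Ei 'manifest|withdraw'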
The drawbacks
- The provider will report this kind of error on start, only once. It is related to the reconstructed deployments.
E[2022-06-01|12:16:16.242] cleaning stale resources module=provider-cluster-kube err="values: Invalid value: []string{}: for 'in', 'notin' operators, values set can't be empty"
- Clients will have to re-send their deployment manifest once again should they want to regain the ability to `akash provider lease-shell` into their deployment, as well as to restore the nginx ingresses (URIs) to their deployments. `akash provider send-manifest` will do; see the example below. Also, the `cleaning stale resources` error seen above will then disappear.
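As a hedged illustration of that tenant-side step (not the exact commands from this write-up), re-sending the manifest could look like this; deploy.yaml stands for the tenant's original SDL, the addresses and sequence numbers are placeholders, and flag names may vary slightly between akash client versions.
# Re-send the original deployment manifest to the provider (tenant side)
akash provider send-manifest deploy.yaml \
  --dseq 1234567 \
  --provider akash1providerexampleaddr \
  --from tenant-wallet
# After this, `akash provider lease-shell` and the ingresses (URIs) should work again.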
Reference
post-mainnet6, bid/lease based recovery
Two leases disappeared because of this error:
This left none of the namespaces that the pre-mainnet6 recovery above relies on (`kubectl get ns ...`). And yet, it is possible to recover the deployments when the bids/leases/deployments still have active/open status.
The only caveat: the client has to re-submit the manifests.
The reason one would want to recover is usually to keep the hostnames (URIs).
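Before attempting this, it is worth confirming on chain that the deployment and its lease really are still active/open. A hedged sketch of such a check, with a placeholder owner address and dseq (flag names may differ slightly between client versions):
# Check that the lease is still active
akash query market lease list \
  --owner akash1tenantexampleaddr \
  --dseq 1234567 \
  --state active

# Check that the deployment itself is still active
akash query deployment get \
  --owner akash1tenantexampleaddr \
  --dseq 1234567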
Recovering
1) prepare manifest templates
The number of services has to match, otherwise you will get this kind of error:
2) scale down Akash Provider (a consolidated, hedged sketch of steps 2-6 follows this list)
3) reconstruct the missing manifests, namespace and basic (skeleton) deployment
4) scale Akash Provider back up
5) re-send the manifests
6) remove the old "fake" (/tetris) ones:
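A minimal, hedged sketch of what steps 2)-6) could look like for a single lease. It assumes the provider runs as a Kubernetes deployment named akash-provider in the akash-services namespace (adjust if yours is a StatefulSet), reuses the MANIFEST_TEMPLATE defined earlier, and uses made-up addresses and sequence numbers, a placeholder service name web, and a generic image in place of the "/tetris" one mentioned above.
# 2) scale down the Akash Provider
kubectl -n akash-services scale deployment/akash-provider --replicas=0

# 3) reconstruct the missing pieces for one lease (placeholder values throughout)
owner=akash1tenantexampleaddr
provider=akash1providerexampleaddr
dseq=1234567 gseq=1 oseq=1
ns=$(akash provider show-cluster-ns --owner $owner --dseq $dseq --gseq $gseq --oseq $oseq --provider $provider)

# recreate the lease namespace with the labels the provider (and the jq query above) expects
kubectl create ns "$ns"
kubectl label ns "$ns" akash.network=true akash.network/namespace="$ns" \
  akash.network/lease.id.owner="$owner" akash.network/lease.id.dseq="$dseq" \
  akash.network/lease.id.gseq="$gseq" akash.network/lease.id.oseq="$oseq" \
  akash.network/lease.id.provider="$provider"

# recreate the Manifest resource from the template defined earlier
echo "$MANIFEST_TEMPLATE" | owner=$owner dseq=$dseq gseq=$gseq oseq=$oseq provider=$provider ns=$ns envsubst | kubectl create -f -

# create a basic (skeleton) deployment per service so the service count matches the manifest
kubectl -n "$ns" create deployment web --image=nginx

# 4) scale the Akash Provider back up
kubectl -n akash-services scale deployment/akash-provider --replicas=1

# 5) the tenant re-sends the original manifest (see the send-manifest example earlier)

# 6) remove the old "fake" (skeleton) deployments once the real manifest has been re-sent
kubectl -n "$ns" delete deployment web
The skeleton deployment only exists so that the number of services matches until the tenant re-sends the real manifest, which is why it is removed again in step 6).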