Zheng Zeng zheng022

func (a *UpgradeGatewayAPIV1) runCluster() http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		clientNodes, _ := connector.AgentClients(r.Context())
		
		// split the nodes into the cluster delegate and the non-delegate nodes
		var clusterDelegateNode *catalog.Agent
		var nonDelegateNodes []*catalog.Agent
		for _, node := range clientNodes {

customer instance

https://github.com/github/ghes/issues/16434
bundle: https://esbtools-staff.githubapp.com/bundles/191214 (no increase on resource)
69,336 active series with a total of 211.8 million datapoints across the ~48-hour window
⚠️ observability.metrics.prometheus-endpoint-enabled true
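
A quick sanity check on the figures above, as a sketch: dividing the datapoints by the active series gives the average samples per series, and dividing the window by that gives the implied spacing between samples. The helper name `series_stats` is made up for this note, the window is taken as exactly 48 hours even though the note says "~48", and the result only holds if samples are spread evenly across series.

```shell
#!/bin/bash
# Hypothetical helper for a back-of-the-envelope check; not part of any GHES tool.
# Figures are copied from the note above; the 48-hour window is approximate.
series_stats() {
  awk -v series=69336 -v points=211800000 -v window=$((48 * 3600)) 'BEGIN {
    per_series = points / series            # average samples per series
    printf "samples per series: %.0f\n", per_series
    printf "seconds per sample: %.1f\n", window / per_series
  }'
}

series_stats
```

That works out to roughly 3,055 samples per series, or about one sample per minute for each series.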

Factor	Customer
Active series	69,336

🚀 Improving the GHES Manage Developer Experience — What We Did and What's Changed

You've told us this was hard!!
Building new APIs in GHES Manage has too often felt like a challenging expedition: confusing paths, hidden "gotchas," and the occasional moment where you're just… banging your head against the desk wondering why something that should be straightforward is taking so long.

GHES Manage was created three years ago as a modern alternative to the legacy Enterprise Management Console, but adoption hasn't met our expectations. We've heard consistent feedback that implementing new API endpoints can be challenging and unintuitive.

So we've been focused on improving the GHES Manage API.

The following snippet copies a script from enterprise2 to a ghebooted instance:

#!/bin/bash
#/ Usage: ./copy.sh <ghe-boot-host-name>
TARGET_HOST="${1}"
TARGET_USER="admin"
TARGET_PORT="122"

set -e

What Operators do

Operators manage and configure instances with the UI
Operators build automation with the API
Operators fight fires with SSH and bash scripts

What the purpose of an API/CLI can be

Be part of the UI architecture and support a robust UI solution
Help operators build reliable and efficient automation

Availability Review Retrospective

We initiated a systematic availability review process following our July 2024 offsite (see Revival of the GHES Availability Review Process). The first availability issue was then created on August 16th, marking almost a year since our previous review.

Our journey began by exploring what availability truly means for GHES. We recognized that an escalation's value extends beyond mere resolution - we aimed to foster deeper discussions, prevent recurrence through measured repair items, and share knowledge via comprehensive runbooks.

Over the past 6 months, we've made significant strides in our availability review processes:

Created 29 availability issues
Generated 29 repair items, with 22 successfully resolved
Conducted 9 availability review meetings

GitHub Enterprise Server (GHES)

You are an enterprise engineer!

Introduction

Because GitHub Enterprise Server (GHES) drives our revenue and supports our largest and most recognizable clients, every engineer at GitHub, including yourself, is an enterprise engineer! This lab is an opportunity to practice a few of the concepts you'll need to test code that you write in a GHES environment.

As a prerequisite to this lab, you should watch each part of the Engineering for Enterprise Lecture (TODO). The lecture provides an overview of the tools and concepts that we will be practicing during this self-directed exercise. After watching the lecture, you should be familiar with the key concepts required to complete this lab:

In the past two weeks, we've held two Availability Review meetings featuring excellent presenters. These meetings facilitated fruitful discussions on how we can reflect on and learn from customer incidents. (In case you missed any, you can find the recordings for 08-27 and 09-04) 🤗

To enhance the efficiency of our AR meetings, here's a guide on the current Availability Review process and how AR issues should be completed. We're also integrating GHES-specific requirements into overall GitHub automations. Future improvements are expected to ease and eliminate more manual steps.😌

🐾 Process at a glance:

#	Step	Info
1	Availability Review created at end of GHES SEV 1	this will be automated in future

	#!/bin/bash
	#/ Usage: ghe-storage-verify-mysql-migration <backup-path> <destination-path>
	#/ Verifies that two MySQL datadir trees are identical by comparing md5sums
	#/ of every file beneath each path.
	#/
	#/ Example (after ghe-storage-migrate-mysql):
	#/ ghe-storage-verify-mysql-migration /data/user/mysql-backup/github_enterprise /data/multi-disk/db/mysql/github_enterprise

	set -e
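
The comparison described in the usage header could be sketched as below. This is not the actual body of `ghe-storage-verify-mysql-migration`; the helper names `tree_md5sums` and `verify_trees` are hypothetical, and the sketch assumes GNU `find`, `sort`, `xargs` and `md5sum` are available.

```shell
#!/bin/bash
# Sketch only: tree_md5sums and verify_trees are hypothetical helper names.
# Assumes GNU find, sort, xargs and md5sum.

# Print "<md5>  ./relative/path" for every regular file under $1, sorted by path.
tree_md5sums() {
  (
    cd "$1" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum
  )
}

# Return 0 when both trees contain the same files with the same checksums.
verify_trees() {
  if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "ERROR: both arguments must be directories" >&2
    return 2
  fi
  if [ "$(tree_md5sums "$1")" = "$(tree_md5sums "$2")" ]; then
    echo "OK: trees are identical"
  else
    echo "MISMATCH: trees differ" >&2
    return 1
  fi
}
```

With the paths from the example above, this would be called as `verify_trees /data/user/mysql-backup/github_enterprise /data/multi-disk/db/mysql/github_enterprise`. Checksums are computed on paths relative to each root, so the two trees can live in different places. Only regular files are compared, so empty directories are ignored.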

	{
	"incidentStatusedTime": "2024-10-07T11:30",
	// "resolutionTime": "2024-10-31T04:50", // Dotcom specific
	// "visibility": "public", // Dotcom specific
	// "mostSignificantServiceStatus": "red", // Dotcom specific
	// "impactedServices": [], // Dotcom specific
	"resolvingIncidentCommander": "hubot",
	// "incidentUrl": "https://status-staging.githubapp.com/incidents/27863", // Dotcom specific
	// "impactStartTime": "2024-10-31T03:40", // Dotcom specific
	// "impactDetectionTime": "2024-10-31T03:40", // Dotcom specific