- The speaker, Junji, introduces himself as a developer who writes code and blogs, with an interest in Cloud native technologies, and explains that he will be giving a talk on behalf of himself and his co-speaker about building an edge device to monitor atmospheric conditions (00:05:49).
- The project was initiated by the National Space Research and Development Agency (NASRDA) in Nigeria, which aimed to foster interest in space research in universities, and Junji's school was selected to participate (00:06:54).
- The goal of the project was to detect and visualize air quality problems in the school, which was experiencing increased diesel consumption and degrading air quality due to the expansion of the school (00:07:21).
- The team built a prototype using an ESP32, temperature sensors, humidity sensors, and a carbon monoxide and dioxide sensor, and used MQTT to publish sensor data to a Raspberry Pi, which was then visualized using Prometheus and Grafana (00:08:27).
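  For context, a minimal sketch of the publish side of that pipeline, assuming a hypothetical `sensors/air-quality` topic and an MQTT broker running on the Raspberry Pi (the real firmware runs on the ESP32, typically in C or MicroPython; this Go version only illustrates the data flow):

  ```go
  package main

  import (
  	"fmt"
  	"time"

  	mqtt "github.com/eclipse/paho.mqtt.golang"
  )

  func main() {
  	// Hypothetical broker address: the Raspberry Pi that aggregates readings.
  	opts := mqtt.NewClientOptions().AddBroker("tcp://raspberrypi.local:1883")
  	client := mqtt.NewClient(opts)
  	if tok := client.Connect(); tok.Wait() && tok.Error() != nil {
  		panic(tok.Error())
  	}

  	for {
  		// On the real device these values come from the temperature,
  		// humidity, and CO/CO2 sensors attached to the ESP32.
  		payload := fmt.Sprintf(`{"temp_c": %.1f, "humidity": %.1f, "co2_ppm": %d}`, 24.5, 61.0, 450)
  		client.Publish("sensors/air-quality", 0, false, payload)
  		time.Sleep(30 * time.Second)
  	}
  }
  ```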
- The results of the project included the ability to detect CO2 levels, visualize air quality data, and display safety values for the air quality, with plans to deploy the device to various locations in the school (00:09:50).
- Junji notes that hardware is hard, and that networking for IoT devices can be challenging, but that the project was a valuable learning experience, and that the community aspect of Cloud native technologies is important (00:10:31).
- Migrating a distributed service to a cloud-native infrastructure can be daunting, but having a measured risk methodology can help make the process less intimidating (00:41:48).
- In cloud-native environments, services often depend on other services, and debugging can be challenging, especially with serverless services (00:42:35).
- A methodology for migration involves charting the course, setting up an A/B experiment or traffic redirection, identifying a rollout plan, and designing a compass to measure progress (00:43:17).
- The compass should include metrics such as fallback rates, infrastructure health, and logical checks to ensure resources are healthy and functioning correctly (00:47:53).
- Logical checks can include verifying DNS settings, network connections, and other use-case-specific requirements to ensure resources are healthy (00:49:27).
- Using scatter charts and zooming out can help identify issues and patterns in resource health and performance (00:51:46).
- Noisy neighbor patterns can cause issues in services, and having a compass to monitor and debug these issues is helpful, with local health probes being more efficient than relying on the cloud's control plane (00:52:33).
- Service limits and rate limiting should be considered when designing services to avoid overwhelming other services with requests, and retries should be implemented carefully to avoid retry storms (00:54:42).
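  A minimal sketch of the retry discipline this implies: capped exponential backoff with full jitter, so a fleet of clients doesn't retry in lockstep against an already-struggling dependency (all names here are illustrative):

  ```go
  package main

  import (
  	"errors"
  	"fmt"
  	"math/rand"
  	"time"
  )

  // callWithBackoff retries fn with capped exponential backoff and full jitter.
  // Bounding attempts and randomizing sleeps keeps synchronized clients from
  // turning transient failure into a retry storm.
  func callWithBackoff(fn func() error, maxAttempts int) error {
  	base := 100 * time.Millisecond
  	maxBackoff := 5 * time.Second
  	for attempt := 0; attempt < maxAttempts; attempt++ {
  		if err := fn(); err == nil {
  			return nil
  		}
  		backoff := base << attempt
  		if backoff > maxBackoff {
  			backoff = maxBackoff
  		}
  		// Full jitter: sleep a uniformly random duration in [0, backoff).
  		time.Sleep(time.Duration(rand.Int63n(int64(backoff))))
  	}
  	return errors.New("all retry attempts failed")
  }

  func main() {
  	err := callWithBackoff(func() error { return errors.New("upstream unavailable") }, 4)
  	fmt.Println(err) // after maxAttempts failures, give up rather than pile on
  }
  ```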
- When migrating services, it's essential to monitor scale limits, ensure that all production traffic is accounted for, and use a framework to tune and debug the migration process (00:55:55).
- A successful migration can be achieved by using an A/B experiment, being mindful of scale limits, and having a dashboard to act as a compass to monitor the migration process (00:58:30).
- The main issue with full service meshes like Linkerd and Istio is the complexity and scale of the resources required, which can be overwhelming for solving simple problems (01:16:57).
- The concept of a "half mesh" is introduced, which involves creating micro meshes that solve specific problems using only a sidecar and a control plane (01:17:56).
- The half mesh solutions will be demonstrated using two different approaches: one focusing on HTTP with a sidecar and the other using gRPC with a control plane (01:18:28).
- The demonstrations will utilize xDS (the "x discovery service" family of APIs), a standardized configuration protocol for service meshes, which has four key components: listeners, routes, clusters, and endpoints (01:21:42).
- The first half mesh demonstration will involve building a sidecar-only mesh using xDS files and a Kubernetes ConfigMap to configure the sidecar (01:22:27).
- A Kubernetes ConfigMap is used to configure the Envoy sidecar, which handles service discovery and load balancing, and the configuration follows the xDS specification (01:24:41).
- The Envoy configuration is defined in a YAML file, which includes the cluster and listener configurations, and is used to route traffic to the correct service (01:24:55).
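  A minimal sketch of what such a ConfigMap can look like, assuming a hypothetical `backend` Service on port 8080; the structure follows Envoy's v3 config schema, though the talk's actual file will differ:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: envoy-sidecar-config
  data:
    envoy.yaml: |
      static_resources:
        listeners:
        - name: ingress
          address:
            socket_address: { address: 0.0.0.0, port_value: 10000 }
          filter_chains:
          - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  virtual_hosts:
                  - name: backend
                    domains: ["*"]
                    routes:
                    - match: { prefix: "/" }
                      route: { cluster: backend }
                http_filters:
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        clusters:
        - name: backend
          type: STRICT_DNS        # resolve the Service DNS name and round-robin across it
          load_assignment:
            cluster_name: backend
            endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: backend.default.svc.cluster.local, port_value: 8080 }
  ```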
- The solution uses a single ConfigMap and a sidecar to solve the service discovery and load balancing problem, and can be easily extended to support multiple services (01:26:16).
- The ConfigMap is used to configure the Envoy sidecar, and changes to the ConfigMap are automatically detected and applied by the sidecar, allowing for dynamic reconfiguration (01:30:11).
- The solution is demonstrated on a Kubernetes cluster, showing how the Envoy sidecar can load balance traffic between multiple services (01:27:59).
- The second half mesh solution focuses on a control plane and gRPC, and uses a similar approach to the first half mesh solution, but without a sidecar (01:32:05).
- The approach involves creating smart clients that can talk to a control plane, eliminating the need for sidecars, and leveraging the Envoy team's work to build a custom control plane in about 13 minutes (01:32:39).
- The control plane is built using the go-control-plane package provided by the Envoy team, which allows for efficient configuration of xDS and scaling (01:33:30).
- The control plane watches Kubernetes services and endpoints, keeping an in-memory index of services and endpoints, and updates the cache whenever changes occur (01:34:24).
- The cache is used to generate xDS resources, which are then used to configure clients, allowing them to get updates instantly (01:36:28).
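  A compressed sketch of that pattern using the go-control-plane snapshot cache (API details shift between versions of the package, and the cluster and node names here are hypothetical):

  ```go
  package main

  import (
  	"context"
  	"log"
  	"net"

  	clusterv3 "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
  	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
  	"github.com/envoyproxy/go-control-plane/pkg/cache/types"
  	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
  	resourcev3 "github.com/envoyproxy/go-control-plane/pkg/resource/v3"
  	serverv3 "github.com/envoyproxy/go-control-plane/pkg/server/v3"
  	"google.golang.org/grpc"
  )

  func main() {
  	ctx := context.Background()

  	// The snapshot cache is the in-memory index the talk describes: each
  	// update from the Kubernetes watch becomes a new versioned snapshot.
  	snapshotCache := cachev3.NewSnapshotCache(false, cachev3.IDHash{}, nil)

  	snap, err := cachev3.NewSnapshot("v1", map[resourcev3.Type][]types.Resource{
  		// Endpoints and listeners would be generated the same way, from
  		// the watched Services and Endpoints objects.
  		resourcev3.ClusterType: {&clusterv3.Cluster{Name: "hello-service"}},
  	})
  	if err != nil {
  		log.Fatal(err)
  	}
  	if err := snapshotCache.SetSnapshot(ctx, "demo-node", snap); err != nil {
  		log.Fatal(err)
  	}

  	// Serve the aggregated xDS (ADS) gRPC API on port 18000 for clients.
  	srv := serverv3.NewServer(ctx, snapshotCache, nil)
  	grpcServer := grpc.NewServer()
  	discoveryv3.RegisterAggregatedDiscoveryServiceServer(grpcServer, srv)
  	lis, err := net.Listen("tcp", ":18000")
  	if err != nil {
  		log.Fatal(err)
  	}
  	log.Fatal(grpcServer.Serve(lis))
  }
  ```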
- The control plane is set up using a basic Kubernetes deployment, running the control plane code and listening on port 18000 for clients (01:39:47).
- Clients are run in gRPC mode and given a bootstrap environment variable, allowing them to connect to the control plane and receive updates (01:39:55).
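  On the client side, gRPC's built-in xDS support is enabled by importing the xds package and pointing the `GRPC_XDS_BOOTSTRAP` environment variable at a bootstrap JSON file that names the control plane; a sketch with a hypothetical service name:

  ```go
  package main

  import (
  	"fmt"
  	"log"

  	"google.golang.org/grpc"
  	"google.golang.org/grpc/credentials/insecure"
  	_ "google.golang.org/grpc/xds" // registers the xds:/// resolver
  )

  func main() {
  	// GRPC_XDS_BOOTSTRAP points at a JSON file roughly like:
  	//   {"xds_servers": [{"server_uri": "control-plane:18000",
  	//                     "channel_creds": [{"type": "insecure"}]}],
  	//    "node": {"id": "demo-node"}}
  	conn, err := grpc.Dial("xds:///hello-service",
  		grpc.WithTransportCredentials(insecure.NewCredentials()))
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer conn.Close()
  	fmt.Println("endpoints for hello-service now come from the control plane")
  }
  ```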
- A demo shows a control plane with less than 400 lines of code, in which a Kubernetes Service is used to discover the required endpoints, and the control plane runs against the Kubernetes API (01:40:03).
- The demo shows how to create a new cluster, run the control plane, clients, and servers, and how to update the Kubernetes service to change the behavior of the clients (01:40:27).
- The control plane is built in two parts, and the xDS config doesn't care where the data comes from, as long as the metadata is provided (01:41:16).
- The demo also shows how to use a ConfigMap to update the behavior of the clients, and how to use Kubernetes resources to manage the mesh (01:42:30).
- The speaker mentions that the control plane can be used with HTTP services, and that it's possible to configure your own sidecars to work with the control plane (01:45:22).
- The speaker also compares their approach to other mesh solutions like Linkerd, and mentions that their approach is more lightweight and easier to understand (01:46:42).
- The speakers introduce themselves as Taylor and Mihai Maruseac, a co-founder of a supply chain security company and a member of Google's open-source software security team, respectively (02:05:28).
- The main goal is to tie an AI model to how it was built in a tamper-proof way, using attestations, to prevent malicious activities (02:07:25).
- The development of AI models involves several steps, including data set preparation, model framework selection, and training, each with potential supply chain risks (02:07:54).
- To protect the supply chain, the speakers propose signing and verifying artifacts, and using SLSA attestations to record information about the build process (02:09:54).
- SLSA attestations can be used to capture information about AI model dependencies, including data sets, training frameworks, and iterations (02:10:57).
- The speakers aim to create a complete SLSA attestation that includes all necessary information about the AI model build process (02:11:55).
- They propose combining SLSA attestations with other information, such as data sets and licenses, to create a comprehensive AI bill of materials (02:12:47).
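  For flavor, the skeleton of a SLSA v1 provenance attestation (an in-toto statement); digests and URIs are placeholders, and an ML-specific attestation would record data sets and training frameworks among the resolved dependencies:

  ```json
  {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
      { "name": "model.safetensors", "digest": { "sha256": "<model-digest>" } }
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
      "buildDefinition": {
        "buildType": "https://example.com/ml-training/v1",
        "externalParameters": { "epochs": 3, "baseModel": "bert-base" },
        "resolvedDependencies": [
          { "uri": "https://example.com/datasets/train.csv", "digest": { "sha256": "<dataset-digest>" } },
          { "uri": "pkg:pypi/torch@2.1.0" }
        ]
      },
      "runDetails": {
        "builder": { "id": "https://example.com/trainer" }
      }
    }
  }
  ```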
- Traditional software supply chains have a wide but shallow dependency tree, whereas machine learning (ML) supply chains have a narrow but deep dependency tree with multiple iterations (02:14:19).
- The ML supply chain involves starting with one model, having a few dependencies, and iterating through multiple steps, including data operations and model training (02:15:42).
- To analyze the entire graph of dependencies, tooling such as SLSA, Sigstore, and GUAC can be used to record and query metadata (02:16:02).
- GUAC is an open-source project that ingests software metadata, including SLSA attestations, and puts it into a graph database for querying and analysis (02:17:11).
- GUAC can be used to make policy decisions, gain insights, and automate policies around the metadata (02:17:00).
- A demo of GUAC shows how it can be used to query the dependencies of a PyTorch model and visualize the entire graph of dependencies (02:19:53).
- A graph database is used to store information about models, including their dependencies and data sets, allowing for easy querying and visualization of the data (02:22:09).
- The system can detect which models are impacted if a data set is poisoned or becomes illegal to use, and can certify a data set as bad to prevent its use in model training (02:24:04).
- The system can also integrate with vulnerability scans to automatically detect vulnerabilities in ML models and data sets, although currently, marking a data set as bad is a manual process (02:29:14).
- The system uses hashes to identify data sets, and can still detect impacted models even if the hash covers a larger chunk of the data set (02:29:52).
- The system can be used to create policies and attestations around model training, and can integrate with tools like SLSA and GUAC to provide a complete understanding of how a model was built (02:27:03).
- Kaito is the Kubernetes AI Toolchain Operator, which allows users to run AI workloads on Kubernetes efficiently, with a focus on large language models (LLMs). (02:39:57)
- The benefits of running LLMs directly on Kubernetes include control over data and security, localization of data, observability, customizability, and community support. (02:43:00)
- Kaito streamlines the workflow by taking care of tasks such as provisioning GPUs, installing NVIDIA drivers, and setting up models, making it easier to run AI workloads on Kubernetes. (02:45:31)
- Kaito uses a CRD approach and provides a simple Workspace CRD that allows users to specify the GPU SKU and the preset model they want to use. (02:45:42)
- Kaito provisions nodes using the Karpenter core API, which is vendor-agnostic, and allows for pre-flight validations to ensure the model can run on the specified hardware. (02:46:55)
- Kaito intentionally stores model weights in the images themselves to simplify version control and distribution, making it more cloud-native friendly. (02:47:40)
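  A sketch of what a Workspace resource looks like, adapted from memory of the project's examples (exact fields and preset names vary by Kaito version; the instance type shown is an Azure GPU SKU):

  ```yaml
  apiVersion: kaito.sh/v1alpha1
  kind: Workspace
  metadata:
    name: workspace-falcon-7b
  resource:
    instanceType: "Standard_NC12s_v3"   # GPU SKU; pre-flight checks validate the model fits
    labelSelector:
      matchLabels:
        apps: falcon-7b
  inference:
    preset:
      name: "falcon-7b"                 # preset model whose weights ship in the image
  ```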
- Kaito is a vendor-agnostic platform that allows users to run AI workloads on Kubernetes, and it supports fine-tuning and Retrieval-Augmented Generation (RAG) workloads (02:50:36).
- Fine-tuning involves adding adapters to a base model to specialize it for a specific task, and Kaito supports LoRA and QLoRA adapters (02:52:12).
- RAG is a method that allows users to provide context to a prompt, and Kaito supports RAG with various vector-store options, including Faiss, ChromaDB, and Cosmos DB (02:55:22).
- Kaito provides a simple way to get started with running AI workloads on Kubernetes, and it is collaborating with upstream working groups to improve its functionality (02:57:38).
- Kaito is seeking new collaborators and maintainers, and it is an open-source project that can be found on GitHub (02:58:44).
- An existing toolkit for running AI workloads in Kubernetes is Kubeflow, but it's complex to set up, and the speaker has tried it and run into issues (03:00:00).
- The project doesn't currently support autoscaling, but they're working on it, and they're gathering metrics like GPU usage to determine when to scale (03:02:03).
- If a model isn't one of the presets supported, users can build their own image using a Hugging Face model ID, and the project provides steps for doing so (03:03:23).
- The project uses vLLM to run models, but it also supports the Hugging Face Accelerate library as a runtime engine, and they want to support more engines in the future (03:04:27).
- The project doesn't currently support allocating a fraction of a GPU, but they're working on Multi-Instance GPU (MIG) to make it possible (03:06:24).
- The recommended node size for running models depends on the model's parameters and context length, but the project has benchmarks and tests the maximum model size that can be used with a given GPU (03:06:59).
- Pre-provisioning nodes means creating nodes in advance, as opposed to autoscaling, which creates nodes when a pod is stuck in pending (03:05:09).
- Jeremy Lewi is going to talk about why software platforms need AI co-pilots and how to build them (04:34:22).
- He gives a demo of an AI co-pilot called Foyle, which helps operate software and provides suggestions for debugging and fixing issues (04:35:12).
- Foyle learns about the user's platform and infrastructure through their interactions and provides helpful suggestions based on that knowledge (04:38:28).
- Jeremy demonstrates how Foyle can assist in debugging a "no healthy upstream" error in a staging service by suggesting logging commands and analyzing the output (04:37:05).
- Foyle also helps in identifying and fixing an issue with a Docker image name in a deployment manifest (04:40:02).
- The issue at hand is a permission problem accessing a secret in Google Cloud Secret Manager, which is required to access the OpenAI API key for generating a greeting, and the solution involves checking the service account configuration and adding the Secret Manager Secret Accessor role to the IAM policy (04:42:57).
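  The fix described corresponds to a standard IAM binding; a hedged example with placeholder project and service-account names:

  ```sh
  # Grant the workload's service account read access to Secret Manager secrets.
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:greeter@my-project.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"
  ```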
- The process of debugging and resolving the issue is facilitated by an AI co-pilot that analyzes the logs and provides the necessary commands to execute, and the AI adapts to the output and situation, making it easier to solve the problem (04:43:53).
- The use of AI in this context is principled, as it is well-suited to today's technologies like large language models (LLMs), and it helps to solve a specific problem in a more efficient way (04:47:42).
- The complexity of shipping and operating software is a major challenge, and it cannot be solved by building more tools, but rather by making existing tools easier to use and more accessible to application developers (04:50:41).
- The solution to this complexity is to use playbooks that provide a step-by-step guide to accomplishing tasks, but the problem is that writing and maintaining these playbooks is difficult due to combinatorial complexity (04:51:01).
- Generative artificial intelligence can be used to write playbooks dynamically and personalize them to specific issues, making it a perfect solution for solving this problem (04:52:42).
- The process of using generative AI to write playbooks involves feeding intent to an LLM and asking it to predict the next command to execute, and then iterating on this process to build an AI that walks through the process of debugging and resolving issues (04:53:21).
- A platform is being developed that uses an IDE for operations, combining command execution and output analysis, to create a better experience for operating applications, with the goal of collecting data to train a large language model (LLM) to solve problems, (04:54:10).
- The platform uses human feedback to improve the quality of the data and the LLM's predictions, creating a data flywheel where more usage leads to better AI assistance, (04:55:31).
- The speaker estimates that using the platform saves around 20-40 minutes per day and invites the audience to try it out, providing a sandbox environment and installation instructions, (04:56:04).
- The platform's AI is trained using context-dependent few-shot prompting, with a database of past examples of intent-action pairs, and can be used with existing tools and opinionated systems, (04:59:46).
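  A toy sketch of that retrieval-plus-few-shot pattern (not Foyle's actual code): fetch the nearest past intent/command pairs and splice them into the prompt ahead of the new intent:

  ```go
  package main

  import (
  	"fmt"
  	"strings"
  )

  // Example is one past intent/action pair mined from session logs.
  type Example struct {
  	Intent  string
  	Command string
  }

  // buildPrompt assembles a few-shot prompt: retrieved examples first,
  // then the user's new intent, leaving the model to predict the command.
  func buildPrompt(examples []Example, intent string) string {
  	var b strings.Builder
  	b.WriteString("Suggest the next command for an operations task.\n\n")
  	for _, ex := range examples {
  		fmt.Fprintf(&b, "Intent: %s\nCommand: %s\n\n", ex.Intent, ex.Command)
  	}
  	fmt.Fprintf(&b, "Intent: %s\nCommand:", intent)
  	return b.String()
  }

  func main() {
  	retrieved := []Example{ // in practice these come from a similarity search
  		{"show pods stuck in pending", "kubectl get pods --field-selector=status.phase=Pending"},
  	}
  	fmt.Println(buildPrompt(retrieved, "tail logs of the staging gateway"))
  }
  ```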
- The speaker discusses the importance of curating the AI to quickly learn new ways of doing things and spread that knowledge, using data curation and retraining the AI, (04:58:43).
- The platform is designed to work well with existing tools, such as CLIs and APIs, and can learn from custom CLIs and guardrail systems, (05:01:08).
- The speaker mentions that the platform is still in its early days but has shown promising results, and invites the audience to provide feedback and try it out, (04:56:19).
- The speakers introduce themselves as Gabriele, a PostgreSQL contributor and Kubernetes ambassador, and Leonardo, a principal software developer and maintainer of CloudNativePG. (05:10:00)
- CloudNativePG is a production-ready operator for running PostgreSQL on Kubernetes, focusing on license and vendor neutrality, and open governance. (05:13:17)
- The operator uses a Cluster custom resource (CR) to define the PostgreSQL cluster, following the convention-over-configuration principle, and supports features like synchronous replication and transactional DDL. (05:14:35)
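  Convention over configuration keeps a minimal Cluster resource very small; a sketch along the lines of the project's quickstart:

  ```yaml
  apiVersion: postgresql.cnpg.io/v1
  kind: Cluster
  metadata:
    name: pg-demo
  spec:
    instances: 3        # one primary plus two replicas, managed by the operator
    storage:
      size: 1Gi
  ```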
- The main use cases for CloudNativePG include running PostgreSQL together with the application in the same Kubernetes namespace, and using the operator to manage database changes and migrations. (05:16:23)
- The operator provides DevOps capabilities like version control, automated testing, and shift-left security, and supports continuous integration and delivery, trunk-based development, and automated deployment. (05:17:57)
- A database alone is not useful, it needs a schema, and designing a schema requires mental discipline and understanding of the problem being solved (05:25:40).
- There are two approaches to changing a schema: declarative migrations and change-based migrations, with declarative migrations being a newer approach that uses a "magic migration box" to detect the SQL DDL to apply to the database (05:27:15).
- Change-based migrations are more painful and require writing scripts to migrate the previous version to the new one, but can be more powerful and flexible (05:27:54).
- A middle way is to use a declarative migration tool to scaffold SQL scripts and then review them manually (05:32:01).
- A migration engine should apply changes entirely, lead to a consistent state, respect constraints, and not conflict with queries, and should use transactional DDL (05:32:39).
- Migrations should be tested like any other code base, and can be executed using a "magical migration box" or an init container in Kubernetes (05:34:32).
- When migrating databases, ensure your migration engine uses a primitive like an advisory lock to handle progress, otherwise you may encounter trouble, and be prepared for old versions of your application to cope with the new version of the schema (05:35:00).
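  A minimal sketch of that advisory-lock guard, assuming a hypothetical `greetings` table and the lib/pq driver: only one replica at a time runs the migration, and PostgreSQL's transactional DDL keeps a failure from leaving a half-applied state:

  ```go
  package main

  import (
  	"database/sql"
  	"log"

  	_ "github.com/lib/pq"
  )

  const migrationLockID = 4711 // arbitrary application-chosen lock key

  func main() {
  	db, err := sql.Open("postgres", "postgres://app@localhost/app?sslmode=disable")
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer db.Close()

  	// Session-level advisory lock: pg_advisory_lock blocks until free,
  	// so concurrent replicas queue up instead of racing the migration.
  	if _, err := db.Exec("SELECT pg_advisory_lock($1)", migrationLockID); err != nil {
  		log.Fatal(err)
  	}
  	defer db.Exec("SELECT pg_advisory_unlock($1)", migrationLockID)

  	// Apply the migration inside a transaction (DDL is transactional in PostgreSQL).
  	tx, err := db.Begin()
  	if err != nil {
  		log.Fatal(err)
  	}
  	if _, err := tx.Exec("ALTER TABLE greetings ADD COLUMN IF NOT EXISTS lang text"); err != nil {
  		tx.Rollback()
  		log.Fatal(err)
  	}
  	if err := tx.Commit(); err != nil {
  		log.Fatal(err)
  	}
  }
  ```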
- PostgreSQL, a hugely popular general-purpose database, is useful for most use cases but may not excel in specific ones, and you can achieve cloud neutrality with a fully open-source stack using Kubernetes, PostgreSQL, and CloudNativePG (05:36:02).
- For backup, CloudNativePG provides base backups and continuous backup, allowing for point-in-time recovery, and works out of the box, with support for volume snapshots (05:37:49).
- The operator ensures the primary always runs on a schedulable node, and if not, it switches over, and there are tools like Karpenter that try to manage that (05:41:07).
- The OpenTelemetry operator is primarily responsible for managing the deployment of the collector and injecting auto-instrumentation into Kubernetes pods (05:51:00).
- To install the operator, you must have a Kubernetes cluster running on at least version 1.23 (05:51:50).
- To deploy the OpenTelemetry Collector, a cluster with cert-manager already installed is required, and the cluster can be created using a tool like Kind, Minikube, or a cloud service provider-hosted cluster (05:51:59).
- The OpenTelemetry Collector can be installed using kubectl or Helm, with Helm version 3.9 or later required (05:52:44).
- The Collector has a custom resource for managing its deployment, which includes configuration options such as mode and config attribute (05:54:06).
- When deploying the OpenTelemetry Collector, it's essential to check if the resources were deployed correctly, including the Collector pod, config map, and other objects (05:55:35).
- The Collector CR version should be checked, as there are differences in the config sections between v1beta1 and v1alpha1 (05:57:38).
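  In v1beta1 the config attribute is structured YAML (in v1alpha1 it was a single string), so a minimal Collector CR looks roughly like this (pipeline contents are illustrative):

  ```yaml
  apiVersion: opentelemetry.io/v1beta1
  kind: OpenTelemetryCollector
  metadata:
    name: demo
  spec:
    mode: deployment        # other modes: daemonset, statefulset, sidecar
    config:
      receivers:
        otlp:
          protocols:
            grpc: {}
      exporters:
        debug: {}
      service:
        pipelines:
          traces:
            receivers: [otlp]
            exporters: [debug]
  ```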
- Instrumentation is the process of adding code to software to generate telemetry signals, and OpenTelemetry supports zero-code instrumentation, also known as auto-instrumentation (06:00:37).
- The operator provides auto-instrumentation by injecting it into application pods using the instrumentation resource, which is a new resource available in the API (06:01:47).
- An instrumentation resource is required for auto-instrumentation, which can be defined for multiple languages, including Python and Java, and can reside in the same or different namespaces (06:02:54).
- The instrumentation resource must be deployed before the deployment, and an annotation must be added to the deployment to enable auto-instrumentation (06:04:18).
- The annotation must be placed in the template metadata section of the deployment, and the correct language must be specified, for example "instrumentation.opentelemetry.io/inject-python" (06:04:34).
- If multiple instrumentation resources are defined in the same namespace, the instrumentation name must be specified in the annotation (06:05:38).
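  Putting those pieces together, a sketch of an Instrumentation resource and a Deployment carrying the matching pod-template annotation (names and endpoint are placeholders):

  ```yaml
  apiVersion: opentelemetry.io/v1alpha1
  kind: Instrumentation
  metadata:
    name: my-instrumentation
  spec:
    exporter:
      endpoint: http://otel-collector:4317
    python: {}
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    selector:
      matchLabels: { app: my-app }
    template:
      metadata:
        labels: { app: my-app }
        annotations:
          # must sit in the pod template's metadata, not the Deployment's
          instrumentation.opentelemetry.io/inject-python: "true"
      spec:
        containers:
          - name: app
            image: my-app:latest
  ```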
- To troubleshoot auto-instrumentation issues, check the deployment order, annotation configuration, and endpoint configurations, and review the operator logs for error messages (06:06:20).
- Additional troubleshooting tips include making sure the instrumentation resource was installed before the application deployment, checking resource deployment, and verifying API versions (06:10:31).
- The OpenTelemetry End User SIG has a docs usability survey open until the 15th, and users are encouraged to participate to help improve the docs (06:12:28).
- Scott Nichols will be talking about the wrong way to reconcile, focusing on helping explain what good operators, reconcilers, and controllers in the industry should do (06:19:18).
- Every single CRD needs a status with conditions, including fields like status, observed generation, and condition type, where the top-level condition type should be either Ready or Succeeded (06:20:39).
- Declarative APIs in Kubernetes open the door to disagreements between the spec and the world, and reconcilers settle these disagreements (06:21:30).
- The replica set is an example of a core Kubernetes resource with a spec, status, and generation, where the generation gets bumped when the spec is updated (06:23:11).
- The work queue is a strongly typed queue that interacts with the cache and informers, and is usually interacted with by injections into the client cache (06:25:37).
- As a reconciler operator, the job is to take a string from the work queue and turn it into a query for the resource, using the owner reference to point back to the type being reconciled (06:26:19).
- When modifying a resource, it's recommended to exit the loop after making a modification to avoid re-adding the key to the bottom of the work queue (06:28:22).
- When interacting with clients directly, ensure extra caution is taken to use the same client and the same caches to avoid synchronization problems (06:28:56).
- A reconciler is an implementation for a particular thing, a controller is the thing that helps set up other clients and holds the reconciler, and an operator is a marketing term for a controller (06:29:39).
- Resource models should be useful from the bottom up, with each layer building upon the previous one, and should not reference parent objects, so that the relationships stay a directed acyclic graph (DAG) (06:31:03).
- Annotations can be used as strategy selectors for reconcilers, allowing for A/B testing of different logic, while labels are mostly used for filtering and identifying things (06:32:43).
- The general shape of a required status includes a conditions array, with observed generation now carried inside each condition in the conditions list (06:33:47).
- Conditions are a flat DAG where the top-level condition is ready or succeeded, and subconditions wrap around sections of logic, with the code compiled of several of these conditions (06:36:21).
- Conditions can be true, unknown, or false, with true meaning everything is fine, unknown meaning the condition is being worked on, and false meaning the condition has failed (06:38:43).
- Subconditions in Kubernetes are used to tell humans what's happening, and their messages should be as useful as possible, with true meaning everything's fine, unknown meaning it's not what you asked for but the system is working on it, and false meaning something fundamentally needs to change in the cluster. (06:39:51)
- A significant share of issues in Kubernetes are caused by typos, so it's essential to scan the code, and not to trust status if observed generation doesn't match generation. (06:40:37)
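  A sketch of how a reconciler can stamp the observed generation onto the top-level condition using apimachinery's condition helpers (the status type here is a stand-in for a real CRD's status):

  ```go
  package main

  import (
  	"k8s.io/apimachinery/pkg/api/meta"
  	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  // MyAppStatus is a stand-in for a CRD's status subresource.
  type MyAppStatus struct {
  	Conditions []metav1.Condition
  }

  // markReconciling records the top-level Ready condition as Unknown while work
  // is in progress; readers must compare ObservedGeneration against
  // metadata.generation before trusting the condition.
  func markReconciling(status *MyAppStatus, generation int64) {
  	meta.SetStatusCondition(&status.Conditions, metav1.Condition{
  		Type:               "Ready",
  		Status:             metav1.ConditionUnknown,
  		Reason:             "Reconciling",
  		Message:            "waiting for subresources to become available",
  		ObservedGeneration: generation,
  	})
  }

  func main() {
  	s := &MyAppStatus{}
  	markReconciling(s, 3)
  }
  ```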
- Emitting events is a useful pattern for explaining to humans why something is doing wacky things, and events should be emitted on meaningful changes, not on the happy path. (06:42:16)
- Using one cache can simplify life, and referencing things in the spec of something else is considered an anti-pattern. (06:45:03)
- A subcondition manager is missing from controller-runtime in the wild, and duck-typing support is big, but not all things that can be done in Knative can be done with it. (06:45:46)
- GitOps is a practice of deploying applications onto Kubernetes using Git as a single source of truth, with core principles including version control, declarative configurations, drift management, and continuous delivery. (07:05:34)
- Argo CD is a GitOps-based continuous delivery tool for Kubernetes, with its main objective being a reconciliation engine that synchronizes configurations pushed to the Git repository with the actual state of workflows running on the cluster. (07:07:41)
- Argo CD is not just a continuous deployment tool, but a Git syncing tool that is part of a larger ecosystem, and its ideal checklist for a continuous deployment tool would vary from organization to organization, depending on requirements. (07:09:34)
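  For reference, Argo CD's reconciliation engine is driven by Application resources; a representative example with automated sync (repo URL and paths are hypothetical):

  ```yaml
  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: demo
    namespace: argocd
  spec:
    project: default
    source:
      repoURL: https://github.com/example/repo
      path: manifests
      targetRevision: main
    destination:
      server: https://kubernetes.default.svc
      namespace: demo
    syncPolicy:
      automated:
        prune: true      # delete resources removed from Git
        selfHeal: true   # revert drift between cluster and Git
  ```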
- Granular control of deployments across multiple environments, such as Dev, Staging, and Production clusters, with the ability to customize deployment patterns and blackout windows (07:13:14).
- Auto remediations can automate tasks, such as increasing the max replica count of Horizontal Pod Autoscaling (HPA) when it reaches its maximum (07:13:56).
- Microservices dependency gaps can be managed by defining interdependent microservices and bundling them into a release for easier rollbacks (07:14:27).
- Workflow flexibility allows for running pre-deployment and post-deployment stages, such as database migrations, application deployment, and notification sending (07:15:22).
- Devtron is an open-source, Kubernetes-native software delivery platform that provides a single solution for all things Kubernetes, including CI/CD pipelines, deployment workflows, and observability (07:17:31).
- Devtron's dashboard provides features such as multi-cloud, multi-cluster application management, workflow creation, and approval-based deployments (07:19:38).
- Devtron integrates with various plugins, such as GitHub and Jira, for tasks like migrations and ticket closure, and allows users to create custom plugins (07:20:57).
- Devtron allows users to save custom scripts as plugins, with environment variables and documentation, for use in a CD pipeline (07:21:36).
- Users can control configurations, enabling approvals for production environments, and requiring approver validation before deployment (07:21:59).
- Devtron provides a configuration diff feature, allowing users to review changes before deploying to a production environment (07:22:47).
- Users can roll back to previous releases, viewing the entire diff of changes, and set up SLO-based rollbacks in their workflow (07:22:55).
- Devtron offers a resource group view, displaying real-time application status, deployment status, and application metrics across multiple environments (07:24:11).
- Devtron is an open-source platform, available on GitHub, and can be installed on Kubernetes clusters (07:25:14).
- Devtron differentiates itself from Argo CD through its plugin ecosystem, deployment windows, and deployment filters (07:26:39).
- Headlamp is a modern, generic UI for Kubernetes, supporting both desktop and in-cluster service deployments, and is extensible through plugins (07:41:01).
- The cloud-native ecosystem has a lot of tools, but this leads to fragmentation, affecting the learning curve and user experience due to inconsistent UIs and different ways of working (07:42:57).
- Headlamp is a customizable UI for Cloud Native Computing Foundation tools, allowing users to add new routes, app menus, and change the sidebar, logo, or branding, and has libraries for common Kubernetes operations (07:44:13).
- Headlamp has a plugin system, with examples including a Prometheus integration, a Compose plugin, and an OpenCost plugin, which can be seamlessly integrated into the UI or have their own section (07:45:20).
- To build a plugin, users can use the Headlamp Plugin tool, an npm tool that creates a plugin template, and then build and release the plugin using npm commands (07:48:49).
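  From memory of the tool's README, the flow looks roughly like this (command names may have drifted between releases):

  ```sh
  npx @kinvolk/headlamp-plugin create my-plugin   # scaffold a plugin template
  cd my-plugin
  npm start                                       # develop against a running Headlamp
  npm run build                                   # produce the distributable bundle
  ```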
- Discovering plugins is done through the Plugin Catalog plugin, which is shipped with the desktop version of Headlamp and allows users to find and install plugins (07:51:12).
- To publish a plugin, users need to release it on GitHub, GitLab, or Bitbucket, and create a release draft, upload the tarball, and associate the tag with the release (07:51:48).
- To publish a plugin, create a repository entry on Artifact Hub, a centralized source for plugins, by adding a file called artifacthub-repo.yml at the root of the repository with the name and email of the owner in YAML format (07:54:42).
- Once the repository is created, a package needs to be created with a name, logo, creation date, and annotations specific to the package type, in this case, a Headlamp plugin (07:55:11).
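  The repository metadata file is small; a sketch of its expected shape (owner details are placeholders, and the repositoryID is the one Artifact Hub assigns):

  ```yaml
  # artifacthub-repo.yml, at the root of the plugin repository
  repositoryID: 00000000-0000-0000-0000-000000000000
  owners:
    - name: Jane Doe
      email: jane@example.com
  ```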
- Headlamp only shows official plugins by default for security reasons, and to be marked as official, a ticket needs to be opened with Artifact Hub, or the plugin needs to be in the allow list (07:57:48).
- The plugin download mechanism is in the Headlamp desktop side for security reasons, and users can see the plugin's download URL and decide whether to trust it (07:59:23).
- To show non-official plugins, users can go to settings and disable the official and verified filters, but this is at their own risk (08:00:16).
- Resources for contributing to Headlamp include the GitHub repository, the Headlamp channel on Kubernetes Slack, and a newly drafted contribution guide (08:00:35).
- Headlamp and Lens are both Kubernetes management tools, but Headlamp is focused on providing a comprehensive Kubernetes UI, whereas Lens's open-source status has been unstable and it is not part of the Cloud Native Computing Foundation. (08:03:21)
- Headlamp has a plugin system and allows integration with other tools, such as Backstage, which is a portal for developers, and K9s, a command-line tool. (08:05:58)
- Headlamp can be integrated into Backstage, allowing users to access Kubernetes functionality from within Backstage, and also provides a backlink to navigate from Headlamp to Backstage. (08:07:07)
- Headlamp's plugin system is flexible and allows running CLI tools, but with some limitations for security reasons, and can be used to unify the CNCF tool set. (08:08:39)
- The speaker emphasizes the importance of collaboration between projects and tools in the CNCF ecosystem, as demonstrated by the integration of Headlamp and Backstage. (08:09:35)
- To become a key figure in open-source software, like Kevin Bacon in the film industry, one needs to put in the work and network on their own terms, utilizing various platforms such as GitHub, email, social media, and organizations like the Cloud Native Computing Foundation. (08:14:57)
- Networking should be done honestly and transparently, focusing on specific points of interest and showing genuine value, rather than just complimenting someone's talk. (08:16:51)
- There are many non-technical opportunities to network, such as talking about common interests like pets, and being approachable and willing to introduce oneself to others. (08:17:58)
- Flatcar, a Linux distribution optimized for containers, has gained significant traction and growth, with over 50,000 contributions from over 200 companies, and has been recognized as an incubating project within the CNCF. (08:20:16)
- The Flatcar project is now under a vendor-neutral home, allowing for broader participation, and is considered stable and successfully used in production, with future themes focusing on security, architecture evolution, and community growth. (08:21:16)
- Cluster API is moving away from using Image Builder to build tightly coupled images, instead using a separate sysext (systemd system extension) that can be loaded at boot time, allowing for loosely coupled updates and support for various clouds, ARM, and GPUs. (08:23:06)
- Score is a Cloud Native Computing Foundation sandbox project that provides an abstraction to Kubernetes for developers, allowing them to describe how they want to deploy their workload without knowing the technical details. (08:24:32)
- Score can generate a compose file or a Kubernetes manifest from a simple score file, making it easier for developers to test and deploy their applications. (08:25:22)
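  A minimal score file of the kind the project's docs show; running an implementation CLI such as score-compose against it yields the Compose file (a Kubernetes manifest works similarly):

  ```yaml
  apiVersion: score.dev/v1b1
  metadata:
    name: hello-world
  containers:
    hello:
      image: busybox
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo Hello World!; sleep 5; done"]
  ```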
- KitOps is an open-source project that helps solve the problem of storing different components of AI, such as data sets, model weights, and model code, in one single location. (08:31:51)
- ModelKit is an OCI-compliant packaging format that allows developers to package together all the components of their machine learning code into one single package. (08:32:08)
- Models packaged this way can be deployed to production using tools like Docker and Kubernetes; an estimated 80% of models never make it to production due to deployment issues (08:32:43).
- A new project called Hyperlight has been released, which is a process VM that treats virtual machines as an isolation boundary for a single process, allowing for fast creation and low overhead (08:36:15).
- Hyperlight can create a new VM in 1-2 milliseconds and has a call latency of 60-100 microseconds, making it suitable for high-load applications and services (08:37:07).
- Ephemeral environments in Kubernetes have been in existence since the Jenkins era, but their implementation in today's cloud-native ecosystem is being rethought (08:40:22).
- Ephemeral environments are short-lived environments or deployments created for cost-effectiveness and are dynamically created and destroyed for each developer or feature (08:40:47).
- Kubernetes allows for the creation of ephemeral environments using namespaces, which can be scaled down or deleted when no longer needed, reducing infrastructure costs and making it easier to reuse environments (08:42:31).
- Developers often experience pain when working with infrastructure and tools, and this shared pain can bring people together and foster collaboration (08:45:03).
- Observability and Security information and event management (SIEM) tools have similar goals and functions, but are often kept separate, leading to unnecessary complexity and costs (08:47:55).
- Combining observability and SIEM tools can lead to fewer admin tools, a lower security profile, and reduced costs, but also presents challenges such as migrating knowledge and integrating with existing systems (08:49:43).
- Tools like Shifton can help mitigate these downsides by providing an easy way to set up monitoring and logging, and integrating with various applications and services (08:50:53).
- Fine-grained access to a Kubernetes cluster can be granted and shared using a product called Access, which generates a kubeconfig file to grant temporary access to specific resources (08:52:49).
- The Access tool allows users to specify fine-grained permissions, such as allowing access to logs of a particular pod in a specific namespace (08:55:19).
- The Cloud Native landscape has 28 projects, and a survey found that many contributors wear multiple hats, including code and non-code contributions (08:59:01).
- The Cloud Native ecosystem is a wider community where everyone's skills, perspectives, and experiences matter, and there are no rejects, only opportunities waiting to be explored (09:00:16).
- The call to action is to have as many conversations as possible with people at events like KubeCon to find new opportunities and inspiration (09:01:38).
- The conference attendees are thanked for their time, and it's hoped they've had a wonderful conference and will have a wonderful week (09:02:25).
- Attendees are encouraged to introduce themselves to others, especially if they feel bashful, and to seek help from those who appear more confident (09:02:47).
- A group selfie is suggested as a big sendoff, with attendees asked to gather in a crowd for the photo (09:03:01).