@yuvalif
Last active August 26, 2024 14:19

existing functionality

  • frontend requests tracing on the RGW
  • OSD traces
  • jaeger orchestration via cephadm
  • multipart upload tracing when the process is done across multiple RGWs
  • end2end (RGW<->OSD) tracing of PUT object operations
  • conditional tracing on the RGW using Lua scripting
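The conditional tracing item above amounts to a per-request predicate that turns tracing on only for requests of interest. A minimal Python sketch of that idea follows. It is not the RGW Lua API: the `Request` fields, `should_trace`, `apply_policy`, and the example policy (large PUTs to selected buckets) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request summary for illustration; not an RGW data structure.
    op: str                      # operation name, e.g. "put_obj" or "get_obj"
    bucket: str                  # bucket the request targets
    content_length: int          # payload size in bytes
    trace_enabled: bool = False  # tracing is off unless the policy enables it


def should_trace(req: Request, traced_buckets: set, min_size: int) -> bool:
    """Illustrative policy: trace only large PUTs to selected buckets."""
    return (
        req.op == "put_obj"
        and req.bucket in traced_buckets
        and req.content_length >= min_size
    )


def apply_policy(req: Request, traced_buckets: set, min_size: int) -> Request:
    """Set the request's tracing flag according to the policy and return it."""
    req.trace_enabled = should_trace(req, traced_buckets, min_size)
    return req
```

Keeping the decision in one small predicate means the policy can change (different buckets, operations, or size thresholds) without touching the trace points themselves.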

work in progress

future development

  • add more end2end (RGW<->OSD) trace points
  • add more RGW multisite trace points
  • add detailed documentation of all trace points and the information we store in the traces
    • current documentation only covers deployment and configuration
    • document what serves as the correlation id for traces
  • current information in the traces is geared towards developers (e.g. function names) rather than end users. making the traces useful to end users requires redesigning the trace points, their names and the information stored in the traces
  • add a tracing best practices and guidelines doc
  • jaeger orchestration via rook (phase1 - just documentation, phase2 - changes to the rook operator)
  • NFS Ganesha end2end tracing
  • other end2end tracing (e.g. RBD, CephFS)
  • jaeger v2 transition
    • change the client protocol
    • changes to cephadm
    • make sure that we do not break RADOS compatibility when we upgrade
  • investigate replacing the jaeger agent+collector with the OTEL collector
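To illustrate why the correlation id matters for the end2end items above: in Jaeger-style tracing, spans reported by different daemons belong to one trace when they carry the same trace id. The sketch below groups spans by that id and orders them by start time. The span layout (`trace_id`, `service`, `name`, `start_us`) and the sample values are assumptions made for this example; which identifier Ceph exposes as the correlation id is exactly what the documentation item above is meant to settle.

```python
from collections import defaultdict

# Hypothetical spans as they might arrive from different daemons.
spans = [
    {"trace_id": "t1", "service": "osd", "name": "osd write", "start_us": 20},
    {"trace_id": "t2", "service": "rgw", "name": "get object", "start_us": 5},
    {"trace_id": "t1", "service": "rgw", "name": "put object", "start_us": 10},
]


def group_by_trace(spans):
    """Group spans by trace id and sort each group by start time."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    for trace in traces.values():
        trace.sort(key=lambda s: s["start_us"])
    return dict(traces)


if __name__ == "__main__":
    for trace_id, trace in group_by_trace(spans).items():
        path = " -> ".join(f'{s["service"]}:{s["name"]}' for s in trace)
        print(trace_id, path)
```

With the sample data, the two `t1` spans end up in one group with the RGW span first, which is the RGW-to-OSD view an end2end trace is meant to give.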

non-development

  • existing talk: Sustainability Through Accountability in a CNCF Ecosystem
  • submit a Cephalocon 2024 talk and/or prepare a tech talk aimed at users (one talk submitted by Deepika/Yuval)
  • record a code walkthrough explaining how to add more tracepoints (for developers)
  • demonstrate (talk/blog) how tracing could be used to debug latencies in the system
  • demonstrate (talk/blog) how to deploy tracing and jaeger in a multisite scenario using kafka+ingester so that information from multiple sites is funneled to the same backend