@gaol
Last active June 1, 2022 09:38
OpenJDK 17.0.2 - Cgroup v1 initialization causes NullPointerException when cgroup path does not start with the mount root

When I ran the WildFly testsuite on JDK 17 within a podman container, I got an NPE for every test; see NPE.stacktrace.java below for the stack trace. Everything works fine when I run it on bare metal.

It relates to the JDK issue https://bugs.openjdk.java.net/browse/JDK-8272124, but this demonstrates another case, where the cgroup path does not start with the mount root.

In this case /proc/self/cgroup has the following line:

9:memory:/user.slice/user-1000.slice/session-3.scope

while /proc/self/mountinfo has the following line:

941 931 0:36 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
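
The JDK combines these two lines to resolve the controller's filesystem path. Below is a simplified sketch of that matching logic, paraphrased from JDK 17's jdk.internal.platform.cgroupv1.CgroupV1SubsystemController (not the verbatim source; names are illustrative): when the path from /proc/self/cgroup neither equals nor starts with the mount root from /proc/self/mountinfo, no path is ever set, and the later Paths.get call with the null path fails with the NullPointerException shown in NPE.stacktrace.java.

```java
// Simplified sketch, paraphrased from JDK 17's
// jdk.internal.platform.cgroupv1.CgroupV1SubsystemController; not the
// verbatim JDK source.
public class CgroupV1PathSketch {

    // root       = mount root from /proc/self/mountinfo,
    //              e.g. "/user.slice/user-1000.slice/session-50.scope"
    // mountPoint = controller mount point, e.g. "/sys/fs/cgroup/memory"
    // cgroupPath = path from /proc/self/cgroup,
    //              e.g. "/user.slice/user-1000.slice/session-3.scope"
    static String resolvePath(String root, String mountPoint, String cgroupPath) {
        if (root.equals("/")) {
            // Typical host case: controller mounted at the hierarchy root.
            return cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
        }
        if (root.equals(cgroupPath)) {
            // Typical container case: the process sits exactly at the mount root.
            return mountPoint;
        }
        if (cgroupPath.startsWith(root)) {
            // Process is below the mount root; strip the common prefix.
            return mountPoint + cgroupPath.substring(root.length());
        }
        // This report's case: cgroupPath does not start with the mount root,
        // so the controller path stays null and the later Paths.get call
        // throws the NullPointerException seen in the stack trace.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolvePath(
                "/user.slice/user-1000.slice/session-50.scope",
                "/sys/fs/cgroup/memory",
                "/user.slice/user-1000.slice/session-3.scope")); // prints null
    }
}
```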

The environment:

  • Java version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment 21.9 (build 17.0.2+8)
OpenJDK 64-Bit Server VM 21.9 (build 17.0.2+8, mixed mode, sharing)
  • RHEL 8.5:
[jenkins@testjenkins ~]$ uname -a
Linux testjenkins 4.18.0-348.el8.x86_64 #1 SMP Mon Oct 4 12:17:22 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[jenkins@testjenkins ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.5 (Ootpa)
  • podman version:
[jenkins@testjenkins ~]$ podman --version
podman version 3.4.2
NPE.stacktrace.java:

```
[ERROR] Failed to execute goal org.wildfly.plugins:wildfly-maven-plugin:2.0.1.Final:execute-commands (apply-elytron) on project wildfly-ts-integ-smoke: Failed to execute commands: Exception in thread "main"
java.lang.NullPointerException
[ERROR] at java.base/java.util.Objects.requireNonNull(Objects.java:208)
[ERROR] at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:263)
[ERROR] at java.base/java.nio.file.Path.of(Path.java:147)
[ERROR] at java.base/java.nio.file.Paths.get(Paths.java:69)
[ERROR] at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:67)
[ERROR] at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
[ERROR] at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
[ERROR] at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
[ERROR] at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
[ERROR] at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:175)
[ERROR] at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:149)
[ERROR] at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:84)
[ERROR] at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:60)
[ERROR] at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:116)
[ERROR] at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
[ERROR] at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
[ERROR] at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
[ERROR] at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
[ERROR] at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:182)
[ERROR] at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:280)
[ERROR] at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:199)
[ERROR] at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:488)
[ERROR] at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
[ERROR] at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
[ERROR] at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
[ERROR] at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
[ERROR] at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
[ERROR] at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[ERROR] at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[ERROR] at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
[ERROR] at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
[ERROR] at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:489)
[ERROR] at org.jboss.modules.ModuleLoader$RealMBeanReg$1.run(ModuleLoader.java:1258)
[ERROR] at org.jboss.modules.ModuleLoader$RealMBeanReg$1.run(ModuleLoader.java:1256)
[ERROR] at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
[ERROR] at org.jboss.modules.ModuleLoader$RealMBeanReg.<init>(ModuleLoader.java:1256)
[ERROR] at org.jboss.modules.ModuleLoader$TempMBeanReg.installReal(ModuleLoader.java:1240)
[ERROR] at org.jboss.modules.ModuleLoader.installMBeanServer(ModuleLoader.java:273)
[ERROR] at org.jboss.modules.Main.main(Main.java:605)
```
podman inspect output of the container:

[
{
"Id": "1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac",
"Created": "2022-05-02T20:48:21.078644068+08:00",
"Path": "/var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait.sh",
"Args": [
"/var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait.sh"
],
"State": {
"OciVersion": "1.0.2-dev",
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 7632,
"ConmonPid": 7619,
"ExitCode": 0,
"Error": "",
"StartedAt": "2022-05-02T20:48:21.288271319+08:00",
"FinishedAt": "0001-01-01T00:00:00Z",
"Healthcheck": {
"Status": "",
"FailingStreak": 0,
"Log": null
}
},
"Image": "bfebb38e834973abcdd7928c858b36c7b1f2540f409ce440671bb6237bdbe03f",
"ImageName": "localhost/automatons:latest",
"Rootfs": "",
"Pod": "",
"ResolvConfPath": "/run/user/1000/containers/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/resolv.conf",
"HostnamePath": "/run/user/1000/containers/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/hostname",
"HostsPath": "/run/user/1000/containers/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/hosts",
"StaticDir": "/home/jenkins/.local/share/containers/storage/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata",
"OCIConfigPath": "/home/jenkins/.local/share/containers/storage/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/config.json",
"OCIRuntime": "runc",
"ConmonPidFile": "/run/user/1000/containers/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/conmon.pid",
"PidFile": "/run/user/1000/containers/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/pidfile",
"Name": "automaton-slave-eap-7.4.x-testsuite-19",
"RestartCount": 0,
"Driver": "overlay",
"MountLabel": "system_u:object_r:container_file_t:s0:c334,c907",
"ProcessLabel": "system_u:system_r:container_t:s0:c334,c907",
"AppArmorProfile": "",
"EffectiveCaps": null,
"BoundingCaps": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FOWNER",
"CAP_FSETID",
"CAP_KILL",
"CAP_NET_BIND_SERVICE",
"CAP_NET_RAW",
"CAP_SETFCAP",
"CAP_SETGID",
"CAP_SETPCAP",
"CAP_SETUID",
"CAP_SYS_CHROOT"
],
"ExecIDs": [
"cd38093eb72bc4947e6a09a297cd767ec49ef26cac452b2839d7c47713cb0de6"
],
"GraphDriver": {
"Name": "overlay",
"Data": {
"LowerDir": "/home/jenkins/.local/share/containers/storage/overlay/904c65c233abd3aa8ecbe08a1f3d6a4a4a7704abe47e673ffcea9030d082a9b8/diff:/home/jenkins/.local/share/containers/storage/overlay/930a368523e8d9ca054beac02aa2ec0009395486ab58c5a9d7646102e7a33601/diff:/home/jenkins/.local/share/containers/storage/overlay/ee494184ff635908355ff3828a5fdb21dae83f453bac2cdd4f147b315fbb75d2/diff:/home/jenkins/.local/share/containers/storage/overlay/1e4f5e9e1e495a879ebbb6c9c406a3b77016d74024d2d8ebdc4a56ee7434a580/diff:/home/jenkins/.local/share/containers/storage/overlay/5370b65977de2ed522b13ebaeca0fd41501f6eab2c55513c5d927594c2c253e4/diff:/home/jenkins/.local/share/containers/storage/overlay/93749af418e72b7f9d1998cdf41d4007dc27065fe4d79a3a05abf4bf274a2fac/diff",
"MergedDir": "/home/jenkins/.local/share/containers/storage/overlay/5f2f3e72dd86114d232fed1d07a450bcb60c84c3d55123ed797c82941a8eb9a6/merged",
"UpperDir": "/home/jenkins/.local/share/containers/storage/overlay/5f2f3e72dd86114d232fed1d07a450bcb60c84c3d55123ed797c82941a8eb9a6/diff",
"WorkDir": "/home/jenkins/.local/share/containers/storage/overlay/5f2f3e72dd86114d232fed1d07a450bcb60c84c3d55123ed797c82941a8eb9a6/work"
}
},
"Mounts": [
{
"Type": "bind",
"Source": "/opt",
"Destination": "/opt",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/.m2",
"Destination": "/var/jenkins_home/.m2/",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/.ssh",
"Destination": "/var/jenkins_home/.ssh/",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/.gitconfig",
"Destination": "/var/jenkins_home/.gitconfig",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/.netrc",
"Destination": "/var/jenkins_home/.netrc",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/current/jobs/eap-7.4.x-build/builds/7/archive",
"Destination": "/parent_job/",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/jenkins/current/workspace/eap-7.4.x-testsuite",
"Destination": "/var/jenkins_home/workspace/eap-7.4.x-testsuite",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": true,
"Propagation": "rprivate"
}
],
"Dependencies": [],
"NetworkSettings": {
"EndpointID": "",
"Gateway": "",
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "",
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": ""
},
"ExitCommand": [
"/usr/bin/podman",
"--root",
"/home/jenkins/.local/share/containers/storage",
"--runroot",
"/run/user/1000/containers",
"--log-level",
"warning",
"--cgroup-manager",
"cgroupfs",
"--tmpdir",
"/run/user/1000/libpod/tmp",
"--runtime",
"runc",
"--storage-driver",
"overlay",
"--events-backend",
"file",
"container",
"cleanup",
"--rm",
"1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac"
],
"Namespace": "",
"IsInfra": false,
"Config": {
"Hostname": "1a2b19d89150",
"Domainname": "",
"User": "1000:1000",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm",
"container=oci",
"HOME=/var/jenkins_home/",
"HOSTNAME=1a2b19d89150"
],
"Cmd": [
"/var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait.sh"
],
"Image": "localhost/automatons:latest",
"Volumes": null,
"WorkingDir": "/var/jenkins_home/workspace/eap-7.4.x-testsuite",
"Entrypoint": "",
"OnBuild": null,
"Labels": {
"architecture": "x86_64",
"build-date": "2022-03-16T16:53:14.681638",
"com.redhat.build-host": "cpt-1001.osbs.prod.upshift.rdu2.redhat.com",
"com.redhat.component": "ubi8-container",
"com.redhat.license_terms": "https://www.redhat.com/en/about/red-hat-end-user-license-agreements#UBI",
"description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
"distribution-scope": "public",
"io.buildah.version": "1.23.1",
"io.k8s.description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
"io.k8s.display-name": "Red Hat Universal Base Image 8",
"io.openshift.expose-services": "",
"io.openshift.tags": "base rhel8",
"maintainer": "Red Hat, Inc.",
"name": "ubi8",
"release": "236.1647448331",
"summary": "Provides the latest release of Red Hat Universal Base Image 8.",
"url": "https://access.redhat.com/containers/#/registry.access.redhat.com/ubi8/images/8.5-236.1647448331",
"vcs-ref": "3aadd00326f3dd6cfe65ee31017ab98915fddb56",
"vcs-type": "git",
"vendor": "Red Hat, Inc.",
"version": "8.5"
},
"Annotations": {
"io.container.manager": "libpod",
"io.kubernetes.cri-o.Created": "2022-05-02T20:48:21.078644068+08:00",
"io.kubernetes.cri-o.TTY": "false",
"io.podman.annotations.autoremove": "TRUE",
"io.podman.annotations.init": "FALSE",
"io.podman.annotations.privileged": "FALSE",
"io.podman.annotations.publish-all": "FALSE",
"org.opencontainers.image.base.digest": "sha256:e2cbaf307f898bb43ad5e0b67bd9325c585d5e1890de0dfe7d4832d634abd39e",
"org.opencontainers.image.base.name": "registry.access.redhat.com/ubi8/ubi:latest",
"org.opencontainers.image.stopSignal": "15"
},
"StopSignal": 15,
"CreateCommand": [
"podman",
"run",
"--name",
"automaton-slave-eap-7.4.x-testsuite-19",
"--userns=keep-id",
"-u",
"1000:1000",
"--add-host=olympus:10.88.0.1",
"--rm",
"-v",
"/home/jenkins/current//jobs/eap-7.4.x-build/builds/7/archive:/parent_job/:ro",
"--workdir",
"/var/jenkins_home/workspace/eap-7.4.x-testsuite",
"-v",
"/home/jenkins/current/workspace/eap-7.4.x-testsuite:/var/jenkins_home/workspace/eap-7.4.x-testsuite:rw",
"-v",
"/opt:/opt:ro",
"-v",
"/home/jenkins/.m2/:/var/jenkins_home/.m2/:rw",
"-v",
"/home/jenkins/.ssh/:/var/jenkins_home/.ssh/:ro",
"-v",
"/home/jenkins/.gitconfig:/var/jenkins_home/.gitconfig:ro",
"-v",
"/home/jenkins/.netrc:/var/jenkins_home/.netrc:ro",
"-d",
"localhost/automatons",
"/var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait.sh"
],
"Umask": "0022",
"Timeout": 0,
"StopTimeout": 10
},
"HostConfig": {
"Binds": [
"/opt:/opt:ro,rprivate,rbind",
"/home/jenkins/.m2:/var/jenkins_home/.m2/:rw,rprivate,rbind",
"/home/jenkins/.ssh:/var/jenkins_home/.ssh/:ro,rprivate,rbind",
"/home/jenkins/.gitconfig:/var/jenkins_home/.gitconfig:ro,rprivate,rbind",
"/home/jenkins/.netrc:/var/jenkins_home/.netrc:ro,rprivate,rbind",
"/home/jenkins/current/jobs/eap-7.4.x-build/builds/7/archive:/parent_job/:ro,rprivate,rbind",
"/home/jenkins/current/workspace/eap-7.4.x-testsuite:/var/jenkins_home/workspace/eap-7.4.x-testsuite:rw,rprivate,rbind"
],
"CgroupManager": "cgroupfs",
"CgroupMode": "host",
"ContainerIDFile": "",
"LogConfig": {
"Type": "k8s-file",
"Config": null,
"Path": "/home/jenkins/.local/share/containers/storage/overlay-containers/1a2b19d8915046a04773d7ac350c95d861bcb295d06e9ea2558178eaaa10a1ac/userdata/ctr.log",
"Tag": "",
"Size": "0B"
},
"NetworkMode": "slirp4netns",
"PortBindings": {},
"RestartPolicy": {
"Name": "",
"MaximumRetryCount": 0
},
"AutoRemove": true,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": [],
"CapDrop": [
"CAP_AUDIT_WRITE",
"CAP_MKNOD"
],
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": [
"olympus:10.88.0.1"
],
"GroupAdd": [],
"IpcMode": "private",
"Cgroup": "",
"Cgroups": "default",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "private",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [],
"Tmpfs": {},
"UTSMode": "private",
"UsernsMode": "private",
"ShmSize": 65536000,
"Runtime": "oci",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": 0,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": [],
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"CgroupConf": null
}
}
]
/proc/cgroups inside the container:

#subsys_name hierarchy num_cgroups enabled
cpuset 6 4 1
cpu 8 126 1
cpuacct 8 126 1
blkio 11 126 1
memory 9 167 1
devices 3 126 1
freezer 7 4 1
net_cls 5 4 1
perf_event 4 4 1
net_prio 5 4 1
hugetlb 2 4 1
pids 10 159 1
rdma 12 1 1

/proc/self/cgroup inside the container:

12:rdma:/
11:blkio:/system.slice/sshd.service
10:pids:/user.slice/user-1000.slice/session-3.scope
9:memory:/user.slice/user-1000.slice/session-3.scope
8:cpu,cpuacct:/
7:freezer:/
6:cpuset:/
5:net_cls,net_prio:/
4:perf_event:/
3:devices:/system.slice/sshd.service
2:hugetlb:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/user.slice/podman-637674.scope

/proc/self/mountinfo inside the container (cgroup-related lines):

931 921 0:86 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c611",mode=755,uid=100000,gid=100000
932 931 0:26 /user.slice/user-1000.slice/[email protected]/user.slice/podman-637196.scope/2be1ee6e076d20b38a61a1e3289974662d646fb50489ccf22f7fa4e3dc082295 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
933 931 0:29 / /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb
934 931 0:30 /user.slice /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices
935 931 0:31 / /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event
936 931 0:32 / /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_cls,net_prio
937 931 0:33 / /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset
938 931 0:34 / /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer
940 931 0:35 /user.slice /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpu,cpuacct
941 931 0:36 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
942 931 0:37 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids
943 931 0:38 /user.slice /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio
944 931 0:39 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,rdma

A test case for TestCgroupSubsystemFactory.java that demonstrates the issue:

diff --git a/test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java b/test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java
index 369d4244533..a84cb70d7f5 100644
--- a/test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java
+++ b/test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java
@@ -43,6 +43,7 @@ import jdk.internal.platform.CgroupSubsystemFactory;
import jdk.internal.platform.CgroupSubsystemFactory.CgroupTypeResult;
import jdk.internal.platform.CgroupV1MetricsImpl;
import jdk.internal.platform.cgroupv1.CgroupV1Subsystem;
+import jdk.internal.platform.cgroupv1.CgroupV1SubsystemController;
import jdk.internal.platform.Metrics;
import jdk.test.lib.Utils;
import jdk.test.lib.util.FileUtils;
@@ -72,8 +73,10 @@ public class TestCgroupSubsystemFactory {
private Path cgroupv1MntInfoDoubleCpusets;
private Path cgroupv1MntInfoDoubleCpusets2;
private Path cgroupv1MntInfoColonsHierarchy;
+ private Path cgroupv1MntInfoPrefix;
private Path cgroupv1SelfCgroup;
private Path cgroupv1SelfColons;
+ private Path cgroupv1SelfPrefix;
private Path cgroupv2SelfCgroup;
private Path cgroupv1SelfCgroupJoinCtrl;
private Path cgroupv1CgroupsOnlyCPUCtrl;
@@ -166,6 +169,20 @@ public class TestCgroupSubsystemFactory {
"42 30 0:38 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:14 - cgroup none rw,seclabel,cpuset\n" +
"43 30 0:39 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:15 - cgroup none rw,seclabel,blkio\n" +
"44 30 0:40 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:16 - cgroup none rw,seclabel,freezer\n";
+ private String mntInfoPrefix =
+ "931 921 0:86 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context=\"system_u:object_r:container_file_t:s0:c375,c611\",mode=755,uid=100000,gid=100000\n" +
+ "932 931 0:26 /user.slice/user-1000.slice/[email protected]/user.slice/podman-637196.scope/2be1ee6e076d20b38a61a1e3289974662d646fb50489ccf22f7fa4e3dc082295 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd\n" +
+ "933 931 0:29 / /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb\n" +
+ "934 931 0:30 /user.slice /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices\n" +
+ "935 931 0:31 / /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event\n" +
+ "936 931 0:32 / /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_cls,net_prio\n" +
+ "937 931 0:33 / /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset\n" +
+ "938 931 0:34 / /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer\n" +
+ "940 931 0:35 /user.slice /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpu,cpuacct\n" +
+ "941 931 0:36 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory\n" +
+ "942 931 0:37 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids\n" +
+ "943 931 0:38 /user.slice /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio\n" +
+ "944 931 0:39 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,rdma";
private String cgroupsNonZeroHierarchy =
"#subsys_name hierarchy num_cgroups enabled\n" +
"cpuset 9 1 1\n" +
@@ -217,6 +234,19 @@ public class TestCgroupSubsystemFactory {
"2:cpu,cpuacct:/\n" +
"1:name=systemd:/user.slice/user-1000.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-3c00b338-5b65-439f-8e97-135e183d135d.scope\n" +
"0::/user.slice/user-1000.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-3c00b338-5b65-439f-8e97-135e183d135d.scope\n";
+ private String cgroupv1SelfPrefixContent =
+ "12:rdma:/\n" +
+ "11:blkio:/system.slice/sshd.service\n" +
+ "10:pids:/user.slice/user-1000.slice/session-3.scope\n" +
+ "9:memory:/user.slice/user-1000.slice/session-3.scope\n" +
+ "8:cpu,cpuacct:/\n" +
+ "7:freezer:/\n" +
+ "6:cpuset:/\n" +
+ "5:net_cls,net_prio:/\n" +
+ "4:perf_event:/\n" +
+ "3:devices:/system.slice/sshd.service\n" +
+ "2:hugetlb:/\n" +
+ "1:name=systemd:/user.slice/user-1000.slice/[email protected]/user.slice/podman-637674.scope";
private String cgroupv2SelfCgroupContent = "0::/user.slice/user-1000.slice/session-2.scope";
@Before
@@ -257,12 +287,18 @@ public class TestCgroupSubsystemFactory {
cgroupv1MntInfoColonsHierarchy = Paths.get(existingDirectory.toString(), "mountinfo_colons");
Files.writeString(cgroupv1MntInfoColonsHierarchy, mntInfoColons);
+ cgroupv1MntInfoPrefix = Paths.get(existingDirectory.toString(), "mountinfo-prefix");
+ Files.writeString(cgroupv1MntInfoPrefix, mntInfoPrefix);
+
cgroupv1SelfCgroup = Paths.get(existingDirectory.toString(), "self_cgroup_cgv1");
Files.writeString(cgroupv1SelfCgroup, cgroupv1SelfCgroupContent);
cgroupv1SelfColons = Paths.get(existingDirectory.toString(), "self_colons_cgv1");
Files.writeString(cgroupv1SelfColons, cgroupv1SelfColonsContent);
+ cgroupv1SelfPrefix = Paths.get(existingDirectory.toString(), "self_prefix_cgv1");
+ Files.writeString(cgroupv1SelfPrefix, cgroupv1SelfPrefixContent);
+
cgroupv2SelfCgroup = Paths.get(existingDirectory.toString(), "self_cgroup_cgv2");
Files.writeString(cgroupv2SelfCgroup, cgroupv2SelfCgroupContent);
@@ -393,6 +429,24 @@ public class TestCgroupSubsystemFactory {
assertEquals(memoryInfo.getMountRoot(), memoryInfo.getCgroupPath());
}
+ @Test
+ public void testMountPrefixCgroupsV1() throws IOException {
+ String cgroups = cgroupv1CgInfoNonZeroHierarchy.toString();
+ String mountInfo = cgroupv1MntInfoPrefix.toString();
+ String selfCgroup = cgroupv1SelfPrefix.toString();
+ Optional<CgroupTypeResult> result = CgroupSubsystemFactory.determineType(mountInfo, cgroups, selfCgroup);
+
+ assertTrue("Expected non-empty cgroup result", result.isPresent());
+ CgroupTypeResult res = result.get();
+ CgroupInfo memoryInfo = res.getInfos().get("memory");
+ assertEquals(memoryInfo.getCgroupPath(), "/user.slice/user-1000.slice/session-3.scope");
+ assertEquals("/sys/fs/cgroup/memory", memoryInfo.getMountPoint());
+ CgroupV1SubsystemController cgroupv1MemoryController = new CgroupV1SubsystemController(memoryInfo.getMountRoot(), memoryInfo.getMountPoint());
+ cgroupv1MemoryController.setPath(memoryInfo.getCgroupPath());
+ // issue to verify: path was not set because the cgroupPath does not start with mount root
+ assertNotNull(cgroupv1MemoryController.path());
+ }
+
@Test
public void testZeroHierarchyCgroupsV1() throws IOException {
String cgroups = cgroupv1CgInfoZeroHierarchy.toString();
@iklam commented May 19, 2022

I am wondering if the problem is this:

  • You have systemd running on the host, and a different copy of systemd that runs inside the container.
  • They both set up /user.slice/user-1000.slice/session-??.scope within their own file systems

For some reason, when you're looking inside the container, /proc/self/cgroup might use a path in the containerized file system whereas /proc/self/mountinfo uses a path in the host file system. These two paths may look alike but they have absolutely no relation to each other.

To understand this, we need more information.

In the host terminal where you ran the podman command, could you do this?

cat /proc/cgroups
cat /proc/self/cgroup
cat /proc/self/mountinfo

ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope
ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-50.scope
cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope/tasks 
cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-50.scope/tasks 
echo $$

And run a bash inside the container and do:

ps -ef
cat /proc/cgroups
ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope
ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-50.scope
cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-3.scope/tasks 
cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-50.scope/tasks 
echo $$

The second echo $$ prints the PID of the bash. Please try to find the host PID of this process, and then:

cat /proc/$HOSTPID/cgroup
cat /proc/$HOSTPID/mountinfo

@iklam commented May 19, 2022

Also, on the host:

find /sys/fs/cgroup/memory -name tasks -exec grep -H $HOSTPID {} \;

and in the container:

find /sys/fs/cgroup/memory -name tasks -exec grep -H $$ {} \;

@gaol (author) commented May 23, 2022

Info on the host

/proc/cgroups on the host:

[jenkins@testjenkins ~]$ cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	9	7	1
cpu	5	126	1
cpuacct	5	126	1
blkio	7	126	1
memory	11	156	1
devices	3	126	1
freezer	6	7	1
net_cls	2	7	1
perf_event	8	7	1
net_prio	2	7	1
hugetlb	4	7	1
pids	12	139	1
rdma	10	1	1

/proc/self/cgroup on the host:

[jenkins@testjenkins ~]$ cat /proc/self/cgroup
12:pids:/user.slice/user-1000.slice/session-12.scope
11:memory:/user.slice/user-1000.slice/session-12.scope
10:rdma:/
9:cpuset:/
8:perf_event:/
7:blkio:/user.slice
6:freezer:/
5:cpu,cpuacct:/user.slice
4:hugetlb:/
3:devices:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-12.scope

/proc/self/mountinfo on the host:

[jenkins@testjenkins ~]$ cat /proc/self/mountinfo
22 97 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw,seclabel
23 97 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:26 - proc proc rw
24 97 0:6 / /dev rw,nosuid shared:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
25 22 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
26 24 0:22 / /dev/shm rw,nosuid,nodev shared:23 - tmpfs tmpfs rw,seclabel
27 24 0:23 / /dev/pts rw,nosuid,noexec,relatime shared:24 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
28 97 0:24 / /run rw,nosuid,nodev shared:25 - tmpfs tmpfs rw,seclabel,mode=755
29 22 0:25 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,seclabel,mode=755
30 29 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:5 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
31 22 0:27 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:17 - pstore pstore rw,seclabel
32 22 0:28 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:18 - bpf bpf rw,mode=700
33 29 0:29 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,seclabel,net_cls,net_prio
34 29 0:30 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:7 - cgroup cgroup rw,seclabel,devices
35 29 0:31 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:8 - cgroup cgroup rw,seclabel,hugetlb
36 29 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,cpu,cpuacct
37 29 0:33 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,freezer
38 29 0:34 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,blkio
39 29 0:35 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,perf_event
40 29 0:36 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,cpuset
41 29 0:37 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,rdma
42 29 0:38 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,memory
43 29 0:39 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,pids
44 22 0:12 / /sys/kernel/tracing rw,relatime shared:19 - tracefs none rw,seclabel
93 22 0:40 / /sys/kernel/config rw,relatime shared:20 - configfs configfs rw
97 1 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
45 22 0:20 / /sys/fs/selinux rw,relatime shared:21 - selinuxfs selinuxfs rw
46 24 0:19 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
47 22 0:8 / /sys/kernel/debug rw,relatime shared:28 - debugfs debugfs rw,seclabel
48 23 0:42 / /proc/sys/fs/binfmt_misc rw,relatime shared:29 - autofs systemd-1 rw,fd=45,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19208
49 24 0:43 / /dev/hugepages rw,relatime shared:30 - hugetlbfs hugetlbfs rw,seclabel,pagesize=2M
50 22 0:44 / /sys/fs/fuse/connections rw,relatime shared:31 - fusectl fusectl rw
118 97 252:1 / /boot rw,relatime shared:63 - xfs /dev/vda1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
253 97 0:46 / /var/lib/nfs/rpc_pipefs rw,relatime shared:131 - rpc_pipefs sunrpc rw
460 97 253:0 /var/lib/containers/storage/overlay /var/lib/containers/storage/overlay rw,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
652 28 0:49 / /run/user/1000 rw,nosuid,nodev,relatime shared:348 - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
670 28 0:50 / /run/user/42 rw,nosuid,nodev,relatime shared:358 - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=42,gid=42
688 97 0:51 / /var/lib/containers/storage/overlay-containers/bd2cc69ef2cb764aefaafc9ded9515c6f048ec550910063526cf9ff57bb297f0/userdata/shm rw,nosuid,nodev,noexec,relatime shared:368 - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c74,c602",size=64000k
705 28 0:24 /netns /run/netns rw,nosuid,nodev shared:25 - tmpfs tmpfs rw,seclabel,mode=755
722 705 0:4 net:[4026532618] /run/netns/cni-fce0f50f-dd99-6bf5-2051-cca7935085c9 rw shared:385 - nsfs nsfs rw,seclabel
723 28 0:4 net:[4026532618] /run/netns/cni-fce0f50f-dd99-6bf5-2051-cca7935085c9 rw shared:385 - nsfs nsfs rw,seclabel
763 460 0:52 / /var/lib/containers/storage/overlay/4bab10ae7c9deba98287bdd5608aa6cb85845da7360062ed6533c43b0a89a324/merged rw,nodev,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c74,c602",lowerdir=/var/lib/containers/storage/overlay/l/KLYAH5DSHAYEOIEKF2N5F6NJVI:/var/lib/containers/storage/overlay/l/TXDBAA57PUA27C5RVAUDWEKYGX:/var/lib/containers/storage/overlay/l/GVR2UWBQWTWFFUGWPQSXEQKIUT:/var/lib/containers/storage/overlay/l/JJY73LGLBOTQOEPVWB222NOS56:/var/lib/containers/storage/overlay/l/YABFT2KH6CV6GOAHFY3NQ5RVDP:/var/lib/containers/storage/overlay/l/RCQLPCS2PARQ62D2Q2OJ7H3KQQ,upperdir=/var/lib/containers/storage/overlay/4bab10ae7c9deba98287bdd5608aa6cb85845da7360062ed6533c43b0a89a324/diff,workdir=/var/lib/containers/storage/overlay/4bab10ae7c9deba98287bdd5608aa6cb85845da7360062ed6533c43b0a89a324/work,metacopy=on,volatile
400 652 0:47 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:215 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
453 97 0:48 / /var/lib/containers/storage/overlay-containers/a3d0c81c57f943ccde8a60b319aa9d93c9289b4d59309f26b9f4d9c100bc9016/userdata/shm rw,nosuid,nodev,noexec,relatime shared:225 - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c82,c218",size=64000k
571 705 0:4 net:[4026532483] /run/netns/cni-f5964bd0-a76c-0507-fc1c-0888bd841af5 rw shared:234 - nsfs nsfs rw,seclabel
572 28 0:4 net:[4026532483] /run/netns/cni-f5964bd0-a76c-0507-fc1c-0888bd841af5 rw shared:234 - nsfs nsfs rw,seclabel
630 460 0:54 / /var/lib/containers/storage/overlay/592029cfb841d024415a4534c09d6bfff358d9c9a98f652e8c20c9485967c13d/merged rw,nodev,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c82,c218",lowerdir=/var/lib/containers/storage/overlay/l/BV7PN2N5V5H5SCXU4MGE5S7E3B:/var/lib/containers/storage/overlay/l/HFG5U6PHXNB6I2Y3XI32MQXQFZ:/var/lib/containers/storage/overlay/l/C72MXIEQMOJSYOBAJZ3NMLGT6A:/var/lib/containers/storage/overlay/l/GOE4IYTPKGFI5JOS7TL25LLHU4:/var/lib/containers/storage/overlay/l/M2SY5GNDX4D5UGN3334MFX7Z6Z:/var/lib/containers/storage/overlay/l/NVVTV3LECTDKTUOR3A4JQKX6X5:/var/lib/containers/storage/overlay/l/JA2O42N3CZXVYMADOLHHKQEVE6:/var/lib/containers/storage/overlay/l/NUMQFY65SI2A7I5WKYQJMIHQNI:/var/lib/containers/storage/overlay/l/F5ZYTYNPIKZLA2G22R22RR2V4A:/var/lib/containers/storage/overlay/l/MA5JEXO6KP5NGAQS5REJHORSZQ:/var/lib/containers/storage/overlay/l/HFPUL5ZX4BXEEV2APOEBSWNCUP:/var/lib/containers/storage/overlay/l/TKCAO7GQPKILH2A64CZPIODTQM:/var/lib/containers/storage/overlay/l/5X32XCEHBXSYL5G3M7AU3GA4ZQ:/var/lib/containers/storage/overlay/l/TZRSKTG7MQPN2KQPXPDYL2KESK:/var/lib/containers/storage/overlay/l/DLTYH4YDLUD5LKEZH7AE234LGP:/var/lib/containers/storage/overlay/l/QX4BMBKC67JUOPJHIXFSSY7YXQ:/var/lib/containers/storage/overlay/l/DX5IP5OKWVRWOJ2DD7BQ4AFE5U:/var/lib/containers/storage/overlay/l/BPCWOVDVLN3HP2T2HH7QMQDXAG:/var/lib/containers/storage/overlay/l/TV7GEUMD2S5FG4A54IZMZFQVTG:/var/lib/containers/storage/overlay/l/OFNFNEKZW7CGM3G5LAV5GSVJVN:/var/lib/containers/storage/overlay/l/LVO6KMRIM3NFUAWDXBBKBR47AT:/var/lib/containers/storage/overlay/l/K2E7EB336H2IKQWEHR7EN7BYUE,upperdir=/var/lib/containers/storage/overlay/592029cfb841d024415a4534c09d6bfff358d9c9a98f652e8c20c9485967c13d/diff,workdir=/var/lib/containers/storage/overlay/592029cfb841d024415a4534c09d6bfff358d9c9a98f652e8c20c9485967c13d/work,metacopy=on,volatile


The session is session-12.scope this time; below is the info.

NOTE: There is no session-50.scope on the host this time.

[jenkins@testjenkins ~]$ ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-
session-12.scope/ session-13.scope/ session-14.scope/ session-3.scope/  session-6.scope/  session-9.scope/
[jenkins@testjenkins ~]$ ls -l /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-12.scope/
total 0
-rw-r--r--. 1 root root 0 May 23 12:44 cgroup.clone_children
--w--w--w-. 1 root root 0 May 23 12:44 cgroup.event_control
-rw-r--r--. 1 root root 0 May 23 12:44 cgroup.procs
-rw-r--r--. 1 root root 0 May 23 12:44 memory.failcnt
--w-------. 1 root root 0 May 23 12:44 memory.force_empty
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.failcnt
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.limit_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.max_usage_in_bytes
-r--r--r--. 1 root root 0 May 23 12:44 memory.kmem.slabinfo
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.tcp.failcnt
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.tcp.limit_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.kmem.tcp.max_usage_in_bytes
-r--r--r--. 1 root root 0 May 23 12:44 memory.kmem.tcp.usage_in_bytes
-r--r--r--. 1 root root 0 May 23 12:44 memory.kmem.usage_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.limit_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.max_usage_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.memsw.failcnt
-rw-r--r--. 1 root root 0 May 23 12:44 memory.memsw.limit_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.memsw.max_usage_in_bytes
-r--r--r--. 1 root root 0 May 23 12:44 memory.memsw.usage_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.move_charge_at_immigrate
-r--r--r--. 1 root root 0 May 23 12:44 memory.numa_stat
-rw-r--r--. 1 root root 0 May 23 12:44 memory.oom_control
----------. 1 root root 0 May 23 12:44 memory.pressure_level
-rw-r--r--. 1 root root 0 May 23 12:44 memory.soft_limit_in_bytes
-r--r--r--. 1 root root 0 May 23 12:44 memory.stat
-rw-r--r--. 1 root root 0 May 23 12:44 memory.swappiness
-r--r--r--. 1 root root 0 May 23 12:44 memory.usage_in_bytes
-rw-r--r--. 1 root root 0 May 23 12:44 memory.use_hierarchy
-rw-r--r--. 1 root root 0 May 23 12:44 notify_on_release
-rw-r--r--. 1 root root 0 May 23 12:44 tasks

tasks in session-12.scope:

[jenkins@testjenkins ~]$ cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-12.scope/tasks 
49238
49250
49253
54958
54959
54960
54961
54962
54963
54964
54987
54988
54989
54991
55001
# tasks file in `session-13.scope` is empty
[jenkins@testjenkins ~]$ cat /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-13.scope/tasks

Container info on the host:

[jenkins@testjenkins ~]$ podman ps --ns
CONTAINER ID  NAMES                                   PID         CGROUPNS    IPC         MNT         NET         PIDNS       USERNS      UTS
9055a14c4105  automaton-slave-eap-7.4.x-testsuite-26  50293       4026531835  4026532554  4026532552  4026532558  4026532555  4026532551  4026532553

So, $HOSTPID=50293

/proc/<PID>/cgroup on the host:

[jenkins@testjenkins ~]$ cat /proc/50293/cgroup 
12:pids:/user.slice/user-1000.slice/session-13.scope
11:memory:/user.slice/user-1000.slice/session-13.scope
10:rdma:/
9:cpuset:/
8:perf_event:/
7:blkio:/user.slice
6:freezer:/
5:cpu,cpuacct:/user.slice
4:hugetlb:/
3:devices:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/user.slice/podman-50207.scope/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f

/proc/<PID>/mountinfo on the host:

[jenkins@testjenkins ~]$ cat /proc/50293/mountinfo 
829 677 0:79 / / rw,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c603,c866",lowerdir=/home/jenkins/.local/share/containers/storage/overlay/l/L7UAXWOAPWTVGDSC6F7JQ3LJTV:/home/jenkins/.local/share/containers/storage/overlay/l/L7UAXWOAPWTVGDSC6F7JQ3LJTV/../diff1:/home/jenkins/.local/share/containers/storage/overlay/l/7JONQMJHS27QF22JQOHYTGFTLO:/home/jenkins/.local/share/containers/storage/overlay/l/ZPHCH2FUO6ND4WGXYLMMGB7FPX:/home/jenkins/.local/share/containers/storage/overlay/l/NZMWBIXYL2VFPIFQ7TXGJF23YT:/home/jenkins/.local/share/containers/storage/overlay/l/C32JMNSXI7DS32WPE4RMXDCI2H:/home/jenkins/.local/share/containers/storage/overlay/l/MW4ON4VYMWQCGKAAY252G5Q35W,upperdir=/home/jenkins/.local/share/containers/storage/overlay/e90e79637776865f47a66802fb72b7003a080204ca2795d682c1e241388032ce/diff,workdir=/home/jenkins/.local/share/containers/storage/overlay/e90e79637776865f47a66802fb72b7003a080204ca2795d682c1e241388032ce/work,volatile
830 829 0:82 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
831 829 253:0 /home/jenkins/current/jobs/eap-7.4.x-build/builds/28/archive /parent_job ro,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
832 829 253:0 /opt /opt ro,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
833 829 0:83 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro,seclabel
834 829 0:84 / /dev rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c603,c866",size=65536k,mode=755,uid=100000,gid=100000
835 829 0:49 /containers/overlay-containers/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f/userdata/resolv.conf /etc/resolv.conf rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
836 829 0:49 /containers/overlay-containers/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f/userdata/hostname /etc/hostname rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
837 829 0:49 /containers/overlay-containers/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f/userdata/hosts /etc/hosts rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
838 829 0:49 /containers/overlay-containers/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f/userdata/.containerenv /run/.containerenv rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
839 834 0:85 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:container_file_t:s0:c603,c866",gid=100005,mode=620,ptmxmode=666
841 834 0:81 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
849 834 0:78 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c603,c866",size=64000k,uid=1000,gid=1000
850 829 0:49 /containers/overlay-containers/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f/userdata/run/secrets /run/secrets rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=382616k,mode=700,uid=1000,gid=1000
851 833 0:86 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c603,c866",mode=755,uid=100000,gid=100000
852 851 0:26 /user.slice/user-1000.slice/[email protected]/user.slice/podman-50207.scope/9055a14c4105546b79bf5719e79e6bb76fed7e0e2622dbcb0d03be707cb5705f /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
853 851 0:29 / /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_cls,net_prio
860 851 0:30 /user.slice /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices
861 851 0:31 / /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb
862 851 0:32 /user.slice /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpu,cpuacct
863 851 0:33 / /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer
864 851 0:34 /user.slice /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio
865 851 0:35 / /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event
866 851 0:36 / /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset
867 851 0:37 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,rdma
868 851 0:38 /user.slice/user-1000.slice/session-13.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
869 851 0:39 /user.slice/user-1000.slice/session-13.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids
870 829 253:0 /home/jenkins/.ssh /var/jenkins_home/.ssh ro,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
871 829 253:0 /home/jenkins/.m2 /var/jenkins_home/.m2 rw,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
872 829 253:0 /home/jenkins/.netrc /var/jenkins_home/.netrc ro,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
873 829 253:0 /home/jenkins/.gitconfig /var/jenkins_home/.gitconfig ro,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
876 829 253:0 /home/jenkins/current/workspace/eap-7.4.x-testsuite /var/jenkins_home/workspace/eap-7.4.x-testsuite rw,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
877 834 0:6 /null /dev/null rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
878 834 0:6 /random /dev/random rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
879 834 0:6 /full /dev/full rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
880 834 0:6 /tty /dev/tty rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
881 834 0:6 /zero /dev/zero rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
882 834 0:6 /urandom /dev/urandom rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
883 830 0:82 /asound /proc/asound ro,nosuid,nodev,noexec,relatime - proc proc rw
884 830 0:82 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
885 830 0:82 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
888 830 0:82 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
919 830 0:82 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
920 830 0:82 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
921 830 0:87 / /proc/acpi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c603,c866",uid=100000,gid=100000
922 830 0:6 /null /proc/kcore rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
923 830 0:6 /null /proc/keys rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
924 830 0:6 /null /proc/timer_list rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
925 830 0:6 /null /proc/sched_debug rw,nosuid master:22 - devtmpfs devtmpfs rw,seclabel,size=1883584k,nr_inodes=470896,mode=755
926 830 0:88 / /proc/scsi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c603,c866",uid=100000,gid=100000
927 833 0:89 / /sys/firmware ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c603,c866",uid=100000,gid=100000
928 833 0:90 / /sys/fs/selinux ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c603,c866",uid=100000,gid=100000
929 833 0:91 / /sys/dev/block ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c603,c866",uid=100000,gid=100000

The tasks file that contains $HOSTPID:

[jenkins@testjenkins ~]$ find /sys/fs/cgroup/memory -name tasks -exec grep -H 50293 {} \;
/sys/fs/cgroup/memory/user.slice/user-1000.slice/session-13.scope/tasks:50293

Information inside the container

[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
jenkins        1       0  0 04:43 ?        00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait
jenkins        8       0  0 04:43 pts/0    00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/buil
jenkins       46       8  0 04:43 pts/0    00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/harmonia/
jenkins       47       8  0 04:43 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=tee /usr/bin/tee /var/j
jenkins       99      46  4 04:43 pts/0    00:00:01 /opt/oracle/jdk-17.0.2/bin/java -Dmaven.wagon.http.ssl.insecure=tru
jenkins      177       0  0 04:44 pts/1    00:00:00 /bin/bash
jenkins      197       1  0 04:44 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1
jenkins      198     177  0 04:44 pts/1    00:00:00 ps -ef

/proc/cgroups inside the container:

[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	9	7	1
cpu	5	126	1
cpuacct	5	126	1
blkio	7	126	1
memory	11	156	1
devices	3	126	1
freezer	6	7	1
net_cls	2	7	1
perf_event	8	7	1
net_prio	2	7	1
hugetlb	4	7	1
pids	12	139	1
rdma	10	1	1

There are no /user.slice/xxx entries under /sys/fs/cgroup/memory/ inside the container:

[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ ls -l /sys/fs/cgroup/memory/|grep user 
[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ echo $$
177
[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ find /sys/fs/cgroup/memory -name tasks -exec grep -H $$ {} \;
[jenkins@9055a14c4105 eap-7.4.x-testsuite]$ exit

@iklam commented May 23, 2022

Do you still have session 9055a14c4105? If so, can you confirm whether this process in the container is really $HOSTPID=50293?

jenkins        1       0  0 04:43 ?        00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait

Please do this on the host. It should report "NSpid: 1"

$ cat /proc/50293/status | grep NSpid

In your host output of proc/50293/cgroup and proc/50293/mountinfo from above, they are both using session-13.scope, so we don't see the symptom you reported (where the two files disagree).

11:memory:/user.slice/user-1000.slice/session-13.scope
868 851 0:38 /user.slice/user-1000.slice/session-13.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

Can you find the HOSTPID for the "java" process?

jenkins       99      46  4 04:43 pts/0    00:00:01 /opt/oracle/jdk-17.0.2/bin/java -Dmaven.wagon.http.ssl.insecure=tru

This can be done on the host with:

cd /proc
grep NSpid [0-9]*/status | grep ' 99$'

Could you double-check whether the /proc/<pid>/cgroup and /proc/<pid>/mountinfo files for this process disagree with each other?

In container:

cat /proc/99/cgroup
cat /proc/99/mountinfo
find /sys/fs/cgroup/memory -name tasks -exec grep -H 99 {} \;

On host:

find /sys/fs/cgroup/memory -name tasks -exec grep -H $HOSTPID {} \;

Since java is launched by the jenkins processes (pid=46), I wonder if jenkins did anything to change the cgroup setting for the java process.

@gaol (author) commented May 23, 2022

Do you still have session 9055a14c4105? If so, can you confirm if this process in the container is really $HOSTPID=50293?

jenkins        1       0  0 04:43 ?        00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait

Sorry, no; the container exists only for several minutes (all tests failed quickly). I am updating the test script to sleep for 1 hour so the container lives longer, but we will lose the java process information inside the container.

In our setup, the container was started outside of the Jenkins environment.

Please do this on the host. It should report "NSpid: 1"

$ cat /proc/50293/status | grep NSpid

After I restarted a new job, I got the following:

[jenkins@testjenkins ~]$ podman ps --ns
CONTAINER ID  NAMES                                   PID         CGROUPNS    IPC         MNT         NET         PIDNS       USERNS      UTS
e4da90e1cdfc  automaton-slave-eap-7.4.x-testsuite-30  67694       4026531835  4026532632  4026532630  4026532635  4026532633  4026532629  4026532631

cat /proc/67694/status |grep NSpid gives me: NSpid: 67694 1

The PID of the container is 67694; inside the container:

[jenkins@e4da90e1cdfc eap-7.4.x-testsuite]$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
jenkins        1       0  0 07:35 ?        00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait
jenkins        8       0  0 07:35 pts/0    00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/buil
jenkins       42       8  0 07:35 pts/0    00:00:00 /bin/bash /var/jenkins_home/workspace/eap-7.4.x-testsuite/harmonia/
jenkins       43       8  0 07:35 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=tee /usr/bin/tee /var/j
jenkins       95      42  0 07:35 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 60
jenkins      803       0  0 07:46 pts/1    00:00:00 /bin/bash
jenkins      823       1  0 07:47 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1

In your host output of proc/50293/cgroup and proc/50293/mountinfo from above, they are both using session-13.scope, so we don't see the symptom you reported (where the two files disagree).

11:memory:/user.slice/user-1000.slice/session-13.scope
868 851 0:38 /user.slice/user-1000.slice/session-13.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

What disagree are the files inside the container; in this session they are:

[jenkins@e4da90e1cdfc eap-7.4.x-testsuite]$ cat /proc/self/cgroup |grep memory
11:memory:/user.slice/user-1000.slice/session-3.scope
[jenkins@e4da90e1cdfc eap-7.4.x-testsuite]$ cat /proc/self/mountinfo |grep memory
912 901 0:38 /user.slice/user-1000.slice/session-28.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

Can you find the HOSTPID for the "java" process

jenkins       99      46  4 04:43 pts/0    00:00:01 /opt/oracle/jdk-17.0.2/bin/java -Dmaven.wagon.http.ssl.insecure=tru

This can be done on the host with:

cd /proc
grep NSpid [0-9]*/status | grep ' 99$'

Could you double check the /proc/<pid>/cgroup and /proc/<pid>/mountinfo files for this process disagree with each other?

In container:

cat /proc/99/cgroup
cat /proc/99/mountinfo
find /sys/fs/cgroup/memory -name tasks -exec grep -H 99 {} \;

In this session, the process is:

jenkins       95      42  0 07:35 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 60

On the host:

[jenkins@testjenkins ~]$ grep NSpid.*95 /proc/*/status 2> /dev/null
/proc/1952/status:NSpid:	1952
/proc/195/status:NSpid:	195
/proc/3955/status:NSpid:	3955
/proc/68095/status:NSpid:	68095	43
/proc/68147/status:NSpid:	68147	95

So, I think the host PID of this process is 68147, and the content is:

[jenkins@testjenkins ~]$ cat /proc/68147/cgroup |grep memory
11:memory:/user.slice/user-1000.slice/session-29.scope
[jenkins@testjenkins ~]$ cat /proc/68147/mountinfo |grep memory
912 901 0:38 /user.slice/user-1000.slice/session-28.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

Inside the container:

[jenkins@e4da90e1cdfc eap-7.4.x-testsuite]$ cat /proc/95/cgroup |grep memory
11:memory:/user.slice/user-1000.slice/session-29.scope
[jenkins@e4da90e1cdfc eap-7.4.x-testsuite]$ cat /proc/95/mountinfo |grep memory
912 901 0:38 /user.slice/user-1000.slice/session-28.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

On host:

find /sys/fs/cgroup/memory -name tasks -exec grep -H $HOSTPID {} \;

in this session, it is:

[jenkins@testjenkins proc]$ find /sys/fs/cgroup/memory -name tasks -exec grep -H 68147 {} \;
/sys/fs/cgroup/memory/user.slice/user-1000.slice/session-29.scope/tasks:68147

Since java is launched by the jenkins process (pid=46), I wonder if jenkins did anything to change the cgroup setting for the java process.

No, pid=46 in the previous session is actually a bash script wrapper, which does nothing related to the container or to cgroup updates.

@gaol
Author

gaol commented May 23, 2022

The pstree on the host:

[jenkins@testjenkins proc]$ pstree -ap 67694
wait.sh,67694 /var/jenkins_home/workspace/eap-7.4.x-testsuite/hera/wait.sh
  └─sleep,74232 --coreutils-prog-shebang=sleep /usr/bin/sleep 1

and /proc/<Container PID>/cgroup and /proc/<Container PID>/mountinfo on the host match:

[jenkins@testjenkins proc]$ cat /proc/67694/cgroup |grep memory
11:memory:/user.slice/user-1000.slice/session-28.scope
[jenkins@testjenkins proc]$ cat /proc/67694/mountinfo |grep memory
912 901 0:38 /user.slice/user-1000.slice/session-28.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

But for processes inside the container (PID != 1), the two files do not match.

@jerboaa

jerboaa commented May 23, 2022

@gaol Would it be possible to run this program in the affected container where you see this mismatch between /proc/self/cgroup and /proc/self/mountinfo? It does not rely on /proc/self/mountinfo and /proc/self/cgroup matching in the way the JDK currently does.

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class FindCgroupPath {
    static final String pid;
    
    static {
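        // determine this process's PID by resolving the /proc/self symlink, which points at /proc/<pid>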
        String pidVal = null;
        try {
            pidVal = Files.readSymbolicLink(Paths.get("/proc/self")).getFileName().toString();
        } catch (IOException e) {
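            // leave pid null if /proc/self cannot be read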
        }
        pid = pidVal;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: FincCgroupPath <path/to/controller/mount>");
            System.exit(1);
        }
        String mount = args[0];
        System.out.println("PID is: " + pid + " walking: " + mount);
        Files.walkFileTree(Paths.get(mount), new SimpleFileVisitor<>() {

            @Override
            public FileVisitResult visitFile(Path file,
                    BasicFileAttributes attrs) throws IOException {
                if (file.getFileName().compareTo(Paths.get("cgroup.procs")) == 0) {
                    if (findPath(file)) {
                        return FileVisitResult.TERMINATE;
                    }
                }
                return FileVisitResult.CONTINUE;
            }

        });
    }
    
    public static boolean findPath(Path cgroupProcsFile) {
        try {
            System.out.println("Analyzing " + cgroupProcsFile);
            for (String line: Files.readAllLines(cgroupProcsFile)) {
                if (pid != null && pid.equals(line.trim())) {
                    System.out.println("Found process at path " + cgroupProcsFile);
                    System.out.println("Cgroup path is: " + cgroupProcsFile.getParent());
                    return true;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

}

Run as:

$ java FindCgroupPath.java /sys/fs/cgroup/memory

@gaol
Author

gaol commented May 24, 2022

Thanks @jerboaa, here is the report from running it in the affected container:

[jenkins@383e373c4282 jenkins_home]$ cat /proc/self/cgroup |grep memory
11:memory:/user.slice/user-1000.slice/session-3.scope
[jenkins@383e373c4282 jenkins_home]$ cat /proc/self/mountinfo |grep memory
916 905 0:38 /user.slice/user-1000.slice/session-31.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

[jenkins@383e373c4282 jenkins_home]$ /opt/oracle/jdk-17.0.2/bin/java -version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)

[jenkins@383e373c4282 jenkins_home]$ /opt/oracle/jdk-17.0.2/bin/java FindCgroupPath.java /sys/fs/cgroup/memory
PID is: 452 walking: /sys/fs/cgroup/memory
Analyzing /sys/fs/cgroup/memory/cgroup.procs

[jenkins@383e373c4282 jenkins_home]$ cat /sys/fs/cgroup/memory/cgroup.procs
1
609

@jerboaa

jerboaa commented May 24, 2022

@gaol Thank you. This output:

[jenkins@383e373c4282 jenkins_home]$ /opt/oracle/jdk-17.0.2/bin/java FindCgroupPath.java /sys/fs/cgroup/memory
PID is: 452 walking: /sys/fs/cgroup/memory
Analyzing /sys/fs/cgroup/memory/cgroup.procs

This suggests that the current process isn't part of the memory namespace. If it were, the output would be something like this:

bash-5.1$ /opt/jdk/bin/java FindCgroupPath.java /sys/fs/cgroup/memory
PID is: 2 walking: /sys/fs/cgroup/memory
Analyzing /sys/fs/cgroup/memory/cgroup.procs
Found process at path /sys/fs/cgroup/memory/cgroup.procs
Cgroup path is: /sys/fs/cgroup/memory

@gaol
Author

gaol commented May 24, 2022

May I know if this is the wrong setup in your opinion? Where should we check the memory limit of a process inside the container with the current setup? Thanks!

@jerboaa

jerboaa commented May 24, 2022

No, it should be fine. I'm guessing that some of the other controllers are enabled. Either way, OpenJDK needs to handle this case properly. The question is which of the relevant ones (if any) is indeed enabled. You could find out with something like this:

for c in $(ls -d /sys/fs/cgroup/*); do if echo $c | grep -qE 'memory|cpu|blkio|pids'; then java FindCgroupPath.java $c; fi; done

@iklam

iklam commented May 25, 2022

@gaol, since you are running on a cgroupv1 system, I think this problem can be worked around by running with podman ... --cgroupns=private ...

See https://docs.podman.io/en/latest/markdown/podman-run.1.html

--cgroupns=mode: Set the cgroup namespace mode for the container.
    host: use the host’s cgroup namespace inside the container.
    container:id: join the namespace of the specified container.
    private: create a new cgroup namespace.
    ns:path: join the namespace at the specified path.
If the host uses cgroups v1, the default is set to host. On cgroups v2, the default is private.

By default, if you don't use any memory settings, on a cgroupv1 system, podman puts the containerized process into the memory cgroup of the user on the host. Here's what I get:

U2110: ~$ cat /proc/self/cgroup | grep memory
7:memory:/user.slice/user-1000.slice/session-3.scope
U2110: ~$ cat /proc/self/mountinfo | grep memory
43 32 0:38 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,memory

U2110: ~$ podman run --rm -it --tty=true fedora bash
[root@dc24d11dcb5e /]# cat /proc/self/cgroup | grep memory
7:memory:/user.slice/user-1000.slice/[email protected]
[root@dc24d11dcb5e /]# cat /proc/self/mountinfo | grep memory
1181 1174 0:38 /user.slice/user-1000.slice/[email protected] /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory

Note that I used ssh to log onto the host U2110 and /user.slice/user-1000.slice/session-3.scope is the cgroup hierarchy used for this SSH session.

I think in your case, you use one SSH session to create the container but another SSH session to launch the Java process. For some unknown reason, podman uses the hierarchy of the SSH session (instead of /user.slice/user-1000.slice/[email protected] in my case). This causes the JVM to fail.

If you use --cgroupns=private, the cgroup and mountinfo should look like this and will make Java happy.

U2110: ~$ podman run --rm -it --tty=true --cgroupns=private fedora bash
[root@d14b44b4d754 /]# cat /proc/self/cgroup | grep memory
7:memory:/
[root@d14b44b4d754 /]# cat /proc/self/mountinfo | grep memory
1181 1174 0:38 / /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory

@iklam

iklam commented May 25, 2022

@jerboaa I think we should reconsider how we process the mountpoint in mountinfo. If it's not '/', and the path doesn't exist (not in the filesystem namespace of the current process), I don't think we should interpret it at all.
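
A rough illustration of that idea (a hypothetical guard written for this discussion, not actual or proposed JDK code):

// Hypothetical guard: only use the mountinfo root field (4) for path
// adjustment when it is "/" or when the adjusted path actually exists in
// this process's filesystem namespace; otherwise skip interpretation.
static boolean shouldUseRoot(String root, String mountPoint, String cgroupPath) {
    if ("/".equals(root)) {
        return true; // trivial root, nothing to adjust
    }
    if (!cgroupPath.startsWith(root)) {
        return false; // as in this gist: session-3.scope vs. session-28.scope
    }
    // the root may refer to a host-side hierarchy this process cannot see;
    // only trust it if the resulting directory is actually present
    return java.nio.file.Files.isDirectory(
            java.nio.file.Paths.get(mountPoint + cgroupPath.substring(root.length())));
}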

@gaol
Author

gaol commented May 25, 2022

I think in your case, you use one SSH session to create the container but another SSH session to launch the Java process

Thanks @iklam
Yes, this is exactly what we do. 👍

Currently we use -v /sys/fs/cgroup:/sys/fs/cgroup:ro to bypass the failure; I will experiment with --cgroupns=private.

@gaol
Author

gaol commented May 25, 2022

for c in $(ls -d /sys/fs/cgroup/*); do if echo $c | grep -qE 'memory|cpu|blkio|pids'; then java FindCgroupPath.java $c; fi; done

@jerboaa below is the report from running the script above:

[jenkins@8b078952dce0 jenkins_home]$ for c in $(ls -d /sys/fs/cgroup/*); do if echo $c | grep -qE 'memory|cpu|blkio|pids'; then /opt/oracle/jdk-17.0.2/bin/java FindCgroupPath.java $c; fi; done
PID is: 248 walking: /sys/fs/cgroup/blkio
Analyzing /sys/fs/cgroup/blkio/cgroup.procs
Found process at path /sys/fs/cgroup/blkio/cgroup.procs
Cgroup path is: /sys/fs/cgroup/blkio
PID is: 272 walking: /sys/fs/cgroup/cpu
PID is: 296 walking: /sys/fs/cgroup/cpu,cpuacct
Analyzing /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
Found process at path /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
Cgroup path is: /sys/fs/cgroup/cpu,cpuacct
PID is: 320 walking: /sys/fs/cgroup/cpuacct
PID is: 344 walking: /sys/fs/cgroup/cpuset
Analyzing /sys/fs/cgroup/cpuset/cgroup.procs
Found process at path /sys/fs/cgroup/cpuset/cgroup.procs
Cgroup path is: /sys/fs/cgroup/cpuset
PID is: 375 walking: /sys/fs/cgroup/memory
Analyzing /sys/fs/cgroup/memory/cgroup.procs
PID is: 407 walking: /sys/fs/cgroup/pids
Analyzing /sys/fs/cgroup/pids/cgroup.procs
[jenkins@8b078952dce0 jenkins_home]$ /opt/oracle/jdk-17.0.2/bin/java -XshowSettings:system -version
Exception in thread "main" java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:208)
	at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:263)
	at java.base/java.nio.file.Path.of(Path.java:147)
	at java.base/java.nio.file.Paths.get(Paths.java:69)
	at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:67)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
	at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
	at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
	at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
	at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:175)
	at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:149)
	at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:84)
	at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:60)
	at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:116)
	at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
	at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
	at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
	at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
	at java.base/sun.launcher.LauncherHelper.printSystemMetrics(LauncherHelper.java:318)
	at java.base/sun.launcher.LauncherHelper.showSettings(LauncherHelper.java:173)

After using --cgroupns=private, the output becomes:

[jenkins@3cfd6c2aaac0 jenkins_home]$ for c in $(ls -d /sys/fs/cgroup/*); do if echo $c | grep -qE 'memory|cpu|blkio|pids'; then /opt/oracle/jdk-17.0.2/bin/java FindCgroupPath.java $c; fi; done
PID is: 214 walking: /sys/fs/cgroup/blkio
Analyzing /sys/fs/cgroup/blkio/cgroup.procs
Found process at path /sys/fs/cgroup/blkio/cgroup.procs
Cgroup path is: /sys/fs/cgroup/blkio
PID is: 238 walking: /sys/fs/cgroup/cpu
PID is: 262 walking: /sys/fs/cgroup/cpu,cpuacct
Analyzing /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
Found process at path /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
Cgroup path is: /sys/fs/cgroup/cpu,cpuacct
PID is: 286 walking: /sys/fs/cgroup/cpuacct
PID is: 310 walking: /sys/fs/cgroup/cpuset
Analyzing /sys/fs/cgroup/cpuset/cgroup.procs
Found process at path /sys/fs/cgroup/cpuset/cgroup.procs
Cgroup path is: /sys/fs/cgroup/cpuset
PID is: 340 walking: /sys/fs/cgroup/memory
Analyzing /sys/fs/cgroup/memory/cgroup.procs
PID is: 372 walking: /sys/fs/cgroup/pids
Analyzing /sys/fs/cgroup/pids/cgroup.procs
[jenkins@3cfd6c2aaac0 jenkins_home]$ /opt/oracle/jdk-17.0.2/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 2
    CPU Period: 100000us
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 2 total: 
    0 1 
    List of Effective Processors, 2 total: 
    0 1 
    List of Memory Nodes, 1 total: 
    0 
    List of Available Memory Nodes, 1 total: 
    0 
    Memory Limit: Unlimited
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: Unlimited

openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)

@jerboaa

jerboaa commented May 25, 2022

@jerboaa I think we should reconsider how we process the mountpoint in mountinfo. If it's not '/', and the path doesn't exist (not in the filesystem namespace of the current process), I don't think we should interpret it at all.

@iklam Let's discuss this in https://bugs.openjdk.java.net/browse/JDK-8286212. I have no idea how transient gists are. It's not clear what you mean. Do you mean the root field or the mount point field according to man procfs (i.e. field (4) or (5))? Because we always need to consider (5) on cgroups v1 IMO.

@iklam

iklam commented May 31, 2022

@jerboaa I think we should reconsider how we process the mountpoint in mountinfo. If it's not '/', and the path doesn't exist (not in the filesystem namespace of the current process), I don't think we should interpret it at all.

@iklam Let's discuss this in https://bugs.openjdk.java.net/browse/JDK-8286212. I have no idea how transient gists are. It's not clear what you mean. Do you mean the root field or the mount point field according to man procfs (i.e. field (4) or (5))? Because we always need to consider (5) on cgroups v1 IMO.

Yes, I meant the root field (4).
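
For reference, taking the memory line from this gist and the field numbering from man 5 proc (a throwaway sketch; plain space-splitting is enough here because neither field contains escaped spaces):

// mountinfo fields per man 5 proc:
// (1) mount ID, (2) parent ID, (3) major:minor, (4) root, (5) mount point, ...
String line = "912 901 0:38 /user.slice/user-1000.slice/session-28.scope"
        + " /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime"
        + " - cgroup cgroup rw,seclabel,memory";
String[] f = line.split(" ");
String root = f[3];       // field (4): /user.slice/user-1000.slice/session-28.scope
String mountPoint = f[4]; // field (5): /sys/fs/cgroup/memory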

@jerboaa

jerboaa commented Jun 1, 2022

OK. I'll consider this when rebooting the PR.
