This post describes unexpected behavior I am seeing with the 2.1-SNAPSHOT version of Akka's cluster module.
Specifically, when the cluster leader becomes unreachable because its process receives SIGINT, that node is unable to rejoin the cluster when it is restarted. Strangely, rejoining always works when the node that is killed is not the leader.
The output listings here consist of handpicked "important" log events from an Akka cluster session. There are two nodes in the system: node1.mydomain.com and node2.mydomain.com. Both nodes are running identical software, freshly pulled from source control.
Because of the way Akka does leader selection, node1 is always the cluster leader when both nodes are members.
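To illustrate why node1 always ends up as leader: as far as I understand it, the leader is simply the first reachable member in address sort order. The sketch below models that rule in plain Scala. `NodeAddress`, `LeaderSketch`, and the host-then-port ordering are assumptions made for illustration, not Akka's actual types or code.

```scala
// Hypothetical stand-in for a cluster address; not the real akka.actor.Address.
final case class NodeAddress(host: String, port: Int)

object LeaderSketch {
  // Assumed ordering: compare host names first, then ports.
  implicit val byHostThenPort: Ordering[NodeAddress] =
    Ordering.by((a: NodeAddress) => (a.host, a.port))

  // The leader is taken to be the first reachable member in sorted order.
  def leader(members: Set[NodeAddress],
             unreachable: Set[NodeAddress]): Option[NodeAddress] =
    (members -- unreachable).toSeq.sorted.headOption
}
```

Under these assumptions, node1.mydomain.com sorts ahead of node2.mydomain.com, so node1 leads whenever it is reachable and node2 takes over only when node1 is not.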
The Akka configuration is identical on node1 and node2.
node {
    akka {
        log-config-on-start = "on"
        actor.provider = "akka.remote.RemoteActorRefProvider"
        cluster {
            nodename = "node"
            auto-join = "on"
            auto-down = "on"
            seed-nodes = [
                "akka://[email protected]:32001",
                "akka://[email protected]:32001"
            ]
        }
        loglevel = INFO
        remote {
            transport = "akka.remote.netty.NettyRemoteTransport"
            netty {
                # uncomment to override hostname -- the default value is
                # java.net.InetAddress.getLocalHost().getHostName()
                # hostname = "mynode.mydomain.com"
                port = 32001
            }
        }
    }
}
Both nodes are launched at approximately the same time. Once the logging output indicates that the two nodes have become peers and reached convergence, I manually send SIGINT to the instance running on node2.
The results below are the logging output generated by node1 during the experiment.
[09/04/2012 22:46:59.236] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Leader is moving node [akka://[email protected]:32001] from JOINING to UP
[09/04/2012 22:47:01.425] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Cluster Peers: []
[09/04/2012 22:47:01.427] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up),
Member(address = akka://[email protected]:32001, status = Up)
),
Set(),
true,
Set(akka://[email protected]:32001, akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/04/2012 22:47:03.521] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://[email protected]:32001/user/kernel/myapp]]
[09/04/2012 22:47:11.525] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://[email protected]:32001/user/kernel/myapp]]
[09/04/2012 22:47:13.716] [node-3] [NettyRemoteTransport(akka://[email protected]:32001)]
RemoteClientShutdown@akka://[email protected]:32001
[09/04/2012 22:47:18.247] [node-akka.actor.default-dispatcher-4] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://[email protected]:32001], after [4997 ms], based on [N(979.0588235294117, 114.57361907023495)]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-4] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://[email protected]:32001, status = Up)]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://[email protected]:32001, status = Up))]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://[email protected]:32001, status = Up))]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [false]
[09/04/2012 22:47:18.251] [node-akka.actor.default-dispatcher-5] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Marking unreachable node [akka://[email protected]:32001] as DOWN
[09/04/2012 22:47:18.256] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://[email protected]:32001, status = Down))]
[09/04/2012 22:47:18.257] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [true]
[09/04/2012 22:47:21.622] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[09/04/2012 22:47:21.822] [node-akka.actor.default-dispatcher-6] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up)
),
Set(Member(address = akka://[email protected]:32001, status = Down)),
true,
Set(akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/04/2012 22:47:29.663] [node-5] [NettyRemoteTransport(akka://[email protected]:32001)]
RemoteClientStarted@akka://[email protected]:32001
[09/04/2012 22:47:29.687] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
Member(address = akka://[email protected]:32001, status = Up),
Member(address = akka://[email protected]:32001, status = Joining)
)]
[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set()]
[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [false]
[09/04/2012 22:47:29.732] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [true]
[09/04/2012 22:47:30.255] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Leader is moving node [akka://[email protected]:32001] from JOINING to UP
[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
Member(address = akka://[email protected]:32001, status = Up),
Member(address = akka://[email protected]:32001, status = Up)
)]
[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [false]
[09/04/2012 22:47:30.468] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [true]
[09/04/2012 22:47:31.722] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://[email protected]:32001/user/kernel/myapp]]
[09/04/2012 22:47:31.923] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up),
Member(address = akka://[email protected]:32001, status = Up)
),
Set(),
true,
Set(akka://[email protected]:32001, akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/04/2012 22:47:34.744] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://[email protected]:32001/user/kernel/myapp]]
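A note on the "Phi value [Infinity]" line in the listing above: the failure detector appears to model heartbeat intervals as a normal distribution N(mean, standard deviation) and to report phi = -log10(1 - CDF(elapsed time)). The sketch below reproduces that calculation in plain Scala. The logistic approximation of the normal CDF and the names `PhiSketch`, `cdf`, and `phi` are my assumptions; the snapshot's actual implementation may differ.

```scala
object PhiSketch {
  // Logistic approximation of the normal CDF (an assumption for this sketch).
  def cdf(x: Double, mean: Double, stdDev: Double): Double = {
    val y = (x - mean) / stdDev
    1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
  }

  // Phi grows as the time since the last heartbeat becomes less plausible.
  def phi(elapsedMs: Double, mean: Double, stdDev: Double): Double =
    -math.log10(1.0 - cdf(elapsedMs, mean, stdDev))
}
```

With the logged values (4997 ms elapsed, mean of about 979 ms, standard deviation of about 115 ms), the elapsed time is roughly 35 standard deviations above the mean. The CDF rounds to exactly 1.0 in double precision, so phi evaluates to Infinity, which would explain the logged value.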
Both nodes are launched at approximately the same time. Once the logging output indicates that the two nodes have become peers and reached convergence, I manually send SIGINT to the instance running on node1.
The results below are the logging output generated by node2 during the experiment.
[09/05/2012 09:01:25.801] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up),
Member(address = akka://[email protected]:32001, status = Up)
),
Set(),
true,
Set(akka://[email protected]:32001, akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/05/2012 09:01:25.847] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Routing table update received from [Actor[akka://[email protected]:32001/user/kernel/myapp]]
[09/05/2012 09:01:25.897] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://[email protected]:32001/user/kernel/myapp]]
[09/05/2012 09:01:37.880] [node-akka.actor.default-dispatcher-3] [NettyRemoteTransport(akka://[email protected]:32001)]
RemoteClientShutdown@akka://[email protected]:32001
[09/05/2012 09:01:42.620] [node-akka.actor.default-dispatcher-6] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://[email protected]:32001], after [4977 ms], based on [N(998.7222222222222, 100.0)]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://[email protected]:32001, status = Up)]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://[email protected]:32001, status = Up))]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://[email protected]:32001, status = Up))]
[09/05/2012 09:01:42.627] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [false]
[09/05/2012 09:01:42.644] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://[email protected]:32001] - Leader is marking unreachable node [akka://[email protected]:32001] as DOWN
[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://[email protected]:32001, status = Down))]
[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://[email protected]:32001)] [true]
[09/05/2012 09:01:46.100] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[09/05/2012 09:01:46.101] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up)
),
Set(Member(address = akka://[email protected]:32001, status = Down)),
true,
Set(akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up)
),
Set(Member(address = akka://[email protected]:32001, status = Down)),
true,
Set(akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[09/05/2012 09:02:16.597] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://[email protected]:32001, status = Up)
),
Set(Member(address = akka://[email protected]:32001, status = Down)),
true,
Set(akka://[email protected]:32001),
Some(akka://[email protected]:32001)
)]
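For anyone decoding the CurrentClusterState output above, I believe the positional fields are, in order: members, unreachable, convergence, seenBy, and leader. The sketch below mirrors that layout with hypothetical types (`MemberInfo`, `ClusterSnapshot`, and `RejoinCheck` are illustrative names, not Akka classes, and the field names are assumed) and expresses the symptom as a predicate: the killed node stays listed as unreachable with status Down and never reappears among the members.

```scala
// Hypothetical mirrors of the logged structures; field names are assumptions.
final case class MemberInfo(address: String, status: String)

final case class ClusterSnapshot(
  members: Set[MemberInfo],     // nodes currently in the membership ring
  unreachable: Set[MemberInfo], // nodes flagged by the failure detector
  convergence: Boolean,         // whether gossip has converged
  seenBy: Set[String],          // addresses that have seen this state
  leader: Option[String])       // current leader, if any

object RejoinCheck {
  // True if the node is still marked Down and has not come back as a member.
  def stuckDown(s: ClusterSnapshot, address: String): Boolean =
    s.unreachable.exists(m => m.address == address && m.status == "Down") &&
      !s.members.exists(_.address == address)
}
```

In the second experiment every snapshot after the SIGINT satisfies this predicate for node1, whereas in the first experiment node2 leaves the unreachable set and returns as Joining and then Up about 16 seconds after it was shut down.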