Hi team,
I have been running Consul by taking the code from GitHub, building it into an image, and using that image to run my consul-agent containers.
We use Consul for service discovery: a MICM pod registers a service whose address needs to be discovered by other pods.
The problem appears when my Kubernetes cluster performs a rollback. After the rollback, Consul keeps handing the old pod IP to the agent's service health check:
the catalog nodes output shows the old IP, while the members output shows the new IP.
Here is the proof:
My role pods:
[root@golden-20210518 ~]# oc get pods -o wide | grep -i scscf
ftncpcfx-scscf-58485cfdd9-hrqq6 2/2 Running 0 7h27m 10.128.14.245 appworker-32.fi-912.vlab.nsn-rdnet.net
ftncpcfx-scscfc-548689679f-hc797 1/2 Running 0 24m 10.129.12.221 appworker-14.fi-912.vlab.nsn-rdnet.net
ftncpcfx-scscfc-548689679f-hc797 is the pod whose consul agent is having the issue.
These are the pods whose service is registered, and which the role pods are trying to discover:
[root@golden-20210518 ~]# oc get pods -o wide | grep -i micm
ftncpcfx-micm-567c9774db-m64g5 2/2 Running 0 5h19m 10.131.24.226 appworker-10.fi-912.vlab.nsn-rdnet.net
ftncpcfx-micmpartner-6bc9f69877-jndrb 2/2 Running 2 (6m52s ago) 7h25m 10.129.12.185 appworker-14.fi-912.vlab.nsn-rdnet.net
Inside consul@ftncpcfx-scscfc-548689679f-hc797, where we are facing the issue, this is the service discovery command used to find the service:
[consul@ftncpcfx-scscfc-548689679f-hc797 bin]$ curl http://localhost:8500/v1/health/service/ftncpcfx
[{"Node":{"ID":"4684b697-ddc2-dac5-4653-8606c7043ec1","Node":"ftncpcfx","Address":"10.131.24.226","Datacenter":"ftncpcfx","TaggedAddresses":{"lan":"10.131.24.226","lan_ipv4":"10.131.24.226","wan":"10.131.24.226","wan_ipv4":"10.131.24.226"},"Meta":{"consul-network-segment":"","consul-version":"1.18.1"},"CreateIndex":15825,"ModifyIndex":15827},"Service":{"ID":"ftncpcfx","Service":"ftncpcfx",
First service:
"Tags":["mode:primary"],"Address":"::ffff:10.131.24.226",
"TaggedAddresses":{"lan_ipv4":{"Address":"::ffff:10.131.24.226","Port":2000},"wan_ipv4":{"Address":"::ffff:10.131.24.226","Port":2000}},"Meta":null,"Port":2000,"Weights":{"Passing":1,"Warning":1},"EnableTagOverride":false,"Proxy":{"Mode":"","MeshGateway":{},"Expose":{}},"Connect":{},"PeerName":"","CreateIndex":15832,"ModifyIndex":15832},"Checks":[{"Node":"ftncpcfx","CheckID":"ftncpcfx-SESSION-HC","Name":"Controller Session Healthcheck","Status":"passing","Notes":"This session is supposed to do a curl internally every 4 seconds","Output":"","ServiceID":"","ServiceName":"","ServiceTags":[],"Type":"ttl","Interval":"0s","Timeout":"0s","ExposedPort":0,"Definition":{},"CreateIndex":15830,"ModifyIndex":15830},{"Node":"ftncpcfx","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","ServiceTags":[],"Type":"","Interval":"","Timeout":"","ExposedPort":0,"Definition":{},"CreateIndex":15825,"ModifyIndex":15825}]},{"Node":{"ID":"241ee620-c14c-2325-82b2-d1e665e50871"
The second service should have the IP 10.129.12.185 but shows 10.129.9.113:
,"Node":"ftncpcfx-partner","Address":"10.129.9.113",
"Datacenter":"ftncpcfx","TaggedAddresses":{"lan":"10.129.9.113","lan_ipv4":"10.129.9.113","wan":"10.129.9.113","wan_ipv4":"10.129.9.113"},"Meta":{"consul-network-segment":"","consul-version":"1.18.1"},"CreateIndex":3298,"ModifyIndex":3299},"Service":{"ID":"ftncpcfx-partner","Service":"ftncpcfx","Tags":["mode:backup"],"Address":"::ffff:10.129.9.113","TaggedAddresses":{"lan_ipv4":{"Address":"::ffff:10.129.9.113","Port":2000},"wan_ipv4":{"Address":"::ffff:10.129.9.113","Port":2000}},"Meta":null,"Port":2000,"Weights":{"Passing":1,"Warning":1},"EnableTagOverride":false,"Proxy":{"Mode":"","MeshGateway":{},"Expose":{}},"Connect":{},"PeerName":"","CreateIndex":3302,"ModifyIndex":3302},"Checks":[{"Node":"ftncpcfx-partner","CheckID":"ftncpcfx-SESSION-HC","Name":"Controller Session Healthcheck","Status":"passing","Notes":"This session is supposed to do a curl internally every 4 seconds","Output":"","ServiceID":"","ServiceName":"","ServiceTags":[],"Type":"ttl","Interval":"0s","Timeout":"0s","ExposedPort":0,"Definition":{},"CreateIndex":3301,"ModifyIndex":3301},{"Node":"ftncpcfx-partner","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","ServiceTags":[],"Type":"","Interval":"","Timeout":"","ExposedPort":0,"Definition":{},"CreateIndex":3298,"ModifyIndex":3298}]}]
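As an aside, the Service "Address" fields in the output above are IPv4-mapped IPv6 strings ("::ffff:10.129.9.113"). A minimal sketch (Python stdlib only, not part of our deployment) of normalizing such a value back to plain IPv4 before comparing it against pod IPs:

```python
import ipaddress

def to_ipv4(addr: str) -> str:
    """Normalize an IPv4-mapped IPv6 address (::ffff:a.b.c.d) to plain IPv4."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return str(ip.ipv4_mapped)
    return str(ip)

print(to_ipv4("::ffff:10.129.9.113"))  # 10.129.9.113
print(to_ipv4("10.131.24.226"))        # 10.131.24.226
```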
The consul catalog nodes command shows the same stale IP:
[consul@ftncpcfx-scscfc-548689679f-hc797 bin]$ ./consul catalog nodes
Node ID Address DC
ftncpcfx 4684b697 10.131.24.226 ftncpcfx
ftncpcfx-partner 241ee620 10.129.9.113 ftncpcfx
But the consul members command gives this:
[consul@ftncpcfx-scscfc-548689679f-hc797 bin]$ ./consul members
Node Address Status Type Build Protocol DC Partition Segment
ftncpcfx 10.131.24.226:8301 alive client 1.18.1 2 ftncpcfx default
ftncpcfx-partner 10.129.12.185:8301 alive client 1.18.1 2 ftncpcfx default
which is the new IP of the micmpartner pod!
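To make the mismatch concrete, here is a minimal sketch (Python stdlib only; field names as returned by /v1/catalog/nodes and /v1/agent/members, and the sample data mirrors the outputs above) that diffs the catalog addresses against the serf member addresses:

```python
def find_stale_nodes(catalog_nodes, members):
    """Compare catalog node addresses (Raft-backed catalog) with serf
    member addresses (gossip pool) and report any mismatches as
    {node: (catalog_addr, serf_addr)}."""
    serf_addr = {m["Name"]: m["Addr"] for m in members}
    stale = {}
    for node in catalog_nodes:
        name, addr = node["Node"], node["Address"]
        if name in serf_addr and serf_addr[name] != addr:
            stale[name] = (addr, serf_addr[name])
    return stale

# Sample data mirroring the two outputs above
catalog = [
    {"Node": "ftncpcfx", "Address": "10.131.24.226"},
    {"Node": "ftncpcfx-partner", "Address": "10.129.9.113"},   # stale
]
members = [
    {"Name": "ftncpcfx", "Addr": "10.131.24.226"},
    {"Name": "ftncpcfx-partner", "Addr": "10.129.12.185"},     # current
]
print(find_stale_nodes(catalog, members))
# {'ftncpcfx-partner': ('10.129.9.113', '10.129.12.185')}
```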
[consul@ftncpcfx-scscfc-548689679f-hc797 bin]$ ./consul catalog datacenters
ftncpcfx
These are my consul pods running in a statefulset:
ftncpcfx-sd-0 302d6473 10.131.24.225 ftncpcfx
ftncpcfx-sd-1 312d6473 10.130.13.201 ftncpcfx
ftncpcfx-sd-2 322d6473 10.130.16.23 ftncpcfx
The issue: when ftncpcfx-scscf-58485cfdd9-hrqq6 tries to discover the ftncpcfx-partner service, it cannot reach it, because the catalog is still returning the old IP after the rollback; the new IP has not been propagated by Consul.
The IP in catalog nodes and the IP in members are different!
Why? Can you help me with this?
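For reference, one workaround I am considering is re-registering the service against the local agent with the new address, so that anti-entropy can sync the catalog. A minimal sketch (Python stdlib only; the endpoint is Consul's standard PUT /v1/agent/service/register, but the actual HTTP call below is untested against our cluster and left commented out):

```python
import json

def build_registration(service_id, name, address, port, tags):
    """Payload for PUT /v1/agent/service/register (Consul agent HTTP API)."""
    return {
        "ID": service_id,
        "Name": name,
        "Tags": tags,
        "Address": address,
        "Port": port,
    }

payload = build_registration(
    "ftncpcfx-partner", "ftncpcfx", "10.129.12.185", 2000, ["mode:backup"])
body = json.dumps(payload).encode()

# Untested call against the local agent; would be run on the partner pod:
# from urllib import request
# req = request.Request("http://localhost:8500/v1/agent/service/register",
#                       data=body, method="PUT",
#                       headers={"Content-Type": "application/json"})
# request.urlopen(req)
print(body.decode())
```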
Thanks and regards,
Shashank Garg