Description
Overview of the Issue
To test Consul service mesh in "Transparent Proxy" mode, we deployed "static-server" (hashicorp/http-echo) and "static-client" (curlimages/curl) to an AWS EKS cluster, both inside the service mesh (sidecar proxy injection: connectInject / consul dataplane), following the sample code in the Consul documentation.
NOTE: we use AWS EC2 Spot instances in sandbox/testing environments.
Calls from static-client to static-server (curl) work fine through the proxy with these FQDNs:
- static-server.service.consul (Consul DNS resolves to a static-server Pod IP)
- static-server.connect.consul (Consul DNS resolves to a static-server Pod IP)
- static-server.virtual.consul (Consul DNS resolves to 240.x.x.x, the private virtual IP used by the sidecar proxy)
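To tell quickly which kind of address a lookup returned, the sketch below labels an IPv4 address as a virtual IP or a Pod IP. It assumes only what is described above: virtual IPs start with 240. and anything else is treated as a Pod IP. The function name classify_consul_ip is made up for this illustration.

```shell
#!/bin/sh
# Label an IPv4 address the way the list above describes it.
# Assumption: virtual IPs always have 240 as the first octet (240.x.x.x);
# every other address is treated as a Pod IP.
classify_consul_ip() {
  case "$1" in
    240.*) echo "virtual" ;;
    *)     echo "pod" ;;
  esac
}
```

For example, classify_consul_ip 240.0.0.12 prints virtual, while an address in the 10.x.y.z range prints pod.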
After some hours, we can no longer get a correct response from static-server using these FQDNs:
- (KO) static-server.service.consul
- (KO) static-server.connect.consul
The following FQDN still works:
- (OK) static-server.virtual.consul
Restarting the static-server deployment resolves the issue.
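The OK/KO check described above can be scripted roughly as follows. This is only a sketch: it assumes the namespace, deployment and container names used in our commands, and the helper names run_curl and check_fqdns as well as the 5-second --max-time are additions for illustration.

```shell
#!/bin/sh
# run_curl is the only part that talks to the cluster.
# It returns curl's exit code (for example 52 for "Empty reply from server").
run_curl() {
  kubectl -n static-client exec deploy/static-client -c static-client -- \
    curl -sS --max-time 5 -o /dev/null "http://$1"
}

# Try the three Consul DNS names and print OK or KO for each one.
check_fqdns() {
  for kind in service connect virtual; do
    fqdn="static-server.${kind}.consul"
    if run_curl "$fqdn" >/dev/null 2>&1; then
      echo "OK $fqdn"
    else
      echo "KO $fqdn"
    fi
  done
}
```

Calling check_fqdns right after deployment should print three OK lines; in the failing state we would expect KO for service.consul and connect.consul and OK for virtual.consul.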
Reproduction Steps
Steps to reproduce this issue:
- Deploy Consul on an AWS EKS cluster/datacenter with ArgoCD
- Deploy static-server and static-client in separate, dedicated namespaces with connect injection enabled (Transparent Proxy is enabled by default in the chart)
- Set up WAN Federation via Mesh Gateway with the other Consul clusters/datacenters
- Run commands like:
kubectl -n static-client exec deploy/static-client -c static-client -- curl http://static-server.service.consul
- Get correct responses from static-server (for all three FQDNs)
- View static-client logs for the calls
...
2025-02-03T09:12:05.830Z+00:00 [debug] envoy.rbac(22) checking connection: requestedServerName: , sourceIP: 10.x.y.z:52864, directRemoteIP: 10.x.y.z:52864,remoteIP: 10.x.y.z:52864, localAddress: 10.x.y.z:20000, ssl: uriSanPeerCertificate: spiffe://aa574018-aaa6-0a95-28a6-956aa6e501cd.consul/ns/default/dc/dc1/svc/static-client, dnsSanPeerCertificate: , subjectPeerCertificate: , dynamicMetadata:
2025-02-03T09:12:05.831Z+00:00 [debug] envoy.rbac(22) enforced allowed, matched policy consul-intentions-layer4
2025-02-03T09:12:05.838Z+00:00 [debug] envoy.rbac(22) checking connection: requestedServerName: , sourceIP: 10.x.y.z:52864, directRemoteIP: 10.x.y.z:52864,remoteIP: 10.x.y.z:52864, localAddress: 10.x.y.z:20000, ssl: uriSanPeerCertificate: spiffe://aa574018-aaa6-0a95-28a6-956aa6e501cd.consul/ns/default/dc/dc1/svc/static-client, dnsSanPeerCertificate: , subjectPeerCertificate: , dynamicMetadata:
...
- View static-server logs for the calls
...
2025-02-03T10:27:57.667Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"1787"] new tcp proxy session
2025-02-03T10:27:57.667Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"1787"] Creating connection to cluster passthrough~static-server.default.dc1.internal.aa574018-aaa6-0a95-28a6-956aa6e501cd.consul
...
- Wait for some hours
- Run the same commands again, for example:
kubectl -n static-client exec deploy/static-client -c static-client -- curl http://static-server.service.consul
- Get correct responses from static-server (for the virtual.consul FQDN)
- Get the error below (for the service.consul and connect.consul FQDNs)
curl: (52) Empty reply from server
command terminated with exit code 52
- View static-server logs for the calls returning error
...
2025-02-03T10:23:07.521Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"6617"] remote address:10.a.b.c:37846,TLS_error:|268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST:TLS_error_end
...
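To make the TLS error in the static-server log easier to read, the sketch below pulls the numeric code out of such a log line and splits it into its library and reason numbers. It assumes BoringSSL-style packing (library in bits 24-31, reason in the low 12 bits); the function names are made up for this illustration. As far as we understand, the reason name HTTP_REQUEST is what the TLS library reports when plaintext HTTP arrives where a TLS handshake was expected.

```shell
#!/bin/sh
# Extract the numeric error code that follows "TLS_error:|" in an Envoy log line.
extract_tls_error_code() {
  printf '%s\n' "$1" | sed -n 's/.*TLS_error:|\([0-9][0-9]*\):.*/\1/p'
}

# Split a packed error code into library and reason numbers.
# Assumption: BoringSSL-style packing, library in bits 24-31,
# reason in the low 12 bits.
decode_tls_error_code() {
  code=$1
  lib=$(( (code >> 24) & 0xff ))
  reason=$(( code & 0xfff ))
  echo "lib=$lib reason=$reason"
}
```

Passing the code from the log line above (268435612) to decode_tls_error_code splits it into a library number and a reason number that can be looked up in the TLS library's error headers.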
Consul info for both Client and Server
Client info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 920cc7c6
version = 1.20.1
version_metadata =
consul:
acl = enabled
bootstrap = false
known_datacenters = 5
leader = true
leader_addr = 10.36.10.68:8300
server = true
raft:
applied_index = 1146343
commit_index = 1146343
fsm_pending = 0
last_contact = 0
last_log_index = 1146343
last_log_term = 64
last_snapshot_index = 1130696
last_snapshot_term = 64
latest_configuration = [{Suffrage:Voter ID:d44171e8-e0e2-6abb-95c3-01f2fc99a918 Address:10.36.30.45:8300} {Suffrage:Voter ID:0f0e40cd-33ab-ea1e-f3f8-6b5f50f1ddfe Address:10.36.10.68:8300} {Suffrage:Voter ID:d7d3f7d8-a3b5-12ac-8f09-ba8413757bcb Address:10.36.42.88:8300}]
latest_configuration_index = 0
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 64
runtime:
arch = amd64
cpu_count = 2
goroutines = 479
max_procs = 2
os = linux
version = go1.22.7
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 23
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1038
members = 3
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 25440
members = 15
query_queue = 0
query_time = 1
Using kubernetes consul dataplane with chart config (see Server agent HCL config below)
Server info
Using kubernetes consul dataplane with chart config (see Server agent HCL config below)
global:
  enabled: true
  enablePodSecurityPolicies: false
  datacenter: dc1
  tls:
    enabled: true
    verify: true
    httpsOnly: true
  federation:
    enabled: true
    createFederationSecret: true
  gossipEncryption:
    autoGenerate: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
  argocd:
    enabled: true
server:
  enabled: true
  replicas: 3
  storageClass: ebs-csi-gp3-encrypt-retain
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Delete
  resources: |
    requests:
      cpu: "100m"
    limits:
      memory: "500Mi"
      cpu: "500m"
  storage: 10Gi
  disruptionBudget:
    enabled: false
dns:
  enabled: true
  enableRedirection: false
ui:
  enabled: true
  service:
    type: ClusterIP
connectInject:
  enabled: true
  default: false
  logLevel: "debug"
  transparentProxy:
    defaultEnabled: true
    defaultOverwriteProbes: true
  disruptionBudget:
    enabled: false
  cni:
    enabled: true
    logLevel: info
    cniBinDir: "/opt/cni/bin"
    cniNetDir: "/etc/cni/net.d"
meshGateway:
  enabled: true
  replicas: 2
  service:
    type: LoadBalancer
    annotations:
      'service.beta.kubernetes.io/aws-load-balancer-name': "consul-mgw-dc1-pri"
      'service.beta.kubernetes.io/aws-load-balancer-type': "external"
      'service.beta.kubernetes.io/aws-load-balancer-scheme': "internal"
      'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': "ip"
      'service.beta.kubernetes.io/aws-load-balancer-backend-protocol': "tcp"
      'service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled': "true"
Operating system and Environment details
AWS EKS Cluster
Client Version: v1.29.0-eks-5e0fdde
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.12-eks-2d5f260