SSL routines:OPENSSL_internal:HTTP_REQUEST:TLS_error_end after some hours from deployment time #22123

Open
@Roxyrob

Overview of the Issue

To test Consul service mesh in "Transparent Proxy" mode, we deployed "static-server" (hashicorp/http-echo) and "static-client" (curlimages/curl) on AWS EKS inside the service mesh (sidecar proxy injection: connectInject / consul dataplane), following the sample code in the Consul documentation.

NOTE: we use AWS EC2 Spot instances in sandbox/testing environments.

Calls from static-client to static-server (curl) work fine through the proxy using these FQDNs:

  • static-server.service.consul (Consul DNS resolves to a static-server Pod IP)
  • static-server.connect.consul (Consul DNS resolves to a static-server Pod IP)
  • static-server.virtual.consul (Consul DNS resolves to 240.x.x.x, the private virtual IP used by the sidecar proxy)
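
When scripting this check, one way to tell the two kinds of DNS answer apart is to classify the resolved address by range. This is only a sketch: it assumes the virtual IPs are allocated from 240.0.0.0/4 (suggested by the 240.x.x.x answers above), and the function name is hypothetical.

```python
import ipaddress

# Assumption: mesh virtual IPs fall inside 240.0.0.0/4, matching the
# 240.x.x.x answers returned for the .virtual.consul name.
VIRTUAL_RANGE = ipaddress.ip_network("240.0.0.0/4")


def is_virtual_ip(address: str) -> bool:
    """Return True if a resolved address looks like a mesh virtual IP
    rather than a Pod IP."""
    return ipaddress.ip_address(address) in VIRTUAL_RANGE
```

An address for which this returns False would be a Pod IP, which is the case that later fails in this report.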

After some hours we can no longer get a correct response from static-server using these FQDNs:

  • (KO) static-server.service.consul
  • (KO) static-server.connect.consul

But there is no issue with this FQDN:

  • (OK) static-server.virtual.consul

Restarting the static-server deployment resolves the issue.
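
To catch the moment the behavior changes, the three names can be probed in a loop. The following is a minimal sketch that wraps the kubectl/curl command from the reproduction steps. The helper names are hypothetical, and the `-sS --max-time 5` curl options are an addition not used in the original commands.

```python
import subprocess

FQDNS = [
    "static-server.service.consul",
    "static-server.connect.consul",
    "static-server.virtual.consul",
]


def build_probe_command(fqdn: str) -> list[str]:
    """Build the kubectl exec + curl command from the reproduction steps."""
    return [
        "kubectl", "-n", "static-client", "exec", "deploy/static-client",
        "-c", "static-client", "--",
        "curl", "-sS", "--max-time", "5", f"http://{fqdn}",
    ]


def probe_all(run=subprocess.run) -> dict[str, str]:
    """Run one probe per FQDN and report OK/KO from the exit code.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    results = {}
    for fqdn in FQDNS:
        proc = run(build_probe_command(fqdn), capture_output=True, text=True)
        # kubectl exec exits with the exit code of the remote command,
        # so curl's 52 (empty reply from server) shows up here.
        if proc.returncode == 0:
            results[fqdn] = "OK"
        else:
            results[fqdn] = f"KO (exit {proc.returncode})"
    return results
```

Calling `probe_all()` periodically (for example from a cron job) and logging the result with a timestamp should narrow down when the service.consul and connect.consul names start failing.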


Reproduction Steps

Steps to reproduce this issue:

  1. Deploy Consul on an AWS EKS cluster/datacenter with ArgoCD
  2. Deploy static-server and static-client in separate, dedicated namespaces with connect injection enabled (Transparent Proxy is enabled by default in the chart config)
  3. Set up WAN Federation via Mesh Gateways with the other Consul clusters/datacenters
  4. Run commands like kubectl -n static-client exec deploy/static-client -c static-client -- curl http://static-server.service.consul
  5. Get correct responses from static-server (for all three FQDNs)
  6. View static-client logs for the calls
...
2025-02-03T09:12:05.830Z+00:00 [debug] envoy.rbac(22) checking connection: requestedServerName: , sourceIP: 10.x.y.z:52864, directRemoteIP: 10.x.y.z:52864,remoteIP: 10.x.y.z:52864, localAddress: 10.x.y.z:20000, ssl: uriSanPeerCertificate: spiffe://aa574018-aaa6-0a95-28a6-956aa6e501cd.consul/ns/default/dc/dc1/svc/static-client, dnsSanPeerCertificate: , subjectPeerCertificate: , dynamicMetadata:
2025-02-03T09:12:05.831Z+00:00 [debug] envoy.rbac(22) enforced allowed, matched policy consul-intentions-layer4
2025-02-03T09:12:05.838Z+00:00 [debug] envoy.rbac(22) checking connection: requestedServerName: , sourceIP: 10.x.y.z:52864, directRemoteIP: 10.x.y.z:52864,remoteIP: 10.x.y.z:52864, localAddress: 10.x.y.z:20000, ssl: uriSanPeerCertificate: spiffe://aa574018-aaa6-0a95-28a6-956aa6e501cd.consul/ns/default/dc/dc1/svc/static-client, dnsSanPeerCertificate: , subjectPeerCertificate: , dynamicMetadata:
...
  7. View static-server logs for the calls
...
2025-02-03T10:27:57.667Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"1787"] new tcp proxy session
2025-02-03T10:27:57.667Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"1787"] Creating connection to cluster passthrough~static-server.default.dc1.internal.aa574018-aaa6-0a95-28a6-956aa6e501cd.consul
...
  8. Wait for some hours
  9. Run the same commands again, like kubectl -n static-client exec deploy/static-client -c static-client -- curl http://static-server.service.consul
  10. Get correct responses from static-server (for the virtual.consul FQDN)
  11. Get the error below (for the service.consul and connect.consul FQDNs)
curl: (52) Empty reply from server
command terminated with exit code 52
  12. View static-server logs for the calls returning the error
...
2025-02-03T10:23:07.521Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"6617"] remote address:10.a.b.c:37846,TLS_error:|268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST:TLS_error_end
...
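
For reading the TLS error line above: the numeric code can be split into a library and a reason. The sketch below assumes the BoringSSL packing `(library << 24) | reason`, with library 16 being "SSL routines" and reason 156 being HTTP_REQUEST; the function name is hypothetical. HTTP_REQUEST is the error a TLS endpoint raises when it receives what looks like plaintext HTTP instead of a TLS handshake. That would be consistent with the request reaching the static-server sidecar without mTLS, but nothing in this report confirms the cause.

```python
import re

# Assumptions (not stated in the issue): BoringSSL-style error packing,
# where the top byte is the library and the low 12 bits are the reason.
ERR_LIB_SSL = 16          # printed as "SSL routines"
SSL_R_HTTP_REQUEST = 156  # printed as "HTTP_REQUEST"

TLS_ERROR_RE = re.compile(
    r"TLS_error:\|(\d+):([^:]+):([^:]+):([^:]+):TLS_error_end"
)


def parse_tls_error(line: str):
    """Extract and decode the TLS error from an Envoy connection log line.

    Returns None if the line carries no TLS_error section.
    """
    match = TLS_ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return {
        "code": code,
        "library_id": code >> 24,    # top byte: which library raised it
        "reason_id": code & 0xFFF,   # low 12 bits: the reason code
        "library": match.group(2),
        "function": match.group(3),
        "reason": match.group(4),
    }
```

Applied to the log line above, the code 268435612 splits into library 16 and reason 156, which agrees with the "SSL routines" and "HTTP_REQUEST" strings printed next to it.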

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = 920cc7c6
        version = 1.20.1
        version_metadata =
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 5
        leader = true
        leader_addr = 10.36.10.68:8300
        server = true
raft:
        applied_index = 1146343
        commit_index = 1146343
        fsm_pending = 0
        last_contact = 0
        last_log_index = 1146343
        last_log_term = 64
        last_snapshot_index = 1130696
        last_snapshot_term = 64
        latest_configuration = [{Suffrage:Voter ID:d44171e8-e0e2-6abb-95c3-01f2fc99a918 Address:10.36.30.45:8300} {Suffrage:Voter ID:0f0e40cd-33ab-ea1e-f3f8-6b5f50f1ddfe Address:10.36.10.68:8300} {Suffrage:Voter ID:d7d3f7d8-a3b5-12ac-8f09-ba8413757bcb Address:10.36.42.88:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 64
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 479
        max_procs = 2
        os = linux
        version = go1.22.7
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 23
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1038
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 25440
        members = 15
        query_queue = 0
        query_time = 1
Using kubernetes consul dataplane with chart config (see Server agent HCL config below)
Server info
Using kubernetes consul dataplane with chart config (see Server agent HCL config below)
global:
  enabled: true
  enablePodSecurityPolicies: false
  datacenter: dc1
  tls:
    enabled: true
    verify: true
    httpsOnly: true
  federation:
    enabled: true
    createFederationSecret: true
  gossipEncryption:
    autoGenerate: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
  argocd:
    enabled: true
  server:
    enabled: true
    replicas: 3
    storageClass: ebs-csi-gp3-encrypt-retain
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Delete
    resources: |
      requests:
        cpu: "100m"
      limits:
        memory: "500Mi"
        cpu: "500m"
    storage: 10Gi
    disruptionBudget:
      enabled: false
  dns:
    enabled: true
    enableRedirection: false
  ui:
    enabled: true
    service:
      type: ClusterIP
  connectInject:
    enabled: true
    default: false
    logLevel: "debug"
    transparentProxy:
      defaultEnabled: true
      defaultOverwriteProbes: true
    disruptionBudget:
      enabled: false
    cni:
      enabled: true
      logLevel: info
      cniBinDir: "/opt/cni/bin"
      cniNetDir: "/etc/cni/net.d"
  meshGateway:
    enabled: true
    replicas: 2
    service:
      type: LoadBalancer
      annotations:
        'service.beta.kubernetes.io/aws-load-balancer-name': "consul-mgw-dc1-pri"
        'service.beta.kubernetes.io/aws-load-balancer-type': "external"
        'service.beta.kubernetes.io/aws-load-balancer-scheme': "internal"
        'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': "ip"
        'service.beta.kubernetes.io/aws-load-balancer-backend-protocol': "tcp"
        'service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled': "true"

Operating system and Environment details

AWS EKS Cluster
Client Version: v1.29.0-eks-5e0fdde
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.12-eks-2d5f260

Log Fragments
