Skip to content

10.1.x: Crash in SSL_read_early_data #12140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bneradt opened this issue Mar 28, 2025 · 1 comment · May be fixed by #12224
Open

10.1.x: Crash in SSL_read_early_data #12140

bneradt opened this issue Mar 28, 2025 · 1 comment · May be fixed by #12224
Assignees
Milestone

Comments

@bneradt
Copy link
Contributor

bneradt commented Mar 28, 2025

When testing 10.1.x on a production box, I'm seeing the following crash with some frequency:

(gdb) bt
#0  0x00007f1504a95b1f in ssl3_ctrl (s=0x7f0abe756800, cmd=<optimized out>, larg=0, parg=0x0) at ssl/s3_lib.c:3736
#1  0x00000000007e193a in ssl_callback_info (ssl=0x7f0abe756800, where=<optimized out>, ret=<optimized out>) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/SSLUtils.cc:1092
#2  0x00007f1504b11cb4 in tls_finish_handshake (s=0x7f0abe756800, wst=<optimized out>, clearbufs=<optimized out>, stop=1) at ssl/statem/statem_lib.c:1519
#3  0x00007f1504b056ec in write_state_machine (s=0x7f0abe756800) at ssl/statem/statem.c:860
#4  state_machine (s=0x7f0abe756800, server=1) at ssl/statem/statem.c:488
#5  0x00007f1504aa5aa2 in SSL_read_early_data (s=0x7f0abe756800, buf=0x7f096e1ff000, num=16384, readbytes=0x7f14f83fcf48) at ssl/ssl_lib.c:2377
#6  0x00000000007c59f1 in SSLNetVConnection::_ssl_accept (this=0x7f07e7408000) at /home/bneradt/build/_scm/trafficserver10.1/include/tscore/Ptr.h:116
#7  0x00000000007ca782 in SSLNetVConnection::sslServerHandShakeEvent (this=this@entry=0x7f07e7408000, err=@0x7f14f83fd464: 0) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/SSLNetVConnection.cc:1249
#8  0x00000000007cb1c9 in SSLNetVConnection::sslStartHandShake (this=0x7f07e7408000, event=<optimized out>, err=@0x7f14f83fd464: 0) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/SSLNetVConnection.cc:1050
#9  0x00000000007c7a49 in SSLNetVConnection::net_read_io (this=0x7f07e7408000, nh=0x7f14fd8ce690) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/SSLNetVConnection.cc:507
#10 0x0000000000827d14 in NetHandler::process_ready_list (this=this@entry=0x7f14fd8ce690) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/NetHandler.cc:284
#11 0x000000000082806a in NetHandler::waitForActivity (this=0x7f14fd8ce690, timeout=<optimized out>) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/NetHandler.cc:375
#12 0x00000000008636ea in EThread::execute_regular (this=this@entry=0x7f14fd8cdb00) at /home/bneradt/build/_scm/trafficserver10.1/include/tsutil/Histogram.h:156
#13 0x0000000000863919 in EThread::execute (this=0x7f14fd8cdb00) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/eventsystem/UnixEThread.cc:358
#14 EThread::execute (this=0x7f14fd8cdb00) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/eventsystem/UnixEThread.cc:335
#15 0x00000000008604b2 in spawn_thread_internal (a=0x7f1500703640) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/eventsystem/Thread.cc:75
#16 0x00007f1502fa21ca in start_thread () from /lib64/libpthread.so.0
#17 0x00007f1502c0ee73 in clone () from /lib64/libc.so.6

Some possibly interesting early data variables:

(gdb) f 7                                                        
#7  0x00000000007ca782 in SSLNetVConnection::sslServerHandShakeEvent (this=this@entry=0x7f07e7408000, err=@0x7f14f83fd464: 0) at /home/bneradt/build/_scm/trafficserver10.1/src/iocore/net/SSLNetVConnection.cc:1249
1249      ssl_error_t ssl_error = this->_ssl_accept();

(gdb) set pagination off                 
(gdb) set print pretty                                     
(gdb) p *this                    
$1 = {         

...
  <TLSEarlyDataSupport> = {                                          
    _vptr.TLSEarlyDataSupport = 0x917408 <vtable for SSLNetVConnection+936>, 
    static DEFAULT_MAX_EARLY_DATA_SIZE = 16384, 
    static _ex_data_index = 7,    
    _early_data_len = 0
  },                      
...
  _early_data_finish = false, 
  _early_data_buf = 0x0, 
  _early_data_reader = 0x0

For reference, I'm testing against 10.1.x commit:

commit 42f2920bce6df86e0e21a8de85e33a1795e9eff5
Author: Chris McFarlen <[email protected]>
Date:   Tue Mar 11 11:51:42 2025 -0500

    Move defaulting install prefix before layout setup (#12085)
    
    Co-authored-by: Chris McFarlen <[email protected]>
    (cherry picked from commit 9a1ef119f3b7a017583c9aa5d088b2437101b92a)

After testing commits, I find that reverting #11844 stops this crash for us.

bneradt pushed a commit to bneradt/trafficserver that referenced this issue Apr 15, 2025
…pache#1043)

This reverts commit 9355249.

Reverting this commit avoids the following SSL_read_early_data crash:
apache#12140
@bneradt bneradt added this to the 10.1.0 milestone Apr 28, 2025
@bneradt
Copy link
Contributor Author

bneradt commented May 5, 2025

It's interesting that the crash happens in this OpenSSL call here:

int nid = SSL_get_negotiated_group(const_cast<SSL *>(ssl));

I'm not sure why calling SSL_get_negotiated_group with what should otherwise be a valid SSL* object (it was used multiple times earlier in the function) should cause an issue.

I verified that if I simplified the #11844 to this single line addition, I still get this crash:

diff --git a/src/iocore/net/SSLUtils.cc b/src/iocore/net/SSLUtils.cc
index a71180fa2..47992de89 100644
--- a/src/iocore/net/SSLUtils.cc
+++ b/src/iocore/net/SSLUtils.cc
@@ -1077,6 +1077,8 @@ ssl_callback_info(const SSL *ssl, int where, int ret)
       }
       Metrics::Counter::increment(it->second);
     }
+
+    SSL_get_negotiated_group(const_cast<SSL *>(ssl));
   }
 }

Therefore simply calling SSL_get_negotiated_group(const_cast<SSL *>(ssl)); in ssl_callback_info causes this crash.

bneradt added a commit to bneradt/trafficserver that referenced this issue May 6, 2025
Recent versions of OpenSSL define SSL_CB_HANDSHAKE_DONE as follows:

> Callback has been called because a handshake is finished. It also
> occurs if the handshake is paused to allow the exchange of early data.

Calling SSL_get_negotiated_group duing early data situations was causing
a crash. This moves the calling of SSL_get_negotiated_group to
_record_tls_handshake_end_time where it will reliably not be called in
the context of processing early data.

Fixes: apache#12140
bneradt added a commit to bneradt/trafficserver that referenced this issue May 6, 2025
Recent versions of OpenSSL define SSL_CB_HANDSHAKE_DONE as follows:

> Callback has been called because a handshake is finished. It also
> occurs if the handshake is paused to allow the exchange of early data.

Calling SSL_get_negotiated_group duing early data situations was causing
a crash. This moves the calling of SSL_get_negotiated_group next to
_record_tls_handshake_end_time where it will reliably not be called in
the context of processing early data.

Fixes: apache#12140
bneradt added a commit to bneradt/trafficserver that referenced this issue May 6, 2025
Recent versions of OpenSSL define SSL_CB_HANDSHAKE_DONE as follows:

> Callback has been called because a handshake is finished. It also
> occurs if the handshake is paused to allow the exchange of early data.

Calling SSL_get_negotiated_group duing early data situations was causing
a crash. This moves the calling of SSL_get_negotiated_group next to
_record_tls_handshake_end_time where it will reliably not be called in
the context of processing early data.

Fixes: apache#12140
bneradt added a commit to bneradt/trafficserver that referenced this issue May 6, 2025
Recent versions of OpenSSL define SSL_CB_HANDSHAKE_DONE as follows:

> Callback has been called because a handshake is finished. It also
> occurs if the handshake is paused to allow the exchange of early data.

Calling SSL_get_negotiated_group duing early data situations was causing
a crash. This moves the calling of SSL_get_negotiated_group next to
_record_tls_handshake_end_time where it will reliably not be called in
the context of processing early data.

Fixes: apache#12140
bneradt added a commit to bneradt/trafficserver that referenced this issue May 6, 2025
Recent versions of OpenSSL define SSL_CB_HANDSHAKE_DONE as follows:

> Callback has been called because a handshake is finished. It also
> occurs if the handshake is paused to allow the exchange of early data.

Calling SSL_get_negotiated_group duing early data situations was causing
a crash. This moves the calling of SSL_get_negotiated_group next to
_record_tls_handshake_end_time where it will reliably not be called in
the context of processing early data.

Fixes: apache#12140
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant