Skip to content

Skip broken HTML preview test case with libxml >= 2.14 #18413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/18413.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Skip broken HTML preview test case with libxml >= 2.14.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue (#18406) mentions that this isn't reproducible with [email protected]. Could another alternative fix just be to update that dependency instead? -> #18480

5 changes: 5 additions & 0 deletions tests/media/test_html_preview.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,11 @@
)

from tests import unittest
from unittest.case import SkipTest

try:
import lxml
from lxml import etree
except ImportError:
lxml = None # type: ignore[assignment]

Expand Down Expand Up @@ -324,6 +326,9 @@ def test_empty(self) -> None:

def test_no_tree(self) -> None:
"""A valid body with no tree in it."""
if etree.LIBXML_VERSION >= (2, 14):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What versions specifically is this broken and fixed in?

These tests seem to pass fine for me locally with libxml2 2.14.3-1 on Manjaro Linux (Arch-based).

$ SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.media.test_html_preview
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
tests.media.test_html_preview
  MediaEncodingTestCase
    test_content_type ...                                                  [OK]
    test_duplicates ...                                                    [OK]
    test_fallback ...                                                      [OK]
    test_meta_charset ...                                                  [OK]
    test_meta_charset_underscores ...                                      [OK]
    test_meta_xml_encoding ...                                             [OK]
    test_unknown_invalid ...                                               [OK]
    test_xml_encoding ...                                                  [OK]
  OpenGraphFromHtmlTestCase
    test_comment ...                                                       [OK]
    test_comment2 ...                                                      [OK]
    test_empty ...                                                         [OK]
    test_empty_description ...                                             [OK]
    test_h1_as_title ...                                                   [OK]
    test_invalid_encoding ...                                              [OK]
    test_invalid_encoding2 ...                                             [OK]
    test_missing_title ...                                                 [OK]
    test_missing_title_and_broken_h1 ...                                   [OK]
    test_nested_nodes ...                                                  [OK]
    test_no_tree ...                                                       [OK]
    test_script ...                                                        [OK]
    test_simple ...                                                        [OK]
    test_twitter_tag ...                                                   [OK]
    test_windows_1252 ...                                                  [OK]
    test_xml ...                                                           [OK]
  SummarizeTestCase
    test_long_summarize ...                                                [OK]
    test_short_summarize ...                                               [OK]
    test_small_then_large_summarize ...                                    [OK]

-------------------------------------------------------------------------------
Ran 27 tests in 0.025s

PASSED (successes=27)

I don't see the specific fix commit mentioned in the issue in the diff for the latest tag, https://gitlab.gnome.org/GNOME/libxml2/-/compare/v2.14.2...v2.14.3?from_project_id=1665 so I'm confused why it's working for me.

We may want to adjust this condition depending on the answer here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just re-tested, and building synapse 1.130.0 with libxml 2.14.3 and lxml 5.4.0 still fails for me. I'm using the PKGBUILD of the matrix-synapse 1.130.0-1 package from Arch Linux, but with rm-faling-test.patch removed, and I'm running both the build and the tests in a clean chroot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we're using the same versions, there seems to be another factor at play here and I don't think we've gotten to the root issue yet.

Perhaps it would be good to see the exact test failure output you're seeing. Although, based on the issue you linked, I could see how the test might fail.

If I look closer at the source and add some debug logs to the decode_body(...) function, I can see that it goes all the way to the end of the function and uses utf-8 encoding.

def test_no_tree(self) -> None:
"""A valid body with no tree in it."""
html = b"\x00"
tree = decode_body(html, "http://example.com/test.html")
self.assertIsNone(tree)


I'm testing with Synapse from source. My previous reply was with [email protected] and testing again now with [email protected] since it just got updated.

# `pamac` is the package manager on my system, Manajaro Linux (Arch-based).
# Note: The `python-lxml` listed here isn't relevant to my local Synapse from source
# which will use a Poetry virtual environment
$ pamac search libxml2 --installed
python-lxml  5.4.0-1                                           extra
    Python3 binding for the libxml2 and libxslt libraries
perl-alien-libxml2  0.20-1                                     extra
    Install the C libxml2 library on your system
lib32-libxml2  2.14.3-1                                     multilib
    XML C parser and toolkit (32-bit)
libxml2  2.14.3-1                                               core
    XML C parser and toolkit

All the tests pass:

$ SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.media.test_html_preview
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
tests.media.test_html_preview
  MediaEncodingTestCase
    test_content_type ...                                                  [OK]
    test_duplicates ...                                                    [OK]
    test_fallback ...                                                      [OK]
    test_meta_charset ...                                                  [OK]
    test_meta_charset_underscores ...                                      [OK]
    test_meta_xml_encoding ...                                             [OK]
    test_unknown_invalid ...                                               [OK]
    test_xml_encoding ...                                                  [OK]
  OpenGraphFromHtmlTestCase
    test_comment ...                                                       [OK]
    test_comment2 ...                                                      [OK]
    test_empty ...                                                         [OK]
    test_empty_description ...                                             [OK]
    test_h1_as_title ...                                                   [OK]
    test_invalid_encoding ...                                              [OK]
    test_invalid_encoding2 ...                                             [OK]
    test_missing_title ...                                                 [OK]
    test_missing_title_and_broken_h1 ...                                   [OK]
    test_nested_nodes ...                                                  [OK]
    test_no_tree ...                                                       [OK]
    test_script ...                                                        [OK]
    test_simple ...                                                        [OK]
    test_twitter_tag ...                                                   [OK]
    test_windows_1252 ...                                                  [OK]
    test_xml ...                                                           [OK]
  SummarizeTestCase
    test_long_summarize ...                                                [OK]
    test_short_summarize ...                                               [OK]
    test_small_then_large_summarize ...                                    [OK]

-------------------------------------------------------------------------------
Ran 27 tests in 0.015s

PASSED (successes=27)

raise SkipTest("This test is known to be broken with libxml >= 2.14.")

html = b"\x00"
tree = decode_body(html, "http://example.com/test.html")
self.assertIsNone(tree)
Expand Down