Skip to content

fix(check-licenses): print message on exceptions while loading a file #3732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions scripts/check_licenses.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,17 @@ async def validate_file(path: pathlib.Path) -> int:
"""Validate a single file."""
checks = 0
errors = 0
records = await asyncio.get_event_loop().run_in_executor(
None, lambda p: json.loads(open(p, "rb").read()), path
)

try:
records = await asyncio.get_event_loop().run_in_executor(
None, lambda p: json.loads(open(p, "rb").read()), path
)

except Exception as e:
message = f"Failed to load {path.name} because of {type(e).__name__}: {e}"

logging.error(message)
raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well, but in the case of testing many files, the error message may get "lost" in the middle:

$ python3 ./scripts/check_licenses.py 
...
[INFO] Successfully validated file atlas-2020-4lep.json
[INFO] Successfully validated file lhcb-antimatter-matters-2017.json
[INFO] Successfully validated file jade-computing-notes.json
[INFO] Successfully validated file cms-simulated-datasets-2015-part_11.json
[INFO] Successfully validated file atlas-2024-mc-pp-exotics-nominal.json
[ERROR] Failed to load mytest.json because of JSONDecodeError: Extra data: line 402 column 4 (char 13334)
[INFO] Successfully validated file LHCb_2011_Beam3500GeV_VeloClosed_MagDown_RealData_Reco14_Stripping21r1p1a_DIMUON_DST.json
[INFO] Successfully validated file LHCb_2011_Beam3500GeV_VeloClosed_MagUp_RealData_Reco14_Stripping21r1_BHADRONCOMPLETEEVENT_DST.json
[INFO] Successfully validated file opera-detector-events-charm.json
[INFO] Successfully validated file cms-simulated-datasets-Run2011A.json
[INFO] Successfully validated file cms-tools-nanoaod-outreach-higgstautau.json
...
[INFO] Processed 329 files within 1.09 seconds.
[ERROR] Validation completed with 1 errors!
        Please ensure the licenses are one of the following: ['CC0-1.0', 'GPL-3.0-only', 'MIT', 'Apache-2.0', 'BSD-3-Clause'].
        If you are using a valid SPDX license string that is not in the above list, please contact `[email protected]`.

If output has 330 lines, the ERROR messages are getting a bit "lost" in the output.

Possible solutions:

  • Repeat the error situation at the end, indicating which files failed, basically giving a summary like "329 files tested, 1 error found (mydata.json)", similarly to how pytest behaves.
  • As also suggested by Pablo in the issue, please demote INFO info DEBUG, so that these success messages would not be unnecessarily printed, and would not obscure the picture. (This is usually how linters work, only output errors and warnings.) If a user wants to see them, they could increase loglevel by a command-line option

BTW some other cosmetic remarks:

  • Cosmetics: BTW we may also want to process the input files in some nice alphabetical order for easier orientation in the outputs?
  • Cosmetics: While you are at editing this script, please amend the header to use #!/usr/bin/env python3 i.e. the Python version 3 included. Helps on some systems where there are only python2 and python3 executables.


for record in records:
if rec_licenses := record.get("license"):
Expand Down