WIP: modprobe: efficient and extensible flux startup and shutdown #6774

Open: wants to merge 12 commits into base: master

grondo
Contributor

@grondo grondo commented Apr 19, 2025

This experimental PR is a prototype of a replacement for the Flux rc startup/shutdown system.
It started with a few goals:

  • parallelize module loading and other rc "tasks" to speed up startup
  • improve upon the method of overriding modules that have alternatives, like sched and feasibility
  • easily restrict modules/tasks to certain ranks

I'm somewhat satisfied with the interface in this prototype, so I'm posting it early for feedback on that aspect of the design before going on to write documentation and tests. I'm looking for feedback on the overall scheme here, and on whether it would be an acceptable path forward.

I apologize, the description below is lengthy:

The proposed design here consists of three components: a TOML configuration specification (modules.toml and etc/modules.d/*.toml) for expressing modules and their relationships and requirements; a new Python interface for defining tasks that run during rc1 and rc3; and a flux modprobe command that processes the previous two files and runs the tasks defined for the runlevel as efficiently as possible. Each of these components is described in more detail below.

modules.toml

The modules.toml file defines the modules in Flux: whether they require other modules, any broker attrs/config they need, and the ranks on which they should be loaded. The flux-core modules are all defined in /etc/flux/modules.toml, and extra modules can be defined in modules.d/*.toml. The file currently contains one [[modules]] array, each entry of which defines a module and supports the following keys (this is taken from the top of the existing modules.toml):

#   name     (required) The module name. This will be the target of module load
#            and remove requests. If there is a name collision between entries,
#            then the last one loaded will be used.
#
#   module   (optional) The module to load if different from name.
#
#   provides (optional) List of services this module provides, e.g. "sched".
#             If multiple modules provide the same service name, the last one
#             loaded takes precedence by default, though this can be influenced
#             by configuration or broker attributes.
#
#   args     (optional) An array of module arguments.
#
#   ranks    (optional) The set of ranks on which this module should be loaded.
#            May either be an RFC 22 idset string, or a string beginning with
#            `>` or `<` followed by a single integer (e.g. ">0" to load a
#            module on all ranks but rank 0).
#
#   requires  (optional) An array of services this module requires, i.e. if
#             this module is loaded then services/modules in requires will also
#             be loaded. If this module also has to be loaded after any
#             required modules, add them to the after array as well.
#
#   after     (optional) An array of modules for which this module must be
#             loaded after. If this module also requires the service or module
#             it is loaded after, then the module must also be added to
#             `requires`.
#
#   needs     (optional) An array of modules which are required for this module
#             to be loaded.
#
#   needs-config (optional) An array of configuration keys in dotted key
#                form which are required for this module to be loaded.
#                If the key is not set, then the module is skipped.
#
#   needs-attrs  (optional) Same as with needs-config, but for broker
#                attributes.

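The needs-config and needs-attrs checks described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `has_dotted_key` and `should_load`, the example entry, and the representation of config as a nested dict and attributes as a flat dict are all assumptions.

```python
def has_dotted_key(config, dotted_key):
    """Return True if a dotted key such as "a.b.c" is set in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def should_load(entry, config, attrs):
    """Skip a module entry when any needed config key or attribute is unset.

    Hypothetical helper: ``config`` is a nested dict and ``attrs`` is a flat
    dict keyed by attribute name.
    """
    for key in entry.get("needs-config", []):
        if not has_dotted_key(config, key):
            return False
    return all(attr in attrs for attr in entry.get("needs-attrs", []))


# Hypothetical entry and settings, for illustration only
entry = {"name": "example", "needs-config": ["example.path"]}
print(should_load(entry, {"example": {"path": "/tmp/x"}}, {}))
print(should_load(entry, {}, {}))
```

Under these assumptions, the first call reports that the module should be loaded and the second reports that it should be skipped.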
Here's an example module:

[[modules]]
name = "cron"
ranks = "0"
requires = ["heartbeat"]
args = ["sync=heartbeat.pulse"]

This entry defines the cron module, which is only loaded on rank 0. The cron module requires the heartbeat module, and by default it is loaded with the args sync=heartbeat.pulse.
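To make the rank expressions concrete, here is a minimal sketch of how a `ranks` string could be matched against a broker rank. The function names `parse_idset` and `rank_matches` are hypothetical and are not taken from this PR, and the idset parsing handles only plain comma-separated ids and ranges, not the full RFC 22 syntax.

```python
def parse_idset(idset):
    """Expand a simple idset such as "0-3,7" into a set of integers."""
    ranks = set()
    for part in idset.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            ranks.update(range(int(lo), int(hi) + 1))
        else:
            ranks.add(int(part))
    return ranks


def rank_matches(expr, rank):
    """Return True if ``rank`` is selected by the rank expression ``expr``."""
    expr = expr.strip()
    if expr == "all":
        return True
    if expr.startswith(">"):
        return rank > int(expr[1:])
    if expr.startswith("<"):
        return rank < int(expr[1:])
    return rank in parse_idset(expr)


# The cron entry above uses ranks = "0", so only rank 0 matches
print(rank_matches("0", 0), rank_matches("0", 1))
```

With this sketch, `">0"` selects every rank except rank 0, matching the description in the key reference.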

One caveat: the provides documentation claims to support a way to override the default alternative, but that is not yet implemented.
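The default "last one loaded takes precedence" behavior for provides can be pictured with a small sketch. It assumes entries are processed in load order as plain dicts; `resolve_providers` and the `my-sched` entry are hypothetical names used for illustration, and nothing here implements the not-yet-written override mechanism.

```python
def resolve_providers(modules):
    """Map each service name to the last module entry that provides it."""
    providers = {}
    for entry in modules:
        for service in entry.get("provides", []):
            # Later entries overwrite earlier ones for the same service
            providers[service] = entry["name"]
    return providers


# Hypothetical entries, listed in the order they would be loaded
modules = [
    {"name": "sched-simple", "provides": ["sched"]},
    {"name": "my-sched", "provides": ["sched"]},
]
print(resolve_providers(modules))
```

Here the later entry wins for the `sched` service because it is processed last.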

modprobe rcX.py files

flux modprobe replaces the rcX scripts with a Python file that defines a set of tasks to run and the relationships between those tasks, so that interdependent tasks are run in the correct order. Non-module tasks are defined in a Python file via the @task decorator:

def task(name, **kwargs):
    """
    Decorator for modprobe "rc" task functions.

    This decorator is applied to functions in an rc1 or rc3 python
    source file to turn them into valid flux-modprobe(1) tasks.

    Args:
    name (required, str): The name of this task.
    ranks (optional, str): A rank expression that indicates on which
        ranks this task should be invoked. ``ranks`` may be a valid
        RFC 22 Idset string, a single integer prefixed with ``<`` or
        ``>`` to indicate matching ranks less than or greater than a
        given rank, or the string ``all`` (the default if ``ranks``
        is not specified). Examples: ``0``, ``>0``, ``0-3``.
    requires (optional, list): An optional list of task or module names
        this task requires. This is used to ensure required tasks are
        active when activating another task. It does not indicate that
        this task will necessarily be run before or after the tasks it
        requires. (See ``before`` and ``after`` for that feature)
    needs (optional, list): Disable this task if any task in ``needs`` is
        not active.
    provides (optional, list): An optional list of string service names
        that this task provides. This can be used to set up alternatives
        for a given service. (Mostly useful with modules)
    before (optional, list): A list of tasks or modules for which this task
        must be run before.
    after (optional, list): A list of tasks or modules for which this task
        must be run after.
    needs_attrs (optional, list): A list of broker attributes on which
        this task depends. If any of the attributes are not set then the
        task will not be run.
    needs_config (optional, list): A list of config keys on which this
        task depends. If any of the specified config keys are not set,
        then this task will not be run.

    Example:
    ::
        # Declare a task that will be run after the kvs module is loaded
        # only on rank 0
        @task("test", ranks="0", needs=["kvs"], after=["kvs"])
        def test_kvs_task(context):
            # do something with kvs
            pass
    """

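For readers unfamiliar with parameterized decorators, the following sketch shows one way a decorator with this signature could record tasks for later scheduling. It is an assumption-laden illustration, not the implementation in this PR: the module-level `TASKS` registry and the record layout are hypothetical.

```python
TASKS = {}  # hypothetical registry: task name -> task record


def task(name, **kwargs):
    """Register the decorated function as a named task with its metadata."""

    def decorator(func):
        TASKS[name] = {
            "name": name,
            "func": func,
            # "all" is the documented default when ranks is not given
            "ranks": kwargs.get("ranks", "all"),
            "requires": kwargs.get("requires", []),
            "needs": kwargs.get("needs", []),
            "provides": kwargs.get("provides", []),
            "before": kwargs.get("before", []),
            "after": kwargs.get("after", []),
            "needs_attrs": kwargs.get("needs_attrs", []),
            "needs_config": kwargs.get("needs_config", []),
        }
        return func

    return decorator


@task("test", ranks="0", needs=["kvs"], after=["kvs"])
def kvs_task(context):
    return "ran"


print(sorted(TASKS))
```

Returning `func` unchanged keeps the decorated function directly callable, while the registry holds everything a scheduler would need to order and filter tasks.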
The context here is a flux.modprobe.Context object, which is shared between all tasks. It contains some convenience attributes and methods to get a shared Flux handle, send an RPC, or run something under bash, and it offers a way to get broker attributes and config and to share arbitrary data between tasks. Here's an example of a task from rc1.py:

@task(
    "config-reload",
    ranks=">0",
    needs_attrs=["config.path"],
    before=["*"],
)
def config_reload(context):
    context.rpc("config.reload").get()

This task runs only on ranks other than 0, only if the config.path broker attribute is set, and before all other tasks. It sends the config.reload RPC and waits for the result.
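The ordering implied by before and after, including the before=["*"] wildcard, can be sketched with the standard library's graphlib.TopologicalSorter (Python 3.9+). This is a simplified illustration rather than the PR's scheduler: `order_tasks` and the dict layout are hypothetical, and references to unknown task names are simply ignored.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def order_tasks(tasks):
    """Return task names in an order that honors ``before`` and ``after``.

    ``tasks`` maps a task name to a dict with optional "before" and "after"
    lists. A ``before`` entry of "*" means "before every other task".
    """
    sorter = TopologicalSorter()
    for name, spec in tasks.items():
        sorter.add(name)
        # "after": this task depends on each listed predecessor
        for pred in spec.get("after", []):
            if pred in tasks:
                sorter.add(name, pred)
        # "before": each listed successor depends on this task
        for succ in spec.get("before", []):
            targets = [t for t in tasks if t != name] if succ == "*" else [succ]
            for target in targets:
                if target in tasks:
                    sorter.add(target, name)
    return list(sorter.static_order())


tasks = {
    "kvs": {},
    "test": {"after": ["kvs"]},
    "config-reload": {"before": ["*"]},
}
print(order_tasks(tasks))
```

In this example the wildcard makes every other task depend on config-reload, so it is ordered first, followed by kvs and then test.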

When modprobe loads a *.py file, it will always first run the setup(context) function, if one is defined. This is where the rc file can define modules to load or remove, set up context data, etc. This is currently how an rc.d/*.py could set an alternative or extend module args, or how a replacement rc1.py could load a subset of modules (though a more lightweight method could be implemented later).

Check out the modules.toml and rc1.py and rc3.py in this PR for full examples.

Transition

For now, the majority of rc1 and rc3 are replaced with flux modprobe rcX. The run through of FLUX_RC_EXTRA is still maintained for backwards compatibility, but some kind of transition plan for flux-sched will need to be implemented. If the overall design here is acceptable, we can work on that next.

Timing

This implementation reduces the rc1 runtime in Flux from ~2.3s to ~0.4s for a single-rank flux start. To evaluate the prototype, the current version supports a --timing option, which dumps the start/end times of all tasks into the KVS. Here are the results for a system instance startup as an example:

[Screenshot: 2025-04-19-073436_871x504_scrot]
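As a rough picture of where the speedup comes from, the sketch below runs tasks in parallel as soon as their predecessors finish and records start/end times per task. It is not the PR's implementation: `run_parallel`, the use of a thread pool, and the example dependency graph are assumptions, and the timings are kept in a dict rather than written to the KVS.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_parallel(deps, funcs, max_workers=4):
    """Run tasks as soon as their predecessors finish; return timings.

    ``deps`` maps a task name to the set of names it must run after, and
    ``funcs`` maps a task name to a zero-argument callable.
    """
    timings = {}

    def timed(name):
        start = time.monotonic()
        funcs[name]()
        timings[name] = (start, time.monotonic())
        return name

    sorter = TopologicalSorter(deps)
    sorter.prepare()
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all completed
            for name in sorter.get_ready():
                pending.add(pool.submit(timed, name))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())
    return timings


# Hypothetical dependency graph, for illustration only
deps = {"kvs": set(), "job-manager": {"kvs"}, "cron": set()}
funcs = {name: (lambda: time.sleep(0.01)) for name in deps}
for name, (start, end) in sorted(run_parallel(deps, funcs).items()):
    print(f"{name}: {end - start:.3f}s")
```

Independent tasks such as kvs and cron in this example can overlap, while job-manager starts only after kvs has finished.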

Problem: We'd like to use the TopologicalSorter from Python 3.9 graphlib,
but Flux still needs to support Python < 3.9.

Grab a backport from

 https://github.com/mariushelf/graphlib_backport

and vendor in flux.utils.

@grondo grondo force-pushed the modprobe2 branch 6 times, most recently from df1c909 to 72a0172 Compare April 21, 2025 14:08
@chu11
Member

chu11 commented Apr 22, 2025

A really dumb initial comment. Seeing a file called modules.toml would make me think this is how I load modules into flux. Granted, documentation is yet to be written, but perhaps rename to uhhh modules-definitions.toml? (and likewise modules-definitions.d/)?

@grondo
Contributor Author

grondo commented Apr 22, 2025

it was originally modprobe.toml. Would that be preferable, since it is how you configure the modprobe tool?

@chu11
Member

chu11 commented Apr 23, 2025

it was originally modprobe.toml. Would that be preferable, since it is how you configure the modprobe tool?

Yeah, I think that would be better.

grondo added 4 commits April 23, 2025 07:19
Problem: Flux startup and shutdown currently use scripts that load
modules and run other tasks serially, but there is potential to do some
of this work in parallel which would greatly speed up instance startup.

Introduce flux-modprobe, which can manage loading and removal of
modules and the execution of tasks based on precedence relationships.

Modules are defined in TOML configuration files, and startup/shutdown
tasks defined in python using a modprobe `@task` decorator. The rc1
and rc3 scripts can then be replaced by `flux modprobe rc1` and
`flux modprobe rc3` which load modules and execute tasks in the most
efficient manner possible.

Problem: flux-modprobe requires that modules be configured in a
modprobe.toml file, but no such file currently exists.

Add modprobe.toml containing configuration for the current set of
flux-core modules.

Problem: Flux startup/shutdown does not use flux-modprobe.

Add etc/rc1.py and etc/rc3.py files to define startup and shutdown
tasks. Replace most of rc1/rc3 with calls to `flux modprobe rc1` and
`flux modprobe rc3`.

Problem: `FLUX_MODPROBE_PATH` set in the calling environment
could affect tests in the testsuite.

Unset `FLUX_MODPROBE_PATH` for tests.

grondo added 7 commits April 23, 2025 10:50
Problem: The rc-* "personality" files used by `test_under_flux` in the
testsuite do not use flux-modprobe. This not only makes them less
efficient, but also we lose out on testing the modprobe tool.

Convert the rc personality files to flux-modprobe.

Problem: Several tests in t1200-stats-basic.t are racy with
the new rc1, which runs faster than before.

Use `dmesg-grep.py` to find matching dmesg lines before exiting the
instance instead of assuming the output will occur in time when
/bin/true is used as the initial program.

In one case, load the heartbeat module with period=0.2s to ensure
content module stats are emitted within `sleep 1`.

Problem: flake8 identifies several problems with the helper script
`etc/gen-cmdhelp.py`.

Address the issues so that Python linting runs clean on this script.

Problem: There's trailing whitespace in `etc/.gitignore`.

Remove it.

Problem: New Python files in etc and t/rc are not run under black,
flake8 and mypy since these paths do not appear in the pre-commit
`files` pattern.

Add these paths to `.pre-commit-config.yaml` so the new `.py` files
are formatted and checked in CI.

Problem: There are no tests of flux-modprobe.

Add t0100-modprobe.t for this purpose.

Problem: None of the tests in the testsuite test starting an instance
with content.restore=auto but no RESTORE link, but this is the case
when first starting a system instance, and exercises key logic in
etc/rc1.py.

Add -Scontent.restore=auto to the first instance run in
t2810-kvs-garbage-collect.t so that this case is exercised in the
testsuite.

Comment on lines +101 to +102
# if task.ranks:
# tree.label += f" (ranks={task.ranks})"

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 91.02990% with 54 lines in your changes missing coverage. Please review.

Project coverage is 83.90%. Comparing base (b6f6b87) to head (fd33472).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/bindings/python/flux/modprobe.py | 90.24% | 48 Missing ⚠️ |
| src/cmd/flux-modprobe.py | 94.54% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6774      +/-   ##
==========================================
+ Coverage   83.83%   83.90%   +0.06%     
==========================================
  Files         535      537       +2     
  Lines       89323    89925     +602     
==========================================
+ Hits        74888    75449     +561     
- Misses      14435    14476      +41     
| Files with missing lines | Coverage Δ |
|---|---|
| src/cmd/flux-modprobe.py | 94.54% <94.54%> (ø) |
| src/bindings/python/flux/modprobe.py | 90.24% <90.24%> (ø) |

... and 32 files with indirect coverage changes


2 participants