Job arrays support for HTCondor #5960
Comments
cc @JosephLalli
@vivekvenkris A few questions, since I've been working on the side to improve Nextflow's support for HTCondor:

Are you at UWisc, by chance? I'm most familiar with UWisc's CHTC workspace, but if you are at another institution I can do my best to provide assistance and advice.

Could you flesh out your reasoning behind job arrays for NXF-HTCondor? I have found that the "executor.submitRateLimit" and (perhaps more importantly) "executor.queueSize" options limit the number of jobs that are simultaneously submitted to the HTCondor scheduler, and ensure that new jobs are submitted at a rate the scheduler node can handle.

I am curious to hear more about how you have configured your environment. The lack of a shared POSIX filesystem has hampered my ability to use HTCondor with Nextflow in the past, and I have had to update Nextflow's Condor implementation to fully support the use of Seqera's Fusion to simulate a shared POSIX filesystem. It would be great if you have a simpler solution.

PS: While it's not fit for public consumption quite yet, my Nextflow-condor branch is available here: https://github.com/JosephLalli/nextflow
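For readers following along, the two throttling options mentioned above could be set roughly as follows. This is a minimal sketch of a `nextflow.config`, not a recommended setup: the numeric values are illustrative assumptions and would need tuning against what a given scheduler node can actually handle.

```groovy
// nextflow.config (sketch; the values below are illustrative assumptions)
process.executor = 'condor'

executor {
    name            = 'condor'
    // Upper bound on the number of jobs Nextflow keeps queued or running at once
    queueSize       = 500
    // Cap on the submission rate, here 10 job submissions per minute
    submitRateLimit = '10/1min'
}
```

Note that these settings apply per Nextflow run, so several pipelines running concurrently on the same cluster each get their own limit; they do not bound the combined load on the scheduler.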
Hi @JosephLalli

No, I am not at UWisc. Yeah, we tried the

We do have a distributed file system that is available across all the compute nodes and the head node. This is also where the Nextflow executable, the project and the work directories are stored. For now, other than the stress on the scheduler, this seems to run quite seamlessly. We tried submitting the same amount of load outside Nextflow via job arrays, and that reduced the load on the scheduler by 80%.

I will have a look at your condor branch, thanks! Is job array functionality something that you envision developing support for in the near future?
New feature
Hello,
I would like to ask if the new "Job Array" functionality can also be extended to HTCondor.
The main problem we are currently facing is that the Condor scheduler daemon gets overwhelmed if we submit more than 10000 jobs across the different Nextflow pipelines running on our cluster. If these could be submitted as job arrays, this would greatly reduce the load on the scheduler.
The implementation (from the user experience side) can be the same as for other schedulers like SLURM. There would be an `array <X>` setting as an additional parameter to the executor. Nextflow then launches one job for every `<X>` process instances, as an array.

Thanks,
Vivek
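
To make the request concrete, here is a sketch of how the existing `array` directive is used with SLURM today, which is the user experience being asked for with HTCondor. The process name `ALIGN`, its input, and the array size of 100 are hypothetical and chosen only for illustration; whether `executor 'condor'` would accept the same directive is exactly what this issue requests, and is not assumed here.

```groovy
// main.nf (sketch; ALIGN and the value 100 are hypothetical)
process ALIGN {
    executor 'slurm'
    // Group up to 100 task instances into a single array job submission
    array 100

    input:
    val sample

    script:
    """
    echo "processing ${sample}"
    """
}

workflow {
    // 1000 task instances would be submitted as roughly 10 array jobs
    Channel.of(1..1000) | ALIGN
}
```

With a setting like this, the scheduler receives one submission per batch of tasks rather than one per task, which is the source of the load reduction described above.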