restartJobOnWorkerRestart property
If true, restarts the entire CustomJob whenever a worker is restarted.
This is intended for distributed training jobs that are not resilient to workers leaving and rejoining the job.
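The nullable type means the property has three states: unset (`null`), `true`, and `false`. The sketch below shows one way a caller might set it and how an unset value could be left out of a request body. `SchedulingSketch` and its `toJson` rule are hypothetical stand-ins written for this example, not the library's class; only the field name and its `bool?` type come from this page.

```dart
import 'dart:convert';

/// Hypothetical stand-in for the message that owns this property.
/// It is NOT the library class; it only mirrors the nullable field.
class SchedulingSketch {
  /// Mirrors `core.bool? restartJobOnWorkerRestart`.
  bool? restartJobOnWorkerRestart;

  SchedulingSketch({this.restartJobOnWorkerRestart});

  /// Assumed serialization rule: omit the field when it is null.
  Map<String, dynamic> toJson() => {
        if (restartJobOnWorkerRestart != null)
          'restartJobOnWorkerRestart': restartJobOnWorkerRestart,
      };
}

void main() {
  // Left unset: the field is null and is dropped from the JSON.
  final unset = SchedulingSketch();

  // Opt in for a job that cannot tolerate a worker restarting.
  final enabled = SchedulingSketch(restartJobOnWorkerRestart: true);

  print(jsonEncode(unset.toJson())); // {}
  print(jsonEncode(enabled.toJson())); // {"restartJobOnWorkerRestart":true}
}
```

Leaving the field as `null` defers to whatever the service does by default, which this page does not specify; set it explicitly when the job depends on the restart behavior.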
Implementation
core.bool? restartJobOnWorkerRestart;