Retry/Skip features are applicable to items within a chunk in a fault-tolerant chunk-oriented step, not at the step level or the job level. There are actually two distinct things in your requirement:
1. How to stop a job after a given timeout?
Apart from calling JobOperator#stop externally after the timeout occurs, you can stop a job from within the job itself by sending a stop signal through StepExecution#setTerminateOnly, which sets the terminateOnly flag. The idea is to get access to the step execution in order to set that flag after a certain timeout. How you do this depends on the tasklet type of the step:
Simple Tasklet
For a simple tasklet, you can access the step execution through the ChunkContext. Here is an example:
import java.time.Duration;
import java.util.Date;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTasklet implements Tasklet {

    private static final int TIMEOUT = 120; // in minutes (can be turned into a configurable field through a constructor)

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        if (timeout(chunkContext)) {
            chunkContext.getStepContext().getStepExecution().setTerminateOnly();
        }
        // do some work
        if (moreWork()) {
            return RepeatStatus.CONTINUABLE;
        } else {
            return RepeatStatus.FINISHED;
        }
    }

    private boolean timeout(ChunkContext chunkContext) {
        Date startTime = chunkContext.getStepContext().getStepExecution().getJobExecution().getStartTime();
        Date now = new Date();
        return Duration.between(startTime.toInstant(), now.toInstant()).toMinutes() > TIMEOUT;
    }

    private boolean moreWork() {
        return false; // TODO implement logic
    }
}
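This tasklet checks whether the timeout is exceeded each time execute is invoked (that is, once per iteration for as long as it returns RepeatStatus.CONTINUABLE) and stops the step (and hence the surrounding job) accordingly.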
Chunk-oriented tasklet
In this case, you can use a step listener and set the terminateOnly flag in one of the lifecycle methods (afterRead, afterWrite, etc). Here is an example:
import java.time.Duration;
import java.util.Date;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepListenerSupport;
import org.springframework.batch.core.scope.context.ChunkContext;

public class StopListener extends StepListenerSupport {

    private static final int TIMEOUT = 120; // in minutes (can be made configurable through constructor)

    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void afterChunk(ChunkContext context) { // or afterRead, or afterWrite, etc.
        if (timeout(context)) {
            this.stepExecution.setTerminateOnly();
        }
    }

    private boolean timeout(ChunkContext chunkContext) {
        Date startTime = chunkContext.getStepContext().getStepExecution().getJobExecution().getStartTime();
        Date now = new Date();
        return Duration.between(startTime.toInstant(), now.toInstant()).toMinutes() > TIMEOUT;
    }
}
The idea is the same: you need to check the time regularly and set the flag when appropriate.
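Both examples rely on the same elapsed-time check. Here is a minimal sketch of that check in plain Java, without any Spring Batch types. The class name TimeoutCheck and the injected Clock are assumptions made for illustration (the Clock only makes the check easy to test). It compares Duration objects directly, since toMinutes() truncates partial minutes and a "> 120" test on it only becomes true after 121 full minutes.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; it is not part of Spring Batch.
final class TimeoutCheck {

    private final Duration timeout;
    private final Clock clock;

    TimeoutCheck(Duration timeout, Clock clock) {
        this.timeout = timeout;
        this.clock = clock;
    }

    // Returns true once strictly more than `timeout` has elapsed since startTime.
    boolean isExceeded(Instant startTime) {
        Duration elapsed = Duration.between(startTime, clock.instant());
        return elapsed.compareTo(timeout) > 0;
    }
}
```

In the tasklet or the listener, you would pass it the job execution's start time converted with toInstant(), and call setTerminateOnly() when isExceeded returns true.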
Both approaches (the tasklet and the listener) will leave your job in a STOPPED status, which is a restartable status. Batch jobs are traditionally executed in a batch window, and a common requirement is to stop them gracefully when the window closes. The technique above is the way to do that.
The answer in "Spring batch: Retry job if does not complete in particular time" is not a good option in my opinion, because it abruptly terminates the transaction for the current chunk and leaves the job in a FAILED status (which is a restartable status as well). However, when you see a job in a FAILED status, you cannot distinguish a real failure from a deliberate stop. Given that the requirement is to deliberately stop the job at the end of the batch window, I believe the job should be stopped gracefully and restarted in the next window.
2. How to restart the job automatically after the timeout?
Now that you know how to stop a job after a timeout, you can use a RetryTemplate around the job launcher and re-launch the job when appropriate. Here is an example:
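import java.util.Date;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public static void main(String[] args) throws Throwable {
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));
    ApplicationContext applicationContext = new AnnotationConfigApplicationContext(MyJob.class);
    JobLauncher jobLauncher = applicationContext.getBean(JobLauncher.class);
    Job job = applicationContext.getBean(Job.class);
    JobParameters jobParameters = new JobParametersBuilder()
            .addDate("runtime", new Date())
            .toJobParameters();
    retryTemplate.execute((RetryCallback<JobExecution, Throwable>) retryContext -> {
        JobExecution jobExecution = jobLauncher.run(job, jobParameters);
        if (jobExecution.getExitStatus().getExitCode().equals(ExitStatus.STOPPED.getExitCode())) {
            throw new Exception("Job timeout");
        }
        return jobExecution;
    });
}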
This will automatically launch the job at most 3 times in total (the first launch counts as one of the 3 attempts) for as long as it finishes with the status STOPPED (for example due to a timeout as shown previously).
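To make the attempt counting concrete, here is a plain-Java sketch of the same control flow, with no Spring Retry or Spring Batch dependency. It is a stand-in for illustration only, not the RetryTemplate API: the class RelaunchLoop, the method runUntilNotStopped and the use of a Supplier returning an exit code string are all assumptions. Like the example above, it gives up with an exception once the attempts are exhausted.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the RetryTemplate-based launcher above.
final class RelaunchLoop {

    // Calls `launch` until it returns an exit code other than "STOPPED",
    // making at most maxAttempts calls (the first launch included).
    static String runUntilNotStopped(Supplier<String> launch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String exitCode = launch.get();
            if (!"STOPPED".equals(exitCode)) {
                return exitCode;
            }
        }
        // Every attempt ended in STOPPED: report it instead of looping forever.
        throw new IllegalStateException("Job still STOPPED after " + maxAttempts + " attempts");
    }
}
```

With maxAttempts set to 3, a job that stops twice and then completes is launched three times, and a job that keeps stopping is abandoned after the third launch.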
Hope this helps.