A process reads 15,000 rows from one table and either updates or inserts them into another table. Given that the process may terminate abnormally, describe how you would track progress within the data so that the process can be restarted without repeating any processing.


PeeD1 • Apr 17th, 2009

The following are some possible strategies for dealing with this problem:

1) If a job does a bulk load into a table and you know the table was empty before the job began, you can truncate the table before recovering the job.
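As a rough sketch of this idea, here is Python with sqlite3 standing in for the database; the target table name is an assumption, and DELETE stands in for TRUNCATE, which sqlite3 lacks:

```python
import sqlite3

def reset_before_recovery(conn: sqlite3.Connection) -> None:
    # The target was empty before the failed run, so clearing it
    # returns the job to a known-clean starting state for the rerun.
    # (Most databases would use TRUNCATE TABLE here, which is faster
    # and resets storage rather than deleting row by row.)
    conn.execute("DELETE FROM target")
    conn.commit()
```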

2) If a job does a bulk load into a table and the table was not empty before the job began, you can have the graph attach a load ID to each row it loads, for example by putting a component such as a REFORMAT in front of the component that loads the table. Then, before you recover and rerun the job, delete the rows carrying that load ID.
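A rough sketch of the load-ID idea in the same Python/sqlite3 style; the target schema, the load_id column, and the run identifier are all assumptions:

```python
import sqlite3

LOAD_ID = "run-2009-04-17-001"  # hypothetical identifier for this run

def load_rows(conn: sqlite3.Connection, rows) -> None:
    # Plays the role of the REFORMAT: every outgoing row is tagged
    # with the load ID before it reaches the target table.
    conn.executemany(
        "INSERT INTO target (key, value, load_id) VALUES (?, ?, ?)",
        [(key, value, LOAD_ID) for key, value in rows],
    )
    conn.commit()

def undo_failed_run(conn: sqlite3.Connection) -> None:
    # Recovery step: discard only the rows the failed run loaded,
    # leaving the table's pre-existing contents untouched.
    conn.execute("DELETE FROM target WHERE load_id = ?", (LOAD_ID,))
    conn.commit()
```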

3) Use an API mode component with a commitTable parameter, such as UPDATE_TABLE. Such a component keeps track of the records it commits to the database. When you recover and rerun the job, it will skip over the records it has already committed.

For example, suppose you configure an Update Table component to do a commit every 10,000 rows and a system failure occurs after it has processed 133,753 rows. When you restart the job, the loader will know that the first 130,000 rows have been committed and skip them.

If you use this solution, you must design the graph so that the input data to the Update Table arrives in the same order every time you run the graph.
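To make the mechanism concrete, here is a rough sketch of commit-interval checkpointing in the same Python/sqlite3 style. The checkpoint table, its schema, and the upsert statements are assumptions standing in for what Update Table manages internally:

```python
import sqlite3

COMMIT_INTERVAL = 10_000  # stand-in for the commitTable commit frequency

def setup(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS target (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (job TEXT PRIMARY KEY, rows_done INTEGER)")
    conn.commit()

def committed_so_far(conn: sqlite3.Connection) -> int:
    row = conn.execute("SELECT rows_done FROM checkpoint WHERE job = 'upd'").fetchone()
    return row[0] if row else 0

def record_progress(conn: sqlite3.Connection, n: int) -> None:
    conn.execute(
        "INSERT INTO checkpoint (job, rows_done) VALUES ('upd', ?) "
        "ON CONFLICT(job) DO UPDATE SET rows_done = excluded.rows_done",
        (n,),
    )

def run(conn: sqlite3.Connection, rows: list) -> None:
    # 'rows' must arrive in the same order on every run, otherwise the
    # skip count below would skip the wrong records after a restart.
    done = committed_so_far(conn)
    for i, (key, value) in enumerate(rows):
        if i < done:
            continue  # this record was committed before the failure
        # Update-or-insert into the target (the upsert mirrors the
        # "updates or inserts" behavior described in the question).
        conn.execute(
            "INSERT INTO target (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        if (i + 1) % COMMIT_INTERVAL == 0:
            record_progress(conn, i + 1)  # checkpoint rides in the same commit
            conn.commit()
    record_progress(conn, len(rows))
    conn.commit()
```

Replaying the failure described above: with a crash after 133,753 rows and commits every 10,000, the checkpoint reads 130,000 on restart, so the loop skips rows 1 through 130,000 and resumes at row 130,001. Keeping the checkpoint update in the same transaction as the data is what guarantees the two can never disagree.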
