A process reads 15,000 rows from one table and either updates or inserts them into another table. Given that the process may terminate abnormally, describe how you would track progress within the data so that the process can be restarted without repeating any processing.


PeeD1 • Apr 17th, 2009

The following are some possible strategies for dealing with this problem:

1) If a job does a bulk load into a table and you know the table was empty before the job began, you can truncate the table before recovering the job.
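As a rough sketch of this idea, here is Python with sqlite3 standing in for the database; the target table name is an assumption, and DELETE stands in for TRUNCATE, which sqlite3 lacks:

```python
import sqlite3

def reset_before_recovery(conn: sqlite3.Connection) -> None:
    # The target was empty before the failed run, so clearing it
    # returns the job to a known-clean starting state for the rerun.
    # (Most databases would use TRUNCATE TABLE here, which is faster
    # and resets storage rather than deleting row by row.)
    conn.execute("DELETE FROM target")
    conn.commit()
```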

2) If a job does a bulk load into a table and the table was not empty before the job began, you can have the graph attach a load ID to each row it loads, for example by putting a component such as a REFORMAT in front of the component that loads the table. Then, before you recover and rerun the job, delete the rows carrying that load ID.
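A rough sketch of the load-ID idea in the same Python/sqlite3 style; the target schema, the load_id column, and the run identifier are all assumptions:

```python
import sqlite3

LOAD_ID = "run-2009-04-17-001"  # hypothetical identifier for this run

def load_rows(conn: sqlite3.Connection, rows) -> None:
    # Plays the role of the REFORMAT: every outgoing row is tagged
    # with the load ID before it reaches the target table.
    conn.executemany(
        "INSERT INTO target (key, value, load_id) VALUES (?, ?, ?)",
        [(key, value, LOAD_ID) for key, value in rows],
    )
    conn.commit()

def undo_failed_run(conn: sqlite3.Connection) -> None:
    # Recovery step: discard only the rows the failed run loaded,
    # leaving the table's pre-existing contents untouched.
    conn.execute("DELETE FROM target WHERE load_id = ?", (LOAD_ID,))
    conn.commit()
```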

3) Use an API mode component with a commitTable parameter, such as UPDATE_TABLE. Such a component keeps track of the records it commits to the database. When you recover and rerun the job, it will skip over the records it has already committed.

For example, suppose you configure an Update Table component to do a commit every 10,000 rows and a system failure occurs after it has processed 133,753 rows. When you restart the job, the loader will know that the first 130,000 rows have been committed and skip them.

If you use this solution, you must design the graph so that the input data to the Update Table arrives in the same order every time you run the graph.
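To make the mechanism concrete, here is a rough sketch of commit-interval checkpointing in the same Python/sqlite3 style. The checkpoint table, its schema, and the upsert statements are assumptions standing in for what Update Table manages internally:

```python
import sqlite3

COMMIT_INTERVAL = 10_000  # stand-in for the commitTable commit frequency

def setup(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS target (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (job TEXT PRIMARY KEY, rows_done INTEGER)")
    conn.commit()

def committed_so_far(conn: sqlite3.Connection) -> int:
    row = conn.execute("SELECT rows_done FROM checkpoint WHERE job = 'upd'").fetchone()
    return row[0] if row else 0

def record_progress(conn: sqlite3.Connection, n: int) -> None:
    conn.execute(
        "INSERT INTO checkpoint (job, rows_done) VALUES ('upd', ?) "
        "ON CONFLICT(job) DO UPDATE SET rows_done = excluded.rows_done",
        (n,),
    )

def run(conn: sqlite3.Connection, rows: list) -> None:
    # 'rows' must arrive in the same order on every run, otherwise the
    # skip count below would skip the wrong records after a restart.
    done = committed_so_far(conn)
    for i, (key, value) in enumerate(rows):
        if i < done:
            continue  # this record was committed before the failure
        # Update-or-insert into the target (the upsert mirrors the
        # "updates or inserts" behavior described in the question).
        conn.execute(
            "INSERT INTO target (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        if (i + 1) % COMMIT_INTERVAL == 0:
            record_progress(conn, i + 1)  # checkpoint rides in the same commit
            conn.commit()
    record_progress(conn, len(rows))
    conn.commit()
```

Replaying the failure described above: with a crash after 133,753 rows and commits every 10,000, the checkpoint reads 130,000 on restart, so the loop skips rows 1 through 130,000 and resumes at row 130,001. Keeping the checkpoint update in the same transaction as the data is what guarantees the two can never disagree.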
