How do you improve the performance of a graph?


Ashim Dutta

  • Jul 10th, 2005
 

ashimdutta@yahoo.com 
There are many ways to improve the performance of a graph:
1) Limit the number of components in each phase.
2) Set an optimal max-core value for Sort and Join components.
3) Minimise the number of Sort components.
4) Minimise sorted Join components and, where possible, replace them with in-memory (hash) joins.
5) Pass only the required fields through Sort, Reformat and Join components.
6) Use phasing or flow buffers with Merge and sorted Joins.
7) If both inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port.
8) Do not use Broadcast as the partitioner for large datasets.
9) Minimise the use of regular-expression functions such as re_index in transform functions.
10) Avoid repartitioning data unnecessarily.
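
To illustrate the trade-off behind points 4 and 7, here is a rough Python sketch of the two join strategies. It is not Ab Initio code: the function names `hash_join` and `merge_join` and the sample records are made up for illustration. The hash join holds the smaller input in memory and streams the larger (driving) input past it, while the merge join keeps only one key group in memory at a time but needs both inputs already sorted on the key.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_join(driving, small_side, key):
    """In-memory join: load the smaller input into a hash table,
    then stream the driving (larger) input past it."""
    table = defaultdict(list)
    for rec in small_side:
        table[rec[key]].append(rec)
    for rec in driving:
        for match in table.get(rec[key], ()):
            yield {**match, **rec}


def merge_join(left_sorted, right_sorted, key):
    """Sorted join: both inputs must already be sorted on `key`.
    Only one key group per side is held in memory at a time."""
    get = itemgetter(key)
    left_groups = groupby(left_sorted, key=get)
    right_groups = groupby(right_sorted, key=get)
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            rk, rg = next(right_groups, (None, None))
        else:
            right_rows = list(rg)  # buffer one group only
            for left_rec in lg:
                for right_rec in right_rows:
                    yield {**right_rec, **left_rec}
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


# Hypothetical sample data: a small reference flow and a larger flow.
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 1, "amt": 10}, {"id": 1, "amt": 20}, {"id": 3, "amt": 5}]

hash_result = list(hash_join(orders, customers, "id"))
merge_result = list(merge_join(orders, customers, "id"))  # both already sorted on id
```

Both functions return the same inner-join rows here. The difference is where the cost goes: the hash join pays in memory for the small side, the merge join pays for whatever sorting had to happen upstream.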

mithun ganguly

  • Sep 13th, 2005
 

One vital point:

Keep as much of the graph as possible running in MFS (the multifile system). For this, the input files should be partitioned and, if possible, the output file should be partitioned as well.


Gour

  • Jan 18th, 2006
 

Is an in-memory join faster than a sorted join?


Baanumathy

  • May 10th, 2006
 

In addition to the cases mentioned above, we can include a few more:

1. If a Sort component is followed 2 or 3 components later by another Sort whose leading keys are the same, use a Sort within Groups component for the second sort instead, giving the shared keys as major keys and the remaining keys as minor keys. Sort within Groups assumes the data is already sorted on the major keys, so it only needs to sort on the minor keys.

          e.g. the first Sort uses keys a, b, c and a second Sort 2 - 3 components later (in the same flow) uses keys a, b, e, f. In that case use Sort within Groups for the second sort, with a, b as major keys and e, f as minor keys.

2. When splitting records into more than two flows, prefer Reformat to the Broadcast component.

3. When combining records from 2 flows, use the Concatenate component only when the records must be combined in a specific order. If no order is required, prefer the Gather component.

4. Instead of chaining many Reformat components one after the other, use the output-index parameter of the first Reformat and put the routing condition there. For details on this, refer to the Help.
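
Point 1 above can be sketched in Python to show why the second sort gets cheaper. This is only an illustration of the idea, not Ab Initio's implementation; `sort_within_groups` and the sample records are hypothetical. Because the input already arrives ordered on the major keys, each run of equal major keys can be sorted on the minor keys by itself, so only one group needs to be held in memory at a time.

```python
from itertools import groupby


def sort_within_groups(records, major_keys, minor_keys):
    """Assumes `records` is already sorted on `major_keys`.
    Sorts each run of equal major keys on `minor_keys` only."""
    def major(rec):
        return tuple(rec[k] for k in major_keys)

    def minor(rec):
        return tuple(rec[k] for k in minor_keys)

    for _, group in groupby(records, key=major):
        # Only this one group is buffered and sorted.
        yield from sorted(group, key=minor)


# Hypothetical flow, already sorted on (a, b) by an earlier Sort.
records = [
    {"a": 1, "b": 1, "e": 9, "f": 0},
    {"a": 1, "b": 1, "e": 2, "f": 5},
    {"a": 1, "b": 2, "e": 7, "f": 1},
    {"a": 2, "b": 1, "e": 3, "f": 3},
    {"a": 2, "b": 1, "e": 3, "f": 1},
]

resorted = list(sort_within_groups(records, ("a", "b"), ("e", "f")))
```

The result is the same order a full sort on a, b, e, f would give, but no group larger than one (a, b) run is ever sorted.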
 

I would like to add a couple more points to the explanations already given:

1. Try to use a lookup instead of a Join if you have a huge number of records in one flow and relatively few records in the other.
2. Try to rely less on the database. Perform sorts, joins and similar operations with Ab Initio components rather than implementing them in the SQL of an Input Table.


arunsin

  • Mar 13th, 2011
 

Adding to all the points mentioned by others:

- As a rule of thumb, prefer solving a problem with a component rather than with custom logic.
- Use Oracle hints and Ab Initio hints (the ABLOCAL utility for Oracle versions earlier than 9i) in database components.
- Judge whether to use a lookup or a Join component based on the volume of data to be joined.
- Manage skew by partitioning the data evenly when using MFS (depends on the requirement).
- Use component folding.
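
The skew point is easy to see with a small Python sketch. Everything here is an assumption made for illustration: the function names, the sample data, and the skew measure itself (largest partition divided by average partition size), which is a simple stand-in and not necessarily the figure Ab Initio reports. It compares partitioning by key against round-robin for data where one key dominates.

```python
def partition_by_key(records, key, n):
    """Send each record to partition (key value mod n). Assumes integer keys."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rec[key] % n].append(rec)
    return parts


def partition_round_robin(records, n):
    """Deal records out evenly, ignoring their contents."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def skew(parts):
    """Illustrative measure: largest partition / average partition size.
    1.0 means perfectly even; higher means one partition does more work."""
    sizes = [len(p) for p in parts]
    avg = sum(sizes) / len(sizes)
    return max(sizes) / avg if avg else 0.0


# Hypothetical data: customer 0 accounts for 90 of 100 records.
records = [{"cust": 0} for _ in range(90)] + [{"cust": k} for k in range(1, 11)]

by_key = partition_by_key(records, "cust", 4)
round_robin = partition_round_robin(records, 4)

print([len(p) for p in by_key], skew(by_key))
print([len(p) for p in round_robin], skew(round_robin))
```

Round-robin is even but scatters records with the same key across partitions, so it only suits steps that do not need key locality. That is the "depends on the requirement" part: a key-based join or rollup still needs partitioning by key, and then choosing a key with a more even distribution is what reduces the skew.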

