How would you do performance tuning for an already built graph? Can you give some examples?

sekr

Apr 26th, 2006

1) Suppose a Sort component is placed in front of a Merge component: that Sort is of no use, because sorting is built into Merge.
2) Use a Lookup instead of a Join or Merge component.
3) Suppose we want to combine the data coming from two files without duplicates: we use the union function instead of adding an additional component to remove duplicates.
Correct me if I'm wrong.
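Point 2 rests on the idea that a lookup is a small dataset held in memory and indexed by key, so each record of the large flow can be enriched in one pass without sorting either input. The Python sketch below is only a conceptual analogue of that idea, not Ab Initio DML; the record layout, the field names `cust_id` and `region`, and the functions `build_lookup` and `enrich_with_lookup` are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def build_lookup(small_records: Iterable[dict], key: str) -> Dict[object, dict]:
    """Index the small dataset by key, like loading a lookup file into memory."""
    return {rec[key]: rec for rec in small_records}


def enrich_with_lookup(
    big_records: Iterable[dict], lookup: Dict[object, dict], key: str, field: str
) -> Iterator[dict]:
    """Stream the large flow once; neither input needs to be sorted."""
    for rec in big_records:
        match = lookup.get(rec[key])
        # Records with no match keep flowing, with None in the looked-up field.
        yield {**rec, field: match[field] if match else None}


if __name__ == "__main__":
    customers = [{"cust_id": 1, "region": "EU"}, {"cust_id": 2, "region": "US"}]
    orders = [{"order": "A", "cust_id": 2}, {"order": "B", "cust_id": 3}]
    table = build_lookup(customers, "cust_id")
    for row in enrich_with_lookup(orders, table, "cust_id", "region"):
        print(row)
```

This only pays off while the lookup side is small enough to fit comfortably in memory; for two large inputs a join is still the right tool.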

sunny

Jul 5th, 2006

Hi cndraa (Chandra, I hope),
You replied that Merge has an inbuilt sorting feature, which is wrong: it does not contain such a feature. Anyway, if I am wrong, please correct me.
Thanks.
Sunny

prakash

Aug 9th, 2006

Hi Sunny,
what you said is right.

Sachin

Sep 4th, 2007

There are many ways to improve the performance of a graph. It also depends on the particular graph and the components used in it.
In general, the following tips can be used to improve performance:
1> Try to use partitioning in the graph.
2> Try to minimize the number of components.
3> Use lookups where possible for better efficiency.
4> Components like Join/Rollup should have the option ‘Input must be sorted’ selected if they are placed after a Sort component.
5> If a component has the ‘In memory: Input need not be sorted’ option selected, set its MAX_CORE parameter value appropriately.
6> Use phasing in the graph efficiently.
7> In all graphs that use RDBMS tables as input, ensure that the join condition is on indexed columns.
8> Try to perform sort or aggregation of the source tables' data on the database server itself, instead of in AbInitio.
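Tip 4 works because a join whose inputs are already sorted on the key can walk both inputs forward in a single pass and only needs to look at the current key group, instead of holding an entire input in memory. A minimal Python sketch of such a sorted (merge) join, assuming both inputs are lists of `(key, value)` tuples already sorted by key; the name `merge_join` is hypothetical and only inner-join behaviour is shown.

```python
from typing import Iterator, List, Tuple


def merge_join(
    left: List[Tuple[object, object]], right: List[Tuple[object, object]]
) -> Iterator[Tuple[object, object, object]]:
    """Inner join of two inputs that are already sorted on the key."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the end of the current key group on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left record with this key with every matching right record.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    yield lkey, left[i][1], right[k][1]
                i += 1
            j = j_end
```

If the inputs were not sorted, this logic would silently miss matches, which is why the ‘Input must be sorted’ option is only safe after a Sort (or on data known to be sorted).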

There are many ways the performance of a graph can be improved:
1) Use a limited number of components in a particular phase.
2) Use optimum MAX_CORE values for Sort and Join components.
3) Minimize the number of Sort components.
4) Minimize the use of sorted Join components and, where possible, replace them with in-memory (hash) joins.
5) Carry only the required fields through Sort, Reformat and Join components.
6) Use phasing or flow buffers with Merge and sorted Join components.
7) If both inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port.
8) For large datasets, don't use Broadcast as a partitioner.
9) Minimize the use of regular expression functions such as re_index in transform functions.
10) Avoid unnecessary repartitioning of data.
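Tips 4 and 7 contrast a sorted join with an in-memory (hash) join. A hash join loads the smaller input into an in-memory table and streams the larger input through it, so neither side has to be sorted; that is why choosing the driving port matters. The Python sketch below assumes the "driving" input is the one that is streamed and the other is held in memory; `hash_join` is a hypothetical name, inputs are `(key, value)` tuples, and only inner-join behaviour is shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def hash_join(
    driving: Iterable[Tuple[Hashable, object]],
    in_memory: Iterable[Tuple[Hashable, object]],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Inner join: load the smaller input into memory, stream the driving input."""
    table: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in in_memory:
        table[key].append(value)
    for key, value in driving:
        # .get avoids inserting empty groups for keys with no match.
        for other in table.get(key, []):
            yield key, value, other
```

Memory use grows with the in-memory side only, so picking the large input as the driving one keeps the table small; when both inputs are huge, neither fits, and the sorted join from tip 7 is the better choice.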

In addition to the cases mentioned above, we can include a few more:
1. If a Sort component is used and the next Sort component, two or three components later, starts with the same sort keys, then instead of sorting again it is preferable to use the Sort within Groups component, specifying those keys as major keys and the remaining keys as minor keys. In this case the component assumes that the data is already sorted on the major keys and only needs to sort on the minor keys.
E.g., the first Sort component uses keys a, b, c, and a second Sort component two or three components later (in the same flow) uses keys a, b, e, f. In that case, use Sort within Groups for the second sort, with a, b as major keys and e, f as minor keys.
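The Sort within Groups idea in point 1 can be sketched in Python: because the records already arrive ordered on the major keys (a, b), only each run of equal major-key values has to be sorted on the minor keys (e, f), which is cheaper than a full re-sort and can be done one group at a time. The field names follow the example above; the function `sort_within_groups` and the dictionary record layout are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Sequence


def sort_within_groups(
    records: Iterable[dict], major: Sequence[str], minor: Sequence[str]
) -> Iterator[dict]:
    """Assumes input is already sorted on `major`; sorts each group on `minor`."""
    major_key = itemgetter(*major)
    minor_key = itemgetter(*minor)
    for _, group in groupby(records, key=major_key):
        # Only one major-key group is held in memory at a time.
        yield from sorted(group, key=minor_key)
```

For input sorted on (a, b), the result matches a full sort on (a, b, e, f), while each call to `sorted` only ever sees a single group.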
2. When splitting records into more than two flows, prefer Reformat rather than the Broadcast component.
3. For combining records from two flows, use the Concatenate component only when the records must follow a specific order. If no order is required, it is preferable to use the Gather component.
4. Instead of chaining many Reformat components one after another, use the output indexes parameter in the first Reformat component and specify the condition there. For detailed information on this concept, refer to Help.
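Points 2 and 4 both come down to evaluating the routing condition once per record and sending each record to exactly one output, rather than copying every record to every flow (as Broadcast does) and filtering each copy downstream. A Python sketch of that contrast; `route`, `broadcast` and `by_amount` are hypothetical names, and the returned index only imitates the role an output index plays in a Reformat, not its actual DML syntax.

```python
from typing import Callable, Iterable, List


def route(
    records: Iterable[dict], n_outputs: int, pick_output: Callable[[dict], int]
) -> List[List[dict]]:
    """Send each record to the single output chosen by pick_output."""
    outputs: List[List[dict]] = [[] for _ in range(n_outputs)]
    for rec in records:
        outputs[pick_output(rec)].append(rec)
    return outputs


def broadcast(records: Iterable[dict], n_outputs: int) -> List[List[dict]]:
    """For contrast: every output receives a full copy of the data."""
    data = list(records)
    return [list(data) for _ in range(n_outputs)]


def by_amount(rec: dict) -> int:
    # Hypothetical condition: 0 = small, 1 = medium, 2 = large amounts.
    if rec["amount"] < 100:
        return 0
    return 1 if rec["amount"] < 1000 else 2
```

With three outputs, `route` moves each record once, while `broadcast` moves three copies of every record that downstream components must then filter.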