For data parallelism, we can use the partition components. For component parallelism, we can use the Replicate component. Similarly, which component(s) can we use for pipeline parallelism?

Questions by prabhupurna   answers by prabhupurna

Showing Answers 1 - 7 of 7 Answers

Abhisek Basu Mullick

  • Jul 19th, 2006
 

When a connected sequence of components on the same branch of a graph executes concurrently, that is called pipeline parallelism.

Components like Reformat, where we distribute the input flow to multiple output flows using the output index depending on some selection criteria and process those output flows simultaneously, create pipeline parallelism.

But components like Sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.
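
The difference between a streaming component and a blocking one can be sketched in plain Python. This is only an illustration, not Ab Initio code: the function names (`reformat`, `blocking_sort`, `writer`) are hypothetical, and generators interleave work in one thread rather than running truly concurrently, so the log models only the order in which records move between stages.

```python
from typing import Iterable, Iterator, List


def reformat(records: Iterable[int], log: List[str]) -> Iterator[int]:
    """Streaming stage: emits each record as soon as it has been read."""
    for r in records:
        log.append(f"reformat read {r}")
        yield r * 10


def blocking_sort(records: Iterable[int], log: List[str]) -> Iterator[int]:
    """Blocking stage: must read all of its input before emitting anything."""
    buffered = sorted(records)  # drains the upstream stage completely
    log.append("sort finished reading")
    yield from buffered


def writer(records: Iterable[int], log: List[str]) -> List[int]:
    """Final stage: writes records in the order they arrive."""
    out = []
    for r in records:
        log.append(f"write {r}")
        out.append(r)
    return out


def run_streaming(data):
    log: List[str] = []
    return writer(reformat(data, log), log), log


def run_with_sort(data):
    log: List[str] = []
    return writer(blocking_sort(reformat(data, log), log), log), log
```

In the streaming run, reads and writes alternate record by record. With the sort in the middle, every read happens before the first write, which is the sense in which a sort breaks the pipeline.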


mukund

  • Dec 22nd, 2006
 

Guys,

this was a very good question.

Before learning Ab Initio or building any graph, you need to know the concepts of parallelism. They are: 1) data parallelism, 2) pipeline parallelism, 3) component parallelism.

Generally, pipeline parallelism is the concept of different components processing data at the same time, each working on a different record.

Let me give you a flow:

input file ------>reformat----->rollup------>filter by expression----->o/p file

                       50th record            25 records                10 records

Put plainly, whenever you run a graph, you can watch the number of records processed on each flow change at the same time; this is the best example of pipeline parallelism.

Hope this suffices.

mukund


Vamsi

  • Feb 18th, 2016
 

Filter by Expression is a component that supports pipeline parallelism.


sudarshan

  • Mar 2nd, 2016
 

You can use components that do not require sorted data (explicit or in-memory sort) to get pipeline parallelism. Components that need sorted data, like Join, Rollup, Merge, Sort, and Partition by Key and Sort, break pipeline parallelism.


mohankrishna

  • Apr 8th, 2016
 

Any flow that has no Sort component will exhibit pipeline parallelism. It is the Ab Initio architecture that does it. For example, suppose there are 10 records in a file and you need to reformat them and then filter the flow. First, Reformat picks up the first record of the file, transforms it, and feeds it to Filter. While Filter is applying its specified condition, Reformat picks up the second record from the input file, transforms it, and keeps it ready to feed to Filter.
However, if you are using component folding, pipeline parallelism won't work. Folding eliminates this concept.
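
The record-by-record handoff described above can be sketched with two Python threads joined by a small queue. This is an analogy only, not how Ab Initio is implemented: `reformat_stage`, `filter_stage`, `run_pipeline`, and the `DONE` sentinel are all hypothetical names, and the uppercase transform stands in for whatever the Reformat would do.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the flow


def reformat_stage(source, out_q):
    """Transform each record and hand it downstream immediately."""
    for rec in source:
        out_q.put(rec.upper())
    out_q.put(DONE)


def filter_stage(in_q, results, keep):
    """Apply the filter condition to records as they arrive."""
    while True:
        rec = in_q.get()
        if rec is DONE:
            break
        if keep(rec):
            results.append(rec)


def run_pipeline(records, keep):
    # maxsize=1 means the reformat stage can prepare the next record
    # while the filter stage is still working on the current one.
    flow = queue.Queue(maxsize=1)
    results = []
    stages = [
        threading.Thread(target=reformat_stage, args=(records, flow)),
        threading.Thread(target=filter_stage, args=(flow, results, keep)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

For example, `run_pipeline(["a", "bb", "c", "dd"], lambda r: len(r) == 2)` keeps only the two-character records after they have been uppercased. Running both stages in a single loop in one thread, with no queue between them, would be the rough analogue of folding.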


Gouse

  • May 26th, 2016
 

Any component that does not need sorted input, because the Sort component breaks pipeline parallelism.

Thanks
Gouse


Mahesh

  • May 26th, 2021
 

For data parallelism, we can use partition components.
For component parallelism: it's when i/p file ---> rollup (MFS layout) ---> output.
Here, in practice, there are 4 copies of the component running in parallel on a 4 path server, crunching away at the same time.
Pipeline parallelism: every component is parallel by default until it has to store data in order to work on it.
E.g. Sort, Rollup, Scan, and Dedup may inhibit pipeline parallelism. But imagine a sort component without key....
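
The multifile-layout example above, where four copies of Rollup each work on their own partition, can be sketched as follows. This is a rough Python analogy under stated assumptions: `partition_by_key`, `rollup`, and `parallel_rollup` are hypothetical names, the records are assumed to be `(key, amount)` pairs, and a thread pool stands in for the four partitions.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(records, ways):
    """Send every record with the same key to the same partition."""
    parts = [[] for _ in range(ways)]
    for key, amount in records:
        parts[zlib.crc32(key.encode()) % ways].append((key, amount))
    return parts


def rollup(partition):
    """Sum the amounts per key within one partition."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_rollup(records, ways=4):
    parts = partition_by_key(records, ways)
    # One copy of the rollup runs per partition.
    with ThreadPoolExecutor(max_workers=ways) as pool:
        partials = list(pool.map(rollup, parts))
    merged = {}
    for totals in partials:
        merged.update(totals)  # a key lives in exactly one partition
    return merged
```

Because the partitioning is by key, each copy of the rollup sees all records for its keys, so the partial results can be merged without any further aggregation.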
