hadoop - Pig Multi-Query Optimization issue


We are running into issues with Pig's multi-query optimizer not working as expected.

As I understand it, the script below should run as 1 MR job, but it runs as 2 jobs on our cluster. I think multi-query optimization should be on by default, so what am I missing here? If I replace the GROUP BY with a FILTER statement, it works as a single MR job.

data = load 'input' as (a:chararray, b:int, c:int);
a = group data by b;
b = group data by c;
store a into 'output1';
store b into 'output2';
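For comparison, the FILTER variant mentioned in the question might look roughly like the sketch below. The filter conditions (`b > 0`, `c > 0`) are assumptions made up for illustration; the question does not say which predicates were actually used.

```pig
-- Sketch of the FILTER variant; the predicates are hypothetical.
data = load 'input' as (a:chararray, b:int, c:int);

-- Two independent filters over the same input replace the two GROUP BYs.
a = filter data by b > 0;
b = filter data by c > 0;

-- Two STORE statements in one script are what make this a multi-query script.
store a into 'output1';
store b into 'output2';
```

According to the question, this form is the one that ran as a single MR job on the cluster.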

I'm using the CDH-packaged Pig 0.1.0 and Hadoop 2.0.0.

If 0.1.0 is the real version of your Pig installation, it's very old. The latest version is 0.11.1.

See the performance page in the 0.11.1 docs: http://pig.apache.org/docs/r0.11.1/perf.html
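One way to check how many MapReduce jobs Pig plans for the whole script is the EXPLAIN operator with its `-script` option, run from the Grunt shell. This is only a sketch: `multiquery.pig` is a hypothetical file name for the script from the question, and the exact layout of the output varies between Pig versions.

```pig
-- Run inside the Grunt shell.
-- 'multiquery.pig' is a hypothetical file holding the script from the question.
-- Prints the logical, physical and MapReduce plans for the whole script,
-- so the number of planned MR jobs can be read off the MapReduce plan.
explain -script multiquery.pig;
```

Multi-query execution is on by default and is turned off with the `-M` (or `-no_multiquery`) command-line flag, so it is also worth checking that the script is not being launched with that flag.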

