hadoop - Pig multi-query optimization issue
We are running into issues with Pig's multi-query optimizer not working as expected.
As I understand it, the script below should run as 1 MR job, but it runs as 2 jobs on our cluster. I think multi-query optimization should be on by default, so what am I missing here? If I replace the GROUP BY with a "filter" statement, it works as 1 single MR job.
data = load 'input' as (a:chararray, b:int, c:int);
a = group data by b;
b = group data by c;
store a into 'output1';
store b into 'output2';
I'm using the CDH-packaged Pig 0.1.0 and Hadoop 2.0.0.
If 0.1.0 is the real version of your Pig installation, it's very old. The latest version is 0.11.1.
See the performance page in the 0.11.1 docs: http://pig.apache.org/docs/r0.11.1/perf.html
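To see how many MR jobs Pig actually plans for the script, one option is to explain the whole script from the Grunt shell rather than a single alias. This is only a sketch: it assumes the script from the question is saved as script.pig (a hypothetical file name), and the output format can differ between Pig versions.

```pig
-- Run inside the Grunt shell.
-- script.pig is a hypothetical file holding the script from the question.
-- Prints the logical, physical and MapReduce plans for the whole script;
-- the MapReduce plan section shows how many jobs would be launched.
explain -script script.pig;
```

For comparison, the linked performance page describes turning multi-query execution off with pig -M script.pig (or pig -no_multiquery script.pig). If the job count is the same with and without that flag, the two GROUP BY branches are probably not being merged on your version.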