I ran several English-to-Chinese experiments to compare CMERT and MERT, using the same Moses installation on the same computer:
Experiment 1, 20 features (2 phrase tables and 2 language models):
CMERT: MERT=0.304357; TEST=0.3801
MERT: MERT=0.310064; TEST=0.3933
Experiment 2, 15 features (1 phrase table and 2 language models):
CMERT: MERT=0.297084; TEST=0.3541
MERT: MERT=0.302378; TEST=0.3544
Experiment 3, 16 features (1 phrase table, 2 language models, and lattice input):
CMERT: MERT=0.305125; TEST=0.3668
MERT: MERT=0.302637; TEST=0.3754
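In all three experiments MERT gave the higher TEST score, although the gap in Experiment 2 is only 0.0003, and in Experiment 3 CMERT had the higher MERT score. Based on these results, MERT works better than CMERT.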
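
For anyone who wants to check the gaps, here is a small Python sketch that computes the MERT-minus-CMERT difference for each experiment from the numbers above. It assumes that "MERT=" is the score reported at the end of tuning and "TEST=" is the score on the held-out test set. The names `RESULTS` and `score_gaps` are made up for illustration and are not part of Moses.

```python
# Scores copied from the three experiments above.
# "tune" = the value reported as MERT=, "test" = the value reported as TEST=
# (assumed to be tuning-set and test-set scores respectively).
RESULTS = {
    "Experiment 1": {
        "CMERT": {"tune": 0.304357, "test": 0.3801},
        "MERT": {"tune": 0.310064, "test": 0.3933},
    },
    "Experiment 2": {
        "CMERT": {"tune": 0.297084, "test": 0.3541},
        "MERT": {"tune": 0.302378, "test": 0.3544},
    },
    "Experiment 3": {
        "CMERT": {"tune": 0.305125, "test": 0.3668},
        "MERT": {"tune": 0.302637, "test": 0.3754},
    },
}


def score_gaps(results):
    """Return {experiment: (tune_gap, test_gap)} as MERT minus CMERT."""
    gaps = {}
    for name, systems in results.items():
        tune_gap = systems["MERT"]["tune"] - systems["CMERT"]["tune"]
        test_gap = systems["MERT"]["test"] - systems["CMERT"]["test"]
        # Round to the precision of the reported scores.
        gaps[name] = (round(tune_gap, 6), round(test_gap, 4))
    return gaps


if __name__ == "__main__":
    for name, (tune_gap, test_gap) in score_gaps(RESULTS).items():
        print(f"{name}: tune gap {tune_gap:+.6f}, test gap {test_gap:+.4f}")
```

A positive gap means MERT scored higher. These are single runs, so a small gap such as the TEST difference in Experiment 2 may be within run-to-run variation.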