Rather than leaving the output to redundant execution, let us write it as a parallel program. Our objective is to obtain the same output as sequential execution, shown below.

2 4 6 8 10 12 14 16 18 20

We declare an array a1 of the same type and shape as array a. Array a is distributed by the align directive, but a1 is not distributed; that is, every node holds all the elements of a1. If all the elements of the distributed array a are collected into a1 and the output statement is executed only by p(1), the output will be the same as for sequential execution.

!$xmp nodes p(2)
!$xmp template t(10)
!$xmp distribute t(block) onto p

integer a(10)
integer a1(10)
!$xmp align a(i) with t(i)

!$xmp loop on t(i)
do i=1,10
	a(i)=i*2
end do

!$xmp gmove
a1(:) = a(:)

!$xmp task on p(1)
write(*,*) a1
!$xmp end task

#pragma xmp nodes p(2)
#pragma xmp template t(0:9)
#pragma xmp distribute t(block) onto p
int a[10];
int a1[10];
#pragma xmp align a[i] with t(i)

#pragma xmp loop on t(i)
for(i=0;i<10;i++)
	a[i] = (i+1)*2;

#pragma xmp gmove
a1[:] = a[:];

#pragma xmp task on p(1)
{
	for(i=0;i<10;i++)
		printf("%d ", a1[i]);
	printf("\n");
}

The gmove construct is used to specify communication. It consists of the gmove directive and the array assignment statement that follows it. The statement takes the form of a Fortran 90 array assignment, but only the form (destination data) = (source data) is allowed; that is, it is limited to a simple assignment involving no arithmetic operations. In this example, all the elements of the distributed array a are communicated to the non-distributed (replicated) array a1. A schematic diagram of the communication pattern (element numbers follow the Fortran version) is shown below.


The gmove construct is simple to specify: only a directive is added to a simple array assignment statement. Depending on how the arrays on the left- and right-hand sides are distributed, however, a variety of communication patterns can be generated automatically.

Note that XcalableMP (XMP) requires all internode communication to be specified explicitly. The XMP compiler can generate the appropriate communication from a simple description, but it never generates communication that has not been specified. This design was chosen so that communication cannot occur where the user does not expect it; such unanticipated communication can cause unexpected drops in performance and make the program difficult to tune. Communication occurs only at the locations the user has specified; everywhere else, statements are executed redundantly (identically) by all executing nodes.