First, we prepare the sequential program. The program below has been shortened to show only the relevant parts, and is given in both Fortran and C. Note that the loop iterations differ slightly between the two versions (the Fortran loop runs from 1 to 10, the C loop from 0 to 9) so that each is easy to express in its own language.
    integer a(10)
    do i=1, 10
      a(i)=i*2
    end do
    write(*,*) a
    int a[10], i;
    for(i=0;i<10;i++)
      a[i] = (i+1)*2;
    for(i=0;i<10;i++)
      printf("%d ", a[i]);
The program can be compiled as-is with an XcalableMP (XMP) compiler and then run. When the program is run on one node, the results, shown below, are the same as those of the original sequential program.
2 4 6 8 10 12 14 16 18 20
What if the program is executed on multiple nodes? The following are the results when the program is executed on two nodes.*1
2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20
In XMP, unless parallelization directives are given, all the nodes run the same program. This is referred to as redundant execution. To change redundant execution into parallel execution, we (1) distribute the data (data mapping), (2) distribute the computing load (work mapping), and (3) describe the required communication and synchronization.
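To make steps (1) and (2) concrete, the following plain C sketch shows what a block distribution of the ten-element array over the nodes amounts to. It does not use XMP directives; the helper names block_range and fill_owned are hypothetical, and the ceiling-based block size is an assumption made for illustration rather than a statement of how an XMP compiler maps data.

```c
#include <assert.h>

/* Hypothetical helper (not an XMP API): compute the half-open range
   [*lo, *hi) of global indices owned by node `rank` when `n` elements
   are split into contiguous blocks over `nodes` nodes.
   Assumption: block size is ceil(n / nodes). */
void block_range(int n, int nodes, int rank, int *lo, int *hi)
{
    assert(nodes > 0 && rank >= 0 && rank < nodes);
    int chunk = (n + nodes - 1) / nodes;   /* ceil(n / nodes) */
    int start = rank * chunk;
    int end = start + chunk;
    if (start > n) start = n;              /* trailing nodes may own nothing */
    if (end > n) end = n;
    *lo = start;
    *hi = end;
}

/* Work mapping sketch: a node runs only the iterations whose elements
   it owns, instead of the whole loop. Returns the number of elements
   this node wrote. */
int fill_owned(int *a, int n, int nodes, int rank)
{
    int lo, hi, i;
    block_range(n, nodes, rank, &lo, &hi);
    for (i = lo; i < hi; i++)
        a[i] = (i + 1) * 2;
    return hi - lo;
}
```

With n = 10 and two nodes, node 0 owns indices 0 to 4 and node 1 owns indices 5 to 9, so each node performs five of the ten iterations. Printing the whole array from one place would then require step (3), communication between the nodes, which this sketch leaves out.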
Thus, let us parallelize this program.
*1 The result depends on how the system handles simultaneous output from the two processes to the standard output. Depending on the system, the two outputs might be intermixed, or only the output from one process might appear.