This page explains the parallelization of loops using the loop directive.

  • Using the Loop Directive
  • Distributed Parallelism
  • Duplicate Variables

Using the Loop Directive

The loop directive forces the loop it is placed on to be parallelized as the directive instructs. Because the parallelization is forced, the programmer must make sure that the following two conditions are satisfied when using the loop directive:

  1. The target loop has no data dependence or control dependence across iterations; that is, the loop produces the correct result irrespective of the order in which its iterations are executed.
  2. For a distributed array, the node that accesses an array element (the user) and the node that holds that element (the owner) must match.
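
Condition 1 can be made concrete with a small experiment. The following C sketch is only an illustration and does not use any XMP feature: it runs a loop body once in ascending and once in descending order and compares the results. The function names and the array size N are hypothetical, and agreeing on two orders is a necessary sign of order independence, not a proof of it.

```c
#include <assert.h>
#include <string.h>

#define N 8  /* illustrative array size (assumption) */

/* Independent body: a[i] depends only on b[i]. */
void independent_step(int *a, const int *b, int i) { a[i] = 2 * b[i]; }

/* Dependent body: a[i] reads a[i-1], which another iteration writes. */
void dependent_step(int *a, const int *b, int i) { a[i] = a[i - 1] + b[i]; }

/* Runs step for i = 1..N-1 in ascending and then descending order.
   Returns 1 if both orders give identical arrays, 0 otherwise. */
int order_independent(void (*step)(int *, const int *, int))
{
    int b[N], fwd[N], bwd[N];
    assert(step != NULL);
    for (int i = 0; i < N; i++) { b[i] = i + 1; fwd[i] = 0; bwd[i] = 0; }
    for (int i = 1; i < N; i++) step(fwd, b, i);        /* ascending order  */
    for (int i = N - 1; i >= 1; i--) step(bwd, b, i);   /* descending order */
    return memcmp(fwd, bwd, sizeof fwd) == 0;
}
```

Under these assumptions, order_independent(independent_step) returns 1, while order_independent(dependent_step) returns 0 because each iteration reads a value written by its predecessor.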

Let us examine in detail how these conditions apply to distributed arrays accessed within the loop and to duplicate variables.

Distributed Parallelism

The following figure gives an example of a loop directive. The variables accessed within the loop, a and b, only have the i subscript, and therefore satisfy condition 1. To check for condition 2, we compare the locations of the owner and user of the data in the template. With regard to the loop index i,

  • the user of data a(i) is the distribution destination of t(i) (connected by the solid red line)
  • the owner of data a(i) is also the distribution destination of t(i) (connected by the dotted blue line).

Therefore, condition 2 is satisfied. The same is true for data b(i).


For the same program, if the loop is not written as

do i=1,10

but is

do i=2, 9

how can it be parallelized? The range of data used is different, but the user of the data has not changed, so the same loop directive can be used.


Now, what if the subscripts for variables a and b in the loop were changed from i to i+1? In this case, if we use the same expression in the on clause of the loop directive, that is, t(i+1), the loop can still be parallelized with the loop directive.

Bear in mind that, in principle, the expression in the on clause of the loop directive should match the form of the subscripts used in the loop body. When it does, the user of a(i) is the distribution destination of t(i). If the align directive has no shift (that is, it aligns (i) with t(i)), the owner of a(i) is also t(i), which satisfies condition 2.
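
The owner and user comparison described above can also be checked mechanically. The following C sketch is a plain simulation, not XMP code. It assumes 0-based indices, a block distribution with block size ceil(n / nnodes), and an align directive with no shift; the names block_owner and condition2_holds are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: node that owns template element t[i] when an
   n-element template is block-distributed over nnodes nodes.
   Assumes 0-based indices and a block size of ceil(n / nnodes). */
int block_owner(int i, int n, int nnodes)
{
    assert(n > 0 && nnodes > 0 && i >= 0 && i < n);
    int block = (n + nnodes - 1) / nnodes;
    return i / block;
}

/* Hypothetical check of condition 2 for a loop over i = lo..hi whose
   on clause is t[i + on_shift] and whose body accesses a[i + sub_shift],
   where a[j] is aligned with t[j] (no shift in the align directive).
   Returns 1 if user and owner match in every iteration, 0 otherwise. */
int condition2_holds(int lo, int hi, int on_shift, int sub_shift,
                     int n, int nnodes)
{
    for (int i = lo; i <= hi; i++) {
        int user  = block_owner(i + on_shift, n, nnodes);   /* executes iteration i  */
        int owner = block_owner(i + sub_shift, n, nnodes);  /* holds a[i + sub_shift] */
        if (user != owner)
            return 0;
    }
    return 1;
}
```

With 10 elements on 4 nodes, condition2_holds(0, 9, 0, 0, 10, 4) and the narrower range condition2_holds(1, 8, 0, 0, 10, 4) both return 1. For subscript i+1, matching the on clause to the subscript, condition2_holds(0, 8, 1, 1, 10, 4), returns 1, whereas keeping t[i] in the on clause, condition2_holds(0, 8, 0, 1, 10, 4), returns 0 because the last iteration of each block touches an element owned by the next node.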


What do we do if the subscript appearing in the loop does not take a single form? If the same variable is accessed with multiple subscripts, such as a(i) and a(i-1), condition 2 is not satisfied. In general, we need to perform communication before and after the parallel loop, but with clever use of data distribution, the loop directive can sometimes be used with only a small change to the program.
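
To see where condition 2 breaks when both a(i) and a(i-1) appear, the sketch below lists the iterations whose left neighbor lives on another node. It is a plain C simulation rather than XMP code, and it assumes 0-based indices, an on clause of t[i], an unshifted alignment, and a block distribution with block size ceil(n / nnodes). The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: owner of element i under the assumed block
   distribution (0-based indices, block size ceil(n / nnodes)). */
int block_owner(int i, int n, int nnodes)
{
    assert(n > 0 && nnodes > 0 && i >= 0 && i < n);
    return i / ((n + nnodes - 1) / nnodes);
}

/* For a loop over i = 1..n-1 executed on t[i] whose body reads both
   a[i] and a[i-1], records every i for which a[i-1] is owned by a
   different node than the one executing iteration i.
   remote[] must have room for n ints. Returns the number recorded. */
int remote_left_neighbors(int n, int nnodes, int remote[])
{
    int count = 0;
    for (int i = 1; i < n; i++) {
        if (block_owner(i - 1, n, nnodes) != block_owner(i, n, nnodes))
            remote[count++] = i;
    }
    return count;
}
```

For 10 elements on 4 nodes this reports iterations 3, 6, and 9, that is, only the first iteration of each block after the first. Under these assumptions the mismatch is confined to block boundaries, which suggests why a small amount of communication around the loop can be enough.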

Duplicate Variables

Scalar variables (nonarray variables) and arrays that have not been declared using the align directive are duplicate variables. Duplicate variables are assigned to all execution nodes and always satisfy parallelization condition 2, irrespective of the way the on clause of the loop directive is written.

Duplicate variables are typically used in loops in the following three ways: (a) they are only referenced in the loop and never assigned, so their values are the same before and after the loop; (b) they are used only within each iteration of the loop and are not referenced after the loop is executed; and (c) they accumulate a value over the iterations, a pattern referred to as reduction. The variables described in (a) and (b) can be used inside parallel loops designated with a loop directive. In contrast, under pattern (c), the reduction variable carries a data dependence across iterations. Because of this, the loop directive described so far cannot be used alone to parallelize the loop. For details on reduction, see this page.

  • (a) v and k are only referenced
v=...
k=...
do i=...
	a(i)=b(i+k)*v
	...
end do

v=...
k=...
for(i=... ){
	a[i] = b[i+k]*v;
	...
}
  • (b) Temporary variable tmp
do i=...
	...
	tmp=a(i)+...
	b(i)=tmp
end do

for(i=...  ){
	...
	tmp=a[i]+...
	b[i]=tmp;
}
  • (c) Reduction variable s
s=0.0
do i=...
	...
	s=s+a(i)
end do

s=0.0
for(i=...  ){
	...
	s=s+a[i];
}
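
Pattern (c) differs from (a) and (b) because each node only sees its own iterations. The following C sketch simulates this in plain C and is not XMP code: each node sums only the elements it would own, so its copy of s holds a partial result until the partial sums are combined. It assumes 0-based indices and a block distribution with block size ceil(n / nnodes). The function names are hypothetical, and the combining loop merely stands in for whatever mechanism the reduction page describes.

```c
#include <assert.h>

/* Partial sum that node `node` would compute if the loop were simply
   split by ownership. Assumes a block distribution with block size
   ceil(n / nnodes) and 0-based indices. */
double local_partial_sum(int node, const double a[], int n, int nnodes)
{
    assert(n > 0 && nnodes > 0 && node >= 0 && node < nnodes);
    int block = (n + nnodes - 1) / nnodes;
    int lo = node * block;                       /* first owned index     */
    int hi = lo + block < n ? lo + block : n;    /* one past the last one */
    double s = 0.0;
    for (int i = lo; i < hi; i++)
        s = s + a[i];
    return s;
}

/* Combines the per-node partial sums into the value a sequential loop
   would produce. */
double combined_sum(const double a[], int n, int nnodes)
{
    double s = 0.0;
    for (int node = 0; node < nnodes; node++)
        s = s + local_partial_sum(node, a, n, nnodes);
    return s;
}
```

For a = 1, 2, ..., 10 on 4 nodes, the partial sums are 6, 15, 24, and 10. No single node holds the sequential result 55 until the partial sums are combined.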