For Loop By Index Vs For Loop Through Collection Performance

A user asked:

What is the difference between using

for i = 1 to theArray.count do( ... )

and

for i in theArray do( ... )

Are they the same? Is one faster than the other? Anything else to be aware of?

Answer:

Both forms can do the same job, but they have different strengths and performance characteristics.

The first version gives you an index. If you want to drive a progress bar or use the index to access another array with the same number of elements, it is usually the one to use: you can still get the i-th object, and you also get the number of objects processed so far for free.

FOR EXAMPLE:

   theArray = for i = 1 to 10 collect i
   (
   for i = 1 to theArray.count do
   (
     --assuming prg_bar is a progressbar UI control:
     --prg_bar.value = 100.0*i/theArray.count
     --some more code here...
   )
   )
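
The index also makes it easy to read a second array of the same length in step with the first. A minimal sketch, assuming two hypothetical arrays theNames and theSizes that are not part of the original example:

   (
   --hypothetical sample data, both arrays have the same number of elements:
   theNames = #("Box", "Sphere", "Teapot")
   theSizes = #(10.0, 25.0, 15.0)
   for i = 1 to theNames.count do
   (
     --the same index i addresses the matching element in both arrays:
     format "% of %: % has size %\n" i theNames.count theNames[i] theSizes[i]
   )
   )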

In the second case, you have to add your own local counter variable to keep track of the current index:

FOR EXAMPLE:

   (
   cnt = 0
   for i in theArray do
   (
     cnt+=1
     --prg_bar.value = 100.0*cnt/theArray.count
     --some more code...
   )
   )

Also, the first form lets you loop backwards, which is essential if you are removing elements from the array while processing it. The following example creates an array of 1000 random integers, then a second for loop deletes every element with a value less than 50:

FOR EXAMPLE:

   (
   theArray = for i = 1 to 1000 collect random 1 100
   for i = theArray.count to 1 by -1 where theArray[i] < 50 do
     deleteItem theArray i
   )
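
To see why the direction matters, note that deleteItem shifts every later element down by one index. In a forward loop, the element that moves into the just-deleted slot would never be tested, because the loop has already advanced to the next index. A small sketch with fixed, made-up values so the result can be checked by hand:

   (
   theArray = #(10, 20, 70, 30, 90)
   --counting down: elements shifted by deleteItem have already been visited
   for i = theArray.count to 1 by -1 where theArray[i] < 50 do
     deleteItem theArray i
   --only 70 and 90 remain:
   format "%\n" theArray
   )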

When it comes to speed though, the direct loop through the array elements is faster. Let us create an array of 10000 elements and loop through it 1000 times to get a significant time value to measure. The inner loop itself does not perform any actual work, only loops through the array and accesses the i-th element.

LOOP BY INDEX:

   (
   theArray = for i = 1 to 10000 collect random 1 100
   st = timestamp()
   for j = 1 to 1000 do
     for i = 1 to theArray.count do theArray[i]
   format "% ms\n" (timestamp()-st)
   )

This code executes on a particular machine in 5.6 seconds (+/- 0.1 seconds).

Removing the access to the array element by index (theArray[i]) reduces this time to 4.6 seconds. That saving only applies if you run from 1 to the array count without using the array elements for anything, which is highly improbable.

IN COMPARISON, LOOPING THROUGH THE COLLECTION:

   (
   theArray = for i = 1 to 10000 collect random 1 100
   st = timestamp()
   for j = 1 to 1000 do
     for i in theArray do i
   format "% ms\n" (timestamp()-st)
   )

This executes on the same machine in 4.3 seconds (+/- 0.1 sec.), which is 1.3 times faster. Because the variable i already contains the n-th element, it does not matter whether we actually use the array element value for anything or not.

Now, if you do need the counter for something else, such as progress updates, the first approach is considerably faster than the second: about 2.5 times with the timings measured here (5.6 seconds versus 14.3 seconds).

SLOW:

   (
   theArray = for i = 1 to 10000 collect random 1 100
   st = timestamp()
   for j = 1 to 1000 do
   (
     cnt = 0
     for i in theArray do cnt +=1
   )
   format "% ms\n" (timestamp()-st) 
   )

The above code executes in 14.3 seconds on the same machine because of the memory overhead involved in overwriting the value of the variable cnt 10 million times.

In other words, if the counter variable matters to your code, use the form for i = 1 to theArray.count do(). Otherwise, loop directly through the collection, as it is slightly faster in the general case.
