Back to basics: Algorithms - finding the max pairwise product
Once you've acquired some experience using your favorite programming language, doing somme cool apps, even if they are complicated, I think it's always good to go back to the roots and start implementing some algorithms. Why is that? Algorithms have some interesting properties that will require to look at the problem at hand from a different point of view. They will force you to use your language in a different way because you'll need to fulfill those properties. So what are the properties I'm talking about?
It depends, on the problem at hand, on the algorithm, on the data structure. But generally speaking it's about time and space complexity expressed as Big-O. So sometimes you'll give up on nice recursive structure, because the iterative one is faster. Another time you'll use a mutable array instead of immutable list and so on and so forth.
The goal is to start with a very simple problems and learn how it can be solved with F#. Let's try with the first problem.
Problem: Maximum Pairwise Product
Given a sequence of non-negative integers \(a_0, \dots, a_{n-1}\), find the maximum pairwise product, that is, the largest integer that can be obtained by multiplying two different elements from the sequence.
Input Constraints
The number of elements \(n\) of the input is \(2 \le n \le 2\cdot10^5\) and the range of elements spread from \(0 \le a_0, \dots, a_{n-1} \le 10^5\)
Output
Single number - the maximum pairwise product
Samples
Input 1:
1:
|
|
Output 1:
1:
|
|
Input 2:
1:
|
|
Output 2:
1:
|
|
Input 3:
1:
|
|
Output 3:
1:
|
|
Hint: Watch out the Integer overflow when multiplying two numbers together.
This is a very simple problem so many of us would go stright to the right solution but in many more complicated cases it is worth to consider writing brute force algorithm that is sure to work correctly and to give us the right answer. This algorithm will also help us in the debug process where the results of different implementations can be compared.
Brut Force
What the Brute Force would be for this problem? The easiest thing would be to solve it iteratively with two nested loops. The outer loop would go from \(i = 0\) to \(n - 1\) while the inner loop would go from \(i + 1\) to \(n - 1\). This way we can compare the product of every pair of numbers to the last highest product and if its larger, we set the last highest product as the current product, if not we continue until the end of the list. The implementation would look something along these lines:
1: 2: 3: 4: 5: 6: 7: 8: 9: |
|
This is not at all idiomatic to the functional approach but at least it works. The results are as expected for Input 1, 2 and 3 respectively:
1: 2: 3: |
|
Unfortunatelly the time complexity is \(\theta(n^2)\) which is not great at all. But we can do better.
Recursive approach
Instead of using nested loops we may want to test another approach. We can iterate recursively throught the list, taking a maximum number and append it to the empty list that will contain only two maximum elements. Once the maximum number is found in the list, we have to remove it from the list and to iterate again on it until we find another maximum number that we will append to the list containing the first maximum number. The code looks like that:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: |
|
At line 17 we pass to twoMaxiter
function as first parameter, an empty list which is supposed to contain two maximum numbers at the end, and as the second parameter, the input list of all numbers. Then we match on the empty list o
. If the list l'
is emtpty then we return the list o
. If the o
list has two elements, then we are also done, we have our two maximum numbers. Otherwise we take the maximum number from the list l'
we remove it with removeFirst
function (filtering would not work because if the list has two or more equals max numbers we want just to remove the first one) and then we recurse another time appending the max number to the result list o
and passing also the rest of the list l'
. At the end we compute the product of two maximum found numbers.
The results are as expected for Input 1, 2 and 3 respectively:
1: 2: 3: |
|
The time complexity is much better then the brute force algorithm. We have n
operations to find the first max, then we have n
operations to remove the max found number from the end of the list, the next iteration finding the next one max number is n - 1
operation and removing it from the rest of the list is also n - 1
. Roughly speaking it is \(\theta(4n)\). We could avoid removing the second time the maximum number from the rest of the list as we already have our two max numbers but this would be \(\theta(3n)\). Dropping off the constant we can consider it as \(\theta(n)\)
But this problem can be easily solved using just one iteration ever the list of numbers.
Folding the list
A nice approach to this problem would be using the List.fold
function, passing in the empty list state
and the input list containing all the elements. The fold operation moves through all the elements, if the state length is less then 2 we append the current element to the state, otherwise if the state has an element less then the current element we keep the max element in the state and append the current element. At the end we return the state which will hold two maximum elements. At the end we compute the product of two maximum found numbers. The algorithm looks like the following:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: |
|
The results are as expected for Input 1, 2 and 3 respectively:
1: 2: 3: |
|
Estimating the runing time is easy. We iterate just once over the list of n
elements. So it's \(\theta(n)\). The operations on the state holding at most two elements are insignificant.
Note: We could also take the same approach to solve the problem in the recursive algorithm
There is no difference in execution time when the input list are small as in the samples above. But the difference can be significant on very large inputs. Let's consider this
Stress testing and mesuring time on large inputs
Let's consider the constraints. The number of elements n
is between the following range \(2 \le n \le 2\cdot10^5\).
Generating the input is very easy:
1: 2: 3: 4: 5: |
|
This will produce a list of 200000 elements.
The first algorithm to test is the brute force approach
1: 2: 3: 4: |
|
And the result is:
1: 2: 3: 4: 5: 6: |
|
It took more than 4 minutes to compute the result on my computer. Even if I'm doing an extra step of converting the list to an array, this doesn't have much of the influence on the running time.
Let's look at the recursive approach:
1: 2: 3: 4: |
|
And the result is:
1: 2: 3: 4: 5: 6: |
|
As you see, this is much better then the brut foce approach. We improved the runing time from over 4 minuts to 4 seconds. Let's see at the implementation using fold:
1: 2: 3: 4: |
|
And the result is:
1: 2: 3: 4: 5: 6: |
|
So this is really impressive. We improved the running time even better then the previous run going from over 4 seconds to 52 milliseconds. Even if asymptotically the running time of the recursive approach and the one using fold was \(\theta(n)\), the real running time difference is huge.
Conclusion
This simple problem showed how different algorithm implementations can greatly improve/decrease the overal running time. The important points to consider are:
- brute force approach is always a good step to take as a base line for measuring time and correctness of the algorithm. It's often simpler to get it right and unless the runing time is reasonable (not taking hours or even more to run) it is very usefull.
- The asympthotic runing times are not always equal in practice. We saw that the recursive algorithm and the fold have similar runing time expressed as \(\theta(n)\) but in practice there is a huge difference.
- always stress test your implementation under the muximum input you think your algorithm may run under.
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.getMaxPariwiseProduct
val int : value:'T -> int (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.int
--------------------
type int = int32
Full name: Microsoft.FSharp.Core.int
--------------------
type int<'Measure> = int
Full name: Microsoft.FSharp.Core.int<_>
Full name: Microsoft.FSharp.Core.array<_>
val int64 : value:'T -> int64 (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.int64
--------------------
type int64 = System.Int64
Full name: Microsoft.FSharp.Core.int64
--------------------
type int64<'Measure> = int64
Full name: Microsoft.FSharp.Core.int64<_>
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.removeFirst
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.findTwoMax
module List
from Microsoft.FSharp.Collections
--------------------
type List<'T> =
| ( [] )
| ( :: ) of Head: 'T * Tail: 'T list
interface IEnumerable
interface IEnumerable<'T>
member GetSlice : startIndex:int option * endIndex:int option -> 'T list
member Head : 'T
member IsEmpty : bool
member Item : index:int -> 'T with get
member Length : int
member Tail : 'T list
static member Cons : head:'T * tail:'T list -> 'T list
static member Empty : 'T list
Full name: Microsoft.FSharp.Collections.List<_>
Full name: Microsoft.FSharp.Collections.List.isEmpty
Full name: Microsoft.FSharp.Collections.List.max
Full name: Microsoft.FSharp.Collections.List.map
Full name: Microsoft.FSharp.Collections.List.reduce
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.findTwoMaxFold
Full name: Microsoft.FSharp.Collections.List.fold
Full name: Microsoft.FSharp.Collections.List.length
Full name: Microsoft.FSharp.Collections.List.exists
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.r
type Random =
new : unit -> Random + 1 overload
member Next : unit -> int + 2 overloads
member NextBytes : buffer:byte[] -> unit
member NextDouble : unit -> float
Full name: System.Random
--------------------
Random() : unit
Random(Seed: int) : unit
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.n
Full name: backtobasicsalgorithmsfindingthemaxpairwiseproduct.l
Random.Next(maxValue: int) : int
Random.Next(minValue: int, maxValue: int) : int
Full name: Microsoft.FSharp.Core.Operators.pown
member Clone : unit -> obj
member CopyTo : array:Array * index:int -> unit + 1 overload
member GetEnumerator : unit -> IEnumerator
member GetLength : dimension:int -> int
member GetLongLength : dimension:int -> int64
member GetLowerBound : dimension:int -> int
member GetUpperBound : dimension:int -> int
member GetValue : [<ParamArray>] indices:int[] -> obj + 7 overloads
member Initialize : unit -> unit
member IsFixedSize : bool
...
Full name: System.Array
Full name: Microsoft.FSharp.Collections.Array.ofList
static member AddMemoryPressure : bytesAllocated:int64 -> unit
static member CancelFullGCNotification : unit -> unit
static member Collect : unit -> unit + 2 overloads
static member CollectionCount : generation:int -> int
static member GetGeneration : obj:obj -> int + 1 overload
static member GetTotalMemory : forceFullCollection:bool -> int64
static member KeepAlive : obj:obj -> unit
static member MaxGeneration : int
static member ReRegisterForFinalize : obj:obj -> unit
static member RegisterForFullGCNotification : maxGenerationThreshold:int * largeObjectHeapThreshold:int -> unit
...
Full name: System.GC
val int64 : value:'T -> int64 (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.int64
--------------------
type int64 = Int64
Full name: Microsoft.FSharp.Core.int64
--------------------
type int64<'Measure> = int64
Full name: Microsoft.FSharp.Core.int64<_>