The Greedy Path
The 'greedy'
approach provides a very efficient strategy for finding
contraction paths for expressions with large numbers of tensors.
It does this by eagerly choosing contractions in three stages:
- Eagerly compute any Hadamard products (in arbitrary order -- this is commutative).
- Greedily contract pairs of remaining tensors, at each step choosing the
pair that maximizes
reduced_size
-- these are generally inner products. - Greedily compute any pairwise outer products, at each step choosing
the pair that minimizes
sum(input_sizes)
.
The cost heuristic reduced_size
is simply the size of the pair of potential
tensors to be contracted, minus the size of the resulting tensor.
The greedy
algorithm has space and time complexity O(n * k)
where n
is the number of input tensors and k
is the maximum number of tensors that
share any dimension (excluding dimensions that occur in the output or in every
tensor). As such, the algorithm scales well to very large sparse contractions
of low-rank tensors, and indeed, often finds the optimal, or close to optimal
path in such cases.
The greedy
functionality is provided by opt_einsum.paths.greedy
,
and is selected by the default optimize='auto'
mode of opt_einsum
for
expressions with many inputs. Expressions of up to a thousand tensors
should still take well less than a second to find paths for.
Optimal Scaling Misses
The greedy algorithm, while inexpensive, can occasionally miss optimal scaling in some circumstances as seen below. The greedy
algorithm prioritizes expressions which remove the largest indices first, in this particular case this is the incorrect choice and it is difficult for any heuristic algorithm to "see ahead" as would be needed here.
It should be stressed these cases are quite rare and by default contract
uses the optimal
path for four and fewer inputs as the cost of evaluating the optimal
path is similar to that of the greedy
path. Similarly, for 5-8 inputs, contract
uses one of the
branching strategies which can find higher quality paths.
M = np.random.rand(35, 37, 59)
A = np.random.rand(35, 51, 59)
B = np.random.rand(37, 51, 51, 59)
C = np.random.rand(59, 27)
path, desc = oe.contract_path('xyf,xtf,ytpf,fr->tpr', M, A, B, C, optimize="greedy")
print(desc)
#> Complete contraction: xyf,xtf,ytpf,fr->tpr
#> Naive scaling: 6
#> Optimized scaling: 5
#> Naive FLOP count: 2.146e+10
#> Optimized FLOP count: 4.165e+08
#> Theoretical speedup: 51.533
#> Largest intermediate: 5.371e+06 elements
#> --------------------------------------------------------------------------------
#> scaling BLAS current remaining
#> --------------------------------------------------------------------------------
#> 5 False ytpf,xyf->tpfx xtf,fr,tpfx->tpr
#> 4 False tpfx,xtf->tpf fr,tpf->tpr
#> 4 GEMM tpf,fr->tpr tpr->tpr
path, desc = oe.contract_path('xyf,xtf,ytpf,fr->tpr', M, A, B, C, optimize="optimal")
print(desc)
#> Complete contraction: xyf,xtf,ytpf,fr->tpr
#> Naive scaling: 6
#> Optimized scaling: 4
#> Naive FLOP count: 2.146e+10
#> Optimized FLOP count: 2.744e+07
#> Theoretical speedup: 782.283
#> Largest intermediate: 1.535e+05 elements
#> --------------------------------------------------------------------------------
#> scaling BLAS current remaining
#> --------------------------------------------------------------------------------
#> 4 False xtf,xyf->tfy ytpf,fr,tfy->tpr
#> 4 False tfy,ytpf->tfp fr,tfp->tpr
#> 4 TDOT tfp,fr->tpr tpr->tpr
So we can see that the greedy
algorithm finds a path which is about 16
times slower than the optimal
one. In such cases, it might be worth using
one of the more exhaustive optimization strategies: 'optimal'
,
'branch-all'
or branch-2
(all of which will find the optimal path in
this example).
Customizing the Greedy Path
The greedy path is a local optimizer in that it only ever assesses pairs of
tensors to contract, assigning each a heuristic 'cost' and then choosing the
'best' of these. Custom greedy approaches can be implemented by supplying
callables to the cost_fn
and choose_fn
arguments of
opt_einsum.paths.greedy
.