Proposal for Data.Graph : Improve graph creation complexity when nodes have "consecutive" keys

Discussion:

Olivier S.

2018-04-02 11:30:38 UTC

Hello,

I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.

- Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
lookups and graph creation when keys are Integral and consecutive.

(The related PR for proposal I is [1], including benchmarks showing the
performance improvements.)

Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist
of a binary search on an array, with a time complexity of O(log V) (I will
use V for "Count of vertices", E for "Count of edges").

When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.

Hence, graph creation complexity is improved, and user algorithms using
(key -> Maybe Vertex) lookups will see their complexity reduced by a factor
of up-to O(log V).

The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveAscKeys.

Here is a summary of complexities for (graph creation, lookup function) as
they stand in the current state of the PR:

- graphFromEdges (the currently existing function):
O( (V+E) * log V ), O(log V)
- graphFromEdgesWithConsecutiveKeys (new function):
O( E + (V*log V) ), O(1)
- graphFromEdgesWithConsecutiveAscKeys (new function) :
O( V+E ), O(1)

- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))

If we pass the same key twice in the list we pass to 'graphFromEdges' it is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.

Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.

We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of
graphFromConsecutiveMap.

About the naming, I propose two different schemes:

Either:
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
Or:
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
with these, to reflect the Map / List duality in the naming scheme:
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)

Cheers,
Olivier Sohn

[1] https://github.com/haskell/containers/pull/549

Ivan Lazar Miljenovic

2018-04-02 12:34:56 UTC

Permalink

At the risk of bikeshedding, can you please use |V| and |E| to refer
to the order and size of the graph respectively?

Post by Olivier S.
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.

I'm not sure I follow this part; are you ignoring order in these lists
(you're referring to sets but using list notation)?

Post by Olivier S.
Hence, graph creation complexity is improved, and user algorithms using (key
-> Maybe Vertex) lookups will see their complexity reduced by a factor of
up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveAscKeys.
Here is a summary of complexities for (graph creation, lookup function) as
O( (V+E) * log V ), O(log V)
O( E + (V*log V) ), O(1)
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges' it is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.

Off the top of my head, I'm not a big fan of this. If we're going to
improve this, then I'd prefer to do so in such a way that allowed for
usage with IntMap (is there an existing type-class that covers
association list-style data structures?). Ideally you could also use
HashMap from unordered-containers as well, but since we ultimately
want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
however, is.

Post by Olivier S.
Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of
graphFromConsecutiveMap.
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

--
Ivan Lazar Miljenovic
***@gmail.com
http://IvanMiljenovic.wordpress.com

Ivan Lazar Miljenovic

2018-04-02 22:18:32 UTC

Permalink

2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic