Discussion:
Proposal for Data.Graph : Improve graph creation complexity when nodes have "consecutive" keys
Olivier S.
2018-04-02 11:30:38 UTC
Permalink
Hello,

I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.

- Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
lookups and graph creation when keys are Integral and consecutive.

(The related PR for proposal I is [1], including benchmarks showing the
performance improvements.)

Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist
of a binary search on an array, with a time complexity of O(log V) (I will
use V for "Count of vertices", E for "Count of edges").

When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.

Hence, graph creation complexity is improved, and user algorithms using
(key -> Maybe Vertex) lookups will see their complexity reduced by a factor
of up-to O(log V).

The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveAscKeys.

Here is a summary of complexities for (graph creation, lookup function) as
they stand in the current state of the PR:

- graphFromEdges (the currently existing function):
O( (V+E) * log V ), O(log V)
- graphFromEdgesWithConsecutiveKeys (new function):
O( E + (V*log V) ), O(1)
- graphFromEdgesWithConsecutiveAscKeys (new function) :
O( V+E ), O(1)

- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))

If we pass the same key twice in the list we pass to 'graphFromEdges' it is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.

Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.

We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of
graphFromConsecutiveMap.

About the naming, I propose two different schemes:

Either:
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
Or:
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
with these, to reflect the Map / List duality in the naming scheme:
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)

Cheers,
Olivier Sohn

[1] https://github.com/haskell/containers/pull/549
Ivan Lazar Miljenovic
2018-04-02 12:34:56 UTC
Permalink
Post by Olivier S.
Hello,
I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex) lookups
and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing the
performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist
of a binary search on an array, with a time complexity of O(log V) (I will
use V for "Count of vertices", E for "Count of edges").
At the risk of bikeshedding, can you please use |V| and |E| to refer
to the order and size of the graph respectively?
Post by Olivier S.
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.
I'm not sure I follow this part; are you ignoring order in these lists
(you're referring to sets but using list notation)?
Post by Olivier S.
Hence, graph creation complexity is improved, and user algorithms using (key
-> Maybe Vertex) lookups will see their complexity reduced by a factor of
up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveAscKeys.
Here is a summary of complexities for (graph creation, lookup function) as
O( (V+E) * log V ), O(log V)
O( E + (V*log V) ), O(1)
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges' it is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.
Off the top of my head, I'm not a big fan of this. If we're going to
improve this, then I'd prefer to do so in such a way that allowed for
usage with IntMap (is there an existing type-class that covers
association list-style data structures?). Ideally you could also use
HashMap from unordered-containers as well, but since we ultimately
want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
however, is.
Post by Olivier S.
Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of
graphFromConsecutiveMap.
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
--
Ivan Lazar Miljenovic
***@gmail.com
http://IvanMiljenovic.wordpress.com
Ivan Lazar Miljenovic
2018-04-02 22:18:32 UTC
Permalink
2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic
Post by Ivan Lazar Miljenovic
Post by Olivier S.
Hello,
I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex) lookups
and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing the
performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist
of a binary search on an array, with a time complexity of O(log V) (I will
use V for "Count of vertices", E for "Count of edges").
At the risk of bikeshedding, can you please use |V| and |E| to refer
to the order and size of the graph respectively?
Do you mean in the haddock documentation for complexities or here? I don't
know which is mor readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V| ).
Anyway it would be a quick change in the PR, I'm not particularly attached
to the notation.
Both. |V| and |E| are more standard for this, as V and E represent
the vertices and edges themselves.
Post by Ivan Lazar Miljenovic
Post by Olivier S.
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.
I'm not sure I follow this part; are you ignoring order in these lists
(you're referring to sets but using list notation)?
-- (1) keys contains no duplicates
Set.size (Set.fromList keys) == length keys
sort keys == [minimum keys .. maximum keys]
The O(1) lookup is at line 516 of Data/Graph.hs in
https://github.com/haskell/containers/pull/549/files (key_vertex)
Post by Ivan Lazar Miljenovic
Post by Olivier S.
Hence, graph creation complexity is improved, and user algorithms using (key
-> Maybe Vertex) lookups will see their complexity reduced by a factor of
up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys.
Here is a summary of complexities for (graph creation, lookup function) as
O( (V+E) * log V ), O(log V)
O( E + (V*log V) ), O(1)
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges' it is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.
Off the top of my head, I'm not a big fan of this. If we're going to
improve this, then I'd prefer to do so in such a way that allowed for
usage with IntMap
Yes, IntMap seems to be better wrt performances than Map. Quoting the doc of
This data structure performs especially well on binary operations like union
and intersection. However, my benchmarks show that it is also (much) faster
on insertions and deletions when compared to a generic size-balanced map
implementation (see Data.Map).
Post by Ivan Lazar Miljenovic
(is there an existing type-class that covers
association list-style data structures?).
https://hackage.haskell.org/package/collections-api-1.0.0.0/docs/Data-Collections.html#g:4
Except that it's in another library ;-)
With instances defined here, but only for Lazy versions Data.Map and
Note that the data structures for the Lazy and Strict variants of
[Int]Map are the same, it's just the strictness of the functions that
operate on them that differ.
https://hackage.haskell.org/package/collections-base-instances-1.0.0.0/docs/Data-Collections-BaseInstances.html
Also, the type-class class doesn't have toAscList (or toList) functions
(which is what we would use in the implementation).
So if we want to rely on this we would need to implement toAscList, and
probably add instances for Strict maps (Data.IntMap.Strict, Data.Map.Strict)
Post by Ivan Lazar Miljenovic
Ideally you could also use
HashMap from unordered-containers as well, but since we ultimately
want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
however, is.
I see another problem with HashMap : it doesn't provide a toAscList function
where the keys are sorted, so we would have to sort them, incurring a fixed
O(V log V) cost, whereas with Map and IntMap the user has the possibility to
create the map from an ascending list (fromAscList), in O(V) time and we can
get the list back also (toAscList) in O(V) time.
Post by Ivan Lazar Miljenovic
Post by Olivier S.
Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of
graphFromConsecutiveMap.
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
--
Ivan Lazar Miljenovic
***@gmail.com
http://IvanMiljenovic.wordpress.com
Olivier S.
2018-04-02 23:24:35 UTC
Permalink
2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic
Post by Ivan Lazar Miljenovic
Post by Olivier S.
Hello,
I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex) lookups
and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing
the
Post by Ivan Lazar Miljenovic
Post by Olivier S.
performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist
of a binary search on an array, with a time complexity of O(log V) (I will
use V for "Count of vertices", E for "Count of edges").
At the risk of bikeshedding, can you please use |V| and |E| to refer
to the order and size of the graph respectively?
Do you mean in the haddock documentation for complexities or here? I
don't
know which is mor readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V|
).
Anyway it would be a quick change in the PR, I'm not particularly
attached
to the notation.
Both. |V| and |E| are more standard for this, as V and E represent
the vertices and edges themselves.
Post by Ivan Lazar Miljenovic
Post by Olivier S.
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7]
or
Post by Ivan Lazar Miljenovic
Post by Olivier S.
[5,6,4,7]), we can have an O(1) lookup by substracting the value of
the
Post by Ivan Lazar Miljenovic
Post by Olivier S.
smallest key, and checking for bounds.
I'm not sure I follow this part; are you ignoring order in these lists
(you're referring to sets but using list notation)?
-- (1) keys contains no duplicates
Set.size (Set.fromList keys) == length keys
sort keys == [minimum keys .. maximum keys]
The O(1) lookup is at line 516 of Data/Graph.hs in
https://github.com/haskell/containers/pull/549/files (key_vertex)
Post by Ivan Lazar Miljenovic
Post by Olivier S.
Hence, graph creation complexity is improved, and user algorithms
using
Post by Ivan Lazar Miljenovic
Post by Olivier S.
(key
-> Maybe Vertex) lookups will see their complexity reduced by a factor of
up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys.
Here is a summary of complexities for (graph creation, lookup
function)
Post by Ivan Lazar Miljenovic
Post by Olivier S.
as
O( (V+E) * log V ), O(log V)
O( E + (V*log V) ), O(1)
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges'
it
Post by Ivan Lazar Miljenovic
Post by Olivier S.
is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.
Off the top of my head, I'm not a big fan of this. If we're going to
improve this, then I'd prefer to do so in such a way that allowed for
usage with IntMap
Yes, IntMap seems to be better wrt performances than Map. Quoting the
doc of
This data structure performs especially well on binary operations like
union
and intersection. However, my benchmarks show that it is also (much)
faster
on insertions and deletions when compared to a generic size-balanced map
implementation (see Data.Map).
Post by Ivan Lazar Miljenovic
(is there an existing type-class that covers
association list-style data structures?).
https://hackage.haskell.org/package/collections-api-1.0.0.
0/docs/Data-Collections.html#g:4
Except that it's in another library ;-)
So it seems using Data.IntMap would be a good compromise?
With instances defined here, but only for Lazy versions Data.Map and
Note that the data structures for the Lazy and Strict variants of
[Int]Map are the same, it's just the strictness of the functions that
operate on them that differ.
That's interesting, I wasn't aware of this.
https://hackage.haskell.org/package/collections-base-
instances-1.0.0.0/docs/Data-Collections-BaseInstances.html
Also, the type-class class doesn't have toAscList (or toList) functions
(which is what we would use in the implementation).
So if we want to rely on this we would need to implement toAscList, and
probably add instances for Strict maps (Data.IntMap.Strict,
Data.Map.Strict)
Post by Ivan Lazar Miljenovic
Ideally you could also use
HashMap from unordered-containers as well, but since we ultimately
want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
however, is.
I see another problem with HashMap : it doesn't provide a toAscList
function
where the keys are sorted, so we would have to sort them, incurring a
fixed
O(V log V) cost, whereas with Map and IntMap the user has the
possibility to
create the map from an ascending list (fromAscList), in O(V) time and we
can
get the list back also (toAscList) in O(V) time.
there
Post by Ivan Lazar Miljenovic
Post by Olivier S.
is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in
favor
Post by Ivan Lazar Miljenovic
Post by Olivier S.
of
graphFromConsecutiveMap.
- graphFromEdges (takes a List, deprecated,
existing
Post by Ivan Lazar Miljenovic
Post by Olivier S.
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive
keys)
Post by Ivan Lazar Miljenovic
Post by Olivier S.
- graphFromEdges (takes a List, deprecated,
existing
Post by Ivan Lazar Miljenovic
Post by Olivier S.
function)
- graphFromMap
- graphFromConsecutiveMap
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
Ivan Lazar Miljenovic
2018-04-02 23:51:25 UTC
Permalink
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.

My preferences are:

* A typeclass (unfortunately Foldable for a Map is only on the values,
not on the Key; unless you provide a wrapper?)
* Lists (despite the issues you've pointed out, people can always
convert Maps, IntMaps, etc. to lists to convert those values to a
Graph)
* Map
--
Ivan Lazar Miljenovic
***@gmail.com
http://IvanMiljenovic.wordpress.com
Olivier S.
2018-04-03 00:11:23 UTC
Permalink
Post by Ivan Lazar Miljenovic
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.
I was under the impression that we can replace [(node, key, [key])], by
IntMap (node, [Int]), node being anything we want. Is it not true?
Post by Ivan Lazar Miljenovic
* A typeclass (unfortunately Foldable for a Map is only on the values,
not on the Key; unless you provide a wrapper?)
* Lists (despite the issues you've pointed out, people can always
convert Maps, IntMaps, etc. to lists to convert those values to a
Graph)
* Map
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
Ivan Lazar Miljenovic
2018-04-03 00:54:11 UTC
Permalink
Post by Olivier S.
Post by Ivan Lazar Miljenovic
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.
I was under the impression that we can replace [(node, key, [key])], by
IntMap (node, [Int]), node being anything we want. Is it not true?
I've never used Data.Graph in anger, but my understanding is that in
the most polymorphic sense you may consider the equivalent of `Map
node [key]` along with a `node -> key` function. For example: `data
MyNode key = MyNode { nodeID :: key, edges :: [key]}`; then you could
have `graphFromEdges . map (\mn -> (mn, nodeID mn, edges mn))`.

At this point in time the actual type of `node` is no longer useful,
so having `graphFromEdges` consume a `Map key [key]` may suffice...
but then you can't get back the original node type to map back to your
original values.
--
Ivan Lazar Miljenovic
***@gmail.com
http://IvanMiljenovic.wordpress.com
Olivier S.
2018-04-03 07:31:17 UTC
Permalink
I think in your example, nodes are `()` by the current meaning of node in
graphFromEdges.

To be closer to what graphFromEdges does today, we should change it to:

`data MyNode key nodeData = MyNode nodeData key [key]`
Post by Ivan Lazar Miljenovic
2018-04-03 1:51 GMT+02:00 Ivan Lazar Miljenovic <
Post by Ivan Lazar Miljenovic
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.
I was under the impression that we can replace [(node, key, [key])], by
IntMap (node, [Int]), node being anything we want. Is it not true?
I've never used Data.Graph in anger, but my understanding is that in
the most polymorphic sense you may consider the equivalent of `Map
node [key]` along with a `node -> key` function. For example: `data
MyNode key = MyNode { nodeID :: key, edges :: [key]}`; then you could
have `graphFromEdges . map (\mn -> (mn, nodeID mn, edges mn))`.
At this point in time the actual type of `node` is no longer useful,
so having `graphFromEdges` consume a `Map key [key]` may suffice...
but then you can't get back the original node type to map back to your
original values.
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
Ivan Lazar Miljenovic
2018-04-03 07:50:33 UTC
Permalink
Post by Olivier S.
I think in your example, nodes are `()` by the current meaning of node in
graphFromEdges.
`data MyNode key nodeData = MyNode nodeData key [key]`
No, `node ~ MyNode` in my example. Whilst the key may be the unique
identifier, the node type itself may be richer (and contain the key!).
Post by Olivier S.
Post by Ivan Lazar Miljenovic
2018-04-03 1:51 GMT+02:00 Ivan Lazar Miljenovic <
Post by Ivan Lazar Miljenovic
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.
I was under the impression that we can replace [(node, key, [key])], by
IntMap (node, [Int]), node being anything we want. Is it not true?
I've never used Data.Graph in anger, but my understanding is that in
the most polymorphic sense you may consider the equivalent of `Map
node [key]` along with a `node -> key` function. For example: `data
MyNode key = MyNode { nodeID :: key, edges :: [key]}`; then you could
have `graphFromEdges . map (\mn -> (mn, nodeID mn, edges mn))`.
At this point in time the actual type of `node` is no longer useful,
so having `graphFromEdges` consume a `Map key [key]` may suffice...
but then you can't get back the original node type to map back to your
original values.
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
--
Ivan Miljenovic
On mobile; please excuse any tpyos
Olivier S.
2018-04-03 07:58:31 UTC
Permalink
Post by Ivan Lazar Miljenovic
Post by Olivier S.
I think in your example, nodes are `()` by the current meaning of node in
graphFromEdges.
`data MyNode key nodeData = MyNode nodeData key [key]`
No, `node ~ MyNode` in my example. Whilst the key may be the unique
identifier, the node type itself may be richer (and contain the key!).
Just wanted to point out that if the node contains some data which is
neither key not list of neighbours, then `Map key [key]` won't be enough.
Post by Ivan Lazar Miljenovic
Post by Olivier S.
2018-04-03 2:54 GMT+02:00 Ivan Lazar Miljenovic <
Post by Ivan Lazar Miljenovic
2018-04-03 1:51 GMT+02:00 Ivan Lazar Miljenovic <
Post by Ivan Lazar Miljenovic
Post by Olivier S.
So it seems using Data.IntMap would be a good compromise?
IntMap only works if `node ~ Int`; otherwise we lose generality.
I was under the impression that we can replace [(node, key, [key])], by
IntMap (node, [Int]), node being anything we want. Is it not true?
I've never used Data.Graph in anger, but my understanding is that in
the most polymorphic sense you may consider the equivalent of `Map
node [key]` along with a `node -> key` function. For example: `data
MyNode key = MyNode { nodeID :: key, edges :: [key]}`; then you could
have `graphFromEdges . map (\mn -> (mn, nodeID mn, edges mn))`.
At this point in time the actual type of `node` is no longer useful,
so having `graphFromEdges` consume a `Map key [key]` may suffice...
but then you can't get back the original node type to map back to your
original values.
--
Ivan Lazar Miljenovic
http://IvanMiljenovic.wordpress.com
--
Ivan Miljenovic
On mobile; please excuse any tpyos
Olivier S.
2018-04-03 07:42:40 UTC
Permalink
Hello,

While we began discussing Proposal II with Ivan Lazar Miljenovic, I was
expecting first to discuss Proposal I, as my short-term goal is to get the
PR associated with Proposal I to be merged : https://github.com/haskell/
containers/pull/549

So please bring forward any remarks / concerns / approvals, for Proposal I
in priority!

Thank you,

Olivier
Post by Olivier S.
Hello,
I'm resending this proposal, which is simplified w.r.t the first one, and
where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
lookups and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing the
performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges
consist of a binary search on an array, with a time complexity of O(log V)
(I will use V for "Count of vertices", E for "Count of edges").
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7] or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of the
smallest key, and checking for bounds.
Hence, graph creation complexity is improved, and user algorithms using
(key -> Maybe Vertex) lookups will see their complexity reduced by a factor
of up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveA
scKeys.
Here is a summary of complexities for (graph creation, lookup function) as
O( (V+E) * log V ), O(log V)
O( E + (V*log V) ), O(1)
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges' it
is undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
this issue, through the type used.
Also, using a Map makes the implementation a bit more "natural" : there is
no need for sorting by key, as Map.toAscList gives exactly the sorted list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor
of graphFromConsecutiveMap.
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromEdgesInMap (takes a Map)
- graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
- graphFromEdges (takes a List, deprecated, existing
function)
- graphFromMap
- graphFromConsecutiveMap
- graphFromList (takes a List, deprecated, redirects to
graphFromEdges)
- graphFromConsecutiveList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveKeys)
- graphFromConsecutiveAscList (takes a List, deprecated, redirects to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
Loading...