Discussion:
Proposal: Remove Semigroup and Monoid instances for Data.Map, Data.IntMap, Data.HashMap
David Feuer
2018-02-13 19:33:45 UTC
Permalink
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
Andrew Martin
2018-02-13 19:43:10 UTC
Permalink
I am strongly in favor of this proposal. I've actually proposal the same
thing (with additional motivations) on the GHC Trac:
https://ghc.haskell.org/trac/ghc/ticket/1460

I also agree the Monoid instance should be strict in the values. I'm not
sure that the Monoidless Map phase needs to last four or five years (I was
thinking more like two or three years), but that discussion could happen
later.
Post by David Feuer
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
--
-Andrew Thaddeus Martin
John Wiegley
2018-02-13 20:23:14 UTC
Permalink
AM> I am strongly in favor of this proposal. I've actually proposal the same
AM> thing (with additional motivations) on the GHC Trac: https://
AM> ghc.haskell.org/trac/ghc/ticket/1460

+1 here too
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
Chris Wong
2018-02-13 19:51:16 UTC
Permalink
Hi David,

On Feb 14, 2018 08:34, "David Feuer" <***@gmail.com> wrote:

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.


A third alternative is to "poison" the instances by placing TypeError in
their constraints. This would prevent users from writing orphan instances,
as they would overlap with the poisoned ones.

On other Haskell implementations (if we still support them), we can use a
private class with no instances for this purpose.

See
https://hackage.haskell.org/package/base-4.10.1.0/docs/GHC-TypeLits.html#t:TypeError

Chris


Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
Mario Blažević
2018-02-13 19:54:13 UTC
Permalink
+1. But whatever happened to your proposal from last May? I don't think
there were any objections to it. Would the two proposals be combined, or
have you decided to drop the previous one?

https://mail.haskell.org/pipermail/libraries/2017-May/028036.html
A lot of people have wrappers around Data.Map and Data.IntMap to give
them more useful (Semigroup and) Monoid instances. I'd like to add such
wrappers to containers. What we need to be able to do that are *names*
for the new modules. I can't think of any, so I'm reaching out to the
list. Please suggest names! Another question is whether we should take
the opportunity of new modules to modernize and streamline the API a
bit. I'd like, at least, to separate "safe" from "unsafe" functions,
putting the unsafe ones in .Unsafe modules.
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
David Feuer
2018-02-13 19:59:42 UTC
Permalink
I don't think the old proposal was really the right way.
Post by Mario Blažević
+1. But whatever happened to your proposal from last May? I don't think
there were any objections to it. Would the two proposals be combined, or
have you decided to drop the previous one?
https://mail.haskell.org/pipermail/libraries/2017-May/028036.html
A lot of people have wrappers around Data.Map and Data.IntMap to give
them more useful (Semigroup and) Monoid instances. I'd like to add such
wrappers to containers. What we need to be able to do that are *names*
for the new modules. I can't think of any, so I'm reaching out to the
list. Please suggest names! Another question is whether we should take
the opportunity of new modules to modernize and streamline the API a
bit. I'd like, at least, to separate "safe" from "unsafe" functions,
putting the unsafe ones in .Unsafe modules.
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Daniel Cartwright
2018-02-13 20:07:00 UTC
Permalink
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old
(or two to three year-old) code is OK.
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old
(or two to three year-old) code is OK.
Post by David Feuer
I don't think the old proposal was really the right way.
Post by Mario Blažević
+1. But whatever happened to your proposal from last May? I don't think
there were any objections to it. Would the two proposals be combined, or
have you decided to drop the previous one?
https://mail.haskell.org/pipermail/libraries/2017-May/028036.html
A lot of people have wrappers around Data.Map and Data.IntMap to give
them more useful (Semigroup and) Monoid instances. I'd like to add such
wrappers to containers. What we need to be able to do that are *names*
for the new modules. I can't think of any, so I'm reaching out to the
list. Please suggest names! Another question is whether we should take
the opportunity of new modules to modernize and streamline the API a
bit. I'd like, at least, to separate "safe" from "unsafe" functions,
putting the unsafe ones in .Unsafe modules.
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v)
where
Post by Mario Blažević
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Kris Nuttycombe
2018-02-13 20:16:30 UTC
Permalink
Given that we're talking about a years-long process, what about using the
newtype approach as an intermediate step, so that we don't get additional
proliferation of third-party newtype wrappers in the interim? Deprecate the
current instances and add instances using newtype wrappers at the same
type, such that a newtype'd version of the current functionality is
available to folks as a migration path, as well as a newtype for which the
instances provide the correct behavior. Then, in the future, the newtypes
for the corrected behavior will eventually become superfluous, and for
those who rely on the newtypes that represent the current behavior, they
won't have to change anything.
Post by Daniel Cartwright
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old
(or two to three year-old) code is OK.
Post by Daniel Cartwright
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old
(or two to three year-old) code is OK.
Post by David Feuer
I don't think the old proposal was really the right way.
Post by Mario Blažević
+1. But whatever happened to your proposal from last May? I don't think
there were any objections to it. Would the two proposals be combined,
or
Post by Mario Blažević
have you decided to drop the previous one?
https://mail.haskell.org/pipermail/libraries/2017-May/028036.html
A lot of people have wrappers around Data.Map and Data.IntMap to give
them more useful (Semigroup and) Monoid instances. I'd like to add
such
Post by Mario Blažević
wrappers to containers. What we need to be able to do that are *names*
for the new modules. I can't think of any, so I'm reaching out to the
list. Please suggest names! Another question is whether we should take
the opportunity of new modules to modernize and streamline the API a
bit. I'd like, at least, to separate "safe" from "unsafe" functions,
putting the unsafe ones in .Unsafe modules.
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v)
where
Post by Mario Blažević
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
David Feuer
2018-02-13 20:34:00 UTC
Permalink
I don't want to maintain a whole package of newtype-wrapped everything
for this, but I gather that Ben Gamari already has one people can use
if they like. Now that Semigroup is in base, I think Kris's idea of
using newtype wrappers *permanently* might be a good idea. That way we
can have

newtype UnionCombine k v = UnionCombine (Map k v)
UnionCombine xs <> UnionCombine ys = UnionCombine $ unionWith (<>) xs ys

newtype IntersectionCombine k v = IntersectionCombine (Map k v)
IntersectionCombine xs <> IntersectionCombine ys = IntersectionCombine
$ intersectionWith (<>) xs ys

We can add such newtypes (by whatever names people prefer)
immediately, and kick the question of whether to ever reinstate the
instances down the road.

On Tue, Feb 13, 2018 at 3:16 PM, Kris Nuttycombe
Post by Kris Nuttycombe
Given that we're talking about a years-long process, what about using the
newtype approach as an intermediate step, so that we don't get additional
proliferation of third-party newtype wrappers in the interim? Deprecate the
current instances and add instances using newtype wrappers at the same type,
such that a newtype'd version of the current functionality is available to
folks as a migration path, as well as a newtype for which the instances
provide the correct behavior. Then, in the future, the newtypes for the
corrected behavior will eventually become superfluous, and for those who
rely on the newtypes that represent the current behavior, they won't have to
change anything.
Post by Daniel Cartwright
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old (or
two to three year-old) code is OK.
Post by Daniel Cartwright
I am strongly in favour of this as well. I do not like the newtype
approach, I think backwards-*incompatibility* with four to five year-old (or
two to three year-old) code is OK.
Post by David Feuer
I don't think the old proposal was really the right way.
Post by Mario Blažević
+1. But whatever happened to your proposal from last May? I don't think
there were any objections to it. Would the two proposals be combined, or
have you decided to drop the previous one?
https://mail.haskell.org/pipermail/libraries/2017-May/028036.html
A lot of people have wrappers around Data.Map and Data.IntMap to give
them more useful (Semigroup and) Monoid instances. I'd like to add such
wrappers to containers. What we need to be able to do that are *names*
for the new modules. I can't think of any, so I'm reaching out to the
list. Please suggest names! Another question is whether we should take
the opportunity of new modules to modernize and streamline the API a
bit. I'd like, at least, to separate "safe" from "unsafe" functions,
putting the unsafe ones in .Unsafe modules.
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k
v)
where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Kris Nuttycombe
2018-02-13 19:56:34 UTC
Permalink
I actually had this very issue bite me *in an interview*. I would love
these instances to be removed, and probably not replaced directly -
instead, I think we should take the `Sum`/`Product` approach, and provide
instances for newtypes around these type constructors so that there can be
no backwards-incompatible-semantics issues.
Post by David Feuer
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Herbert Valerio Riedel
2018-02-13 22:28:37 UTC
Permalink
David,


On 2018-02-13 at 14:33:45 -0500, David Feuer wrote:

[...]
Post by David Feuer
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
Why does this need to be a such a dreadfully long-winded process? All we
need is a way to somehow signal a semantic change in the exposed API --
and it turns out that we actually already have the technology for this!

This is exactly one of the core ideas of "semantic versioning" (as a
dep-solver-computation-friendly proxy for machine-checkable formal API
contracts...) and it's got even "semantic" in its name!


In other words, the proposal can be greatly simplified to


1. Make a new major release (or maybe even make it a super-major
release, i.e. to `containers-1.0.0.0` for extra saliency) replacing
the instances with the more desirable ones; describe the change
prominently in the changelog.


Profit!


Life's short; do we really need a multi-generational journey where the
original supporters may not even be around anymore to see the proposal
reach its final destination... ;-)


Cheers,
HVR
Matt Renaud
2018-02-13 22:39:36 UTC
Permalink
+1 to HVR's suggestion, this could be a great opportunity to clean out all
the old cruft from the containers and unordered-containers APIs in one go.
That combined with all the documentation improvements would warrant a
1.0.0.0 for both packages IMHO :) I'd be happy to help with figuring out
what that 1.0.0.0 API could/should look like, since it should almost
certainly include a lot of community input/feedback.
Post by Chris Wong
David,
[...]
Post by David Feuer
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
Why does this need to be a such a dreadfully long-winded process? All we
need is a way to somehow signal a semantic change in the exposed API --
and it turns out that we actually already have the technology for this!
This is exactly one of the core ideas of "semantic versioning" (as a
dep-solver-computation-friendly proxy for machine-checkable formal API
contracts...) and it's got even "semantic" in its name!
In other words, the proposal can be greatly simplified to
1. Make a new major release (or maybe even make it a super-major
release, i.e. to `containers-1.0.0.0` for extra saliency) replacing
the instances with the more desirable ones; describe the change
prominently in the changelog.
Profit!
Life's short; do we really need a multi-generational journey where the
original supporters may not even be around anymore to see the proposal
reach its final destination... ;-)
Cheers,
HVR
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
David Feuer
2018-02-13 22:41:25 UTC
Permalink
We may not have to be *quite* as long-winded as I initially suggested,
but I don't think your way is sufficiently long-winded. The advantage
of removing the instances first and then adding them back is that the
version(s) without the instances will make compilation of every single
package that uses them fail. The way you suggest, if someone has a
package using containers or unordered-containers, they don't even have
a simple way to find out whether they need to make changes. On the
opposite side, Chris Wong's suggestion would let us be very explicit,
with a type error that warns users that the instance is missing
*because it is changing* and that no code using the instance will work
for both 0.* and 1.* versions. But I do think we should also consider
Kris's more conservative approach.

On Tue, Feb 13, 2018 at 5:28 PM, Herbert Valerio Riedel
Post by Chris Wong
David,
[...]
Post by David Feuer
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
Why does this need to be a such a dreadfully long-winded process? All we
need is a way to somehow signal a semantic change in the exposed API --
and it turns out that we actually already have the technology for this!
This is exactly one of the core ideas of "semantic versioning" (as a
dep-solver-computation-friendly proxy for machine-checkable formal API
contracts...) and it's got even "semantic" in its name!
In other words, the proposal can be greatly simplified to
1. Make a new major release (or maybe even make it a super-major
release, i.e. to `containers-1.0.0.0` for extra saliency) replacing
the instances with the more desirable ones; describe the change
prominently in the changelog.
Profit!
Life's short; do we really need a multi-generational journey where the
original supporters may not even be around anymore to see the proposal
reach its final destination... ;-)
Cheers,
HVR
Herbert Valerio Riedel
2018-02-13 22:48:06 UTC
Permalink
Hi,
Post by David Feuer
We may not have to be *quite* as long-winded as I initially suggested,
but I don't think your way is sufficiently long-winded. The advantage
of removing the instances first and then adding them back is that the
version(s) without the instances will make compilation of every single
package that uses them fail. The way you suggest, if someone has a
package using containers or unordered-containers, they don't even have
a simple way to find out whether they need to make changes. On the
opposite side, Chris Wong's suggestion would let us be very explicit,
with a type error that warns users that the instance is missing
*because it is changing* and that no code using the instance will work
for both 0.* and 1.* versions.
Alright, then let's do a little Gedankenexperiment; what if you release
a containers-0.9.0.0 (which drops the instances) and a
containers-1.0.0.0 (which 'adds back' the desired new instances) on the
same day!

...wouldn't this allow us to have the cake and eat it too? ;-)
David Feuer
2018-02-13 22:52:47 UTC
Permalink
It would let us have some cake. Users would be able to test against 0.9, in
theory. But they'd have to do it intentionally. And Stack-based projects
would probably need some shenanigans to deal with a version of containers
that doesn't come with GHC. So I really think that avoiding these subtle
bugs requires at least a full GHC release cycle.

On Feb 13, 2018 5:48 PM, "Herbert Valerio Riedel" <***@gmail.com>
wrote:

Hi,
Post by David Feuer
We may not have to be *quite* as long-winded as I initially suggested,
but I don't think your way is sufficiently long-winded. The advantage
of removing the instances first and then adding them back is that the
version(s) without the instances will make compilation of every single
package that uses them fail. The way you suggest, if someone has a
package using containers or unordered-containers, they don't even have
a simple way to find out whether they need to make changes. On the
opposite side, Chris Wong's suggestion would let us be very explicit,
with a type error that warns users that the instance is missing
*because it is changing* and that no code using the instance will work
for both 0.* and 1.* versions.
Alright, then let's do a little Gedankenexperiment; what if you release
a containers-0.9.0.0 (which drops the instances) and a
containers-1.0.0.0 (which 'adds back' the desired new instances) on the
same day!

...wouldn't this allow us to have the cake and eat it too? ;-)
Matt Renaud
2018-02-13 23:14:06 UTC
Permalink
avoiding these subtle bugs requires at least a full GHC release cycle
Now that GHC is on a 6 month release cycle this seems preferable to waiting
3-5 years.
It would let us have some cake. Users would be able to test against 0.9,
in theory. But they'd have to do it intentionally. And Stack-based projects
would probably need some shenanigans to deal with a version of containers
that doesn't come with GHC. So I really think that avoiding these subtle
bugs requires at least a full GHC release cycle.
Hi,
Post by David Feuer
We may not have to be *quite* as long-winded as I initially suggested,
but I don't think your way is sufficiently long-winded. The advantage
of removing the instances first and then adding them back is that the
version(s) without the instances will make compilation of every single
package that uses them fail. The way you suggest, if someone has a
package using containers or unordered-containers, they don't even have
a simple way to find out whether they need to make changes. On the
opposite side, Chris Wong's suggestion would let us be very explicit,
with a type error that warns users that the instance is missing
*because it is changing* and that no code using the instance will work
for both 0.* and 1.* versions.
Alright, then let's do a little Gedankenexperiment; what if you release
a containers-0.9.0.0 (which drops the instances) and a
containers-1.0.0.0 (which 'adds back' the desired new instances) on the
same day!
...wouldn't this allow us to have the cake and eat it too? ;-)
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Herbert Valerio Riedel
2018-02-13 23:50:06 UTC
Permalink
David,
Post by David Feuer
Post by Herbert Valerio Riedel
Alright, then let's do a little Gedankenexperiment; what if you release
a containers-0.9.0.0 (which drops the instances) and a
containers-1.0.0.0 (which 'adds back' the desired new instances) on the
same day!
...wouldn't this allow us to have the cake and eat it too? ;-)
It would let us have some cake. Users would be able to test against
0.9, in theory. But they'd have to do it intentionally.
This is what I was trying to get at: that's always the case. It doesn't
matter whether you release it same day, a week later, or a year later. I
have to intentionally *not* skip the 0.9 release. In fact, I'd want to
skip the 0.9 release myself if possible, as supporting both,
containers-0.9 and containers-1.0 in the same consumer code is going to
be awkward enough for me not wanting to do that (or I'd have to
basically not use the instances at all); If I care about the new
instances, I'd rather specify `containers ^>= 1.0.0.0` and thus not
support containers-0.9.

I would have proceeded to point out in my Gedankenexperiment, that in
order to use such a container-0.9 release, you'd have to force every
package that depends on `containers` to go through such a 0.9 phase in
lockstep, which in itself poses an ordeal in terms of package update
logistic; and then if you spread the time between the 0.9 and the 1.0
release apart, repeat this dance a 2nd time, with yet another new major
version bump, which requires yet another round of consumer-code reviews
(*because* a major version increment was signalled; and we do that as
API consumers are supposed to pay attention to the major version
increment signal...)

So consequently, if anything, a container-0.9 release would be a kind of
an artificial pseudo-release anyway IMO. You could even just condense it
into a cabal package flag of a containers-1.0 release, which disables
the Monoid/Semigroup instances; then you don't even need a distinct
container-0.9 release at all, nor do you have to contaminate the API
progression with an artificial container-0.9 discontinuity.
Post by David Feuer
And Stack-based projects would probably need some shenanigans to deal
with a version of containers that doesn't come with GHC.
I'm not convinced we should let the weakest link hold us back. If
there's limitations in the tooling, we should rather address them, rather
than contort ourselves; c.f. the Genius Tailor
( http://www.wealthculture.cn/infoShow.asp?nid=880&sort=Article%20List )
Post by David Feuer
So I really think that avoiding these subtle bugs requires at least a
full GHC release cycle.
I don't think waiting a full GHC release would be either necessary nor
effective at addressing your concerns, which I consider to be disputable
arguments to begin with.


-- hvr
Oleg Grenrus
2018-02-14 15:48:36 UTC
Permalink
I don't like a "no-instances" window. We can change the instances to correct
ones directly. That's what semantic versioning is about, to be able to
communicate semantic changes.

We should go directly to

   Semigroup v => Semigroup (Map k v)
   Semigroup v => Monoid (Map k v)

If `containers-0.6` or `containers-1.0` included some other (compile-time)
breaking change: even better.

I want to point out that a lot of code *will* break at compile-time with the
change.  In polymorphic setting `v` doesn't have `Semigroup v`
attached.  Many
concrete types arent `Semigroup` (e.g. `Int`; here and later when I say
isn't
`Semigroup`, I mean that it isn't `Monoid` either).

I tried to compile our `futurice-prelude` [1] and one small size internal
app (40 modules) with patched containers: things break at compile time

This change *won't get thru unnoticed*. See below for details.

Also people (we and others) use `mempty` for `Map.empty`, but for `mempty`
semantics won't change. We might need to use `Map.empty` if `v` isn't
`Semigroup`,
but that's catched by compiler. Note: that would mean to use `Map.empty`
everywhere
if there is no-instances window.

I also conjecture, based on the results, that `Ord k => Monoid (Map k
v)` for
`union` is currently avoided in libs. Maybe because it does *the wrong
thing*,
so people don't use it. In our own 50kLOC codebase we have three
`Map.unionWith` or `Map.unionsWith`, one of which is

    groupThings = Map.unionsWith (<>) . map f).

I think `Map k String` or `Map k Text` things might be common in private
codebases (not ours). Maybe not the most popular opinion, but I think they
deserve to break.


Something else: If we replace the instance, we have to leave it out of
`containers` for older than `base-4.9`.
- Either we provide `Semigroup` (and `Monoid` !) in `semigroups` (See +) or
- just have `base >= 4.9` in that release of `containers`
  (that will be three GHCs soon).


To conclude, The brekage on Hackage would be driven by things not having
`Semigroup` instances, not semantics. There might be subtle breakages,
but not
something we as community cannot handle. So let's just make a change.

Best regards, Oleg

+ At this point we want `Semigroup v` for `Monoid (Map k v)`, and because
  `semigroups` depend on `containers` (and not other way), we'd need to have
  `Monoid (Map k v)` also in `semigroups`.

---

I did an experiment: compile our `futurice-prelude` [1] with amended
`containers`, I changed only instances for `Map`.


Removal of the instances breaks Cabal, so I could test far enough what
happens:

    Distribution/Simple/Program/GHC.hs:546:12: error:
        • No instance for (Semigroup (Map Extension String))
            arising from a use of ‘gmempty’
        • In the expression: gmempty
          In an equation for ‘mempty’: mempty = gmempty
          In the instance declaration for ‘Monoid GhcOptions’
        |
    546 |   mempty = gmempty
        |            ^^^^^^^

This is one place where we want `union` semantics.
OTOH in current development version of Cabal the map is of type

    Map Extension (Maybe Compiler.Flag),

as we moving avoid from stringly-typed programming. And `Flag` doesn't have
`Semigroup` instance.


Second step: Let's see where Monoid (Map k v) is used. We cannot
deprecate the
instance, but we can se where it's used by a -fdefer-type-error trick
(thanks Bartosz Nitka aka niteria [2])
Note that this step actually has changes from third step already, so
we don't see failures where `Semigroup v` context would have caused
compilation
failure.

To my happy surprise the instance were used only once in Cabal-2.0.1.1
(above),
and then in `futurice-prelude` itself where it would also break with
`Semigroup v`.
So 231 packages at the bottom of dependency graph of lens+servant apps
don't use `Monoid (Map k v)`. Interesting fact, I'd say.


Third step: If I change instance to use `unionWith (<>)`, then the first
failure in our production code is `reducers-3.12.2`

    src/Data/Semigroup/Reducer.hs:233:10: error:
        • Could not deduce (Semigroup v)
            arising from the superclasses of an instance declaration
          from the context: Ord k
            bound by the instance declaration
            at src/Data/Semigroup/Reducer.hs:233:10-42
          Possible fix:
            add (Semigroup v) to the context of the instance declaration
        • In the instance declaration for ‘Reducer (k, v) (Map k v)’
        |
    233 | instance Ord k => Reducer (k, v) (Map k v) where

and `servant-0.13`

    src/Servant/Server/Internal.hs:691:34: error:
        • No instance for (Data.Semigroup.Semigroup
                             (Router' env RoutingApplication))
            arising from a use of ‘mempty’
        • In the first argument of ‘StaticRouter’, namely ‘mempty’
          In the expression: StaticRouter mempty mempty
          In an equation for ‘route’:
              route Proxy _ _ = StaticRouter mempty mempty
        |
    691 |   route Proxy _ _ = StaticRouter mempty mempty

and `swagger2-2.2`.  `SwaggerMonoid` differes for `Maybe`, but has default
implementation in terms of `Monoid`.

    src/Data/Swagger/Internal/Utils.hs:116:10: error:
        • Could not deduce (Data.Semigroup.Semigroup v)
            arising from a use of
‘Data.Swagger.Internal.Utils.$dmswaggerMappend’
          from the context: Ord k
            bound by the instance declaration
            at src/Data/Swagger/Internal/Utils.hs:116:10-41
          Possible fix:
            add (Data.Semigroup.Semigroup v) to the context of
              the instance declaration
        • In the expression:
            Data.Swagger.Internal.Utils.$dmswaggerMappend @Map k v
          In an equation for ‘swaggerMappend’:
              swaggerMappend
                = Data.Swagger.Internal.Utils.$dmswaggerMappend @Map k v
          In the instance declaration for ‘SwaggerMonoid (Map k v)’
        |
    116 | instance Ord k => SwaggerMonoid (Map k v)

And then we get to `futurice-prelude` which also fails:

    src/Futurice/Prelude.hs:177:41: error:
        • Could not deduce (Semigroup v) arising from a use of ‘ifolded’

Funny thing about that code, is that there we work with `Map k (Map l
v)` and
fold them.  For other one we want `UnionWith` and for other any semantic
should
be ok  (we have tests, they *will* break if I get things wrong).

We also have a `Map` wrapper which also breaks loudly.


Fourth step: fix above and contunue investigation where `Monoid (Map k
v)` is
used.  I built one internal application which pulls in even more
dependencies
in addition to what `futurice-prelude` already depends upon (465 deps, e.g.
`amazonka`).

- Two of our internal libraries use `mempty` (second one a lot). Not
breaking.

- One package uses a `Map` wrapper used above (for the sake of experiment I
  passed the `Warning` constraint down)

- The app itself uses `mempty` of `folded` on things which aren't
`Semigroup`.

- `yaml-0.8.28` breaks:

    Data/Yaml/Parser.hs:154:13: error:
        • Could not deduce (Data.Map.Internal.Warning YamlValue)
            arising from a use of ‘await’

  `YamlValue` doesn't have a `Semigroup` (or `Monoid`) instance.

- `amazonka-core-1.5.0`: `mempty`: non silent breakage.

    src/Network/AWS/Data/XML.hs:252:49: error:
        • No instance for (containers-0.5.11.0:Data.Map.Internal.Warning
                             Text)
            arising from a use of ‘mempty’
        • In the second argument of ‘Element’, namely ‘mempty’
          In the first argument of ‘NodeElement’, namely
            ‘(Element (name k) mempty [NodeContent v])’
          In the expression:
            NodeElement (Element (name k) mempty [NodeContent v])
        |
    252 |     node (k, v) = NodeElement (Element (name k) mempty
[NodeContent v])


[1]: https://github.com/futurice/haskell-futurice-prelude
[2]: https://ghc.haskell.org/trac/ghc/ticket/12014#comment:6
Post by Chris Wong
David,
Post by David Feuer
Post by Herbert Valerio Riedel
Alright, then let's do a little Gedankenexperiment; what if you release
a containers-0.9.0.0 (which drops the instances) and a
containers-1.0.0.0 (which 'adds back' the desired new instances) on the
same day!
...wouldn't this allow us to have the cake and eat it too? ;-)
It would let us have some cake. Users would be able to test against
0.9, in theory. But they'd have to do it intentionally.
This is what I was trying to get at: that's always the case. It doesn't
matter whether you release it same day, a week later, or a year later. I
have to intentionally *not* skip the 0.9 release. In fact, I'd want to
skip the 0.9 release myself if possible, as supporting both,
containers-0.9 and containers-1.0 in the same consumer code is going to
be awkward enough for me not wanting to do that (or I'd have to
basically not use the instances at all); If I care about the new
instances, I'd rather specify `containers ^>= 1.0.0.0` and thus not
support containers-0.9.
I would have proceeded to point out in my Gedankenexperiment, that in
order to use such a container-0.9 release, you'd have to force every
package that depends on `containers` to go through such a 0.9 phase in
lockstep, which in itself poses an ordeal in terms of package update
logistic; and then if you spread the time between the 0.9 and the 1.0
release apart, repeat this dance a 2nd time, with yet another new major
version bump, which requires yet another round of consumer-code reviews
(*because* a major version increment was signalled; and we do that as
API consumers are supposed to pay attention to the major version
increment signal...)
So consequently, if anything, a container-0.9 release would be a kind of
an artificial pseudo-release anyway IMO. You could even just condense it
into a cabal package flag of a containers-1.0 release, which disables
the Monoid/Semigroup instances; then you don't even need a distinct
container-0.9 release at all, nor do you have to contaminate the API
progression with an artificial container-0.9 discontinuity.
Post by David Feuer
And Stack-based projects would probably need some shenanigans to deal
with a version of containers that doesn't come with GHC.
I'm not convinced we should let the weakest link hold us back. If
there's limitations in the tooling, we should rather address them, rather
than contort ourselves; c.f. the Genius Tailor
( http://www.wealthculture.cn/infoShow.asp?nid=880&sort=Article%20List )
Post by David Feuer
So I really think that avoiding these subtle bugs requires at least a
full GHC release cycle.
I don't think waiting a full GHC release would be either necessary nor
effective at addressing your concerns, which I consider to be disputable
arguments to begin with.
-- hvr
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Carter Schonwald
2018-02-17 16:41:42 UTC
Permalink
Plus one to Herbert’s idea. Let’s do a major version bump and tear the
bandaid. The current left biased default ones are a recurrent source of
errors and bugs in code, in my experiences thereof.
Post by Chris Wong
David,
[...]
Post by David Feuer
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
Why does this need to be a such a dreadfully long-winded process? All we
need is a way to somehow signal a semantic change in the exposed API --
and it turns out that we actually already have the technology for this!
This is exactly one of the core ideas of "semantic versioning" (as a
dep-solver-computation-friendly proxy for machine-checkable formal API
contracts...) and it's got even "semantic" in its name!
In other words, the proposal can be greatly simplified to
1. Make a new major release (or maybe even make it a super-major
release, i.e. to `containers-1.0.0.0` for extra saliency) replacing
the instances with the more desirable ones; describe the change
prominently in the changelog.
Profit!
Life's short; do we really need a multi-generational journey where the
original supporters may not even be around anymore to see the proposal
reach its final destination... ;-)
Cheers,
HVR
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
M Farkas-Dyck
2018-02-13 23:44:15 UTC
Permalink
I am strongly in favor of parts 1 and 2, and HVR's proposal for
immediately defining new instances with a major version bump — this is
what major versions are for, and if we leave them undefined, it will
encourage users to define orphan instances.
Henning Thielemann
2018-02-14 01:35:58 UTC
Permalink
Post by David Feuer
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
I like the plan. Re-adding with Semigroup superclass would prevent silent
re-interpretation of code if it is generic.
Akio Takano
2018-02-14 08:49:04 UTC
Permalink
Hi David,
Post by David Feuer
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
I'm against this proposal because I think the improvement is, if not
negative, too small to justify breaking existing code.

I can think of 3 reasonable definitions for (<>) for lazy maps:

1. (<>) = union
2. (<>) = unionWith (<>)
3. (<>) = Strict.unionWith (<>)

Of these, (1) is by far the most useful operation in my experience.
(2) has the advantage that it seems like the most obvious choice, but
it's not very useful in practice. (3) is slightly more useful than
(2), but feels less canonical. Also (3) seems inconsistent with how
fmap is defined.

So there doesn't seem to be a clear winner. Perhaps there shouldn't
have been a Monoid instance for maps in the first place. Given that we
already have one, however, it seems that the marginal gain of removing
it doesn't justify the cost of breaking code.

(If the removal is going to happen anyway, I vote for not adding back
any instance.)

Regards,
Takano Akio
Evan Laforge
2018-02-14 09:08:33 UTC
Permalink
Post by Akio Takano
I'm against this proposal because I think the improvement is, if not
negative, too small to justify breaking existing code.
I agree for the same reasons.

I also find that unions is the most common. In my code, I have many
uses of (<>) with the union meaning... it's hard to grep for, but I
guess in the 40-80 range. Meanwhile, I have a Util.Map.mappend that
also mappends the values, and it's used in 2 places (out of around
140k lines). It wouldn't be so bad to have to convert them all to
Map.union, but doesn't seem worth the bother either.
Mario Blažević
2018-02-14 14:47:55 UTC
Permalink
Post by Evan Laforge
Post by Akio Takano
I'm against this proposal because I think the improvement is, if not
negative, too small to justify breaking existing code.
I agree for the same reasons.
I also find that unions is the most common. In my code, I have many
uses of (<>) with the union meaning... it's hard to grep for, but I
guess in the 40-80 range. Meanwhile, I have a Util.Map.mappend that
also mappends the values, and it's used in 2 places (out of around
140k lines). It wouldn't be so bad to have to convert them all to
Map.union, but doesn't seem worth the bother either.
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the
left-biased semantics? I've looked at my code, and it turns out that
wherever this operation appears, the two maps are guaranteed to have
distinct keys.

There is actually one use I'm not sure about, I'll have to make sure
there's no bug hiding there. That is one reason I'm supporting the
proposal, by the way: the existing behaviour that silently drops values
is much more likely to cause bugs.
Joachim Breitner
2018-02-14 15:03:09 UTC
Permalink
Hi,
Post by Mario Blažević
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the
left-biased semantics? I've looked at my code, and it turns out that
wherever this operation appears, the two maps are guaranteed to have
distinct keys.
does this indicate the need for

disjointUnion :: IntMap a -> IntMap a -> IntMap a

that throws an error when the maps are not disjoint? (run-time error
only, because we are not doing theorem proving here
)

One can get the effect with

unionWith (error "not disjoint)

of course (and I have done so), but having `disjointUnion` in the API
might remind people that they have to think about which variant they
want.

Cheers,
Joachim
--
Joachim Breitner
***@joachim-breitner.de
http://www.joachim-breitner.de/
David Feuer
2018-02-14 15:07:52 UTC
Permalink
This, of course, is another argument for Kris's idea of just offering
a pile of newtypes, each with their own Semigroup (and sometimes
Monoid) instance.

On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner
Post by Joachim Breitner
Hi,
Post by Mario Blažević
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the
left-biased semantics? I've looked at my code, and it turns out that
wherever this operation appears, the two maps are guaranteed to have
distinct keys.
does this indicate the need for
disjointUnion :: IntMap a -> IntMap a -> IntMap a
that throws an error when the maps are not disjoint? (run-time error
only, because we are not doing theorem proving here…)
One can get the effect with
unionWith (error "not disjoint)
of course (and I have done so), but having `disjointUnion` in the API
might remind people that they have to think about which variant they
want.
Cheers,
Joachim
--
Joachim Breitner
http://www.joachim-breitner.de/
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Mario Blažević
2018-02-14 16:13:44 UTC
Permalink
Post by David Feuer
This, of course, is another argument for Kris's idea of just offering
a pile of newtypes, each with their own Semigroup (and sometimes
Monoid) instance.
I would gladly use disjointUnion where appropriate, but I don't think
I'd ever use a newtype providing the same effect. Why? There are two
situations when I depend on union* or (<>) on maps:

1. Directly at the top level, where I can use either union or (<>) on
two known maps. In this situation I often use (<>) only because (a <> b)
is shorter and easier to read then (union a b), but that criterion
points the other way if I have to say (fromDisjoint (Disjoint a <>
Disjoint b)) instead of (disjointUnion a b).

2. Indirectly when folding a Foldable collection of maps. In this case I
have no choice but to use (<>), but I really can't think of any occasion
when all maps in the collection are guaranteed to have distinct keys.
Post by David Feuer
On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner
Post by Joachim Breitner
Hi,
Post by Mario Blažević
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the
left-biased semantics? I've looked at my code, and it turns out that
wherever this operation appears, the two maps are guaranteed to have
distinct keys.
does this indicate the need for
disjointUnion :: IntMap a -> IntMap a -> IntMap a
that throws an error when the maps are not disjoint? (run-time error
only, because we are not doing theorem proving here…)
One can get the effect with
unionWith (error "not disjoint)
of course (and I have done so), but having `disjointUnion` in the API
might remind people that they have to think about which variant they
want.
Matt Renaud
2018-02-14 16:17:52 UTC
Permalink
Minor nomenclature note: disjoint union has a specific meaning
<https://en.wikipedia.org/wiki/Disjoint_union> in set theory which is
different than how its being used here.
Post by Mario Blažević
Post by David Feuer
This, of course, is another argument for Kris's idea of just offering
a pile of newtypes, each with their own Semigroup (and sometimes
Monoid) instance.
I would gladly use disjointUnion where appropriate, but I don't
think I'd ever use a newtype providing the same effect. Why? There are two
1. Directly at the top level, where I can use either union or (<>) on two
known maps. In this situation I often use (<>) only because (a <> b) is
shorter and easier to read then (union a b), but that criterion points the
other way if I have to say (fromDisjoint (Disjoint a <> Disjoint b))
instead of (disjointUnion a b).
2. Indirectly when folding a Foldable collection of maps. In this case I
have no choice but to use (<>), but I really can't think of any occasion
when all maps in the collection are guaranteed to have distinct keys.
On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner
Post by David Feuer
Post by Joachim Breitner
Hi,
Post by Mario Blažević
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the
left-biased semantics? I've looked at my code, and it turns out that
wherever this operation appears, the two maps are guaranteed to have
distinct keys.
does this indicate the need for
disjointUnion :: IntMap a -> IntMap a -> IntMap a
that throws an error when the maps are not disjoint? (run-time error
only, because we are not doing theorem proving here
)
One can get the effect with
unionWith (error "not disjoint)
of course (and I have done so), but having `disjointUnion` in the API
might remind people that they have to think about which variant they
want.
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Evan Laforge
2018-02-15 21:25:56 UTC
Permalink
Post by Mario Blažević
I understand it's hard to grep, but can you guess how many of those
40-80 uses of (<>) with the union meaning actually depend on the left-biased
semantics? I've looked at my code, and it turns out that wherever this
operation appears, the two maps are guaranteed to have distinct keys.
I know that in a few cases I do explicitly want "left wins", and I do
frequently use (x<>) as a way to accumulate new data. Sometimes I
expect that new data to override old stuff, sometimes I don't expect
them to overlap, but it's not a problem if they do. In the cases
where it is a problem (e.g. warn about shadowed names or something), I
have

Util.Map.unique_union :: Ord k =>
Map.Map k a -> Map.Map k a -> (Map.Map k a, Map.Map k a)
-- ^ (union, rejected)

I can grep for that: 3 calls, plus 11 more for the version that takes
[(k, a)]. However, in the absence of actual numbers for (<>), don't
take me too seriously. These kinds of feelings are often shown wrong
by the real numbers. But even if I could count them, it's not like it
proves anything. It comes down to style and problem domain.

That said though, is there a way to grep for methods overloaded at a
certain type? One thing I thought of was recompile Data.Map with a
TypeError instance, but it seems it might be a hassle for a base
package like containers. Otherwise I imagine you could parse core,
look for resolved methods, and map back to file and line. Is there a
general purpose core analysis suite out there?
Post by Mario Blažević
There is actually one use I'm not sure about, I'll have to make sure
there's no bug hiding there. That is one reason I'm supporting the proposal,
by the way: the existing behaviour that silently drops values is much more
likely to cause bugs.
It's true it does, but since I think of (<>) as union, and union does
that too, it doesn't surprise me. Actually the surprising thing was
that Map.union is left-biased, while it seems like most languages go
the other way. But that was just a general haskell oddity that I
accustomed myself to many years ago.
Oliver Charles
2018-02-14 09:25:17 UTC
Permalink
Post by Akio Takano
1. (<>) = union
2. (<>) = unionWith (<>)
3. (<>) = Strict.unionWith (<>)
Of these, (1) is by far the most useful operation in my experience.
(2) has the advantage that it seems like the most obvious choice, but
it's not very useful in practice. (3) is slightly more useful than
(2), but feels less canonical. Also (3) seems inconsistent with how
fmap is defined.
I think you are very much in the minority here. (1) which we have now is
almost never what I want. (2) is much more likely to be what I need. And of
course, you can recover (1) with (2) by simply mapping First into every
element.

So there doesn't seem to be a clear winner. Perhaps there shouldn't
Post by Akio Takano
have been a Monoid instance for maps in the first place. Given that we
already have one, however, it seems that the marginal gain of removing
it doesn't justify the cost of breaking code.
There should be a Monoid instance, and
http://conal.net/papers/type-class-morphisms/ argues that it should be (2).
Akio Takano
2018-02-14 09:38:29 UTC
Permalink
Hi Oliver,
Post by Oliver Charles
Post by Akio Takano
1. (<>) = union
2. (<>) = unionWith (<>)
3. (<>) = Strict.unionWith (<>)
Of these, (1) is by far the most useful operation in my experience.
(2) has the advantage that it seems like the most obvious choice, but
it's not very useful in practice. (3) is slightly more useful than
(2), but feels less canonical. Also (3) seems inconsistent with how
fmap is defined.
I think you are very much in the minority here. (1) which we have now is
almost never what I want. (2) is much more likely to be what I need. And of
course, you can recover (1) with (2) by simply mapping First into every
element.
This is not true if you consider efficiency. Mapping `First` takes
linear time. Even if you use `coerce` instead of `map First`, the
resulting map will contain thunks that were not in either of the
original maps.
Post by Oliver Charles
Post by Akio Takano
So there doesn't seem to be a clear winner. Perhaps there shouldn't
have been a Monoid instance for maps in the first place. Given that we
already have one, however, it seems that the marginal gain of removing
it doesn't justify the cost of breaking code.
There should be a Monoid instance, and
http://conal.net/papers/type-class-morphisms/ argues that it should be (2).
That is an interesting paper, thanks. However, I don't think this is
going to change the fact that I almost never want to use (2) in
practice.

- Akio
Oleg Grenrus
2018-02-14 09:48:14 UTC
Permalink
I agree with Oliver, was just writing the very same.


`Map.map First = Map.map coerce = coerce`

i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v`
has represenational role.
(If there aren't RULES in place, it's a bug).

See Zero-Cost coercions in Haskell
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/coercible.pdf

Cheers, Oleg.
Post by Akio Takano
Hi Oliver,
Post by Oliver Charles
Post by Akio Takano
1. (<>) = union
2. (<>) = unionWith (<>)
3. (<>) = Strict.unionWith (<>)
Of these, (1) is by far the most useful operation in my experience.
(2) has the advantage that it seems like the most obvious choice, but
it's not very useful in practice. (3) is slightly more useful than
(2), but feels less canonical. Also (3) seems inconsistent with how
fmap is defined.
I think you are very much in the minority here. (1) which we have now is
almost never what I want. (2) is much more likely to be what I need. And of
course, you can recover (1) with (2) by simply mapping First into every
element.
This is not true if you consider efficiency. Mapping `First` takes
linear time. Even if you use `coerce` instead of `map First`, the
resulting map will contain thunks that were not in either of the
original maps.
Post by Oliver Charles
Post by Akio Takano
So there doesn't seem to be a clear winner. Perhaps there shouldn't
have been a Monoid instance for maps in the first place. Given that we
already have one, however, it seems that the marginal gain of removing
it doesn't justify the cost of breaking code.
There should be a Monoid instance, and
http://conal.net/papers/type-class-morphisms/ argues that it should be (2).
That is an interesting paper, thanks. However, I don't think this is
going to change the fact that I almost never want to use (2) in
practice.
- Akio
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Akio Takano
2018-02-14 10:16:13 UTC
Permalink
Hi Oleg,
Post by Oleg Grenrus
I agree with Oliver, was just writing the very same.
`Map.map First = Map.map coerce = coerce`
i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v`
has represenational role.
Yes, I agree that mapping First is zero-cost. I'm saying that
`unionWith (<>)` over the First monoid costs more than `union`, due to
allocation of extra thunks.

- Akio
David Feuer
2018-02-14 14:47:47 UTC
Permalink
We can use rewrite rules to recognize and optimize unionWith in one or
more special cases:

unionWith f = plainUnionWith f
{-# NOINLINE [1] unionWith #-}

bottom1, bottom2 :: a
bottom1 = undefined
{-# NOINLINE bottom1 #-}
bottom2 = undefined
{-# NOINLINE bottom2 #-}

unionWithMagic :: a -> (a -> a -> a) -> Map k a -> Map k a -> Map k a
unionWithMagic _ f = plainUnionWith f
{-# INLINE [1] unionWithMagic #-}

RULES:
unionWith f ===> unionWithMagic (f bottom1 bottom2) f
unionWithMagic bottom1 f ===> union
Post by Akio Takano
Hi Oleg,
Post by Oleg Grenrus
I agree with Oliver, was just writing the very same.
`Map.map First = Map.map coerce = coerce`
i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v`
has represenational role.
Yes, I agree that mapping First is zero-cost. I'm saying that
`unionWith (<>)` over the First monoid costs more than `union`, due to
allocation of extra thunks.
- Akio
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Bardur Arantsson
2018-02-15 21:13:57 UTC
Permalink
[ blahblah :) ]

+1 for the original suggestion.

+2 for Kris' newtype-only suggestion -- especially if we can actually
prevent orphan instances effectively. (I think I got that right, it's a
big thread at this point.). IMO, it's fine to force people to at least
*consider* which instance they want absent a "best" answer. I do think
it's important that we are actually able to effectively prevent people
from just declaring orphans willy-nilly.

+0 for _hvr's dual-incompatible releases. It would be a +1, but my worry
here is (as usual) the worry about *forcing* the whole ecosystem to
upgrade all at once. Usually this is not actually a big deal, but
"containers" is used by almost *everything*, so it's effectively a
"split" until everything upgrades. (Obviously, things are not quite are
dramatic as that since, presumably, relatively few packages actually use
the instance in question, but...)

Cheers,
Petr Pudlák
2018-02-16 15:53:53 UTC
Permalink
+1 for the proposal!
Post by David Feuer
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Tony Morris
2018-02-16 23:31:30 UTC
Permalink
I agree with this proposal, except for the deprecation period, then waiting
for years. Fix it immediately, in my opinion.
Post by David Feuer
Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.
2. Remove the deprecated instances.
3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
instance (Ord k, Semigroup v) => Semigroup (Map k v) where
(<>) = Data.Map.Strict.unionWith (<>)
instance (Ord k, Semigroup v) => Monoid (Map k v) where
mempty = Data.Map.Strict.empty
instance Semigroup v => Semigroup (IntMap v) where
(<>) = Data.IntMap.Strict.unionWith (<>)
instance Semigroup v => Monoid (IntMap v) where
mempty = Data.IntMap.Strict.empty
instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
(<>) = Data.HashMap.Strict.unionWith (<>)
instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
mempty = Data.HashMap.Strict.empty
Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.
Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.
1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.
2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.
Discussion period: three weeks.
[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Loading...