Proposal: Add a splitBy / splitOn in Data.List

Discussion:

Saurabh Nanda

2018-11-01 17:33:37 UTC

This has certainly been discussed before. A quick Google search turned up
the following past discussions:

- https://mail.haskell.org/pipermail/libraries/2006-July/005494.html
- https://mail.haskell.org/pipermail/libraries/2012-July/018228.html

Is there anything blocking this discussion & implementation? Anything that
can be done to unblock it?

-- Saurabh.

Edward Kmett

2018-11-02 05:51:18 UTC

The main thing that prevented it from going into base is the number of
subtleties about what precisely it means to properly "split" something.

Most languages make fairly arbitrary calls on topics such as:

* Do you split on list elements (e.g. ',') or on lists of elements, so you can
have multi-character delimiters like ", "? What about multiple kinds of thing
that are all delimiters, e.g. any whitespace character?
* What do you do with the delimiters?
* What happens with runs of delimiters?
* What about initial or final runs of delimiters (e.g. leading spaces)?
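
To make these questions concrete, here is a minimal sketch using only base.
The names fieldsBy and fieldsCondensed are hypothetical (they are not taken
from base or from the split package). The two functions differ only in how
they treat runs of delimiters and delimiters at the ends, yet they give
different answers on the same input.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: every delimiter element starts a new field, so runs
-- of delimiters and leading/trailing delimiters produce empty fields.
-- Delimiters themselves are dropped.
fieldsBy :: (a -> Bool) -> [a] -> [[a]]
fieldsBy isDelim xs = case break isDelim xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : fieldsBy isDelim rest

-- Hypothetical helper: same splitting, but empty fields are discarded,
-- which condenses runs and ignores leading/trailing delimiters.
fieldsCondensed :: (a -> Bool) -> [a] -> [[a]]
fieldsCondensed isDelim = filter (not . null) . fieldsBy isDelim

-- fieldsBy        (== ',') "a,,b,"  gives ["a","","b",""]
-- fieldsCondensed (== ',') "a,,b,"  gives ["a","b"]
-- fieldsCondensed isSpace  " x  y"  gives ["x","y"], like words " x  y"
```

Neither behaviour is wrong: the first suits CSV-like data where empty fields
carry meaning, the second suits whitespace-separated text.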

The end result was that a split package was written by Brent Yorgey back in
2008 or so that rather comprehensively covers the design space, and it was
incorporated into the Haskell Platform.

http://hackage.haskell.org/package/split-0.2.3.3/docs/Data-List-Split.html

-Edward

_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Elliot Cameron

2018-11-02 11:38:45 UTC

Despite these subtleties, I must confess I've often wanted to whip up a
quick script and been frustrated that these functions are missing from
base. For example, using Haskell as a sed/awk alternative can be pleasant
*if* the functions you need are in base. What's more, in many years I've
only really wanted one or two versions of this.

What if we added the most flexible version and included only that? This
version would accept multi-character delimiters, always throw them away, and
always produce a new entry in the result for every occurrence of the
delimiter. If you don't want the empty entries, you can filter. If you
don't want leading empty entries, you can dropWhile. If you want the
delimiters back, you can map. This seems like a nice trade-off for just
being available in base.
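
A minimal sketch of the function described above, using only base. The name
splitOnSeq is hypothetical, and the treatment of an empty delimiter (it
never matches) is an assumption made here, not something settled in this
thread.

```haskell
import Data.List (intercalate, stripPrefix)

-- Hypothetical name. Splits on every occurrence of a multi-element
-- delimiter, drops the delimiter, and keeps empty entries.
splitOnSeq :: Eq a => [a] -> [a] -> [[a]]
splitOnSeq []    xs = [xs]            -- assumption: empty delimiter never matches
splitOnSeq delim xs = go [] xs
  where
    -- acc holds the current entry in reverse order
    go acc rest
      | Just rest' <- stripPrefix delim rest = reverse acc : go [] rest'
    go acc []       = [reverse acc]
    go acc (y : ys) = go (y : acc) ys

-- With a single fixed delimiter nothing is lost: intercalate restores the input.
roundTrips :: Eq a => [a] -> [a] -> Bool
roundTrips delim xs = intercalate delim (splitOnSeq delim xs) == xs

-- splitOnSeq ", " "a, b, , c"  gives ["a","b","","c"]
```

The follow-up policies are then ordinary list functions: filter (not . null)
drops the empty entries, and dropWhile null drops only the leading ones.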

Vanessa McHale

2018-11-02 12:01:35 UTC

cabal now has the ability to be used for
[scripting](https://github.com/haskell/cabal/pull/5483#issuecomment-409633079)
which I think addresses your use case (at least, it's easier than
forking base...).

Theodore Lief Gannon

2018-11-02 12:17:47 UTC

If you accept more than one delimiter but drop them, you've lost info about
which one caused each break and can't map them back. It's more generic to
keep them, since you can still filter.
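
A minimal sketch of that point, using only base. The name splitKeeping and
the choice of Either to tag the pieces are assumptions for illustration.
Each delimiter is kept as a Left, so the caller can still see which one
caused each break, or discard them all afterwards.

```haskell
import Data.Either (lefts, rights)

-- Hypothetical name. Left d is a delimiter element that caused a break,
-- Right chunk is the run of non-delimiter elements around it.
splitKeeping :: (a -> Bool) -> [a] -> [Either a [a]]
splitKeeping isDelim xs = case break isDelim xs of
  (chunk, [])       -> [Right chunk]
  (chunk, d : rest) -> Right chunk : Left d : splitKeeping isDelim rest

-- splitKeeping (`elem` ",;") "a,b;c"
--   gives [Right "a", Left ',', Right "b", Left ';', Right "c"]
-- rights of that gives ["a","b","c"]  (delimiters dropped)
-- lefts  of that gives ",;"           (which delimiter caused each break)
```

Going the other way is not possible: once the delimiters are dropped, the
result for "a,b;c" and "a;b,c" is the same.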

Elliot Cameron

2018-11-02 12:55:28 UTC

I didn't realize cabal now supported scripting. I suppose that addresses a
large number of my use cases for having this.

I didn't mean choosing among different delimiters, only a single
multi-element delimiter, although that is also not very flexible. If we also
had a multi-character replace function, then a single-element split would be
more tolerable.
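
A minimal sketch of that combination, using only base. Both names
(replaceSeq, splitOnElem) are hypothetical, and the sentinel element is an
assumption: the approach only works if the sentinel cannot occur in the
input.

```haskell
import Data.List (stripPrefix)

-- Hypothetical name. Replaces every occurrence of the sequence old with new.
replaceSeq :: Eq a => [a] -> [a] -> [a] -> [a]
replaceSeq []  _   xs = xs            -- assumption: empty pattern changes nothing
replaceSeq old new xs = go xs
  where
    go rest
      | Just rest' <- stripPrefix old rest = new ++ go rest'
    go []       = []
    go (y : ys) = y : go ys

-- Hypothetical name. Splits on a single element, dropping it and keeping
-- empty entries.
splitOnElem :: Eq a => a -> [a] -> [[a]]
splitOnElem d xs = case break (== d) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnElem d rest

-- Rewrite the multi-character delimiter to a sentinel, then split on it:
-- splitOnElem '\n' (replaceSeq ", " "\n" "a, b, c")  gives ["a","b","c"]
```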

I'm still in favor of providing one or two of the most common, most
flexible versions of this just to help newcomers from other languages that
have these functions in their standard libraries, but my opinion is not
very strongly held.

Dan Burton

2018-11-02 14:43:11 UTC

What about just adding Data.List.Split to base?

-- Dan Burton

Henning Thielemann

2018-11-02 14:47:28 UTC

Post by Dan Burton
What about just adding Data.List.Split to base?

... and then splitting 'base'? :-)

Elliot Cameron

2018-11-02 14:49:08 UTC

Ah, in the context of splitting base, this seems like a backward move. The
solution must really be to have tooling that can pull in libraries with
minimal friction.

Dan Burton

2018-11-02 15:42:26 UTC

If and when base is split, then just include Data.List.Split with whatever
package all the other List stuff gets put in. My point is, this module
should live in the same package where the other list functions live.

I'm in favor of splitting base, but things should not be broken up to the
extreme of having a package just for left-pad. It is possible to find
middle ground.

-- Dan Burton

Bryan Richter

2018-11-03 14:48:36 UTC

+1 to adding a single function that splits a list by a multi-element
delimiter, e.g. the hypothetical

    Data.List.split [a, b] [c, a, b, d, a, b, e, a]
    [[c], [d], [e, a]]

The split package seems too heavyweight for base (I know I'd always have to
look up the differences between splitOn, split, chop, and divvy), and more
sophisticated needs should probably be filled by a special-purpose parser.
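
As one illustration of the special-purpose-parser route, here is a minimal
sketch using Text.ParserCombinators.ReadP, which already ships with base.
The grammar (comma-separated fields with optional whitespace around each
comma) and the names commaFields and parseFields are assumptions chosen for
the example.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch, readP_to_S, sepBy1, skipSpaces)

-- Hypothetical grammar: fields separated by a comma, with optional
-- whitespace on either side of the comma. Fields may be empty.
commaFields :: ReadP [String]
commaFields = sepBy1 field separator <* eof
  where
    field     = munch (\c -> c /= ',' && not (isSpace c))
    separator = skipSpaces *> char ',' *> skipSpaces

-- Returns Nothing when the whole input does not fit the grammar.
parseFields :: String -> Maybe [String]
parseFields input = case readP_to_S commaFields input of
  [(fields, "")] -> Just fields
  _              -> Nothing

-- parseFields "a , b,c"  gives Just ["a","b","c"]
-- parseFields "a b"      gives Nothing
```

Unlike a plain split, the parser can reject malformed input, at the cost of
writing down the grammar explicitly.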

I would even say it might make sense to just restrict the function to
Strings, unless there is widespread need for supporting Lists in general.
