opt out of accursedUnutterablePerformIO

Post by Ben Franksen
I had a program I was working on lately (darcs) crash with a
segmentation fault after I made a seemingly harmless refactoring. The
original code was
{-# INLINE linesPS #-}
linesPS :: B.ByteString -> [B.ByteString]
linesPS ps
| B.null ps = [B.empty]
| otherwise = BC.split '\n' ps
which I wanted to optimize, pasting code from the definition of
Data.ByteString.lines, to
linesPS ps =
case search ps of
Nothing -> [ps]
Just n -> B.take n ps : linesPS (B.drop (n + 1) ps)
where
search = BC.elemIndex '\n'
So I looked at the bytestring library to see if there was something that
could explain the crash. I found that it uses
accursedUnutterablePerformIO all over the place.
The dire warnings accompanying this "function" (including the citation
of a number of problem reports against commonly used libraries) made me
think that it may be worthwhile to offer an opt-out for users of
libraries like bytestring or text. (Note that I am not claiming my
particular crash is due to a bug in bytestring, I merely want to exclude
the possibility.)
For the libraries in question it would be simple to do this: just add a
cabal flag to optionally disable use of accursedUnutterablePerformIO.
That still leaves the question of how users can make their project
depend on a bytestring that has been built with this flag. I know this
can be done with manual installation of a new version of the library,
but I would rather use cabal new-build (as I am used to) and let it
figure out itself that it has to attach a new hash to the variant with
the flag.

I'm afraid it's not possible to provide the interfaces exposed by bytestring without some form of unsafety. Lazy IO alone requires unsafeInterleaveIO and the bytestring indexing operations require at very least unsafePerformIO since GHC treats access to foreign memory as an effect.

accursedUnutterablePerformIO is an optimized form of unsafePerformIO which likely won't cause any issues that wouldn't otherwise manifest with plain unsafePerformIO. Consequently I am not sure it's worth providing a means to disable its usage.

Cheers,

- Ben

Ben Franksen

2018-03-04 21:11:58 UTC

On March 4, 2018 5:19:26 AM EST, Ben Franksen

Post by Ben Franksen
I had a program I was working on lately (darcs) crash with a
segmentation fault after I made a seemingly harmless refactoring.
[...]>> So I looked at the bytestring library to see if there was
something that could explain the crash. I found that it uses
accursedUnutterablePerformIO all over the place.
The dire warnings accompanying this "function" (including the
citation of a number of problem reports against commonly used
libraries) made me think that it may be worthwhile to offer an
opt-out for users of libraries like bytestring or text. (Note that
I am not claiming my particular crash is due to a bug in
bytestring, I merely want to exclude the possibility.)
For the libraries in question it would be simple to do this: just
add a cabal flag to optionally disable use of
accursedUnutterablePerformIO.
That still leaves the question of how users can make their project
depend on a bytestring that has been built with this flag. I know
this can be done with manual installation of a new version of the
library, but I would rather use cabal new-build (as I am used to)
and let it figure out itself that it has to attach a new hash to
the variant with the flag.

I am well aware of that.

accursedUnutterablePerformIO is an optimized form of unsafePerformIO
which likely won't cause any issues that wouldn't otherwise manifest
with plain unsafePerformIO. Consequently I am not sure it's worth
providing a means to disable its usage.

I envy your confidence. The documentation mentions two bytestring
commits that fix bugs by reverting accursedUnutterablePerformIO to
unsafePerformIO. I guess this is plain evidence that there can in fact
be "issues that wouldn't otherwise manifest with plain unsafePerformIO".

Anyway. Would you perchance have any idea what could possibly make a
program work fine with the first version of linesPS and crash with the
second one? I find this pretty scary and would like to understand it.

Cheers
Ben

Ben Gamari

2018-03-04 21:48:00 UTC

Post by Ben Gamari
I'm afraid it's not possible to provide the interfaces exposed by
bytestring without some form of unsafety. Lazy IO alone requires
unsafeInterleaveIO and the bytestring indexing operations require at
very least unsafePerformIO since GHC treats access to foreign memory
as an effect.

I am well aware of that.

Post by Ben Gamari
accursedUnutterablePerformIO is an optimized form of unsafePerformIO
which likely won't cause any issues that wouldn't otherwise manifest
with plain unsafePerformIO. Consequently I am not sure it's worth
providing a means to disable its usage.

Ahh yes, I had forgotten that accursedUnutterablePerformIO is
implemented directly in terms of realWorld#. This implementation is
terribly bug-prone since floating can result in inappropriate sharing,
as seen in the tickets referred to in the documentation.

However, GHC 8.0 and later offer a much safer primitive, runRW#, which
can be used to implement things like accursedUnutterablePerformIO
without fear of over-zealous simplification. I believe the issues
pointed out in documentation could not have happened with an
implementation built on runRW#. Moreover, I highly doubt that the
switching to runRW# would incur any measurable performance penalty.

Post by Ben Franksen
Anyway. Would you perchance have any idea what could possibly make a
program work fine with the first version of linesPS and crash with the
second one? I find this pretty scary and would like to understand it.

Indeed that is quite scary. The cause is not at all obvious. Do you have
an isolated reproducer that you could share?

Cheers,

- Ben

David Feuer

2018-03-04 21:51:41 UTC

For the record, the "public" version of runRW# is called
unsafeDupablePerformIO. It would be a good idea to find out if it can be
used throughout bytestring without a significant penalty. If so, that would
eliminate a bunch of scary stuff. If not, it would be nice to understand
why.

I am well aware of that.

Ahh yes, I had forgotten that accursedUnutterablePerformIO is
implemented directly in terms of realWorld#. This implementation is
terribly bug-prone since floating can result in inappropriate sharing,
as seen in the tickets referred to in the documentation.
However, GHC 8.0 and later offer a much safer primitive, runRW#, which
can be used to implement things like accursedUnutterablePerformIO
without fear of over-zealous simplification. I believe the issues
pointed out in documentation could not have happened with an
implementation built on runRW#. Moreover, I highly doubt that the
switching to runRW# would incur any measurable performance penalty.

Indeed that is quite scary. The cause is not at all obvious. Do you have
an isolated reproducer that you could share?
Cheers,
- Ben
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

John Wiegley

2018-03-05 05:07:38 UTC

DF> For the record, the "public" version of runRW# is called
DF> unsafeDupablePerformIO. It would be a good idea to find out if it can
DF> be used throughout bytestring without a significant penalty. If so,
DF> that would eliminate a bunch of scary stuff. If not, it would be nice
DF> to understand why.

In my tests, using unsafeDupablePerformIO instead of
accursedUnutterablePerformIO induces a 15% performance penalty.

--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2

Ben Franksen

2018-03-05 12:56:00 UTC

Thanks. I believe there are cases where people (other than me) would
like to have the option of sacrificing 15% performance to get rid of
aUPerformIO, if only for testing/bug-hunting purposes.

Cheers
Ben

Tom Murphy

2018-03-10 19:32:04 UTC

John, are you able to share this benchmarking code?

Thanks,
Tom

Post by John Wiegley
DF> For the record, the "public" version of runRW# is called
DF> unsafeDupablePerformIO. It would be a good idea to find out if it can
DF> be used throughout bytestring without a significant penalty. If so,
DF> that would eliminate a bunch of scary stuff. If not, it would be nice
DF> to understand why.
In my tests, using unsafeDupablePerformIO instead of
accursedUnutterablePerformIO induces a 15% performance penalty.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

John Wiegley

2018-03-11 23:39:12 UTC

TM> John, are you able to share this benchmarking code?

The code I used for our ICFP paper is here, although it only tests a few
methods in basic ways:

https://github.com/jwiegley/bytestring-fiat/blob/foldr/extract/Bench.hs

However, it should be clear how to add more tests, and to improve what's
there.

--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2

Ben Franksen

2018-03-05 13:47:24 UTC

Indeed that is quite scary. The cause is not at all obvious. Do you have
an isolated reproducer that you could share?

Sharing is no problem, this happened with darcs, which notably has its
own share of dark and dangerous corners. I have considered trying to
cook this down to some small example but my hopes of succeeding with
this are almost nil. As I said, I have no evidence whatsoever that the
problem has even anything to do with that-what-must-not-be-named, it's
been purely a guess.

FWIW, I re-checked that the behavior is reproducible. I have uploaded a
clone of darcs-screened with my patch applied on top to
https://hub.darcs.net/bf/darcs-screened
To reproduce the crash, cabal configure with --enable-tests, then do
./dist...whatever...darcs-test --shell=no
Unpull the last patch and everything works.

Cheers
Ben

David Feuer

2018-03-05 15:24:48 UTC

Does bytestring turn off some optimization flags (e.g., -fno-full-laziness)
to make this work?

Indeed that is quite scary. The cause is not at all obvious. Do you have
an isolated reproducer that you could share?

Ben Franksen

2018-03-05 20:51:30 UTC

Indeed that is quite scary. The cause is not at all obvious. Do you have
an isolated reproducer that you could share?

Sharing is no problem, this happened with darcs, which notably has its
own share of dark and dangerous corners. I have considered trying to
cook this down to some small example but my hopes of succeeding with
this are almost nil. As I said, I have no evidence whatsoever that the
problem has even anything to do with that-what-must-not-be-named, it's
been purely a guess.
FWIW, I re-checked that the behavior is reproducible. I have uploaded a
clone of darcs-screened with my patch applied on top to
https://hub.darcs.net/bf/darcs-screened
To reproduce the crash, cabal configure with --enable-tests, then do
./dist...whatever...darcs-test --shell=no
Unpull the last patch and everything works.

I have managed to track the whole thing down to the interaction with
another utility function for ByteStrings (betweenLinesPS) that works
with the internal representation. The small test program is attached.

Cheers
Ben

Ben Franksen

2018-03-05 21:04:12 UTC

Post by Ben Franksen
I have managed to track the whole thing down to the interaction with
another utility function for ByteStrings (betweenLinesPS) that works
with the internal representation. The small test program is attached.

Oh, and BTW it does not crash when I remove the INLINE annotation for
linesPS2.

Ben Franksen

2018-03-05 21:06:50 UTC

Oh, and BTW it does not crash when I remove the INLINE annotation for
linesPS2.

Ops, wrong, I made a mistake when testing. It should not inline anyway
since it is a recursive function. Sorry for the noise.

Ben Franksen

2018-03-07 02:11:32 UTC

This has nothing to do with <insane>PerformIO.
[...]
You are kind of lucky that linesPS1 works.

Thanks a lot! I did not know that bytestring uses a null pointer for an
empty buffer. Clever. And you somewhat restored my belief in the safety
of bytestring. This use of the Data.ByteString.Internal is indeed unsafe
but for quite humanly understandable reasons and the whole
<whatever>PerformIO was a red herring. I have already written a new
version of betweenLinesPS that uses breakSubstring (not available back
then) and it is now perfectly safe. I have also added a ticket to remind
us to review (purge, if possible) all uses of the Internal BS API.

Cheers
Ben

Ben Franksen

2018-03-05 15:11:51 UTC

Post by Ben Franksen
That still leaves the question of how users can make their project
depend on a bytestring that has been built with this flag. I know this
can be done with manual installation of a new version of the library,
but I would rather use cabal new-build (as I am used to) and let it
figure out itself that it has to attach a new hash to the variant with
the flag.

Looking at the file generated by cabal new-freeze make it pretty clear
how to set flags for dependencies (and apparently cabal does make use of
this for a number of packages).

Cheers
Ben

Ben Franksen

2018-03-05 21:59:58 UTC