Discussion:
Proposal to add mapAccumL function to vector package
John Ky
2018-11-08 11:30:30 UTC
Permalink
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful
other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John
Carter Schonwald
2018-11-08 13:14:09 UTC
Permalink
Hey John
:
I’m happy to help you contribute to to vector.

1). This might not actually be needed with stream fusion on ... though
perhaps this sort of shared computation needs to be its own combinators
because of the sharing meaning that the stream fusion on map and foldl
might not work. (Have you tried doing let x = map ... in let y = foldl ...
in something with x and y? Even eg just writing map accum l in terms of
just that? It could very well fuse ... though I don’t think it can with the
current vector fusion framework. Though I think one of the more exotic
fusion frameworks Amos Robinson did a few years ago could handle that
fusion. Sadly that one requires an ILP solver at compile time. But
there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will
actually be simpler I think

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a
bit slow), but this sounds like a super reasonable feature request
Post by John Ky
Hello,
I'd like to add the mapAccumL function to the vector package.
Specifically the Data.Vector.Storable module, but it would also be useful
other vector modules.
{-# LANGUAGE ScopedTypeVariables #-}
mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}
import qualified Data.List as L
import qualified Data.Vector.Storable as DVS
(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs
Cheers,
-John
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Carter Schonwald
2018-11-08 14:17:58 UTC
Permalink
I thought about it a teeny bit more

This should be trivially definable using mapM or equivalent using state t

Have you tried doing that simple high level definition? I think that works
with vector fusion quite nicely.

... ohhh. I see. There’s two vectors of inputs. I’ll have to think about
this more.
Post by Carter Schonwald
Hey John
I’m happy to help you contribute to to vector.
1). This might not actually be needed with stream fusion on ... though
perhaps this sort of shared computation needs to be its own combinators
because of the sharing meaning that the stream fusion on map and foldl
might not work. (Have you tried doing let x = map ... in let y = foldl ...
in something with x and y? Even eg just writing map accum l in terms of
just that? It could very well fuse ... though I don’t think it can with the
current vector fusion framework. Though I think one of the more exotic
fusion frameworks Amos Robinson did a few years ago could handle that
fusion. Sadly that one requires an ILP solver at compile time. But
there’s some tricks I think we could do )
2) writing it in stream fusion form / as map accum l for streams will
actually be simpler I think
3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me
a bit slow), but this sounds like a super reasonable feature request
Post by John Ky
Hello,
I'd like to add the mapAccumL function to the vector package.
Specifically the Data.Vector.Storable module, but it would also be
useful other vector modules.
{-# LANGUAGE ScopedTypeVariables #-}
mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}
import qualified Data.List as L
import qualified Data.Vector.Storable as DVS
(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs
Cheers,
-John
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
John Ky
2018-11-08 22:31:48 UTC
Permalink
Hi Carter,

I've created a ticket <https://github.com/haskell/vector/issues/228> as
requested and have reproduced the text here:

I've implemented a hand rolled version, and another two versions based on a
combination of mapM and the lazy and strict versions of State monad.

haskell-works/hw-prim#38 <https://github.com/haskell-works/hw-prim/pull/38>

The benchmarks show that the hand rolled versions run two times faster than
the lazy state monad version and 16 times faster than the strict state
monad.

I found the slow performance of the strict monad version most surprising.

I'm aware that the version that using mapM might enable fusion, however it
is a fair bit slower than a hand rolled version that defeats fusion.

I would love to have a fusion-enabled version that runs as fast as the hand
rolled version. Would that be possible?


Cheers,


-John
Post by Carter Schonwald
I thought about it a teeny bit more
This should be trivially definable using mapM or equivalent using state t
Have you tried doing that simple high level definition? I think that
works with vector fusion quite nicely.
... ohhh. I see. There’s two vectors of inputs. I’ll have to think about
this more.
On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <
Post by Carter Schonwald
Hey John
I’m happy to help you contribute to to vector.
1). This might not actually be needed with stream fusion on ... though
perhaps this sort of shared computation needs to be its own combinators
because of the sharing meaning that the stream fusion on map and foldl
might not work. (Have you tried doing let x = map ... in let y = foldl ...
in something with x and y? Even eg just writing map accum l in terms of
just that? It could very well fuse ... though I don’t think it can with the
current vector fusion framework. Though I think one of the more exotic
fusion frameworks Amos Robinson did a few years ago could handle that
fusion. Sadly that one requires an ILP solver at compile time. But
there’s some tricks I think we could do )
2) writing it in stream fusion form / as map accum l for streams will
actually be simpler I think
3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me
a bit slow), but this sounds like a super reasonable feature request
Post by John Ky
Hello,
I'd like to add the mapAccumL function to the vector package.
Specifically the Data.Vector.Storable module, but it would also be
useful other vector modules.
{-# LANGUAGE ScopedTypeVariables #-}
mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}
import qualified Data.List as L
import qualified Data.Vector.Storable as DVS
(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs
Cheers,
-John
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
John Ky
2018-11-08 22:53:41 UTC
Permalink
After removing the load time of the file from benchmarking, the difference
between mapM and hand rolled version is even more stark 😱

benchmarking medium.csv/mapAccumL for DVS.Vector Word64
time 2.371 ms (2.347 ms .. 2.396 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 2.442 ms (2.420 ms .. 2.477 ms)
std dev 91.95 ÎŒs (57.33 ÎŒs .. 137.7 ÎŒs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking medium.csv/mapAccumLViaStrictState for DVS.Vector Word64
time 604.4 ms (479.0 ms .. 700.5 ms)
0.995 R² (0.983 R² .. 1.000 R²)
mean 643.8 ms (607.3 ms .. 695.2 ms)
std dev 51.95 ms (13.07 ms .. 69.16 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking medium.csv/mapAccumLViaLazyState for DVS.Vector Word64
time 62.08 ms (61.14 ms .. 63.23 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 63.16 ms (62.43 ms .. 65.35 ms)
std dev 2.013 ms (637.6 ÎŒs .. 3.434 ms)
Post by John Ky
Hi Carter,
I've created a ticket <https://github.com/haskell/vector/issues/228> as
I've implemented a hand rolled version, and another two versions based on
a combination of mapM and the lazy and strict versions of State monad.
haskell-works/hw-prim#38
<https://github.com/haskell-works/hw-prim/pull/38>
The benchmarks show that the hand rolled versions run two times faster
than the lazy state monad version and 16 times faster than the strict state
monad.
I found the slow performance of the strict monad version most surprising.
I'm aware that the version that using mapM might enable fusion, however
it is a fair bit slower than a hand rolled version that defeats fusion.
I would love to have a fusion-enabled version that runs as fast as the
hand rolled version. Would that be possible?
Cheers,
-John
Post by Carter Schonwald
I thought about it a teeny bit more
This should be trivially definable using mapM or equivalent using state t
Have you tried doing that simple high level definition? I think that
works with vector fusion quite nicely.
... ohhh. I see. There’s two vectors of inputs. I’ll have to think
about this more.
On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <
Post by Carter Schonwald
Hey John
I’m happy to help you contribute to to vector.
1). This might not actually be needed with stream fusion on ... though
perhaps this sort of shared computation needs to be its own combinators
because of the sharing meaning that the stream fusion on map and foldl
might not work. (Have you tried doing let x = map ... in let y = foldl ...
in something with x and y? Even eg just writing map accum l in terms of
just that? It could very well fuse ... though I don’t think it can with the
current vector fusion framework. Though I think one of the more exotic
fusion frameworks Amos Robinson did a few years ago could handle that
fusion. Sadly that one requires an ILP solver at compile time. But
there’s some tricks I think we could do )
2) writing it in stream fusion form / as map accum l for streams will
actually be simpler I think
3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got
me a bit slow), but this sounds like a super reasonable feature request
Post by John Ky
Hello,
I'd like to add the mapAccumL function to the vector package.
Specifically the Data.Vector.Storable module, but it would also be
useful other vector modules.
{-# LANGUAGE ScopedTypeVariables #-}
mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}
import qualified Data.List as L
import qualified Data.Vector.Storable as DVS
(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs
Cheers,
-John
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Carter Schonwald
2018-11-09 00:54:03 UTC
Permalink
great

the one using streams for your hand rolled stuff would be more
instructive... your proposed definiton cant fuse.

i'll switch to the ticket
Post by John Ky
After removing the load time of the file from benchmarking, the difference
between mapM and hand rolled version is even more stark 😱
benchmarking medium.csv/mapAccumL for DVS.Vector Word64
time 2.371 ms (2.347 ms .. 2.396 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 2.442 ms (2.420 ms .. 2.477 ms)
std dev 91.95 ÎŒs (57.33 ÎŒs .. 137.7 ÎŒs)
variance introduced by outliers: 23% (moderately inflated)
benchmarking medium.csv/mapAccumLViaStrictState for DVS.Vector Word64
time 604.4 ms (479.0 ms .. 700.5 ms)
0.995 R² (0.983 R² .. 1.000 R²)
mean 643.8 ms (607.3 ms .. 695.2 ms)
std dev 51.95 ms (13.07 ms .. 69.16 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking medium.csv/mapAccumLViaLazyState for DVS.Vector Word64
time 62.08 ms (61.14 ms .. 63.23 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 63.16 ms (62.43 ms .. 65.35 ms)
std dev 2.013 ms (637.6 ÎŒs .. 3.434 ms)
Post by John Ky
Hi Carter,
I've created a ticket <https://github.com/haskell/vector/issues/228> as
I've implemented a hand rolled version, and another two versions based on
a combination of mapM and the lazy and strict versions of State monad.
haskell-works/hw-prim#38
<https://github.com/haskell-works/hw-prim/pull/38>
The benchmarks show that the hand rolled versions run two times faster
than the lazy state monad version and 16 times faster than the strict state
monad.
I found the slow performance of the strict monad version most surprising.
I'm aware that the version that using mapM might enable fusion, however
it is a fair bit slower than a hand rolled version that defeats fusion.
I would love to have a fusion-enabled version that runs as fast as the
hand rolled version. Would that be possible?
Cheers,
-John
Post by Carter Schonwald
I thought about it a teeny bit more
This should be trivially definable using mapM or equivalent using state t
Have you tried doing that simple high level definition? I think that
works with vector fusion quite nicely.
... ohhh. I see. There’s two vectors of inputs. I’ll have to think
about this more.
On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <
Post by Carter Schonwald
Hey John
I’m happy to help you contribute to to vector.
1). This might not actually be needed with stream fusion on ... though
perhaps this sort of shared computation needs to be its own combinators
because of the sharing meaning that the stream fusion on map and foldl
might not work. (Have you tried doing let x = map ... in let y = foldl ...
in something with x and y? Even eg just writing map accum l in terms of
just that? It could very well fuse ... though I don’t think it can with the
current vector fusion framework. Though I think one of the more exotic
fusion frameworks Amos Robinson did a few years ago could handle that
fusion. Sadly that one requires an ILP solver at compile time. But
there’s some tricks I think we could do )
2) writing it in stream fusion form / as map accum l for streams will
actually be simpler I think
3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got
me a bit slow), but this sounds like a super reasonable feature request
Post by John Ky
Hello,
I'd like to add the mapAccumL function to the vector package.
Specifically the Data.Vector.Storable module, but it would also be
useful other vector modules.
{-# LANGUAGE ScopedTypeVariables #-}
mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}
import qualified Data.List as L
import qualified Data.Vector.Storable as DVS
(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs
Cheers,
-John
_______________________________________________
Libraries mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Loading...