Note: The complete code associated with this post can be seen in this repository on the generic-vinyl
branch.
Background
In the last post where we were trying to access Postgres from a dataframe, we had to manually declare an instance of the FromBeam
class for the record at hand (i.e. UserT
). With this manually declared instance we were able to go from the plain record to its vinyl representation.
But can we do better ? For instance, can we just say something like deriveVinyl ''UserT
and have a typeclass instance generated for us automatically? More importantly, how can we make this instance generation work for arbitrary plain Haskell records (i.e. make it generic) ? These are some of the questions tackled in this blog post.
What we want to do
We want to convert from plain Haskell records to the corresponding vinyl
representation. While looking up ways to do this generically, I stumbled across this comment on StackOverflow:
"...At least since vinyl-0.5, their Rec type is structurally isomorphic to NP in generics-sop..." - kosmikus
The above comment gave me quite a hint. I had to dig deeper. Looking at vinyl
’s Rec:
data Rec :: (u -> *) -> [u] -> * where
RNil :: Rec f '[]
(:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)
And next at the NP (stands for N-ary Product) data type from generics-sop
:
data NP :: (k -> *) -> [k] -> * where
Nil :: NP f '[]
(:*) :: f x -> NP f xs -> NP f (x ': xs)
As can now be seen, both the data declarations are parameterised over a functor (say f) and a type-level list (labels, in case of Rec
); the type of values at any position, say for r :: u
is given by f r :: *
. Note that u
and k
are equivalent. Moreover, given an N-ary Product, by recursing over its structure and replacing its constructors appropriately with those of Rec
, it should be possible to get it into a Rec
representation; the opposite is also possible.
There is the records-sop library that “provides utilities for working with labelled single-constructor record types via generics-sop”. But it doesn’t work with vinyl
records (yet!), which is what we are interested in here. Nevertheless, it is certainly prior work in the same direction as that pursued in this blog post.
So the transformation we are now interested in is:
Plain-records ~~~> Vinyl Representation (Overall goal)
And we know that:
NP Representation ~ Vinyl Representation
Therefore, the following transformation is certainly worth exploring:
Plain-records ---> NP Representation ===> Vinyl Representation (Revised goal)
Note that all the funny looking arrows are an informal, made-up notation of sorts, different from any formal usage of “arrows”. Arrows as used above represent a conversion from one representation to another, and are used in a very informal sense. The second step of the transformation in the ‘Revised goal’ (i.e. “===>”) has a different kind of arrow. This is only to highlight (as we shall see) a different “sort” of transformation from NP
to Rec
in the final implementation, which is in addition to the isomorphism. We shall revisit this point again.
The next section delves into the essentials of the generics-sop
library, which we shall be using to get an “intermediate” representation of plain-records.
A Short Introduction to generics-sop
Note: This section has been adapted from “Applying Type-level and Generic Programming in Haskell” by Andres Löh, who is one of the co-authors of the library. You should read the pdf if you want a deeper dive into generics-sop
and the type-level machinery it leverages.
Moving on; assume there is a typeclass:
class Generic a where
type Rep a
from :: a -> Rep a
to :: Rep a -> a
The core of what generics-sop
enables is that given a type a
that is representable as Rep a
, it is possible to write functions that would work for all such representable data types.
More specifically, if all Rep a
have a commmon structure then it becomes possible to define a generic function:
geq :: Generic a => Rep a -> Rep a -> Bool
We can now implement the following function that works for all representable types:
eq :: Generic a => a -> a -> Bool
eq x y = geq (from x) (from y)
To use a (rough) visual metaphor, a large number of data types can be “unfurled” into a structurally similar representation. Once in this new representation, many sorts of generic functions can be written over this representation. This representation is called ‘Sum of Products’ (the ‘sop’ in generics-sop
; see pdf for more details). And it is also possible to “pack-up” the representation back into the original structure.
The generics-sop
library provides two approaches for “unfurling”/“furling” a
to Rep a
/vice-versa (i.e. comimg up with the from
and to
functions):
A. Using GHC.Generics
B. Using TemplateHaskell
According to the authors of generics-sop
: using option A or B is “…mainly a matter of personal taste. The version based on Template Haskell probably has less run-time overhead.” Now for us, since the Rep
is an intermediate step, we would benefit from the transparency afforded to us by the TemplateHaskell approach (by using some compiler options to inspect the generated code). So ‘option B’ it is.
Next, we try to inspect the output of trying to auto-derive a Generic
instance for the UserT
record. We setup the requisite compiler flags (‘-ddump-splices’, ‘-ddump-to-file’) in the package.yaml
file. After adding the requisite language extensions/imports (see code repo for details), all we need to do is:
-- TestDeriveVinyl.hs
deriveGeneric ''UserT
The above snippet when compiled, brings into scope the following code (along with some other code that we’re not interested in). This can be seen in the TestDeriveVinyl.dump-splices
file (note that some of the type variables have been made to look better (“sugared”) than their original generated form; also certain names have been un-qualified; if you were to compile the repo, then the code is sure to look slightly different):
gsoc-blog-code/src/TestDeriveVinyl.hs:23:1-21: Splicing declarations
deriveGeneric ''UserT
======>
instance Generic (UserT f) where
type Code (UserT f) = '['[Columnar f Text,
Columnar f Text,
Columnar f Text,
Columnar f Bool,
Columnar f Int]]
from (User x1 x2 x3 x4 x5)
= SOP
(Z ((I x1)
:*
((I x2)
:*
((I x3) :* ((I x4) :* ((I x5) :* Nil))))))
to
(SOP (Z ((I x_1)
:*
((I x_2)
:*
((I x_3) :* ((I x_4) :* ((I x_5) :* Nil)))))))
= ((((User x_1) x_2) x_3) x_4) x_5
to _ = error "unreachable"
As can be observed, the from
function unfurls the record values into the SOP representation. And the to
function gets us back to a User record. The Code
type synonym stands for the field types, parametrised over an interpretation functor.
We have our intermediate representation. Next we try to take this to the vinyl
representation.
GenericVinyl FTW!
Consider the following type family and type-class declaration:
-- Vinylize.hs
type family ZipTypes (ns :: [Symbol]) (ys :: [*]) = (zs :: [k]) | zs -> ns ys
type instance ZipTypes '[] '[] = '[]
type instance ZipTypes (n ': ns) (y ': ys) = ( n :-> y) ': (ZipTypes ns ys)
class GenericVinyl a names rs | a -> names rs where
type FieldNames a :: [Symbol]
createRecId :: a -> Rec VF.Identity (ZipTypes names rs)
The type family ZipTypes
takes two type variable lists (one for column-names and one for column types) and returns the column types of the vinyl
record, by zipping the correspinding elements using the :->
type constructor imported from Frames.Col
. Note that we add the type family dependency zs -> ns ys
to state that this type family is injective, i.e. ZipTypes a1 b1 ~ ZipTypes a2 b2
implies that (a1, b1) ~ (a2, b2)
. Next we declare the GenericVinyl
typeclass, that is similar to the FromBeam
typeclass from the previous post, with the addition of the FieldNames a
declaration and createRecId
return type making use of the ZipTypes
type family.
Before we write the GenericVinyl
instance, we need to extract a type-level list of record field names. Consider the following code:
{-# LANGUAGE TemplateHaskell #-}
module Helpers where
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
fNamesTypeLevel :: Name -> Q Type
fNamesTypeLevel name = do
fnames <- fmap getRecordFields $ reify name
fnames' <- fnames
foldr (\x xs -> appT (appT promotedConsT x) xs) promotedNilT $ map (litT . strTyLit) fnames'
getRecordFields :: Info -> Q [String]
getRecordFields (TyConI (DataD _ _ _ _ cons _)) = return $ concatMap getRF cons
getRecordFields _ = return []
getRF :: Con -> [String]
getRF (RecC _name fields) = map getFieldInfo fields
getRF _ = []
getFieldInfo :: (Name, Strict, Type) -> String
getFieldInfo (name, _, AppT (AppT (ConT _) (VarT f)) (ConT ty)) = (nameBase name)
The function fNamesTypeLevel
takes a Name
, i.e. an in-scope record in our case and returns a type-level list of its field-names.
Next we try to write a GenericVinyl
typeclass instance for plain records of the sort used by beam
. Consider the following code:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeFamilyDependencies #-}
{-# LANGUAGE TypeOperators #-}
module Vinylize where
import Data.Proxy
import Data.Vinyl
import qualified Data.Vinyl.Functor as VF
import qualified Database.Beam as B
import Frames.Col
import Generics.SOP
import qualified Generics.SOP.NP as GSN
import GHC.TypeLits
import Helpers (fNamesTypeLevel)
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
-- ZipTypes and GenericVinyl class declaration elided here
deriveVinyl :: Name -> DecsQ
deriveVinyl name = entireInstance
where
n = conT name
typeList1 = fNamesTypeLevel name
entireInstance=
[d|
instance (((Code ($(n) B.Identity)) ~ '[rs]),
(ns3 ~ FieldNames ($(n) B.Identity)) )
=> GenericVinyl ($(n) B.Identity) ns3 rs where
type FieldNames ($(n) B.Identity) = $(typeList1)
createRecId r = (go tranformedNP)
where
SOP (Z prod) = from r
tranformedNP = (((GSN.trans_NP (Proxy :: Proxy (LiftedCoercible I VF.Identity)) (\(I x) -> VF.Identity $ coerce x) prod)) )
go = GSN.cata_NP RNil (:&)
|]
The deriveVinyl
function takes a record name as input and generates a GenericVinyl
instance declaration for this record. On the whole, trans_NP
transforms the NP
from one interpretation functor to another, (safely) coercing the field values to the appropriate column types at the same time. In order to have a coercion constraint lifted over the interpretation functor, we use the Proxy (LiftedCoercible I VF.Identity)
argument. Specifically, trans_NP :: AllZip c xs ys => proxy c -> (forall x y. c x y => f x -> g y) -> NP f xs -> NP g ys
, so the proxy argument provides a partially applied class constraint, that upon being fully applied yeilds a constraint LiftedCoercible I VF.Identity x y
, which happens to be equivalent to Coercible (I x) (VF.Identity y)
(Coercible
class comes from Data.Coerce
; LiftedCoercible
is declared in Generics.SOP.Constraint
). In my understanding, this is necessary as the compiler usually infers on its own Coercible x y
when x
and y
have the same representation; but here we need to lift the Coercible
constraint over the respective interpretation functors.
Of particular note is the functiongo
, which effectively replaces the NP
constructors with constructors from vinyl
(GSN.cata_NP
is a generalization of foldr
; we pass it RNil
as the neutral element and (:&)
combines consecutive elements). The overall effect of createRecId
is therefore not just an isomorphism, but some additional things happening as well (most notably the coercing). This was the point that I had flagged earlier as one to be revisited, when we were discussing the informal arrow diagrams.
Testing it all
Consider the following code:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeFamilyDependencies #-}
{-# LANGUAGE UndecidableInstances #-}
module TestDeriveVinyl where
import Data.Coerce
import Data.Vinyl
import qualified Data.Vinyl.Functor as VF
import qualified Database.Beam as B
import Database.PostgreSQL.Simple
import Frames
import Generics.SOP
import Generics.SOP.TH
import PostgresAccess (User, UserT, selectAllUsers)
import Vinylize (createRecId, deriveVinyl)
deriveGeneric ''UserT
deriveVinyl ''UserT
Compiling and checking for the generated code, reveals that the following instance of GenericVinyl
for UserT
has been generated (again, the type variables below have been sugared and names un-qualified in places):
gsoc-blog-code/src/TestDeriveVinyl.hs:25:1-19: Splicing declarations
deriveVinyl ''UserT
======>
instance (Code (UserT B.Identity) ~ '[rs],
ns3 ~ FieldNames (UserT B.Identity)) =>
GenericVinyl (UserT B.Identity) ns3 rs where
type FieldNames (UserT B.Identity) = '["_userEmail",
"_userFirstName",
"_userLastName",
"_userIsMember",
"_userDaysInQueue"]
createRecId r
= go tranformedNP
where
SOP (Z prod) = from r
tranformedNP
= ((Generics.SOP.NP.trans_NP
(Proxy :: Proxy (LiftedCoercible I VF.Identity)))
(\ I x -> (VF.Identity $ (coerce x))))
prod
go = (Generics.SOP.NP.cata_NP RNil) (Data.Vinyl.:&)
Next, we try to test this instance with the following test code (which is similar to what we did from ghci
in the last post):
-- TestDeriveVinyl
test :: IO ()
test = do
conn <- connectPostgreSQL "host=localhost dbname=shoppingcart1"
us <- selectAllUsers conn
mapM_ print $ toFrame $ map createRecId us
And then assuming you still have the test data from the last post in your local Postgres instance, upon going to ghci
:
ghci>test
SELECT "t0"."email" AS "res0", "t0"."first_name" AS "res1", "t0"."last_name" AS "res2", "t0"."is_member" AS "res3", "t0"."days_in_queue" AS"res4" FROM "cart_users" AS "t0"
{_userEmail :-> "james@example.com", _userFirstName :-> "James", _userLastName :-> "Smith", _userIsMember :-> True, _userDaysInQueue :-> 1}
-- some output elided
We can confirm the following as well:
ghci>:set -XOverloadedStrings
ghci>conn <- connectPostgreSQL "host=localhost dbname=shoppingcart1"
ghci>us <- PostgresAccess.selectAllUsers conn
SELECT "t0"."email" AS "res0", "t0"."first_name" AS "res1", "t0"."last_name" AS "res2", "t0"."is_member" AS "res3", "t0"."days_in_queue" AS"res4" FROM "cart_users" AS "t0"
ghci>:t us
us :: [User]
-- type User = UserT B.Identity
ghci>:t (map createRecId us)
(map createRecId us)
:: [Rec
VF.Identity
'["_userEmail" :-> Text, "_userFirstName" :-> Text,
"_userLastName" :-> Text, "_userIsMember" :-> Bool,
"_userDaysInQueue" :-> Int]]
It would be safe at this point to say: it works!
Conclusion
In this post we saw a (somewhat experimental) approach of converting plain records to their vinyl
representation. We did this in order to convert our list of plain records (representing beam
query results) to a dataframe with minimal boilerplate at the end-user side (compare the module TestDeriveVinyl
to our older approach from the last post in module PostgresFrame
).
Thanks
Thanks to Marco Zocca (@ocramz), my GSoC mentor, for reviewing an earlier version of this blog post, and providing valuable feedback/guidance over the course of this project.