GADTing rid of IO

August 16, 2016

Looming IO

As a newcomer to Haskell, I have a tendency to scatter IO liberally about my code, because I still tend to reason about algorithms imperatively.

One of the problems with IO is that it obscures arbitrary side effects. If a function returns IO Int, there's no telling what crazy side effects might occur when computing and digging out that integer.

Maybe there won't be any side effects:

getNum :: IO Int
getNum = return 42

On the other hand, maybe there will be dire side effects;

getNum :: IO Int
getNum = do
  removeDirectoryRecursive "/"
  return 42

It would be nice to find a way to make functions like these safe. Removing IO from their types would do it, and we could rest assured that running them can never cause any side effects.

Database I/O

Let's look at a real-world example: interacting with a relational database.

The sqlite-simple library makes it easy to get up and running with a SQLite database.

We first create a data type and define a way to produce it from a database row:

data TestField = TestField Int String deriving (Show)

instance FromRow TestField where
  fromRow = TestField <$> field <*> field

We then make a few functions using SQL commands and queries:

initDB :: Connection -> IO ()
initDB conn = execute_ conn "CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, str TEXT)"

addRow :: Connection -> String -> IO ()
addRow conn value = execute conn "INSERT INTO test (str) VALUES (?)" [value]

getRows :: Connection -> IO [TestField]
getRows conn = query_ conn "SELECT * from test" :: IO [TestField]

addRowThenGetRows :: Connection -> String -> IO [TestField]
addRowThenGetRows conn value = do
  addRow conn value
  getRows conn

We finally make it go:

main :: IO ()
main = do
  conn <- open "test.db"
  initDB conn
  testFields <- addRowThenGetRows conn "test string 2"
  mapM_ print testFields
  close conn

A complete example of this is available here.

Annoyingly, those pesky IOs are littered everywhere. We can't know (without reading the source code) that addRow doesn't take the liberty to drop all the rows, and all other tables too for good measure.

Wouldn't it be nice to get them out of the initDB, addRow, getRows, and addRowThenGetRows functions?

GADTs to the rescue

To do this, we'll need an abstract way to describe database actions at the type level. Let's create a generalized algebraic data type called SQL:

data SQL a where
  SQLPure :: a -> SQL a
  SQLQuery :: FromRow a => Query -> [SQLData] -> SQL [a]
  SQLCommand :: Query -> [SQLData] -> SQL ()
  SQLAp :: SQL (a -> b) -> SQL a -> SQL b
  SQLBind :: SQL a -> (a -> SQL b) -> SQL b

An SQL a is a value that represents some database action to take. After this action is taken, a value of type a is produced.

In the above example, the function addRowThenGetRows puts together the functions addRow and getRows. We'll want to be able to put together SQL values in do blocks, so we need to implement Monad, Applicative, and Functor for SQL:

instance Functor SQL where
  fmap f s = SQLBind s (\a -> SQLPure $ f a)

instance Applicative SQL where
  pure a = SQLPure a
  (<*>) f a = SQLAp f a

instance Monad SQL where
  (>>=) a mb = SQLBind a mb

An SQL a isn't very useful on its own. To do anything with it, we'll need a way to actually run it, invoking whatever underlying database commands and queries it abstracts. This is where we defer the IO-imposing uses of sqlite-simple's query and execute functions:

runSQL :: Connection -> SQL a -> IO a
runSQL c (SQLPure a) = return a
runSQL c (SQLQuery q ds) = query c q ds
runSQL c (SQLCommand q ds) = execute c q ds >> return ()
runSQL c (SQLAp f a) = (runSQL c f) <*> (runSQL c a)
runSQL c (SQLBind s fs) = (runSQL c s) >>= (\a -> runSQL c (fs a))

Now we can get rid of IO by dropping the Connection argument from our functions, swapping IO with SQL, and using SQL constructors as necessary:

initDB :: SQL ()

addRow :: String -> SQL ()
addRow value = SQLCommand "INSERT INTO test (str) VALUES (?)" $ toRow [value]

getRows :: SQL [TestField]
getRows = SQLQuery "SELECT * from test" []

addRowThenGetRows :: String -> SQL [TestField]
addRowThenGetRows value = do
  addRow value

Usage is nearly the same as before, except we use runSQL from our IO block to execute the SQL action:

main :: IO ()
main = do
  conn <- open "test.db"
  runSQL conn initDB
  testFields <- runSQL conn $ addRowThenGetRows "test string 2"
  mapM_ print testFields
  close conn

A complete example of this is available here.