Haskell read CSV file -> load XML file from url -> write out CSV file again


I am trying to

  1. Load a CSV File
  2. Read IDs from the file
  3. Load an external XML file for each ID
  4. Read some names from the XML
  5. Write out the ID and names into a new CSV file
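A rough sketch of how the five steps fit together, assuming steps 1-2 have already produced `(id, name)` pairs. `stubFetch` is a hypothetical stand-in for the real HTTP/XML code, and `enrichWith` is written for any `Monad m` only so it can be tried without network access; in the actual program `m` would be `IO`.

```haskell
-- Steps 3-5 for one row. The fetch function is passed in so it can be stubbed.
-- In the real program m is IO and fetch would use HXT/HandsomeSoup.
enrichWith :: Monad m => (String -> m String) -> (String, String) -> m (String, String, String)
enrichWith fetch (pid, name) = do
  names <- fetch pid          -- steps 3-4: load the XML for this ID and read the names
  return (pid, name, names)   -- step 5: one output row

-- Hypothetical stub standing in for the HTTP + XML step.
stubFetch :: String -> IO String
stubFetch pid = return ("names for " ++ pid)

main :: IO ()
main = do
  -- Steps 1-2 assumed done: (id, name) pairs already decoded from the CSV.
  let rows = [("736572", "Mount Athos"), ("6697806", "North Aegean")]
  results <- mapM (enrichWith stubFetch) rows
  mapM_ print results
```

The key point is that every step that touches the network runs inside the monad (`IO`), so the results have to be bound with `<-` rather than used as plain values.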

I am new to Haskell and really want to learn it, but I am still in the copy-and-paste phase of understanding. I have found tutorials for each part on its own, but I struggle to combine them.

The CSV is simple, like:

736572,"Mount Athos"
6697806,"North Aegean"

I use Cassava to read the CSV and HandsomeSoup for XML reading.
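A minimal sketch of the Cassava step on its own, using the sample rows above as an inline literal (assumes the `cassava`, `vector` and `bytestring` packages). Decoding the first column as `String` rather than a number is deliberate: it lets the ID be spliced straight into the GeoNames URL later.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (..), decode)
import qualified Data.Vector as V

-- Decode headerless rows into (id, name) pairs; both columns kept as String.
parseRows :: BL.ByteString -> Either String (V.Vector (String, String))
parseRows = decode NoHeader

main :: IO ()
main = case parseRows "736572,\"Mount Athos\"\n6697806,\"North Aegean\"\n" of
  Left err   -> putStrLn ("parse error: " ++ err)
  Right rows -> V.forM_ rows $ \(pid, name) -> putStrLn (pid ++ " -> " ++ name)
```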

Here is my attempt to read the IDs, load the XML and, at least, print the names from the XML.

{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

import Text.XML.HXT.Core
import Text.HandsomeSoup

import Data.List
import Data.Char


getPlaceNames::String->String->String
getPlaceNames pid name = do
    let doc = fromUrl ("http://api.geonames.org/get?geonameId="++pid++"&username=demo")

    c<-runX $ doc >>> css "alternateNames" >>> deep getText
    return (head c)


main :: IO ()
main = do
    csvData <- BL.readFile "input.csv"
    case decode NoHeader csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ ( pid, name ) ->
          putStrLn $  getPlaceNames pid name

I think I am doing something wrong when I call getPlaceNames and return the names. I am not even sure whether I should use a 'do' block in getPlaceNames.

The error says:

 Couldn't match expected type ‘[[Char]]’
            with actual type ‘IO [String]’
In a stmt of a 'do' block:
  c <- runX $ doc >>> css "alternateNames" >>> deep getText
In the expression:
  do { let doc
             = fromUrl
                 ("http://api.geonames.org/get?geonameId="
                  ++ pid ++ "&username=demo");
       c <- runX $ doc >>> css "alternateNames" >>> deep getText;
       return (head c) }
In an equation for ‘getPlaceNames’:
    getPlaceNames pid name
      = do { let doc = ...;
             c <- runX $ doc >>> css "alternateNames" >>> deep getText;
             return (head c) }

But that's probably just one of the things I am doing wrong, because of my limited understanding of monads and bindings.

Any help is appreciated, even if it's just a pointer to the right docs.
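The expected type `[[Char]]` in this error comes from the signature `String->String->String`. Because `String` is `[Char]`, the `do` block in `getPlaceNames` is interpreted in the list monad rather than in `IO`, so GHC expects `c <- ...` to bind from a list of strings, and `runX ...` (an `IO [String]`) does not fit. A small self-contained illustration of the same situation, with made-up list contents:

```haskell
-- With a result type of String (= [Char]) this do block runs in the list monad.
firstLetters :: String
firstLetters = do
  c <- ["Mount Athos", "North Aegean"]  -- the right-hand side must have type [[Char]]
  return (head c)                       -- head c :: Char, so the block has type [Char]

main :: IO ()
main = putStrLn firstLetters
```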

Cheers

Bjorn

Answer by Björn Grambow:

Thanks to chi I've figured out the whole procedure. I am posting my code for anyone else who needs to do something similar.
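The change that resolves the original type error is to declare the result as `IO String` (as getPlaceDetails below does with its `IO (...)` tuple) and, in `main`, to bind the result instead of passing the action to `putStrLn`. A minimal sketch of that pattern; `fetchAlternateNames` is a hypothetical stub replacing the real `runX` call so the example runs without network access:

```haskell
-- URL construction from the question, split out as a pure function.
placeUrl :: String -> String
placeUrl pid = "http://api.geonames.org/get?geonameId=" ++ pid ++ "&username=demo"

-- Hypothetical stub. In the real code this would be
--   runX $ fromUrl (placeUrl pid) >>> css "alternateNames" >>> deep getText
-- which also has type IO [String]. Here it only echoes the URL.
fetchAlternateNames :: String -> IO [String]
fetchAlternateNames pid = return ["(would fetch " ++ placeUrl pid ++ ")"]

-- The fix: the result type is IO String, so the do block runs in IO.
getPlaceNames :: String -> String -> IO String
getPlaceNames pid _name = do
  c <- fetchAlternateNames pid
  return (head c)

main :: IO ()
main = mapM_ (\(pid, name) -> getPlaceNames pid name >>= putStrLn)
             [("736572", "Mount Athos"), ("6697806", "North Aegean")]
```

`getPlaceNames pid name >>= putStrLn` is the same as `do { names <- getPlaceNames pid name; putStrLn names }`.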

In the end I extracted not only the names from the XML but several other fields as well, so I renamed getPlaceNames to getPlaceDetails.

I am showing the full code because it also demonstrates how I read different fields from the XML and how I merge the alternateName elements into a single String.
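The merging step can be tried on its own. The sketch below reuses the answer's `toLanguageStr` logic on hand-written `(lang, text)` pairs of the kind `getAttrValue "lang" &&& deep getText` produces; HXT's `getAttrValue` yields an empty string when the attribute is missing, which is why entries such as `:Athos` appear in the output further down. `joinTranslations` is a hypothetical name for the `intercalate "|" $ map toLanguageStr` expression.

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Same logic as the answer: upper-case language code, a colon, then the name.
toLanguageStr :: (String, String) -> String
toLanguageStr (lan, name) = map toUpper lan ++ ":" ++ name

-- Join all translations into one "|"-separated field.
joinTranslations :: [(String, String)] -> String
joinTranslations = intercalate "|" . map toLanguageStr

main :: IO ()
main = putStrLn (joinTranslations [("en", "Mount Athos"), ("", "Athos"), ("fr", "Mont Athos")])
```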

{-# LANGUAGE ScopedTypeVariables #-}


import qualified Data.ByteString.Lazy.Char8 as BL


import Data.Csv
import qualified Data.Vector as V

import Text.XML.HXT.Core
import Text.HandsomeSoup
import Data.List
import Data.Char


uppercase :: String -> String
uppercase = map toUpper


toLanguageStr :: (String, String) -> String
toLanguageStr (lan,name) = uppercase lan ++ ":" ++ name


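-- Fetch the GeoNames record for one ID and extract the fields we need.
-- The CSV name is not used: the name written out comes from the XML.
-- Note: doc is an arrow, so each runX below runs it again, i.e. the URL
-- is downloaded once per field.
getPlaceDetails :: String -> String -> IO (Int, String, Float, Float, Float, Float, Float, Float, String, String)
getPlaceDetails pid _name = do
    let doc = fromUrl ("http://api.geonames.org/get?geonameId=" ++ pid ++ "&username=demo")

    gid <- runX $ doc >>> css "geonameId" >>> deep getText
    placeName <- runX $ doc >>> css "name" >>> deep getText
    -- bounding box and centre coordinates
    s <- runX $ doc >>> css "south" >>> deep getText
    w <- runX $ doc >>> css "west" >>> deep getText
    n <- runX $ doc >>> css "north" >>> deep getText
    e <- runX $ doc >>> css "east" >>> deep getText
    lat <- runX $ doc >>> css "lat" >>> deep getText
    lng <- runX $ doc >>> css "lng" >>> deep getText
    -- (lang attribute, text) pair for every alternateName element
    translations <- runX $ doc >>> css "alternateName" >>> (getAttrValue "lang" &&& deep getText)
    -- the single alternateNames element holds a comma-separated list
    terms <- runX $ doc >>> css "alternateNames" >>> deep getText
    -- head and read are partial: a missing element or unparsable number raises an error
    return ( read (head gid), head placeName, read (head lat), read (head lng)
           , read (head s), read (head w), read (head n), read (head e)
           , intercalate "|" $ map toLanguageStr translations, head terms )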



main :: IO ()
main = do
    csvData <- BL.readFile "input.csv"
    case decode NoHeader csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ (pid, name) -> do
            details <- getPlaceDetails pid name
            -- append one row per ID (out.csv keeps growing across runs) and echo it
            BL.appendFile "out.csv" $ encode [details]
            BL.putStrLn (encode [details])
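Because `read (head xs)` throws as soon as one field is missing or malformed, a single bad GeoNames record stops the whole run. One possible hardening, sketched here with a hypothetical helper `firstRead` built only from `base` functions, is to parse each `runX` result into a `Maybe` and decide per row what to do with `Nothing`:

```haskell
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

-- Hypothetical helper: parse the first text node of a runX result.
-- Returns Nothing for an empty result or unparsable text instead of crashing.
firstRead :: Read a => [String] -> Maybe a
firstRead xs = listToMaybe xs >>= readMaybe

main :: IO ()
main = do
  print (firstRead ["40.15798"] :: Maybe Float)
  print (firstRead [] :: Maybe Float)
  print (firstRead ["n/a"] :: Maybe Float)
```

In getPlaceDetails the fields could then be combined in the `Maybe` monad (or with `traverse`) and rows that come back as `Nothing` skipped or logged.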

For example, the input.csv line

736572,"Mount Athos"

produces this line in out.csv:

736572,"Mount Athos",40.15798,24.33021,40.11294,23.99234,40.4563,24.40044,"KO:아토스 산|:Aftónomos Periochí Agíou Órous|:Ágion Óros|:Ágio Óros|:Athos|NO:Áthos|EN:Autonomous Monastic State of the Holy Mountain|:Avtonómos Periokhí Ayíou Órous|:Áyion Óros|:Dhioíkisis Ayíou Órous|:Hagion Oros|:Holy Athonite Republic|LINK:http://en.wikipedia.org/wiki/Mount_Athos|CA:Mont Athos|FR:Mont Athos|EN:Mount Athos|FR:République monastique du Mont Athos|EL:Αυτόνομη Μοναστική Πολιτεία Αγίου Όρους","Aftonomos Periochi Agiou Orous,Aftónomos Periochí Agíou Órous,Agio Oros,Agion Oros,Athos,Autonome Monastike Politeia Agiou Orous,Autonomous Monastic State of the Holy Mountain,Avtonomos Periokhi Ayiou Orous,Avtonómos Periokhí Ayíou Órous,Ayion Oros,Dhioikisis Ayiou Orous,Dhioíkisis Ayíou Órous,Hagion Oros,Holy Athonite Republic,Mont Athos,Mount Athos,Republique monastique du Mont Athos,République monastique du Mont Athos,atoseu san,Ágio Óros,Ágion Óros,Áthos,Áyion Óros,Αυτόνομη Μοναστική Πολιτεία Αγίου Όρους,아토스 산"