Brad Lucas

Programming, Clojure and other interests
March 4, 2012

Loading Data With Clojure

If you are coming to Clojure from Java you are probably familiar with JDBC. If not do a bit of research and you'll find that it is a layer that allows you to program against a database agnostic layer that communicates to your database through a database specific library. In the following example we'll use the clojure.java.jdbc wrapper for JDBC-based access to databases. This library is not part of what you would automatically have when you just install Clojure. For use to use it we'll have to download it ourselves or use a build tool to do it for us. To keep things simple here I'll assume you are using Leiningen as a build tool. If not you can find clojure.java.jdbc on GitHub. Also, you will have to have a database. Here I'll assume MySQL and explain how to connect to that. The example we'll build here will be an enhancement to the stock quote downloader presented in Reading Files article. Instead of writing the quotes to a file we'll load them into a database. You might want to review the article before continuing.

Database

Here we'll use MySQL but you could be using another. The only difference in the following will be the setting up of your database's JDBC library.

Install and Setup JDBC Driver

You'll need to install and reference the MySQL JDBC driver. At this point you should consider using a build tool. Manually you'll need to download the jar file and make sure it is on your class path before starting your Clojure Repl or running Clojure to evaluate your file. Both techniques are worth understanding thoroughly but once you do understand you'll find using a build tool to be more efficient and satisfying.

Manual Install

You'll need to download a copy of the JDBC Driver for MySQL. You can do that on the MySQL Connectors page (http://www.mysql.com/products/connector/). Grab the one for Java. They call it Connector/J. (Download link http://www.mysql.com/downloads/connector/j/)

Inside the package the file you want is the jar named mysql-connector-java-5.1.18-bin.jar. Before you run your Clojure repl you'll want this file on your Classpath.

Using a Build Tool

There are two popular build tools for Clojure, Leiningen and Cake. They are similar in idea to Ant in that they help manage where dependencies are, which files are source files and where to put compiled files. Make sure you understand using Clojure from the command line and with a command-line started Repl before using a build tool so you have a deeper understanding of what the build tool is solving. I use Leiningen and there is talk of a merging with Cake so let's go with Leiningen here. (At some point I'll write an intro to Leiningen. Here I'll assume you figured it out and you have installed Leiningen and can create a new project) After you've created your project with "lein new quote-loader" you'll see a project.clj file. Here is where you put your external dependencies. In this case we need the MySql driver. To include it you add "[mysql/mysql-connector-java "5.1.18"]" to the dependencies vector. Notice also that you need to add [org.clojure/java.jdbc "0.1.1"] to get the clojure.java.jdbc support.

(defproject quote-loader "1.0.0"
  :description "quote-loader"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.clojure/java.jdbc "0.1.1"]
                 [mysql/mysql-connector-java "5.1.18"]])

Analyze Quote Data

We previously downloaded our quote data so we know what the data looks like. The first line is a header and it tells us what the numbers mean. We will want to ignore this line when processing the file.

Date,Open,High,Low,Close,Volume,Adj Close

Subsequent rows are the quotes each on their own line.

2012-02-17,31.20,31.32,30.95,31.25,70036500,31.25

Data is present in a descending order. The oldest is last in the file.

Create Database and Table

First let's create database called quote. Then create a test user with access to that database.

mysql> create database quote;

mysql> create user 'tester'@'localhost' identified by 'password';

mysql> grant all on quote.* to 'tester'@'localhost';

Next, login as 'tester' and create the quote table.

drop table if exists quote;
create table quote (
       quote_id    int(10) unsigned not null auto_increment,
       symbol      varchar(6),
       date        date,
       open        decimal(18,4),
       high        decimal(18,4),
       low         decimal(18,4),
       close       decimal(18,4),
       vol         bigint,
       adjclose    decimal(18,4),
       primary key (quote_id),
       index (symbol, date)
);

-- YYYY-MM-DD  http://dev.mysql.com/doc/refman/5.1/en/datetime.html
-- http://stackoverflow.com/questions/1523576/database-schema-for-organizing-historical-stock-data

Parsing Quote Data

Revisiting our sample data we'll have to have routine to parse the comma delimited fields so we can load them into the appropriate database columns.

2012-02-17,31.20,31.32,30.95,31.25,70036500,31.25

Parsing CSV data

To help us parse each line will use David Santiago's clojure-csv library. This is another external library we'll want our build tool to manage for us. The following will need be included in our dependencies list in the Leiningen project.clj file.

[clojure-csv/clojure-csv "1.3.2"]

To use the library require it as follows. Note I've given it a shortened name of csv to make the code that uses the library more readable.

(require '[clojure-csv.core :as csv])

Now we can call csv/parse-csv on each row to get a vector with items for each value in our row. The call returns a vector of strings with a vector for each row. It is meant to handle an entire file but here I'm using on a single row. With that the first call is needed to pull the first row out. If you want to try this by hand enter the following.

(def test-data "2012-02-17,31.20,31.32,30.95,31.25,70036500,31.25")

(first (csv/parse-csv test-data))

You should get back the following.

["2012-02-17" "31.20" "31.32" "30.95" "31.25" "70036500" "31.25"]

Now, we'll need those values in some sort of structure that allows use to put them in the database in the appropriate columns. We could do this position-ally but that wouldn't be very clear if we revisited the code in the future. What would be nice is to convert the vector to a map so we could pull values out by keywords. To do that we can use the zipmap function that returns a map with the keys mapped to the corresponding values. The following shows that in a parse-row function.

(defn parse-row [row]
  (let [v (first (csv/parse-csv row))]
    (zipmap [:date :open :high :low :close :vol :adjclose] v)))

Try calling parse-row with the test-data as a parameter to see how it works.

user=> (parse-row test-data)

You should get the following.

{:adjclose "31.25", :vol "70036500", :close "31.25", :low "30.95", :high "31.32", :open "31.20", :date "2012-02-17"}

Connecting to the Database

Over on wikibooks where is a short example showing how to connect to MySQL. You can review it or use the following as an example.


(def db-host "localhost")
(def db-port 8889)
(def db-name "quote")
(def db-user "tester")
(def db-pwd "password")

(def database-dev-settings
  {
   :classname "com.mysql.jdbc.Driver" ; must be in classpath
   :subprotocol "mysql"
   :subname (str "//" db-host ":" db-port "/" db-name)
   :user db-user
   :password db-pwd
  }
)

Inserting into the database

Much like when you are reading or writing a file and you need to do so within a 'with-open' call our database calls will be done within a 'with-connection' call. The 'with-connection' call takes our database-settings hash and connects to the database executing the code we pass to it as a parameter. In our example we'll use update-or-insert-values. This is nice because we can rerun our loader everyday and it will update previous values and insert only the new values. Certainly not the most efficient plan but a simple one.

(defn insert-quote
  "Insert or update a quote passed in as a map"
  [sym q]
    (sql/with-connection database-dev-settings
      (sql/update-or-insert-values
       :quote
       ["symbol=? and date=?" sym (:date q)]
       {:symbol sym :date (:date q) :open (:open q) :high (:high q) :low (:low q) :close (:close q) :vol (:vol q) :adjclose (:adjclose q)})))

Using the assoc function

Based on some initial comments to this article we can rework insert-quote a bit to more readable. The think to look at is the map that is created and passed into update-or-insert-values. The line in question is.

{:symbol sym :date (:date q) :open (:open q) :high (:high q) :low (:low q) :close (:close q) :vol (:vol q) :adjclose (:adjclose q)}

First, realize that we are creating a new map from our parameters 'sym' a string and q a map. Also, notice this new map has the same keys as our parameter map q with the additional new key or :symbol. This and the function assoc gives a more readable method for creating a new map.

(assoc q :symbol sym)

The assoc function creates a new map with the same key values as the parameter q with the addtional key value of :symbol sym. Exactly what we were doing more verbosely in the earlier version. Using this new method we can improve insert-quote as follows.

(defn insert-quote1
  [sym q]
    (sql/with-connection database-dev-settings
      (sql/update-or-insert-values
       :quote
       ["symbol=? and date=?" sym (:date q)]
         (assoc q :symbol sym))))

Destructuring

There is another technique we can use to create our map to pass into update-or-insert-values, destructuring. Destructuring, also called abstract structural binding lets you pass variables in side of let binding lists and function parameter lists. It results in more concise code because assignments from data structures being passed in is done before the body of the let or function. This technique can be used in the insert-quote example as follows. Notice how the q map as a parameter is being destructured into variables which will be used inside the function.

(defn insert-quote2
  [sym {date :date
        open :open
        high :high
        low :low
        close :close
        vol :vol
        adjclose :adjclose}]
  (sql/with-connection database-dev-settings
    (sql/update-or-insert-values
     :quote
     ["symbol=? and date=?" sym date]
    {:symbol sym :date date :open open :high high :low low :close close :vol vol :adjclose adjclose})))

(see also {http://blog.jayfields.com/2010/07/clojure-destructuring.html)

Load Historical Quotes

The last thing we need to remember before we write our load function is that we need to ignore the first line. We can do this by knowing which line we are on but another way is to check the line's content before we try for the insert. A simple test is to see if the first character of the line is a number. We know that the header contains letters and our data is numbers so this is reasonable. We could enhance this to check the values in the line as well. A validation function but let's leave that for now.

(defn valid-data [str]
  (Character/isDigit (first str)))

(defn load-historical-quotes [sym]
  (let [url (build-url sym)]
    (with-open [rdr (io/reader url)]
      (doseq [line (line-seq rdr)]
        (if (valid-data line)
          (insert-quote sym (parse-row line)))))))

Source Code

The following is the entire example in one file. The project.clj file is shown first to show the required dependencies then the clj file, quote-loader.clj. If you create a new project with 'lein new quote-loader' then replace your project.clj and core.clj file you'll have everything you need. Remember to edit the db-user and db-pwd values (DB-USER and DB-PASSWORD for your database.

(defproject quote-loader "1.0.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [clojure-csv/clojure-csv "1.3.2"]
                 [org.clojure/java.jdbc "0.1.1"]
                 [mysql/mysql-connector-java "5.1.18"]
                 ])
(ns quote-loader.core
  (:require [clojure.java.io :as io])
  (:require [clojure.java.jdbc :as sql])
  (:require [clojure.string :as str])
  (:require [clojure-csv.core :as csv]))

(defn build-url [sym]
  (str "http://ichart.finance.yahoo.com/table.csv?s=" sym "&ignore=.csv"))


(def db-host "localhost")
(def db-port 8889)
(def db-name "quote")
(def db-user "tester")
(def db-pwd "password")

(def database-dev-settings
  {
   :classname "com.mysql.jdbc.Driver" ; must be in classpath
   :subprotocol "mysql"
   :subname (str "//" db-host ":" db-port "/" db-name)
   :user db-user
   :password db-pwd
  }
)

(defn parse-row [row]
  (let [v (first (csv/parse-csv row))]
    (zipmap [:date :open :high :low :close :vol :adjclose] v)))

(defn valid-data [str]
  (Character/isDigit (first str)))

(defn insert-quote
  "Insert or update a quote passed in as a map"
  [sym q]
    (sql/with-connection database-dev-settings
      (sql/update-or-insert-values
       :quote
       ["symbol=? and date=?" sym (:date q)]
       {:symbol sym :date (:date q) :open (:open q) :high (:high q) :low (:low q) :close (:close q) :vol (:vol q) :adjclose (:adjclose q)})))


(defn load-historical-quotes [sym]
  (let [url (build-url sym)]
    (with-open [rdr (io/reader url)]
      (doseq [line (line-seq rdr)]
        (if (valid-data line)
          (insert-quote sym (parse-row line)))))))

GitHub

See https://github.com/bradlucas/quote-loader for a working version of the example presented here. As a bonus the uploaded project includes support to run as a standalone program from the command line.

Tags: clojure