Brad Lucas

Programming, Clojure and other interests
October 5, 2017

Clojure Ads Txt Crawler Saving To A SQLite Database

SQLite is a small self contained SQL database engine that reads and writes to a normal file. It is a good first pass for adding database storage to your application. It's easy to use and with a single file to deal with offers no real issues. Being a SQL database though means you won't be far off if you decide to up your database requirements to Postgres or some other database system.

My recent project of building a Clojure Ads.txt file crawler is a good target for SQLite support for a couple reasons. One is that the project is simple and works just fine writing to result files but having a database would be good in the long term especially as the project grows. Second is that the original Python Ads.txt file crawler defaulted to writing to a SQLite database. I figure the Clojure project should at the very least be able to do the same.

The results of adding SQLite are in the repository if you are curious.

Strategy

  1. Model the data and create a database table schema
  2. Add the required libraries for SQLite to our project
  3. Support optionally writing to the database
  4. Support the writing to the database

Model the data

Since we'll be storing the output from the crawler we'll need to create a table that will support this data. We can start with the sql over in the https://github.com/InteractiveAdvertisingBureau/adstxtcrawler project and tweak it a bit.

BEGIN TRANSACTION;
DROP TABLE IF EXISTS adstxt;

CREATE TABLE adstxt(
       SITE_DOMAIN                  TEXT    NOT NULL,
       EXCHANGE_DOMAIN              TEXT    NOT NULL,
       SELLER_ACCOUNT_ID            TEXT    NOT NULL,
       ACCOUNT_TYPE                 TEXT    NOT NULL,
       TAG_ID                       TEXT    NOT NULL,
       ENTRY_COMMENT                TEXT    NOT NULL,
       UPDATED                      DATE    DEFAULT (datetime('now','localtime')),
    PRIMARY KEY (SITE_DOMAIN,EXCHANGE_DOMAIN,SELLER_ACCOUNT_ID,ACCOUNT_TYPE,TAG_ID)
);

END TRANSACTION;

To use this to create a database:

$ sqlite3 database.db < ./sql/create.sql

Note that there is a bit of a mismatch between the names I have used in the code. This will change in a subsequent version as I move away from the https://github.com/InteractiveAdvertisingBureau/adstxtcrawler version as a model.

Add the required libraries

There are two libraries you'll need to add to the project. The first is the jdbc library and the second is the SQLite driver library. The links for each of these are as follows.

Note though that the sqlite-jdbc library doesn't give you the hint on how to use it with Leiningen. This when first encountered is confusing. The secret to remember is to go look in Maven. In this case there is a button called "maven cntral 3.20.0" which you can click. It goes to http://search.maven.org/#artifactdetails%7Corg.xerial%7Csqlite-jdbc%7C3.20.0%7C and you can click on the word Leiningen under Dependency Information. In other cases you'll need to search the MVNRepository yourself. Here is a link https://mvnrepository.com/. Remember the project name and the latest version when searching.

Add the following to the dependencies map of the project.clj to add the libraries

                 [org.clojure/java.jdbc "0.7.2"]
                 [org.xerial/sqlite-jdbc "3.20.0"]

Support optional writing to the database

See the src/adstxtcrawler/crawl.clj file for the crawl-domain function. If you pass it a dbname you end up in the src/adstxtcrawler/database.clj file's save routine otherwise you'll use the previous print routine.

Writing to the database

The src/adstxtcrawler/database.clj file contains the routines for writing to the database. The key points to show include the connection specification and the actual inssert.

To accept a parameter to build a connection spec there is a routine called db-spec.

(defn db-spec [dbname]
  {:classname   "org.sqlite.JDBC"
   :subprotocol "sqlite"
   :subname     dbname
   })

The actual insert is done in save. It first unpacks the data by destructuring the data map. Then the jdbc/insert! does the insert. Note that duplicates will throw and exception which is ignored. In practice Ads.txt files can contain duplicates which for out purposes we can ignore.

(defn save [domain data dbname]
  (let [db-spec (db-spec dbname)]
    (doseq [m data]
      (let [{:keys [exchange-domain account-id account-type tag-id comment]} m]
        ;; some ads.txt files have duplicates
        ;; for these we can ignore the exception causes by trying to insert a duplicate
        (try
          (jdbc/insert! db-spec :adstxt {:site_domain domain
                                       :exchange_domain exchange-domain
                                       :seller_account_id account-id
                                       :account_type account-type
                                       :tag_id tag-id
                                         :entry_comment comment})
          (catch org.sqlite.SQLiteException e
            ;; Uncomment if you want to see the duplicates
            ;; (.println *err* (format "Duplicate: %s,%s,%s,%s,%s" domain exchange-domain account-id account-type tag-id))
            ))))))

Summary

Hopefully, this has been enough to show how easy it is to add SQLite support to a Clojure application. See the [https://github.com/bradlucas/ads-txt-crawler for all of the project's source code. Also, if you are interested in previous posts on this project and the Ads.txt specification feel free to visit list posts list http://blog.bradlucas.com/tags/ads.txt//

Tags: adtech clojure sqlite ads.txt