Updated Clojure Ads Txt Crawler

October 1, 2017

Work continues on the Ads.txt crawler with a focus on error handling and reporting. A new version (0.0.2) is now available, and it works well against the 'Top 100 domains' file.

One particular issue was found with http-kit. Its default client seems to have difficulty with sites that require SNI (Server Name Indication) during the TLS handshake. When such a site is encountered you need an SNI-aware client.

In my top 100 list, the URL http://elpais.com/ads.txt causes the following error:

Error: javax.net.ssl.SSLException: Received fatal alert: handshake_failure for https://elpais.com/

This can be overcome by configuring http-kit with a client whose SSL engine sends the server name:

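(require '[org.httpkit.client :as http])
(import '(java.net URI)
        '(javax.net.ssl SNIHostName SSLEngine SSLParameters))

;; Set the SNI server name on the SSL engine from the host of the request URI.
(defn sni-configure
  [^SSLEngine ssl-engine ^URI uri]
  (let [^SSLParameters ssl-params (.getSSLParameters ssl-engine)]
    (.setServerNames ssl-params [(SNIHostName. (.getHost uri))])
    (.setSSLParameters ssl-engine ssl-params)))

;; A client that applies sni-configure to every SSL connection it opens.
(def client (http/make-client {:ssl-configurer sni-configure}))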

To use this you pass the client in as one of the options like:

@(http/get url {:client client})

See https://github.com/bradlucas/ads-txt-crawler/blob/release/0.0.2/src/adstxtcrawler/httpkit.clj for a specific example. Note that I've left my old routine in for now to show what was in use in the previous release. Also, note that I've decided to use the SNI configuration only when there is an error.
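
The "only when there is an error" approach can be sketched roughly as follows. This is a minimal illustration and not the code in the repo: the names fetch-with-sni-fallback, plain-fetch and sni-fetch are hypothetical, and it relies only on the fact that an http-kit response map carries an :error key when the request fails.

```clojure
;; Sketch only: fetch-with-sni-fallback, plain-fetch and sni-fetch are
;; hypothetical names used for illustration, not taken from the project.
(defn fetch-with-sni-fallback
  "Calls plain-fetch on url. If the response map contains :error,
  retries once with sni-fetch and returns that response instead."
  [plain-fetch sni-fetch url]
  (let [response (plain-fetch url)]
    (if (:error response)
      (sni-fetch url)
      response)))

;; Possible wiring to http-kit, assuming `http` aliases org.httpkit.client
;; and `client` is the SNI-aware client defined earlier in this post.
(comment
  (fetch-with-sni-fallback
    (fn [url] @(http/get url))
    (fn [url] @(http/get url {:client client}))
    "https://elpais.com/ads.txt"))
```

Passing the two fetch functions in as arguments keeps the retry logic separate from http-kit, so it can be exercised with stub functions and no network access. Note that this sketch retries on any error, not just handshake failures.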

Project Repo


Tags: adtech clojure ads.txt