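Use wget to download a web page and all associated files. The -r flag recurses through links, -np keeps wget from climbing to the parent directory, and -k rewrites links so the copy works locally.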
$ wget -r -np -k http://domain.com/url
If you don't have wget and you are on a Mac, install it with Homebrew.
$ brew install wget
After taking a month off from writing blog posts, I've found something interesting to point out. The other day I came across this "Advent Calendar" idea for programmers: each day you solve a puzzle or two to unlock the day. With twenty-five days to complete, it looks like an interesting activity for the month of December.
If you are curious, a bunch of Clojurians are listing their solution repos here:
gpg: signing failed: Inappropriate ioctl for device
$ GPG_TTY=$(tty)
$ export GPG_TTY
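The two commands above only last for the current shell session. To make the fix stick, something like the following could go in a shell startup file. This is a minimal sketch: the choice of startup file (for example ~/.bashrc or ~/.zshrc) is an assumption, and the `tty -s` guard is my addition so that non-interactive shells skip the assignment.

```shell
# Only set GPG_TTY when standard input is attached to a terminal.
# `tty -s` prints nothing and simply exits non-zero when it is not.
if tty -s; then
  GPG_TTY=$(tty)
  export GPG_TTY
fi
```

Open a new terminal (or source the startup file) for the change to take effect.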
Continuing with the Ads.txt crawler has led to the idea of storing the crawler results in a database and making them available from a web site. This post introduces a first pass at such a site, with the source code available in the following repository:
As a quick review, the Ads.txt standard lets publishers host a simple text file naming the ad networks authorized to sell the publisher's inventory. There is a reference Python crawler for such files, and I've built a crawler in Clojure as an alternative. See this link for a series of posts about the Ads.txt specification and the development of the crawler. The crawler project is here.
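To make the file format concrete, here is a minimal shell sketch of reading one. It is not the Clojure crawler or the reference Python one, just an illustration. It assumes each data record is a comma-separated line of ad system domain, publisher account ID, relationship (DIRECT or RESELLER), and an optional certification authority ID, with `#` starting a comment. The function name `parse_ads_txt` is hypothetical.

```shell
# Hypothetical helper: read ads.txt content on stdin and print one
# tab-separated record (domain, account ID, relationship) per data line.
parse_ads_txt() {
  awk -F',' '
    { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
    NF < 3 { next }       # skip blanks and variable lines such as contact=
    {
      # trim whitespace (and stray carriage returns) around each field
      for (i = 1; i <= NF; i++) gsub(/^[ \t\r]+|[ \t\r]+$/, "", $i)
      rel = toupper($3)
      if (rel != "DIRECT" && rel != "RESELLER") next
      print tolower($1) "\t" $2 "\t" rel
    }
  '
}
```

With a file already on disk, usage would look like `parse_ads_txt < ads.txt`. The optional fourth field is ignored here to keep the sketch short.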
Running a free Heroku app that falls asleep? You can use a service to ping it periodically and keep it awake.
Here is one: