Brad Lucas

Programming, Clojure and other interests
July 4, 2017

Tokenwatch (Part 2)

Once part 1 of TokenWatch was done, the next step became clear when I looked at the details on each entry's interior page. I was most interested in the links to the whitepapers, so that I could collect them and read through them more easily as a group.

For this part of the project I created another script, tokenwatch_details.py, which extends the previous tokenwatch.py script.

Code

To start, I get the dataframe from the tokenwatch.py script (referred to as t below), sorted by NAME.

df = t.process().sort_values(['NAME'])

Each row in the dataframe has a link to the details page. The gist is to fetch this page and parse out the details I need. Inspecting the page shows that all the tables are classed with table-asset-data, and the last one on the page is the most interesting. The following grabs that table.

    html = requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text
    soup = BeautifulSoup(html, "lxml")
    tables = soup.find_all("table", {"class": "table-asset-data"})
    # Last table
    table = tables[-1]

For convenience I collect the table's data into a dictionary.

    details = {}
    for td in table.find_all('td'):
        key = td.text.strip().split(' ')[0].lower()
        vals = td.find_all('a')
        if vals:
            value = vals[0]['href']
        else:
            value = '-'
        details[key] = value
    return details
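
To make the key and value rules concrete, here is a minimal sketch that applies the same logic to made-up data, without BeautifulSoup. The build_details function, the sample cell text and the example.com URLs are all hypothetical and only for illustration; the real cell contents on the site may differ.

```python
def build_details(cells):
    """Map each cell's first word (lowercased) to its first link, or '-'."""
    details = {}
    for text, hrefs in cells:
        # Same key rule as the loop above: first space-separated word, lowercased.
        key = text.strip().split(' ')[0].lower()
        # Same value rule: first link if the cell has one, otherwise '-'.
        details[key] = hrefs[0] if hrefs else '-'
    return details


# Made-up (text, links) pairs standing in for the <td> elements of the last table.
sample_cells = [
    ("Website", ["https://example.com"]),
    ("  Whitepaper PDF ", ["https://example.com/paper.pdf"]),
    ("Blog", []),
]
print(build_details(sample_cells))
```

In this sample, "Whitepaper PDF" collapses to the key whitepaper because only the first word is kept, and the Blog cell, which has no link, gets the '-' placeholder.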

If available, there will be a link to a whitepaper. Most are links to PDF files, so I defend against errors by checking the content type and the URL before downloading.

# get_dir and download_file are helpers from the complete project
def get_whitepaper(name, details):
    try:
        whitepaper_link = details['whitepaper']
        if whitepaper_link != '-':
            # Only download if the link points to a PDF
            print(whitepaper_link)
            head = requests.head(whitepaper_link, headers={'User-agent': 'Mozilla/5.0'})
            # Some servers don't return the application/pdf type properly,
            # so as a double check look at the URL too
            content_type = head.headers.get('Content-Type', '')
            if content_type == 'application/pdf' or '.pdf' in head.url:
                whitepaper_filename = get_dir(name) + "/" + name + "-whitepaper.pdf"
                download_file(whitepaper_filename, whitepaper_link)
                print(whitepaper_filename)
            else:
                print("Unknown whitepaper type: " + whitepaper_link)
        else:
            print("Unavailable whitepaper for " + name)
    except KeyError:
        # The details dictionary has no 'whitepaper' entry at all
        print("No whitepaper link in dictionary")
    except requests.RequestException as e:
        # Network problems are reported separately from a missing link
        print("Could not fetch whitepaper: " + str(e))
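
The PDF check can also be pulled out into a small function that is easy to try on its own. This is only a sketch under stated assumptions, not the code from the repo: looks_like_pdf is a hypothetical helper written for Python 3, and the example.com URLs are made up. It is slightly stricter than the check above, since it ignores any parameters after the media type and requires the URL path itself to end in .pdf.

```python
from urllib.parse import urlparse


def looks_like_pdf(content_type, url):
    """Return True if the Content-Type header or the URL path suggests a PDF."""
    # Content-Type may carry parameters, e.g. "application/pdf; charset=binary",
    # and may be missing entirely (None).
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        return True
    # Fall back to the URL path, ignoring any query string or fragment.
    path = urlparse(url).path.lower()
    return path.endswith(".pdf")


print(looks_like_pdf("application/pdf; charset=binary", "https://example.com/doc"))
print(looks_like_pdf("text/html", "https://example.com/paper.pdf?dl=1"))
print(looks_like_pdf(None, "https://example.com/about"))
```

The first two calls print True and the third prints False. Keeping the decision in one place like this makes it simple to add further cases later, such as other media types that some servers return for PDF files.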

See the complete project in the GitHub repo listed below.

  • https://github.com/bradlucas/tokenwatch/tree/part2
Tags: ethereum python