
Saving Literatuurplein.nl to the Wayback Machine

Latest update: 22-04-2026

Screenshot of homepage of Literatuurplein.nl, 04-12-2019

Wayback Machine screenshots

Homepage (screenshot as archived in Wayback Machine on 25-11-2019)
Thea Beckman (screenshot of author detail page, as archived in Wayback Machine on 29-11-2019)
Literaire prijzen (screenshot of overview of literary prizes, as archived in Wayback Machine on 29-11-2019)
Canon van de Nederlandse geschiedenis (screenshot as archived in Wayback Machine on 28-11-2019)
Archief Nieuwsberichten (screenshot as archived in Wayback Machine on 14-12-2019)
Recensies (screenshot of Recensies page, as archived in Wayback Machine on 30-11-2019)

About

The site www.literatuurplein.nl was phased out on 16 December 2019.

To preserve its content, for example for sourcing Wikipedia articles or for (Wiki)data purposes, the KB archived its most relevant pages in the Wayback Machine (WBM) of the Internet Archive during November and December 2019.

The results of this archiving effort are listed in the .xlsx and .tsv files that can be found using the table below.

Data overview

Each category contains a README with statistics about the data and download links to the TSV and Excel files.

| Category | Description | Total URLs |
|---|---|---|
| personen | Persons - mainly authors from the Netherlands, but also from abroad | 31.002 |
| boeken | Descriptions (metadata) of books. No explicit titles or authors provided | 16.677 |
| nieuws | Literary news archive | 4.793 |
| prijzen | Literary awards in the Netherlands and Flanders | 4.622 |
| adressenbank | Names and addresses of literary organisations (publishers, book sellers, libraries, reading clubs etc.) | 3.464 |
| canon | Book titles related to the 50 topics in the canon of Dutch history | 3.006 |
| recensies | Reviews of literary publications | 1.982 |
| wereldkaart | Book titles related to certain locations on the world map | 680 |
| excursies | Literary excursions to cities, towns and villages in the Netherlands and abroad | 464 |
| trefwoorden | Book titles related to certain keywords | 439 |
| interviews | Interviews with Dutch and foreign authors. Includes full-texts | 365 |
| evenementen | Events from the literary agenda | 247 |
| leestips | Reading tips | 64 |
| zoeken | Pages related to simple and advanced search | 51 |
| poezie | Profiles of 21 Dutch and Belgian poets | 44 |
| genres | Book titles according to literary genre | 43 |
| columns | Literary columns | 36 |
| themas | Pages related to certain themes | 18 |
| overige | Pages like Sitemap, Contact, Disclaimer, Colophon etc. | 16 |

The data

Short description per file

For readability, the following parts are omitted from the filenames below:

  1. the prefix literatuurplein-
  2. the suffix (_03122019), the datestamp of when the file was created
  3. the file extension (.xlsx / .tsv)

The number after each filename is the number of URLs captured (= the number of rows in the Excel file minus 1).
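The naming and counting conventions above can be sketched in Python. This is only an illustration: the full filename used in the usage note is composed from the rules described here, and the assumption that the "minus 1" accounts for a single header row is mine, not something the files have been checked against.

```python
import csv
import io
import re

# Pattern assumed from the description: literatuurplein-<name>_<8-digit datestamp>.<ext>
_FILENAME = re.compile(r"^literatuurplein-(?P<name>.+)_\d{8}\.(?:xlsx|tsv)$")


def short_name(filename: str) -> str:
    """Strip the prefix, the datestamp suffix and the extension from a filename."""
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return match.group("name")


def count_urls(tsv_text: str) -> int:
    """Count captured URLs: all rows minus one (assumed to be the header row)."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return max(len(rows) - 1, 0)
```

For example, `short_name("literatuurplein-personen-allen_03122019.xlsx")` gives `personen-allen`, the short form used in the listing.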

Persons (data/personen/)

Literary prizes (data/prijzen/)

Books (data/boeken/, data/canon/, data/wereldkaart/, data/trefwoorden/, data/genres/)

Other content

More Literatuurplein URLs than are listed here are likely to be available in the WBM. Apart from the active archiving effort I conducted, the WBM crawler visited the site throughout its lifetime and archived pages for many years (passive archiving).

Data sources

The data to make the above files was obtained from 3 sources:

1) Most relevant subsites of www.literatuurplein.nl: Page URLs and page content under the menu items Nieuws - Columns - Interviews - Literaire prijzen - Recensies - Canon - Excursies - Poezie - Literaire adressen, obtained via webscraping.

2) Most visited pages: URLs of pages that were requested 30 or more times over the last 5 years, obtained via Google Analytics.

3) Persons data: A data dump from the Literatuurplein CMS, containing the names, dates of birth & death and places of birth & death of 10.027 persons (mainly authors).

Steps taken

1) For webscraping source 1 I used the Chrome plugin of Webscraper.io. With this tool you can specify which page URLs and HTML elements (title, headers, bullet lists etc.) you want to extract from a website. The result can be downloaded as a CSV file for further processing in Excel.

2) To get the URLs of the most visited pages (source 2), I used Google Analytics. This yielded 32K URLs, out of a total of 964K URLs that were requested in that time period (an extreme long-tail distribution).
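The selection in step 2 was done in Google Analytics itself. As a rough Python equivalent, assuming the request counts have been exported into a mapping from URL to number of requests (a hypothetical structure, not the actual export format), the threshold of 30 could be applied like this:

```python
def most_visited(pageviews: dict[str, int], threshold: int = 30) -> list[str]:
    """Return the URLs requested at least `threshold` times, most requested first."""
    kept = [(url, count) for url, count in pageviews.items() if count >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in kept]
```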

3) In the data dump (source 3) I transformed the ID in column 1 (e.g. 161934) into a Literatuurplein URL (https://www.literatuurplein.nl/persdetail?persId=161934). This data dump ended up in personen-allen and personen-namen-datums-plaatsen.
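The ID-to-URL transformation in step 3 is a simple string substitution. A minimal sketch, using the URL pattern from the example above; the function name and the digits-only check are my own additions:

```python
PERSON_URL = "https://www.literatuurplein.nl/persdetail?persId={pers_id}"


def person_url(pers_id) -> str:
    """Build the person detail URL for an ID taken from column 1 of the dump."""
    pers_id = str(pers_id).strip()
    if not pers_id.isdigit():
        # Assumption: IDs in the dump are purely numeric, as in the example.
        raise ValueError(f"not a numeric person ID: {pers_id!r}")
    return PERSON_URL.format(pers_id=pers_id)
```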

4) I combined these three lists of URLs into a single list and deduplicated it (using Excel) to avoid any overlap, as the three sources are not necessarily disjoint.
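The deduplication itself was done in Excel. For readers who prefer a script, a Python sketch of the same idea is given below. It only removes exact duplicates after trimming whitespace; whether the Excel step also normalised URLs (for example http vs. https) is not known.

```python
from itertools import chain


def merge_unique(*url_lists):
    """Combine several URL lists into one, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for url in chain.from_iterable(url_lists):
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            merged.append(url)
    return merged
```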

5) Using a url-status-checker script I checked if all the Literatuurplein URLs actually worked (= status 200). This took many hours. I deleted the URLs giving 404s or other errors.
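The url-status-checker script is not reproduced here. As an illustration of what such a check involves, the following sketch uses only the Python standard library; the function names are hypothetical and the real script may work differently (for instance with HEAD requests or parallel workers).

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 30.0) -> int:
    """Return the final HTTP status code for `url`, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (OSError, ValueError):
        # Connection problems, timeouts or malformed URLs.
        return 0


def keep_working(urls, check=http_status):
    """Keep only the URLs for which `check` reports status 200."""
    return [url for url in urls if check(url) == 200]
```

Note that `urlopen` follows redirects, so a URL that redirects to a working page is counted as working. Checking the URLs one by one in this way is slow for tens of thousands of URLs, which fits the "many hours" mentioned above.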

6) Once all the preparations were done, it was time to actually archive all URLs to the Wayback Machine. For that I ran a Python script using waybackpy. This process was not 100% successful: some URLs could not be captured correctly by the WBM and were thus omitted from further processing. See ../../scripts/wbm-archiver/ for current archiving scripts.
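The actual archiving scripts live in the folder referenced above. The sketch below only illustrates the general shape of such a loop. It assumes the `WaybackMachineSaveAPI` class of waybackpy 3.x (the script run in 2019 probably used an older waybackpy interface), and the user agent string and the delay between requests are hypothetical values.

```python
import time

USER_AGENT = "literatuurplein-archiver-sketch/0.1"  # hypothetical user agent


def save_with_waybackpy(url: str) -> str:
    """Submit one URL to the Wayback Machine and return the archived URL."""
    # Imported lazily so the rest of the sketch runs without waybackpy installed.
    from waybackpy import WaybackMachineSaveAPI

    return WaybackMachineSaveAPI(url, USER_AGENT).save()


def archive_all(urls, save=save_with_waybackpy, delay_seconds=5.0):
    """Archive each URL; return (mapping of original to archived URL, failed URLs)."""
    archived = {}
    failed = []
    for url in urls:
        try:
            archived[url] = save(url)
        except Exception:
            # Captures that fail are recorded and left out of further processing.
            failed.append(url)
        time.sleep(delay_seconds)  # be polite to the Save Page Now service
    return archived, failed
```

Passing the `save` function as a parameter makes it possible to try the loop with a stub before sending real requests.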

7) To make sure all generated WBM URLs actually work, I again ran the url-status-checker script, but now with the archived URLs as input. Once again this took many hours. I deleted the URLs giving 404s or other errors.

8) For a better overview I split the URL list into 22 Excel files, according to the file listing above.

9) I converted all Excel files into open .tsv (tab-separated values) files in plain-text Unicode (UTF-8). These can be readily imported into and exported to other data formats.
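How the conversion in step 9 was carried out is not documented here. One possible way to script it is sketched below, assuming the third-party openpyxl package for reading the Excel files and a single worksheet per file; both are assumptions, and the function names are hypothetical.

```python
import csv


def read_xlsx_rows(xlsx_path):
    """Read all rows of the active worksheet as lists of cell values."""
    # Third-party dependency, imported lazily: pip install openpyxl
    from openpyxl import load_workbook

    workbook = load_workbook(xlsx_path, read_only=True)
    try:
        return [list(row) for row in workbook.active.iter_rows(values_only=True)]
    finally:
        workbook.close()


def write_tsv(rows, tsv_path):
    """Write rows to a tab-separated UTF-8 text file; empty cells become empty strings."""
    with open(tsv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(["" if cell is None else cell for cell in row])
```

A file would then be converted with `write_tsv(read_xlsx_rows(xlsx_path), tsv_path)`.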

Folder structure

Literatuurplein/
├── index.md                     # This page
├── README.md                    # Mirror of index.md
├── images/                      # Screenshots of the website in the WBM
└── data/                        # Archived URL data files organized by category
    ├── adressenbank/            # Literary organisations (3.464 URLs)
    ├── boeken/                  # Book metadata (16.677 URLs)
    ├── canon/                   # Canon of Dutch history books (3.006 URLs)
    ├── columns/                 # Literary columns (36 URLs)
    ├── evenementen/             # Literary events (247 URLs)
    ├── excursies/               # Literary excursions (464 URLs)
    ├── genres/                  # Books by genre (43 URLs)
    ├── interviews/              # Author interviews (365 URLs)
    ├── leestips/                # Reading tips (64 URLs)
    ├── nieuws/                  # Literary news (4.793 URLs)
    ├── overige/                 # Misc pages (16 URLs)
    ├── personen/                # Person/author data (31.002 URLs)
    ├── poezie/                  # Poet profiles (44 URLs)
    ├── prijzen/                 # Literary awards (4.622 URLs)
    ├── recensies/               # Book reviews (1.982 URLs)
    ├── themas/                  # Themed pages (18 URLs)
    ├── trefwoorden/             # Books by keyword (439 URLs)
    ├── wereldkaart/             # Books by world location (680 URLs)
    └── zoeken/                  # Search pages (51 URLs)