Saving Literatuurplein.nl to the Wayback Machine

Latest update: 22-04-2026

Screenshot of homepage of Literatuurplein.nl, 04-12-2019

Wayback Machine screenshots

Homepage	Thea Beckman	Literaire prijzen

Screenshot of homepage, as archived in Wayback Machine on 25-11-2019	Screenshot of Author detail page about Thea Beckman, as archived in Wayback Machine on 29-11-2019	Screenshot of Overview of literary prizes, as archived in Wayback Machine on 29-11-2019

Canon van de Nederlandse geschiedenis	Archief Nieuwsberichten	Recensies

Screenshot of Canon van de Nederlandse geschiedenis, as archived in Wayback Machine on 28-11-2019	Screenshot of Archief Nieuwsberichten, as archived in Wayback Machine on 14-12-2019	Screenshot of Recensies page, as archived in Wayback Machine on 30-11-2019

About

The site www.literatuurplein.nl was phased out as of 16 December 2019.

To preserve its content, e.g. for sourcing Wikipedia articles or for (Wiki)data purposes, the KB archived its most relevant pages by submitting copies to the Wayback Machine (WBM) of the Internet Archive during November and December 2019.

The results of this archiving effort are listed in the .xlsx and .tsv files that can be found using the table below.

Data overview

Each category contains a README with statistics about the data and download links to the TSV and Excel files.

Category	Description	Total URLs
personen	Persons - mainly authors from the Netherlands, but also from abroad	31.002
boeken	Descriptions (metadata) of books. No explicit titles or authors provided	16.677
nieuws	Literary news archive	4.793
prijzen	Literary awards in the Netherlands and Flanders	4.622
adressenbank	Names and addresses of literary organisations (publishers, book sellers, libraries, reading clubs etc.)	3.464
canon	Book titles related to the 50 topics in the canon of Dutch history	3.006
recensies	Reviews of literary publications	1.982
wereldkaart	Book titles related to certain locations on the world map	680
excursies	Literary excursions to cities, towns and villages in the Netherlands and abroad	464
trefwoorden	Book titles related to certain keywords	439
interviews	Interviews with Dutch and foreign authors. Includes full-texts	365
evenementen	Events from the literary agenda	247
leestips	Reading tips	64
zoeken	Pages related to simple and advanced search	51
poezie	Profiles of 21 Dutch and Belgian poets	44
genres	Book titles according to literary genre	43
columns	Literary columns	36
themas	Pages related to certain themes	18
overige	Pages like Sitemap, Contact, Disclaimer, Colophon etc.	16

The data

Every Excel file contains 4 standard columns:
- LiteratuurpleinURL : URL of the page on literatuurplein.nl. As the site has been phased out, these URLs are no longer accessible.
- LiteratuurpleinArchiefURL : WBM URL of the archived page, starting with http://web.archive.org/web/
- ArchiefURLStatusCheck-datestamp : HTTP response status code of the WBM page, indicating if that page could be requested without issues at the given datestamp. All pages should have Status 200 = OK.
- Klik : Clicking on this will open the archived page in a browser.
Additionally, some Excel files contain extra columns, including unique IDs, page titles, person names, places or dates.
For every .xlsx there is a .tsv (tab separated value) in plain text Unicode UTF-8. This can be readily imported/exported to other data formats.
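As an illustration of working with the .tsv files, the sketch below reads rows with Python's standard `csv` module and splits an archived URL into its capture timestamp and the original URL. It assumes the archived URLs carry a 14-digit timestamp directly after `/web/`; the function names and the sample row are made up for this example and are not taken from the real data files.

```python
import csv
import io
import re

# Assumed shape: http(s)://web.archive.org/web/<14-digit timestamp>/<original URL>
WBM_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wbm_url(wbm_url):
    """Return (timestamp, original_url), or None if the URL has another shape."""
    match = WBM_PATTERN.match(wbm_url.strip())
    return (match.group(1), match.group(2)) if match else None


def read_rows(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# Made-up sample with only the two URL columns (the timestamp is invented)
SAMPLE = (
    "LiteratuurpleinURL\tLiteratuurpleinArchiefURL\n"
    "https://www.literatuurplein.nl/organisatie.jsp?orgId=55\t"
    "http://web.archive.org/web/20191203120000/"
    "https://www.literatuurplein.nl/organisatie.jsp?orgId=55\n"
)

if __name__ == "__main__":
    for row in read_rows(SAMPLE):
        print(split_wbm_url(row["LiteratuurpleinArchiefURL"]))
```

For a real file, the text can be read with `open(path, encoding="utf-8").read()` and passed to `read_rows`.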
One page can be available under multiple URLs. For example, if you look into literatuurplein-adressenbank_03122019.tsv you see three lines for “55 Ambo/Anthos uitgevers, Herengracht 499, Amsterdam Noord-Holland”, as this page was available under 3 distinct URLs:
- https://literatuurplein.nl/detail/organisatie/ambo-anthos-uitgevers/55
- https://www.literatuurplein.nl/detail/organisatie/ambo-anthos-uitgevers/55
- https://www.literatuurplein.nl/organisatie.jsp?orgId=55
Because I archived URLs, not pages, this also means that this page has been archived under three distinct WBM URLs.
No overall file list is provided. If you need one, you will have to compose it yourself from the individual .xlsx/.tsv files.
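One possible way to compose such an overall list is sketched below. It assumes a local copy of the `data/` folder in which every .tsv has the two URL columns described above; the function names, the `SourceFile` column and the output path are choices made for this example only.

```python
import csv
from pathlib import Path

COLUMNS = ["LiteratuurpleinURL", "LiteratuurpleinArchiefURL"]


def collect_rows(data_dir):
    """Return [file name, page URL, archived URL] for each row of each .tsv."""
    rows = []
    for tsv_path in sorted(Path(data_dir).rglob("*.tsv")):
        # utf-8-sig also accepts files that start with a byte order mark
        with tsv_path.open(encoding="utf-8-sig", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                rows.append([tsv_path.name] + [row.get(col) or "" for col in COLUMNS])
    return rows


def write_overall_list(data_dir, out_path):
    """Write one combined TSV and return the number of data rows written."""
    # Read everything first, so the output file is never picked up as input
    rows = collect_rows(data_dir)
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["SourceFile"] + COLUMNS)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    print(write_overall_list("data", "literatuurplein-all.tsv"))
```

Because one page can appear under several URLs and in several files, the combined list will contain the same page more than once unless it is deduplicated afterwards.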

Short description per file

For readability, the following parts are omitted from the filenames below:

- the prefix literatuurplein-
- the suffix (_03122019), the datestamp when the file was created
- the file extension (.xlsx / .tsv)

The number after the filename is the number of URLs captured (= number of rows in the Excel file minus 1).
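The naming convention can be reversed with a small helper, sketched below. It assumes every file follows the pattern literatuurplein-NAME_DDMMYYYY with an .xlsx or .tsv extension; the regular expression and the function name are illustrative, and the second filename in the example is inferred from the convention rather than copied from the repository.

```python
import re

# Assumed pattern: literatuurplein-<name>_<8-digit datestamp>.<xlsx|tsv>
FILENAME_PATTERN = re.compile(
    r"^literatuurplein-(?P<name>.+)_(?P<date>\d{8})\.(?:xlsx|tsv)$"
)


def short_name(filename):
    """Strip prefix, datestamp and extension; return the input if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.group("name") if match else filename


if __name__ == "__main__":
    print(short_name("literatuurplein-adressenbank_03122019.tsv"))
    print(short_name("literatuurplein-personen-namen-datums-plaatsen_03122019.xlsx"))
```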

Persons (`data/personen/`)

personen-allen (19.404) : Persons - mainly authors from the Netherlands, but also from abroad. Without dates of birth & death and places of birth & death. Persons can occur more than once (as I archived URLs, not pages)
personen-namen-datums-plaatsen (11.598) : Subset of personen-allen containing only named persons. Persons occur only once. Additionally in many cases the dates of birth & death and places of birth & death are listed. The plan is to merge all these persons into Wikidata in the near future.

Literary prizes (`data/prijzen/`)

prijzen (243) : Literary awards in the Netherlands and Flanders. Individual editions of these awards are listed in prijzen-edities.
prijzen-edities (2.347) : Editions of literary awards in the Netherlands and Flanders.
prijzen-totaal (2.032) : Combined deduplicated listing of both awards and editions.

Books (`data/boeken/`, `data/canon/`, `data/wereldkaart/`, `data/trefwoorden/`, `data/genres/`)

boeken (16.677) : Descriptions (metadata) of books. No explicit titles or authors provided.
canon (3.006) : Book titles related to the 50 topics in the canon of Dutch history.
wereldkaart (680) : Book titles related to certain locations on the world map.
trefwoorden (439) : Book titles related to certain keywords.
genres (43) : Book titles according to literary genre.

Data sources

The data to make the above files was obtained from 3 sources:

1) Most relevant subsites of www.literatuurplein.nl : Page URLs and page content under the menu items Nieuws - Columns - Interviews - Literaire prijzen - Recensies - Canon - Excursies - Poezie - Literaire adressen, obtained via webscraping.

2) Most visited pages : URLs of pages that were requested 30 or more times over the last 5 years, obtained via Google Analytics.

3) Persons data : A data dump from the Literatuurplein CMS, containing the names, dates of birth & death and places of birth & death of 10.027 persons (mainly authors).

Steps taken

1) For webscraping source 1 I used the Chrome plugin of Webscraper.io. With this tool you can specify which page URLs and HTML elements (title, headers, bullet lists etc.) you want to extract from a website. The result can be downloaded as a csv file for further processing in Excel.

2) To get the URLs of the most visited pages (source 2), I used Google Analytics. This yielded 32K URLs, out of a total of 964K URLs that were requested in that time period (an extremely long-tailed distribution).

3) In the data dump (source 3) I transformed the ID in column 1 (e.g. 161934) into a Literatuurplein URL (https://www.literatuurplein.nl/persdetail?persId=161934). This data dump ended up in personen-allen and personen-namen-datums-plaatsen.
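The ID-to-URL transformation in step 3 amounts to filling in a URL template. A minimal sketch, assuming every ID in the dump is a plain number; the function and constant names are made up for this example, and the original transformation may well have been done in Excel.

```python
PERSON_URL_TEMPLATE = "https://www.literatuurplein.nl/persdetail?persId={pers_id}"


def person_url(pers_id):
    """Build the person detail URL for a numeric CMS ID."""
    text = str(pers_id).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a numeric person ID: {pers_id!r}")
    return PERSON_URL_TEMPLATE.format(pers_id=text)


if __name__ == "__main__":
    print(person_url(161934))
```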

4) I combined these three lists of URLs into a single list and did some deduplication (using Excel) to avoid any overlap, as the three sources are not necessarily disjoint.
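The deduplication in step 4 was done in Excel; an equivalent in Python could look like the sketch below. It treats URLs as exact strings, so the www and non-www variants of the same page stay separate entries, in line with archiving URLs rather than pages. The function name and sample lists are hypothetical.

```python
def merge_unique(*url_lists):
    """Merge several URL lists, keeping the first occurrence of each exact URL."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    scraped = ["https://www.literatuurplein.nl/organisatie.jsp?orgId=55"]
    most_visited = [
        "https://www.literatuurplein.nl/organisatie.jsp?orgId=55",
        "https://literatuurplein.nl/detail/organisatie/ambo-anthos-uitgevers/55",
    ]
    print(len(merge_unique(scraped, most_visited)))
```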

5) Using a url-status-checker script I checked if all the Literatuurplein URLs actually worked (= status 200). This took many hours. I deleted the URLs giving 404s or other errors.
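The url-status-checker script itself is not reproduced here. The sketch below shows one way such a check could work using only the Python standard library; the function names, the user agent string and the choice of HEAD requests are assumptions for this example. Note that `urlopen` follows redirects, so the status reported is that of the final response.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=30):
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "url-status-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 or 500
    except OSError:
        return None  # DNS failure, timeout, refused connection, ...


def split_by_status(urls, get_status=http_status):
    """Split urls into (ok, failed) lists of (url, status); ok means status 200."""
    ok, failed = [], []
    for url in urls:
        status = get_status(url)
        (ok if status == 200 else failed).append((url, status))
    return ok, failed
```

The `get_status` parameter lets the splitting logic be tried out with a stub, without sending any requests.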

6) With the preparations done, I archived all URLs to the Wayback Machine by running a Python script that uses waybackpy. This process was not 100% successful: some URLs could not be captured correctly by the WBM and were thus omitted from further processing. See ../../scripts/wbm-archiver/ for current archiving scripts.
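The archiving loop in step 6 might be sketched as follows. This is not the script that was actually used: it assumes waybackpy 3.x, whose `WaybackMachineSaveAPI(url, user_agent).save()` call returns the archived URL, and the user agent, pause length and function names are hypothetical.

```python
import time

USER_AGENT = "literatuurplein-archiver-sketch"  # hypothetical user agent


def save_with_waybackpy(url):
    """Submit url to the Wayback Machine; assumes waybackpy 3.x is installed."""
    from waybackpy import WaybackMachineSaveAPI  # third-party package

    return WaybackMachineSaveAPI(url, USER_AGENT).save()


def archive_urls(urls, save=save_with_waybackpy, pause_seconds=5.0):
    """Archive each URL; return ({url: archived_url}, [(url, error), ...])."""
    archived, failed = {}, []
    for url in urls:
        try:
            archived[url] = save(url)
        except Exception as error:  # a failed capture should not stop the run
            failed.append((url, repr(error)))
        time.sleep(pause_seconds)  # be polite to the archive between requests
    return archived, failed
```

URLs that end up in the `failed` list correspond to the captures that were omitted from further processing.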

7) To make sure all generated WBM URLs actually work, I again ran the url-status-checker script, but now with the archived URLs as input. Once again this took many hours. I deleted the URLs giving 404s or other errors.

8) For a better overview I split the URL list into 22 Excel files, according to the file listing above.

9) I converted all Excels into open .tsv (tab separated value) files in plain text Unicode UTF-8. These can be readily imported/exported to other data formats.

Folder structure

Literatuurplein/
├── index.md                     # This page
├── README.md                    # Mirror of index.md
├── images/                      # Screenshots of the website in the WBM
└── data/                        # Archived URL data files organized by category
    ├── adressenbank/            # Literary organisations (3.464 URLs)
    ├── boeken/                  # Book metadata (16.677 URLs)
    ├── canon/                   # Canon of Dutch history books (3.006 URLs)
    ├── columns/                 # Literary columns (36 URLs)
    ├── evenementen/             # Literary events (247 URLs)
    ├── excursies/               # Literary excursions (464 URLs)
    ├── genres/                  # Books by genre (43 URLs)
    ├── interviews/              # Author interviews (365 URLs)
    ├── leestips/                # Reading tips (64 URLs)
    ├── nieuws/                  # Literary news (4.793 URLs)
    ├── overige/                 # Misc pages (16 URLs)
    ├── personen/                # Person/author data (31.002 URLs)
    ├── poezie/                  # Poet profiles (44 URLs)
    ├── prijzen/                 # Literary awards (4.622 URLs)
    ├── recensies/               # Book reviews (1.982 URLs)
    ├── themas/                  # Themed pages (18 URLs)
    ├── trefwoorden/             # Books by keyword (439 URLs)
    ├── wereldkaart/             # Books by world location (680 URLs)
    └── zoeken/                  # Search pages (51 URLs)

Related projects

Wikidata:WikiProject Dutch Literary Awards - Uses data from this archive
KB GLAM Wikidata projects - Broader context
