Saving Literatuurplein.nl to the Wayback Machine
Latest update: 22-04-2026
Screenshot of homepage of Literatuurplein.nl, 04-12-2019
Wayback Machine screenshots
| Homepage | Thea Beckman | Literaire prijzen |
|---|---|---|
| Screenshot of homepage, as archived in Wayback Machine on 25-11-2019 | Screenshot of author detail page about Thea Beckman, as archived in Wayback Machine on 29-11-2019 | Screenshot of overview of literary prizes, as archived in Wayback Machine on 29-11-2019 |
| Canon van de Nederlandse geschiedenis | Archief Nieuwsberichten | Recensies |
|---|---|---|
| Screenshot of Canon van de Nederlandse geschiedenis, as archived in Wayback Machine on 28-11-2019 | Screenshot of Archief Nieuwsberichten, as archived in Wayback Machine on 14-12-2019 | Screenshot of Recensies page, as archived in Wayback Machine on 30-11-2019 |
About
The site www.literatuurplein.nl was phased out on 16 December 2019.
To preserve its content, e.g. for sourcing Wikipedia articles or for (Wiki)data purposes, the KB archived its most relevant pages in the Wayback Machine (WBM) of the Internet Archive during November and December 2019.
The results of this archiving effort are listed in the .xlsx and .tsv files that can be found using the table below.
Data overview
Each category folder contains a README with statistics about the data and download links to the TSV and Excel files.
| Category | Description | Total URLs |
|---|---|---|
| personen | Persons - mainly authors from the Netherlands, but also from abroad | 31.002 |
| boeken | Descriptions (metadata) of books. No explicit titles or authors provided | 16.677 |
| nieuws | Literary news archive | 4.793 |
| prijzen | Literary awards in the Netherlands and Flanders | 4.622 |
| adressenbank | Names and addresses of literary organisations (publishers, book sellers, libraries, reading clubs etc.) | 3.464 |
| canon | Book titles related to the 50 topics in the canon of Dutch history | 3.006 |
| recensies | Reviews of literary publications | 1.982 |
| wereldkaart | Book titles related to certain locations on the world map | 680 |
| excursies | Literary excursions to cities, towns and villages in the Netherlands and abroad | 464 |
| trefwoorden | Book titles related to certain keywords | 439 |
| interviews | Interviews with Dutch and foreign authors. Includes full-texts | 365 |
| evenementen | Events from the literary agenda | 247 |
| leestips | Reading tips | 64 |
| zoeken | Pages related to simple and advanced search | 51 |
| poezie | Profiles of 21 Dutch and Belgian poets | 44 |
| genres | Book titles according to literary genre | 43 |
| columns | Literary columns | 36 |
| themas | Pages related to certain themes | 18 |
| overige | Pages like Sitemap, Contact, Disclaimer, Colophon etc. | 16 |
The data
- Every Excel file contains 4 standard columns:
  - LiteratuurpleinURL : URL of the page on literatuurplein.nl. As the site has been phased out, these URLs no longer work.
  - LiteratuurpleinArchiefURL : WBM URL of the archived page, starting with http://web.archive.org/web/
  - ArchiefURLStatusCheck-datestamp : HTTP response status code of the WBM page, indicating whether that page could be requested without issues on the given datestamp. All pages should have status 200 (OK).
  - Klik : clicking this opens the archived page in a browser.
- Additionally, some Excel files contain extra columns, such as unique IDs, page titles, person names, places or dates.
- For every .xlsx file there is a .tsv (tab-separated values) file in plain-text Unicode UTF-8, which can readily be imported into or converted to other data formats.
- One page can be available under multiple URLs. For example, if you look into literatuurplein-adressenbank_03122019.tsv you see three lines for “55 Ambo/Anthos uitgevers, Herengracht 499, Amsterdam Noord-Holland”, as this page was available under 3 distinct URLs:
  - https://literatuurplein.nl/detail/organisatie/ambo-anthos-uitgevers/55
  - https://www.literatuurplein.nl/detail/organisatie/ambo-anthos-uitgevers/55
  - https://www.literatuurplein.nl/organisatie.jsp?orgId=55

  Because I archived URLs, not pages, this also means that this page has been archived under three distinct WBM URLs.
- No overall file list is provided; if you need one, you will have to compile it yourself from the individual .xlsx/.tsv files.
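Since no overall file list is provided, the sketch below shows one way to compile one from the per-category .tsv files and to spot rows whose status is not 200. It assumes the data/<category>/*.tsv layout shown under Folder structure, a header row with the column names listed above, and a status column whose header starts with ArchiefURLStatusCheck (the datestamp part may differ per file). The page_key helper is a hypothetical heuristic for grouping URLs that point to the same page, such as the three Ambo/Anthos URLs; apply it within a single category, since IDs of different page types may coincide.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumed prefix of the status column; the datestamp part of its header may vary per file.
STATUS_PREFIX = "ArchiefURLStatusCheck"


def page_key(url: str) -> Optional[str]:
    """Hypothetical heuristic: reduce a Literatuurplein URL to its numeric page ID.

    Recognises a query parameter such as orgId=55 or persId=161934, or a
    trailing path segment such as .../ambo-anthos-uitgevers/55.
    Returns None when neither pattern matches.
    """
    match = re.search(r"[?&]\w*Id=(\d+)", url) or re.search(r"/(\d+)/?$", url)
    return match.group(1) if match else None


def load_overall_list(data_dir: Path) -> List[Dict[str, str]]:
    """Concatenate every data/<category>/*.tsv file into one list of rows.

    Each row gets an extra 'category' key taken from its folder name.
    """
    rows: List[Dict[str, str]] = []
    for tsv_path in sorted(data_dir.glob("*/*.tsv")):
        # utf-8-sig also strips a byte-order mark, should an export contain one.
        with tsv_path.open(encoding="utf-8-sig", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                row["category"] = tsv_path.parent.name
                rows.append(row)
    return rows


def non_ok_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows whose status column holds something other than 200."""
    bad = []
    for row in rows:
        status = next(
            (value for key, value in row.items() if key and key.startswith(STATUS_PREFIX)),
            None,
        )
        if status is not None and status.strip() != "200":
            bad.append(row)
    return bad
```

Run from the Literatuurplein/ folder, load_overall_list(Path("data")) would return all rows, and non_ok_rows() on that result should come back empty if every archived page returned status 200 at check time.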
Short description per file
For readability, the following are omitted from the filenames below:
- the prefix literatuurplein-
- the suffix _03122019, the datestamp on which the file was created
- the file extension (.xlsx / .tsv)
The number after each filename is the number of URLs captured (= the number of rows in the Excel file minus 1 for the header row).
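The naming convention and the row count above can be captured in two small helpers. This is a sketch assuming every filename follows literatuurplein-<name>_<DDMMYYYY>.<xlsx|tsv>, so that 03122019 reads as 3 December 2019, and assuming a single header row in each .tsv file; the function names are hypothetical.

```python
import csv
import re
from datetime import date
from pathlib import Path
from typing import Tuple

# Assumed naming pattern: literatuurplein-<name>_<DDMMYYYY>.<xlsx|tsv>
FILENAME_RE = re.compile(r"^literatuurplein-(?P<name>.+)_(?P<stamp>\d{8})\.(?:xlsx|tsv)$")


def parse_filename(filename: str) -> Tuple[str, date]:
    """Split a data filename into its short name and its creation date."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename: {filename}")
    stamp = match.group("stamp")  # e.g. 03122019 = day 03, month 12, year 2019
    created = date(int(stamp[4:]), int(stamp[2:4]), int(stamp[:2]))
    return match.group("name"), created


def count_urls(tsv_path: Path) -> int:
    """Number of URLs in a .tsv file: its non-empty rows minus the header row."""
    with tsv_path.open(encoding="utf-8-sig", newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    return max(len(rows) - 1, 0)
```

Using csv.reader rather than counting lines keeps the count correct even if a quoted field (a page title, say) contains a line break.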
Persons (data/personen/)
- personen-allen (19.404) : Persons - mainly authors from the Netherlands, but also from abroad. Without dates of birth & death and places of birth & death. Persons can occur more than once (as I archived URLs, not pages)
- personen-namen-datums-plaatsen (11.598) : Subset of personen-allen containing only named persons. Persons occur only once. Additionally in many cases the dates of birth & death and places of birth & death are listed. The plan is to merge all these persons into Wikidata in the near future.
Literary prizes (data/prijzen/)
- prijzen (243) : Literary awards in the Netherlands and Flanders. Individual editions of these awards are listed in prijzen-edities.
- prijzen-edities (2.347) : Editions of literary awards in the Netherlands and Flanders.
- prijzen-totaal (2.032) : Combined deduplicated listing of both awards and editions.
Books (data/boeken/, data/canon/, data/wereldkaart/, data/trefwoorden/, data/genres/)
- boeken (16.677) : Descriptions (metadata) of books. No explicit titles or authors provided.
- canon (3.006) : Book titles related to the 50 topics in the canon of Dutch history.
- wereldkaart (680) : Book titles related to certain locations on the world map.
- trefwoorden (439) : Book titles related to certain keywords.
- genres (43) : Book titles according to literary genre.
Other content
- nieuws (4.793) : Literary news archive.
- adressenbank (3.464) : Names and addresses of literary organisations (publishers, book sellers, libraries, reading clubs etc.). Mainly in the Netherlands, sortable by province. Some in Belgium and elsewhere in Europe.
- recensies (1.982) : Reviews of literary publications.
- excursies (464) : Literary excursions to cities, towns and villages in the Netherlands and abroad.
- interviews (364) : Interviews with Dutch and foreign authors. Includes full texts of the interviews.
- evenementen (247) : Events from the literary agenda.
- leestips (64) : Reading tips.
- poezie (44) : Profiles of 21 Dutch and Belgian poets.
- columns (36) : Literary columns.
- themas (18) : Pages related to certain themes.
- zoeken (51) : Pages related to simple and advanced search.
- overige (16) : Pages like Sitemap, Contact, Disclaimer, Colophon etc.
The WBM very likely holds more Literatuurplein URLs than are listed here. Apart from my active archiving effort, the WBM crawler visited the site throughout its lifetime, archiving pages over many years (passive archiving).
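To see which Literatuurplein captures the WBM holds, including those from passive archiving, you can query the Wayback Machine's CDX server API. The sketch below assumes the public endpoint http://web.archive.org/cdx/search/cdx and its url, matchType, output, fl, filter, collapse and limit parameters. The limit caps the number of rows returned; covering the whole domain would need a larger limit or the API's paging options, which this sketch leaves out. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str = "literatuurplein.nl", limit: int = 1000) -> str:
    """Build a CDX query for HTTP 200 captures of a domain, one row per unique URL."""
    params = {
        "url": domain,
        "matchType": "domain",       # also include subdomains such as www.
        "output": "json",
        "fl": "original,timestamp,statuscode",
        "filter": "statuscode:200",  # only captures that returned HTTP 200
        "collapse": "urlkey",        # keep one capture per unique URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_json(payload: str) -> List[Dict[str, str]]:
    """Turn CDX JSON output (a header row followed by data rows) into dicts."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def wayback_url(capture: Dict[str, str]) -> str:
    """Build the WBM playback URL for one capture."""
    return f"http://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


def fetch_captures(limit: int = 1000) -> List[Dict[str, str]]:
    """Query the CDX server (requires network access)."""
    with urllib.request.urlopen(cdx_query_url(limit=limit), timeout=120) as response:
        return parse_cdx_json(response.read().decode("utf-8"))
```

Calling fetch_captures() performs the actual request; mapping wayback_url() over its result gives URLs in the same http://web.archive.org/web/ form as the LiteratuurpleinArchiefURL column.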
Data sources
The data to make the above files was obtained from 3 sources:
1) Most relevant subsites of www.literatuurplein.nl : Page URLs and page content under the menu items Nieuws - Columns - Interviews - Literaire prijzen - Recensies - Canon - Excursies - Poezie - Literaire adressen, obtained via web scraping.
2) Most visited pages : URLs of pages that were requested 30 or more times over the last 5 years, obtained via Google Analytics.
3) Persons data : A data dump from the Literatuurplein CMS, containing the names, dates of birth & death and places of birth & death of 10.027 persons (mainly authors).
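Source 2 comes down to a threshold on pageview counts. A minimal sketch, assuming a hypothetical Google Analytics export already parsed into rows with page and pageviews fields holding plain integers; it sums the counts per URL first, in case a URL occurs on several rows, and keeps URLs with 30 or more pageviews, most visited first.

```python
from collections import Counter
from typing import Dict, Iterable, List

MIN_PAGEVIEWS = 30  # threshold from the text: requested 30 or more times


def most_visited(
    rows: Iterable[Dict[str, str]],
    url_col: str = "page",        # hypothetical column name
    views_col: str = "pageviews",  # hypothetical column name
) -> List[str]:
    """Sum pageviews per URL and keep URLs at or above the threshold, most visited first."""
    totals: Counter = Counter()
    for row in rows:
        totals[row[url_col]] += int(row[views_col])
    return [url for url, views in totals.most_common() if views >= MIN_PAGEVIEWS]
```

With an extremely long-tailed distribution like the one described in step 2, a threshold like this discards the vast majority of distinct URLs while keeping the pages people actually used.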
Steps taken
1) For web scraping source 1, I used the Webscraper.io Chrome extension. With this tool you can specify which page URLs and HTML elements (titles, headers, bullet lists etc.) you want to extract from a website. The result can be downloaded as a CSV file for further processing in Excel.
2) To get the URLs of the most visited pages (source 2), I used Google Analytics. This yielded 32K URLs, out of a total of 964K URLs requested in that period (an extremely long-tailed distribution).
3) In the data dump (source 3), I transformed the ID in column 1 (e.g. 161934) into a Literatuurplein URL (https://www.literatuurplein.nl/persdetail?persId=161934). This data ended up in personen-allen and personen-namen-datums-plaatsen.
4) I combined these three lists of URLs into a single list and deduplicated it (using Excel) to avoid any overlap, as the three sources are not necessarily disjoint.
5) Using a url-status-checker script, I checked whether all the Literatuurplein URLs actually worked (= status 200). This took many hours. I deleted the URLs giving 404s or other errors.
6) Once all the preparations were done, it was time to actually archive all URLs in the Wayback Machine. For that I ran a Python script using waybackpy. This process was not 100% successful: some URLs could not be captured correctly by the WBM and were therefore omitted from further processing. See ../../scripts/wbm-archiver/ for the current archiving scripts.
7) To make sure all generated WBM URLs actually work, I again ran the url-status-checker script, but now with the archived URLs as input. Once again this took many hours. I deleted the URLs giving 404s or other errors.
8) For a better overview, I split the URL list into 22 Excel files, following the file listing above.
9) I converted all Excels into open .tsv (tab separated value) files in plain text Unicode UTF-8. These can be readily imported/exported to other data formats.
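Steps 3 to 7 can be sketched in Python as follows. This is not the original url-status-checker or the archiving scripts in ../../scripts/wbm-archiver/; it is an illustration under these assumptions: waybackpy 3.x is installed, where WaybackMachineSaveAPI(url, user_agent).save() returns the archive URL; the USER_AGENT string and all function names are hypothetical; and a HEAD request is an adequate status check. Nothing touches the network until check_status() or archive() is called.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Hypothetical user agent string sent with every request.
USER_AGENT = "literatuurplein-archive-example/0.1"


def person_url(pers_id: int) -> str:
    """Step 3: turn a person ID from the CMS dump into a Literatuurplein URL."""
    return f"https://www.literatuurplein.nl/persdetail?persId={pers_id}"


def combine_unique(*url_lists: Iterable[str]) -> List[str]:
    """Step 4: merge URL lists, dropping exact duplicates and keeping first-seen order."""
    return list(dict.fromkeys(url for urls in url_lists for url in urls))


def check_status(url: str, timeout: float = 30.0) -> Optional[int]:
    """Steps 5 and 7: HTTP status code of a URL, or None if no response came back."""
    request = urllib.request.Request(url, method="HEAD", headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # status after any redirects were followed
    except urllib.error.HTTPError as error:  # 4xx and 5xx responses
        return error.code
    except OSError:  # DNS failures, refused connections, timeouts
        return None


def keep_ok(
    urls: Iterable[str],
    status_of: Callable[[str], Optional[int]] = check_status,
) -> List[str]:
    """Keep only the URLs that answer with status 200."""
    return [url for url in urls if status_of(url) == 200]


def archive(url: str) -> Optional[str]:
    """Step 6: ask the WBM to capture a URL via waybackpy; None if that failed."""
    from waybackpy import WaybackMachineSaveAPI  # optional dependency, imported on use

    try:
        return WaybackMachineSaveAPI(url, USER_AGENT).save()
    except Exception:  # any failure counts as "not captured", as in step 6
        return None
```

combine_unique() removes exact duplicates only, which is consistent with the adressenbank example above, where three URL variants of one page were all kept and archived.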
Folder structure
Literatuurplein/
├── index.md # This page
├── README.md # Mirror of index.md
├── images/ # Screenshots of the website in the WBM
└── data/ # Archived URL data files organized by category
    ├── adressenbank/ # Literary organisations (3.464 URLs)
    ├── boeken/ # Book metadata (16.677 URLs)
    ├── canon/ # Canon of Dutch history books (3.006 URLs)
    ├── columns/ # Literary columns (36 URLs)
    ├── evenementen/ # Literary events (247 URLs)
    ├── excursies/ # Literary excursions (464 URLs)
    ├── genres/ # Books by genre (43 URLs)
    ├── interviews/ # Author interviews (365 URLs)
    ├── leestips/ # Reading tips (64 URLs)
    ├── nieuws/ # Literary news (4.793 URLs)
    ├── overige/ # Misc pages (16 URLs)
    ├── personen/ # Person/author data (31.002 URLs)
    ├── poezie/ # Poet profiles (44 URLs)
    ├── prijzen/ # Literary awards (4.622 URLs)
    ├── recensies/ # Book reviews (1.982 URLs)
    ├── themas/ # Themed pages (18 URLs)
    ├── trefwoorden/ # Books by keyword (439 URLs)
    ├── wereldkaart/ # Books by world location (680 URLs)
    └── zoeken/ # Search pages (51 URLs)
Related projects
- Wikidata:WikiProject Dutch Literary Awards - Uses data from this archive
- KB GLAM Wikidata projects - Broader context





