API Reference
Scraper
This module carries out an efficient scrape of the WIT dataset using multiprocessing and minimal PNG chunk header handling.
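To illustrate what minimal PNG chunk header handling can look like, here is a sketch that decides whether a PNG declares an alpha channel by reading only the signature and IHDR chunk (the first 33 bytes of the file). It follows the PNG file layout and is not taken from the wikitransp source; the names read_ihdr, has_alpha_channel and make_test_header are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG colour types that carry a full alpha channel:
# 4 = greyscale with alpha, 6 = truecolour with alpha.
ALPHA_COLOUR_TYPES = {4, 6}


def read_ihdr(header: bytes) -> dict:
    """Parse width, height, bit depth and colour type from the start of a PNG.

    Hypothetical helper: only the 8-byte signature, the 8-byte chunk
    length/type prefix and the 13-byte IHDR payload are inspected.
    """
    if len(header) < 29:
        raise ValueError("Need at least 29 bytes to read the IHDR payload")
    if header[:8] != PNG_SIGNATURE:
        raise ValueError("Not a PNG file (bad signature)")
    length, chunk_type = struct.unpack(">I4s", header[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("First chunk is not a valid IHDR chunk")
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", header[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": colour_type,
    }


def has_alpha_channel(header: bytes) -> bool:
    """Return True if the IHDR colour type includes an alpha channel.

    Palette images (colour type 3) may still be transparent via a tRNS
    chunk, which this header-only check does not detect.
    """
    return read_ihdr(header)["colour_type"] in ALPHA_COLOUR_TYPES


def make_test_header(width: int, height: int, colour_type: int) -> bytes:
    """Build a synthetic signature + IHDR chunk, for demonstration only."""
    data = struct.pack(">IIBBBBB", width, height, 8, colour_type, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + data)
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + data + struct.pack(">I", crc)


if __name__ == "__main__":
    print(has_alpha_channel(make_test_header(640, 480, 6)))
    print(has_alpha_channel(make_test_header(640, 480, 2)))
```

Because only the leading bytes are needed, a check like this could in principle run on a partial download instead of the whole image.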
- wikitransp.scraper.scrape_images(sample=False, resume_at=None, resume_after=None, decompress_tsv=False, fetch_async=True)
Build a local dataset by scanning the WIT dataset (or a small sample of it) for suitable PNGs. Note: only pass one of resume_at or resume_after.
- Parameters
sample – Whether to scrape only the 1% sample dataset.
resume_at – The image URL to resume at (if scraping was interrupted).
resume_after – The image URL to resume after (if scraping was interrupted).
decompress_tsv – Whether to decompress gzipped TSVs before filtering (not necessary, and will increase dataset file size on disk).
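The difference between the two resume parameters can be sketched on a plain list of URLs. This is a hypothetical illustration, not the library's implementation: it assumes that resume_at restarts at the given URL inclusively, that resume_after restarts at the URL following it, and that passing both is rejected. The function name resume_urls and the example URLs are invented for the sketch.

```python
from itertools import dropwhile, islice
from typing import Iterable, Iterator, Optional


def resume_urls(
    urls: Iterable[str],
    resume_at: Optional[str] = None,
    resume_after: Optional[str] = None,
) -> Iterator[str]:
    """Yield the URLs that remain to be processed after an interruption.

    Hypothetical helper mirroring the documented parameters:
    resume_at is treated as inclusive, resume_after as exclusive.
    """
    if resume_at is not None and resume_after is not None:
        raise ValueError("Pass only one of resume_at or resume_after")
    if resume_at is not None:
        # Skip everything before the target URL, then include it.
        return dropwhile(lambda url: url != resume_at, urls)
    if resume_after is not None:
        # Skip everything up to and including the target URL.
        return islice(dropwhile(lambda url: url != resume_after, urls), 1, None)
    return iter(urls)


if __name__ == "__main__":
    urls = [
        "https://example.org/a.png",
        "https://example.org/b.png",
        "https://example.org/c.png",
    ]
    print(list(resume_urls(urls, resume_at="https://example.org/b.png")))
    print(list(resume_urls(urls, resume_after="https://example.org/b.png")))
```

Under these assumptions, resume_at suits the case where the named image was not finished, and resume_after the case where it was the last image fully processed.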