WEB SCRAPER TESTING GROUND

TEXT LIST (version 1)

Some web publishers do not bother to format their data with HTML elements and simply put the information on their website as plain text. Worse, they sometimes add supplementary notes in the same manner as the main information, which makes the two harder to separate. A good web scraper should nevertheless overcome these obstacles.

In this test, the web scraper needs to scrape a list of US cities and their populations presented as simple text. Specifically, it has to:

  1. Extract all the cities and their populations, skipping all the notes
  2. Scrape the cities together with their notes (if any)
  3. Scrape only the bold cities (with their populations)
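
The first two tasks can be sketched in Python as follows. This is only an illustration, not a reference solution: it assumes the page text has already been reduced to plain lines like the sample list on this page, that a city line ends with a comma-grouped number, and that any other line following a city is a note on that city. The names CITY_LINE and parse_city_list are made up for this example. A note that happens to end with a number would be misread as a city, so a real project may need a stricter rule.

```python
import re

# Assumed line shape: a city name, some whitespace, then a comma-grouped number.
CITY_LINE = re.compile(r"^(?P<city>.+?)\s+(?P<population>\d{1,3}(?:,\d{3})*)$")


def parse_city_list(text):
    """Return one dict per city with keys 'city', 'population', and 'notes'."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = CITY_LINE.match(line)
        if match:
            records.append({
                "city": match.group("city"),
                "population": int(match.group("population").replace(",", "")),
                "notes": [],
            })
        elif records:
            # Assumption: a non-city line after a city is a note on that city.
            # Lines before the first city (header, separator) are ignored.
            records[-1]["notes"].append(line)
    return records


SAMPLE = """\
CITY          POPULATION
------------------------
New York      8,244,910
(City of New York)
Los Angeles   3,819,702
Chicago       2,707,120
change: +0.43%
"""

records = parse_city_list(SAMPLE)

# Task 1: cities and populations only, notes skipped.
cities_only = [(r["city"], r["population"]) for r in records]

# Task 2: cities together with their notes (an empty list when there are none).
cities_with_notes = [(r["city"], r["notes"]) for r in records]
```

Task 3 is not covered here, because bold formatting lives in the page markup rather than in the text itself.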

A ver parameter (ranging from 1 to 5) selects different versions of the list, each with a different number of cities, different bold cities, and different notes.
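
As a small sketch of how the five versions might be enumerated, the snippet below builds one URL per ver value. The base URL is a placeholder and not the real address of this page, and the helper name version_urls is made up for this example; only the ver parameter and its 1 to 5 range come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace it with the real address of the test page.
BASE_URL = "https://example.com/textlist"


def version_urls(base_url=BASE_URL, versions=range(1, 6)):
    """Build one URL per list version by setting the ver query parameter."""
    return [f"{base_url}?{urlencode({'ver': v})}" for v in versions]


urls = version_urls()
```

Running the same scraping project against each of these URLs is one way to check that it copes with every version of the list.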

For testing, you may use the following sample links. The scraper should successfully scrape all the data from any of the links using the same project:

CITY          POPULATION
------------------------
New York      8,244,910
(City of New York)
Los Angeles   3,819,702
Chicago       2,707,120
change: +0.43%
Houston       2,145,146
Philadelphia  1,536,471
(City of Philadelphia)
Phoenix       1,469,471