Scraper Lesson 1 - Music JSON Builder
Learn how to build a valid scraper JSON payload for a music listing page
Lesson Goal
Build a JSON payload from scratch and scrape a music category listing.
By the end, you should understand:
- how url, itemsSelector, and fields work together
- how pagination is modeled in JSON
- how limits protects your scraper from overloading a site
Step 1: Understand the JSON Shape
Your scraper request has five main parts:
- url: where scraping starts.
- pagination: how to find the next page.
- itemsSelector: CSS selector for each repeated card/item.
- fields: what data to extract from each item.
- limits: guardrails (maxPages, timeoutMs, etc.).
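As a quick self-check before sending a payload, the five parts can be verified with a few lines of Python. This is only a sketch: the helper name missing_parts is hypothetical, and it assumes all five keys are required, which the lesson does not state outright.

```python
# Hypothetical pre-flight check; the real scraper may validate more (or different) rules.
REQUIRED_KEYS = ("url", "pagination", "itemsSelector", "fields", "limits")


def missing_parts(payload: dict) -> list[str]:
    """Return the names of the five main parts that are absent from the payload."""
    return [key for key in REQUIRED_KEYS if key not in payload]


# An incomplete draft: pagination and limits have not been written yet.
draft = {
    "url": "https://books.toscrape.com/catalogue/category/books/music_14/index.html",
    "itemsSelector": "article.product_pod",
    "fields": {"price": {"selector": ".price_color", "mode": "text"}},
}

print(missing_parts(draft))  # ['pagination', 'limits']
```

An empty list means the draft has every top-level part; it says nothing about whether the selectors themselves are correct.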
Step 2: Build Your Music Payload
Use this as your baseline target:
{
  "url": "https://books.toscrape.com/catalogue/category/books/music_14/index.html",
  "pagination": {
    "type": "nextLink",
    "selector": "li.next a",
    "attr": "href"
  },
  "itemsSelector": "article.product_pod",
  "fields": {
    "title": { "selector": "h3 a", "mode": "attr", "attr": "title" },
    "detailLink": { "selector": "h3 a", "mode": "attr", "attr": "href" },
    "price": { "selector": ".price_color", "mode": "text" },
    "stock": { "selector": ".instock.availability", "mode": "text" }
  },
  "limits": {
    "maxPages": 2,
    "maxItems": 30,
    "maxConcurrency": 4,
    "timeoutMs": 15000
  }
}
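To see how itemsSelector and fields cooperate, it can help to imitate the extraction on a tiny page. The sketch below is not the scraper's actual implementation. It uses Python's standard xml.etree.ElementTree with XPath-style paths standing in for the CSS selectors, and the markup is a simplified, made-up stand-in for the real listing. The path key and the extract_field helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the listing markup (not the real page).
PAGE = """
<section>
  <article class="product_pod">
    <h3><a href="../../../album-one_1/index.html" title="Album One">Album ...</a></h3>
    <p class="price_color">£10.00</p>
  </article>
  <article class="product_pod">
    <h3><a href="../../../album-two_2/index.html" title="Album Two">Album ...</a></h3>
    <p class="price_color">£12.50</p>
  </article>
</section>
"""

# XPath equivalents of the CSS selectors in the payload (an assumption of this sketch).
FIELDS = {
    "title": {"path": ".//h3/a", "mode": "attr", "attr": "title"},
    "detailLink": {"path": ".//h3/a", "mode": "attr", "attr": "href"},
    "price": {"path": ".//p[@class='price_color']", "mode": "text"},
}


def extract_field(card, rule):
    """Apply one field rule inside a single card element."""
    node = card.find(rule["path"])
    if node is None:
        return None
    if rule["mode"] == "attr":
        return node.get(rule["attr"])
    return (node.text or "").strip()


root = ET.fromstring(PAGE)
# The item selector yields one element per card; fields are resolved inside each card.
cards = root.findall(".//article[@class='product_pod']")
items = [{name: extract_field(card, rule) for name, rule in FIELDS.items()} for card in cards]

for item in items:
    print(item["title"], item["price"])
```

Each field rule is evaluated relative to one card, so every output record gets its own title, link, and price. In this sketch, mode attr reads the named attribute and mode text reads the element's text.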
Step 3: Complete the Student Tasks
- Run the payload and confirm you get a list of music books.
- Add a new field named ratingClass from p.star-rating with mode attr.
- Change maxPages to 1 and compare output size.
- Explain why itemsSelector must point to each item card, not the whole page.
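Before comparing real output sizes, it may help to reason about how maxPages and maxItems could interact. The following is a toy simulation, not the scraper's real logic. The crawl function, the page sizes, and the rule "stop at whichever limit is hit first" are all assumptions made for illustration.

```python
def crawl(pages, max_pages, max_items):
    """Walk pre-fetched pages in order, stopping at whichever limit is hit first."""
    collected = []
    for page_number, page_items in enumerate(pages, start=1):
        if page_number > max_pages:
            break  # page budget exhausted
        for item in page_items:
            if len(collected) >= max_items:
                return collected  # item budget exhausted
            collected.append(item)
    return collected


# Made-up listing: page 1 has 20 items, page 2 has 5.
pages = [
    [f"item-{n}" for n in range(20)],
    [f"item-{n}" for n in range(20, 25)],
]

print(len(crawl(pages, max_pages=2, max_items=30)))  # 25
print(len(crawl(pages, max_pages=1, max_items=30)))  # 20
print(len(crawl(pages, max_pages=2, max_items=22)))  # 22
```

If the real category fits on a single page, lowering maxPages from 2 to 1 should not change the output size at all, which is itself a useful observation to record.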
Example Runner
This runner is preloaded with the music template: