A friend of mine (Zach — see his blog at https://www.zacharyfouts.com/) wanted something better than a curl one-liner he'd thrown together to parse motorcycle listings across Texas cities. His original approach:
Zach's original bash version
```bash
curl 'http://austin.craigslist.org/search/mcy...' \
     'http://collegestation.craigslist.org/search/mcy...' \
     'http://houston.craigslist.org/search/mcy...' \
     --silent | grep 'dc:title' | sed -e 's/<.*\[//g' -e 's/\&#.*$//g' \
     | grep -v 'by owner search'
```
It works but falls apart the moment you want to filter by price, add more cities, or do anything with the results beyond printing them.
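For contrast, once each listing is parsed into a structured record, ad-hoc filters stop requiring another sed/grep stage. A minimal sketch with made-up data, shaped like the dicts the Python rewrite below produces:

```python
# Hypothetical listings (made-up data for illustration).
listings = [
    {'city': 'austin', 'title': '2007 Kawasaki Ninja 500', 'price': 1200},
    {'city': 'waco', 'title': '2003 Honda Shadow 750', 'price': 800},
]

# Filtering by price is a one-line comprehension, not a new pipeline stage.
cheap = [l for l in listings if l['price'] <= 1000]
print([l['title'] for l in cheap])  # → ['2003 Honda Shadow 750']
```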
My Python rewrite
The original used pycurl and StringIO (Python 2 only) with broken parsing logic. Here's an updated version using requests and BeautifulSoup that actually works and handles deduplication across cities:
```python
#!/usr/bin/env python3
"""
Scrape Craigslist motorcycle listings across multiple cities.
Note: scraping Craigslist is against their ToS — use at your own risk.
"""
import time

import requests
from bs4 import BeautifulSoup

CITIES = [
    'austin',
    'collegestation',
    'houston',
    'killeen',
    'sanantonio',
    'sanmarcos',
    'waco',
]

SEARCH_PARAMS = {
    'hasPic': 1,
    'postedToday': 1,
    'max_price': 2200,
    'auto_title_status': 1,  # clean title only
}


def fetch_listings(city, category='mcy', params=None):
    """Fetch listings from one Craigslist city."""
    url = f'https://{city}.craigslist.org/search/{category}'
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        resp = requests.get(url, params=params or {}, headers=headers, timeout=10)
        resp.raise_for_status()
        return parse_listings(resp.text, city)
    except requests.RequestException as e:
        print(f'  Error fetching {city}: {e}')
        return []


def parse_listings(html, city):
    """Parse listing titles, prices, and links from a search results page."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.select('li.result-row'):
        title_tag = item.select_one('a.result-title')
        price_tag = item.select_one('span.result-price')
        if not title_tag:
            continue
        results.append({
            'city': city,
            'title': title_tag.get_text(strip=True),
            'price': price_tag.get_text(strip=True) if price_tag else 'n/a',
            'url': title_tag['href'],
        })
    return results


def price_key(listing):
    """Sort prices numerically; listings without a price go last."""
    digits = listing['price'].lstrip('$').replace(',', '')
    return int(digits) if digits.isdigit() else float('inf')


def main():
    seen_urls = set()
    all_listings = []
    for city in CITIES:
        print(f'Fetching {city}...')
        for listing in fetch_listings(city, params=SEARCH_PARAMS):
            # Craigslist repeats "nearby" posts across cities; dedupe by URL.
            if listing['url'] not in seen_urls:
                seen_urls.add(listing['url'])
                all_listings.append(listing)
        time.sleep(1)  # be polite

    print(f'\nFound {len(all_listings)} unique listings:\n')
    for l in sorted(all_listings, key=price_key):
        print(f"  [{l['city']}] {l['price']:>6} {l['title']}")
        print(f"    {l['url']}")


if __name__ == '__main__':
    main()
```
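If you want to do more with the results than print them (the whole point of ditching the one-liner), dumping them to CSV is a natural next step. A sketch; `save_csv` is a hypothetical helper, not part of the script above:

```python
import csv

def save_csv(listings, path):
    """Write listing dicts to a CSV file for sorting/filtering later."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['city', 'price', 'title', 'url'])
        writer.writeheader()
        writer.writerows(listings)

# Example with a single made-up listing:
save_csv(
    [{'city': 'waco', 'price': '$800', 'title': '2003 Honda Shadow 750',
      'url': 'https://waco.craigslist.org/mcy/...'}],
    'listings.csv',
)
```

In `main()` you would call it with `all_listings` after the dedup loop.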
Requirements
```bash
pip install requests beautifulsoup4
```
Running it
```bash
python3 craigslist_search.py
```
Output looks like:

```
Fetching austin...
Fetching houston...
...

Found 47 unique listings:

  [waco]   $800 2003 Honda Shadow 750
    https://waco.craigslist.org/mcy/...
  [austin]  $1200 2007 Kawasaki Ninja 500
    https://austin.craigslist.org/mcy/...
```
Fork it, swap mcy for whatever category you're hunting in, and adjust the price filters. Just be aware scraping Craigslist is against their ToS, so don't hammer their servers.
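If you find yourself editing the constants every time you search, a thin CLI wrapper keeps the tweaking out of the source. A sketch using argparse; the flag names are my own, not part of the script above:

```python
import argparse

# Hypothetical CLI: category and price cap become flags instead of constants.
parser = argparse.ArgumentParser(description='Search Craigslist across cities')
parser.add_argument('--category', default='mcy',
                    help='Craigslist category code, e.g. mcy (motorcycles)')
parser.add_argument('--max-price', type=int, default=2200)

# Parsing an explicit argv list here just to demonstrate; normally
# you would call parser.parse_args() with no arguments.
args = parser.parse_args(['--category', 'cta', '--max-price', '3000'])
print(args.category, args.max_price)  # → cta 3000
```

`main()` would then pass `args.category` through to `fetch_listings` and merge `args.max_price` into `SEARCH_PARAMS`.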