October 19, 2015 Marie H.

Parsing craigslist for an item in multiple cities


Photo by Markus Spiske on Unsplash

A friend of mine (Zach — see his blog at https://www.zacharyfouts.com/) wanted something better than a curl one-liner he'd thrown together to parse motorcycle listings across Texas cities. His original approach:

Zach's original bash version

curl 'http://austin.craigslist.org/search/mcy...' \
     'http://collegestation.craigslist.org/search/mcy...' \
     'http://houston.craigslist.org/search/mcy...' \
     --silent | grep 'dc:title' | sed -e 's/<.*\[//g' -e 's/\&#.*$//g' \
     | grep -v 'by owner search'

It works but falls apart the moment you want to filter by price, add more cities, or do anything with the results beyond printing them.

My Python rewrite

The original used pycurl and StringIO (Python 2 only) with broken parsing logic. Here's an updated version using requests and BeautifulSoup that actually works and handles deduplication across cities:

#!/usr/bin/env python3
"""
Scrape Craigslist motorcycle listings across multiple cities.
Note: scraping Craigslist is against their ToS — use at your own risk.
"""
import requests
from bs4 import BeautifulSoup
import time

CITIES = [
    'austin',
    'collegestation',
    'houston',
    'killeen',
    'sanantonio',
    'sanmarcos',
    'waco',
]

SEARCH_PARAMS = {
    'hasPic': 1,
    'postedToday': 1,
    'max_price': 2200,
    'auto_title_status': 1,  # clean title only
}

def fetch_listings(city, category='mcy', params=None):
    """Fetch listings from one Craigslist city."""
    url = f'https://{city}.craigslist.org/search/{category}'
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        resp = requests.get(url, params=params or {}, headers=headers, timeout=10)
        resp.raise_for_status()
        return parse_listings(resp.text, city)
    except requests.RequestException as e:
        print(f'  Error fetching {city}: {e}')
        return []

def parse_listings(html, city):
    """Parse listing titles, prices, and links from search results page."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.select('li.result-row'):
        title_tag = item.select_one('a.result-title')
        price_tag = item.select_one('span.result-price')
        if not title_tag:
            continue
        results.append({
            'city':  city,
            'title': title_tag.get_text(strip=True),
            'price': price_tag.get_text(strip=True) if price_tag else 'n/a',
            'url':   title_tag['href'],
        })
    return results

def main():
    seen_urls = set()
    all_listings = []

    for city in CITIES:
        print(f'Fetching {city}...')
        listings = fetch_listings(city, params=SEARCH_PARAMS)
        for listing in listings:
            if listing['url'] not in seen_urls:
                seen_urls.add(listing['url'])
                all_listings.append(listing)
        time.sleep(1)  # be polite

    print(f'\nFound {len(all_listings)} unique listings:\n')
    # Sort numerically: '$1,200' -> 1200; listings without a price sort last.
    # A plain string sort would put '$1200' before '$800'.
    def price_key(listing):
        p = listing['price'].lstrip('$').replace(',', '')
        return int(p) if p.isdigit() else float('inf')

    for l in sorted(all_listings, key=price_key):
        print(f"  [{l['city']}] {l['price']:>6}  {l['title']}")
        print(f"           {l['url']}")

if __name__ == '__main__':
    main()
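
Under the hood, requests turns the SEARCH_PARAMS dict into a query string. You can see the equivalent encoding with the standard library, no network needed:

```python
from urllib.parse import urlencode

SEARCH_PARAMS = {
    'hasPic': 1,
    'postedToday': 1,
    'max_price': 2200,
    'auto_title_status': 1,
}

# urlencode stringifies each value and joins key=value pairs with '&',
# preserving dict insertion order
url = 'https://austin.craigslist.org/search/mcy?' + urlencode(SEARCH_PARAMS)
print(url)
```

So tweaking a filter is just editing a dict entry, rather than hand-editing a URL as in the bash version.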

Requirements

pip install requests beautifulsoup4

Running it

python3 craigslist_search.py

Output looks like:

Fetching austin...
Fetching houston...
...

Found 47 unique listings:

  [waco]        $800  2003 Honda Shadow 750
               https://waco.craigslist.org/mcy/...
  [austin]     $1200  2007 Kawasaki Ninja 500
               https://austin.craigslist.org/mcy/...

Fork it, swap mcy for whatever category you're hunting in, and adjust the price filters. Just be aware scraping Craigslist is against their ToS, so don't hammer their servers.
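
And if printing isn't enough, the listing dicts drop straight into a CSV for a spreadsheet. A minimal sketch using sample data shaped like what parse_listings() returns (the URLs here are just placeholders):

```python
import csv

# Sample rows with the same keys parse_listings() produces
listings = [
    {'city': 'waco', 'price': '$800',
     'title': '2003 Honda Shadow 750',
     'url': 'https://waco.craigslist.org/mcy/...'},
    {'city': 'austin', 'price': '$1200',
     'title': '2007 Kawasaki Ninja 500',
     'url': 'https://austin.craigslist.org/mcy/...'},
]

# DictWriter maps each dict onto the header columns
with open('listings.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['city', 'price', 'title', 'url'])
    writer.writeheader()
    writer.writerows(listings)
```

From there it's a short hop to tracking prices over time or diffing runs to spot new posts.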