George Khananaev
Case Study

Building Google Reviews Scraper Pro

A resilient Python web scraper for multi-language Google Maps reviews

2 min read

Overview

Google Reviews Scraper Pro is a Python tool that extracts reviews from Google Maps listings, handles multiple languages, downloads review images, and stores the results in MongoDB. It was built to solve a real operational problem: manually collecting reviews for thousands of listings is slow, error-prone, and does not scale.

The problem it solves

Review data is locked behind a JavaScript-heavy interface that actively resists scraping. Off-the-shelf tools break within weeks because Google rotates DOM selectors, throttles requests, and serves different markup to different user agents. This project takes the long view: it assumes the DOM will change and designs around that assumption.
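One way to design around a changing DOM is to keep several extraction strategies and try them in order, so a renamed class degrades to the next strategy instead of breaking the run. The following is a minimal sketch, not the project's actual code: the class name `review-text`, the attribute `data-review-body`, and the function names are all hypothetical, and plain regular expressions stand in for BeautifulSoup so the example has no dependencies.

```python
import re
from typing import Callable, Optional, Sequence

# Each strategy takes raw HTML and returns the review text,
# or None if its pattern no longer matches the markup.
Extractor = Callable[[str], Optional[str]]


def by_class(html: str) -> Optional[str]:
    # Hypothetical "old" markup: <span class="review-text">...</span>
    m = re.search(r'<span class="review-text">(.*?)</span>', html, re.S)
    return m.group(1).strip() if m else None


def by_data_attr(html: str) -> Optional[str]:
    # Hypothetical "new" markup: any tag carrying a data-review-body attribute
    m = re.search(r'<[a-z]+ data-review-body[^>]*>(.*?)</', html, re.S)
    return m.group(1).strip() if m else None


def extract_first(html: str, extractors: Sequence[Extractor]) -> Optional[str]:
    """Return the first non-empty result, trying strategies in order."""
    for extract in extractors:
        result = extract(html)
        if result:
            return result
    return None  # every strategy failed: a signal to log and alert


strategies = [by_class, by_data_attr]
```

When every strategy returns `None`, that is the signal that the markup has changed again and a new strategy needs to be added to the list.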

Key features

  • Multi-language extraction. Reviews are captured regardless of their original language, with metadata preserved for later translation or classification.
  • Incremental scraping. On subsequent runs it picks up where it left off, only fetching new reviews. This makes daily cron runs cheap.
  • Image downloading. Reviews with photos get their images pulled into storage, with URLs rewritten to point at the local copies.
  • MongoDB integration. Built-in persistence means no CSV juggling. Queries are fast and the schema supports filtering by rating, language, date, and author.
  • Detection resilience. Rate limiting, user-agent rotation, and request shaping reduce the chance that the scraper gets throttled or blocked.
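The incremental scraping idea can be sketched as a filter over review IDs. This is an illustration under stated assumptions rather than the tool's real implementation: it assumes each review carries a stable identifier (the `review_id` field name is hypothetical) and that the set of already-stored IDs is loaded before the run.

```python
from typing import Iterable, Iterator


def new_reviews(fetched: Iterable[dict], seen_ids: set[str]) -> Iterator[dict]:
    """Yield only reviews whose ID is not stored yet, marking each as seen."""
    for review in fetched:
        rid = review["review_id"]  # hypothetical field name
        if rid in seen_ids:
            continue  # already persisted on an earlier run, or a duplicate
        seen_ids.add(rid)
        yield review
```

With MongoDB as the store, a unique index on the ID field would be a natural second line of defense against duplicates, though whether the project does this is not stated here.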

Tech stack

Python, Playwright for headless browsing, BeautifulSoup for parsing, MongoDB for storage, and Pillow for image processing. Dockerized so it runs anywhere with one command.

What I would do differently

If I were rebuilding this today, I'd move the image pipeline to a proper object store (Cloudflare R2 or S3) instead of local filesystem, and I'd split the scraping logic from the persistence layer so each can be tested independently. The current version couples them tightly, which makes unit tests awkward.
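A minimal sketch of what that split could look like, assuming the scraper only needs a single "save a batch" operation from its storage layer. The names `ReviewStore`, `InMemoryStore`, and `run_scrape` are hypothetical and do not come from the project.

```python
from typing import Callable, Iterable, Protocol


class ReviewStore(Protocol):
    """The only thing the scraping code needs to know about persistence."""

    def save_many(self, reviews: Iterable[dict]) -> int: ...


class InMemoryStore:
    """Test double; a MongoDB-backed class would expose the same method."""

    def __init__(self) -> None:
        self.saved: list[dict] = []

    def save_many(self, reviews: Iterable[dict]) -> int:
        batch = list(reviews)
        self.saved.extend(batch)
        return len(batch)


def run_scrape(fetch_reviews: Callable[[], Iterable[dict]], store: ReviewStore) -> int:
    """Fetch reviews and hand them to whichever store was injected."""
    return store.save_many(fetch_reviews())
```

Because `run_scrape` depends only on the protocol, a unit test can pass a stub fetcher and the in-memory store without starting a browser or a database.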

Takeaway

Scraping at scale is less about clever selectors and more about resilience. The operational decisions (rate limits, retries, checkpointing, logging) matter more than the HTML parsing itself. Build assuming things will break, and they break less.
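As one concrete example of the retry piece, here is a sketch of retrying with exponential backoff and full jitter. It is a generic pattern under assumed defaults, not the project's code: the function name, attempt count, and base delay are hypothetical, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to base_delay * 2^attempt
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")
```

The random jitter spreads retries out in time, so many workers that fail together do not all retry at the same instant.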

Travel Panel: the core travel management platform
Featured · Moon Holidays · 11 min · Dec 2022 to Present

FastAPI backend, Next.js operator portal, and B2B partner portal powering Moon Holidays end to end

Travel Panel is the core system at Moon Holidays. It comprises a FastAPI backend, a Next.js operator portal, and a B2B partner portal, and it orchestrates every downstream product: TravelOffer for end customers, Live Deck for call-center TVs, Vercel Controller for deployment cache, StaySync for allotment availability, and a WebSocket messenger for internal communication. It runs on AWS with ALB, MemoryDB, CloudFront, S3, and more.

fastapi, nextjs, python, typescript
py-image-compressor: batch image compression in Python
2 min · 36

A lightweight CLI for compressing, converting, and resizing images in bulk

A small Python utility for compressing, converting, and resizing hundreds of images at once. Supports modern formats like WebP, handles recursive directory scans, and preserves structure so you can run it on a whole asset folder safely.

python, cli, images, webp
Moon Support Hub: an enterprise ticketing platform
Moon Holidays · 6 min · Dec 2025 to Present

Next.js 16 with Prisma on MongoDB, 740+ source files, 186 React components, 135+ API endpoints, 60 models, and 14 scheduled background jobs

Moon Support Hub is a full-featured enterprise support system: ticketing with SLA management, a knowledge base with publishing workflow, role-based access control, MinIO attachments, and a customer portal. Built on Next.js 16 with Prisma on MongoDB, 740+ source files, 186 React components, 135+ API endpoints, 60 models, 14 scheduled background jobs, and 8 pre-built reports.

leadership, nextjs, mongodb, prisma