bags

What started as a script to monitor Coach bag sales turned into a collection of Twitter feeds announcing sales across various women's fashion retailers · source · devlog · bagsonsale · dressfwrd · dressiconic

  • v0.19.1 - Bug fixes and CI versioning

    I fixed several bugs discovered after the structured logging release. The Farfetch scraper had an infinite loop when pagination returned empty results - now it breaks out properly. The FWRD USD price parsing needed another tweak to handle edge cases.
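    The pagination fix boils down to treating an empty page as the end of the listing. A minimal sketch of the loop shape (in Python for brevity - the project itself is C#, and `fetch_page` is a hypothetical stand-in for the scraper's page fetch):

```python
def scrape_all(fetch_page, max_pages=50):
    """Collect products across pages, stopping when a page comes back empty."""
    products = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # empty result means the last page was passed - don't loop forever
            break
        products.extend(batch)
    return products
```

    Without the empty-result check, a scraper that keeps requesting the next page can spin indefinitely when the site silently returns an empty fragment instead of an error.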

    I also improved the CI pipeline with proper versioning that passes build info to the Docker image, and ensured logs are flushed before the application exits so nothing gets lost.

  • v0.19 - Structured logging and CI pipeline

    I added structured logging using Serilog with a PostgreSQL sink. Logs now include enriched context like category information and are stored in the database for easier querying and debugging. This replaces the basic console logging with something much more useful for production monitoring.

    I also set up a GitHub Actions CI pipeline to build and test the project on every push. The workflow builds the solution and runs the test suite.

    Other fixes today include handling USD prices in the FWRD scraper (the site sometimes shows prices in different currencies), improved error handling that preserves more context when parsing fails, and version information displayed in the console output.

  • v0.18.2 - Debug logging for failed product parsing

    I added debug logging to the FWRD product parser. When parsing a product element fails, the raw HTML is now saved to a payload.txt file before re-throwing the exception. This makes it much easier to diagnose why a particular product couldn’t be parsed - I can see exactly what HTML structure caused the issue.
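    The pattern is to capture the failing input before letting the exception propagate. A rough Python sketch of the idea (the real parser is C#; `parse` here is a hypothetical parsing callable):

```python
def parse_product(element_html, parse):
    """Parse a product element; on failure, dump the raw HTML for later diagnosis."""
    try:
        return parse(element_html)
    except Exception:
        # Save the exact payload that broke parsing, then re-throw so the
        # failure is still visible upstream.
        with open("payload.txt", "w", encoding="utf-8") as f:
            f.write(element_html)
        raise
```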

  • v0.18.1 - Conditional image quality check

    I made the image edge-pixel validation conditional based on the Edit type. The check that filters out images with too many distinct pixels along the edges was being too aggressive for some feeds. Now it only applies to the LegitBags edit where product images are expected to have clean, solid-color backgrounds.
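    Conceptually the check counts distinct edge colours and only rejects when the edit expects solid backgrounds. A simplified Python sketch (the edit name check and `max_distinct` threshold are illustrative assumptions, not the project's actual values):

```python
def passes_edge_check(edge_pixels, edit, max_distinct=8):
    """Reject images whose edges have too many distinct colours,
    but only for the LegitBags edit, where clean solid-colour
    backgrounds are expected."""
    if edit != "LegitBags":
        return True  # other edits skip the validation entirely
    return len(set(edge_pixels)) <= max_distinct
```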

  • v0.18 - Michael Kors retailer and command refactoring

    I added support for Michael Kors, which uses a JSON API similar to Farfetch and Ferragamo. The product data structure is well-defined, making it straightforward to deserialize into strongly-typed classes.

    I also refactored the image handling commands. The previous GetMetadataCommand was doing too much - fetching metadata and processing Twitter images. I split it into separate GetMetadata and GetTwitterImages commands, each with a single responsibility. There’s also a new S3UploadImage command dedicated to uploading individual images.

    As part of this work, I added logic to block bad images - some product pages return placeholder or error images that shouldn’t be included in tweets.

  • v0.17 - Prada retailer support

    I added support for Prada, starting with their bags category. Prada’s website uses a paginated HTML endpoint that returns product grid fragments, which I fetch directly and parse with AngleSharp.

    The product detail pages use responsive images with srcset attributes, so I parse those to extract the highest resolution image URLs. Product tags are pulled from the details tab on each product page.
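    Picking the highest-resolution candidate from a srcset attribute means comparing the width descriptors (the `NNNw` suffix on each entry). A small Python sketch of that selection, assuming standard srcset syntax:

```python
def highest_res_from_srcset(srcset):
    """Return the URL with the largest width descriptor from a srcset string."""
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if len(parts) != 2 or not parts[1].endswith("w"):
            continue  # skip malformed entries or density (x) descriptors
        width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = parts[0], width
    return best_url
```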

  • v0.16 - The Iconic retailer support

    I added support for The Iconic, an Australian fashion retailer, starting with their dresses category. This brings the total number of supported retailers to eight.

    I also refactored the FWRD category implementation, extracting more common HTML parsing logic into the HtmlHelpers class. This keeps the category-specific code focused on what’s unique to each retailer while sharing the boilerplate.

  • v0.15 - Product edits for multiple feeds

    I introduced an “Edit” concept to support multiple product feeds. An edit is essentially a curated collection or feed - products can be assigned to different edits, and the export command can now generate separate JSON files for each edit. This allows running multiple themed Twitter accounts or website sections from the same product database.
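    The export side of the edit concept amounts to grouping products by their edit and serialising each group separately. A minimal Python sketch (field names and the `<edit>.json` naming are illustrative assumptions):

```python
import json
from collections import defaultdict

def export_by_edit(products):
    """Group products by edit and build one JSON payload per edit,
    keyed by an output filename."""
    groups = defaultdict(list)
    for product in products:
        groups[product["edit"]].append(product)
    return {f"{edit}.json": json.dumps(items) for edit, items in groups.items()}
```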

    I also fixed the product primary key to be a composite of ID and category, since the same product ID could appear across different retailers. This required a database migration to update the key structure.

    The FWRD scraper got some improvements to better handle the dresses category, and all product page objects now include their edit assignment.

  • v0.14 - Three new retailers - Tory Burch, Ferragamo, Rebecca Minkoff

    I added support for three new retailers today: Tory Burch, Ferragamo, and Rebecca Minkoff. The polymorphic ProductCategory architecture from the recent refactor made this straightforward - each retailer just needs its own category class implementing the scraping and metadata extraction logic.

    Ferragamo uses a JSON API similar to Farfetch, while Tory Burch and Rebecca Minkoff required HTML parsing. I extracted common HTML parsing utilities into an HtmlHelpers class to reduce duplication across the category implementations.

    I also added a TestMetadataCommand for quickly testing that metadata extraction works correctly for a given product URL without going through the full tweet workflow. This is useful when setting up new retailers or debugging extraction issues.

  • v0.13.1 - Outnet shoes URL fix

    I fixed the Outnet shoes URL - the clearance section path had changed, so I updated it to point to the regular shoes/heels category instead. I also improved the logging when there’s nothing new to tweet by including the category name in the message.

  • v0.13 - ProductCategory polymorphic refactor

    I did a major architectural refactoring, replacing the Category enum with a polymorphic ProductCategory class hierarchy. Each retailer now has its own category class (CoachCategory, FarfetchCategory, FwrdCategory, OutnetCategory) that encapsulates retailer-specific behavior like scraping logic, URL construction, and metadata extraction.

    This follows the “Enumeration” pattern - a type-safe alternative to enums that allows adding behavior to each value. The Enumeration base class provides ID-based lookup and display name mapping, while each concrete category class implements methods like GetProductMetadataFromUrl and GetProducts.

    The refactoring significantly cleaned up the command handlers. Instead of switch statements scattered throughout the codebase checking which category we’re dealing with, the category object itself now knows how to perform its operations. I also added a TestAllCommand that runs through all categories to verify they’re working correctly.
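    The Enumeration pattern translates to most languages: a fixed set of registered instances, each carrying its own behaviour. A toy Python sketch of the shape (the real classes are C#; the product list here is a placeholder for retailer-specific scraping):

```python
class ProductCategory:
    """Enumeration pattern: a closed set of instances that carry behaviour,
    unlike a plain enum value."""
    _registry = {}

    def __init__(self, id, display_name):
        self.id = id
        self.display_name = display_name
        ProductCategory._registry[id] = self  # enable ID-based lookup

    @classmethod
    def from_id(cls, id):
        return cls._registry[id]

    def get_products(self, count):
        raise NotImplementedError  # each retailer category overrides this


class CoachCategory(ProductCategory):
    def get_products(self, count):
        # Retailer-specific scraping logic would live here.
        return [f"coach-product-{i}" for i in range(count)]


COACH_BAGS = CoachCategory(1, "Coach Bags")
```

    The payoff is exactly what's described above: callers ask the category object to do the work, instead of switching on an enum value at every call site.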

  • v0.12 - Tweet hashtags and web gallery component

    I integrated the hashtag generator into the tweet workflow. Tweets now include hashtags based on the brand name, making them more discoverable on Twitter. I also fixed a bug in the generator where spaces weren’t being stripped properly.

    The scraper now extracts product tags from detail pages and stores them in a new database field. I refactored the GetImagesCommand into a more general GetMetadataCommand that returns both images and tags as ProductMetadata. This metadata can be used to add relevant hashtags beyond just the brand name.

    I implemented a full reskin of the website view of the feed, and most importantly it now works on mobile. To achieve the design I had to override the styling of strikethrough text, since a coloured strikethrough isn’t available natively. The approach I’ve used to get the orange strikethrough is detailed here.

    When using Material-UI’s makeStyles in React (which uses JSS under the hood) you can’t set content: '' as it produces an invalid style. Instead, wrap the value in an extra pair of quotes: content: "''".

    I built a new Gallery component with a proper image viewer and placeholder loading states. The gallery uses a flipOnIndex utility to handle image navigation, which I test-drove with unit tests. The component uses JSX and has been refactored for better separation of concerns with ProductDetail, ProductImage, and Placeholder components.

  • v0.11.1 - ScrapeUrl debugging command

    I added a ScrapeUrlCommand utility for debugging scraper issues. It takes a URL as input, navigates to it with Chrome, and logs the page source. This is useful when troubleshooting why a particular page isn’t being parsed correctly - I can see exactly what HTML the scraper is receiving.

    I also removed the bundled ChromeDriver package reference since I’m managing the driver installation separately.

  • v0.11 - Hashtag generator and test project

    I added a HashtagGenerator class to convert brand names into Twitter hashtags. The generator strips punctuation, removes spaces, and lowercases the text - so “Off-White” becomes “#offwhite” and “P.A.R.O.S.H.” becomes “#parosh”. This will make tweets more discoverable by adding relevant brand hashtags.
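    The transformation described above - strip punctuation, drop spaces, lowercase - can be sketched in a few lines (Python here for brevity; the actual HashtagGenerator is a C# class):

```python
import re

def to_hashtag(brand):
    """Brand name -> hashtag: keep only letters and digits, lowercase, prefix '#'."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", brand)
    return "#" + cleaned.lower()
```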

    I also set up a proper test project using xUnit and Shouldly. The hashtag generator has good test coverage with theory tests for various edge cases like brands with punctuation, spaces, mixed case, and special characters. Having a solution file now makes it easier to build and test everything together.

  • v0.10.2 - Responsive grid layout for web frontend

    I replaced the fullpage scrolling interface with a responsive grid layout. The fullpage approach was interesting but didn’t scale well when browsing many products - you had to scroll through each one individually.

    The new layout uses a flexbox grid that adapts to screen size: two columns on mobile, three on tablet, and four on desktop. Product cards show inline with their images, making it much easier to scan through multiple products quickly.

  • v0.10.1 - Outnet shoes category

    I added a shoes category for The Outnet, specifically targeting heels from their clearance section. The existing Outnet scraping infrastructure made this straightforward - I just needed to add the new category enum value and URL mapping.

    I also refactored the scraper to handle multiple Outnet categories more cleanly, making the category a parameter to the ToEntity method rather than hardcoding it. This same pattern was applied to the image source extraction and Twitter image sizing logic.

  • v0.10 - Farfetch scraper and JSON-based parsing

    I added support for Farfetch as a new retailer, with categories for both handbags and shoes. This brings the total number of supported sites to four. The Farfetch implementation parses product data from embedded JSON in the page rather than scraping HTML elements directly.

    I also refactored the Outnet scraper to use the same JSON-based approach. Many modern e-commerce sites embed structured product data as JSON for SEO or JavaScript hydration, and parsing this is more reliable than scraping the rendered HTML. I created dedicated classes to deserialize these JSON structures into product entities.
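    The general shape of embedded-JSON parsing is: locate the script tag holding structured data, extract its body, and deserialise. A Python sketch of that idea - the `application/ld+json` tag is a common SEO convention and an assumption here, not necessarily the exact tag these sites use:

```python
import json
import re

def products_from_embedded_json(html):
    """Pull structured product data out of an embedded JSON script tag,
    rather than scraping the rendered HTML elements."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    )
    return json.loads(match.group(1)) if match else None
```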

    Other improvements include a new GetImageSourcesFromPage command for extracting multiple product images from detail pages, regex helper extensions, and deduplication logic to handle products that appear in multiple category listings.

  • v0.9.1 - Outnet scraper pagination and AngleSharp switch

    I cleaned up the Outnet scraper following the performance comparison from earlier. Based on the results, I switched the OutnetProduct page object model to use AngleSharp for HTML parsing instead of Selenium, removing the duplicate implementation.

    I also added pagination support so the scraper can fetch products across multiple pages until it reaches the requested count. This is important since product listings are paginated and a single page might not have enough items.

  • v0.9 - The Outnet scraper with AngleSharp comparison

    I added support for scraping The Outnet, starting with their coats category. This brings the total number of supported retailers to three (Coach, FWRD, and The Outnet).

    As part of implementing the new scraper, I ran a performance comparison between extracting product data using Selenium’s element selection versus parsing the page HTML with AngleSharp. The code times both approaches and logs the results. I implemented both OutnetProduct (Selenium-based) and AngleOutnetProduct (AngleSharp-based) page object models to facilitate the comparison.
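    A comparison like this just wraps each approach in a timer and reports the results. A generic Python sketch of the harness (the callables stand in for the Selenium-based and AngleSharp-based extraction routines):

```python
import time

def time_approaches(approaches, repeats=5):
    """Run each named approach `repeats` times and record its fastest
    run in seconds - the minimum is the least noisy comparison point."""
    results = {}
    for name, fn in approaches.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results
```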

    I also added a headless mode option for Chrome, which will be useful when running the scraper in production environments without a display.

  • v0.8.1 - Fullpage scrolling product display

    I built out the React web frontend with a fullpage scrolling interface using fullpage.js. Each product gets its own full-screen section showing the brand, name, pricing details, and product images in a horizontally scrollable gallery.

    The UI displays the original price, sale price, and savings amount, along with a link back to the retailer’s site. I added Material-UI for styling and created helper utilities for currency formatting and extracting the domain name from product URLs using the psl library.

    The app now loads its S3 bucket configuration from a config.json file, making it easy to switch between different environments without code changes.

  • v0.8 - React web frontend and LocalStack setup

    I started building a React web frontend to display the scraped products. The site will be hosted on S3 and pull product data from the JSON files generated by the export command. It’s a basic Create React App setup for now that I’ll flesh out with proper product display components.

    I also added a combined ScrapeAndTweet command that orchestrates the full workflow - scraping products from a retailer and then tweeting a random one. This simplifies the automation since it’s now a single command instead of running multiple steps.

    The export command now uploads the generated JSON files directly to S3 alongside the images. I extracted the S3 upload logic into its own reusable S3UploadCommand to keep things DRY.

    For local development, I added a docker-compose configuration with LocalStack to simulate S3 locally, making it much easier to test the S3 integration without hitting AWS.

  • v0.7 - Export products to paginated JSON

    I added an ExportProductsCommand that generates paginated JSON files from the product database. This enables building a static website that can display all the tweeted products without needing a backend server.

    The export uses a linked-list pagination approach where each page contains a reference to the next page’s filename, derived from the posting timestamp. The index page contains the most recent products, with subsequent pages containing older entries. There’s an option to export just the first couple of pages for testing, or all pages for a full export.
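    The linked-list scheme can be sketched as follows (Python for illustration; naming non-index pages after their newest posting timestamp is my assumption about how the filenames are derived):

```python
def paginate(products, page_size):
    """Linked-list pagination: the index page holds the newest products,
    and each page records the filename of the next (older) page."""
    ordered = sorted(products, key=lambda p: p["posted"], reverse=True)
    pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
    out = {}
    for i, items in enumerate(pages):
        name = "index.json" if i == 0 else f"{items[0]['posted']}.json"
        next_name = f"{pages[i + 1][0]['posted']}.json" if i + 1 < len(pages) else None
        out[name] = {"products": items, "next": next_name}
    return out
```

    Because older pages are keyed by timestamp, they never change once written - only index.json and the newest page need re-uploading on each export, which suits static hosting well.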

    I also made some tweaks to the backfill command and image processing, including fixing the Twitter image sizing to work correctly across different product categories.

  • v0.6 - MediatR architecture and S3 image storage

    I did a significant architectural overhaul today, introducing the MediatR library to implement the command/handler pattern. This separates the application into distinct commands like ScrapeCommand, GenerateImages, and GenerateContent, each with their own handlers. The main program is now much cleaner with proper dependency injection for services.

    The biggest new feature is S3 image storage. Instead of keeping processed images locally, they’re now uploaded to S3 with a date-based folder structure. This makes the images accessible for other uses and reduces local disk requirements.

    I also migrated the database to use snake_case column naming conventions, added Dapper alongside Entity Framework for raw SQL queries in the content generation command, and split out brand information from product names to resolve an open issue. Finally, I added a backfill command to retroactively process images for existing products that were scraped before the S3 integration.

  • v0.5.1 - Handling products without sale prices

    I fixed a bug where the FWRD scraper would crash when encountering products that weren’t actually on sale. Some products in the sale category listings don’t have a sale price element, so I made SalePrice nullable and added filtering to skip those products during parsing.
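    The fix is the standard treat-the-field-as-optional pattern: tolerate a missing sale price and filter those listings out instead of crashing. A Python sketch under assumed field names:

```python
def parse_products(raw_items):
    """Skip listings without a sale price rather than failing on them."""
    products = []
    for item in raw_items:
        sale_price = item.get("sale_price")  # absent when not actually on sale
        if sale_price is None:
            continue
        products.append({"name": item["name"], "sale_price": sale_price})
    return products
```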

    I also replaced all the Console.WriteLine calls with proper ILogger usage, which will make debugging and monitoring the scraper much easier in production.

  • v0.5 - FWRD dresses and bags categories

    I added two new FWRD categories - dresses and bags - expanding the scraper to support four product feeds total. Each category has its own URL endpoint and image handling behavior.

    The biggest change was supporting multiple images per tweet. Twitter allows up to four images, so for dresses I include three product views and for bags I include two. The ImageProcessor was extracted into its own class with category-specific image sizing - Coach bags use the standard 1200x628 Twitter card size, while FWRD products use larger 2400px dimensions to better showcase the higher-resolution product photos.

    I also refactored the codebase to move database operations into the DatabaseContext class, added a price filter to exclude products over $1000, and cleaned up the main program flow. The different image selection logic per category (using main view vs alternate view) ensures the best product shots are displayed.

  • v0.4.1 - FWRD scraper improvements and configurable category

    I made several improvements to the FWRD scraper to make it more reliable. The main change was switching from Selenium’s element selection to AngleSharp for HTML parsing. FWRD loads products via a lazy-load API endpoint, so now I fetch that endpoint directly and parse the HTML response with AngleSharp rather than trying to interact with the dynamically-loaded page.

    I also added a configurable category setting via environment variable, so I can easily switch between scraping Coach bags or FWRD shoes without code changes. The image processing was improved to handle variable-width product images instead of assuming a fixed 628x628 size - it now preserves the aspect ratio and centers the image appropriately in the Twitter card canvas.
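    The aspect-ratio-preserving placement is just a scale-to-fit followed by centring. The arithmetic, sketched in Python against the 1200x628 Twitter card canvas mentioned elsewhere in this log:

```python
def fit_and_centre(w, h, canvas_w=1200, canvas_h=628):
    """Scale an image to fit the canvas while preserving aspect ratio,
    returning (new_w, new_h, x_offset, y_offset) for centring it."""
    scale = min(canvas_w / w, canvas_h / h)  # never upscale past either edge
    new_w, new_h = round(w * scale), round(h * scale)
    return new_w, new_h, (canvas_w - new_w) // 2, (canvas_h - new_h) // 2
```

    Note that with `min(...)` capped below 1.0 only when the image overflows, a 628x628 image lands centred with 286px gutters on each side - exactly the case the original fixed-size code assumed.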

  • v0.4 - Adding FWRD shoe scraping support

    I expanded the scraper to support a second retailer: FWRD (Forward). This involved creating a category system to distinguish between product sources, with a new Category enum that currently supports Coach bags and FWRD shoes.

    I refactored the product scraping architecture by extracting the Coach-specific logic into a CoachBag page object model and creating a new ForwardProduct class for FWRD. The Product entity was moved out of the database context and enhanced with new Image and Category fields, supported by a database migration.

    The product selection logic now filters by category, so each feed tweets independently from its own pool of available products.

  • v0.3.1 - Image processing for Twitter cards

    I added image processing using ImageSharp to prepare product images for Twitter’s card format. The images get resized to 628x628 and centered in a 1200x628 canvas, with the edge pixels extended to fill the gutters on either side. This ensures the product images display nicely in Twitter’s link preview cards.

    I also fixed a bug where the posted status wasn’t being saved to the database after tweeting, added null checks for when there are no new products to tweet, and extracted some code into dedicated methods for cleaner organization.

  • v0.3 - Twitter integration for posting deals

    I integrated Twitter posting into the scraper using the Tweetinvi library. Now when the scraper finds sale products, it automatically tweets one with the product image, discount percentage, original price, sale price, and a link to buy.

    I also refactored the application to use .NET’s generic host pattern with dependency injection and configuration from user secrets. This cleaned up the main entry point considerably and moved the Chrome driver setup and product fetching into dedicated methods. The configuration now pulls Twitter API credentials and connection strings from environment variables or user secrets rather than command line arguments.

  • v0.2 - Adding database persistence

    I added database persistence to the scraper using Entity Framework Core with PostgreSQL. The Products table tracks scraped items with fields for pricing, timestamps, and posting status.

    The scraper now saves products to the database, updating existing entries or creating new ones based on product ID. I also added logic to randomly select an unposted product and download its image for posting. This sets up the foundation for automating social media posts about sale items.

  • v0.1 - Coach Bags Selenium Scraper

    Started a new project to scrape Coach Australia’s bags sale page using Selenium. The initial setup includes a C# console app that fires up Chrome with various anti-detection measures - custom user agent, disabled automation flags, and a patched navigator.webdriver property. I also containerized the whole thing with a Dockerfile so it can run headless without needing a local Chrome installation.