Learn how to extract and normalize product data from supplier websites using an AI-powered Chrome extension. Includes architecture, workflow, and real use cases.
If you run an e-commerce platform, marketplace, or any product-based system, you’ve likely faced this problem: supplier data exists, but not in a usable format. Instead of clean CSVs or APIs, you get product pages, inconsistent layouts, hidden specifications, and pricing that only appears after interaction. At that point, your team has two options: build custom scrapers for every supplier or fall back to manual copy-paste. Neither scales well.
In this article, we show how we solved this using an AI-powered Chrome extension that extracts and normalizes product data directly from supplier websites.
When a Chrome extension is the right tool
A browser extension is not always the best answer. But it is the right answer more often than teams expect, especially when a human already needs to inspect the page.
| Approach | Best for | Main downside |
| --- | --- | --- |
| Supplier API or feed | Vendors with stable structured exports | Useless when suppliers do not provide one |
| Headless scraper | Repeatable unattended extraction at scale | Fragile on inconsistent, login-heavy, or highly interactive portals |
| AI Chrome extension | Human-in-the-loop extraction from live supplier pages | Not ideal for fully autonomous 24/7 crawling |
| Manual copy-paste | Very low volume, one-off tasks | Slow, error-prone, impossible to scale |
Extension setup
Everything starts with the
manifest.json file, the ID card of the extension. It tells the browser what the extension can access, including active tabs, storage, and which websites it can run on. During development, we open
chrome://extensions, enable Developer Mode, and load our project folder to test changes instantly. Once everything works, we package it into a ZIP file and upload it to the Google Developer Dashboard. After a quick review and a one-time $5 fee, the extension is ready to go live.
How the extraction workflow works
Here is the workflow we designed.
- Open the supplier page: the operator goes to the product page or category listing they already use.
- Launch the extension side panel: the panel becomes the working surface for extraction, review, batching, and export.
- Select one product or one field: the user can click a product card, title, price, SKU, image, or specification field.
- Capture focused page context: instead of dumping the whole website into the model, the extension collects the selected element, nearby labels, useful DOM context, relevant image URLs, and the target schema.
- Run schema-based extraction: the model returns structured fields instead of loose text.
- Validate the result: required fields, formats, and confidence thresholds are checked before data is accepted.
- Fix edge cases with manual pick mode: if a field is wrong or missing, the operator clicks the exact value on the page and updates the record immediately.
- Export or sync: final records are pushed to the backend or exported as CSV / JSON.
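The "focused page context" step is where cost and noise are controlled. A minimal sketch of a context builder, assuming a simplified element snapshot and an arbitrary character cap (both shapes are illustrative, not the production code):

```javascript
// Build a focused extraction payload instead of sending the whole page.
// `el` is a plain snapshot of the selected DOM element; in the extension
// this would be produced by a content script.
function buildExtractionPayload(el, schema, maxChars = 2000) {
  const context = [
    el.text,                    // the selected value itself
    ...(el.nearbyLabels || []), // labels around it ("Price", "SKU", ...)
    el.parentText || "",        // a little surrounding DOM text
  ]
    .filter(Boolean)
    .join("\n")
    .slice(0, maxChars);        // hard cap on what reaches the model

  return {
    context,
    images: (el.imageUrls || []).slice(0, 5), // only the most relevant images
    schema,                                   // the target structure to fill
  };
}
```

The point of the cap and the label list is that the model sees a few hundred characters of relevant context rather than the full DOM.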
The AI should not be responsible for inventing structure. It should be responsible for mapping messy inputs into a structure you already control.
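In practice that means the schema and the validation gate live in your code, not in the model's output. A hedged sketch (field names, patterns, and the confidence threshold are illustrative):

```javascript
// Target schema: the structure we control. The model only fills it in.
const PRODUCT_SCHEMA = {
  title: { required: true },
  sku: { required: true, pattern: /^[A-Za-z0-9-]+$/ },
  price: { required: true, type: "number" },
  imageUrl: { required: false },
};

// Accept a record only if required fields are present, formats match,
// and the model's reported confidence clears the threshold.
function validateRecord(record, schema = PRODUCT_SCHEMA, minConfidence = 0.8) {
  const issues = [];
  for (const [field, rules] of Object.entries(schema)) {
    const value = record[field];
    if (rules.required && (value === undefined || value === null || value === "")) {
      issues.push(`${field}: missing`);
      continue;
    }
    if (value === undefined || value === null) continue;
    if (rules.type === "number" && typeof value !== "number") {
      issues.push(`${field}: expected a number`);
    }
    if (rules.pattern && typeof value === "string" && !rules.pattern.test(value)) {
      issues.push(`${field}: bad format`);
    }
  }
  if ((record._confidence ?? 1) < minConfidence) {
    issues.push("low confidence, needs review");
  }
  return { ok: issues.length === 0, issues };
}
```

Records that fail validation are not discarded; they are routed to the operator for the pick-mode fix described below.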
The features that made the workflow usable in practice
Side panel instead of popup friction
A popup is fine for tiny actions. It is bad for data work.
When people are reviewing product information, they need to see the live supplier page and the structured output at the same time. A side panel makes the workflow feel less like “scraping” and more like a compact internal CMS.
That matters because extraction is not just collection. It is verification.
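Chrome's Side Panel API (Manifest V3) makes this layout easy to declare. A minimal manifest fragment for it, with placeholder paths:

```json
{
  "manifest_version": 3,
  "permissions": ["sidePanel", "storage", "activeTab"],
  "side_panel": {
    "default_path": "panel.html"
  }
}
```

Opening the panel when the toolbar icon is clicked is a one-liner in the service worker: `chrome.sidePanel.setPanelBehavior({ openPanelOnActionClick: true })`.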

Multi-select for repeated product layouts
This is where the workflow becomes efficient.
If a category page contains repeated product cards, the extension can detect neighboring containers with similar structure and batch them together. That turns one manual selection into a full-page extraction job.
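One way to detect those neighboring containers is a structural signature: tag name plus sorted class set. A sketch, using plain objects as stand-ins for DOM nodes:

```javascript
// A structural "signature" for an element: tag plus sorted class names.
// Siblings sharing the selected card's signature are treated as the
// repeated product containers.
function signature(el) {
  return `${el.tag}.${[...el.classes].sort().join(".")}`;
}

function findRepeatedContainers(selected, siblings) {
  const target = signature(selected);
  return siblings.filter(
    (el) =>
      signature(el) === target &&
      el.text.trim() !== "" // skip empty wrappers and spacer blocks
  );
}
```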
The important implementation detail is not just “find similar items.” It is batching them safely:
- detect repeated containers
- ignore empty wrappers and obvious non-product blocks
- process in configurable batches
- retry with backoff when the model or provider rate-limits
- show progress in the panel so the operator can monitor the run
Without batching and progress visibility, the workflow feels fragile. With them, it feels operational.
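The batching loop above can be sketched as follows; batch size, delay values, and the `extract` function are placeholders, not the production configuration:

```javascript
// Process items in fixed-size batches, retrying a failed batch with
// exponential backoff and reporting progress after each one.
async function processInBatches(items, extract, {
  batchSize = 5,
  maxRetries = 3,
  baseDelayMs = 500,
  onProgress = () => {},
} = {}) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        results.push(...(await extract(batch)));
        break;
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // back off: 500 ms, 1 s, 2 s, ... before retrying the same batch
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
    onProgress(Math.min(i + batchSize, items.length), items.length);
  }
  return results;
}
```

The `onProgress` callback is what drives the progress bar in the panel, so the operator always knows whether a long run is alive.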

Pick mode for ugly edge cases
No serious extraction system gets every field right on the first pass.
Some websites are too inconsistent. Some fields only appear after interaction. Some prices include labels, ranges, or multiple currencies. Some attributes live in weird places the model cannot infer cleanly.
That is why pick mode matters.
The operator chooses a target field in the panel, clicks the exact value on the page, and the extension maps that value into the structured record. This is the difference between a toy extractor and a real workflow.
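Even in pick mode, the clicked text usually needs cleanup before it becomes a value, because operators click labels, ranges, and currency symbols along with the number. A sketch of a hypothetical price normalizer:

```javascript
// Turn a clicked price string into a structured value. Handles labels,
// thousands separators, and simple ranges ("$1,299 – $1,499" -> low end).
function normalizePrice(rawText) {
  const matches = rawText.replace(/,/g, "").match(/\d+(?:\.\d+)?/g);
  if (!matches) return null; // nothing numeric was clicked
  const numbers = matches.map(Number);
  const currency = (rawText.match(/[$€£]|USD|EUR|GBP/) || [null])[0];
  return {
    value: Math.min(...numbers), // for a range, keep the low end
    currency,
    isRange: numbers.length > 1,
  };
}
```

A real implementation would also handle locale-specific decimal separators and "was/now" pricing; the point is that normalization happens in code, after the human pick.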

Data saving & security
Everything we collect is saved in chrome.storage.local. If the page is refreshed or the browser crashes, the work is still there. When we are ready, we can export everything as a clean JSON object or a CSV file.
Security is a priority, especially when dealing with proprietary supplier data. Access to the extension is gated by a login screen that authenticates against a private backend, and API keys are never hardcoded into the source. The extension delivers value out of the box with simple CSV or JSON export, and you can take it a step further by integrating it with your own API, routing data directly into your internal systems without additional manual steps.
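Persistence boils down to a few calls against chrome.storage.local. Making the storage object injectable keeps the logic testable outside the browser; the key name and record shape below are assumptions:

```javascript
// Save and restore extraction work-in-progress. `storage` defaults to
// chrome.storage.local in the extension; any object with promise-based
// get/set works, which makes the logic testable outside Chrome.
const KEY = "extractionDraft";

async function saveDraft(records, storage = chrome.storage.local) {
  await storage.set({ [KEY]: { records, savedAt: Date.now() } });
}

async function loadDraft(storage = chrome.storage.local) {
  const data = await storage.get(KEY);
  return data[KEY]?.records ?? []; // empty draft if nothing was saved
}
```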
Example workflow: onboarding a new supplier catalog
A hypothetical but realistic example:
A marketplace team needs to onboard 250 SKUs from a new supplier.
The supplier does not provide a feed. Their site is behind login. Category pages are rendered in React. Product specifications sit inside expandable sections, and some price information changes after the color is selected.
A typical workflow looks like this:
- An operator opens the category page.
- They select one product card.
- Multi-select detects the repeating product containers.
- The extension batches the visible products and extracts title, SKU, price, image, and product URL.
- The operator then opens each product page and extracts the long description plus specifications.
- If the discount price is buried in a custom widget, pick mode captures it directly.
- Validation flags three products with missing SKU values and one with low-confidence material data.
- The operator fixes only those fields.
- The final output is pushed into the platform as structured product records.
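The export step itself is deliberately simple. A minimal CSV serializer for the final records (the column list is whatever your schema defines):

```javascript
// Serialize validated records to CSV, quoting fields that contain
// commas, quotes, or newlines so spreadsheet imports stay clean.
function toCsv(records, columns) {
  const escape = (v) => {
    const s = String(v ?? "");
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.join(",");
  const rows = records.map((r) => columns.map((c) => escape(r[c])).join(","));
  return [header, ...rows].join("\n");
}
```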
The value here is not “AI scraping.”
The value is turning supplier onboarding from a brittle copy-paste process into a controlled workflow with review, normalization, and predictable output.
Project context: ProRapp Estate
At Proga Tech, we built this approach for ProRapp Estate to make supplier data onboarding easier.
The goal was not to create a flashy scraping demo. The goal was to reduce manual work when product information existed on supplier websites but did not exist in a clean import format.
Instead of building one custom ingestion flow per supplier, we designed a workflow where operators could:
- collect data from the live supplier page
- normalize it into a defined structure
- fix outliers immediately
- keep the final product data consistent inside the platform
That is the real business case for this kind of extension.


