Introduction

Installing autoextract-poet

autoextract-poet is a regular PyPI package that can be installed with pip: pip install autoextract-poet. It is also a dependency of scrapy-autoextract, so it is installed automatically if you use scrapy-autoextract.

Basic usage

You can use the items defined by autoextract-poet just like regular Python objects, to standardize item definitions. They are implemented as attr.s classes and can be used as Scrapy items directly, or converted to dictionaries (e.g. for serialization) via itemadapter. The full list of items is available in autoextract_poet.items.

scrapy-autoextract provides an automatic way to extract the items defined here from any website, using Scrapy and the Autoextract API. See the scrapy-autoextract documentation for more.

Compatibility with new fields added to the API

New fields may be added to the Autoextract API over time. When you create autoextract-poet items from Autoextract responses, the library ignores unknown fields by default, until you upgrade to a version of the library that contains the new fields. However, you might want to keep the unknown (new) fields even if you don’t upgrade autoextract-poet.
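To make the default behavior concrete, here is a minimal sketch of what "ignoring unknown fields" means. It uses only the standard library and is not autoextract-poet's actual implementation: ExampleItem, item_from_response and the brandNewField key are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ExampleItem:
    """Hypothetical stand-in for an item class; not part of autoextract-poet."""

    url: Optional[str] = None
    headline: Optional[str] = None


def item_from_response(data: Dict[str, Any]) -> ExampleItem:
    """Build an item from a response dict, dropping keys the class does not define."""
    known = {f.name for f in fields(ExampleItem)}
    return ExampleItem(**{k: v for k, v in data.items() if k in known})


response = {
    "url": "https://example.com/post",
    "headline": "Hello",
    "brandNewField": "added to the API later",  # hypothetical new API field
}
item = item_from_response(response)
```

In this sketch the value of brandNewField is silently lost once the item is built, which is the situation the rest of this section addresses.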

If you’re using Scrapy (or itemadapter), you can expose these unknown attributes in the output by registering AutoExtractAdapter in itemadapter’s ADAPTER_CLASSES:

from autoextract_poet import AutoExtractAdapter
from itemadapter import ItemAdapter

# Put AutoExtractAdapter at the front of the adapter list
ItemAdapter.ADAPTER_CLASSES.appendleft(AutoExtractAdapter)

For example, you can put this code in the settings.py file of your Scrapy project.