Introduction
Installing autoextract-poet
autoextract-poet is a regular PyPI package that can be installed using pip: pip install autoextract-poet.
It is also a dependency of scrapy-autoextract, so it is installed automatically if you use scrapy-autoextract.
Basic usage
You can use the items defined by autoextract-poet as regular Python objects to standardize your item definitions.
They are implemented as attr.s classes, and can be used directly as Scrapy items or converted to dictionaries (e.g. for serialization) via itemadapter.
The full list of items is available in autoextract_poet.items.
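As a rough illustration of using items as plain objects and converting them to dictionaries, the sketch below uses standard-library dataclasses as hypothetical stand-ins for autoextract-poet's attr.s items, so it runs without attrs or itemadapter installed. The Product and Breadcrumb classes and their fields are assumptions made for this example, not the library's definitions; dataclasses.asdict plays the role that ItemAdapter(item).asdict() plays in a real project (itemadapter handles both dataclass and attrs objects).

```python
# Hypothetical stand-ins for autoextract-poet items. The real items are
# attr.s classes; stdlib dataclasses are used only so this sketch runs
# without extra dependencies.
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Breadcrumb:  # hypothetical fields, not the library's definition
    name: Optional[str] = None
    link: Optional[str] = None


@dataclass
class Product:  # hypothetical fields, not the library's definition
    url: str
    name: Optional[str] = None
    breadcrumbs: List[Breadcrumb] = field(default_factory=list)


product = Product(
    url="https://example.com/p/1",
    name="Example product",
    breadcrumbs=[Breadcrumb(name="Home", link="https://example.com")],
)

# Attribute access works as on any regular Python object.
print(product.name)

# Recursive conversion to plain dicts, e.g. for JSON serialization.
# In a project with itemadapter installed, ItemAdapter(product).asdict()
# would be the usual way to do this.
data = asdict(product)
print(json.dumps(data, sort_keys=True))
```

Because the converted value is made of plain dicts and lists, it can be handed to any serializer without knowing the item classes.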
scrapy-autoextract provides an automatic way to extract the items defined here from any website, using Scrapy and the Autoextract API. See the scrapy-autoextract documentation for more details.
Compatibility with new fields added to the API
New fields may be added to the Autoextract API over time.
When you create autoextract-poet items from Autoextract responses, the library ignores unknown fields by default,
until you upgrade to a version of the library that defines the new fields.
However, you might want to keep these unknown (new) fields without upgrading the autoextract-poet library.
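The default behaviour described above, and the option of keeping unknown fields, can be pictured with the following minimal sketch. The Article class, from_dict, _unknown_fields, export and the "new_field" key are hypothetical names chosen for illustration; this is not necessarily how autoextract-poet implements it internally.

```python
# Minimal sketch: unknown fields are not exposed as attributes, but are
# kept aside so an exporter can include them on request.
# All names here are hypothetical, not autoextract-poet's API.
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class Article:  # hypothetical item with two known fields
    headline: Optional[str] = None
    url: Optional[str] = None
    # Storage for fields this version of the class does not define.
    _unknown_fields: Dict[str, Any] = field(default_factory=dict, repr=False)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Article":
        known = {f.name for f in fields(cls) if not f.name.startswith("_")}
        item = cls(**{k: v for k, v in data.items() if k in known})
        item._unknown_fields = {k: v for k, v in data.items() if k not in known}
        return item


def export(item: Article, keep_unknown: bool = False) -> Dict[str, Any]:
    """Build the output dict; unknown fields appear only if requested."""
    out = {f.name: getattr(item, f.name)
           for f in fields(item) if not f.name.startswith("_")}
    if keep_unknown:
        out.update(item._unknown_fields)
    return out


# A response containing a field the class does not define yet.
response = {"headline": "Hello", "url": "https://example.com/a", "new_field": 42}
article = Article.from_dict(response)

print(export(article))                     # the new field is dropped
print(export(article, keep_unknown=True))  # the new field is kept
```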
If you’re using Scrapy (or itemadapter), you can expose these unknown attributes in the output
by registering AutoExtractAdapter in itemadapter’s ADAPTER_CLASSES:
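from autoextract_poet import AutoExtractAdapter
from itemadapter import ItemAdapter

# Put AutoExtractAdapter in front of the default adapter classes so it is
# used for autoextract-poet items and unknown fields are included in the output.
ItemAdapter.ADAPTER_CLASSES.appendleft(AutoExtractAdapter)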
For example, you can put this code in the settings.py file of your Scrapy project.
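Why appendleft() rather than append()? itemadapter uses the first class in ADAPTER_CLASSES whose is_item() accepts the object, so a more specific adapter has to come before the generic ones. The sketch below reproduces that first-match dispatch with hypothetical stand-in adapters (DictAdapter, GenericObjectAdapter, UnknownFieldsAdapter and the to_dict helper are all invented for illustration); none of them are the real itemadapter or autoextract-poet classes.

```python
# Minimal sketch of "first matching adapter wins" dispatch.
# All classes here are hypothetical stand-ins, not itemadapter's adapters.
from collections import deque
from typing import Any, Dict


class DictAdapter:
    @classmethod
    def is_item(cls, obj: Any) -> bool:
        return isinstance(obj, dict)

    @staticmethod
    def asdict(obj: Any) -> Dict[str, Any]:
        return dict(obj)


class GenericObjectAdapter:
    """Exports public attributes only, like a generic object adapter."""

    @classmethod
    def is_item(cls, obj: Any) -> bool:
        return hasattr(obj, "__dict__")

    @staticmethod
    def asdict(obj: Any) -> Dict[str, Any]:
        return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


class UnknownFieldsAdapter(GenericObjectAdapter):
    """Also exports fields stored in a hypothetical _unknown_fields attribute."""

    @classmethod
    def is_item(cls, obj: Any) -> bool:
        return hasattr(obj, "_unknown_fields")

    @staticmethod
    def asdict(obj: Any) -> Dict[str, Any]:
        out = GenericObjectAdapter.asdict(obj)
        out.update(obj._unknown_fields)
        return out


ADAPTER_CLASSES = deque([DictAdapter, GenericObjectAdapter])


def to_dict(obj: Any) -> Dict[str, Any]:
    # The first adapter whose is_item() accepts the object is used.
    for adapter in ADAPTER_CLASSES:
        if adapter.is_item(obj):
            return adapter.asdict(obj)
    raise TypeError(f"No adapter for {type(obj).__name__}")


class Item:  # hypothetical item carrying one unknown field
    def __init__(self) -> None:
        self.name = "Example"
        self._unknown_fields = {"new_field": 42}


item = Item()
print(to_dict(item))  # the generic adapter matches first: new_field is missing

# appendleft() gives the specialised adapter priority; append() would leave
# it behind GenericObjectAdapter, which already matches such objects.
ADAPTER_CLASSES.appendleft(UnknownFieldsAdapter)
print(to_dict(item))  # now includes new_field
```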