ImportApi Module
Purpose
This module is used for importing content into the platform. It could also be called MigrateApi but that gets confusing with the Laravel definition of a migration.
That said, this module is mainly used for migrating content from other sources into district core.
Overview of how it works
A combination of plugins, held together by an ImportPlan model, defines how to:
- Get the source data (from a remote source)
- Parse the data (per field)
- Save the data locally (to a model)
We then have import Jobs which apply all of the above to each row in the source data.
When an item is imported, an ImportItem model is created. It records both the source ID and the destination model. This lets us see whether an item has been imported and when that happened, and gives us the ability to roll back any imported item.
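The flow above can be sketched in plain PHP. This is a simplified illustration only: fetchSourceRows(), parseRow() and importAll() are hypothetical stand-ins, and plain arrays replace the ImportPlan, ImportItem and destination models.

```php
<?php
// Hypothetical stand-ins: none of these functions are the module's real API.

/** Pretend source: each row maps a field name to an array of values. */
function fetchSourceRows(): array
{
    return [
        ['id' => ['101'], 'title' => ['  First item ']],
        ['id' => ['102'], 'title' => [' Second item']],
    ];
}

/** Pretend per-field parsing step: trim every value of every field. */
function parseRow(array $row): array
{
    return array_map(fn (array $values) => array_map('trim', $values), $row);
}

/**
 * Import every row, recording source ID => destination key so that an
 * already-imported row can be detected (and could later be rolled back).
 */
function importAll(array $rows, array &$destination, array &$importItems): void
{
    foreach ($rows as $row) {
        $sourceId = $row['id'][0];
        if (isset($importItems[$sourceId])) {
            continue; // this source item was imported on an earlier run
        }
        $parsed = parseRow($row);
        $destination[] = ['title' => $parsed['title'][0]];
        $importItems[$sourceId] = array_key_last($destination);
    }
}

$destination = [];
$importItems = [];
importAll(fetchSourceRows(), $destination, $importItems);
importAll(fetchSourceRows(), $destination, $importItems); // second run adds nothing
```

Running the import twice leaves two destination records, because the source ID lookup plays the role the ImportItem record plays in the module.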
Plugins in detail
All plugin management is done via ImportApiPluginService. The ImportPlan model has a helper method that returns this service, $importPlan->getPluginService(), which is generally the best way to instantiate it.
Each plugin may expose settings. This is done via the $defaultSettings and $schema properties, and the UI will automatically build forms from these where required.
An overview of each plugin type is described below, but it is advised you study an existing plugin to get the best picture of how it is structured.
ImportSources
@see ImportSourceInterface and BaseImportSource
An import source gets the source data, e.g. from a remote API or a CSV. It should extend BaseImportSource and must provide a getItems method that returns a collection of source items.
IMPORTANT: Each field in a source item should be an array of values, as we assume any field can contain multiple values. It is always a good idea to run each row through normaliseRow(), as it will ensure each field is an array.
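To make the expected row shape concrete, here is a minimal sketch of the kind of normalisation described. normaliseRowSketch() is a hypothetical re-implementation for illustration; the real normaliseRow() may handle more cases or behave differently.

```php
<?php
/**
 * Hypothetical illustration only, not the module's normaliseRow().
 * Wraps scalar field values in an array and leaves array values as lists.
 */
function normaliseRowSketch(array $row): array
{
    $normalised = [];
    foreach ($row as $field => $value) {
        $normalised[$field] = is_array($value) ? array_values($value) : [$value];
    }
    return $normalised;
}

$row = normaliseRowSketch([
    'id'    => 7,
    'title' => 'Hello',
    'tags'  => ['a', 'b'],
]);
```

After this step every field can be treated the same way downstream, whether the source supplied one value or many.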
An import source should define both a source ID field and a labelField. The source ID is used to track source items and the label field is used in the UI to identify items.
Source plugins will display on the edit or create pages in the UI for ImportPlan CRUD.
ImportFieldParsers
@see ImportFieldParserInterface and BaseImportFieldParser
An import field parser acts on a single field of a single source row. It is responsible for transforming the source value into the destination value. You can chain multiple field parsers together and they will pass the source data down the chain, applying settings as they go. This is all done via the UI on the mappings page.
As mentioned above, each value is assumed to be an array, so there are 2 key methods that a field parser should implement:
- handle(): This expects a single source field as an array of values to act on.
- handleValue(): By default, this is called on each array item in handle() and is generally where the parser will do its thing.
Example of handling values:
- Source value (title): $sourceValue = ['My item to be imported']
- Passed to handle($sourceValue), where it calls handleValue('My item to be imported')
- Passes the output to the next field parser
Example of a full flow for a single field: Source value -> string replace -> trim -> purify -> destination ready for save
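A chain like this can be sketched as follows. The class names (SketchFieldParser, SketchStringReplace, SketchTrim) and runChain() are hypothetical and only mirror the handle()/handleValue() split described above; the real BaseImportFieldParser has its own signatures and settings handling.

```php
<?php
// Hypothetical classes for illustration; not the module's real parsers.

abstract class SketchFieldParser
{
    /** Receives every value of one field and maps handleValue() over them. */
    public function handle(array $values): array
    {
        return array_map(fn ($value) => $this->handleValue($value), $values);
    }

    abstract public function handleValue(mixed $value): mixed;
}

final class SketchStringReplace extends SketchFieldParser
{
    public function __construct(private string $search, private string $replace) {}

    public function handleValue(mixed $value): mixed
    {
        return str_replace($this->search, $this->replace, (string) $value);
    }
}

final class SketchTrim extends SketchFieldParser
{
    public function handleValue(mixed $value): mixed
    {
        return trim((string) $value);
    }
}

/** Pass the field's values through each parser in order. */
function runChain(array $parsers, array $values): array
{
    foreach ($parsers as $parser) {
        $values = $parser->handle($values);
    }
    return $values;
}

$result = runChain(
    [new SketchStringReplace('DRAFT:', ''), new SketchTrim()],
    ['DRAFT: My item to be imported ']
);
```

Each parser only needs to know how to treat one value; the base handle() takes care of the array, which is why most parsers would override handleValue() alone.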
Saving relationships with ImportFieldParsers
The handle and handleValue methods both accept an optional second argument $destinationModel. This is only ever provided when an import is happening. If this exists, you can act on it in your parser and create related content. A good example of this is the Media plugin that will save an image as a media model.
Example
public function handleValue(mixed $value, ?Model $destinationModel = null)
{
    // $destinationModel is only provided while an import is running.
    if ($destinationModel) {
        $destinationModel->addMediaFromUrl($value);
    }
    return $value;
}
ImportDestinations
@see ImportDestinationInterface and BaseImportDestination
The import destination defines how to save all the parsed source values. It defines the destination fields via getDefinition() which can then have source fields mapped to them and field parsers assigned.
Defining the definition
Example structure of getDefinition
public function getDefinition(): array
{
    return [
        'title' => ['label' => 'Title'],
        'summary' => ['label' => 'Summary'],
        'body' => ['label' => 'Body'],
        'slug' => ['label' => 'Url path'],
        'primary_image' => ['label' => 'Image', 'type' => 'media'],
        'contact_details' => ['label' => 'Contact details', 'multiple' => true],
    ];
}
Note in the above:
- primary_image has set a type. If a type is not set (or is set to text), the value will be saved as a model attribute on the destination model. For anything else, we assume that the field parser will handle saving.
- contact_details has multiple set to true. This indicates that the final value will be an array. If not set (or set to false), only the first value of the array will be used.
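The type and multiple rules can be illustrated with a small sketch. buildAttributesSketch() is a hypothetical helper written for this README, not the module's save logic, and it assumes parsed values arrive as one array of values per field.

```php
<?php
/**
 * Hypothetical helper: builds the attribute array for the destination model
 * from a definition and the parsed values (one array of values per field).
 */
function buildAttributesSketch(array $definition, array $parsed): array
{
    $attributes = [];
    foreach ($definition as $field => $options) {
        $type = $options['type'] ?? 'text';
        if ($type !== 'text' || !array_key_exists($field, $parsed)) {
            continue; // non-text fields are left to their field parser
        }
        $values = $parsed[$field];
        // multiple => keep the whole array, otherwise only the first value
        $attributes[$field] = ($options['multiple'] ?? false)
            ? $values
            : ($values[0] ?? null);
    }
    return $attributes;
}

$definition = [
    'title' => ['label' => 'Title'],
    'primary_image' => ['label' => 'Image', 'type' => 'media'],
    'contact_details' => ['label' => 'Contact details', 'multiple' => true],
];
$parsed = [
    'title' => ['My item', 'ignored second value'],
    'primary_image' => ['https://example.com/a.png'],
    'contact_details' => ['a@example.com', 'b@example.com'],
];
$attributes = buildAttributesSketch($definition, $parsed);
```

Here title collapses to its first value, contact_details stays an array, and primary_image is skipped because a media field is expected to be saved by its parser.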
Defining the destination model
Most of the time, your destination plugin can just set $destinationModel to the class name of the destination model and saving will be handled automatically. For finer-grained control, you can override the updateOrCreate() method.
Example
protected string $destinationModel = Content::class;
ImportMappingPresets
This is NOT a PHP class, just a collection of JSON files in the Plugins/ImportMappingPresets directory.
It is different to all the other plugins and is pretty basic: each .json file represents the content of $importPlan->field_mappings. It is useful for quickly populating mappings based on similar migrations.
E.g. if we have a Bang the Table migration using a crawler as a source, we can configure all the mappings and save them to a JSON file. The next time we do a Bang the Table migration, we can load in the settings used previously.
Just a bit of a time saver.
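As a rough sketch of the save and load round trip, using only PHP's built-in JSON functions. The mapping structure and the file name are invented for illustration; the real shape of $importPlan->field_mappings is not described in this README.

```php
<?php
// Invented mapping structure for illustration only.
$fieldMappings = [
    'title' => ['source' => 'headline', 'parsers' => ['trim']],
    'body'  => ['source' => 'content', 'parsers' => ['html_purifier']],
];

// Save the configured mappings as a preset file (hypothetical file name).
$presetPath = sys_get_temp_dir() . '/example-preset.json';
file_put_contents(
    $presetPath,
    json_encode($fieldMappings, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR)
);

// Later: load the preset back to pre-populate a new plan's mappings.
$loaded = json_decode(file_get_contents($presetPath), true, 512, JSON_THROW_ON_ERROR);
```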
Queues and Jobs
Imports and rollbacks are always executed via jobs. You can import a single row at a time, but this just does a dispatchNow of the job. When importing all items you dispatch ImportAllItemsJobs, or for a rollback use the RollbackAllItems job. This will change the status of the ImportPlan to the appropriate ImportPlanStatus enum and trigger a complete notification.
Both single row imports and imports of all items can be triggered via the Items tab on an ImportItem.
If an import is running and you want it to stop, you can call $importItem->update(['import_status' => ImportPlanStatus::STOP]), which will skip any other import jobs that have not yet started.
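The stop behaviour can be pictured with a small sketch in which each job re-reads the status before doing any work. SketchStatus and runJobsSketch() are hypothetical; only the STOP case name comes from the text above, and the real jobs run on a Laravel queue rather than in a loop.

```php
<?php
// Hypothetical enum and job loop; not the module's ImportPlanStatus or jobs.
enum SketchStatus: string
{
    case IMPORTING = 'importing';
    case STOP = 'stop';
}

/**
 * Runs row jobs one by one, re-reading the status before each one so that
 * a STOP set mid-run skips everything that has not yet started.
 */
function runJobsSketch(array $rowIds, callable $readStatus, callable $importRow): array
{
    $skipped = [];
    foreach ($rowIds as $rowId) {
        if ($readStatus() === SketchStatus::STOP) {
            $skipped[] = $rowId;
            continue;
        }
        $importRow($rowId);
    }
    return $skipped;
}

$status = SketchStatus::IMPORTING;
$imported = [];
$skipped = runJobsSketch(
    [1, 2, 3, 4],
    function () use (&$status) {
        return $status;
    },
    function (int $rowId) use (&$imported, &$status) {
        $imported[] = $rowId;
        if ($rowId === 2) {
            $status = SketchStatus::STOP; // simulate someone stopping the import
        }
    }
);
```

Rows already imported stay imported; stopping only prevents the remaining jobs from doing any work.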
Security
When creating import plans you should always be mindful of security, particularly XSS and JS based attack vectors.
The ImportApi lets you take any source and put it in any destination, so if your source contains malicious content, nothing automated will stop you from importing it. The responsibility is on you to ensure you use the appropriate parsers to sanitize the content prior to saving.
Ways to sanitize content
- If your destination wants plain text (e.g. title field) then use the Strip Tags field parser.
- If your destination wants html (e.g. body field) then use the HTML Purifier field parser.
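For the plain text case, the effect can be illustrated with PHP's built-in strip_tags(). Whether the plain text parser actually wraps this function is an assumption, and toPlainTextSketch() is a hypothetical name. The HTML case is not shown because it needs a dedicated purifier library.

```php
<?php
// Illustrates the plain-text case only, using PHP's built-in strip_tags().
// Assumption: the module's parser may be implemented differently.
function toPlainTextSketch(string $value): string
{
    return trim(strip_tags($value));
}

$title = toPlainTextSketch('<b>Annual report</b><img src=x onerror=alert(1)>');
```

The markup, including the img tag carrying the event handler, is removed and only the text content is kept.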
Always audit all imported content
All content that is imported should be reviewed for import errors and potential security issues.
Security @TODO
- Add validators and automated purifiers based on destination field type
- Add a "post import DB" scanner that checks all content for anything malicious