Content migration - How the ETL of the migrate module works

03/12/2020

In the previous posts we covered how to analyse and plan a content migration and how to code different migrations. In this post we will see how to extend these migrations by creating and extending Migrate API plugins.

ETL processes

As discussed in part 2 of this content migration series, the Migrate API is based on extract, transform and load (ETL) processes. In the source (extract) phase a set of rows is retrieved; each row is passed to the process (transform) phase, where it is transformed as needed, and then on to the destination (load) phase, where it is stored.
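To make the row-by-row flow concrete, here is a toy, framework-free PHP model of the three phases. It is not the Migrate API: the function names, the sample rows and the field mapping are all hypothetical and only mirror the roles that source, process and destination plugins play.

```php
<?php
// Toy model of extract -> transform -> load. NOT the Drupal Migrate API;
// all names and data here are hypothetical and for illustration only.

/** Extract: yield raw rows one at a time, like a source plugin. */
function extract_rows(array $table): Generator {
  foreach ($table as $row) {
    yield $row;
  }
}

/** Transform: build destination values from one row, like a process pipeline. */
function transform_row(array $row, array $process): array {
  $out = [];
  foreach ($process as $destination_field => $callback) {
    $out[$destination_field] = $callback($row);
  }
  return $out;
}

/** Load: store the transformed row keyed by its ID, like a destination plugin. */
function load_row(array &$storage, array $values): void {
  $storage[$values['id']] = $values;
}

/** Run every extracted row through transform and load. */
function run_migration(array $table, array $process): array {
  $storage = [];
  foreach (extract_rows($table) as $row) {
    load_row($storage, transform_row($row, $process));
  }
  return $storage;
}

// Hypothetical legacy rows and field mapping.
$legacy = [
  ['id' => 1, 'title' => '  Oak chair ', 'created' => 0],
  ['id' => 2, 'title' => 'Pine table', 'created' => 86400],
];
$process = [
  'id' => fn(array $r) => $r['id'],
  'title' => fn(array $r) => trim($r['title']),
  'date' => fn(array $r) => gmdate('Y-m-d', $r['created']),
];
$result = run_migration($legacy, $process);
```

Each row travels through all three phases before the next one is extracted, which is also how the Migrate API processes source rows.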

Plugins

Migration plugins implement each ETL phase for an individual migration, such as the migration of a content type or a taxonomy.

Each phase corresponds to a section of the migration's yml configuration file (source, process and destination), and each section is handled by its own type of plugin:

  • The source plugin is in charge of extracting the source data.
  • Process plugins transform the data.
  • The destination plugin is in charge of saving the data in Drupal.

Source plugin

The following example shows what implementing a source plugin involves:

Content migration - Series 03 - Source plugin example
Example of a source plugin that extracts an articles table from a MySQL database.

The image above shows a source plugin class that extends the abstract SqlBase class provided by the Migrate module in Drupal core. The class queries the "articles" table of a source database and must implement the following methods:

  • query(): Returns the query against the source database.
  • fields(): Returns the fields available in the source table, each with a description.
  • getIds(): Defines the source fields that uniquely identify a source row, together with their data types; they usually match the primary key(s) of the source table.

The example also implements the prepareRow() method, which receives each extracted row before it is processed. This method is usually implemented for two purposes:

  • Modifying data or fields according to our needs.
  • Skipping a row by returning FALSE.
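A minimal sketch of a source plugin with these methods might look like the following, written against the Drupal 8/9 Migrate API. It assumes a custom module named my_migration, a plugin ID my_articles and columns id, title, body and created in the legacy articles table; all of these names are hypothetical. The class would live in src/Plugin/migrate/source/Articles.php so that annotation discovery finds it, and by default SqlBase reads from the database connection registered under the migrate key in settings.php.

```php
<?php

namespace Drupal\my_migration\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SqlBase;
use Drupal\migrate\Row;

/**
 * Extracts rows from a legacy "articles" table (hypothetical schema).
 *
 * @MigrateSource(
 *   id = "my_articles"
 * )
 */
class Articles extends SqlBase {

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Roughly: SELECT a.id, a.title, a.body, a.created FROM articles a.
    return $this->select('articles', 'a')
      ->fields('a', ['id', 'title', 'body', 'created']);
  }

  /**
   * {@inheritdoc}
   */
  public function fields() {
    // Field name => human-readable description.
    return [
      'id' => $this->t('Article ID'),
      'title' => $this->t('Title'),
      'body' => $this->t('Body'),
      'created' => $this->t('Creation timestamp'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    // The primary key of the source table, with its type and query alias.
    return [
      'id' => [
        'type' => 'integer',
        'alias' => 'a',
      ],
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // Skip rows without a title by returning FALSE.
    if (!trim((string) $row->getSourceProperty('title'))) {
      return FALSE;
    }
    // Modify data: remove surrounding whitespace from the title.
    $row->setSourceProperty('title', trim($row->getSourceProperty('title')));
    return parent::prepareRow($row);
  }

}
```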

The configuration yml for the “articles” migration example would be as follows:

Content migration - Series 03 - Articles config yml

From the yml you can also pass parameters to the plugin, such as the article type (chairs, tables, cabinets, etc.), and read them in any method through the $configuration property defined in the abstract PluginBase class. This property is an array that holds the plugin's configuration. For example, if we add the key article_type: chairs to the source section of the yml, the plugin can read the value with $this->configuration['article_type'] and use it, for instance, to add a WHERE condition to the query that filters by article type. The same array also contains the constants defined in the source section of the migration, which can be read with $this->configuration['constants']['my_constant'], making the plugin more versatile when extending it.
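A sketch of a source plugin that reads both kinds of configuration, assuming the same hypothetical my_migration module. The plugin ID my_articles_by_type, the article_type column and the title_prefix constant are illustrative names, not part of the original example.

```php
<?php

namespace Drupal\my_migration\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SqlBase;
use Drupal\migrate\Row;

/**
 * Articles source that can be filtered by type from the migration yml.
 *
 * @MigrateSource(
 *   id = "my_articles_by_type"
 * )
 */
class ArticlesByType extends SqlBase {

  public function query() {
    $query = $this->select('articles', 'a')
      ->fields('a', ['id', 'title', 'article_type']);
    // "article_type: chairs" in the source section of the yml arrives here.
    if (!empty($this->configuration['article_type'])) {
      $query->condition('a.article_type', $this->configuration['article_type']);
    }
    return $query;
  }

  public function fields() {
    return [
      'id' => $this->t('Article ID'),
      'title' => $this->t('Title'),
      'article_type' => $this->t('Article type'),
    ];
  }

  public function getIds() {
    return ['id' => ['type' => 'integer', 'alias' => 'a']];
  }

  public function prepareRow(Row $row) {
    // Constants declared under "constants:" in the source section are
    // available too; "title_prefix" is a hypothetical constant name.
    $prefix = $this->configuration['constants']['title_prefix'] ?? '';
    $row->setSourceProperty('title', $prefix . $row->getSourceProperty('title'));
    return parent::prepareRow($row);
  }

}
```

With article_type: chairs in the yml, only chair rows are extracted; without the key, the query returns every article.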

Process plugin

The following example shows what implementing a process plugin involves:

Content migration - Series 03 - Process plugin example
Example of a process plugin that changes the format of a date stored as a timestamp.

The image above shows a process plugin class that extends the ProcessPluginBase class provided by the Migrate module in Drupal core. The class formats a timestamp using a format passed as a parameter from the configuration yml. It implements the transform() method, which is where the value is modified. Inside this method you can also throw a MigrateException to report errors.
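A minimal sketch of such a process plugin, assuming the hypothetical my_migration module, a plugin ID my_timestamp_format and a yml parameter named format (all illustrative). Core also ships a format_date process plugin that covers many date conversions, so a custom plugin like this is mainly useful when extra logic is needed.

```php
<?php

namespace Drupal\my_migration\Plugin\migrate\process;

use Drupal\migrate\MigrateException;
use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Formats a Unix timestamp using a format string given in the yml.
 *
 * @MigrateProcessPlugin(
 *   id = "my_timestamp_format"
 * )
 */
class TimestampFormat extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Leave empty values empty.
    if ($value === NULL || $value === '') {
      return NULL;
    }
    // Report invalid input; the row is marked as failed and the message logged.
    if (!is_numeric($value)) {
      throw new MigrateException(sprintf('"%s" is not a valid timestamp for %s.', $value, $destination_property));
    }
    // "format" comes from the process section of the yml (hypothetical key).
    $format = $this->configuration['format'] ?? 'Y-m-d\TH:i:s';
    // gmdate() formats in UTC, which is how datetime fields store values.
    return gmdate($format, (int) $value);
  }

}
```

In the process section of the yml, a field would reference it with plugin: my_timestamp_format, plus source: and format: keys.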

The configuration yml for the example is shown below:

Content migration - Series 03 - Process config yml
Configuration yml that calls the custom process plugin we have created.

Destination plugin

These plugins are closely tied to the site you are migrating to, in this case Drupal 8/9. We will rarely need to create one, because Drupal core already provides destination plugins for most targets (for example entity:node, entity:taxonomy_term or config). Many contributed modules also ship destination plugins for their own configuration and entity types.

The following example shows what implementing a destination plugin involves. The class must implement these methods:

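  • fields(): Returns the available destination fields.
  • getIds(): Defines the destination ID fields (and their types) used to map each source row to its destination.
  • import(): Imports a row and returns its destination ID values.
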
Content migration - Series 03 - Destination plugin example
Example of a destination plugin.
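A minimal sketch of a custom destination plugin, assuming a hypothetical custom table my_article_index (with id and title columns) that the module would have to create itself, for example in hook_schema(). The plugin ID and table are illustrative; the database service is fetched statically for brevity, although injecting it would be more idiomatic, and rollback is not handled.

```php
<?php

namespace Drupal\my_migration\Plugin\migrate\destination;

use Drupal\migrate\Plugin\migrate\destination\DestinationBase;
use Drupal\migrate\Plugin\MigrationInterface;
use Drupal\migrate\Row;

/**
 * Writes migrated rows into a custom table (hypothetical "my_article_index").
 *
 * @MigrateDestination(
 *   id = "my_article_index"
 * )
 */
class ArticleIndex extends DestinationBase {

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    // One integer ID per destination record; stored in the migrate map table.
    return ['id' => ['type' => 'integer']];
  }

  /**
   * {@inheritdoc}
   */
  public function fields(MigrationInterface $migration = NULL) {
    return [
      'id' => $this->t('Article ID'),
      'title' => $this->t('Title'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function import(Row $row, array $old_destination_id_values = []) {
    // Values produced by the process section of the migration.
    $id = $row->getDestinationProperty('id');
    // Insert or update the record keyed by id.
    \Drupal::database()->merge('my_article_index')
      ->keys(['id' => $id])
      ->fields(['title' => $row->getDestinationProperty('title')])
      ->execute();
    // Destination IDs, in the same order as getIds().
    return [$id];
  }

}
```

The process section would need to map id and title for this plugin to receive them.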

Conclusions

In summary, the key topics of this post are:

  1. Source plugins.
  2. Process plugins.

We do not emphasise destination plugins, since Drupal core or contributed modules usually already provide them for any type of entity you need to create.

In the next post we will see how to manage migrations with the Migrate Tools module.