Alchemy modelgen: Database Migrations made Easier (Part 1)

Shrinivas Deshmukh

--

This is part 1 of the alchemy-modelgen series. In this post, we’ll explore the tool alchemy-modelgen and get to know the features and API’s of this tool.

alchemy-modelgen is tool which makes RDBMS (such as Redshift, PostgreSQL, Snowflake, MySQL, etc.) management easy by abstracting typical mundane tasks like writing and storing multiple SQL files, modifying tables and tracking these modifications, etc.

alchemy-modelgen enables database/warehouse construction either using YAML based schema-configuration files or through replication of existing database schema. It also facilitates schema versioning and tracking, at the same time enabling time travel back to earlier versions.

Underneath, alchemy-modelgen uses alembic for table creation, modification, and version management, and sqlacodegen for schema extraction from existing databases and warehouses.

What is Database Migration?

Database migration refers to movement of your data structure from one platform to another.

Let’s look at a typical use case for data migration. In an enterprise, the application was built on a relatively slower databases like MySQL or PostgreSQL. But as the volume of the data grew, the process of analyzing the data became cumbersome. In order to tackle this, the enterprise decided to shift from MySQL to a big data handler database like AWS Redshift for example.

To migrate your data to Redshift, we first need to migrate all the table structure from MySQL to Redshift. Generally, in an enterprise the number of tables are in two digits. Imagine the amount of time and precision this would require, especially when the tables have numerous columns and have constraints like primary key, unique or null constraint.

Let’s take another use case. Let’s say we want to setup a data warehouse using Redshift. This could be done in many ways:

  • Create the tables manually
  • Write big SQL files
  • Use an ORM tool like hibernate (for java), sqlalchemy (for python)

In all of the above approaches, a lot of coding and maintenance is involved. Also, we need to have specialized skills to create and maintain the tables like fluency with SQL, understanding of ORM tools and so forth. And what if we need to change the structure? Let’s say we want to add or delete a column, or change the data type of a column? What happens then? What if we change the structure now, and we want to move back to the old structure? How to maintain different versions of the table structure? We’ll try to answer these question in the following sections.

Table of Contents

Overview

alchemy-modelgen is a tool which simplifies typical database management tasks. This includes identifying tables, altering their schema, and versioning of these updates.

Tables can be constructed using two approaches:

  • Specifying the database model through a YAML file including tables, columns, constraints and relationships. (YAML files are easy to write, do not require programming skills, and can be stored in any storage location (local, cloud buckets, git, etc).
  • Replicating the database model of an existing database or warehouse. This is especially applicable for data migration workflows.

Schema changes are extremely common and typically require manual queries to be run in order to make these changes. Furthermore, recording these changes is needed in order to roll back these changes or return to a previous version of the schema. alchemy-modelgen handles this out of the box, preserving all iterations of the table schema and enabling roll back to any previous state.

Let’s look into the usage of alchemy-modelgen in the following sections.

Note: python version >= 3.8 is required

API

alchemy-modelgen can be installed from pypi using:

pip install alchemy-modelgen

modelgen, a command-line utility, is included in the kit. The command modelgen is used to carry out all of the executions.

Initializing modelgen

Before conducting any database operations, modelgen must be initialized. To begin, type the following command:

modelgen init -d ./modelgen_folder

./modelgen_folder refers to the folder that modelgen will build. The folder may be assigned any name; the only requirement is that it does not already exist.

(./modelgen_folder will be used as an example initialization folder in the following parts of this article. The folder name should be the same as the one used to initialize modelgen.)

The above command results in the following folder structure:

modelgen_folder
├── alembic_migrate
│ └── __init__.py
│ └── env.py
│ └── script.py.mako
│ └── versions
│ └── __init__.py
│ └── README
├── templates
│ └── example.yaml
└── alembic.ini

The tool includes a sample YAML file that shows how to write tables, define schema, and specify constraints, among other things. The file can be found in the modelgen_folder/templates/example.yaml folder.

The initialized directory (modelgen_folder in this case, essentially, cd modelgen_folder) should always be the working directory for all modelgen operations.

Defining schema in YAML file and registering it with modelgen

In order to register this YAML schema with modelgen, we run modelgen createmodel after it is specified in YAML:

modelgen createmodel -s yaml -p <path/to/your/template.yaml> -a

In the above command, the parameters -s , -p and -a are:

  • -s: whether it’s a yaml file or an existing database/warehouse to read schema from, should be database or yaml
  • -p: path of the YAML schema file
  • -a: enables support for database migration. It is disabled by default

The recommended location for the YAML file is modelgen_folder/templates/. However, the file can be stored at any path or location.

Some databases, such as AWS Redshift, may have additional parameters such as diststyle, distkey, and sortkey. Please read this document to learn more about this advanced use.

Pulling schema from existing database or datawarehouse

It is possible to extract the schema of all tables and views from an existing database or warehouse using modelgen. The procedure is as follows:

  1. Set modelgen_folder as current working directory
  2. Pull schema from database/warehouse
cd modelgen_foldermodelgen createmodel -s database --outfile models/<FILENAME>.py -p <SQLALCHEMY_DATABASE_URL> -a

In the above command, the parameters -s, --outfile , -p and -a are:

  • -s: whether it’s a yaml file or an existing database/warehouse to read schema from, should be database or yaml
  • --outfile: The schema from the database/warehouse should be saved in this file. models/<FILENAME>.py should always be the path, where <FILENAME> can be any name you like.
  • -p: The database/warehouse’s url. The sqlalchemy url syntax should be followed for this URL. The syntax of sqlalchemy is as follows:
    database+connector://username:password@host:port/database

Database connections supported by modelgen are:

  • MySQL: mysql+mysqlconnector://username:password@host:port/database
  • Postgresql: postgresql+psycopg2://username:password@host:port/database
  • Redshift: redshift+psycopg2://username:password@host:port/database
  • snowflake: snowflake://username:password@account, more info can be found here

Table creation in database or datawarehouse

It’s time to move the schema to our actual target database/warehouse, either through the YAML file or by pulling from an existing database/warehouse. Under the hood, modelgen uses a migration tool called alembic to accomplish this.

  • First, let’s make a migration revision. Revisions in migration are identical to git commits. Your updates are saved to the local repository using these revisions. Revisions can also be thought of as snapshots of the yaml schema written in a language that modelgen can understand. These revisions are saved in the alembic_migrate/versions/ folder by modelgen.
modelgen migrate revision --autogenerate -m '<COMMIT_MESSAGE>' -p <SQLALCHEMY_DATABASE_URL>

Depending on what the new revision is doing, the <COMMIT MESSAGE> may be contextual. If this is the first migration, the commit message may be initial migration, making our final command:

modelgen migrate revision --autogenerate -m 'initial migration' -p <SQLALCHEMY_DATABASE_URL>

The parameter -p corresponds to the sqlalchemy URL of the target database/warehouse.

  • Now the tables can be created in the target database/warehouse
modelgen migrate upgrade head -p <SQLALCHEMY_DATABASE_URL>
  • The tables should now be reflected in the database/warehouse

Table alteration in database or data warehouse

The addition of a new column, modifying the datatype of a column, increasing or reducing the length of a column, or imposing or eliminating restrictions on a column are all examples of table alterations. The procedure to alter tables is:

  • Make changes in the YAML schema file

Let’s take the YAML template from the earlier section of this article:

The original template:

Let’s change the length of column address from 200 to 350 and add a new column called country:

  • Create migration revision
modelgen migrate revision --autogenerate -m 'initial migration' -p <SQLALCHEMY_DATABASE_URL> 
  • Alter the tables in the database/warehouse
modelgen migrate upgrade head -p <SQLALCHEMY_DATABASE_URL> 

Version maintenance or schema versioning

Modelgen uses alembic to keep track of schema versions under the hood. To upgrade or downgrade to any old/new version, we begin by listing the versions, obtaining the version id, and then upgrading/downgrading using the version id.

  • List all the versions
modelgen migrate history -p <SQLALCHEMY_DATABASE_URL> 
  • Get the version id

The output of the above command would look something like this:

787cf0fee9c1 -> 16225b6adae3 (head), add new column country, change address length from 200 to 350<base> -> 787cf0fee9c1, initial migration 

The id 787cf0fee9c1 refers to the older migration, while the id 16225b6adae3 refers to the most recent migration.

The most recent migration is also the one that is active in the database/warehouse right now.

Let’s suppose we want to go back to the edition with the id 787cf0fee9c1. We can do so by running the following command:

modelgen migrate downgrade 787cf0fee9c1 -p <SQLALCHEMY_DATABASE_URL>

The database/warehouse tables should now be at the same state as 787cf0fee9c1.

If we want to go back to the most recent schema version again with the id 16225b6adae3, we can use the following command:

modelgen migrate ugrade 16225b6adae3 -p <SQLALCHEMY_DATABASE_URL> 

Conclusions

We looked at alchemy-modelgen, its functionality, and how to use it in this post. Click here to read Part 2 of this series.

Support for more databases and warehouses will be included in future releases. Please visit the GitHub repository if you want to learn more or contribute to the project.

--

--

No responses yet

Write a response