Alchemy modelgen: Database Migrations made Easier (Part 1)
This is part 1 of the alchemy-modelgen
series. In this post, we’ll explore the tool alchemy-modelgen
and get to know the features and API’s of this tool.
alchemy-modelgen
is tool which makes RDBMS (such as Redshift, PostgreSQL, Snowflake, MySQL, etc.) management easy by abstracting typical mundane tasks like writing and storing multiple SQL files, modifying tables and tracking these modifications, etc.
alchemy-modelgen
enables database/warehouse construction either using YAML based schema-configuration files or through replication of existing database schema. It also facilitates schema versioning and tracking, at the same time enabling time travel back to earlier versions.
Underneath, alchemy-modelgen
uses alembic for table creation, modification, and version management, and sqlacodegen for schema extraction from existing databases and warehouses.
What is Database Migration?
Database migration refers to movement of your data structure from one platform to another.
Let’s look at a typical use case for data migration. In an enterprise, the application was built on a relatively slower databases like MySQL or PostgreSQL. But as the volume of the data grew, the process of analyzing the data became cumbersome. In order to tackle this, the enterprise decided to shift from MySQL to a big data handler database like AWS Redshift for example.
To migrate your data to Redshift, we first need to migrate all the table structure from MySQL to Redshift. Generally, in an enterprise the number of tables are in two digits. Imagine the amount of time and precision this would require, especially when the tables have numerous columns and have constraints like primary key, unique or null constraint.
Let’s take another use case. Let’s say we want to setup a data warehouse using Redshift. This could be done in many ways:
- Create the tables manually
- Write big SQL files
- Use an ORM tool like hibernate (for java), sqlalchemy (for python)
In all of the above approaches, a lot of coding and maintenance is involved. Also, we need to have specialized skills to create and maintain the tables like fluency with SQL, understanding of ORM tools and so forth. And what if we need to change the structure? Let’s say we want to add or delete a column, or change the data type of a column? What happens then? What if we change the structure now, and we want to move back to the old structure? How to maintain different versions of the table structure? We’ll try to answer these question in the following sections.
Table of Contents
- Overview
- API
- Initializing modelgen
- Defining schema in YAML file
- Pulling schema from existing database or data warehouse
- Table creation in database or data warehouse
- Table alteration in database or data warehouse
- Version maintenance or schema versioning
- Conclusions
Overview
alchemy-modelgen
is a tool which simplifies typical database management tasks. This includes identifying tables, altering their schema, and versioning of these updates.
Tables can be constructed using two approaches:
- Specifying the database model through a YAML file including tables, columns, constraints and relationships. (YAML files are easy to write, do not require programming skills, and can be stored in any storage location (local, cloud buckets, git, etc).
- Replicating the database model of an existing database or warehouse. This is especially applicable for data migration workflows.
Schema changes are extremely common and typically require manual queries to be run in order to make these changes. Furthermore, recording these changes is needed in order to roll back these changes or return to a previous version of the schema. alchemy-modelgen
handles this out of the box, preserving all iterations of the table schema and enabling roll back to any previous state.
Let’s look into the usage of alchemy-modelgen
in the following sections.
Note: python version >= 3.8 is required
API
alchemy-modelgen
can be installed from pypi
using:
pip install alchemy-modelgen
modelgen
, a command-line utility, is included in the kit. The command modelgen
is used to carry out all of the executions.
Initializing modelgen
Before conducting any database operations, modelgen must be initialized. To begin, type the following command:
modelgen init -d ./modelgen_folder
./modelgen_folder
refers to the folder that modelgen will build. The folder may be assigned any name; the only requirement is that it does not already exist.
(./modelgen_folder
will be used as an example initialization folder in the following parts of this article. The folder name should be the same as the one used to initialize modelgen.)
The above command results in the following folder structure:
modelgen_folder
├── alembic_migrate
│ └── __init__.py
│ └── env.py
│ └── script.py.mako
│ └── versions
│ └── __init__.py
│ └── README
├── templates
│ └── example.yaml
└── alembic.ini
The tool includes a sample YAML file that shows how to write tables, define schema, and specify constraints, among other things. The file can be found in the modelgen_folder/templates/example.yaml
folder.
The initialized directory (modelgen_folder
in this case, essentially, cd modelgen_folder
) should always be the working directory for all modelgen operations.
Defining schema in YAML file and registering it with modelgen
In order to register this YAML schema with modelgen, we run modelgen createmodel
after it is specified in YAML:
modelgen createmodel -s yaml -p <path/to/your/template.yaml> -a
In the above command, the parameters -s
, -p
and -a
are:
-s
: whether it’s a yaml file or an existing database/warehouse to read schema from, should bedatabase
oryaml
-p
: path of the YAML schema file-a
: enables support for database migration. It is disabled by default
The recommended location for the YAML file is modelgen_folder/templates/
. However, the file can be stored at any path or location.
Some databases, such as AWS Redshift, may have additional parameters such as diststyle, distkey, and sortkey. Please read this document to learn more about this advanced use.
Pulling schema from existing database or datawarehouse
It is possible to extract the schema of all tables and views from an existing database or warehouse using modelgen. The procedure is as follows:
- Set
modelgen_folder
as current working directory - Pull schema from database/warehouse
cd modelgen_foldermodelgen createmodel -s database --outfile models/<FILENAME>.py -p <SQLALCHEMY_DATABASE_URL> -a
In the above command, the parameters -s
, --outfile
, -p
and -a
are:
-s
: whether it’s a yaml file or an existing database/warehouse to read schema from, should bedatabase
oryaml
--outfile
: The schema from the database/warehouse should be saved in this file.models/<FILENAME>.py
should always be the path, where<FILENAME>
can be any name you like.-p
: The database/warehouse’s url. The sqlalchemy url syntax should be followed for this URL. The syntax of sqlalchemy is as follows:database+connector://username:password@host:port/database
Database connections supported by modelgen
are:
- MySQL:
mysql+mysqlconnector://username:password@host:port/database
- Postgresql:
postgresql+psycopg2://username:password@host:port/database
- Redshift:
redshift+psycopg2://username:password@host:port/database
- snowflake:
snowflake://username:password@account
, more info can be found here
Table creation in database or datawarehouse
It’s time to move the schema to our actual target database/warehouse, either through the YAML file or by pulling from an existing database/warehouse. Under the hood, modelgen
uses a migration tool called alembic
to accomplish this.
- First, let’s make a migration revision. Revisions in migration are identical to
git
commits. Your updates are saved to the local repository using these revisions. Revisions can also be thought of as snapshots of the yaml schema written in a language thatmodelgen
can understand. These revisions are saved in thealembic_migrate/versions/
folder by modelgen.
modelgen migrate revision --autogenerate -m '<COMMIT_MESSAGE>' -p <SQLALCHEMY_DATABASE_URL>
Depending on what the new revision is doing, the <COMMIT MESSAGE>
may be contextual. If this is the first migration, the commit message may be initial migration
, making our final command:
modelgen migrate revision --autogenerate -m 'initial migration' -p <SQLALCHEMY_DATABASE_URL>
The parameter -p
corresponds to the sqlalchemy URL of the target database/warehouse.
- Now the tables can be created in the target database/warehouse
modelgen migrate upgrade head -p <SQLALCHEMY_DATABASE_URL>
- The tables should now be reflected in the database/warehouse
Table alteration in database or data warehouse
The addition of a new column, modifying the datatype of a column, increasing or reducing the length of a column, or imposing or eliminating restrictions on a column are all examples of table alterations. The procedure to alter tables is:
- Make changes in the YAML schema file
Let’s take the YAML template from the earlier section of this article:
The original template:
Let’s change the length of column address
from 200
to 350
and add a new column called country
:
- Create migration revision
modelgen migrate revision --autogenerate -m 'initial migration' -p <SQLALCHEMY_DATABASE_URL>
- Alter the tables in the database/warehouse
modelgen migrate upgrade head -p <SQLALCHEMY_DATABASE_URL>
Version maintenance or schema versioning
Modelgen uses alembic to keep track of schema versions under the hood. To upgrade or downgrade to any old/new version, we begin by listing the versions, obtaining the version id, and then upgrading/downgrading using the version id.
- List all the versions
modelgen migrate history -p <SQLALCHEMY_DATABASE_URL>
- Get the version id
The output of the above command would look something like this:
787cf0fee9c1 -> 16225b6adae3 (head), add new column country, change address length from 200 to 350<base> -> 787cf0fee9c1, initial migration
The id 787cf0fee9c1
refers to the older migration, while the id 16225b6adae3
refers to the most recent migration.
The most recent migration is also the one that is active in the database/warehouse right now.
Let’s suppose we want to go back to the edition with the id 787cf0fee9c1
. We can do so by running the following command:
modelgen migrate downgrade 787cf0fee9c1 -p <SQLALCHEMY_DATABASE_URL>
The database/warehouse tables should now be at the same state as 787cf0fee9c1
.
If we want to go back to the most recent schema version again with the id 16225b6adae3, we can use the following command:
modelgen migrate ugrade 16225b6adae3 -p <SQLALCHEMY_DATABASE_URL>
Conclusions
We looked at alchemy-modelgen
, its functionality, and how to use it in this post. Click here to read Part 2 of this series.
Support for more databases and warehouses will be included in future releases. Please visit the GitHub repository if you want to learn more or contribute to the project.