Object validation and conversion with Marshmallow in Python (2024)

Marshmallow is a Python library that converts complex data types to and from Python data types. It is a powerful tool for both validating and converting data. In this tutorial, we will be using Marshmallow to validate a simple bookmarks API where users can save their favorite URLs along with a short description of each site.

Prerequisites

To get the most out of the tutorial you will need:

Python version >= 3.10 installed on your machine
A GitHub account.
A CircleCI account.
Basic understanding of SQLite databases
Basic understanding of the Flask framework

Our tutorials are platform-agnostic, but use CircleCI as an example. If you don’t have a CircleCI account, sign up for a free one here.

Cloning the repository and creating a virtual environment

Begin by cloning the repository from this GitHub link.

git clone https://github.com/CIRCLECI-GWP/object-validation-and-conversion-marshmallow.git

Once you have cloned the repository the next step is to create a virtual environment and activate it to install our Python packages. Use these commands:

cd object-validation-and-conversion-marshmallow
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Note: Use the following commands for Windows

cd object-validation-and-conversion-marshmallow
py -3 -m venv .venv
.venv\Scripts\activate
pip3 install -r requirements.txt

Why Marshmallow?

Often when working with data, there is a need to convert it from one data structure to another. Marshmallow is a Python library that converts complex data types to native Python data types and vice versa.

The Python interpreter supports built-in data types including integers, floats, booleans, strings, tuples, lists, dictionaries, and sets. These are essential for developers who want to create complex programs that can handle different types of operations.

One advantage to Marshmallow is that it will work with any database technology. It is platform-agnostic, which is always a win for developers.

To extend Marshmallow even further, we will be using these technologies:

Marshmallow-sqlalchemy integrates Marshmallow with SQLAlchemy, which is an SQL Object Relational Mapper.
Flask-marshmallow is a Flask extension for Marshmallow that makes it easy to use Marshmallow with Flask. It also generates URLs and hyperlinks for Marshmallow objects.

Understanding Marshmallow schemas

Understanding how Marshmallow schemas work is essential for working with it. Schemas serve as the core of Marshmallow by keeping track of the data through the declared schema. The schemas define the structure of the data and also the validation of the data.

An example of a schema for our bookmarks app would be:

class BookMarkSchema(ma.Schema):
    title = fields.String(
        metadata={
            "required": True,
            "allow_none": False,
            "validate": must_not_be_blank
        }
    )
    url = fields.URL(
        metadata={
            "relative": True,
            "require_tld": True,
            "error": "invalid url representation",
        }
    )
    description = fields.String(metadata={"required": False, "allow_none": True})
    created_at = fields.DateTime(metadata={"required": False, "allow_none": True})
    updated_at = fields.DateTime(metadata={"required": False, "allow_none": True})

This schema creates validations and also defines data types for the fields in our schema. With schemas out of the way, it is time to serialize and deserialize your data.

Implementing Marshmallow in a Flask application

To build our bookmark API, we will first build a BookMarkModel class. This class describes the structure of our tables, relationships, and fields to the database engine. We will also add a BookMarkSchema class to serialize and deserialize data from our model. These classes are available in the cloned repository in the /src/app.py file.

To show how Marshmallow parses data from Python types to serialized objects, we are using SQLAlchemy. The serialized objects can be stored in the database and can later be deserialized from the database to acceptable Python data types.

Start by creating a structure for both the model and schema definition classes.

# Adding SQLAlchemy
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(BASE_DIR, 'db.sqlite3')
db = SQLAlchemy(app)

# Add Marshmallow
ma = Marshmallow(app)
app.app_context().push()

# Create the API model (SQLAlchemy)
class BookMarkModel(db.Model):
    pass

# Create schema (marshmallow)
class BookMarkSchema(ma.Schema):
    class Meta:
        pass

bookMarkSchema = BookMarkSchema()
bookMarksSchema = BookMarkSchema(many=True)

This code snippet first connects SQLAlchemy to our application, pointing SQLALCHEMY_DATABASE_URI at a local SQLite file. If you configure a different database URL, SQLAlchemy connects to that database instead. The snippet then instantiates Marshmallow to serialize and deserialize the data as it is sent and received from our models.

The bookMarkSchema = BookMarkSchema() instance handles a single bookmark, as in the POST, READ, and UPDATE routes. In contrast, bookMarksSchema = BookMarkSchema(many=True) handles a list of items, for example when returning all requested bookmarks.

Serializing and deserializing data in Marshmallow

In the previous code snippet, we created a Marshmallow schema based on our BookMarkModel. In this section, we will use Marshmallow to serialize data when saving to the database and deserialize data when retrieving from the database.

Serializing Python data

Serialization is the process of converting a Python object into a format that can be stored in a database or transmitted. In Flask we use SQLAlchemy to connect to our database. We need to convert the SQLAlchemy objects to JSON data that can then interact with our API. Marshmallow is a great tool to use for this process. In this section, we will use Marshmallow to return a JSON object once we create a bookmark. We will do this by adding a new bookmark to our SQLite database.

# CREATE a bookmark
@app.route("/bookmark/", methods=["POST"])
def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]

    book_mark = BookMarkModel(
        title=title,
        description=description,
        url=url,
        created_at=datetime.datetime.now(),
        updated_at=datetime.datetime.now(),
    )

    db.session.add(book_mark)
    db.session.commit()

    # Serialize the saved model into a JSON-ready dictionary
    result = bookMarkSchema.dump(book_mark)
    return result, 201

This code snippet creates a new bookmark using the BookMarkModel class. It uses the db.session.add and db.session.commit methods to add and save the bookmark to the database, respectively. To serialize objects, the snippet uses the dump method of the BookMarkSchema class, which returns a formatted JSON object.

To validate that this works, we can add a bookmark to the database with Postman and retrieve it. First run the Flask app using this command:

FLASK_APP=src/app.py flask run

Once the application is running, we can now make a request to our API to create a new bookmark using Postman and the POST route /bookmark.

The request returns a response that is a JSON object. Success! Now that a bookmark has been created and serialized with Marshmallow, you can retrieve it from the database and deserialize it.

Deserializing JSON data back to SQLite

Deserialization is the opposite of serialization. To serialize, we converted data from Python to JSON. To deserialize, we convert JSON data back into Python data that can be used to build SQLAlchemy objects. Marshmallow uses the load() method for this: it validates the incoming data against the schema and returns the deserialized result.

book_mark = BookMarkModel(
    title=title,
    description=description,
    url=url,
    created_at=datetime.datetime.now(),
    updated_at=datetime.datetime.now(),
)
try:
    json_input = request.get_json()
    result = bookMarkSchema.load(json_input)
except ValidationError as err:
    return {"errors": err.messages}, 422

For deserialization, this snippet passes the JSON body of the request to load(), which returns the validated data as Python values. If validation fails, the snippet responds with the error messages and a 422 status code.

Now that some data has been serialized and deserialized, the next step is to write tests. The tests will make sure that the endpoints are returning the correct data. To make completely sure that everything is okay, we will also run these tests on CircleCI.

Testing serialization

Testing inspires confidence in your applications by verifying your code is working as expected. In this section, we will create a test to make sure that our serialization is working as expected.

# Test if one can add data to the database
def test_add_bookmark():
    my_data = {
        "title": 'a unique title',
        "description": 'a bookmark description',
        "url": 'unique bookmark url',
    }
    res = app.test_client().post(
        "/bookmark/",
        data=json.dumps(my_data),
        content_type="application/json",
    )
    assert res.status_code == 201

This test verifies that we can successfully create a new bookmark. It also tests that the response is the 201 status code we defined when we created our method. Now we can further verify success by adding the test to our CircleCI pipeline.

Setting up Git and pushing to CircleCI

To set up CircleCI, initialize a Git repository in the project by running this command:

git init

Then, create a .gitignore file in the root directory. Inside the file add any modules you want to keep from being added to your remote repository. The next step will be to add a commit, and then push your project to GitHub.

Log in to CircleCI and go to Projects, where you should see all the GitHub repositories associated with your GitHub username, or your organization. The specific repository that you want to set up for this tutorial is object-validation-and-conversion-marshmallow. On the Projects dashboard, select the option to set up the selected project, then use the option for an existing configuration.

Note: After initiating the build, expect your pipeline to fail. You still need to add the customized .circleci/config.yml configuration file to GitHub for the project to build properly. We’ll do that next.

Setting up CircleCI

First, create a .circleci directory in your root directory. Add a config.yml file to it; CircleCI reads the configuration for each project from this file. For this setup, we will use the CircleCI Python orb. Use this configuration to execute your tests.

version: 2.1
orbs:
  python: circleci/python@2.1.1
workflows:
  sample:
    jobs:
      - build-and-test
jobs:
  build-and-test:
    description: "Setup Flask and run tests"
    executor: python/default
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run tests
          command: pytest -v

Using third-party orbs

CircleCI orbs are reusable packages of YAML configuration that condense multiple lines of code into a single line. To allow the use of third-party orbs like python@2.1.1 you may need to:

Enable organization settings if you are the administrator, or
Request permission from your organization’s CircleCI admin.

After setting up the configuration, push the configuration to GitHub. CircleCI will start building the project.

Voila! Go to the CircleCI dashboard and expand the build details. Verify that the tests ran successfully and were integrated into CircleCI.

Now that you have your CI pipeline set up, you can move on to validating data using Marshmallow.

Object validation using Marshmallow

Marshmallow provides a simple way to validate object data before sending it to the database. A schema's validate() method checks input against the rules declared on its fields, and we will use it in the method that creates a bookmark. In this step, we will add validations to make sure that we allow only strings, and no other type, for the title of the bookmark.

class BookMarkSchema(ma.Schema):
    title = fields.String(
        metadata={
            "required": True,
            "allow_none": False,
            "validate": must_not_be_blank
        }
    )
    ...

When the rules have been passed on to the schema, we can use the validate() method to verify the data on the method that creates a new bookmark:

def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]

    # Validate the data from request before serialization
    error = bookMarkSchema.validate({"title": title, "description": description, "url": url})
    if error:
        return jsonify(error)

In the code snippet above, we use the validate() method to check that the request data matches the validations described in our schema. validate() returns a dictionary of errors, which is empty when the data is valid. If it contains any errors, we return them to the user.

To verify that this is working, make a POST request with Postman using an integer value in the title. Your API should return an error.

You will know your validations are working properly when an invalid title sent with the request results in an error.

Adding tests for more endpoints

This tutorial does not cover all the endpoints used for the cloned repository. If you want to continue on your own, you can add tests for endpoints like fetching all bookmarks or fetching a single bookmark. Use this code:

# Test if all bookmarks are returned
def test_get_all_bookmarks_route():
    res = app.test_client().get("/bookmarks/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200

# Test if a single bookmark is returned
def test_get_one_bookmark_route():
    res = app.test_client().get("/bookmark/1/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200

# Test json data format is returned
def test_get_json_data_format_returns():
    res = app.test_client().get("/bookmarks/")
    assert res.status_code == 200
    assert res.headers["Content-Type"] == "application/json"

These tests verify that we can retrieve our created bookmarks, whether it is all of them or just one. The tests also verify that the data received is a JSON object, consistent with the serialization process of Marshmallow.

Before we can call this a party, we will need to save and commit our tests and push them to GitHub. A successful pipeline run signifies that everything went well.

Conclusion

In this article we have explored the power of using Marshmallow to deserialize and serialize data and also carry out validation. Through the article we have gone through the processes of creating models, creating schemas, and connecting them. We also learned how to use validations to allow only specific types of responses.

I hope this tutorial was helpful, and that you understand more about how serialization and deserialization work using Marshmallow. Get the rest of your team involved by adding tests for more endpoints, and applying what you have learned to your own projects.

Waweru Mwaura is a software engineer and a life-long learner who specializes in quality engineering. He is an author at Packt and enjoys reading about engineering, finance, and technology. You can read more about him on his web profile.
