Pandera: Protect your data and ML products from low-quality data.

The open-source framework for precision data testing for data scientists and ML engineers.

Copied to clipboard!

$ pip install pandera

Install Pandera and get started.

Quickstart guide

Build confidence in the quality of your data by defining schemas for complex data objects.

Pandera provides a simple, flexible and extensible data-testing framework for validating not only your data, but also the functions that produce them.

Copied to clipboard!

import pandas as pd
import pandera as pa

from pandera.typing import Series, DataFrame

# Define a schema
class Schema(pa.SchemaModel):
	item: Series[str] = pa.Field(isin=["apple", "orange"], coerce=True)
	price: Series[float] = pa.Field(gt=0, coerce=True)

# Validate at runtime
@pa.check_types(lazy=True)
def transform_data(data: DataFrame [Schema]):
	...

transform_data(
	pd.DataFrame.from_records([
		{"item": "applee", "price": 0.5}, 	# invalid item name
		{"item": "orange", "price": -1000}, 	# negative price
	])
)

Copied to clipboard!

import hypothesis
import pandera as pa

from pandera.typing import Series, DataFrame

# Define an input schema
class Schema(pa.SchemaModel):
	item: Series[str] = pa.Field(isin= ["apple", "orange"], coerce=True)
	price: Series[float] = pa.Field(gt=0, coerce=True)

# Define an output schema
class OutputSchema(Schema) :
	item: Series[str] = pa.Field(isin=[ "apple"])

# Implement a function that filters out oranges
@pa.check_types (lazy=True)
def transform_data(data: DataFrame[Schema]) -> DataFrame [OutputSchema]:
	return data.query("item =='orange'") # 🐛 Incorrect implementation

# Test the function
@hypothesis.given(Schema.strategy(size=10))
def test_transform_data(data):
	transform_data(data)

# Run Unit Tests
test_transform_data()

Testing Data Transformation Functions in CI Output

“Pandera is a great data-validation toolkit! It's fast, extensible and easy to use. The community behind it is very helpful and responsive. Pandera is a must for data-intensive applications.”

Ayazhan Zhakhan

Co-Founder, Dropbase.io

A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness.

Write complex schemas with speed and ease

Leverage Pandera’s zero-configuration API for defining schemas using modern Pythonic idioms.

Read the docs

Validate critical points of your pipeline

Identify the critical points in your data pipeline, and validate data going in and out of them.

Read the docs

Quickly bootstrap schemas with trusted data

Overcome the initial hurdle of defining a schema by inferring one from clean data, then refine it over time.

Read the docs

Easily create custom validation checks

Access a comprehensive suite of built-in tests, or easily create your own validation rules for your specific use cases.

Read the docs

Synthesize fake data to validate pipelines

Validate the functions that produce your data by automatically generating test cases for them.

Read the docs

“On our team, we’re using Pandera in every project that touches pandas DataFrames. As a programming tool, it lets us automatically check our DataFrames at runtime and in unit tests. As a tool for thought, it forces clarity on the purpose of DataFrames that we use and make in our projects.”

Eric J. Ma

Principal Data Scientist, Moderna Therapeutics

Integrate seamlessly with the Python ecosystem.

Supported data frameworks

Supported integrations

Supported orchestrators

Suggest integration

“Before Pandera, I had trouble with validating the data that I was pulling in from various databases. Pandera has saved me numerous times from the consequences of using poor-quality data. When Pandera data checks determine that something is incorrect, I can react quickly to resolve the situation or send a note out to my internal customers. Thanks a lot for such a great tool!”

John Kang

Director, Cox Automotive

Start today and scale with confidence.

Chat with an engineer

Pandera: Protect your data and ML products from low-quality data.

Install Pandera and get started.

Build confidence in the quality of your data by defining schemas for complex data objects.

“Pandera is a great data-validation toolkit! It's fast, extensible and easy to use. The community behind it is very helpful and responsive. Pandera is a must for data-intensive applications.”

A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness.

Write complex schemas with speed and ease

Validate critical points of your pipeline

Quickly bootstrap schemas with trusted data

Easily create custom validation checks

Synthesize fake data to validate pipelines

“On our team, we’re using Pandera in every project that touches pandas DataFrames. As a programming tool, it lets us automatically check our DataFrames at runtime and in unit tests. As a tool for thought, it forces clarity on the purpose of DataFrames that we use and make in our projects.”

Integrate seamlessly with the Python ecosystem.

Supported data frameworks

Supported integrations

Supported orchestrators

Start today and scale with confidence.

Get updates on new features and releases

Solutions

Resources

Company

Build confidence in the quality of your data by defining schemas for complex data objects.

“Pandera is a great data-validation toolkit! It's fast, extensible and easy to use. The community behind it is very helpful and responsive. Pandera is a must for data-intensive applications.”

A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness.

Write complex schemas with speed and ease

Validate critical points of your pipeline

Quickly bootstrap schemas with trusted data

Easily create custom validation checks

Synthesize fake data to validate pipelines

“On our team, we’re using Pandera in every project that touches pandas DataFrames. As a programming tool, it lets us automatically check our DataFrames at runtime and in unit tests. As a tool for thought, it forces clarity on the purpose of DataFrames that we use and make in our projects.”

Integrate seamlessly with the Python ecosystem.

Supported data frameworks

Supported integrations

Supported orchestrators

Start today and scale with confidence.

Solutions

Resources

Company

Start today and scale with confidence.