Deductive Pipeline API: Reporting and monitoring data quality


The run() method returns a dictionary of ReportEntry objects. Each ReportEntry exposes the following attributes:

  • ReportEntry.date - a Python datetime object recording when the entry was created
  • ReportEntry.field - a string giving the name of the field that was processed
  • ReportEntry.number_records - the number of records affected
  • ReportEntry.category - a string categorizing the type of action applied to the rows
  • ReportEntry.description - a string with more detail on the action taken
  • ReportEntry.df - a pandas DataFrame containing any records that did not pass validation, and any values they were replaced with

The ReportEntry class serializes to a human-readable log file, but it is more common to post-process the entries into a machine-readable format and save the DataFrames to disk for later review.
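As a sketch of that kind of post-processing, the snippet below converts report entries into JSON lines. The ReportEntry dataclass here is an illustrative stand-in that mirrors the documented attributes (the real class is provided by the Pipeline API), and the DataFrame attribute is omitted for brevity:

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Illustrative stand-in mirroring the documented ReportEntry attributes;
# the real class is supplied by the Pipeline API.
@dataclass
class ReportEntry:
    date: datetime
    field: str
    number_records: int
    category: str
    description: str

def entries_to_jsonl(entries):
    # Serialize each entry as one JSON object per line, a common
    # machine-readable log format.
    lines = []
    for e in entries:
        lines.append(json.dumps({
            "date": e.date.isoformat(),
            "field": e.field,
            "number_records": e.number_records,
            "category": e.category,
            "description": e.description,
        }))
    return "\n".join(lines)
```

The resulting JSON lines can be shipped to whatever log store or monitoring system you already use.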

Custom reporting

By default, the FileProcessor class uses the DefaultReportWrite class, which aggregates ReportEntry objects and returns them in a list at the end of the run.

To write your own custom reporting class, implement a class with two methods: log_history and get_report.

Here is an example that simply logs all information to stdout:

class MyReportWriter():

    def log_history(self, date, field, number_records, category,
                    description, df):
        print("{0}, Field {1}: #{2} {3}/{4}".format(
            date, field, number_records, category, description))

    def get_report(self):
        return None

fp = FileProcessor(report_writer=MyReportWriter())

df = pd.read_csv(my_file)

report = fp.run(df,
                config={"rules": [{
                    "rule_type": "String",
                    "field": "name",
                    "params": {
                        "fallback_mode": "remove_record",
                        "regex": r"[\w\s]*"
                    }
                }]})

Related documentation

  • Deductive Pipeline API on AWS - The Deductive Pipeline API is available through the AWS marketplace (more)
  • Deductive Pipeline Python Client - Deductive Tools includes a client for the Pipeline API (more)
  • Deductive Pipeline API: Sample Data - Sample files to demonstrate usage of the Deductive Pipeline API (more)
  • Deductive Pipeline API: Validating basic data types - Validating incoming datasets for basic string, number, and date type formatting and range checks using the Deductive Data Pipeline API (more)
  • Deductive Pipeline API: Anonymizing data - The Deductive Pipeline API supports tokenization, hashing, and encryption of incoming datasets for anonymization and pseudonymization (more)
  • Deductive Pipeline API: Referential Integrity - Using the Deductive Pipeline API to validate data against other known good datasets to ensure referential integrity (more)
  • Deductive Pipeline API: Handling invalid data - Invalid data can be quarantined or automatically fixed by the Deductive Data Pipeline API (more)
  • Deductive Pipeline API: Working with session data - The Deductive Pipeline API can check for gaps and overlaps in session data and automatically fix them (more)
  • Deductive Pipeline API: Full API reference - A field by field breakdown of the full functionality of the Deductive Data Pipeline API (more)
