suxinji
Employee Alumni

Overview

This article explains how to deploy the Incorta Profiler.

Please read the Introduction to Incorta Profiler before deploying it. 

Deployment Overview

To use the Incorta Profiler, install the following four components, one of which is optional:

  • IncortaDataPrep Python Package (Required)
  • Sample Datasets (Optional)
  • Schema (Required)
  • Dashboard (Required)

Sample Data

Deploying the sample data and its data folder is optional, because you will eventually replace the data source in the materialized views (MVs) with your own schema tables. However, we recommend deploying the sample data first to verify your deployment and to see how the Profiler works.

Two data sets are used as samples:

  • Titanic Survivors
  • House Price Prediction 

Installation

Step 1: Install the IncortaDataPrep Wheel File

Here is how to install the IncortaDataPrep wheel file in the on-premises environment:

First, copy the wheel file to your Incorta on-premises server.

scp -i remove.key ~/Download/IncortaDataPrep-0.0.1-py3-none-any.whl incorta@00.000.000.000:/home/incorta

Then, on the server, use pip to install the wheel file into the Python environment used by the Incorta instance. Run the command from the directory that contains the copied file.

/usr/bin/python3 -m pip install --user IncortaDataPrep-0.0.1-py3-none-any.whl
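To confirm that the install landed in the right interpreter, a quick check like the following may help. It is a minimal sketch that assumes the distribution is registered under the name IncortaDataPrep (inferred from the wheel file name, not confirmed by the package itself) and that the interpreter is Python 3.8 or later.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "IncortaDataPrep" is an assumption taken from the wheel file name.
    version = installed_version("IncortaDataPrep")
    if version is None:
        print("IncortaDataPrep is not installed for this interpreter")
    else:
        print(f"IncortaDataPrep {version} is installed")
```

Run the script with the same interpreter that was used for the install (here /usr/bin/python3), since a package installed for one interpreter is not visible to another.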

Step 2: Upload the Data_Profiler_Datasets data files

The Titanic Survivors and House Price Prediction data sets are packaged together as the sample data.

This step is optional. You can use your own datasets instead.

e.png

 

Please note that the data files are stored under the folder Data_Profiler_Datasets.

Step 3: Import Schema

The schema contains five materialized views (MVs):

  • table_info
  • summary_table
  • freq_item
  • correlation_table
  • histogram

All of these MVs call functions from the DataPrepAPI Python package, so the package must be installed before you can validate and save the MVs.

 

ee.png 
 

Note:

If you did not deploy the sample datasets (Data_Profiler_Datasets) in the prior step, open each MV and change both the full load logic and the incremental load logic so that they point to your own data source.

For example:

In freq_item, for both the full load and the incremental load, change the part highlighted in the screenshot to your [SCHEMANAME].[TABLENAME].

Edit Full load logic: 

eee.png

Note: Before editing the incremental logic, switch off the incremental logic and run the schema load job.

Edit Incremental logic:

If you have multiple data sources, you can combine them in the incremental logic using unionAll.

eeee.png
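The idea of combining several sources can be sketched as a small helper that folds a list of DataFrames into one. This is an illustration only, not the Profiler's own MV code: union_all is a hypothetical helper name, and it assumes each source has already been loaded as a Spark DataFrame with the same column layout.

```python
from functools import reduce


def union_all(frames):
    """Union a non-empty list of DataFrames that share the same column layout."""
    if not frames:
        raise ValueError("at least one DataFrame is required")
    # DataFrame.union matches columns by position, like the older unionAll alias,
    # and keeps duplicate rows.
    return reduce(lambda left, right: left.union(right), frames)


# Hypothetical usage inside an MV, assuming df_a and df_b were produced by the
# MV's existing logic for two different [SCHEMANAME].[TABLENAME] sources:
#
#   combined = union_all([df_a, df_b])
```

Because union resolves columns by position rather than by name, the sources should produce their columns in the same order before they are combined.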

Note:

Change the table reference to the format [SCHEMANAME].[TABLENAME] in both the full load and the incremental load logic of all five MVs: table_info, summary_table, correlation_table, histogram, and freq_item.
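When editing five MVs by hand, it is easy to leave one reference without its schema prefix. The following sketch shows one way to sanity-check a reference string before pasting it in. The helper name split_qualified_name is hypothetical, and the check only verifies the two-part shape; it does not know which characters Incorta permits in names.

```python
def split_qualified_name(name: str):
    """Split 'SCHEMA.TABLE' into (schema, table); raise ValueError otherwise."""
    parts = name.strip().split(".")
    # Exactly two non-empty parts are expected: the schema and the table.
    if len(parts) != 2 or not all(part.strip() for part in parts):
        raise ValueError(f"expected [SCHEMANAME].[TABLENAME], got {name!r}")
    return parts[0].strip(), parts[1].strip()
```

For example, split_qualified_name("Data_Profiler.table_info") yields the schema and table parts, while a bare table name such as "table_info" raises ValueError.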

Step 4: Import Dashboard 

k.png

Step 5: Add Session Variables (Optional)

The Session Variable is used to display the default table in the list of Table Names.

q.png

qq.png

query(
Data_Profiler.table_info.table_name,
rowNumber() = 1
)


Last update:
‎09-23-2022 03:34 PM