Skip to main content

Import Model

Definition

In Holistics, Import Model is a data model created when you want to import data from other data source into SQL database (EL in ELT).

Under the hood, an Import Model is a data model coupled with an Extract-Load mechanism to help you load data from other data sources (CSV, Google Sheets, etc).

Definition of Import model

Note: Holistics's Import Model is designed for you to quickly get started with your reporting needs, and do not intend to replace full-fledged data integration solutions.

How It Works

When creating an Import Model, this is what happens behind the scene:

  1. Holistics connects to your third-party source via its API and extracts the requested data.
  2. The downloaded data is then inserted into a destination table in your data warehouse.
  3. When loading data into data warehouse, you can create a new table or append to or overwrite an existing table.
  4. That table is then exposed as a data model in the Holistics modeling layer.

Import model mechanism

Supported Data Sources

Import Model currently supports the following data sources:

  • Application: Google Sheets, CSV files
  • SQL database: SQL Server, PostgreSQL, MySQL, Google BigQuery, Amazon Redshift.

Supported Data Sources

Creating Import Model

First, make sure your SQL data warehouse has write permission, so that Holistics can write the data into the DW.

Then, follow these steps:

  1. Go to Data Modeling page, click Create -> Add Import Model, or click the (+) next to the folder that you want to put your Import Model in.
  2. Select a data source type. If there is no available data source of that type, you will be prompted to connect a relevant source (visit Data Sources for more details)
  3. Select the data you want to import.
  4. Change Destination Settings and Sync Configuration to your liking.
  5. Click Create to finish the process.
Note:

We do not support loading multiple flat files into one data model, each file will create a new import model

Advanced Settings

Advance Settings section gives you more granular control of your data import:

Destination Settings

Table Destination

Here you can specify the schema and the table name to write your data to.

  • The Schema Name will default to the Default Schema that you selected when you first connected your data source.
  • The Table Name by default will be prefixed with persisted_ . If a table with the same name already existed in the schema, the new name will be suffixed with a random number to differentiate it. However, you can choose to overwrite the existed table.

For example, your Source Table name is Users and the Default Schema in your Destination is public, then your corresponding table in the destination will be public.persisted_users.

Refresh Schedule

By setting the Refresh Schedule, the persisted table can be automatically updated with new data. The default option is Daily at 7:00.

Import Modes

Currently Holistics support Full, Append and Incremental/Upsert modes. For more details on how these modes work, please refer to our docs on Storage Settings.

Sync Configuration

Destination Column Name

Source Column names could be arbitrary if the data source is not a standardized one (for example CSV files, or Google Sheets). By default, Holistics will normalize the source column names (use all lowercase alpha-numeric characters and underscores). However, users should still pay attention to the different naming conventions supported by databases and make changes accordingly.

Destination Data Type

To best assist users when importing data, Holistics will:

  • Map your data to one of the Generic Data Types first (Whole Number, Decimal, TrueFalse, Date, DataTime, Text)
  • Then select the suitable data type in your destination database.

For example, you want to import data from SQL Server and your destination Data Warehouse is Google BigQuery. The source table in SQL Server has columns in BIGINT, INT, SMALLINT, TINYINT. What will happen in Holistics:

  1. The integer columns in SQL Server are mapped to our Whole Number data type
  2. Next, the Whole Number data type is mapped to INT64 type in BigQuery

Please refer to the Data Type Mappings section for more details.

In most cases, Holistics can interpret the data being loaded in and map your fields to data types supported by the destination database. However, in more complicated cases you can manually map data types by using the Custom type selection:

Other config

Nullable If this is checked, the column is allowed to have NULL. If unchecked, the loading operation will fail if there is a row in the column with no value. This particularly is useful when you want to validate your data logic.

By default, all the columns in Sync Configuration will be Nullable

Delete Column If you want to exclude any columns from being loaded to the Destination, you can remove them here. Currently, this option is not available for no-SQL Data Sources (Spreadsheet, CSV,...)

Data Type Mapping

Source Data Types and corresponding Generic Data Types

SourceWhole NumberDecimalTrueFalseDateDateTimeText

Postgres

smallint., int., serial., smallserial.

double., real., decimal., numeric.

boolean

date

timestamp.*

varchar., char., enum., text., binary., bigint., bigserial.*

BigQuery

integer

numeric., float.

bool

date

datetime, timestamp

string, bytes, int64

MySQL

tinyint., smallint., mediumint., int.

decimal., float., double.*

n/a

date

timestamp, datetime

varchar., char., text., longtext., enum., binary., blob., varbinary., bigint.*

SQLServer

tinyint, int, integer, smallint

decimal., dec., double precision, float., real, numeric.

bit

date

datetime, datetime2, datetimeoffset, smalldatetime

text, nchar., varchar., nvarchar., ntext, xml, uniqueidentifier, char., character.*, bigint

Google, CSV

n/a

n/a

n/a

n/a

n/a

always text

Oracledb

number.*

float.*

n/a

date

timestamp.*

varchar., nvarchar., varchar2., char., nchar., nvarchar2., long., blob., raw., long raw.

Generic Data Types and corresponding data types in Destination

DestinationWhole NumberDecimalTrueFalseDateDateTimeText

Postgres

integer

double precision

boolean

date

timestamp without timezone

text

BigQuery

int64

float64

bool

date

timestamp

string

MySQL

integer

double precision

tinyint(1)

date

datetime

text

Oracledb

number

number

number(1, 0)

date

date

nvarchar2(1000)

SQLServer

int

real

bit

date

datetime2

ntext

Notes:
  • The suggested data type is based on a sample of your data, so in some cases, it could fail if there are unexpected values in your data (for example, in a Google Sheet the first few rows can have numeric values but in a later row a string value can be mixed in.)
  • If the data type cannot be interpreted, it will be mapped to Text type.

Editing Import Model

After creating the import model, you can still adjust its settings (destination and sync configuration).

You tend to edit the import model whenever:

  • You have set a field's Data Type incorrectly and the import job fails
  • You want to change your Destination Settings (Refresh Schedule, Import Mode...)
  • ...

In those cases, click on Edit in Import Settings and the Import Model Editor view will appear.

Editing Import model

Refresh Source Structure of Import Model

When your sources' structure has some changes (columns are added, deleted or renamed...):

  • In the case of Google Spreadsheet imports, we do not automatically update the Source structure for you - you will need to click on the Re-validate button to get the updated structure. Note that this will also revert all of the column names and data types to default.

  • In the cases of Database Table imports (from PostgreSQL, MySQL, SQL Server, BigQuery), we will automatically update the Source structure for you. If there are new columns in your Source, they will appear in the Sync Configuration's field list but are not enabled. To include them in the sync, you will need to toggle them on.


Let us know what you think about this document :)