Provide users with meaningful names

Models & Datasets naming

This applies to both model and dataset files:

General rules

Naming:
- File names and model names can reuse the name defined in the dbt project.
- use_lower_case_and_dash For i.e: Use first_name, not "First Name".
- Entity name in the name of files should be in plural form
- If verb is used in the name, they must follow the forms
  - verb (active voice) - entity: map_users_visitors
  - entity name - verb (passive voice): search_results_unnested
Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand
- Avoid too much abbreviation
- Remove prefixes and suffixes (like dim, facts...)
- Use Upper Case And No Dash

Staging models

Staging models used to produce a final model. Multiple actions can be done here.
These tend to be convenient models used for reporting purpose, not for exploration.
Format: stg_<some useful names>
Example:
- stg_query_reports
- stg_dashboards
- stg_map_holistics_landing_visitors_trials

Assisting models

Assisting models should not be exposed to end-user, commonly perform one specific action.

The general name format is verb (present tense)- entity (plural). Some common verbs:

map_: use for identity stitching tasks
dedup_: use to deduplicate records
unnest_: Use to unnest the nested fields, and use the result to do other things
Examples
- map_exchange_rates
- map_visitors_trial
- map_pipedrive_deals_holistics_tenants
- dedup_trial_submission_emails
- unnest_holistics_app_global_search_results
- unnest_trial_submissions_what_you_like
- unnest_trial_submissions_reporting_data_sources

Another format is entity (plural) - verb (passive voice).

For example:
- visitors_trials_mapped
- trial_emails_deduped

Cleaned and reporting models

These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques

`dim_` models

Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
Format: dim_<object name in plural>
Example:
- dim_query_reports
- dim_dashboards.

`fact_` models

contains information about "transactions" or "interactions".
Have measurements, is not necessary a raw, atomic event table
Tend to be a long, narrow table
Examples:
- fact_jobs
- fact_business_metrics_monthly

Field naming & labeling

A model field has two things: Field name and field label.

Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.

Field name

Case: use snake_case
Name fields consistently and represent values consistently across the model. For example:
- user vs. customer vs. member vs. account
- company vs. account vs. organization
- registration vs. creation
In denormalized tables, should try to retain the original name of the table. For example, id, created_at, updated_at field in table users should still be id, created_at, updated_at instead of user_id, user_created_at, user_updated_at

In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived query_report model (that create a normalized table):

select
    query_reports.id as query_report_id
    , query_reports.name as query_report_name
    , query_reports.user_id
    , users.name as user_name -- `name` exists in both tables

from query_reports 
left join users on query_reports.user_id = users.id

Dimensions

Numeric Dimensions

General form [aggregation name]_object_number
If the aggregated dimension is monhubspot_customer_statusetary: [aggregation name]_value_amount
Aggregation name is optional
Ratios: x_per_y
Percentage: pct_x_over_y

For example:

user_stats: {
    dimensions: {
        user_id
        , user_name
        , query_reports_number -- Number of reports belong to this user
        , avg_reports_per_day_number
        , pct_models_in_dataset_number
        , total_revenue_amount
        , total_payment_this_monnth_amount
        , total_active_days_number
    }
    , measures: {}
}

Boolean Dimensions

Dimensions that have boolean (TRUE/FALSE) type

General form: is_..., has_..., clicked_button_a, created_multi_ds_report
Verb tense depends on the context.

Measures

General form: [aggregation name]_object

Average: avg_
Sum: total_ (Use Total instead of Sum because it sound more natural)
Count: count_
Ratios: x_per_y
Percentage: pct_x_over_y

Example:

user_stats: {
    dimensions: {
        user_id
        , user_name
    }
    , measures: {
            count_created_users: @sql count(users.id)
            , count_existing_users: @sql count(users.id if is not deleted)
            , avg_activity_days: @sql avg(total_active_days_number)
    }
}

Field label in Holistics modeling layer

Case: Please Use Capital
Field label is what the end-user see when exploring the dataset, so it should be descriptive
Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
- Field jobs.created_at should have label Job Created At
- Field users.created_at should have label User Created At
- Field query_reports.created_at should have label Report Created At

Field label in Holistics modeling layer

Models & Datasets naming​

General rules​

Staging models​

Assisting models​

Cleaned and reporting models​

dim_ models​

fact_ models​

Field naming & labeling​

Field name​

Dimensions​

Measures​

Field label in Holistics modeling layer​

Models & Datasets naming

General rules

Staging models

Assisting models

Cleaned and reporting models

`dim_` models

`fact_` models

Field naming & labeling

Field name

Dimensions

Measures

Field label in Holistics modeling layer