Provide users with meaningful names
Models & Datasets naming
This applies to both model and dataset files:
General rules
- Naming:
- File names and model names can reuse the name defined in the dbt project.
use_lower_case_and_dashFor i.e: Usefirst_name, not"First Name".- Entity name in the name of files should be in plural form
- If verb is used in the name, they must follow the forms
- verb (active voice) - entity:
map_users_visitors - entity name - verb (passive voice):
search_results_unnested
- verb (active voice) - entity:
- Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand
- Avoid too much abbreviation
- Remove prefixes and suffixes (like dim, facts...)
Use Upper Case And No Dash
Staging models
- Staging models used to produce a final model. Multiple actions can be done here.
- These tend to be convenient models used for reporting purpose, not for exploration.
- Format:
stg_<some useful names> - Example:
stg_query_reportsstg_dashboardsstg_map_holistics_landing_visitors_trials
Assisting models
Assisting models should not be exposed to end-user, commonly perform one specific action.
The general name format is verb (present tense)- entity (plural). Some common verbs:
map_: use for identity stitching tasksdedup_: use to deduplicate recordsunnest_: Use to unnest the nested fields, and use the result to do other things- Examples
- map_exchange_rates
- map_visitors_trial
- map_pipedrive_deals_holistics_tenants
- dedup_trial_submission_emails
- unnest_holistics_app_global_search_results
- unnest_trial_submissions_what_you_like
- unnest_trial_submissions_reporting_data_sources
Another format is entity (plural) - verb (passive voice).
- For example:
visitors_trials_mappedtrial_emails_deduped
Cleaned and reporting models
These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques
dim_ models
- Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
- Format:
dim_<object name in plural> - Example:
dim_query_reportsdim_dashboards.
fact_ models
- contains information about "transactions" or "interactions".
- Have measurements, is not necessary a raw, atomic event table
- Tend to be a long, narrow table
- Examples:
- fact_jobs
- fact_business_metrics_monthly
Field naming & labeling
A model field has two things: Field name and field label.
Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.
Field name
- Case: use
snake_case - Name fields consistently and represent values consistently across the model. For example:
- user vs. customer vs. member vs. account
- company vs. account vs. organization
- registration vs. creation
- In denormalized tables, should try to retain the original name of the table. For example,
id,created_at,updated_atfield in tableusersshould still beid,created_at,updated_atinstead ofuser_id,user_created_at,user_updated_at - In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived
query_reportmodel (that create a normalized table):select
query_reports.id as query_report_id
, query_reports.name as query_report_name
, query_reports.user_id
, users.name as user_name -- `name` exists in both tables
from query_reports
left join users on query_reports.user_id = users.id
Dimensions
Numeric Dimensions
- General form
[aggregation name]_object_number - If the aggregated dimension is monhubspot_customer_statusetary:
[aggregation name]_value_amount - Aggregation name is optional
- Ratios:
x_per_y - Percentage:
pct_x_over_y
For example:
user_stats: {
dimensions: {
user_id
, user_name
, query_reports_number -- Number of reports belong to this user
, avg_reports_per_day_number
, pct_models_in_dataset_number
, total_revenue_amount
, total_payment_this_monnth_amount
, total_active_days_number
}
, measures: {}
}
Boolean Dimensions
Dimensions that have boolean (TRUE/FALSE) type
- General form:
is_...,has_...,clicked_button_a,created_multi_ds_report - Verb tense depends on the context.
Measures
General form: [aggregation name]_object
- Average:
avg_ - Sum:
total_(Use Total instead of Sum because it sound more natural) - Count:
count_ - Ratios:
x_per_y - Percentage:
pct_x_over_y
Example:
user_stats: {
dimensions: {
user_id
, user_name
}
, measures: {
count_created_users: @sql count(users.id)
, count_existing_users: @sql count(users.id if is not deleted)
, avg_activity_days: @sql avg(total_active_days_number)
}
}
Field label in Holistics modeling layer
- Case:
Please Use Capital - Field label is what the end-user see when exploring the dataset, so it should be descriptive
- Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
- Field
jobs.created_atshould have label Job Created At - Field
users.created_atshould have label User Created At - Field
query_reports.created_atshould have label Report Created At
- Field
