One of the most valuable dimensions to report on in any business intelligence context is date/time. In this example, I have the data feed that I found for the City of Toronto’s Parking Ticket activity and I would like to convert some of the raw data to a real date/time column.
I have loaded the entire data feed into a SQL Server database and imported it into PowerPivot (See my previous blog post on how to import text files). There are about 14 million rows in my model.
My PowerPivot model looks like this:
The raw data has a “date_of_infraction” field which is a text field containing the date of infraction and a “time_of_infraction” field which is a text field containing the time of the infraction. What we really want is a combined date/time field with a properly structured date time. If we have this column, it will be extremely valuable in the future for grouping, sorting, filtering, etc.
PowerPivot provides the ability to create a calculated column, so let’s do some basic text parsing to create the appropriate date and date/time columns.
CALCULATING the Infraction Date
We can create an Infraction Date column based on our original “date_of_infraction” field. Right click and select “Insert Column” to create a calculated column.
You can rename the column by clicking on the header of the column and replacing the “CalculatedColumn1” with a more user friendly name. I used “Infraction Date”.
Next, we need to change the field type to a date/time field. This is done by clicking on the column and picking from the Data Type in the ribbon. Strangely, Date isn’t available by default – you have to change it to Text first and then to Date.
You can also now change the format of the date to a simple date format, since in this column we won’t have any time values.
Now we will use a DAX formula to calculate the date from the original “date_of_infraction” field. DAX is the language for creating calculated values in Power Pivot and is very similar to Excel formula syntax.
The formula to extract this data looks like this:
=Date(mid([date_of_infraction], 1, 4), mid([date_of_infraction], 5, 2), mid([date_of_infraction], 7, 2))
The Date() function takes 3 parameters – a year, a month, and a day. The Mid() function is used to extract a sub-string from a field. In this case, the year comes from the first 4 characters, the month comes from 2 characters starting at the 5th position and the date comes from the 2 characters starting at the 7th position.
Once the formula is provided, PowerPivot calculates every row in the column and caches the result.
For my model containing about 14 million rows, this took only about 10-15 seconds!
CALCULATING the Infraction Date/Time
Let’s now create a fully qualified date/time based on a combination of our newly created Infraction Date column and parsing the “time_of_infraction”. Again, we’ll create a new column and set it to Date but with a long format containing both date and time.
Now for the formula. Let’s first calculate a time value based on our text column. Here is my first attempt:
=Time(mid([time_of_infraction], 1, 2), mid([time_of_infraction], 3, 2), 0)
However, this generates an error for two reasons: 1) we have some legitimate times that are 3 characters not 4 (e.g. 437 = 4:37am) and 2) we have some bad data where the time is only 1 character.
So our revised formula will check the length of the field to properly convert the 3 and 4 character times and to create a time value of 0 for any value less than three characters.
The formula looks like this:
if(len([time_of_infraction])<3, time(0,0,0), if(LEN([time_of_infraction]) = 4, time(mid([time_of_infraction], 1, 2), mid([time_of_infraction], 3, 2),0), time(mid([time_of_infraction], 1,1), mid([time_of_infraction],2, 2), 0)))
We have used essentially the same parsing ideas but with a bit of checking on the length of the field value.
So based on this formula, we now have some calculated time values:
However, the date isn’t right – its set by default to 12/30/1899. We can add the date we calculated from Infraction Date to our calculated column for Date Time and this will add the date and times together to create a properly formatted date:
=[Infraction Date]+if(len([time_of_infraction])<3, time(0,0,0), if(LEN([time_of_infraction]) = 4, time(mid([time_of_infraction], 1, 2), mid([time_of_infraction], 3, 2),0), time(mid([time_of_infraction], 1,1), mid([time_of_infraction],2, 2), 0)))
When we do this, PowerPivot returns the appropriate date/time!
Our date/time column is now available to be used in any Excel PowerView reports or Pivot Tables we want to design in the future!