Self-service reporting on the cheap
Only you can prevent forest fires
In the quest to be more data driven and to harness the power of data science, more and more is being asked from our data reporting technologists. These are the DBAs, database developers, etc whose charge it is to stand up our data warehouses, data marts, and pull ad hoc data from our transactional systems.
Reporting often ends up being a bottleneck in analytics pipelines but with the perennial problems of time, capacity, and budget constraints, it is difficult to take the time today to do the work that will increase efficiency tomorrow.
We are continually fighting fires because we didn't have time 2 months ago to clear the brush that would have prevented the flare ups. One of these brush clearing operations is self-service reporting: tools for our analytics end users to pull high quality data on their own.
Self service reporting acts as a force multiplier for data teams by freeing up technical capacity for truly complex reporting needs and building data tools. But it also makes our analytics more efficient because they can have the truly useful "what if" experimental conversations with data without the hassle of waiting for reports to be adjusted.
Here are 4 steps illustrating a more pragmatic approach to self-service that won't take too much of your immediate technical capacity to implement but will give you a good return on your investment right away.
1) Preach to the converted
This is actually probably the key to success. While there is always a high demand for data across an organization, trying to rollout something to everyone at once will be too much.
It will behoove you to make your life easier by starting with the team or set of individuals on the business/analytics side who are the most data saavy and/or more open to change.
Once you identify these folks, involve them in the process in a deep way so that they feel ownership of the end product. They are at once your requirements analysts, beta testers, and evangelists.
2) Don't boil the ocean
Look at the various metrics and reports that have previously been created and identify two or three important or commonly requested metrics.
The idea is not to be innovative but rather to choose something familiar so that people can focus brain space on appreciating the process rather than understanding the data. We're selling an approach, not an analysis.
The smoother the learning curve of the new self-service experiment, the more likely users will actually use it and familiarity is a big piece of that. Familiar data and metrics along with immediate usefulness, even if imperfect, will increase utilization as well as naturally help make the case for more capacity/budget to spend on developing better self-service tools.
3) Create immediate value
Create a standard report in whatever form you are already using (eg, excel). This standard report will list these metrics identified above along with any relevant demographics.
Again the goal here is usefulness, not perfection. And since most requests are from people who are taking an educated guess as to what they need, thier requirements are probably not set in stone so they can generally make do with an already existing standard report.
Just create a script than runs on a scheduled basis and dumps the data somewhere easily accessible. One thing to be careful of is to be clear about business rule definitions.
For example, in my former school district we had a specific definition for truancy. Not all absence types counted against that. So when we created our absence count metrics we defined two: overall absences and absences that count against truancy and made sure to include both in the report.
At this point you've already gained by reducing the load on reporting capacity. But wait, there's more!
4) Force multiply
Regularly updated standard reports are a good start but they are only useful for the latest data. It's hard to do any really interesting analytics without looking at multiple cuts of data.
You could go old school and try and correlate data between all these excel spreadsheets piling up each week but that is tedious and you can only work with aggregations or very small individual samples across files.
But just because we've pulled data out into excel doesn't mean it has to stay there. By loading the reports back into a database, we can leverage the work we've already done without the expense and effort of a more formal data warehouse.
So the final step is to set up a weekly/daily job to load a copy of each standard report into a set of database tables having the same row/column structure as the excel sheet but with the addition of a "report date" column.
Slap some kind of visualization tool in front of that or even just some simple SQL scripts and you've got a straightforward but powerful analytics tool. Not only that but you can also now easily make "last year to date" vs "current year to date" comparisons in your regular reporting.
Once our analytics and business users become comfortable looking at a standard report it is a huge leap in effectiveness to easily be able to analyze multiple pulls of these reports in a systematic way.
From fighting fires to forest management
You can think of these "quickie data marts" as pilots for a more formal, centralized data warehouse. Not only will they immediately provide richer data conversations for analytics but they'll also help improve your organizations overall data sophistication so that you can have a much richer conversation about what you really need data wise.
You'll be surprised at how much use you can get from just this simple architecture. In many cases, it may be all you need.