OLAP Sprint, Day 327: From Prototype to Production

cube_sketch

In November of 2012, I embarked on a Quixotic attempt at building a BI prototype in 7 days. I dubbed the experiment ‘OLAP Sprint’, and blogged about it here.

The plan was to leverage everything great about the PASS Summit (access to experts, time away from the office to focus, a stimulating environment) and build a functional multi-dimensional cube from scratch. I had taken some courses and done some reading, but this was to be my first time putting everything together. By the end of my 7-day sprint in Seattle, I had my scotch-tape-and-paper-clips multidimensional cube built. It was a totally engrossing experience: frustrating, exhilarating, stimulating, and ultimately, totally rewarding.

Aside from the invaluable learning experience, the OLAP Sprint also produced a proof-of-concept that I could share with the business leaders at my office, to show them the light of BI, and set things in motion to upgrade our relational reporting solution to something more scalable and dynamic using SQL Server Analysis Services.

I am happy to report that, almost a year later, we have transformed that prototype into a production-level BI solution, and are primed to release our first set of reports.

What follows are some of my observations, on the stretch of development that has lead me to this point – from Prototype to Production.

Learning Methodology

I underestimated the challenge of going from thinking about the world using two-dimensional OLTP objects to thinking about the world in a multi-dimensional OLAP cube with facts and dimensions. The deep-seated rules of normalization, referential integrity, and constraints, made it awkward and uncomfortable when we first started building out the cube’s dimensions.

In order to get myself in the right frame of mind to deal with that transition, I tried to replicate some of the environmental aspects of the PASS Summit – most importantly the focus and concentration – by blocking most of my days for BI work only, and taking care of operational duties en-masse on other days. I also attended the inaugural PASS Business Analytics conference, which helped to solidify the new themes and concepts that I was learning.

I also alternated between conceptual learning and technical learning. On some days, I’d carve out time to read from “The Data Warehouse Lifecycle Toolkit” (Ralph Kimball), which built my understanding of BI themes and concepts. On other days, I’d focus on executing a certain technical task – for example creating a simple report in SSRS that reported on the cube, from “Professional MS SQL Server 2012 Reporting Services” (Turley, Bruckner, et al). The combination of the two approaches kept my learning balanced.

Expert Help

Before we got started, we had a ‘buy vs. build’ debate internally, to assess the time and risk of doing the build ourselves versus using a consultant to help us. In the end, we engaged a consulting company that specializes in the SQL Server product stack to help us. This was a huge time-saver for us, and also reduced the risk involved, because they were responsible for the major architecture. Our consultant was not only technically very astute, but also a good teacher. The experience was stressful at times, but enjoyable, and we were totally engaged in the development along the way, which means we’re comfortable owning it now.

Development Methodology

When it came time to actually build the SSIS packages, or attribute dimensions, or reports, or whatever, we diagrammed things out on a whiteboard beforehand. It allowed us to make mistakes, and when we sat down to do the work in SSDT, we had a goal to work toward. We also prioritized our deliverables, and in many cases, sought to simplify whatever it was we were working on. We gave ourselves realistic, achievable targets and made sure we hit our marks. If we got stuck, we simplified the deliverable. A good example was user hierarchies: when we first began, we envisioned them as robust and complex, but they were difficult to reproduce. In the end, in order to stay on track, we reduced their complexity quite a bit, in order to make sure that we could deliver something working. This is Agile development methodology 101, and it worked great for us.

I also tried to respect my limits. There were days when I’d want to put in more time, but my brain stopped being able to digest or assimilate the information. Knowing when to push a little harder to solve a stubborn bug and when to throw in the towel and come back fresh in the morning was vital. More often, a fresh mind solved problems faster than a stubborn one.

The ETL Takes a While

I had heard estimates that building the ETL/Data warehouse tiers of a BI project could consume as much as 80% of total time spent on the project. That number always sounded exaggerated to me, but when I counted up the days spent doing development on that tier, it was totally accurate.

In my prototype, I used views based on the OLTP source system to act as my OLAP fact and dimension tables, so I effectively skipped the ETL tier. That obviously wasn’t a feasible model for our Production solution. We spent a lot of time designing and testing packages in SSIS, to make sure that the data coming out of the source system made it to the data warehouse properly.

Like many others, we assumed that our data was clean. Boy were we wrong, especially with our geographic/address data. As we tried to build out our attribute relationships, we would be notified of duplicate keys in our dimension attributes, and it led to quite a period of data investigation and cleaning. The Data Profiling task in SSIS helped with that.

Don’t even get me started on MDX

Having progressed nicely from building the data warehouse and ETL tiers, I ran into MDX like a steel wall. I was at first confident that my knowledge of TSQL would help me make the transition, but was sorely disappointed. In fact, my background in TSQL actually interfered with my learning MDX, because some of the familiar constructs (SELECT, FROM, WHERE) have substantially different applications in MDX.

The biggest difference, aside from syntax nuances, is of course that TSQL deals with the 2-dimensional world of columns and rows, but MDX isn’t limited to two dimensions. Visualizing the multi-dimensional result set that I wanted to produce was difficult, so writing the code to get there was challenging. The life-saver is the query wizard that Microsoft sprinkles throughout SQL Server to help you get what you want (in SSAS, SSMT, SSDT, etc). You canbuild the query visually, and then hit the ‘Design Mode’ button to see the results in MDX.

A life-saver for learning MDX

A life-saver for learning MDX

Oh My God is it Ever Fast

The performance results have been unbelievable. We were giggling like school girls when we ran our first set of reports, and hugely relieved to see just how well the BI solution performed. Queries that had taken minutes to run on OLTP was taking fractions of a second on OLAP. Having been the one to pioneer and advocate for a BI solution, this was immensely satisfying for me. There was a huge investment of time, money, and effort, and to see it all pay off when we ran those reports made everyone breathe a sigh of relief, and smile at the possibilities.

Conclusion

One of the more illuminating quotes I heard from a presenter at the PASS BA conference in Chicago was the following: “BI projects aren’t projects. They’re Programs”. With this in mind, I’m excited about the future potential of this program, fully aware that just because they build is over, it doesn’t mean the work is done. Hopefully, the best is yet to come.

PASS BA Chicago

I’m in Chicago this week, attending the inaugural PASS Business Analytics conference.

It’s been a great experience so far, so I’ll share some of my thoughts and experiences (in no particular order).

-1- Amir Netz (Technical Fellow at Microsoft) is easily the most engaging keynote speaker that I’ve seen. I’ve seen him speak several times, and his energy and enthusiasm are infectious every time. He makes Steve Ballmer – who I saw at a CRM conference in the late 90s – look like Bob Ross composing a mountain landscape. This morning was no different – full of energy and enthusiasm for BI.

During Amir’s Keynote, he shared a great story from his middle-school-aged son, regarding Business Intelligence. My wife is a middle school teacher, and she often comes home with hilarious excerpts of 14-year-old wisdom, so I appreciated the anecdote. Here it is:

Upon hearing that his father would be presenting in his school’s auditorium on the topic of Business Intelligence, Amir’s son did what any sensible kid that age would do – he tried to stop it. When asked why, his son replied with skepticism that such a dry topic would resonate with his classmates (and also drive down his social status). Pressed further, he replied of his schoolmates: “They know nothing of business, and their intelligence is suspect”. The room erupted with laughter as people immediately conjured to mind those to whom the description also applies.

Despite his son’s skepticism, Amir was able to compel the middle-schoolers with a dazzling BI presentation that had the kids wanting more. The takeaway from the anecdote, and from the Keynote in general, was that BI – especially self-service BI – can be fun, and fun can translate to creativity, engagement, and problem-solving.

-2- There is definitely more of a business focus here at the conference than at the PASS Summits that I have been to in the past. There are many sessions dedicated to the strategy, decision making, methodology, etc. involved in BI. These sessions refer to the technology for its strengths and weaknesses, but do not necessarily cover the technical engineering going on under the hood. More focus on the “why” rather than the “how”.

This seems to fit with the challenge that many organizations now face, with the plethora of BI tools available, both within the Microsoft product line (SQL Server, Office, SharePoint, etc), and beyond.

-3- The event is smaller. I read that there are about 900 attendees, and it definitely doesn’t feel like the great horde of geeks (myself included) that descends on Seattle at the PASS Summit each year. I haven’t yet decided if that’s a good thing or not. It just feels different.

-4- The PASS energy is alive and well here. PASS does such a wonderful job – through its board members and volunteers – of propagating a friendly and collaborative community environment. Although there is a newness to this conference that has some attendees moving around and socializing cautiously, you can feel PASS doing its utmost to make sure people engage with each other in the spirit of the PASS axioms: Connect, Share, Learn.

-5- People seem to be better dressed/groomed. Dress shirts are outnumbering t-shirts and wingtips are outnumbering sneakers. I’ve seen blazers, well-ironed slacks, and fewer scraggly, I’ve-been-up-all-night-coding facial hair. (See above, re: point #2 about this being more of a business conference.) I did see one guy wearing a “Save Water, Drink Beer” t-shirt, but he’s the edge case. This has caused me to have to do more ironing that I would have liked, but oh well.

-6- The weather – like in Seattle – is perfectly conducive to staying indoors and learning, writing code, networking, etc. Cool, overcast, rainy weather seems to be a prerequisite that the PASS committee uses when selecting venues. Accordingly, I have repeated my pattern of not seeing any sights, or spending more than 8 total minutes outside (I went to Wallgreens for shaving cream – see point #5 re: clothing and grooming). Would it kill them to have a PASS conference somewhere with some sun? I hear San Francisco is nice.

-7- Access to technical resources is outstanding. Last night, at the Exhibitor Welcome Reception, I walked-up to Lara at the Microsoft BI booth, who introduced me to Chuck, who introduced me to Kay, who spent 30 minutes with me reviewing authentication options for SSRS, SharePoint, and Excel over a hosted platform. Over beers, he diagrammed some options for me in PowerPoint (I know – weird – but it works), and even offered up his email for any follow-up questions. That whole experience would have been very expensive and very time-consuming through the regular channels. There is no SQL Server clinic, but there are several CAT engineers here, as well as many other experts to chat with over breakfast, lunch, or coffee. The people who I have chatted with have been candid about where they’ve had success, and where they’ve had problems, and I find it very helpful. There’s none of the arrogance and hubris that sometimes comes with discussing things in online forums – very refreshing.

-8- Big Data is… what everyone is talking about, but in a weird way. Definitions are vague (the best I’ve heard is “Volume, Velocity, and Variety”), not many people seem to be actually working with it in a meaningful way. There is a coolness factor to it that has everyone’s attention, though, despite the seeming lack of actual experience with it. Some of the conversations that I’ve overheard remind me of listening to my parents talk about “the internet” 10 years ago: the nomenclature is vague or incorrect, concepts are loose, and experience is limited, but they know it’s out there, and they know it’s cool.

I’ll sign off now – there’s a shirt at the bottom of my bag that needs ironing for tomorrow.

shirt_to_iron

Happy Bushwhacking!

This afternoon I read a thought-provoking article by Andy Warren (@SQLAndy), on the place of the PASS BA Conference within the greater PASS organization. In it, Mr. Warren questions whether the conference is a good idea for PASS, whom it intends to serve, and how it will be judged for success. Tough questions, but fair, and thoughtful.

NB: though I have benefited greatly from his contributions to SQLServerCentral.com, and feel as though I ‘know’ him in that sense, I don’t feel comfortable calling him Andy. :)

Among his many good points, Mr. Warren articulates the conundrum that the PASS board faces: the responsibility to grow in order to remain relevant in the support of its members, without alienating them by tacking in the wrong direction. Though I agree with Mr. Warren in his call for more transparency from PASS, their challenge seems to be less about transparency and more about prescience: how to illuminate a path they’ve never been down.

The emerging technologies behind BI/BA are new to us all, and it would seem that PASS as an organization faces the same opportunities and challenges that database professionals do – an industry opportunity/demand for a new technology, and a natural curiosity feeding the urge to grow in that direction, but no lamp to light the way.

Reading the article got me thinking about my own career path. My introduction to database technology came via a CRM software called GoldMine that used Microsoft SQL Server at the back-end (SQL Server 7, at the time). I started as a sales consultant, and then trainer, then in-house tech support, and finally on-site technical implementation specialist. After an extended hiatus to teach and travel in Asia, I cut out the middle-man and dove headlong into SQL Server. I have been working exclusively with SQL Server for the past 7 years, picking up an MCTS certification, and some departmental management responsibilities along the way. Each transition in my career has been in response to a combination of opportunities from outside forces (the job market, namely), and natural curiosity on my part. Each of those transitions has demanded time spent bushwhacking a new path.

My most recent bit of bushwhacking involved diving into SQL Server Analysis Services, which I did in response to that familiar combination of influences: curiosity and opportunity. A major project at work seemed to be screaming for an OLAP solution, and yet I lacked any real experience in that domain. So at the 2012 PASS Summit- I completely immersed myself in it. The experience was a mixed bag similar to other times where I’ve stretched to learn: frustrating, exhilarating, and exhausting, but ultimately: nourishing (read about it here). I’ve realized that those are the stimulants that I thrive of off, and I think many other IT professionals do as well.

After my OLAP Sprint at the PASS Summit last November, I registered for the PASS BA Conference, not because I knew that it was the right move, or that it would deliver exactly what I needed to continue along my path, but because I was curious, and BI/BA seems to have great opportunities and momentum right now.

If PASS is meant to represent us database professionals – as a sort of macrocosm or amalgam of each of us – maybe I shouldn’t be surprised that their growth feels similar to mine: imperfect, for sure, but nourishing, and filled with curiosity and opportunity. Happy Bushwhacking!

Database Documentation

A Chore Complete (Almost).
I have just finished documenting most of my database and I have mixed feelings about it. There’s a feeling of satisfaction and relief, but also a strong sense of time wasted. It was a chore.

The Grisly Details.
Thus far, I have documented 140 tables and 117 stored procedures. I wrote a total of 9,748 words, which is an average of 38 words per object, or about 2 -3 sentences. When I first took a look at the task it seemed insurmountable, so I took the incremental approach, documenting somewhere between 5 and 15 objects every morning with my cup of coffee. The challenge was staying focused, and not getting lazy. I’ve yet to document functions, triggers, and views, but their numbers are small, and I’m not intimidated anymore. Continue reading

OLAP Sprint, Day 7: 9:00pm: Parting Shots (of Glenlivet 12 Y/O)

PASS Summit 2012 Day 3
Up at the crack of dawn for Lynn Langit’s great session on Amazon’s cloud offerings, called AWS. They’ve got several managed service offerings of interest for DBAs, including:

  • EC2: a VM service
  • S3: a storage service
  • Glacier: a backup service
  • RDS: a managed database service

The most compelling for me was obviously RDS, the managed database service. According to Lynn, AWS has been in the market the longest, is usually the most cost-effective, and is most often used in production (highest level of confidence). That’s a compelling trifecta. Continue reading

OLAP Sprint, Day 6: 11:20pm: Dimension Security for the Common Man

PASS Summit 2012 Day 2
Unfortunately, I accidentally slept through this morning’s Keynote address. I was sorry to have missed it, but I think I needed the bit of extra sleep. I’ve been putting in a lot of hours each day, both on my OLAP project, and soaking in all that the PASS Summit has to offer, so it was nice to get an extra hour of sleep. I wish I could say that I woke up to the sun streaming in through the windows, but I don’t think I’ve seen the sun once this week. Nobody at the PASS Summit seems to mind, though – myself included. Clearly we’re a sheltered people.

I attended some great sessions today. My favourite was a session on the ColumnStore index, delivered by Klaus Aschenbrenner. As I noted yesterday, the access to high-level expertise is energizing and inspiring, and Klaus’ session was no different. I went in to the session knowing just about nothing about ColumnStore indexes, and came out wanting to try them out. From what I gathered, they’re a new type of non-clustered index suitable for Fact tables in data warehouses. The indexes work on the column level, rather than the row level, so the level of compression that they can achieve is impressive, and it parlays into ridiculous query speed. Klaus took us through an exploration of the storage and memory footprint, the execution plan, restrictions and optimizations, and a highlight reel of ‘watch this’ moments where the ColumnStore index ripped through 2 billion records of data in seconds.

I also attended a Power Pivot for MOLAP session, and got to see the coveted tool in action, popping a multidimensional cube to life. I spoke with a Microsoft Engineer at the end of the session, and he hinted that it might be released as early as December. He did caution that “the QA team will kill us if we announce an official date”, so I guess they’re not quite ready for release yet.

Buffet Innovation
Lunch was dynamite once again (salad, beans, tomatoes, mashed potatoes and salmon), and while standing in the buffet line, I thought of a great idea: a buffet fly-over. It would be something similar to the fly-overs that you see watching golf on TV – a camera that swoops over the hole and gives the viewer a sense of how it plays (hazards, fairways, optimal lies, etc). The challenge at the front end of the buffet line is knowing what lies ahead, and how to ration the real estate on your plate. A buffet fly-over video playing on a loop at the front of the line would solve that.

Okay – back to reality. Continue reading

OLAP Sprint, Day 5: 11:30pm: The 34 Were Heard!

PASS Summit 2012 Day 1
Day 5 of my OLAP sprint – which I can now more accurately term a MOLAP sprint (or maybe MOLAP meander) – began with the official opening of PASS Summit 2012. DBAs, BI Developers, etc. from near and far packed into the ballroom at the Seattle Convention Center to hear Ted Kummert (VP of the Business Platform Division at Microsoft) give a keynote address, highlighting some of the new features of SQL Server 2012, as well as giving a sense of Microsoft’s vision for the future of the SQL Server product stack. There were some good demos of things like columnstore indexes, and a few interesting announcements, including:

  • SQL Server 2012, SP1 is now available for download
  • Project ‘Hekaton’, which will introduce in-memory processing for OLTP databases. This is scheduled to be released in ‘the next major release’ of SQL Server.

You can catch the Keynote and all the demos here if you missed it.

Amir Netz and Angelina Jolie
The unequivocal star of the show this morning was Amir Netz, a Technical Fellow at Microsoft, who works as an architect on their BI products. The title ‘Technical Fellow’ is one I enjoy. It’s refreshingly clear and unpretentious compared to the litany of acronyms that often follow people’s names (myself included, an MCTS) and yet it’s also somehow humorous in an old-timey, British way. The weight that the title bears, however, is nothing to smirk at – Technical Fellows are the highest level of technical leadership at Microsoft, and apparently there aren’t very many of them at all.

The most engaging thing about Amir is his passion and exuberance for BI. He took us through a Power View demonstration that brought the following conclusion to bear: if you want to make profitable movies with Angelina Jolie, cast her in animated movies where you can’t see her on screen.

Angelina Jolie in “Shark Tales”

Continue reading