I am playing around in my head with the idea of automated regression testing of a cube, part of a cube, or data reached through a cube. We have a fairly complex import of production data into a small data warehouse. Sometimes we or the users spend significant time manually checking values in the cube or in reports, and I was thinking of trying to automate some of that checking. I poked around online and didn't find anything, so I am posting here.
Can anyone point me to articles or examples of automated regression testing of SSAS cubes? By "regression testing" I mean comparing current results with expected results and recognizing differences between them.
Here are some ideas I've had. They probably reflect my ignorance and lack of clarity:
Import results into Excel and use some Excel comparison utility
Import results into Excel, save as XML, and use a diff utility
Use a diff utility on some XML representation of cube values
Automate some sort of programmatic query (SSIS script task? PowerShell?) -- see the sketch below
This is not a high-volume application and the cube is not very complex.
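To make the last idea concrete, here is a minimal sketch of the PowerShell route. It assumes the ADOMD.NET client library is installed; the server name, catalog, MDX query, and output path are all placeholders you would replace with your own.

# Capture one cube query as a flattened rowset and save it as a baseline.
# Requires the ADOMD.NET client (Microsoft.AnalysisServices.AdomdClient).
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices.AdomdClient")

$conn = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection(
            "Data Source=MyServer;Catalog=MyWarehouseDB")
$conn.Open()

# Placeholder MDX -- in practice you would keep a small library of queries that
# cover the measures and slices you care about.
$mdx = @"
SELECT { [Measures].[Sales Amount] } ON COLUMNS,
       NON EMPTY [Product].[Category].Members ON ROWS
FROM [MyCube]
"@

$cmd = $conn.CreateCommand()
$cmd.CommandText = $mdx

$adapter = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdDataAdapter($cmd)
$table   = New-Object System.Data.DataTable
[void]$adapter.Fill($table)
$conn.Close()

# Export only the data columns so the file can be diffed cleanly later.
$columns = $table.Columns | ForEach-Object { $_.ColumnName }
$table | Select-Object $columns |
    Export-Csv -Path "C:\CubeBaselines\sales_by_category.csv" -NoTypeInformation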
|||
You could try to create a measure in your cube that uses linear regression to forecast your last period (e.g. the month or day you are importing) and then check this value against your real ones.
Look at Mosha's blog (http://sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx) for a starting point on linear regression.
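Roughly, the idea might look like the sketch below, here as a query-scoped calculation that could be swapped into the capture script in your question. The hierarchy, measure, and period names are only placeholders, and Mosha's post covers the details of LinRegPoint.

$mdx = @"
WITH MEMBER [Measures].[Forecast] AS
  LinRegPoint(
    Rank([Date].[Month].CurrentMember, [Date].[Month].[Month].Members),
    LastPeriods(12, [Date].[Month].CurrentMember.PrevMember),
    [Measures].[Sales Amount],
    Rank([Date].[Month].CurrentMember, [Date].[Month].[Month].Members)
  )
SELECT { [Measures].[Sales Amount], [Measures].[Forecast] } ON COLUMNS,
       [Date].[Month].[Month].Members ON ROWS
FROM [MyCube]
"@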
kind regards
markus
|||
I think that idea (an interesting one, by the way) addresses a different problem from the one I'm trying to ask about. So I will try to be clearer.
We have a complex SSIS process that imports data into our database and populates our cube. When we change parts of the process or the cube definition, I want to reassure myself that we haven't broken other things we didn't intend to change. I'd also like to be able to validate the setup when it's first deployed on a new server.

My notion was that for a standard set of input data I'd somehow capture results. Then I'd run the same set of data through the modified or newly deployed setup and capture results from that. Then, by some automated process (analogous to diff for simple text files), I'd compare the standard saved results with the newly captured results and identify differences. In the case of a new deployment there ought not to be any differences; in the case of changes there ought to be only the differences intended as a result of the changes.

In all cases, of course, one has to handle (by exclusion or substitution) things which look different but really aren't -- e.g., newly assigned sequential integer keys, server names, and the like.
So the list of alternatives in my first message speculates about automated (or semi-automated) ways to capture result sets and automated ways to compare them. Importing defined parts of the cube into Excel seemed a possible way to grab cube results reliably and consistently, and XML files seemed a static, human-inspectable way to store results. Smart XML diff engines (ones that don't do mere text comparison but pay attention to the XML elements and hierarchies) seemed a possible automated way to compare result sets.
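To illustrate the comparison step, here is a minimal sketch in the same PowerShell vein, assuming both runs were captured to CSV as in the earlier sketch. The file paths and the "volatile" columns to ignore (surrogate keys, server names, and so on) are placeholder assumptions.

# Compare a saved baseline capture against a fresh capture and report differences.
$volatile = @('LoadKey', 'ServerName')   # hypothetical columns that legitimately differ

$baseline = Import-Csv "C:\CubeBaselines\sales_by_category.csv" |
            Select-Object * -ExcludeProperty $volatile
$current  = Import-Csv "C:\CubeCaptures\sales_by_category.csv" |
            Select-Object * -ExcludeProperty $volatile

# Compare on every remaining column; anything reported is a real difference.
$columns = $baseline[0].PSObject.Properties.Name
$diff = Compare-Object -ReferenceObject $baseline -DifferenceObject $current -Property $columns

if ($diff) {
    $diff | Format-Table -AutoSize
    Write-Warning "Cube results differ from the baseline."
} else {
    Write-Host "No differences from the baseline."
}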