Freitag, 29. Januar 2010

The difference between Order By Name and Order By Key in trivial attributes

This is a straigthforward follow-up to the last post.

Suppose you have a trivial attribute, i.e., its key is mapped to a single column of a relational data source. Its name and value has been ommitted, i.e., corresponds to a representation of the key.

Now Order By Name (which should be the default) will lead to a lexicographic order based on the representation while only Order By Key will use the natural ordering of the underlying data type.

I´m sure you or the various books of your choices did know this already. But I didn´t.

Dienstag, 26. Januar 2010

The difference between Names and descriptive Attributes

Depending on the configuration of your SSAS landscape (DB/OS/Locale of the data source, OS/Server Locale and Column Collation of the SSAS machine/cluster), you can get very strange processing errors in your dimensions especially related to character-based attributes/names.

For example, the default collation settings in SSAS (ignore whitespaces, case-insensitivity) will usually unify more (European/German-rooted) strings than a "distinct" selection in the database of your choice. NULL entries which SSAS cannot stand at all in dimension processing are also a constant cause of problems in that respect ...

If you now wonder, why this leads to processing errors in certain attributes (duplicate key error) and not in others, please remind the setup of each attribute into a key and a name column (the latter pointing to the former in the standard-case).

One of the fundamental differences between the two is that a "select distinct" of the key columns MUST NOT have two rows which will be unified by SSAS collation processing while this is not necessarily true for name columns in which case an arbitrary value is taken as the representative.

Quasi-Cursor-Reading of Large Cellsets

One annoyance especially when batch-testing Cubes/Dimensions against the relational data source is that the MDX/XMLA-Architecture has no concept of continuation or open-cursor-reading.

And we all know how soon SSMS or MDX Studio present an OutOfMemoryException!

One particularly useful construct in that regard is the
SubSet(Set,Position,Length)
MDX-function which can be easily used to technically partition large result spaces into manageable junks.

Montag, 25. Januar 2010

Grouping in SQL and MDX

In his Blog MDX equivalent of a filtered GROUP BY in SQL Darren Gosbell shows how to translate a SQL query (with a restriction in the WHERE-clause and a GROUP BY aggregating elements of a higher hierarchy level) into a corresponding MDX query. Although MDX looks like SQL it's something completly different, so every translation is beneficial.

Freitag, 22. Januar 2010

Strange Perspectives

In SSAS Persepctives are a very useful tool to control the visibility of objects (dimesions, measures, hierarchies etc.). They represent subsets of cubes and define Views with the relevant data for a specific audience. In the Perspectives tab of the cube designer in BIDS I can exclude objects, that I want to hide from my users. But obviously sometimes SSAS does not believe that I know what I do:

In the following example I defined a simple cube with only one measure group and some dimensions. In addition I created a calculation TestQuantity. In the Perspectives tab I choosed to include all the measures and the additional calculation:











In the browser we see the expected elements:















Now I exclude some of my measures from the Perspective:











And again I get the expected selection:












But when I exclude all the basic measures and want to include only my calculation, SSAS apparently starts to question my accountability:











The result again contains all the measures of the group, I wanted to exclude:















Perhaps SSAS believes that a cube (or a cube view) without measures is not very beneficial, but since the calculation TestQuantity was not excluded, I see no reason for this implicit addition of objects.

For the sake of completeness a link to William Pearson's article on Perspectives in the Database Journal and another one to the (books online) documentation.

Mittwoch, 20. Januar 2010

Official release date for SQL Server 2008 R2

According to the Data Platform Insider Blog ''SQL Server 2008 R2'' received an official release date and will be available by May 2010.

Sonntag, 10. Januar 2010

Dynamic Management Views (DMV) in SSAS

Vincent Rainardi wrote in his Blog SSAS DMV (Dynamic Management View) on a topic we tried already to cover here some weeks ago. The article provides some interesting links to Blogs that explain who to join the results of different DMVs together.

Mittwoch, 6. Januar 2010

Debugging Attribute Relationships in the ROLAP Basis

I guess that this is quite obvious and could as well be published in our ROLAP blog, but just for the sake of completeness:

The following SQL-Statement lists all values of a source attribute
src
in your dimension source table/view
dim
which violate an attribute relationship (path) to a destination attribute
dest
.

select src from dim group by src having count(distinct dest)>1

Extending this a bit delivers all the bad rows:

select * from dim where src in (select src from dim group by src having count(distinct dest)>1)

Dienstag, 5. Januar 2010

Excel and MDX multiselect problems revisited

Multiselect problems ins SSAS 2005 ff come in several forms. Today, I want to take a look at several forms of multiselect problems, how they interact with Excel as a MDX client and how they can be avoided. Let's first look a problem that can be easily detected.

The current coordinate is a set


In his blog article Multiselect friendly MDX for calculations looking at current coordinate
Mosha Pasumansky explains how multiselect problems that use a set in the current coordinate occur and how they can be avoided. Let's show an example for the classical Adventure Works Cube. I use SSAS 2008 and Excel 2007. Let's define the sales amount for the year before as follows:

create member [Measures].[Prior Year Sales Amount] as ([Measures].[Sales Amount],ParallelPeriod([Date].[Calendar].[Calendar Year],1,[Date].[Calendar].CurrentMember));

Let's look at a simple example in Excel 2008:






So far, it works fine, now let's filter on the dates, first on calendar year 2002.

It still works fine. Now let's filter on two months, say May and June 2002. Here's the result.
It doesn't work anymore because now, the current coordinate is a set. Here's the query:

SELECT {[Measures].[Sales Amount],[Measures].[Prior Year Sales Amount]} ON COLUMNS , NON EMPTY Hierarchize({DrilldownLevel({[Product].[Product Categories].[All Products]})}) ON ROWS FROM (SELECT ({[Date].[Calendar].[Month].&[2002]&[5],[Date].[Calendar].[Month].&[2002]&[6]}) ON COLUMNS FROM [Adventure Works])

The set occurs in the subselect that Excel 2007 uses to implement the filter. If we do the same in Excel 2003, we get a similar result. it doesn't work but we get a least an error.

Now, the query uses a where clause and a calculated member to represent the set. Here's the query:

WITH MEMBER [Date].[Calendar].[XL_QZX] AS 'Aggregate({[Date].[Calendar].[Month].&[2002]&[6],[Date].[Calendar].[Month].&[2002]&[5]})' SELECT {[Measures].[Sales Amount],[Measures].[Prior Year Sales Amount]} ON COLUMNS , NON EMPTY Hierarchize(AddCalculatedMembers({DrilldownLevel({[Product].[Product Categories].[All Products]})})) ON ROWS FROM [Adventure Works] WHERE ([Date].[Calendar].[XL_QZX])

The error basically states the the CurrentMember function, which is used inside the ParallelPeriods function of our calculated measure [Prior Year Sales Amount] now contains a set.
A simple try of the same in Excel 2010 beta also shows that is currently does not work (even if you use the new SSAS 2008 R2 CTP).

Let's summarize: Calculations that use CurrentMember (or functions that use CurrentMember) are not multiselect-safe in Excel 2003, 2007 and ,at least up to now, in Excel 2010.

A possible workaround is described in Korrekte MDX-Berechnungen mit den Analysis Services trotz multiselect. The idea is to add a named calcuation in the data source view and to scope this calculation later on. Alternatively, you can just live with it, you just get no result or an error.

subselects and the current coordinate´problem



While the first problem can be easily detected Excel(you get no results or an error), the following problem is more subtle and cannot easily be detected. It occurs in Excel version 2007 (at up to now in 2010 beta). Let us again show an example in the adventure works cube. We define

create member [Measures].[Number of Products]

as

Count(Existing([Product].[Product].[Product].MEMBERS));



Here is a simple excel 2007 example:




Let's now filter on the products dimension.We restrict to Accessories and Bikes and also leave out the touring bikes. Here's the result:



Well nice try, but unfortunately a wrong result. The visual totals are wrong, the sum of bikes should be 103 and the total 138. Let's look at the query:

SELECT NON EMPTY
Hierarchize(DrilldownMember({{DrilldownLevel({[Product].[Product Categories].[All Products]})}}, {[Product].[Product Categories].[Category].&[1]})) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,[Product].[Product Categories].[Subcategory].[Category] ON COLUMNS
FROM
(SELECT ({[Product].[Product Categories].[Subcategory].&[2], [Product].[Product Categories].[Subcategory].&[1],
[Product].[Product Categories].[Category].&[4]}) ON COLUMNS
FROM [Adventure Works]) WHERE ([Measures].[Number of Products]
)


The filter step in Excel 2007 is implemented by a subselect. Unfortunately, the subselect does not change the current coordinate and thus the query execution context is the [All] member of the product dimension. And indead, the totals and subtotals are those for an unfiltered query.

A solution is to rely on a feature of SSAS2008, dynamic sets. The following solves the problem for SSAS 2008 and Excel 2007 and 2010.

create dynamic set Products as [Product].[Product].[Product]; create member[Measures].[Number of Products]
as
count(existing(Products));

Here's the result:


Although Excel 2003 doesn't have subselect problems (it does not generate queries that use subselect), this solution does not work for Excel 2003 (the generated query does not benefit from the dynamic set, i.e., the results are still wrong.

Additionally, the problem still exists if we use different measures like Adventure Works
[Measures].[Ratio to Parent Product] because here, dynamic sets cannot help to solve the problem. Let's show this in the next example:

Now let's filter again:


Wow, amazing, all ratio to parent values are wrong, column real ratio gives you the correct values. Again, this is the effect of the subselect, that implements the filter. The current coordinate does not change, thus you see values of the form sum of filtered bikes / unfiltered parent of bikes, i.e. the 73,15% in the bikes row is equal to $80.324,94 / $109.809.274,20 (unfiltered total). The real ratios are in column "real ratio".

This also does not work in Excel 2010, but Excel 2003 computes correct values:


A solution to solve this problem in Excel 2007 and 2010 uses the same idea above, adding a hard measure (named calculation) together with scoping.

Let's summarize our results:

"The current coordinate is a set"-problem occurs in Excel versions 2003, 2007 and 2010 beta. You can either live with it (your get no result or an error) or you have to define a hard measure and scope it.

The "subselect and current coordinate"-problem occurs only in Excel versions 2007 and 2010 beta as these Excel versions rely on subselect to implement filters. Sometimes this problem can be circumvented by the use of dynamic sets (if your calculated measures relies on dynamically counting a set). In other cases, you have to define a hard measure and scope it.