Montag, 30. November 2009

duplicate key error

Hilmar Buchta explains in his Blog A duplicate attribute key has been found when processing… some common reasons for duplicate key errors in processing error messages. The most likely reasons are:
  • NULL values in the attribute key column.
  • data that violates the attribute relationships defined in the dimension.
  • A third reason is that there are separate key and name source fields for an attribute, so there is more than one name corresponding to a key.
  • A related issue results from collation effects: distinct string values from an oracle source are considered as duplicates by SSAS because they only differ in capitalization (upper/lower case), trimming or some strange characters (as the german "ß" that will be replaced by "ss" or "ä", "ü", "ö" etc.).

Freitag, 20. November 2009

MDX Query Performance in SSAS

there are a few documents you should know if you want to optimize the performance of the MDX queries on your SSAS cubes:

The document SQL Server 2008: Analysis Services Performance Guide (download here) provides information on the tuning of server resources and shows strategies to enhance the performance of cube processing and query access.

The paper Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services (download here) discusses the troubleshooting tools integrated into Microsoft Windows and SQL Server and shows how to use these tools to identify and resolve MDX query performance bottlenecks with individual MDX queries. Most of the given advices are still true for SSAS 2008 (as far as I can see...)

Some interesting papers and articles can be found at the web site of the Microsoft SQL Server Development Customer Advisory Team.

Unfortunatly there are many advices which were of interest in SSAS 2005 but are no longer relevant in SSAS 2008 - so you have to be very careful when adopting the strategies shown in various blogs and newsgroups in the Net.

Donnerstag, 19. November 2009

Centrally Managing Formatting Properties in MDX Scripts

This is a quite current goody already sketched by Marc Russo:

Formatting is indeed in the twilight zone between server/analytic logic and client display issues. Originally you would want your cube to be agnostic of all that representational stuff which may change with the eye of the beholder.

On the other hand, it is unbearable to walk through the conditinal formatting result of ~100 affected measures * 30 affected reports once a formatting rule changes.

A good compromise between the "this-is-client-stuff" and the "incorporate-it-in-each-and-every-measure' is using the power of MDX scripting. The scoping approach that Marc Russo describes (and from which I learned the invaluable [Measure].CurrentMember trick) will not work, as soon as
you need to address targetted measures (such as "all currencies" or "all indexes") across calculated and real measures (an explicit measure set in a scope may only contain non-calculated measures).

So here is how it works in these cases (and similar for FORE_COLOR, BACK_COLOR, etc):

FORMAT_STRING({
 [Measures].[REAL_MEASURE_1]
 ,[Measures].[CALCULATED_MEASURE_2]
 ,...
 })=// YOUR MDX CALCULATION using [Measures].CurrentMember and returning a format string
 ;

When the Scoping gets tough ...

... the tough gets scoped.

There is a hairy observation to make that I would usually classify under the catgory "bug".

In arrogant ignorance of Microsofts issue tracking systems and even more, in absence of enough terminological background to even state a reasonable issue search query, we currently file it under "phenomenon with a workaround".

Imagine that you have developed an elaborate date hierarchy with two "branching" hierarchies, one for calendar weeks and one for months, quarters, years andsoforth. Branching means that from the point of view of attribute relationships, there are two outgoing relation branches, one in the attributes of the week hierarchy and one into the attributes of the month hierarchy.

From our experiments, calculations specified "purely" for the month scope will nevertheless also hit the week level of the parallel hierarchy just like if SSAS would rather parse the statement as "operate on (all) successor members of date".

SCOPE([DIM_DATE].[HIER_YQM].[LVL_MONTH].Members);
 // Do the usual nasty overrides and calculations
END SCOPE;

To get the behaviour right, we are now used to rather write such temporal scopes as

SCOPE([DIM_DATE].[HIER_YQM].[LVL_MONTH].Members,
      [DIM_DATE].[HIER_WEEK].[LVL_WEEK].[All],
      [DIM_DATE].[HIER_WEEK].[LVL_WEEK_YEAR].[All]);
 // Do the usual nasty overrides and calculations
END SCOPE;

where it is interesting that leaving the second all restriction away would reinforce the overrides at the calender year level nevertheless.

Still the behaviour (under the many strange behaviours of SSAS) leaves us puzzled. But I´m sure that there is a tough explanation for this in some other universe.

Mittwoch, 18. November 2009

BI Methodology

Marco Russo and Alberto Ferrari have written an interesting paper presenting their Methodology for Microsoft BI (September 2008) which can be downloaded from here. In the paper they discuss their approach in the context of the methodologies of Bill Inmon and Ralf Kimball and draw a picture of the interaction of OLTP database, Data Warehouse and OLAP Cube.

MDX Studio

As Mosha Pasumansky states MDX Studio is a
tool which helps users of Microsoft Analysis Services to analyze complex MDX expressions, monitor performance characteristics of MDX queries and get insight into how MDX interacts with other UDM features such as attribute relationships. MDX Studio provides unique visualization representation of MDX expression and allows the user to interact with different stages of MDX execution.
It's indeed an extremly useful tool: it shows valuable information that is not included in profiler traces such as the number of cache hits, calculated cells, memory usage etc. Another important feature is the ability to show a hierarchical tree of execution steps (which Mosha describes here). Very helpful is also the ANALYZE option that (- it's not a surprise -) analyzes a given query or mdx script, shows possible issues, and provides links to wider explanations (mainly from Moshas Blog).

The current version of the tool (a work in progress) can be downloaded from here.

Dienstag, 17. November 2009

EXISTING keyword

As the documentation states the EXISTING keyword forces a specified set to be evaluated within the current context instead of being evaluated within the context of the cube that contains the members of the set. Mosha Pasumasky declares "that Exists function and EXISTING operator are generalization of hierarchy related functions such as Descendants. Indeed, the process of building an hierarchy is nothing more then applying Exists between attributes which build individual levels of that hierarchy." So EXISTING shows only elements that are actually relevant in the given context.

If we are looking for the number of cities in each country in the Adventure Works Cube and use the following query:

WITH
MEMBER [CITY_COUNT] AS COUNT([GEOGRAPHY].[CITY].[CITY].MEMBERS)
SELECT {[CITY_COUNT]} ON 0
,[GEOGRAPHY].[GEOGRAPHY].[COUNTRY].MEMBERS ON 1
FROM [ADVENTURE WORKS]

we do not get the intended result:

CITY_COUNT
Australia 587
Canada 587
France 587
Germany 587
United Kingdom 587
United States 587

That's the total count of cities for every country - apparently the evaluation took place in the context of the cube and not in the current context. With EXISTING we get what we want:

WITH
MEMBER [CITY_COUNT] AS COUNT(EXISTING([GEOGRAPHY].[CITY].[CITY].MEMBERS))
SELECT {[CITY_COUNT]} ON 0
,[GEOGRAPHY].[GEOGRAPHY].[COUNTRY].MEMBERS ON 1
FROM [ADVENTURE WORKS]

provides the result:

CITY_COUNT
Australia 40
Canada 50
France 36
Germany 51
United Kingdom 35
United States 375

As already mentioned here EXISTING can also be used to solve some problems with multiselects.

In his Blog Chris Webb wrote about the performance of EXISTING in combination with NONEMPTY and stated "if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing" - but admitted that he has no sound explanation for the observed effect.

Multiselect

Multiselects are a source of constant discomfort in SSAS 2005 ff. When there are multiple members in the slice of a MDX query the currentmember function no longer defines a single element and that's the reason for many problems with calculated members. Some of these problems can be solved by the use of special syntax elements (most notably the EXISTING function) but client tools like Excel 2007 generate their own queries so the developer can not control the design (and Excel makes things even worse by using subselects which bring even more annoyance than simple where-clause multiselects).

At least there are some solutions to avoid the most unpleasent consequences - and most of these solutions come from Mosha Pasumansky:

In his blog articles Writing multiselect friendly MDX calculations and Multiselect friendly MDX for calculations looking at current coordinate Mosha Pasumansky shows a way to avoid the multiselect problems by using the function EXISTING and sum up the results. Another solution (using a stored procedure) is shown in How to detect subselect inside MDX calculations (aka Multiselect in Excel 2007) - but Mosha warns "that the solution is neither elegant nor efficient, and probably is mostly of the theoretical interest." In his article I need your help or how to make multiselect work seamlessly in MDX Mosha suggests a solution for all multiselect problems - but it seems he kept that secret...

The blog Korrekte MDX-Berechnungen mit den Analysis Services trotz Multiselect makes the suggestion to define "empty" measures (initially filled with NULL values) and then to overwrite them with scoped assignments on every relevant level - so correct values can be aggregated for higher levels.

All these solutions have an impact on performance, so Mosha writes: "Therefore, it is certainly a choice of cube designer - keep the simpler and more performance friendly expression, but accept that multiselect on the Time dimension will cause errors, or allow more functionality and enable multiselects on Time, but be prepared for slower performance in all cases."

Nested Scopes and Attribute Relationships

Nested Scopes are a convenient syntax to structure subcube/crossjoin calculations in your MDX scripts. Most often, you would want to use more than one attribute from the same dimension in such a context and, in addition, navigate the associated hierarchies in the scoped calculations:

SCOPE({[DIMENSION].[HIERARCHY_1].[LEVEL_OR_ATTRIBUTE_1].&[KEY_1]},...);
   SCOPE({[DIMENSION].[HIERARCHY_2].[LEVEL_OR_ATTRIBUTE_2].&[KEY_2]},...);
        // Your calculation involving [DIMENSION].[HIERARCHY_1].CurrentMember
        // will result in navigating [DIMENSION].[ATTRIBUTE_1].[All]!
   END SCOPE;
END SCOPE;

However, with nested scopes it seems that similar overriding rules than with concatenating ordinary tuples apply: Suppose the attribute ATTRIBUTE_2 associated with the LEVEL_OR_ATTRIBUTE_2 level is related to the attribute ATTRIBUTE_1 (i.e., KEY_2 is unique for KEY_1, in contrast to the related from relationship where KEY_1 is unique for KEY_2), the example calculation context above would result to the All member of ATTRIBUTE_1. Duh.

Writing the nesting the other way round (which would seem semantically equivalent from only superficially reading the specs, as we usually do) the delivers the expected result

SCOPE({[DIMENSION].[HIERARCHY_2].[LEVEL_OR_ATTRIBUTE_2].&[KEY_2]},...);
    SCOPE({[DIMENSION].[HIERARCHY_1].[LEVEL_OR_ATTRIBUTE_1].&[KEY_1]},...);
        // Your calculation involving [DIMENSION].[HIERARCHY_1].CurrentMember
        // will finally navigate [DIMENSION].[ATTRIBUTE_1].&[KEY_1]!
    END SCOPE;
END SCOPE;