Montag, 21. Dezember 2009

Changing results with IIF hints

Recently we saw some strange effects with calculations using conditional IIF logic: a calculation that should show the difference between two measures unless the first measure is 0:
IIF([Measures].[FirstMeasure]=0, NULL,
        ([Measures].[FirstMeasure]-[Measures].[SecondMeasure]))
A query using the calculation returned the expected results for the [All] node of a given hierarchy but wrong results for other levels. Since the given measures are involved in some scoped assignments, we assumed an error in our logic, and were quite surprised when we discovered that a CASE block returned the correct results:
CASE [Measures].[FirstMeasure] 
         WHEN 0 THEN NULL
         ELSE [Measures].[FirstMeasure]-[Measures].[SecondMeasure]
    END
Further research showed that IIF also returned the correct results when we used the EAGER hint in the calculation. The hint is a new feature in SSAS 2008 and documented here. The SQLCAT team explains the hint in the blog post IIF Function Query Hints in SQL Server Analysis Services 2008:
the IIF mdx function is a commonly used expression that can be very costly to evaluate. It takes 3 arguments:

iif(< condition >, < then branch >, < else branch >)

Where the condition evaluates to true, the value from the “then branch” is used otherwise the “else branch” expression is used.

Note the term “used” – one or both branches may be evaluated even if its value is not used. It may be cheaper for the engine to evaluate the expression over the entire space and use it when needed (termed an eager plan) rather than chop up the space into a potentially enormous number of fragments and evaluate only where needed (a strict plan).

Apparently the different execution paths can produce different results - and that is something I would not expect...
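For completeness, this is how the hint would be attached to our original calculation (a sketch; following the syntax from the SQLCAT post, where a HINT EAGER or HINT STRICT can be placed after each branch):

```mdx
IIF([Measures].[FirstMeasure]=0,
    NULL HINT EAGER,
    ([Measures].[FirstMeasure]-[Measures].[SecondMeasure]) HINT EAGER)
```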

Freitag, 18. Dezember 2009

Subtotals for Named Sets

Although still in their infancy, named sets could really help you satisfy your customers' numerous ad-hoc aggregation requirements ...

... if only they would show subtotals in Excel, and if only you could mount non-disjoint instances of those beasts one after the other.

Here is a tip that makes use of the implicit property of named sets to represent intersections when confronted with members of different levels from a common hierarchy.

In its extreme application, adding the [All] member to a set will give you a subtotal that is at the same time correctly labeled "Gesamtergebnis" or "Total" when displayed in Excel.
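A minimal sketch of the idea (dimension, hierarchy and member names are made up for illustration): mixing the [All] member with lower-level members of the same hierarchy in one named set:

```mdx
CREATE SET CURRENTCUBE.[Interesting Periods] AS
{ [Date].[Calendar].[All]                     -- subtotal row, rendered as "Total" in Excel
, [Date].[Calendar].[Calendar Year].&[2008]   -- a year ...
, [Date].[Calendar].[Month].&[2009]&[11]      -- ... and a month from a different year
};
```

Because the members come from different levels of a common hierarchy, the set implicitly represents the intersections described above.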

Do not call a dimension attribute "DESCRIPTION"!

At least the Office Web Components (OWC) browser integrated into BI Development Studio and Management Studio will not show the values as a member property when the attribute (its name, not key or translation) is called "DESCRIPTION".

Donnerstag, 17. Dezember 2009

Federating a Secondary SQL Server Data Source into a Primary-Oracle Data View

It is a common, though tricky, approach to host some, but not all, of the analytical data that is to be fed into the cubeware of your trust in the nearby, type-consistent and easy-to-maintain SQL Server instance.

This methodology also has the nice side effect that it will not give the ... say, Oracle-minded ... data warehouse people much to reason about, so they will not get distracted from their ... um ... daily slee..., ah, work was the politically correct term.

The idea will however bite you the first time you notice that you cannot mix dimensions and facts between two separate Oracle/SQL Server data source views. So you will read diverse blog entries trying to mix the two data sources within a single view.

Oracle will be your primary data source. "Add table" in the view does allow you to choose a registered SQL Server as a secondary data source, but the table chooser stays empty!

If you read the above-mentioned thread to its end, there is the definitive hint, which is "Add Named Query" and ... here you are ... nearly.

Be aware that this operates via some "remote view" feature of the Oracle ODBC driver. For example, Oracle tends to lower-case only the key columns, which can fool you when switching an existing dimension from a separate data source view to the now merged view.

I would not recommend trying it the other way round (Oracle secondary, SQL Server primary data source) unless your primary data source will become bigger than the secondary one ... OK, we still hope that ... since it is to be expected that using this kind of federation for practical volumes of fact data will not be fun.

Mittwoch, 9. Dezember 2009

How to translate the [All] member

I was always hesitant about that issue because I did not want to touch a thousand lines of MDX code in which [Dimension].[Hierarchy].[All] is a common construct, e.g., to exclude unwanted detail calculations from a calculation scope.

But by default, there is no translation row in the dimension editor that allows one to override the display name of that ubiquitous and ugly (though comprehensible) node.

Here comes the (straightforward) step: simply set the property "AttributeAllMemberName" of the relevant dimensions explicitly to the default value "All" and - voilà - you have the translation rows at your disposal without having to reinterpret your scripts.

Please be aware that the member must also be (re-)translated for all the hierarchies (these rows will also appear automatically). Similarly, you can treat the "UnknownMemberName" ...

Montag, 7. Dezember 2009

Non-Pivot-Friendly MDX through Excel 2007 [Updated]

I am sooooo glad for Greg Galloway's tiny little comment on this post.

Because we were looking for a solution to integrate MDX queries into Excel which are not very pivot-table friendly.

Why would you ever want to do something esoteric like that? Suppose you want to compare very specific data points in your tuple space with each other (a monthly real margin from one year in one store with a daily plan margin from a region in a completely different year). Then your pivot table will necessarily cross-join all the individual coordinate components (year/month/day(s) of both years, affected region/store(s), plan & real measures) and come up with a result set that is orders of magnitude too large and cannot be reasonably filtered without going nuts.

That is where you should rather build up a small (remember: SMALL in the sense of aggregated fact data) pivot table, do a drill-through, and then save the resulting query table as a nice template for your future experiments.

I have not been able to create that query table by any other means in Excel 2007!

Update:

There is another way of executing and displaying MDX, described by Hilmar Buchta, which might be more useful when you need full (VBA) control over the connection/command behaviour, e.g., when you first need to check whether the result is empty (and would hence destroy the laid-out table structure).

Freitag, 4. Dezember 2009

Monitoring current SSAS activities

A question DBAs are always interested in is: who is connected to my database and what are they doing? For SSAS 2008 there are some Dynamic Management Views (DMVs) to answer this question (and many others). The query:
SELECT TABLE_NAME
  FROM $system.dbschema_tables
 WHERE TABLE_SCHEMA = '$SYSTEM'
   AND LEFT(TABLE_NAME,8) = 'DISCOVER'
 ORDER BY table_name
provides the following list of tables (I added links to and descriptions from the documentation; some of the linked pages are not very extensive):

- DISCOVER_COMMAND_OBJECTS: Provides resource usage and activity information about the objects in use by the referenced command.
- DISCOVER_COMMANDS: Provides resource usage and activity information about the currently executing or last executed commands in the opened connections on the server.
- DISCOVER_CONNECTIONS: Provides resource usage and activity information about the currently opened connections on the server.
- DISCOVER_DB_CONNECTIONS: Provides resource usage and activity information about the currently opened connections from the server to a database.
- DISCOVER_DIMENSION_STAT: This schema rowset returns statistics on the specified dimension.
- DISCOVER_ENUMERATORS: Returns a list of names, data types, and enumeration values of enumerators supported by the Microsoft XML for Analysis (XMLA) provider for a specific data source.
- DISCOVER_INSTANCES: Describes the instances on the server.
- DISCOVER_JOBS: Provides information about the active jobs executing on the server.
- DISCOVER_KEYWORDS: Returns information about keywords reserved by the Microsoft XML for Analysis (XMLA) provider.
- DISCOVER_LITERALS: Returns information about literals, including data types and values, supported by the Microsoft XML for Analysis (XMLA) provider.
- DISCOVER_LOCKS: Provides information about the current standing locks on the server.
- DISCOVER_MASTER_KEY: This schema rowset returns the server's master encryption key (the key that is used by the server to encrypt sensitive data on disk).
- DISCOVER_MEMORYGRANT: This rowset returns a list of internal memory quota grants that are taken by jobs that are currently running on the server.
- DISCOVER_MEMORYUSAGE: This schema rowset returns the DISCOVER_MEMORYUSAGE statistics for various objects allocated by the server.
- DISCOVER_OBJECT_ACTIVITY: This schema rowset provides resource usage per object since the start of the service. Some further information on this view can be found in Vidas Matelis' Blog.
- DISCOVER_OBJECT_MEMORY_USAGE: Provides information about memory resources used by objects.
- DISCOVER_PARTITION_DIMENSION_STAT: This schema rowset returns statistics on the dimension associated with a partition.
- DISCOVER_PARTITION_STAT: This schema rowset returns statistics on aggregations in a particular partition.
- DISCOVER_PERFORMANCE_COUNTERS: This schema rowset returns the value of one or more specified performance counters.
- DISCOVER_PROPERTIES: Returns a list of information and values about the standard and provider-specific properties that are supported by the Microsoft XML for Analysis (XMLA) provider for the specified data source.
- DISCOVER_SCHEMA_ROWSETS: Returns the names, restrictions, description, and other information for all enumeration values and any additional provider-specific enumeration values supported by the Microsoft XML for Analysis (XMLA) provider.
- DISCOVER_SESSIONS: Provides resource usage and activity information about the currently opened sessions on the server.
- DISCOVER_TRACE_COLUMNS: This schema rowset contains the DISCOVER_TRACE_COLUMNS schema rowset.
- DISCOVER_TRACE_DEFINITION_PROVIDERINFO: This schema rowset returns basic information about the trace provider, such as its name and description.
- DISCOVER_TRACE_EVENT_CATEGORIES: This schema rowset contains the DISCOVER_TRACE_EVENT_CATEGORIES schema rowset.
- DISCOVER_TRACES: This schema rowset contains the DISCOVER_TRACES schema rowset.
- DISCOVER_TRANSACTIONS: This schema rowset returns the current set of pending transactions on the system.

Some of these views are restricted schema rowsets and have to be queried with the function SYSTEMRESTRICTSCHEMA - you can find some examples of the use of this function in Vidas Matelis' blog.
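For example, DISCOVER_PERFORMANCE_COUNTERS is such a restricted rowset; a sketch (the counter name is only meant as an illustration):

```mdx
SELECT *
  FROM SYSTEMRESTRICTSCHEMA($system.DISCOVER_PERFORMANCE_COUNTERS,
       PERF_COUNTER_NAME = '\Processing\Rows read/sec')
```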

To answer the initial question - who is on my db? - you could use the (simple) query
SELECT *
  FROM $system.DISCOVER_SESSIONS
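If the full rowset is too wide, a few columns already answer the question (column names as listed in the DISCOVER_SESSIONS documentation):

```mdx
SELECT SESSION_USER_NAME
     , SESSION_CURRENT_DATABASE
     , SESSION_LAST_COMMAND
     , SESSION_START_TIME
  FROM $system.DISCOVER_SESSIONS
```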

Another option to get this information is to use the small GUI application Activity Viewer (download from CodePlex).

Dienstag, 1. Dezember 2009

Editions

The msdn site Features Supported by the Editions of SQL Server 2008 shows the differences between the SQL Server editions. Regarding SSAS the Standard Edition (SE) misses the following features compared to the Enterprise Edition (EE):

Data Warehouse Scale and Performance
- Auto parallel partition processing
- Partitioned cubes
- Distributed partitioned cubes

Multi-Dimensional Analytics
- Financial aggregations
- Custom rollups
- Semi-additive measures
- Writeback dimensions
- Linked measures and dimensions
- Binary and compressed XML transport
- Account intelligence
- Perspectives
- Analysis Services shared, scalable databases

Data Mining (in this category SE is quite weak)
- Cross validation
- Models on filtered subsets of mining structure data
- Time series: custom blending between ARTXP and ARIMA models
- Time series: prediction with new data
- Unlimited concurrent data mining queries
- Advanced configuration and tuning for algorithms
- Algorithm plug-in API
- Parallel model processing
- Time series: cross-series prediction
- Unlimited attributes for association rules
- Sequence prediction
- Multiple prediction targets for naïve Bayes, neural network, and logistic regression

Of course you can build cubes without these features, but they are very helpful in many situations. On the other hand, there is of course a difference in the license fees... - here are some additional links to license and pricing information:

- SQL Server 2008 Pricing
- SQL Server 2008 Licensing FAQ

Montag, 30. November 2009

Duplicate key error

Hilmar Buchta explains in his blog post A duplicate attribute key has been found when processing… some common reasons for duplicate key errors in processing error messages. The most likely reasons are:
  • NULL values in the attribute key column.
  • Data that violates the attribute relationships defined in the dimension.
  • Separate key and name source fields for an attribute, so that more than one name corresponds to a key.
  • A related issue results from collation effects: distinct string values from an Oracle source are considered duplicates by SSAS because they differ only in capitalization (upper/lower case), trailing spaces, or certain special characters (such as the German "ß", which is treated as "ss", or "ä", "ü", "ö" etc.).

Freitag, 20. November 2009

MDX Query Performance in SSAS

There are a few documents you should know if you want to optimize the performance of MDX queries on your SSAS cubes:

The document SQL Server 2008: Analysis Services Performance Guide (download here) provides information on the tuning of server resources and shows strategies to enhance the performance of cube processing and query access.

The paper Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services (download here) discusses the troubleshooting tools integrated into Microsoft Windows and SQL Server and shows how to use these tools to identify and resolve performance bottlenecks with individual MDX queries. Most of the given advice still holds for SSAS 2008 (as far as I can see...).

Some interesting papers and articles can be found at the web site of the Microsoft SQL Server Development Customer Advisory Team.

Unfortunately there is a lot of advice that was relevant for SSAS 2005 but no longer applies to SSAS 2008 - so you have to be very careful when adopting the strategies shown in various blogs and newsgroups on the net.

Donnerstag, 19. November 2009

Centrally Managing Formatting Properties in MDX Scripts

This is a quite recent goody already sketched by Marco Russo:

Formatting is indeed in the twilight zone between server/analytic logic and client display issues. Originally you would want your cube to be agnostic of all that representational stuff which may change with the eye of the beholder.

On the other hand, it is unbearable to walk through the conditional formatting results of ~100 affected measures * 30 affected reports every time a formatting rule changes.

A good compromise between "this-is-client-stuff" and "incorporate-it-in-each-and-every-measure" is using the power of MDX scripting. The scoping approach that Marco Russo describes (and from which I learned the invaluable [Measures].CurrentMember trick) will not work as soon as you need to address targeted measures (such as "all currencies" or "all indexes") across calculated and real measures (an explicit measure set in a scope may only contain non-calculated measures).

So here is how it works in these cases (and similarly for FORE_COLOR, BACK_COLOR, etc.):

FORMAT_STRING({
 [Measures].[REAL_MEASURE_1]
 ,[Measures].[CALCULATED_MEASURE_2]
 ,...
 }) =
 // YOUR MDX CALCULATION using [Measures].CurrentMember and returning a format string
 ;
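A concrete instance of the pattern might look like this (the measure names are made up for illustration):

```mdx
FORMAT_STRING({ [Measures].[Sales Amount]
              , [Measures].[Sales Growth] }) =
    IIF([Measures].CurrentMember IS [Measures].[Sales Growth]
      , "0.0%"        // growth shown as a percentage
      , "#,##0.00");  // everything else as a plain number
```

This keeps the formatting rule in one place in the MDX script instead of repeating it per measure and per report.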

When the Scoping gets tough ...

... the tough gets scoped.

There is a hairy observation to make that I would usually classify under the category "bug".

In arrogant ignorance of Microsoft's issue tracking systems and, even more, in the absence of enough terminological background to even state a reasonable issue search query, we currently file it under "phenomenon with a workaround".

Imagine that you have developed an elaborate date hierarchy with two "branching" hierarchies, one for calendar weeks and one for months, quarters, years and so forth. Branching means that, from the point of view of attribute relationships, there are two outgoing relation branches, one into the attributes of the week hierarchy and one into the attributes of the month hierarchy.

From our experiments, calculations specified "purely" for the month scope will nevertheless also hit the week level of the parallel hierarchy, as if SSAS parsed the statement as "operate on (all) successor members of date".

SCOPE([DIM_DATE].[HIER_YQM].[LVL_MONTH].Members);
 // Do the usual nasty overrides and calculations
END SCOPE;

To get the behaviour right, we have become used to writing such temporal scopes as

SCOPE([DIM_DATE].[HIER_YQM].[LVL_MONTH].Members,
      [DIM_DATE].[HIER_WEEK].[LVL_WEEK].[All],
      [DIM_DATE].[HIER_WEEK].[LVL_WEEK_YEAR].[All]);
 // Do the usual nasty overrides and calculations
END SCOPE;

where it is interesting that leaving out the second [All] restriction would nevertheless bring back the overrides at the calendar year level.

Still the behaviour (among the many strange behaviours of SSAS) leaves us puzzled. But I'm sure that there is a tough explanation for this in some other universe.

Mittwoch, 18. November 2009

BI Methodology

Marco Russo and Alberto Ferrari have written an interesting paper presenting their Methodology for Microsoft BI (September 2008) which can be downloaded from here. In the paper they discuss their approach in the context of the methodologies of Bill Inmon and Ralph Kimball and draw a picture of the interaction of OLTP database, Data Warehouse and OLAP Cube.

MDX Studio

As Mosha Pasumansky states, MDX Studio is a
tool which helps users of Microsoft Analysis Services to analyze complex MDX expressions, monitor performance characteristics of MDX queries and get insight into how MDX interacts with other UDM features such as attribute relationships. MDX Studio provides unique visualization representation of MDX expression and allows the user to interact with different stages of MDX execution.
It's indeed an extremely useful tool: it shows valuable information that is not included in profiler traces, such as the number of cache hits, calculated cells, memory usage, etc. Another important feature is the ability to show a hierarchical tree of execution steps (which Mosha describes here). Also very helpful is the ANALYZE option that (no surprise) analyzes a given query or MDX script, shows possible issues, and provides links to further explanations (mainly from Mosha's blog).

The current version of the tool (a work in progress) can be downloaded from here.

Dienstag, 17. November 2009

EXISTING keyword

As the documentation states, the EXISTING keyword forces a specified set to be evaluated within the current context instead of within the context of the cube that contains the members of the set. Mosha Pasumansky declares "that Exists function and EXISTING operator are generalization of hierarchy related functions such as Descendants. Indeed, the process of building an hierarchy is nothing more then applying Exists between attributes which build individual levels of that hierarchy." So EXISTING shows only elements that are actually relevant in the given context.

If we look for the number of cities in each country in the Adventure Works cube and use the following query:

WITH
MEMBER [CITY_COUNT] AS COUNT([GEOGRAPHY].[CITY].[CITY].MEMBERS)
SELECT {[CITY_COUNT]} ON 0
,[GEOGRAPHY].[GEOGRAPHY].[COUNTRY].MEMBERS ON 1
FROM [ADVENTURE WORKS]

we do not get the intended result:

CITY_COUNT
Australia 587
Canada 587
France 587
Germany 587
United Kingdom 587
United States 587

That's the total count of cities for every country - apparently the evaluation took place in the context of the cube and not in the current context. With EXISTING we get what we want:

WITH
MEMBER [CITY_COUNT] AS COUNT(EXISTING([GEOGRAPHY].[CITY].[CITY].MEMBERS))
SELECT {[CITY_COUNT]} ON 0
,[GEOGRAPHY].[GEOGRAPHY].[COUNTRY].MEMBERS ON 1
FROM [ADVENTURE WORKS]

provides the result:

CITY_COUNT
Australia 40
Canada 50
France 36
Germany 51
United Kingdom 35
United States 375

As already mentioned here EXISTING can also be used to solve some problems with multiselects.

In his blog Chris Webb wrote about the performance of EXISTING in combination with NONEMPTY and stated "if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing" - but admitted that he had no sound explanation for the observed effect.
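Applied to the city-count query from above, that ordering (NonEmpty innermost, Existing outside) would look like this; the measure used for the non-empty check is just an example:

```mdx
WITH
MEMBER [CITY_COUNT] AS
COUNT(EXISTING(NONEMPTY([GEOGRAPHY].[CITY].[CITY].MEMBERS,
                        [Measures].[Internet Sales Amount])))
SELECT {[CITY_COUNT]} ON 0
,[GEOGRAPHY].[GEOGRAPHY].[COUNTRY].MEMBERS ON 1
FROM [ADVENTURE WORKS]
```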

Multiselect

Multiselects are a source of constant discomfort in SSAS 2005 ff. When there are multiple members in the slicer of an MDX query, the CurrentMember function no longer identifies a single element, and that's the reason for many problems with calculated members. Some of these problems can be solved by special syntax elements (most notably the EXISTING keyword), but client tools like Excel 2007 generate their own queries, so the developer cannot control the design (and Excel makes things even worse by using subselects, which bring even more annoyance than simple WHERE-clause multiselects).

At least there are some solutions to avoid the most unpleasant consequences - and most of these solutions come from Mosha Pasumansky:

In his blog articles Writing multiselect friendly MDX calculations and Multiselect friendly MDX for calculations looking at current coordinate, Mosha Pasumansky shows a way to avoid the multiselect problems by using the EXISTING function and summing up the results. Another solution (using a stored procedure) is shown in How to detect subselect inside MDX calculations (aka Multiselect in Excel 2007) - but Mosha warns "that the solution is neither elegant nor efficient, and probably is mostly of the theoretical interest." In his article I need your help or how to make multiselect work seamlessly in MDX Mosha suggests a solution for all multiselect problems - but it seems he kept that secret...

The blog post Korrekte MDX-Berechnungen mit den Analysis Services trotz Multiselect ("correct MDX calculations with Analysis Services despite multiselect") suggests defining "empty" measures (initially filled with NULL values) and then overwriting them with scoped assignments on every relevant level - so correct values can be aggregated for higher levels.

All these solutions have an impact on performance, so Mosha writes: "Therefore, it is certainly a choice of cube designer - keep the simpler and more performance friendly expression, but accept that multiselect on the Time dimension will cause errors, or allow more functionality and enable multiselects on Time, but be prepared for slower performance in all cases."
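The core of the multiselect-friendly pattern from Mosha's articles is to aggregate over the EXISTING set instead of relying on a single CurrentMember; a sketch with made-up measure and hierarchy names:

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Avg Monthly Sales] AS
    // sum over the months actually selected (also correct for multiselects) ...
    SUM(EXISTING [Date].[Calendar].[Month].MEMBERS,
        [Measures].[Sales Amount])
    // ... divided by the number of selected months
    / COUNT(EXISTING [Date].[Calendar].[Month].MEMBERS);
```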

Nested Scopes and Attribute Relationships

Nested scopes are a convenient syntax for structuring subcube/crossjoin calculations in your MDX scripts. Most often, you will want to use more than one attribute from the same dimension in such a context and, in addition, navigate the associated hierarchies in the scoped calculations:

SCOPE({[DIMENSION].[HIERARCHY_1].[LEVEL_OR_ATTRIBUTE_1].&[KEY_1]},...);
   SCOPE({[DIMENSION].[HIERARCHY_2].[LEVEL_OR_ATTRIBUTE_2].&[KEY_2]},...);
        // Your calculation involving [DIMENSION].[HIERARCHY_1].CurrentMember
        // will result in navigating [DIMENSION].[ATTRIBUTE_1].[All]!
   END SCOPE;
END SCOPE;

However, with nested scopes it seems that similar overriding rules apply as when concatenating ordinary tuples: suppose the attribute ATTRIBUTE_2 associated with the LEVEL_OR_ATTRIBUTE_2 level is related to the attribute ATTRIBUTE_1 (i.e., KEY_2 is unique for KEY_1, in contrast to the "related from" relationship where KEY_1 is unique for KEY_2); then the example calculation context above resolves to the [All] member of ATTRIBUTE_1. Duh.

Writing the nesting the other way round (which would seem semantically equivalent from only superficially reading the specs, as we usually do) delivers the expected result:

SCOPE({[DIMENSION].[HIERARCHY_2].[LEVEL_OR_ATTRIBUTE_2].&[KEY_2]},...);
    SCOPE({[DIMENSION].[HIERARCHY_1].[LEVEL_OR_ATTRIBUTE_1].&[KEY_1]},...);
        // Your calculation involving [DIMENSION].[HIERARCHY_1].CurrentMember
        // will finally navigate [DIMENSION].[ATTRIBUTE_1].&[KEY_1]!
    END SCOPE;
END SCOPE;