Windows


I have been invited to present at the Directory Experts Conference in Chicago in March, hosted by NetPro Computing, Inc. I’ll be discussing how we recently integrated dozens of Linux servers into our 300+ server Windows 2000 Native Mode forest. I’m excited, but preparing is taking time away from finishing a few posts I have here in “unpublished” state.

Of note is a response to T. Colin Dodd regarding his short and sweet post on Red Hat flaws according to Secunia. In short, Mr. Dodd (please correct me if the address is wrong), yes, Red Hat should be proud of what they’ve accomplished, but…

Well, that’s 2 pages of text that’s not yet finished.

As I mentioned in my last clustering post, there are some Exchange problems we’ve been working on over the past few weeks.  One of the simpler problems has a complex answer, so I thought I’d explain a bit.

As any good Exchange administrator knows, Exchange stores the data for each store in 2 files: the EDB file and the STM file. However, I haven’t found a really good explanation of the differences between the two. The best so far is at MessagingTalk.org, but they only explain that the STM holds MIME-formatted content and the EDB holds MAPI content. Why, though, and how does it affect the end users? This is what we’ll explore. (more…)

If you are setting up a cross-forest trust with selective authentication (which requires the Windows Server 2003 forest and domain functional levels), don’t forget to grant users from the trusted domain the “Allowed to Authenticate” right on the servers they’ll need to access in your domain. The error messages you’ll get back (replicated here in my test VM domains) aren’t very helpful.

System Error 317 has occurred. The system cannot find message text for message number 0x*** in the message file for ***.

Further information about granting the “Allowed to Authenticate” right to the trusted users is available at Microsoft TechNet. If you have the opportunity to raise your forest and domain functional levels to take advantage of this, I highly recommend it. But I recommend even more strongly that you document precisely what you set.

I’ve been very busy with clients over the past 2 weeks, troubleshooting Clustering problems, Exchange issues, and planning a new trust relationship, on top of normal maintenance and design. As I solve each issue, I’ll be posting what I can about them. This week we were able to solve the odd clustering problem…

We’ve seen some issues over the past 2 months or so, particularly with MS SQL 2000 clusters (and 1 Exchange 2003 cluster), where the cluster group fails on one node and the other node (or nodes) fails to pick up the group, leaving the entire cluster group offline. In each case (on both HP and Dell hardware), the first striking piece of evidence in the logs is that every node that fails to bring up the cluster reports that the Cluster IP Address resource couldn’t be brought online because of an IP address conflict on the network.

Making this issue particularly fun is that most of the information we used to solve the problem is a lack of information. In particular, there is absolutely nothing interesting in any node’s cluster.log file. You see the disks negotiate from node to node, but nothing that makes the failover look any different than if you had right-clicked the group and chosen “Move Group” in Cluster Administrator.

What starts the problem off is Event ID 1228 from source “ClusNet”, which says that the “ClusNet driver couldn’t communicate with the ClusSvc for 60 seconds, the Cluster service is being terminated.” Most of the time you might miss this event entirely, because it triggers so many Source Tcpip, ID 4199; Source ftdisk, ID 57; and Source ntfs, ID 50 events that it’s easy to overlook 1 little error. That’s especially true when monitoring systems like Microsoft Operations Manager (MOM), Idera SQLDiagnostics Manager (SQLDiag), or HP Systems Insight Manager (SIM) all report the cluster as having issues 30-60 seconds after the ClusNet 1228 event is written, timing which corresponds exactly to the Tcpip 4199 events (IP address conflict) or the ftdisk 57 events (failed to flush transaction data). So, here’s what happens, based on conversations with Microsoft, training with Microsoft and HP, and a LOT of reading. (more…)
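Since the single ClusNet 1228 event is so easy to lose among the events that follow it, here is a rough sketch of how you might pull it back out of an exported event list. It is only an illustration under my own assumptions: the Event record, the correlate function, the 60-second window, and the sample timestamps are all made up for the example, and it does not read a real Windows event log. It finds each ClusNet 1228 event and collects the Tcpip 4199, ftdisk 57, and ntfs 50 events logged within the window after it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event log record (not a real Windows API type)."""
    time: datetime
    source: str
    event_id: int


# The trigger and follow-on (source, event ID) pairs named in the post.
TRIGGER = ("clusnet", 1228)
FOLLOW_ON = {("tcpip", 4199), ("ftdisk", 57), ("ntfs", 50)}


def key(event):
    """Normalize the source name so 'Ntfs' and 'ntfs' compare equal."""
    return (event.source.lower(), event.event_id)


def correlate(events, window=timedelta(seconds=60)):
    """Map each ClusNet 1228 event to the follow-on events within `window` after it."""
    ordered = sorted(events, key=lambda e: e.time)
    result = {}
    for trigger in ordered:
        if key(trigger) != TRIGGER:
            continue
        result[trigger] = [
            e for e in ordered
            if key(e) in FOLLOW_ON
            and trigger.time <= e.time <= trigger.time + window
        ]
    return result


# Made-up sample data, deliberately out of order.
t0 = datetime(2006, 1, 15, 3, 12, 0)
sample = [
    Event(t0 + timedelta(seconds=45), "Tcpip", 4199),
    Event(t0, "ClusNet", 1228),
    Event(t0 + timedelta(seconds=50), "ftdisk", 57),
    Event(t0 + timedelta(seconds=52), "Ntfs", 50),
    Event(t0 + timedelta(seconds=5), "Service Control Manager", 7036),
    Event(t0 + timedelta(minutes=10), "Tcpip", 4199),  # outside the window
]

matches = correlate(sample)
for trigger, follow_ons in matches.items():
    print(f"{trigger.time} {trigger.source} {trigger.event_id}")
    for e in follow_ons:
        delay = (e.time - trigger.time).total_seconds()
        print(f"  +{delay:.0f}s {e.source} {e.event_id}")
```

The window is a parameter because the 30-60 second delay is only what we observed on these clusters; widen it if your monitoring tools report later.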

I have been working with a client and Microsoft on a very difficult issue with their Exchange 2003 system. A few months ago, a particular store started exhibiting Event ID 623 errors from source ESE, the Extensible (or Exchange) Storage Engine. Since this error was coming up on a server that was in the process of being decommissioned, the suggestion to “move the users to a new store” was entirely feasible.

But the problem came back 22 days later on one of the 2 stores that the users were moved to, so we knew something else must be up. I’ll cut to the chase and explain that Microsoft is now quite sure what is happening, just not who is causing it or why.

What’s frustrating about this is that all the tools that can be used to look deeper into this problem aren’t available to me as a technician outside of Microsoft. All I’ve been able to do for my client is set up triggers to cause “Exchange store.exe dumps”, which are essentially process freezes followed by private memory dumps to disk. The good thing is that the end users don’t notice, nor does the Windows 2003 Cluster service. Also, our Microsoft support team has been great at sharing information with us.

But the problem remains that there is nothing at all I can do to fix this. I can’t run the debug programs that Microsoft has available (I can run a debugger against the process, but not to the same level of detail, due to a lack of published information), despite having a very deep understanding, for an outside consultant who just reads voraciously, of how the ESE runs the EDB, STM, and LOG files. This inability to better serve my customers frustrates me to no end, whether Microsoft’s technicians are fantastic or not (there have been other times…).

So, while I wait for them to get back to me on yet another dump that has been generated, looking for a very elusive fSearch() operation against one of my client’s many Exchange 2003 stores, I sit on my hands in anticipation, wishing to be able to do more.
