Hi there,
We've recently set up a Principal, Mirror and Witness configuration, with the Mirror and Witness in a separate building from the Principal. All three are separate servers in the same domain (DMZ), and the buildings are connected by a fiber optic cable. All servers and SQL Server instances are logged in with the same domain admin account, DMZ\esAdmin.
Mirroring is all set up and the databases are synchronized. Every once in a while some databases (not all, normally 6 out of 15) will switch roles and become active on the mirror. The SQL Server mirroring monitor job then reports:
Date 25/01/2007 12:37:01
Log Job History (Database Mirroring Monitor Job)
Step ID 1
Server DMZSQL01
Job Name Database Mirroring Monitor Job
Step Name
Duration 00:00:02
Sql Severity 16
Sql Message ID 32038
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
Executed as user: DMZ\esadmin. An internal error has occurred in the database mirroring monitor. [SQLSTATE 42000] (Error 32038). The step failed.
I have no idea what causes the failover. It could be a slow network or a bad setup. Can anyone give me some ideas on how to track down the problem, or share any experience of what could be causing this? It happens randomly, every day or three, with no warning. If I go to the mirror and fail over back to the principal, it's all just fine. However, I don't want half my databases working on one server and half on the other.
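For anyone hitting the same symptom, a quick way to see which databases have switched is to query the `sys.database_mirroring` catalog view on each partner. This is only a sketch: the database name in the commented-out failover statement is taken from the logs in this thread and is just an example.

```sql
-- Run on either partner. Lists every mirrored database on this instance
-- with its current role, state, and how often the roles have switched.
SELECT DB_NAME(database_id)       AS database_name,
       mirroring_role_desc        AS current_role,
       mirroring_state_desc       AS mirroring_state,
       mirroring_role_sequence    AS role_switch_count,
       mirroring_partner_instance AS partner_instance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL   -- only databases that are mirrored
ORDER BY database_name;

-- To move a database back by hand, run this on the instance that is
-- CURRENTLY the principal for it (example database name):
-- ALTER DATABASE [WARCMedia] SET PARTNER FAILOVER;
```

A manual failover like this only succeeds while the session is synchronized, so check the state column first.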
Any ideas?
Thanks
Ed
UPDATE:
I've just been looking at the logs on my Mirror, and at the time of the failover it reports the following, in this order:
Error: 1479, Severity: 16, State: 1.
The mirroring connection to "TCP://DMZSQL01.dmz.local:5022" has timed out for database "WARCMedia" after 10 seconds without a response. Check the service and network connections.
Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.
Recovery is writing a checkpoint in database 'WARCMedia' (41). This is an informational message only. No user action is required.
The mirrored database "WARCMedia" is changing roles from "PRINCIPAL" to "MIRROR" due to Failover.
Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.
...
This looks like a timeout. Is there any way to set the timeout threshold for database mirroring, or to set retry intervals?
Have you implemented network monitoring? I suggest you compare the failover times to the network monitoring history.
-Matt
|||No, but any pointers on what to monitor would be useful.
Thanks
Ed
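As one possible starting point on the SQL Server side (it does not replace proper network monitoring), the mirroring performance counters can be read from `sys.dm_os_performance_counters`. This is a sketch that filters on the object name only; which counters matter in a given setup is a judgment call, and the values are raw, so they need to be sampled repeatedly and compared over time.

```sql
-- Run on the principal and on the mirror. Returns the raw Database
-- Mirroring counters, one row per counter per mirrored database.
SELECT RTRIM(object_name)   AS object_name,
       RTRIM(counter_name)  AS counter_name,
       RTRIM(instance_name) AS database_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Mirroring%'
ORDER BY database_name, counter_name;
```

Logging this output on a schedule gives a history that can be lined up against the failover times in the error log.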
|||Couple of things to consider:
1. Run with the safety OFF (then you wouldn't need the witness server). This rules out automatic failover, whether from a real failure or a false one. The false failures are what you seem to be seeing now, so this lets your system run more predictably while you work out what is going on.
2. Consider moving the Witness (if you MUST run with the witness) to the building with the Principal. All things being equal (i.e. if each building is equally likely to have a disaster), having the Witness and Principal share a "more reliable" connection should provide better reliability. This assumes: (a) servers that are physically closer together have a more reliable connection; (b) you can't put the Witness in a third, equally reliable data center (who has that kind of $$$); (c) your business needs allow you to put the Witness with the Principal.
3. Increase the timeout "ping" for mirroring: ALTER DATABASE db SET PARTNER TIMEOUT xx, where xx is in seconds and there is no equals sign (see BOL). By default, it is 10 seconds.
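Options 1 and 3 above could look roughly like the following. Treat it as a sketch: the database name comes from the logs earlier in the thread, the value 30 is an arbitrary example, and these are alternatives, so run only the statements for the option you choose. Check that your edition supports running with safety OFF before relying on it.

```sql
-- Option 1: turn transaction safety off (asynchronous mode).
-- Run on the principal. Automatic failover is no longer possible.
ALTER DATABASE [WARCMedia] SET PARTNER SAFETY OFF;
-- Optionally remove the witness, since it no longer triggers failover.
ALTER DATABASE [WARCMedia] SET WITNESS OFF;

-- Option 3: keep safety FULL and the witness, but wait longer before
-- declaring the partner unavailable. Run on the principal.
-- The value is in seconds; 30 is only an example.
ALTER DATABASE [WARCMedia] SET PARTNER TIMEOUT 30;
```

The setting is per database, so it has to be repeated for each mirrored database.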
|||Thank you all for your help.
I changed the timeout ping to 60 seconds and I've not had any problems since. I have a feeling that either something on the network is causing a delay or the server was under too much load at odd times. This has solved one problem, so I'll continue to look into the cause over the coming weeks.
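With 15 mirrored databases, the change can be scripted rather than typed out per database. The sketch below generates the statements instead of executing them, so they can be reviewed first; the 60 seconds is the value mentioned above, and everything else is just one way of doing it.

```sql
-- Run on the principal instance. Produces one ALTER DATABASE statement
-- per database for which this instance is currently the principal.
SELECT 'ALTER DATABASE ' + QUOTENAME(DB_NAME(database_id))
     + ' SET PARTNER TIMEOUT 60;' AS alter_statement
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL
  AND mirroring_role_desc = 'PRINCIPAL';

-- After running the generated statements, confirm the new setting.
SELECT DB_NAME(database_id)         AS database_name,
       mirroring_connection_timeout AS timeout_seconds
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL
ORDER BY database_name;
```

Any databases that are currently active on the other server need the generated statements run there instead, or a failover back first.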
Thanks
Ed