Sunday, March 11, 2012

Automatic Failover Problem


Setup Configuration:

3 servers
- PRINCIPAL IP: 10.2.5.31 - DNS Lookup: db-server-2.mosside.choruscall.com
- MIRROR IP: 10.2.5.30 - DNS Lookup: sql-mirror.mosside.choruscall.com
- WITNESS IP: 10.2.5.32 - DNS Lookup: sql-witness.mosside.choruscall.com

Each Server is running Windows Server 2003 Enterprise Edition with SQL Server 2005 Enterprise Edition.
All server instances are enabled for remote connections (by default they are not).
Trace flag 1400 is turned on for all servers, and they have been restarted.
Port 5022 is unrestricted on the network.
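
One way to confirm the trace flag after the restart is a quick check from a query window on each instance. This is only a sketch: it assumes the flag was set globally (for example as a -T1400 startup parameter), which the setup above does not state.

```sql
-- Report whether trace flag 1400 is enabled on this instance.
-- The -1 argument asks for the global status rather than the session status.
DBCC TRACESTATUS (1400, -1);
```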

The server instances connect using certificates. Each server has an endpoint for the certificate-based connections.

Certificate Setup Procedure:

Principal_Host:
1. Create Master Key with Password

2. Create certificate with subject

3. Create endpoint for certificate (Listener_Port = 5022, Listener_ip = all)
to connect on for database_mirroring

4. Backup Certificate (principal_cert.cer)

5. Take backed up certificate to Mirror_Host

(Repeat Steps 1-5 for Witness and Mirror)
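
Steps 1-4 might look roughly like the following on the principal. This is a sketch, not the exact script used here: the password placeholder, the subject text, the endpoint name principal_endpoint, the AES encryption setting and ROLE = PARTNER are all assumptions. Only the certificate name, the file name, port 5022 and "listener IP = all" come from the steps above.

```sql
USE master;

-- Step 1: master key that protects the certificate's private key.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

-- Step 2: certificate this instance presents on outbound connections.
CREATE CERTIFICATE principal_cert
    WITH SUBJECT = 'Principal certificate for database mirroring';

-- Step 3: mirroring endpoint authenticated by that certificate.
-- principal_endpoint is a hypothetical name; use ROLE = WITNESS on the witness.
CREATE ENDPOINT principal_endpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE principal_cert,
        ENCRYPTION = REQUIRED ALGORITHM AES,
        ROLE = PARTNER
    );

-- Step 4: export the public part of the certificate for the other servers.
BACKUP CERTIFICATE principal_cert TO FILE = 'c:\principal_cert.cer';
```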


Mirror_Host: Create Certificate on Mirror_Host for inbound connections from Principal:

6. (On Mirror_Host) Create a login for the Principal using the same password as in step 1 (principal_login)

7. Create user for login just created. (principal_user)

8. Create local certificate for Principal on Mirror using certificate generated by principal.

ex: Create Certificate Principal_cert Authorization Principal_user FROM FILE='c:\principal_cert.cer'

9. (If an endpoint has already been created on the mirror) Grant connect permission to the login:

ex: Grant connect on endpoint::mirror_endpoint to principal_login

Repeat Steps 6-9 for Principal and Witness Servers accordingly.
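
Put together, steps 6-9 on the mirror could be run as one batch along these lines. The login, user, certificate and endpoint names are the ones used in the steps above; the password placeholder is an assumption, and the .cer file is assumed to have been copied to c:\ on the mirror already.

```sql
USE master;

-- Step 6: login that represents the principal on this (mirror) instance.
CREATE LOGIN principal_login WITH PASSWORD = '<password from step 1>';

-- Step 7: user for that login, so the certificate can be owned by it.
CREATE USER principal_user FOR LOGIN principal_login;

-- Step 8: import the principal's public certificate and bind it to the user.
CREATE CERTIFICATE principal_cert
    AUTHORIZATION principal_user
    FROM FILE = 'c:\principal_cert.cer';

-- Step 9: allow the principal's login to connect to the mirroring endpoint.
GRANT CONNECT ON ENDPOINT::mirror_endpoint TO principal_login;
```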


10. Import Database to SQL Server 2005 Principal Instance

11. Backup Database to disk with format

12. Backup Database log file to disk with format

13. Copy backups to mirror

14. Restore Database and log file with norecovery on Mirror_Host

15. Configure Database for Database Mirroring on Principal Server
There are two ways to do this: via the wizard or via the Transact-SQL window.
Using the wizard appears to work since I started using FQDNs.
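
For the Transact-SQL route, steps 11-15 could be sketched as below. The database name is the one that appears in the witness log (APS_SQL_DEV), and the server addresses are the FQDNs and port listed in the setup. The backup file paths are hypothetical, and the sketch assumes the database uses the full recovery model and that the same file locations exist on the mirror (otherwise the restore needs WITH MOVE).

```sql
-- === Run on the principal (steps 11-12) ===
BACKUP DATABASE APS_SQL_DEV TO DISK = 'c:\APS_SQL_DEV.bak' WITH FORMAT;
BACKUP LOG APS_SQL_DEV TO DISK = 'c:\APS_SQL_DEV_log.trn' WITH FORMAT;

-- === Run on the mirror after copying both files (step 14) ===
RESTORE DATABASE APS_SQL_DEV FROM DISK = 'c:\APS_SQL_DEV.bak' WITH NORECOVERY;
RESTORE LOG APS_SQL_DEV FROM DISK = 'c:\APS_SQL_DEV_log.trn' WITH NORECOVERY;

-- === Run on the mirror first: name the principal as its partner ===
ALTER DATABASE APS_SQL_DEV
    SET PARTNER = N'TCP://db-server-2.mosside.choruscall.com:5022';

-- === Then run on the principal: name the mirror, then the witness ===
ALTER DATABASE APS_SQL_DEV
    SET PARTNER = N'TCP://sql-mirror.mosside.choruscall.com:5022';
ALTER DATABASE APS_SQL_DEV
    SET WITNESS = N'TCP://sql-witness.mosside.choruscall.com:5022';
```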

PROBLEM:

After configuration, everything appears to be correct. That is, the principal reports
that it is the principal and that it is synchronized with the mirror. The mirror likewise reports that it is the
mirror, that it is synchronized with the principal, and that it is in recovery. If I fail over manually, the mirror
becomes the principal and the principal becomes the mirror (they form a quorum). If I disconnect the principal
from the network, the mirror is supposed to form a quorum with the witness and promote itself to principal status.
That is not what happens. The witness recognizes that the principal is down and records that in its log file.
The mirror attempts to contact the witness but cannot log on to that machine. The mirror logs the following:

Error: 1438, Severity: 16, State: 2.
The server instance Witness rejected configure request; read its error log file for more information.
The reason 1451, and state 3, can be of use for diagnostics by Microsoft.
This is a transient error hence retrying the request is likely to succeed.
Correct the cause if any and retry.


<<<<<<<MIRROR SERVER >>>>>>>>

2007-09-06 15:08:45.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:08:45.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:05.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:05.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:25.33 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:25.33 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:45.34 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:45.34 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:05.35 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:10:05.35 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:25.36 spid23s Error: 1438, Severity: 16, State: 2.

<<<<<<< WITNESS SERVER >>>>>>>>

2007-09-06 14:19:55.90 spid52 The Database Mirroring protocol transport is now listening for connections.
2007-09-06 15:07:11.64 spid24s Error: 1479, Severity: 16, State: 1.
2007-09-06 15:07:11.64 spid24s The mirroring connection to "TCP://db-server-2:5022" has timed out for database "APS_SQL_DEV" after 10 seconds without a response. Check the service and network connections.
2007-09-06 15:07:43.20 Server Error: 1474, Severity: 16, State: 1.
2007-09-06 15:07:43.20 Server Database mirroring connection error 4 '64(The specified network name is no longer available.)' for 'TCP://db-server-2:5022'.
2007-09-06 15:08:06.03 spid9s Error: 1474, Severity: 16, State: 1.
2007-09-06 15:08:06.03 spid9s Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://db-server-2:5022'.

hello,

from the logs it looks like you are not using the fully-qualified domain name (fqdn) when you are establishing the mirroring sessions. this is most likely the reason for the lack of connectivity.

when all three partners are up, you can query the sys.database_mirroring catalog view on the principal and the mirror to see whether each of them can talk to the witness. if all connections are up, then automatic failover should happen. in your case the mirror's connection to the witness will most probably be shown as "disconnected", but if you use the fqdns it should work.
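
a query along these lines, run on both the principal and the mirror, shows the witness connection state for each mirrored database. it is only a sketch; the column list is a selection from sys.database_mirroring, and the filter simply hides databases that are not mirrored.

```sql
-- One row per mirrored database on this instance.
-- mirroring_witness_state_desc is CONNECTED, DISCONNECTED or UNKNOWN.
SELECT DB_NAME(database_id)         AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_partner_name,
       mirroring_witness_name,
       mirroring_witness_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```

the partner and witness name columns also show whether the session was set up with short host names or with fqdns.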

hth,

kaloian.

Try giving the fully qualified domain name, as shown below, and see if it works:

IPCONFIG /ALL
Concatenate the "Host Name" and "Primary DNS Suffix". If you see something like:
Host Name . . . . . . . . . . . . : A
Primary Dns Suffix . . . . . . . : corp.mycompany.com

Then the computer name is just A.corp.mycompany.com. Prefix 'TCP://' and append ':' followed by the port number, and you then have the partner name.
On the mirror server, you would run the same ALTER DATABASE command, but with the principal server named:

ALTER DATABASE [AdventureWorks] SET PARTNER =
N'TCP://A.corp.mycompany.com:5022'

On the principal server, you next specify the witness server:

ALTER DATABASE [AdventureWorks] SET WITNESS =
N'TCP://W.corp.mycompany.com:5026'



Thanks for replying. I appreciate it very much.

I had to add the FQDNs to the hosts file (Windows->system32->drivers->etc->hosts) on the mirror and principal server instances, and I restarted the witness server after I set up failover. Once I did all that, it worked.

Thanks Again!

Chris
