Showing posts with label mirror. Show all posts
Showing posts with label mirror. Show all posts

Monday, March 19, 2012

Automatic Page Repair with Database Mirroring

One of the new features announced with database mirroring is automatic page repair, where principal server picks the corrupt page from the mirror to repair it's own database This is really interresting, but will the feature be configurable? Will this feature be available in both synchronous as asynchronous mode? And, perhaps equally important, will it be an enterprise edition feature, or available in standard edition as well?

hello,

first, what do you mean by configurable? although the final details are still being decided, the intention is that the feature is always on by default with an option to manually turn it off for extreme cases (e.g. half the disk goes bad and it's better off that the entire database is restored as opposed to having mirroring ship tons of pages).

yes, automatic page repair will be available in both synchronous and asynchronous mode - there is really no difference between the two modes from the point of view of this feature, other than the delay one might get before the page is repaired, if the outstanding log send queue is very long in async.

as for availability of the feature in standard/enterprise, this is also still being decided, but all details will be in the final release documentation.

let me know if you have any other questions.

cheers,

kal.

|||That's what I wanted to know: The option of turning the feature off, and that it worked in both synchronous and asynchronous mode. I guessed that I'd have to wait to know if this is going to be an enterprise or standard-edition feature Smile Thanks a lot.|||

That's not a feature, that's a Blessing!

I would not want to bless big customers and damn the others:-)

ok...2 days to go before vacations...

Automatic Page Repair with Database Mirroring

One of the new features announced with database mirroring is automatic page repair, where principal server picks the corrupt page from the mirror to repair it's own database This is really interresting, but will the feature be configurable? Will this feature be available in both synchronous as asynchronous mode? And, perhaps equally important, will it be an enterprise edition feature, or available in standard edition as well?

hello,

first, what do you mean by configurable? although the final details are still being decided, the intention is that the feature is always on by default with an option to manually turn it off for extreme cases (e.g. half the disk goes bad and it's better off that the entire database is restored as opposed to having mirroring ship tons of pages).

yes, automatic page repair will be available in both synchronous and asynchronous mode - there is really no difference between the two modes from the point of view of this feature, other than the delay one might get before the page is repaired, if the outstanding log send queue is very long in async.

as for availability of the feature in standard/enterprise, this is also still being decided, but all details will be in the final release documentation.

let me know if you have any other questions.

cheers,

kal.

|||That's what I wanted to know: The option of turning the feature off, and that it worked in both synchronous and asynchronous mode. I guessed that I'd have to wait to know if this is going to be an enterprise or standard-edition feature Smile Thanks a lot.|||

That's not a feature, that's a Blessing!

I would not want to bless big customers and damn the others:-)

ok...2 days to go before vacations...

Sunday, March 11, 2012

Automatic failover... help

Hi there,

We've recently set up a Principle, Mirror and Witness configuration with the Mirror and Witness in a separate building to the Principle. All three are part of the same domain (DMZ) and are different servers, the buildings are connected via a fiber optic cable. All servers and SQL Server instances are logged in with the same domain admin account DMZ\esAdmin.

Mirroring is all set-up and the databases are synchronized. Every once in a while some (not all, normally 6 out of 15) databases will switch roles and become active on the mirror. The SQL Server mirroring monitor job then reports:

Date 25/01/2007 12:37:01
Log Job History (Database Mirroring Monitor Job)

Step ID 1
Server DMZSQL01
Job Name Database Mirroring Monitor Job
Step Name
Duration 00:00:02
Sql Severity 16
Sql Message ID 32038
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0

Message
Executed as user: DMZ\esadmin. An internal error has occurred in the database mirroring monitor. [SQLSTATE 42000] (Error 32038). The step failed.

I have no idea, what causes the failover, it could be a slow network or a bad set-up, can anyone give me some ideas of what to do to track down the problem or any experience of what could be causing this, it happens randomly every day or three. No warning and if I go to the mirror and failover back to the principle again then it's all just fine. However I don't want half my databases working on 1 server and half on the other.

Any ideas?

Thanks
Ed

UPDATE:

I've just been looking at the logs on my Mirror and at the same time it reports in this order

Error: 1479, Severity: 16, State: 1.

The mirroring connection to "TCP://DMZSQL01.dmz.local:5022" has timed out for database "WARCMedia" after 10 seconds without a response. Check the service and network connections.

Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.

Recovery is writing a checkpoint in database 'WARCMedia' (41). This is an informational message only. No user action is required.

The mirrored database "WARCMedia" is changing roles from "PRINCIPAL" to "MIRROR" due to Failover.

Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.

...

This looks like a time out, is there any way to set the TimeOut threashold for Database mirroring or set retry intervals?


Have you implemented network monitoring? I suggest you compare the failover times to the network monitoring history.

-Matt

|||

No, but any pointes on what to monitor would be usefull.

Thanks

Ed

|||

Couple of things to consider:

1. run with the safety OFF (then you wouldn't need the witness server). This won't allow automatic failover from with a real failure or a false failure. The false failures are what you seem to be seeing now, but it will allow your system to run in a more predictable manner, while you understand what is going on.

2. Consider moving the Witness (if you MUST run with the witness) to the building with the Principal. All things being equal, (i.e. if the probability of both building having a disaster is equal); then having the Witness and Principal share a "more reliable" connection should provide better reliability. Assuming: 1. servers closer together physically is more reliable. 2. You can't put the Witness in a third, equally reliable data center (who has that kind of $$$) 3. Your business needs dictate that you can put the Witness with the Principal

3. Increase the timeout "ping" for mirroring. ALTER DATABASE db SET PARTNER TIMEOUT = xx. (see BOL) By default, it is 10 seconds.

|||

Thank you all for your help.

I modified the timout pint to 60 seconds and I've not had any problems since. I have a feeling either there is something on the network causing a delay or the server was just under too much load at one off times. This has solved 1 problem so I'll continue to look into the cause over the coming weeks.

Thanks
Ed

Automatic Failover Problem


Setup Configuration:

3 servers
- PRINCIPAL IP: 10.2.5.31 - DNS Lookup: db-server-2.mosside.choruscall.com
- MIRROR IP: 10.2.5.30 - DNS Lookup: sql-mirror.mosside.choruscall.com
- WITNESS ip: 10.2.5.32 - DNS Lookup: sql-witness.mosside.choruscall.com

Each Server is running Windows Server 2003 Enterprise Edition with SQL Server 2005 Enterprise Edition.
All server instances are enabled for remote connections(By default they are not).
All servers have the flag 1400 traceon and have been restarted.
PORT 5022 is unrestricted on network.

The server instances are connecting via certificates. Each server has an endpoint for the certificates to to connect on.

Certificate Setup Proceedure:

Principal_Host:
1. Create Master Key with Password

2. Create certificate with subject

3. Create endpoint for certificate (Listener_Port = 5022, Listener_ip = all)
to connect on for database_mirroring

4. Backup Certificate (principal_cert.cer)

5. Take backed up certificate to Mirror_Host

(Reapeat Steps 1-5 for Witness and Mirror)


Mirror_Host: Create Certificate on Mirror_Host for inbound connections from Principal:

6.(On Mirror_Host) Create Login for Principal using same password in step 1 (principal_login)

7. Create user for login just created. (principal_user)

8. Create local certificate for Principal on Mirror using certificate generated by principal.

ex: Create Certificate Principal_cert Authorization Principal_user FROM FILE='c:\principal_cert.cer'

9. (If an endpoint has been created already on the mirror)Grant connectiion to the login:

ex: Grant connect on endpoint::mirror_endpoint to principal_login

Repeat Steaps 6-9 for Principal and Witness Servers accordingly.


10. Import Database to SQL Server 2005 Principal Instance

11. Backup Database to disk with format

12. Backup Database log file to disk with format

13. Copy backups to mirror

14. Restore Database and log file with norecovery on Mirror_Host

15. Configre Database for Database Mirroring on Principal Server
There are two ways to do this. Via the wizzard or via the Transact-SQL window.
Using the wizzard appears to work since I started using FQDN.

PROBLEM:

After configuration, everythig appears to be correct. That is, the principal displays
that it is the principal and it is synchronized with the mirror. The mirror also displays that it is the
mirror and it is synchronized with the principal and it is in recovery. If I failover manually, the mirror
becomes the principal and the principal becomes the mirror (They form a quarum). If I disconnect the principal
from the network, the mirror is supposed to form a quarum with the witness and promote itself to principal status.
This is not what is happening. The witness recognizes that the principal is down and logs that info into its log file.
The Mirror attempts to contact the witness but cannot log onto the machine. The Mirror Logs the following:

Error: 1438, Severity: 16, State: 2.
The server instance Witness rejected configure request; read its error log file for more information.
The reason 1451, and state 3, can be of use for diagnostics by Microsoft.
This is a transient error hence retrying the request is likely to succeed.
Correct the cause if any and retry.


<<<<<<<MIRROR SERVER >>>>>>>>

2007-09-06 15:08:45.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:08:45.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:05.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:05.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:25.33 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:25.33 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:45.34 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:45.34 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:05.35 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:10:05.35 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:25.36 spid23s Error: 1438, Severity: 16, State: 2.

<<<<<<< WITNESS SERVER >>>>>>>>

2007-09-06 14:19:55.90 spid52 The Database Mirroring protocol transport is now listening for connections.
2007-09-06 15:07:11.64 spid24s Error: 1479, Severity: 16, State: 1.
2007-09-06 15:07:11.64 spid24s The mirroring connection to "TCP://db-server-2:5022" has timed out for database "APS_SQL_DEV" after 10 seconds without a response. Check the service and network connections.
2007-09-06 15:07:43.20 Server Error: 1474, Severity: 16, State: 1.
2007-09-06 15:07:43.20 Server Database mirroring connection error 4 '64(The specified network name is no longer available.)' for 'TCP://db-server-2:5022'.
2007-09-06 15:08:06.03 spid9s Error: 1474, Severity: 16, State: 1.
2007-09-06 15:08:06.03 spid9s Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://db-server-2:5022'.

hello,

from the logs it looks like you are not using the fully-qualified domain name (fqdn) when you are establishing the mirroring sessions. this is most likely the reason for the lack of connectivity.

when all the partners are up, you can use the sys.database_mirroring views on the principal and the mirror to figure out whether they both can talk to the witness (when all three nodes are up). if all connections are up, then automatic failover should happen. in your case most probably the mirror's connection to the witness will be shown as "disconnected", but if you use the fqdns it should work.

hth,

kaloian.

|||Trying giving the fully qualified domain name and see if it works as shown below,

IPCONFIG /ALL
Concatenate the "Host Name" and "Primary DNS Suffix". If you see something like:
Host Name . . . . . . . . . . . . : A
Primary Dns Suffix . . . . . . . : corp.mycompany.com

Then the computer name is just A.corp.mycompany.com. Prefix 'TCP://' and append ':' and you then have the partner name.
On the mirror server, you would just repeat the same command, but with the principal server named :

ALTER DATABASE [AdventureWorks] SET PARTNER =
N'TCP://A.corp.mycompany.com:5022'

On the principal server, you next specify the witness server:

ALTER DATABASE [AdventureWorks] SET WITNESS =
N'TCP://W.corp.mycompany.com:5026'


|||

Thank for replying. I appreciate it very much.

I had to add the FQDN in the host file in Windows->system32->drivers->etc->host on the mirror server and principal server inatnaces and I restarted the witness server after I setup failover. Once I did all that...it worked.

Thanks Again!

Chris

Automatic Failover Problem


Setup Configuration:

3 servers
- PRINCIPAL IP: 10.2.5.31 - DNS Lookup: db-server-2.mosside.choruscall.com
- MIRROR IP: 10.2.5.30 - DNS Lookup: sql-mirror.mosside.choruscall.com
- WITNESS ip: 10.2.5.32 - DNS Lookup: sql-witness.mosside.choruscall.com

Each Server is running Windows Server 2003 Enterprise Edition with SQL Server 2005 Enterprise Edition.
All server instances are enabled for remote connections(By default they are not).
All servers have the flag 1400 traceon and have been restarted.
PORT 5022 is unrestricted on network.

The server instances are connecting via certificates. Each server has an endpoint for the certificates to to connect on.

Certificate Setup Proceedure:

Principal_Host:
1. Create Master Key with Password

2. Create certificate with subject

3. Create endpoint for certificate (Listener_Port = 5022, Listener_ip = all)
to connect on for database_mirroring

4. Backup Certificate (principal_cert.cer)

5. Take backed up certificate to Mirror_Host

(Reapeat Steps 1-5 for Witness and Mirror)


Mirror_Host: Create Certificate on Mirror_Host for inbound connections from Principal:

6.(On Mirror_Host) Create Login for Principal using same password in step 1 (principal_login)

7. Create user for login just created. (principal_user)

8. Create local certificate for Principal on Mirror using certificate generated by principal.

ex: Create Certificate Principal_cert Authorization Principal_user FROM FILE='c:\principal_cert.cer'

9. (If an endpoint has been created already on the mirror)Grant connectiion to the login:

ex: Grant connect on endpoint::mirror_endpoint to principal_login

Repeat Steaps 6-9 for Principal and Witness Servers accordingly.


10. Import Database to SQL Server 2005 Principal Instance

11. Backup Database to disk with format

12. Backup Database log file to disk with format

13. Copy backups to mirror

14. Restore Database and log file with norecovery on Mirror_Host

15. Configre Database for Database Mirroring on Principal Server
There are two ways to do this. Via the wizzard or via the Transact-SQL window.
Using the wizzard appears to work since I started using FQDN.

PROBLEM:

After configuration, everythig appears to be correct. That is, the principal displays
that it is the principal and it is synchronized with the mirror. The mirror also displays that it is the
mirror and it is synchronized with the principal and it is in recovery. If I failover manually, the mirror
becomes the principal and the principal becomes the mirror (They form a quarum). If I disconnect the principal
from the network, the mirror is supposed to form a quarum with the witness and promote itself to principal status.
This is not what is happening. The witness recognizes that the principal is down and logs that info into its log file.
The Mirror attempts to contact the witness but cannot log onto the machine. The Mirror Logs the following:

Error: 1438, Severity: 16, State: 2.
The server instance Witness rejected configure request; read its error log file for more information.
The reason 1451, and state 3, can be of use for diagnostics by Microsoft.
This is a transient error hence retrying the request is likely to succeed.
Correct the cause if any and retry.


<<<<<<<MIRROR SERVER >>>>>>>>

2007-09-06 15:08:45.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:08:45.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:05.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:05.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:25.33 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:25.33 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:45.34 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:45.34 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:05.35 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:10:05.35 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:25.36 spid23s Error: 1438, Severity: 16, State: 2.

<<<<<<< WITNESS SERVER >>>>>>>>

2007-09-06 14:19:55.90 spid52 The Database Mirroring protocol transport is now listening for connections.
2007-09-06 15:07:11.64 spid24s Error: 1479, Severity: 16, State: 1.
2007-09-06 15:07:11.64 spid24s The mirroring connection to "TCP://db-server-2:5022" has timed out for database "APS_SQL_DEV" after 10 seconds without a response. Check the service and network connections.
2007-09-06 15:07:43.20 Server Error: 1474, Severity: 16, State: 1.
2007-09-06 15:07:43.20 Server Database mirroring connection error 4 '64(The specified network name is no longer available.)' for 'TCP://db-server-2:5022'.
2007-09-06 15:08:06.03 spid9s Error: 1474, Severity: 16, State: 1.
2007-09-06 15:08:06.03 spid9s Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://db-server-2:5022'.

hello,

from the logs it looks like you are not using the fully-qualified domain name (fqdn) when you are establishing the mirroring sessions. this is most likely the reason for the lack of connectivity.

when all the partners are up, you can use the sys.database_mirroring views on the principal and the mirror to figure out whether they both can talk to the witness (when all three nodes are up). if all connections are up, then automatic failover should happen. in your case most probably the mirror's connection to the witness will be shown as "disconnected", but if you use the fqdns it should work.

hth,

kaloian.

|||Trying giving the fully qualified domain name and see if it works as shown below,

IPCONFIG /ALL
Concatenate the "Host Name" and "Primary DNS Suffix". If you see something like:
Host Name . . . . . . . . . . . . : A
Primary Dns Suffix . . . . . . . : corp.mycompany.com

Then the computer name is just A.corp.mycompany.com. Prefix 'TCP://' and append ':' and you then have the partner name.
On the mirror server, you would just repeat the same command, but with the principal server named :

ALTER DATABASE [AdventureWorks] SET PARTNER =
N'TCP://A.corp.mycompany.com:5022'

On the principal server, you next specify the witness server:

ALTER DATABASE [AdventureWorks] SET WITNESS =
N'TCP://W.corp.mycompany.com:5026'


|||

Thank for replying. I appreciate it very much.

I had to add the FQDN in the host file in Windows->system32->drivers->etc->host on the mirror server and principal server inatnaces and I restarted the witness server after I setup failover. Once I did all that...it worked.

Thanks Again!

Chris

Automatic client redirect in SQL Server 2005

Will this work with replication too. I see it with a mirror database, but could it work if I was keeping two copies of the database and wanted to transparently reroute the connection to a surviving server?We will support such redirection on the publication database, when DB mirroring becomes a supported feature later.|||* If the publication DB is mirrored, client can do automatic redirection between the database on principal server and database on mirror server. BOL has a topic called "Replication and Database Mirroring" on this. As Li said, it will be supported once DB mirroring becomes supported feature.

* If you question is to do automatic redirection between publication DB and subscription DB, the answer is NO for SQL 2005.