I had a wonderful night this week. I thought I would stop by my office and pick up a hard drive on my way home from dinner.
When I arrived at work (about 10:00pm) I logged into my desktop and glanced at my email. I had received one from myself about a week earlier about moving Exchange Server 2003 data to another hard drive.
I thought that since it is 10:00pm, I could hurry and do the data move and be on my way. All the servers in this particular environment are running as Virtual Machines. I started by backing up the Exchange VM by copying the VHD file.
I figured that I could just copy it back into place if there were any problems and deal with the data move later.
So after backing up the Exchange Server VHD, I booted the server and started the Exchange data move. After I finished moving the databases, I opened Outlook and noticed an error: I could no longer retrieve my email over POP3.
I logged back into the Exchange Server and checked the event logs and the services to see why POP3 had stopped working. I verified that Outlook Web Access still worked. I could even send email using Outlook, just not retrieve it over POP3.
By now it was almost 11:00pm, so I decided to just restore my backup and attempt the data move at a later date.
I shut down the Exchange server and copied the old VHD into place. I turned on the Exchange server and noticed that it took a lot longer to boot than normal. I tried to remote into the machine
and got an error saying ‘The RPC server is too busy to complete this operation. Please try again or consult your system administrator.’ I thought that was bad, so I logged in through the console.
Once I got logged in, I got a popup saying that a service failed to start. I opened the event log and my heart dropped. I saw more event logs than I expected. I didn’t realize that the Exchange Server was also a Domain Controller.
I looked in the Directory Service event log and saw 2 errors: NTDS Replication Event 2095 and NTDS General Event 2103.
[code lang="text"]
Event Type: Error
Event Source: NTDS Replication
Event Category: Replication
Event ID: 2095
Date: 2/10/2010
Time: 1:16:27 AM
User: NT AUTHORITY\ANONYMOUS LOGON
Computer: EMAIL-Server
Description:
During an Active Directory replication request, the local domain controller (DC) identified a remote DC which has received replication data from the local DC using already-acknowledged USN tracking numbers.
Because the remote DC believes it is has a more up-to-date Active Directory database than the local DC, the remote DC will not apply future changes to its copy of the Active Directory database or replicate them to its direct and transitive replication partners that originate from this local DC.
If not resolved immediately, this scenario will result in inconsistencies in the Active Directory databases of this source DC and one or more direct and transitive replication partners. Specifically the consistency of users, computers and trust relationships, their passwords, security groups, security group memberships and other Active Directory configuration data may vary, affecting the ability to log on, find objects of interest and perform other critical operations.
To determine if this misconfiguration exists, query this event ID using http://support.microsoft.com or contact your Microsoft product support.
The most probable cause of this situation is the improper restore of Active Directory on the local domain controller.
User Actions:
If this situation occurred because of an improper or unintended restore, forcibly demote the DC.
Remote DC:
0bbec9e0-b956-4120-b347-ee82bad53363
Partition:
DC=MyDomain,DC=com
USN reported by Remote DC:
192575
USN reported by Local DC:
188453
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
[/code]
[code lang="text"]
Event Type: Error
Event Source: NTDS General
Event Category: Service Control
Event ID: 2103
Date: 2/10/2010
Time: 1:16:27 AM
User: NT AUTHORITY\ANONYMOUS LOGON
Computer: EMAIL-Server
Description:
The Active Directory database has been restored using an unsupported restoration procedure.
Active Directory will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.
User Action
See previous event logs for details.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
[/code]
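The key numbers in Event 2095 are the two USNs at the bottom: the remote DC has already acknowledged a higher USN than the restored local DC now reports. Microsoft calls this condition a USN rollback (a term that does not appear in the event text itself). Here is a minimal Python sketch, assuming you have the event description as plain text, that pulls out the two values and compares them. The function names are made up for illustration.

```python
import re

# The tail of the Event 2095 description shown above.
EVENT_2095 = """\
Remote DC:
0bbec9e0-b956-4120-b347-ee82bad53363
Partition:
DC=MyDomain,DC=com
USN reported by Remote DC:
192575
USN reported by Local DC:
188453
"""


def parse_usns(event_text):
    """Return (remote_usn, local_usn) from the text of an Event 2095 description."""
    def field(label):
        # The value sits on the line after the label, so allow whitespace/newlines.
        match = re.search(re.escape(label) + r":\s*(\d+)", event_text)
        if match is None:
            raise ValueError("missing field: " + label)
        return int(match.group(1))

    return field("USN reported by Remote DC"), field("USN reported by Local DC")


def looks_like_usn_rollback(remote_usn, local_usn):
    """True when the partner has acknowledged a higher USN than the local DC reports."""
    return remote_usn > local_usn


remote, local = parse_usns(EVENT_2095)
print("remote:", remote, "local:", local, "gap:", remote - local)
print("rollback suspected:", looks_like_usn_rollback(remote, local))
```

In my case the gap was a few thousand USNs, which is roughly the amount of directory change the restored VHD had "forgotten".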
I instantly shut down both the Exchange server and my primary (and apparently not only) Domain Controller. I made a backup copy of each.
I started googling the error. By now it was 1:20am.
I came across a couple of promising links. After about 10 minutes, I came up with an action plan.
- 1. Shut down DC 1
- 2. Demote DC 2 (DC showing the error) using ‘dcpromo.exe /forceremoval’
- 3. Shut down DC 2
- 4. Boot DC 1
- 5. Seize all roles that were owned by DC 2
- 6. Boot DC 2, then promote it.
- 7. Be happy because it is all over.
Since I had backups of both of the VMs, I decided to follow my plan. I hit my big problem on step 2. I kept getting weird errors during the demotion process. After some more googling, I found out that
installing Exchange Server on a Domain Controller is a Stupid, Stupid, Stupid Idea. Once Exchange is installed on a Domain Controller, you can never demote that server, because changing its role is not supported by Microsoft.
After I read that last tidbit I got extremely worried. I did NOT want to open a ticket with Microsoft to fix my stupid person problem.
I sat and worried for a minute and thought through the whole situation. Why was I trying to demote the server with Exchange on it? If this were an environment where the other domain controller (DC 1) had died, I should
just be able to seize the roles on the remaining Domain Controller (DC 2) and keep going. The only problem that I could see if I proceeded to demote the healthy domain controller was that the Exchange Server (DC 2) already knew
it had an unsupported restore. There had to be a way to fix that.
After my next session googling, I had a new plan:
- 1. Shut down DC 2 (Exchange Server).
- 2. Demote DC 1 using ‘dcpromo.exe /forceremoval’.
- 3. Shut down DC 1.
- 4. Boot DC 2.
- 5. Make DC 2 think that it has a writable copy of the Active Directory Database. (Only required if you are keeping the DC that detected the database discrepancy.)
- 6. Remove AD Remnants using ntdsutil.
- 7. Seize all FSMO roles that were owned by DC 1 (In this case all of them).
- 8. Boot DC 1, then promote it.
- 9. Re-add all FSMO roles that you would like DC 1 to have.
- 10. Be happy because it is all over.
- 11. Next morning, fix all the things that you forgot.
Step 1 – Shut down DC2 (Exchange Server)
This one is pretty easy. In fact, if you can’t do it, you had better open that ticket with Microsoft right now.
Click Start->Turn Off. Select Shutdown and type in a reason.
Step 2 – Demote DC 1 using ‘dcpromo.exe /forceremoval’
Step 2 isn’t very hard, it just freaked me out a little because I was doing it at 2:25 am.
Click Start->Run. Type dcpromo.exe /forceremoval. Follow the prompts.
Step 3 – Shut down DC 1.
Once the Domain Controller has rebooted after its demotion, follow the instructions in step 1.
Step 4 – Boot DC 2.
Step 4 can vary depending on your environment. It could be pressing the power button on the front of the server
or, in my case, clicking the ‘Turn On’ link in the Virtual Server 2005 UI.
Step 5 – Make DC 2 think that it has a writable copy of the Active Directory Database.
Log into DC2 (the Exchange Server) and open the registry editor (Start->Run and type ‘regedit’).
Go to the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters and remove the value “Dsa Not Writable”=dword:00000004.
Do not just delete the 4; delete the whole value.
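If you would rather do this from a command prompt than from regedit, the standard reg.exe tool can delete a single named value with `reg delete <key> /v <value> /f`. Below is a small Python sketch, assuming that approach, which builds the command and only runs it when asked to on a Windows machine. The helper names are hypothetical, and I did it by hand in regedit, so treat this as an illustration rather than what I actually ran.

```python
import subprocess
import sys

# Key and value named in Step 5 (HKLM is reg.exe's short form of HKEY_LOCAL_MACHINE).
NTDS_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Dsa Not Writable"


def build_reg_delete_command(key=NTDS_PARAMETERS_KEY, value_name=VALUE_NAME):
    """Build the reg.exe arguments that delete one named value, not the whole key."""
    # /v selects a single value; /f skips the confirmation prompt.
    return ["reg", "delete", key, "/v", value_name, "/f"]


def delete_dsa_not_writable(dry_run=True):
    """Return the command; execute it only when dry_run is False and we are on Windows."""
    command = build_reg_delete_command()
    if not dry_run and sys.platform == "win32":
        subprocess.run(command, check=True)
    return command


# Default is a dry run: just show what would be executed.
print(subprocess.list2cmdline(delete_dsa_not_writable()))
```

Keeping the default as a dry run is deliberate: this is not a value you want to delete on the wrong server by accident.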
Step 6 – Remove AD Remnants using ntdsutil.
I found the help for this one on MSDN.
Open a command prompt on DC2 (Exchange Server).
Type ntdsutil and press enter.
Type metadata cleanup and press enter.
Type Connections and press enter.
Type connect to server servername and press enter, replacing servername with the name of the working domain controller (here, DC2).
Type quit and press enter.
Now you should be on the metadata cleanup menu.
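Type select operation target and press enter.
Type list domains and press enter.
Type select domain number and press enter, where number is the one shown next to your domain in the list.
Type list sites and press enter.
Type select site number and press enter, where number is the one shown next to the site that held DC1.
Type list servers in site and press enter.
Type select server number and press enter, where number is the one shown next to DC1 (the server you are removing).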
Type quit and press enter.
Now you should be on the metadata cleanup menu.
Type remove selected server and press enter.
You may have to click OK at some dialog boxes confirming that the current domain controller will need to assume roles that have no owner.
Type quit and press enter until you disconnect.
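To keep the order straight at 3:00am, it can help to write the whole ntdsutil session down before you start. The Python sketch below just assembles the Step 6 inputs in sequence. The function name is made up, "DC2" stands in for your real server name, and the index numbers are placeholders: the real ones come from the list domains, list sites and list servers in site output, so you cannot know them in advance.

```python
def metadata_cleanup_commands(connect_to, domain_index, site_index, server_index):
    """Return the ntdsutil inputs from Step 6 in the order they are typed."""
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {connect_to}",
        "quit",                      # back to the metadata cleanup menu
        "select operation target",
        "list domains",
        f"select domain {domain_index}",
        "list sites",
        f"select site {site_index}",
        "list servers in site",
        f"select server {server_index}",
        "quit",                      # back to the metadata cleanup menu
        "remove selected server",
        "quit",                      # leave metadata cleanup
        "quit",                      # leave ntdsutil
    ]


# Placeholder values: replace them with what ntdsutil actually lists for you.
for number, command in enumerate(metadata_cleanup_commands("DC2", 0, 0, 0), start=1):
    print(f"{number:2d}. {command}")
```

The server you select is the dead one (DC1), not the one you are connected to.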
Step 6.5 – Clean DNS (If you are using Windows DNS)
Open the DNS MMC and delete the A record for DC1.
Delete the CNAME record for DC1 under the _msdcs container.
If DC1 was a DNS Server remove the Name Server reference by Right-clicking on the Forward Lookup Zone, selecting Properties, and removing the server from the Name Servers tab.
For good measure, I expanded every section in the DNS MMC and removed every reference to DC1.
Step 7 – Seize all roles that were owned by DC 1 (In this case all of them).
Article on MSDN.
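For reference, role seizure is also done inside ntdsutil, under the roles menu. The sketch below lists the inputs as I understand them for the Windows Server 2003 version of ntdsutil. Treat the exact role names as an assumption and check them by typing ? at the fsmo maintenance prompt, since newer versions are said to use seize naming master instead of seize domain naming master. The function name is hypothetical and "DC2" stands in for your real server name.

```python
# Role names as typed at the "fsmo maintenance" prompt
# (Windows Server 2003 wording, assumed; verify with "?").
SEIZE_COMMANDS = [
    "seize schema master",
    "seize domain naming master",
    "seize infrastructure master",
    "seize PDC",
    "seize RID master",
]


def seize_roles_commands(connect_to, seize_commands=SEIZE_COMMANDS):
    """Return the ntdsutil inputs for seizing the FSMO roles onto connect_to."""
    return (
        ["roles", "connections", f"connect to server {connect_to}", "quit"]
        + list(seize_commands)
        + ["quit", "quit"]  # leave fsmo maintenance, then leave ntdsutil
    )


for number, command in enumerate(seize_roles_commands("DC2"), start=1):
    print(f"{number:2d}. {command}")
```

Each seize command pops up a confirmation dialog, so expect to click through five of them.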
Step 8 – Boot DC 1, then Promote it
For Step 8, I was working with Virtual Machines, so instead of repromoting DC1, I built a new VM from scratch.
I copied one of my Base-Windows2003 VHDs, ran NewSid, and ran dcpromo.exe.
Before I could add the new DC, I needed to log into DC2 and run:
repadmin /options -disable_inbound_repl
repadmin /options +disable_inbound_repl
repadmin /options -disable_outbound_repl
repadmin /options +disable_outbound_repl
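A note on reading those commands: with repadmin /options, a leading + sets a flag and a leading - clears it, so for each flag the last sign you apply is the one that sticks. The Python sketch below is only a toy model of that behaviour, assuming the flags act as simple on/off switches and that the DC starts with both replication flags set after the bad restore. The function name is made up for illustration.

```python
def apply_repadmin_options(current_flags, option_args):
    """Model how a sequence of +FLAG / -FLAG arguments changes the set of DSA options."""
    flags = {flag.upper() for flag in current_flags}
    for arg in option_args:
        sign, name = arg[0], arg[1:].upper()
        if sign == "+":
            flags.add(name)        # + turns the option on
        elif sign == "-":
            flags.discard(name)    # - turns the option off
        else:
            raise ValueError("option must start with + or -: " + arg)
    return flags


# Assumed starting state: both replication directions disabled.
start = {"DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL"}

# Clearing both flags leaves replication enabled in both directions.
print(sorted(apply_repadmin_options(start, ["-disable_inbound_repl", "-disable_outbound_repl"])))

# Clearing and then setting a flag again leaves it set.
print(sorted(apply_repadmin_options(start, ["-disable_inbound_repl", "+disable_inbound_repl"])))
```

Running repadmin /options with no + or - arguments shows the flags currently set, which is worth checking before you try to promote the new DC.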
Step 9 – Re-add all FSMO roles that you would like DC1 to have.
Step 10 – Be happy because it is all over.
It was 4:00am when I was finished with my excursion. Hopefully yours doesn’t take so long since I have offered you this guide.
Step 11 – Next morning, fix all the things that you forgot.
This step should really be Step 7.5, but I didn’t know what I had forgotten to do until the next morning.
Don’t just reinstall DNS on DC1; also make sure that the Forwarders are configured properly.
(Open DNS MMC, Right-Click on ServerName, Properties, Forwarders Tab.)
Install any shared printers, or enable any file shares that previously existed (Preferably with the same names so you don’t have to change any client computers.)
Well, that is the story of my NTDS Replication Event ID 2095. Hopefully yours will go a little bit smoother because of this.
A lot of my help came from exchangeserverpro.com.