Dogsled Replication Protocol Provides Active Directory Authentication Services When a WAN Isn't Available
A senior system engineer needed to provide Active Directory (AD) authentication services to a research facility on top of a glacier. It had no phone lines and no Internet or network connectivity of any type. Here's the creative solution he came up with.
May 12, 2010
I work for a small IT contractor for the federal government. My primary role is to provide Active Directory (AD) design and support for a national federal agency with about 500 discrete locations across the United States. One day a few years ago, while sitting in my office, I got a call from one of my remote site administrators in Northern Alaska, who had an interesting problem. The administrative office where he worked had just opened a new research facility on top of a glacier. It would be open for eight months. During this period, anywhere from 20 to 50 scientists would be working there full time for one or more weeks at a time. The problem was that the scientists were visiting from various locations and had some specific work requirements. They all needed to share data on the two Windows Server 2003 servers in the research facility, and they all needed to share data on the Windows Server 2003 servers at their home offices. We were asked to come up with a way for them to use the same logon information for the servers in their home offices and in the glacier research facility.
This sounded like a job for AD, so we quickly wrote up a plan. We would join their servers to our existing AD forest, create a glacier researcher security group, and add the scientists to it. We would then make this group a member of the local groups at the research facility so we could assign appropriate permissions on the server shares. We could even use a Group Policy Object (GPO) to apply specific research-site settings by taking advantage of the GPO's loopback processing and logon-script capabilities. We only needed to decide if the scientists would log on to a local domain controller (DC) or if the network connection was stable and fast enough to allow them to log on to a remote DC.
Although the scientists were all highly trusted and no financial or personally identifiable information (PII) data was accessible through AD at the time, using a local DC wasn't recommended due to a lack of physical security. So, I called the remote site administrator back and asked him about the WAN.
His answer: "There is none."
"What do you mean there is none?" I asked.
His response was rather surprising. It seems that the research facility was about 200 miles away from the administrative office and didn't have any type of infrastructure going into it. Power was provided by a diesel generator at the site, there were no phone lines, there was no water, and worst of all, there was no Internet or network connectivity of any type.
They had discussed using a wireless bridge, but it turned out that running repeaters up and over two mountains without power wasn't feasible. They had also discussed using satellite connections, but the research facility was located too far north and the providers at that time were below the horizon. Their only contact with the rest of civilization was a weekly supply run, when one member of the team would make a 200-mile trek into town for groceries and other necessities every Thursday.
At this point, we had a solution for their logon issues but no way to extend the domain up there. I jokingly suggested that we create a new replication protocol called "replication over dogsled" (to go with replication over IP and replication over SMTP). We all laughed, but then I thought about it—it would work.
The solution I came up with would provide domain logon services and anything else we needed to the scientists. It would also replicate often enough that a password change made either at a scientist's home office or at the glacier site would be available at the other location. Group changes would also be available at all sites, and we would avoid the problem of lingering objects on DCs, which occurs when a DC doesn't replicate within the forest's tombstone lifetime. Any object that's deleted on a DC is kept as a tombstone for the tombstone lifetime, then permanently deleted. If a DC didn't replicate with the rest of the forest for longer than the tombstone lifetime, any object that had been deleted elsewhere but was still present on that DC would be seen as a new object and would be brought back from the grave. My colleagues and I refer to these DCs that don't replicate past the tombstone lifetime as "tombstoned DCs."
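The tombstone reasoning above comes down to simple date arithmetic, which the following minimal Python sketch illustrates. It assumes the 60-day tombstone lifetime that this article cites for our forest. The function names and the sample dates are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 60 days, the tombstone lifetime cited in this article.
TOMBSTONE_LIFETIME = timedelta(days=60)


def exceeds_tombstone_lifetime(last_replication: datetime,
                               now: datetime,
                               lifetime: timedelta = TOMBSTONE_LIFETIME) -> bool:
    """Return True if the DC has gone longer than the tombstone lifetime
    without replicating, which is when lingering objects become a risk."""
    return now - last_replication > lifetime


def days_of_margin(last_replication: datetime,
                   now: datetime,
                   lifetime: timedelta = TOMBSTONE_LIFETIME) -> int:
    """Whole days left before the gap reaches the tombstone lifetime
    (negative once the lifetime has been exceeded)."""
    return (lifetime - (now - last_replication)).days


if __name__ == "__main__":
    now = datetime(2010, 5, 12)
    weekly_run = now - timedelta(days=7)    # laptop replicated a week ago
    missed_runs = now - timedelta(days=61)  # roughly nine missed supply runs
    print(exceeds_tombstone_lifetime(weekly_run, now), days_of_margin(weekly_run, now))
    print(exceeds_tombstone_lifetime(missed_runs, now), days_of_margin(missed_runs, now))
```

With a weekly trip, the gap stays far below the lifetime, so several consecutive supply runs could be missed before a DC was in danger.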
The Proposed Solution and Its Benefits
The solution was simple enough so that we wouldn't need any type of IT support beyond setting it up. The solution would work like this:
We would build a new site (called the glacier site) for the glacier research facility and a site-link that connected it to the administrative office 200 miles away. We would place two DCs in the glacier site. The primary DC would be built on standard server hardware that plugged into an uninterruptible power supply (UPS) system connected to the generator. The second DC would be built on a laptop with two network cards: the laptop's internal card and a PC card. The internal card would be configured with the IP address for the glacier site's network. The PC card would be configured with the IP address for the administrative office's network. The laptop would also be set as the preferred bridgehead for the glacier site. This would ensure that it would be the only DC at the glacier site that would replicate with the administrative office.
The site-link replication schedule would be set so that replication would only be allowed during a certain period. During most of the week, the laptop would remain at the glacier site and replicate with the primary DC there. On Thursday, it would travel to the administrative office on the weekly supply run. The laptop DC would stay at that site until Sunday morning. During this time, it would replicate with the administrative office's server. Later on Sunday when the laptop DC returned to the glacier site, it would begin replicating with the glacier site's primary DC. The laptop's power management settings would be configured so that it would go into standby mode when unplugged, then wake up when plugged in. This would prevent the laptop from shutting down unexpectedly and eliminate the need for somebody to log on and power it off.
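The weekly rotation described above can be written down as a small lookup, shown here as a hedged Python sketch. It simplifies the schedule by treating all of Thursday and all of Sunday as travel days. The constant and function names are hypothetical and aren't part of any AD tooling.

```python
from datetime import date

# Hypothetical labels for the three states the laptop DC can be in.
GLACIER = "glacier site"
ADMIN_OFFICE = "administrative office"
IN_TRANSIT = "in transit"


def expected_laptop_location(day: date) -> str:
    """Return where the laptop DC is expected to be on the given day.

    Simplifying assumption: Thursday and Sunday count as travel days,
    Friday and Saturday are spent at the administrative office, and the
    rest of the week is spent at the glacier site.
    """
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday in (3, 6):    # Thursday or Sunday
        return IN_TRANSIT
    if weekday in (4, 5):    # Friday or Saturday
        return ADMIN_OFFICE
    return GLACIER


if __name__ == "__main__":
    for d in range(12, 18):  # one example week in May 2010
        day = date(2010, 5, d)
        print(day.isoformat(), expected_laptop_location(day))
```

A lookup like this could help whoever checks replication decide which partner the laptop DC should have talked to most recently.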
Every Friday while the laptop was connected to the administrative office's network, I would verify replication by running the command
repadmin /showrepl
at the main office where I work. By checking the laptop's last replication time with both the administrative office's DC and the glacier site's primary DC, I'd be able to verify that replication was occurring in both places. If either of the replication times was off by more than a few days, we could conclude that either the wrong network card got plugged in or the server didn't get powered on, and work with the IT people from there.
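The Friday check can be expressed as a comparison of timestamps, as in the following minimal Python sketch. It doesn't parse repadmin output. Instead, it assumes you've already noted the laptop DC's last successful replication time for each partner. The partner names, the three-day threshold (my reading of "a few days"), and the function name are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: "more than a few days" is taken to mean more than 3 days.
MAX_AGE = timedelta(days=3)


def stale_partners(last_success: Dict[str, datetime],
                   now: datetime,
                   max_age: timedelta = MAX_AGE) -> List[str]:
    """Return the replication partners whose last successful replication
    with the laptop DC is older than max_age, sorted by name."""
    return sorted(name for name, when in last_success.items()
                  if now - when > max_age)


if __name__ == "__main__":
    friday = datetime(2010, 5, 14, 9, 0)
    # Hypothetical partner names and timestamps read off by hand.
    observed = {
        "ADMIN-DC1": datetime(2010, 5, 13, 20, 0),   # replicated after arrival
        "GLACIER-DC1": datetime(2010, 5, 6, 6, 0),   # missed a whole week
    }
    print(stale_partners(observed, friday))
```

An empty result would suggest that both legs of the trip are replicating. A stale partner would point to the wrong network card or a powered-off server at that end.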
I presented this idea to the rest of my team and to the remote site administrator, pointing out its benefits:
The scientists could use the same logon credentials from their home offices and from the research facility.
Permissions for the research facility's servers could be managed via AD groups, greatly simplifying resource management. The administrative office and the scientists' home offices were already members of our AD domain, so these servers already had the necessary permissions. Adding the glacier site's servers to the domain would require a bit of front-end work, but once that was completed, we could use the existing AD groups to manage permissions at all the sites where the scientists worked.
Changes made at the administrative office or anywhere else in the AD domain would replicate to the glacier site within one week, keeping things more or less current. This would include changes to the scientists' user accounts as well as required policy or configuration changes (e.g., locking the computer after inactivity) made by security or desktop administrators at other sites.
Changes made at the glacier site would replicate back to the rest of the domain within one week, keeping the entire domain relatively up-to-date.
Scientists who changed their password (or any other AD attribute) at either the research facility or their home office would find that change at the other location the following week. Because the scientists were either at the research facility or their home location for a week at a time, the one-week lag time between updates would be acceptable.
The research facility's primary DC and laptop DC would never go without replicating for longer than the 60-day tombstone lifetime, so neither would become tombstoned. The primary DC would be kept current by the laptop DC from Sunday night until Thursday, and the laptop DC would be current with the rest of the domain by the weekend.
Site-based GPO settings could be enforced at the glacier site.
Everybody agreed to this solution, which was dubbed the "dogsled replication protocol." We built it over the following week. The only change made was the decision to avoid using actual dogsleds, replacing them with a 4X4 SUV that was able to travel the path up to the top of the glacier.
We were able to maintain this solution for five months until the SUV was involved in an accident that destroyed the laptop DC. At that point, the decision was made to close the facility for the winter.
The Possible Applications and Risks
The dogsled replication protocol can be adopted for use in other scenarios. A couple of scenarios that come to mind are as follows:
An office needs to be set up for a short time and WAN links aren't available or aren't feasible. This could include offices for summer camps, retreats, or emergency work locations.
An office is suffering from a long-term WAN outage caused by damaged wireless transmitters, satellite receivers, or cable connections. The damage might be caused by the environment (e.g., storms, earthquakes) or even people. In one case, I used the dogsled replication for a few months at a site where a mile of copper wire was stolen and the site wasn't able to replace it for several months.
There are some security risks associated with this solution. The biggest risk concerns the physical security of both the onsite DC and the laptop DC. The onsite DC can be somewhat physically secured, depending on the specific situation. However, if someone gains physical access to the onsite DC, he or she has logical access to your entire AD database. This is the same security risk that Microsoft documents for any branch-office scenario.
The laptop DC poses a much higher security risk. The people transporting this DC must be trusted individuals, as they'll have unsupervised physical access to the server for prolonged periods. There's also the risk of theft or damage to the laptop DC while it's in transit.
When we implemented this solution, we felt that these risks were limited because of population factors (nobody lived near the glacier or on the road to the administrative office) and the type of data stored in AD. AD didn't touch any of our financial or messaging systems at that time and didn't include any PII data. It provided authentication and access to research-based file shares only.
With the security improvements in Windows Server 2008, there are a couple of actions you can take to mitigate the security risks associated with the onsite DC. You can set up the onsite DC as a read-only domain controller (RODC), caching only the passwords of the branch office's users. This significantly decreases the risk of the onsite DC being compromised. The downside is that changes (including password changes) can't be made on the RODC.
You can't set up the laptop DC as an RODC because it must be a writeable DC—that's the only way to replicate changes between two locations. However, you can use Windows BitLocker Drive Encryption to protect the laptop DC.
For example, had BitLocker been available when we used the laptop DC in Alaska, we could've set it up as follows. We could've stored the startup encryption key on two USB flash drives. One USB flash drive could've remained in a locked drawer or safe at the glacier site, while the other one could've remained in a locked drawer or safe at the administrative office. We could've then scheduled a task to shut down the laptop DC at 5 a.m. every Thursday and at 5 a.m. every Sunday so that it would be powered off during transport. When the laptop DC arrived at the research facility or administrative office, someone could've started it by inserting the USB flash drive, plugging in the network cable, and pressing the power button. Using BitLocker in this way would have protected the laptop's information from being stolen, compromised, or used while in transit.
A Viable Solution
By using Windows Server 2003 and some creative replication settings, we were able to provide AD authentication services to a large number of people over a long period of time in a location where a WAN connection wasn't available. Although there were some security risks, these risks can now be lessened with Windows Server 2008 and BitLocker. Thus, if you find yourself in a situation where a WAN isn't available or feasible, you might consider trying the dogsled replication protocol.