
The Startup Script is Dead


Get ready to have an opinion! Matthew Reynolds (https://twitter.com/MatthewMWR) here with my personal advice (not announcing any product changes) about which configuration vectors are working well and which are not for today's and tomorrow's enterprises.

Any of these sound familiar?

· “I need to run <insert IT task> on every PC in the organization.”

· “Our startup scripts don’t run reliably”

· “People are complaining about slow logons”

We’ve all faced these scenarios right? In IT we need to run jobs on managed machines—whether to set a registry value, update files, fix a management agent or <insert automation challenge of the week>.

Boot & logon (particularly on client machines, but increasingly even for servers) have become an IT dead zone where automation and configuration vectors fall apart. Meanwhile people are stuck waiting for slow starting machines while traditional IT tasks (whether Foreground Group Policy, auto-start services, boot triggered scheduled tasks, etc.) time-out and fail in the background. No one wins.

image

How did boot and logon become unreliable triggers for configuration and automation?

Echoing the ghost of autoexec.bat—old school IT folks like me have historically used boot triggered vectors like startup scripts, logon scripts, auto-start services, et al. for jobs that had nothing to do with powering up the machine. Instead we were using startup & logon as a proxy for “do this task periodically.” Modern Windows devices, however, are more likely to sleep or hibernate between user interactions. Traditional boot and logon have become infrequent and irregularly timed for many.

Worse still is the state of corporate network connectivity during startup and logon. Unlike our 1990s open Ethernet LAN where the DC was under a friend’s desk—modern devices are likely to be on a home office, coffee shop, conference center, or airport network during startup and logon. Even in the office or datacenter good connectivity to DCs and other corp resources during boot is no sure bet. Network virtualization, cloud hosting, VLAN switching, 802.1x, IPSec, host based firewall initialization, spanning tree calculation, media sense, duplex negotiation, etc. often result in poor connectivity during startup.

In the past we’ve sometimes encouraged people to turn knobs on the OS like “Always wait for the network…” or “ExpectedDialupDelay” in attempts to resuscitate this zombie. In many cases these just result in ever longer timeouts and productivity delays.

IT culture will have to adapt to the new normal. Moving forward the most productive configuration vectors will be those which are not tied to startup.

Some good alternatives to boot triggered IT automation and configuration

For installing server roles, setting OS configuration, etc. consider Desired State Configuration (DSC)

http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-powershell-desired-state-configuration-dsc-

For the unfamiliar, DSC is like a DevOps version of Group Policy. It allows you to define your desired end state in an easily distributable standards based document, and then say “make it so” to the DSC resources included in or added to the OS. DSC can be used standalone or as an aspect of orchestration frameworks such as Chef (http://redmondmag.com/articles/2015/04/10/devops-automation-and-container-support.aspx).
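
For a taste of what that "make it so" looks like, here is a minimal sketch (the node name, role, and registry value below are placeholders I made up, not anything from this post) using two in-box DSC resources:

# Minimal DSC sketch; node name, feature, and registry value are illustrative only
Configuration ContosoBaseline
{
    Node 'SERVER01'
    {
        # Ensure the Web Server role is installed
        WindowsFeature IIS
        {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }

        # Declare a desired registry value
        Registry ConfigVersion
        {
            Key       = 'HKEY_LOCAL_MACHINE\SOFTWARE\Contoso'
            ValueName = 'ConfigVersion'
            ValueData = '1.0'
            Ensure    = 'Present'
        }
    }
}

# Compile the configuration to a MOF, then tell the Local Configuration Manager to "make it so"
ContosoBaseline -OutputPath C:\DSC
Start-DscConfiguration -Path C:\DSC -Wait -Verbose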

DSC is Microsoft’s strategic direction for configuration management. Currently DSC culture is Server/cloud/DevOps centric, but I expect usage will grow to encompass more traditional on-prem Server usage. I also expect it will be integrated into many management frameworks, and even become a factor in client device configuration management.

For mapping drives, setting registry values, etc. consider Group Policy Preferences

GPP is old news, but can’t be ignored in this discussion. For many client-centric startup and logon script scenarios there are alternatives like GPP: Registry, GPP: Environment, and so on. The rarely discussed advantage of GPP is that (for the most part) it doesn’t have to block your logon waiting for the network during boot. GPP items can run in the background during the day when the machine is stable and online. Starting with Windows 8.1 even GPP: Drive Maps can run asynchronously (not logon blocking).

For things that traditional Group Policy is good at and that don't require boot-time processing, Group Policy is still great!

Administrative Templates (OS and app settings), User Rights Assignment, Security Options, Audit Policy, et al. on domain-joined machines remain great uses of Group Policy. As of today Group Policy covers many areas that DSC does not, and vice versa. Overlap may increase over the years, but for scenarios that work well there is no need to abandon Group Policy.

There are tasks for which Group Policy was never a good fit (like Software Installation), and there are scenarios which have aged poorly (Startup Scripts), so consider migrating away from these.

As you migrate boot/logon-time dependencies away from Group Policy, also consider retiring settings like "Always wait for the network" and "ExpectedDialupDelay" so GP can happily process in the background without blocking users from doing their jobs.

For installing software, updating antivirus, deploying OSes, etc. use existing management frameworks for jobs they are good at

Robust frameworks exist inside and outside Microsoft for installing software, updating antivirus, deploying OSes with task sequences, etc. In the spirit of http://blogs.technet.com/b/fdcc/archive/2010/10/06/sticking-with-well-known-and-proven-solutions.aspx evaluate your startup scripts (and similar tasks) to see if another good enterprise solution exists already.

Of course as IT folks we often have to be the glue that fills the gaps. Some scripts should remain scripts, but how should we launch them outside the IT dead zone at startup?

For scripts and everything else: Scheduled Tasks can be your secret weapon

Windows scheduled tasks are a flexible and robust way to do IT automation—particularly for one-off cases where a broader framework would be overkill. Scheduled tasks give you fine grained control over scheduling, retries, security context, and conditions (like network connectivity). Task Scheduler is now a core component of Windows relied upon by Plug and Play, Group Policy, Diagnostics, PowerShell jobs, and more. Google relies on Task Scheduler to keep Chrome up to date. Unlike startup and logon scripts scheduled tasks always run independently of your logon critical path. Even if a logon triggered task takes a long time to run it will not block your logon.

To create Scheduled Tasks at scale, you can use one-time fan-out approaches (suitable for a modest number of reachable, online servers) like PowerShell's New-ScheduledTask. For broader reach you can also use pull-based approaches like GPP: Scheduled Tasks.
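
As a rough illustration (the task name, script path, and schedule are invented for this example), registering such a task with the in-box ScheduledTasks cmdlets on Windows 8/Server 2012 and later might look like this:

# Sketch only: run a maintenance script weekly as SYSTEM, waiting for any network
$action    = New-ScheduledTaskAction -Execute 'PowerShell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\ITMaintenance.ps1'
$trigger   = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Monday -At 11am
$principal = New-ScheduledTaskPrincipal -UserId 'NT AUTHORITY\SYSTEM' -RunLevel Highest
$settings  = New-ScheduledTaskSettingsSet -RunOnlyIfNetworkAvailable -StartWhenAvailable
Register-ScheduledTask -TaskName 'Contoso IT Maintenance' -Action $action -Trigger $trigger -Principal $principal -Settings $settings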

With GPP: Scheduled Tasks you get all the goodness of Group Policy (e.g., scoping, layering, etc.), Group Policy Preferences (e.g., item level targeting), and the Windows Task Scheduler.

Need to run a script on thousands of machines, but not more than once a month? It shouldn't start until the network is online, the script needs to do system-level stuff without the end user being a local admin, and it should never block the user from logging in? GPP Scheduled Tasks can do that.

Creating a GPP Scheduled task

Creating a GPP scheduled task is very similar to creating any ordinary scheduled task through the Task Scheduler user interface.

image

Naming the task and setting general parameters

image

 

Configuring one or more triggers

image

 

Configuring one or more actions

image

Configuring other conditions and settings

A few gotchas to be aware of with GPP: Scheduled Tasks

1) When configuring a GPP scheduled task as part of Computer Configuration, you'll need to decide what security context it will run under. Typically in the machine context you would use a system account such as one of the following. As always, use the account with the least privileges needed to do the task:

a. NT AUTHORITY\LOCAL SERVICE

b. NT AUTHORITY\NETWORK SERVICE

c. NT AUTHORITY\SYSTEM

2) Sometimes the user account field doesn’t seem to save correctly when first creating the GPP Scheduled Task item. Try re-opening the item in the editor and fixing the user account information (if needed) after initial creation

3) Sometimes people hit this scenario (seems more likely with older versions of GP editor) where “NT AUTHORITY\SYSTEM” gets saved in the task definition as a group which causes the task item not to be applied. I have not encountered this but be aware: http://trentent.blogspot.com/2014/10/group-policy-preferences-scheduled-task.html

4) Under Conditions > "Start only if the following network connection is available", the only option in the Group Policy editor is "Any connection". If you need to configure more specific conditions, such as connectivity to a specific corporate network, you can use New-ScheduledTask or schtasks.exe instead of GPP: Scheduled Tasks (a rough sketch follows after this list). You might still use GPP: Scheduled Tasks to trigger the launch of a script that creates other more complex tasks.

a. Coincidentally I use this pattern frequently: deploy a simple GPP: Scheduled Task as a scale-out bootstrap for more complex actions (which may involve creating additional scheduled tasks).
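
Here is a hedged sketch of that pattern (task name, script path, and network name are invented; as I recall New-ScheduledTaskSettingsSet exposes the network condition via -NetworkName, but verify on your OS version):

# Sketch only: run a daily job, but only when the 'CorpNet' network profile is available
$action   = New-ScheduledTaskAction -Execute 'PowerShell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\CorpNetJob.ps1'
$trigger  = New-ScheduledTaskTrigger -Daily -At 9am
$settings = New-ScheduledTaskSettingsSet -RunOnlyIfNetworkAvailable -NetworkName 'CorpNet'
Register-ScheduledTask -TaskName 'Contoso CorpNet Job' -Action $action -Trigger $trigger -Settings $settings -User 'NT AUTHORITY\SYSTEM'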

A few gotchas for scheduled tasks and IT automation in general

1) As with any management tool, you can easily do more harm than good here. Don't create expensive tasks or tasks that run frequently.

2) As with any IT automation, take care to not accidentally create an escalation-of-privilege vector for non-admin end users. If your task will run as a privileged account (typical for system administration tasks), the task must not run any code that could be modified by non-admins, such as a script stored in a non-secure location.

3) For tips on managing PowerShell version levels, execution policies, etc. for an enterprise (helpful when launching PowerShell scripts broadly, e.g., using scheduled tasks), see Ashley McGlone's recent post: http://blogs.technet.com/b/ashleymcglone/archive/2015/04/22/pshsummit-managing-powershell-in-the-enterprise-using-group-policy.aspx

May your old Startup and Logon scripts rest in peace.

-Matthew “start me up” Reynolds

https://twitter.com/MatthewMWR


How to Setup a Password Expiration Notification Email Solution


"Hello World!" 

Hi there! Mike Kullish, here. I'm a PFE based out of Minneapolis, MN with a focus on AD, Hyper-V and DFS but I try to help customers with anything on the Windows Desktop and/or Server platforms. I have been with Microsoft for nearly three years and this is my first blog post. 

Have you ever had a need to configure notifications for users' password expirations but found that existing solutions didn't quite fit the bill? We all know you can use built-in solutions with Windows and Active Directory/Group Policy, but this requires users to interactively log on to a network-based computer. What about those BYOD or mobile users or users of web apps/email? More often than not, these users will have to call the helpdesk because they had no idea their domain passwords were going to expire. Statistics show that some of the most common calls to the helpdesk are password-related, and implementing a process like the one covered here could really make a dent in your helpdesk call volume and costs.

Recently, a customer asked for some help implementing a solution for this issue based on a script they'd found on the Microsoft TechNet Script Center.  The script queries the pwdLastSet attribute of user accounts in AD and the MaxPwdAge property within the domain, then does some time computations and sends an email to those users who are near a password expiration 'event.'

                  

 

I thought it would make a helpful blog post to cover some of the details and considerations when implementing a solution like this. The particular script my customer found was the work of Microsoft MVP Robert Pearman, and he deserves the kudos for initially putting it together, as well as for several refinements to it (including support for Fine Grained Password Policies).

DISCLAIMER:

  • PFEs don't normally provide code beyond sample or "proof of concept" code
  • The code we discuss here is an additional layer beyond code from a PFE; it is code from a Microsoft MVP resource
  • As with ANY code, you should always test/validate its behavior in an isolated lab
  • Note the script author has validated the code via his own testing on Windows Server 2008 R2.  Did I mention you should test/validate the code?  
  • If/when you're ready to deploy it to production, you should employ a solid change control process and a controlled release. This code can possibly generate emails to 100,000s of users.

You can download the script from the following link. (https://gallery.technet.microsoft.com/Password-Expiry-Email-177c3e27)  

Click on the blue box and save the file to a workstation or member server. Obviously, a DC would work but likely isn't the best choice. The workstation or member server needs the RSAT tools for Active Directory installed. If you already have an "admin server" system where you have existing scripts, tools, Scheduled Tasks, etc., that would be a logical place for this.

Once you have downloaded the script:

  1. Place the file in a directory on your admin server. (For this example, I will use C:\scripts)
  2. Edit the following portions of the script as applicable using Notepad or PowerShell ISE
    • $smtpServer="mail.domain.com"
      • This will be the name of your SMTP server. The admin server machine will need to be able to send using SMTP – you will likely need to work with your email team to get that process working
    • $expireindays = 21
      • This is the number of days prior to password expiration that you want to notify users. The actual number of days remaining before expiration will be displayed in the email notification.
    • $from = "Company Administrator <support@mycompany.com>"
      • This field can be modified to be sent from a valid email account within your environment. Consideration should be given to this address in order to prevent the perception of a phishing email as well as how replies will be handled.
    • $logging = "Enabled" # Set to Disabled to Disable Logging
      • Logging is recommended to ensure that you can trace any errors that might occur
    • $logFile = "C:\scripts\PasswordExpiration\pwdexp.csv"
      • This field should be changed to a desired location on the local system or network share as desired.
    • $testing = "Enabled"
      • Set to Disabled to email users (configuring this to Enabled runs a check against all accounts and sends emails ONLY to the account specified in the $testRecipient field below.)
        • Configuring this to Disabled actually sends emails to the users whose passwords will expire in the configured amount of time.
        • Understand this - you risk sending out a mass-email to 10s, 100s or 10,000s of users.
        • This is automation – with great power, comes great responsibility
    • $testRecipient = "someone@company.com"
      • Specifies the test user account that will receive the test email. CAUTION: This account will receive 1 email for every user the script identifies.
  3. Save the script once you are done editing it.
  4. Now you can test the script by running it in a lab.
    1. You may need to modify the execution policy for PowerShell scripts on your admin server machine.
  5. You should get an email that looks something like this:

    From: someone@company.com [mailto:someone@company.com]
    Sent: Thursday, March 23, 2015 12:52 PM
    To: Someone@company.com
    Subject: Your Windows password will expire in 4 days.
    Importance: High

    Dear someone,

    Your corporate network password will expire in 4 days.


    To change your password on a PC press CTRL-ALT-Delete and choose "Change Password."

  6. It is important to ensure that you change the section of the script under $body. The message should be modified to ensure that users don't accidentally delete the email because they suspect it is spam or a phishing email. Good inter-team collaboration and communication about this "password expiration notification process" cannot be emphasized enough.
    1. Work with your helpdesk and security teams to ensure everyone signs off on this effort and approves the specific text and additional information for the email, including how to manage a 'reply' to that email address
  7. When it all is working as desired/expected, you can disable testing:
    1. $testing = "Disabled"  

Now, at some pre-determined time, you or one of your staff can execute the script to generate the 'password expiry notification email' to the affected users.   

For those who don't want to manually run the script, it's a simple process to create a Scheduled Task to run the script automatically.
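
For example (the script path, task name, and schedule below are assumptions for illustration, not part of the downloaded script), something like this registers a daily 6:00 AM run under the SYSTEM account:

# Sketch only: run the notification script once a day on the admin server
$action  = New-ScheduledTaskAction -Execute 'PowerShell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\scripts\PasswordExpiryNotification.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 6am
Register-ScheduledTask -TaskName 'Password Expiry Notification' -Action $action -Trigger $trigger -User 'NT AUTHORITY\SYSTEM'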

There are numerous other ways to address this need; I have talked to many people who have developed their own processes, scripts and/or code for this. This particular process was pretty easy to implement and I was able to work with my customer to get the whole thing working in a short amount of time.  

Thanks to Hilde and all the other PFE bloggers here for helping me "dip a toe" in the blog-pond (or pool?) and a special thanks to Microsoft MVP Robert Pearman who provided some insight and details around his script.

See you all next time!

Mike "CANNONBALL!" Kullish

Windows To Go at Microsoft Ignite 2015


Hey y’all, Mark and Yong Rhee here. We hope everyone enjoyed our session. We’ll post the recording when it becomes available. As promised we have a slew of bonus content for you. You can take a look at it here.

Thanks for watching.

Mark ‘full of stage fright’ Morowczynski and Yong “demo demon” Rhee

How to find expensive, inefficient and long running LDAP queries in Active Directory

$
0
0

Hey y’all, Mark back again. I’d like to say in my best TV show announcer voice, we have a real treat for you today. Have you ever wondered what clients were sending expensive or inefficient LDAP queries to your domain controllers? Are long running LDAP queries possibly leading to poor server application performance or even failures of these applications? What about which clients are sending an excessive amount of LDAP queries to domain controllers? Are these queries leading to high CPU utilization on your DCs? Are these queries even completing or are they timing out in some cases?

Have you suspected all of the above might be happening but had no easy way to identify such queries or the IP addresses sending them? Today with some help from PowerShell we will finally have that easy way you’ve been looking for.

 

Collecting the data

Analyzing the data

Driving a resolution

 

Collecting the data

First we need to ensure our DCs are capturing the enhanced 1644 event metadata. To enable this you need to do the following.

-Have a Server 2012 R2 DC or have KB 2800945 installed on Server 2012, Server 2008 R2 or Server 2008 domain controllers.

- Set the Field Engineering registry value to 5 (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Diagnostics\15 Field Engineering)

image

Note: Field Engineering diagnostic logging is not enabled by default and it should only be enabled when actively troubleshooting. Logging level 5 will cause numerous events other than the 1644 event to be captured in your Directory Services event log. You'll want to turn this setting on when actively troubleshooting LDAP queries and then set the logging level back to 0 when you are done. No reboot is required to turn this setting on or off, so really you have no excuse.

Next, configure the values for the registry-based filters for expensive, inefficient and long running searches. If the following registry entries exist, change the values to the desired threshold in milliseconds. If the registry entries do not exist, create a new entry with that name, and then set its value to the desired threshold.

Registry Path | Data Type | Default value
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Expensive Search Results Threshold | DWORD | 10,000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Inefficient Search Results Threshold | DWORD | 1,000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Search Time Threshold (msecs) | DWORD | 30,000

Our current thinking is to set the values for "expensive" and "inefficient" both to 0, then start with a Search Time Threshold setting of 100 milliseconds (100 decimal / 64 hex).

After the registry values are set on the DCs you want to analyze, you should start seeing 1644 events logged in the Directory Services log. After you've collected enough data (say 30 minutes worth during peak hours, when queries are slow to execute or when the CPU is running hot), go ahead and export the Directory Services log. Then you'll want to grab the 1644 Reader PowerShell script from the TechNet scripting library and copy it to a PowerShell capable computer that also has Microsoft Excel on it.
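
If you prefer to script those steps, here is a rough PowerShell equivalent (run it on the DC you are analyzing; the export path is just an example):

# Sketch only: enable Field Engineering logging, set the thresholds discussed above, export, then turn logging back off
$diag = 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics'
$parm = 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters'
Set-ItemProperty -Path $diag -Name '15 Field Engineering' -Value 5
New-ItemProperty -Path $parm -Name 'Expensive Search Results Threshold' -PropertyType DWord -Value 0 -Force
New-ItemProperty -Path $parm -Name 'Inefficient Search Results Threshold' -PropertyType DWord -Value 0 -Force
New-ItemProperty -Path $parm -Name 'Search Time Threshold (msecs)' -PropertyType DWord -Value 100 -Force
# ...collect for ~30 minutes, then export the Directory Service log and disable the logging level
wevtutil epl "Directory Service" C:\Temp\DC1-DirectoryService.evtx
Set-ItemProperty -Path $diag -Name '15 Field Engineering' -Value 0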

Right-click the PowerShell script in Explorer and choose "Run with PowerShell".

The script will allow you to input a path containing the .evt / .evtx files, or process all .evt[x] files in the same directory as the script by just hitting Enter. You can also add Directory Service event logs from more than one domain controller, and the script will report on LDAP query metrics that clients submitted across those DCs.

 

image

The Event1644-reader stores its initial output as a .csv file before generating the Excel XLS file. If you want to keep appending newer logs to this same report we have a time saving tip for you. Remove the old evtx files but keep the .csv file. The script will enumerate all evtx first, then any csv to import.

Analyzing the data


Open the XLS file created above

Click “enable content”

image

Now that we have our data, I'm going to do a quick walk-through of the data displayed in the 8 tabs of the Excel workbook (Microsoft Excel 2010 and later) built by the Event 1644 reader PowerShell script.

Tab 1-RawData

image

This is the data straight from the 1644 events, separated by column. The name of the DC that serviced each LDAP query is captured in column A, labeled "LDAP server". LDAP queries are captured in column F, labeled "Filter". The data filter allows you to isolate specific queries, like those from client X to DC Y issuing query Z.

Tab 2- TopIP_StartingNode

image

The TopIP_StartingNode tab shows you which directory partition is being queried the most, based on Search Count and Average Search Time. The %GrandTotal shows you the overall percentage of queries targeting each partition. In the example above, 99% of LDAP queries in this report targeted the configuration NC, making it a good 1st candidate for investigation. Many cells in the XLS built by the 1644 reader script feature a "click-through" capability, where clicking on a given cell displays the underlying "metadata" for the value being displayed. For example, clicking on the partition name in the starting node grouping displays the top filters (queries) targeting that partition. Clicking through again displays the IP addresses that issued that query. Clicking again shows the date and time those queries were issued, allowing you to answer what, who, and when clients generated a specific workload.

Also note in the above screenshot the LDAPServer Filter is (All). You can use this drop down to select a specific DC if more than one log was in the directory.

Focusing based on LDAP query count

Tab 3- TopIP

image

This shows you the IP addresses that generated the most queries, in descending order. The %GrandTotal is the overall percentage of the total from this specific client. In the example above, 10.92% of all queries came from client #1 (row 4). %RunningTotal is the cumulative percentage as you add up clients from the top. In the example above, the top 9 callers generated 61+% of all LDAP queries by volume.

image

You are also able to expand each client IP to see what query they were sending.

Tab 4- TopIP-Filters

image

This tab is really the inverse of tab 3. It shows each query and its Search Count, in descending volume order. The %GrandTotal and %RunningTotal work the same way as Tab 3.

image

We can also expand to see what IPs are making that exact query.

Focusing based on LDAP query time

Tab 5- TopTime-IP

image

Tabs 3 and 4 focused more on the total number of queries. Tabs 5 and 6 focus on the actual time of these queries. Tab 5 shows lots of really interesting info. First it shows which IP is taking the most search time. But it also shows the average search time. We'll come back to this point later. %GrandTotal and %RunningTotal work the same as tabs 3 and 4.

image

You are also able to expand to see what query that host is doing which will be helpful later on.

Tab 6-TopTime-Filters

image

This is once again the inverse of the previous sheet. Instead of sorting by IP for total search time we are sorting by the actual query.

image

As before we can expand the query to see which client.

Buckets and Sandbox

Tab 7-TimeRanks

image

The TimeRanks tab breaks the query volume into 50 MS time-based buckets to give a picture of the overall query performance. For example, 2,125 or 14.82% of the LDAP queries in this sample completed in 50-99 MS. If we move a few rows down to the 250-299 MS bucket, we can see that 77.85% of all queries sent were completed in 299 MS or less, since our % running total includes all the previous buckets. While some queries took 300 MS or more, given that there were only 3 of them, it may be more interesting to optimize queries in the shorter time buckets.

image

Expanding a bucket will show you all queries that fell into it as well as the total count.

Tab 8 is just a sandbox tab where you are able to create your own pivot tables from the data. If you find an interesting data pivot let us know in the comments.

Driving a resolution

As they say in Spiderman, with great data comes great responsibility, probably. Now that you have this information the first question to ask yourself is, what am I going to do with all this? Let me run through a few things to help get you started.

What Queries Are Taking The Longest?

Your gut probably told you to look at the total number of queries, and we'll get there, but I want to focus on something else first: Tab 5.

image

Excessive query volume can be a problem, but if the queries are being serviced very quickly they might not be a problem at all. By focusing on Tab 5 we are going to see which clients are causing the most search time. Work with these to figure out what application is making these long running LDAP calls; ProcMon and WPR are good for this. Don't forget to check the AvgSearchTime as well. Here we can see in row 11 that someone made 1 call and it took 44,336 MS to return, since they basically just asked for everything. Clearly we could make some adjustments there in terms of efficiency. The next client in row 14 sent a pretty complicated query and it took 30,061 MS to return. Again, optimizations can hopefully be made.

Who is sending all the queries?

image

Tab 3 is a great place to start. Focus on those clients that have a high search count and a high average search time. Again you’ll need to trace this back on the client side to what application is making this call. ProcMon and WPR should get you started.

Are These Clients In The Right AD Site?

We've previously discussed finding clients talking to DCs in the wrong AD site. If you see any clients that shouldn't be talking to the DCs you have logs for, you may need to confirm your subnets are defined properly in AD Sites and Services.
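
A quick spot-check from a suspect client (the domain name is a placeholder) is to ask which site and DC it believes it should be using:

nltest /dsgetsite
nltest /dsgetdc:contoso.com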

Indexing Attributes?

As you dig through this data and work with your developers or application owners, you might find that some of the slowness is due to attributes that are not indexed. Thankfully we've already covered this topic at great length here. Remember, while the index is building it may delay AD replication, and it will also increase your database size.

Working With Developers?

This information is also really useful to your development teams. Take this as an opportunity to partner with them and attack the queries that can be improved. Things they may be able to do include client-side caching of data, throttling the queries, or even re-writing the queries to make them more efficient. They might not even realize what the query is doing or how slow it is, since nobody has called to complain. Now is a good time to get ahead of the problem, work together, and everyone wins.

Do I have the latest LDAP query optimizer installed?

Improvements are still being made in how domain controllers can further optimize LDAP queries. There are a few updates you may want to avoid, and some others you should download and test.

Windows Server 2012 R2 RTM introduced a new query optimizer and the most detailed LDAP logging capabilities.

KB 2862304 backported subsets of the LDAP query optimizer to Windows Server 2012, 2008 R2 and 2008 DCs. That code introduced a defect where queries that referenced undefined attributes were slower to execute. That performance problem is resolved for Windows Server 2012 R2 DCs by KB 3042816. An update for KB 3042816 is still being worked on for 2012, 2008 R2 and 2008.

Hopefully this is enough to get you started and really start reducing excessive LDAP queries and optimizing performance. I want to again send a thank you to Arren and Ming Chen for their wonderful script.

Mark “fully LDAP optimized” Morowczynski

Mailbag: All ADFS All The Time (Issue #11)


Hey y'all, Mark and Tom here. Things are starting to return to normal, so hopefully we should be back to a regular posting schedule. Tom should have some more time since HIS hockey team is already out of the playoffs while mine continues to march on. This mailbag is chock-full of ADFS goodness. Let's get into it.

 

Expiring Token-Signing Certificate

Monitoring RPs metadata

Workplace Join Expiration

SQL Team installing ADFS

Stuff from the Interwebs

 

Question

We have soon-expiring token-signing certificates and we need to coordinate with multiple relying parties. Is there a way to use a separate token-signing certificate per relying party?

Answer

No, there is not. As Dave discussed in his ADFS Deep Dive Certificate Planning post (http://blogs.technet.com/b/askpfeplat/archive/2015/01/26/adfs-deep-dive-certificate-planning.aspx), the validity period is something to consider; extending the token-signing certificate beyond 1 year means this type of change doesn't come up so frequently. We've also heard your feedback on this request.

Question

We have several RPs that are set to automatically monitor and update the relying party metadata. How often does this refresh occur, and how can I verify that the check happened?

Answer

It should refresh every 24 hours. To validate that, use the following PowerShell command: Get-AdfsRelyingPartyTrust -Name "<your relying party trust name>". Here is an example of my O365 RP.

image

You can see the LastMonitoredTime of the RP.
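
If you just want the monitoring-related values, something like this trims the output (property names as I recall them from the relying party trust object; verify on your ADFS version):

Get-AdfsRelyingPartyTrust -Name "Microsoft Office 365 Identity Platform" | Select-Object Name, MonitoringEnabled, MetadataUrl, LastMonitoredTime, LastUpdateTime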

Question

Some of my workplace join devices are expiring after 30 days of inactive use. My environment requires me to make this longer, such as 90 days. Can you do this?

Answer

Yes you can: use Set-AdfsDeviceRegistration -MaximumInactiveDays 90. More info can be found at https://technet.microsoft.com/en-us/library/dn479315.aspx

Question

My SQL team is asking me for the exact permissions required to install ADFS DB and set permissions. Is this documented anywhere?

Answer

We'll do you one better: the Export-AdfsDeploymentSQLScript cmdlet is what you need (https://technet.microsoft.com/en-us/library/dn479308.aspx). It will output two files, one for creating the databases and one for the permissions.
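
A hedged example (the destination folder and service account are placeholders for your environment):

Export-AdfsDeploymentSQLScript -DestinationFolder 'C:\ADFS-SQLScripts' -ServiceAccountName 'CONTOSO\svc-adfs'
# Hand the two generated .sql files (database creation and permissions) to the SQL team to run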

 

Stuff from the Interwebs

This is some stuff we've set aside from the last few weeks. You probably found out about it already, but just in case you missed it:

-Baseball has many great things about it. One is when managers lose their minds, like this.

-We are finally getting another Dark Knight comic.

-Summer movies have started don’t miss any.

-Finally, everyone lost their mind about the Star Wars trailer, and it was cool. Let’s not forget what it could have been with Patton Oswalt’s great improv.

 

Mark ‘yes and’ Morowczynski and Tom ‘18 holes a day’ Moser

Getting Started with Microsoft Deployment Toolkit for Windows Server 2012 R2 and Windows 8.1: Part II, starting with Selection Profiles (and some random Windows Server 2003 End of Service ramblings)


A while back (and longer than I'd like to admit), I wrote a post on getting started with Windows Deployment for Windows Server or Windows client using the Microsoft Deployment Toolkit. I've helped many customers since then set up and use MDT in their environment. Many of these engagements are a result of Windows Server 2003 end of support quickly approaching and customers wanting to spend as little time as possible on the actual operating system build process. For those of us keeping track, this is set to occur on July 14th, 2015, or 64 days from the writing of this post.

Some of you may be asking what end of support means. To summarize, it means:

  • No updates. 41 critical security updates were released for Windows Server 2003 in 2014 alone. No updates will be developed or released after end of support.

  • No support. As an engineer, we’re not allowed to talk to you about any issues involving Windows Server 2003 after July 14th (unless you’re trying to migrate off of Windows Server 2003).

  • No compliance. Windows Server 2003 servers will not pass a compliance audit.

Why do we always talk migration and not performing in-place upgrades?

  • Migration provides a transition path. A transition from x86 to x64. A transition from physical to virtual (and vice versa). Additionally, it provides more control.

  • Migration provides a clean operating system install. Migration allows you to target what you want to move over and what you don't want to move over. It gives us the opportunity to get rid of stale data, settings, applications, and more from over a decade ago.

  • Migrations reduce the risk and downtime. The operating system installation and most migration tasks take place while the source server is still live. This also gives us the opportunity for verification and performance benchmarking prior to bringing the target server online. Lastly, it gives us a rollback option. If things do not go well, we can roll back to the source server.

Migrations are a good thing. Don’t get me wrong. We fully support in-place upgrades. But keep in mind the following when considering an in-place upgrade:

  • You cannot upgrade between architectures. Most Windows Server 2003 instances are 32-bit installs. Windows Server 2008 R2 and later are only 64-bit.

  • You cannot upgrade from Windows Server 2003 to Windows Server 2012 R2.

Migration is a great time to do some spring cleaning, to take the opportunity and step back to look at what the future state of your standard operating environment should look like, and to ensure the foundation (i.e. Windows Server 2012 R2) is healthy and optimized for your environment.

So why am I helping so many enterprises set up MDT in their environment?

  • Ease of automating the build process for Windows Servers (or clients)

  • Ease of integrating all the security updates until that point of time into a custom image

  • Ease of customizing the deployment

  • Removing end user configuration error

  • And it fully integrates into System Center Configuration Manager should you decide to pursue that venture at some point in the future

And the latest trend, MDT helps provide consistency across both virtual and physical servers.

  • Instead of maintaining an image for your physical servers and a template with an image for your virtual servers, the latest trend shows customers moving to a single deployment method where templates are used to setup the actual virtual machine settings and configuration, but MDT is used to deploy the image. Why do this? In short, it provides consistency across both virtual and physical environments.

  • Instead of two different processes, we have one.

  • Instead of maintaining the operating system in an image for physical servers and maintaining the operating system in the template for virtual servers, we maintain it in a single location.

So if you haven't read my post on getting started with MDT, I certainly encourage you to do so. MDT won't help you remediate your problem applications, but it will help you get Windows Server out the door more quickly, with fewer manual steps needed as part of the build process. The getting started post is linked above or can be found here:

http://blogs.technet.com/b/askpfeplat/archive/2013/09/16/getting-started-with-windows-deployment-for-windows-server-2012-or-windows-8-using-microsoft-deployment-toolkit-mdt.aspx

I’m expanding upon that original post by request by beginning with selection profiles. Selection profiles were mentioned in my original post as one of the reasons for a well-defined folder structure in MDT.

Paul D, this one’s for you. Sorry it has taken me so long. I hope you find as much value in this post as you did in the original post.

So what are selection profiles anyhow?

A couple quick facts about selection profiles:

  • Selection profiles allow you to filter the content you have added to MDT.
  • Selection profiles give you more control.
  • Selection profiles only work with folders.
  • Selection profiles allow you to select one or more folders you’ve created in MDT workbench to use for various activities.
  • Selection profiles are found under the Advanced Configuration node in the MDT Deployment Workbench.

We actually create some default selection profiles for you:

If you look around, you will find them referenced in various places.

Like in any of our default Task Sequences during driver injection:

Or under our Windows PE Drivers and Patches settings:

Linked deployment shares, Media…they all use selection profiles.

So why use selection profiles? 

It's all about control, control, CONTROL, and who doesn't want more control?

You can:

  • Control which drivers and updates are used when creating your boot image

  • Control which drivers are used during driver injection in your task sequences

  • Control what is replicated to your linked deployment shares

  • Control what is included with your media deployments

With Selection Profiles, you can control what we use or don’t use based on our folder structure.

They're super easy to create and set up. You do need your well-defined folder structure in place. You'll quickly find that you can only select or not select at the folder level. To create custom selection profiles, simply right-click on Selection Profiles, choose New Selection Profile, and walk through the wizard. Here are some examples I've recently experienced onsite of when selection profiles have come in handy:

In March of this year, I was onsite and when deploying an image, it would fail prior to the first reboot. MDT would format the drive, apply the image, and then we would see it hang when applying the unattend.xml using DISM. It would sit there for a while and would then fail with the lovely and generic error shown below:

So what would your guess be?

I automatically thought drivers (after all, isn’t it always the drivers?) and we spent a good chunk of time eliminating that.

Then I thought our base image (we were in the process of building a custom image) was corrupt. More time spent there.

Then we recreated the Task Sequence just for grins.

But we still failed.

So I stepped back and looked at what we hadn’t eliminated yet through our process of elimination. And you know what the answer turned out to be? Packages.

The customer had added some packages for an image he created for a new operating system that were erroneously being injected during our installation in Preinstall phase shown below:

So what did we do to fix this? We placed the updates for the newer OS in a separate packages folder and the updates for the OS we were deploying into its own folder like so:

Then we created a selection profile like so:

Then updated our Apply Patches to use the new Selection Profile. Problem solved!

Another example, Windows PE automatically includes all network drivers, mass storage drivers, and packages as part of your boot media as shown below: 

 

However, occasionally WinPE will require specific drivers just during the WinPE phase, or will require specific drivers to be excluded just during the WinPE phase. Wireless network drivers are a good example. We don't support deploying an operating system over wireless, so why include them in your boot image? It could only cause potential problems. Instead, create a folder for WinPE drivers under Out-of-Box Drivers that includes only the network and storage drivers required during deployment, and create a selection profile for WinPE drivers. For example, when booting from the MDT boot image, a CMD shell would pop up just after the Post Install phase, right before we reboot. This breaks the automation, as we have to close the CMD window that is automatically opened. This is traditionally the result of a failure installing a device into WinPE, and sure enough, we see the following failure in the wpeinit.log for a wireless adapter that we couldn't care less about at this point during the deployment:

2014-12-04 19:13:34.247, Info      ==== Initializing Network Access and Applying Configuration ====

2014-12-04 19:13:34.247, Info      No EnableNetwork unattend setting was specified; the default action for this context is to enable networking support.

2014-12-04 19:13:34.247, Info      Global handle for profiling mutex is non-null

2014-12-04 19:13:34.247, Info      Waiting on the profiling mutex handle

2014-12-04 19:13:34.262, Info      Acquired profiling mutex

2014-12-04 19:13:34.942, Info      Install MS_MSCLIENT: 0x0004a020

2014-12-04 19:13:34.942, Info      Install MS_NETBIOS: 0x0004a020

2014-12-04 19:13:35.239, Info      Install MS_SMB: 0x0004a020

2014-12-04 19:13:35.784, Info      Install MS_TCPIP6: 0x0004a020

2014-12-04 19:13:36.775, Info      Install MS_TCPIP: 0x0004a020

2014-12-04 19:13:36.775, Info      Service dhcp start: 0x00000000

2014-12-04 19:13:36.775, Info      Service lmhosts start: 0x00000000

2014-12-04 19:13:36.978, Info      Service ikeext start: 0x00000000

2014-12-04 19:13:37.103, Info      Service mpssvc start: 0x00000000

2014-12-04 19:13:37.118, Info      Service mrxsmb10 start: 0x00000000

2014-12-04 19:13:37.118, Info      Released profiling mutex

2014-12-04 19:13:37.118, Info      Spent 2859ms installing network components

2014-12-04 19:13:38.528, Info      Installing device sd\vid_02d0&pid_4324&fn_1 X:\WINDOWS\INF\bcmdhd.inf failed with status 0x80070002

2014-12-04 19:13:41.076, Info      Installing device usb\vid_0424&pid_7500 X:\WINDOWS\INF\oem0.inf succeeded

2014-12-04 19:13:41.545, Info      Installing device root\kdnic X:\WINDOWS\INF\kdnic.inf succeeded

2014-12-04 19:13:44.310, Info      Spent 7204ms installing network drivers

2014-12-04 19:13:44.310, Info      STATUS: FAILURE (0x80070002)

The answer is again, selection profiles to control which drivers we’re using in our boot image. We can do this by:

1. Creating a WinPE Drivers folder under Out-of-Box Drivers in MDT Workbench and importing the necessary network and storage drivers into this folder.

2. We then create a selection profile specifically for WinPE Drivers as shown below:

3. Instead of using the default All Drivers and Packages for drivers, we then use the newly created WinPE Drivers selection profile:

A third example involves driver conflicts.

What if you have similar hardware that uses very similar, but different drivers? Or you’re deploying a legacy operating system like Windows Server 2008 R2 or Windows 7 where the hardware requires specific driver versions that are not the latest driver versions you have added to the MDT Workbench for the more current version of the operating system that you are also deploying? Driver conflicts like this are difficult and time consuming to troubleshoot. If you run into a scenario where the hardware you are deploying to is either

  1. Picking up the wrong driver
    or

  2. Picking up the wrong version of a driver

Then selection profiles may help.

You can use any number of WMI queries to determine what is needed to run a specific step in a Task Sequence. On each Task Sequence step, we have an Options tab. This Options tab is important for many reasons.

  • We can disable a step in the Task Sequence

  • We can continue on error (useful when installing applications or Windows Updates and don’t necessarily want your Task Sequence to halt if one of them fails)

  • We can add any number of conditions for that specific step. This is useful with drivers, but it can also be useful in a number of other ways, such as application installs, partitioning schemes, and much more.

I often use the Query WMI option during the driver injection step to control which drivers I install based on the Selection Profile.

To find out your syntax for WMI, you can use the following command or reference MSINFO32: 

WMIC ComputerSystem GET Model

Here’s an example for a Dell PowerEdge M710. Let’s say I have problems with this version of the driver causing conflicts for the other hardware I deploy. All my other drivers play nicely together, but when I add this specific one, it causes problems for my other machines. What would my steps look like?

1. Create the appropriate driver folder structure. In this case, create a separate folder for the problematic driver. I called mine M710:

2. Create a Selection Profile

3. Open my Task Sequence and add an additional Inject Drivers step.

Now why am I adding an additional step instead of just modifying the one that is already there? Well, in my case, I do not want a separate Task Sequence for this one problematic driver. I want this model to use this driver and all my other hardware to use the normal PNP driver injection. So what I’m going to do is run my Dell Driver Injection step first and then follow it up with the normal Inject Drivers step.

I like to rename my steps when adding additional steps so that I know what I am doing in my Task Sequence without having to click on the step. When I’ve finished adding my additional Inject Drivers step, it looks like this:

4. Run WMIC ComputerSystem GET Model  to get the model information for the query or look it up online.

5. Write your WMI query. I write my WMI query as follows (numerous examples of WMI queries appear online):

    SELECT * FROM Win32_ComputerSystem WHERE Model LIKE "%M710%"

6. Add the query as a condition on the Options tab of the Inject Driver step and click Apply:

7. Test to verify it works as expected.

Here's another example I've used quite a bit recently when using a single Task Sequence to deploy both physical and virtual servers. What if you want the VMware drivers or software to install only on your virtual servers? You can use the following query for any Inject Drivers, Format and Partition Disk, or Install Applications step:

SELECT * FROM Win32_ComputerSystem WHERE Model = "VMware Virtual Platform"
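
Before wiring a query into a task sequence condition, it can save time to test it locally. A quick sketch (not from the original post) using Get-CimInstance, which accepts WQL:

Get-CimInstance -Query 'SELECT * FROM Win32_ComputerSystem WHERE Model LIKE "%M710%"'
Get-CimInstance -Query 'SELECT * FROM Win32_ComputerSystem WHERE Model = "VMware Virtual Platform"'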

Take a look at Bill Spears’ post for another option using the Rules tab (aka the customsettings.ini) in MDT: http://blogs.technet.com/b/askcore/archive/2013/05/09/how-to-manage-out-of-box-drivers-with-the-use-of-model-specific-driver-groups-in-microsoft-deployment-toolkit-2012-update-1.aspx

Selection Profiles with Media Deployments


I’ve had several advisory calls and onsite engagements here recently where I’m surprised to learn that the engineers are not familiar with Media Deployments.

Media can be found under the Advanced Configuration section of MDT:

Media should be more aptly named "Offline Media," because that is exactly what it is. Media will allow you to deploy your same custom image in an environment without any network connectivity or internet access. When you create a media deployment, you use a selection profile to tell MDT what you want included in the media. Let's walk through this.

Here’s my scenario. I have several remote branch offices where I need to deploy Windows Server 2012 R2. These branch offices have a lousy internet connection. It would be painful to install any operating system over the network and since these branch offices only have a server or two, I really do not want to place a Linked Deployment Share locally. Media would be a good fit for this scenario.

1. First create a selection profile. Selection Profiles are important with Media deployments as we can limit what we include as part of our media and decrease the size of the Media deployment. I know at a bare minimum I need an operating system and a Task Sequence, but other than that, it is up to me on whether I select additional applications, drivers, packages, etc. I want my Media deployments to match my network deployments as much as possible. Therefore, I want the same custom operating system to be installed with the same applications and packages and I also want to make my drivers available as well.
My selection profile looks as follows:

I also have my Windows Server 2012 R2 Packages selection and my Windows Server 2012 R2 post-deploy applications selected, but this cannot be seen due to scrolling.

2. Right-click on Media and select New Media. This pops open the New Media Wizard. Supply a path where you want the media to be created (it cannot be the MDT share), and choose the Selection Profile you created in step #1.

3. Click Next a couple times and Finish to complete the wizard.

4. When it is done, you’ll have something that looks like the following under Media. It may look like it is done creating, but it is not. You can think of this as creating the necessary folder structure:

5. Right-click on your media and choose Update Media Content to generate the offline media. This takes some time.

6. After it completes, if you use File Explorer to browse the folder location, you should have something that looks like the following:

So what do we have here?

LiteTouchMedia.iso – Bootable ISO that can be burned to DVD (if not too large) or mounted in a VM.

Content folder – This folder contains everything you need to create a bootable UFD or USB hard disk. This is the more common scenario. It does require that you format the device as NTFS if your image size is over 2GB, due to FAT32 limitations. But once you have it formatted correctly, all you need to do is copy the contents of the Content folder directly to the USB hard drive. Then simply insert the USB at boot and, as long as the hardware is capable of booting from USB, you will see the familiar MDT LiteTouch wizard and can continue your deployment without any connection to the MDT deployment share:

As you can see, selection profiles are quite useful. Don't be afraid to use them to control as little or as much of the deployment process as you like.

For more MDT goodness and great deployment fun, don't forget Hilde's posts:

http://blogs.technet.com/b/askpfeplat/archive/2014/08/04/mdt-2013-part-i-mdt-configuration-capture-a-windows-server-2012-r2-base-os-image.aspx

http://blogs.technet.com/b/askpfeplat/archive/2014/08/11/mdt-2013-part-ii-create-a-deployment-task-sequence-and-deploy-a-custom-windows-server-2012-r2-base-os-image.aspx

Cheers,

Charity “Deploy Happy” Shelbourne

How To Provide Feedback On Windows Server


Hey y’all, Mark here. Today is a national holiday in the USA which means technically, for me, Dante from Clerks would say, “I’m not even supposed to be here today!” That being said we had a real quick thing to share out with you.

Have you ever thought to yourself, “I wish Windows Server did this, who can I tell about this fabulous amazing idea I have?!” Maybe even “Why does this do it this way, it would be way better if it did it this way!” Your day has finally come my friend. The Windows Server User Voice is now live. You can put your ideas and feedback in there and if they are actually as good as you think, other users can vote on them too.

image

As you can see we have several ideas in there getting votes. Just a reminder, this is not a place for technical help; that would be the TechNet forums. That's all there is to it. Jump in there and tell us what you think.

Mark “this job would be great if it wasn’t for the customers” Morowczynski

I kid, I kid! I love you guys. Well, most of you anyways.

How Shared VHDX Works on Server 2012 R2


Hi, Matthew Walker here, I’m a Premier Field Engineer here at Microsoft specializing in Hyper-V and Failover Clustering. In this blog I wanted to address creating clusters of VMs using Microsoft Hyper-V with a focus on Shared VHDX files.

Since the advent of Hyper-V we have supported creating clusters of VMs; however, the means of adding in shared storage has changed. In Windows Server 2008/R2 we only supported using iSCSI for shared volumes, with Windows Server 2012 we added the capability to use virtual Fibre Channel and SMB file shares (depending on the workload), and finally in Windows Server 2012 R2 we added shared VHDX files.

 
Shared Storage for Clustered VMs:

Windows Version        | 2008/R2 | 2012 | 2012 R2
iSCSI                  | Yes     | Yes  | Yes
Virtual Fibre Channel  | No      | Yes  | Yes
SMB File Share         | No      | Yes  | Yes
Shared VHDX            | No      | No   | Yes

So this provides a great deal of flexibility when creating clusters that require shared storage with VMs. Not all clustered applications or services require shared storage, so you should review the requirements of your app. Clusters that might require shared storage would be file server clusters, traditional clustered SQL instances, or Distributed Transaction Coordinator (MSDTC) instances. Now to decide which option to use. These solutions all work with live migration, but not with items like VM checkpoints, host-based backups or VM replication, so they are pretty even there. If there is an existing infrastructure with an iSCSI or FC SAN, then one of those two may make more sense, as it works well with the existing processes for allocating storage to servers. SMB file shares work well, but only for a few workloads, as the application has to support data residing on a UNC path. This brings us to Shared VHDX.

Available Options:

Hyper-V Capability     | Shared VHDX used | iSCSI Drives | Virtual FC drives | SMB shares used in VM | Non-Shared VHD/X used
Host based backups     | No               | No           | No                | No                    | Yes
Snapshots/Checkpoints  | No               | No           | No                | No                    | Yes
VM Replication         | No               | No           | No                | No                    | Yes
Live Migration         | Yes              | Yes          | Yes               | Yes                   | Yes

Shared VHDX files are attached to the VMs via a virtual SCSI controller, so they show up in the guest OS as shared SAS drives, and they can be shared with multiple VMs, so you aren't restricted to a two-node cluster. There are some prerequisites to using them, however.

Requirements for Shared VHDX:
2012 R2 Hyper-V hosts
Shared VHDX files must reside on Cluster Shared Volumes (CSV)
SMB 3.02

It may be possible to host a shared VHDX on a vendor NAS if that appliance supports SMB 3.02 as defined in Windows Server 2012 R2. Just because a NAS supports SMB 3.0 is not sufficient; check with the vendor to ensure they support the shared VHDX components and that you have the correct firmware revision to enable that capability. Information on the different versions of SMB and capabilities is documented in a blog by Jose Barreto that can be found here.
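
One quick way to see which SMB dialect a Hyper-V host actually negotiated with a file server is to check its active connections from an elevated PowerShell prompt (a sketch; run it while a connection to the share is open):

Get-SmbConnection | Select-Object ServerName, ShareName, Dialect, NumOpens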

Adding Shared VHDX files to a VM is relatively easy, through the settings of the VM you simply have to select the check box under advanced features for the VHDX as below.

image

For SCVMM you have to deploy it as a service template and select to share the VHDX across the tier for that service template.

image

And of course you can use PowerShell to create and share the VHDX between VMs.

PS C:\> New-VHD -Path C:\ClusterStorage\Volume1\Shared.VHDX -Fixed -SizeBytes 30GB

PS C:\> Add-VMHardDiskDrive -VMName Node1 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

PS C:\> Add-VMHardDiskDrive -VMName Node2 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

Pretty easy right?

At this point you can setup the disks as normal in the VM and add them to your cluster, and install whatever application is to be clustered in your VMs and if you need to you can add additional nodes to scale out your cluster.

Now that things are all set up, let’s look at the underlying architecture to see how we can get the best performance out of it. Before we get into the shared VHDX scenarios, we need to take a brief detour into how CSV works in general. If you want a more detailed explanation, please refer to Vladimir Petter’s excellent blogs, starting with this one.

 

image

This is a simplified diagram of the way we handle data flow for CSV. The main point is that access to the shared storage in this clustered environment goes through the Cluster Shared Volume File System (CSVFS) filter driver and its supporting components; this system controls how we access the underlying storage. Because CSV is a clustered file system, we need this orchestration of file access. When possible, I/O travels a direct path to the storage, but if that is not possible, it is redirected over the network to a coordinator node. The coordinator node shows up in Failover Cluster Manager as the owner of the CSV.

With shared VHDX we also have to orchestrate shared file access. To achieve this, all I/O requests are centralized and funneled through the coordinator node for that CSV, which means I/O from VMs on hosts other than the coordinator node is redirected to the coordinator. This is different from a traditional VHD or VHDX file that is not shared.

First, let’s look at this from the perspective of a Hyper-V compute cluster using a Scale-Out File Server as our storage. For the following examples I have simplified things by bringing it down to two nodes and adding a nice big red line to show the data path from the VM that currently owns our clustered workload. I am making some assumptions: the clustered workload is in an Active/Passive configuration with a single shared VHDX file, and we are only concerned with the data flow to that single file from one node or the other. For simplicity I have called the VMs Active and Passive just to indicate which one owns the shared VHDX in the guest cluster and is transferring I/O to the storage where the shared VHDX resides.

 

image

Here Node 1 in our Hyper-V cluster accesses the shared VHDX over SMB and connects to the coordinator node of the Scale-Out File Server (SOFS) cluster. Now let’s move the active workload.

 

image

Even when we move the active workload, SMB and the CSVFS drivers still connect to the coordinator node in the SOFS cluster, so in this configuration our performance is going to be consistent. Ideally you should have high-speed connections between your SOFS nodes and on the network paths the Hyper-V compute nodes use to access the shares: 10 Gb NICs or even RDMA NICs such as InfiniBand, iWARP and RDMA over Converged Ethernet (RoCE).

Now let’s change things up a bit and move the compute onto the same servers that are hosting the storage.

image

As you can see, access to the VHDX goes through the CSVFS and SMB drivers down to the storage, and everything works as we expect as long as the active VM of the guest cluster is on the same node as the coordinator node of the underlying CSV. Now let’s look at how the data flows when the active VM is on a different node.

image

Here things take a different path than we might expect. Since SMB and CSVFS are integral to orchestrating access to the shared VHDX, the data is sent across the interconnects between the cluster nodes rather than straight down to storage. This can have a significant impact on your performance, depending on how you have scaled those connections.

If direct access to storage is over a 4 Gb Fibre Channel connection and the interconnect between nodes is a 1 Gb connection, there is going to be a serious difference in performance when the active workload is not on the node that owns the CSV. This is exacerbated when we have 8 Gb or 10 Gb of bandwidth to storage and the interconnects between nodes are only 1 Gb. To help mitigate this, scale up your cluster interconnects to match, using options such as 10 Gb NICs, SMB Multichannel and/or RDMA-capable devices to improve bandwidth between the nodes. A quick way to check which node currently owns the CSV is shown below.
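This sketch shows how to see which node owns the CSV (and therefore acts as the coordinator for the shared VHDX), and how to move that role if needed. The CSV name and node name are placeholders.

# Run on one of the Hyper-V cluster nodes
Get-ClusterSharedVolume | Select-Object Name, OwnerNode
# If needed, move the coordinator role to the node running the active VM
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node HV-Node1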

One final set of examples to address concerns about scenarios where you may have an application active on multiple clustered VMs that are accessing the same Shared VHDX file. First let’s go back to the separate compute and storage nodes.

image

And now to show how it goes with everything all together in the same servers.

image

So we can even implement a scale out file server or other multi-access scenarios using clustered VMs.

The big takeaway here is understanding the architecture, so you know when to expect certain types of performance and can set proper expectations based on where and how we access the final storage repository for the shared VHDX. By moving some of the responsibility for handling access to the VHDX to SMB and CSVFS, we get a more flexible architecture and more options, but without proper planning and an understanding of how it works, there can be significant differences in performance depending on how the compute side and the storage side are separated. For the best performance, ensure you have high-speed, high-bandwidth interconnects from the running VM all the way to the final storage by using 10 Gb or RDMA NICs, and try to take advantage of SMB Multichannel.

--- Matthew Walker


Working with the VMAgent on Microsoft Azure IaaS VMs


Ahhh, welcome to summer. The cool splashing blue water of the municipal pool. Blue-raz slushies from Tropical Snow. Bright, clear blue skies. Sometimes all that blue makes me think of Azure - the colors of the portal, the logo and the word itself – Azure – is a type of blue.

Gary Green and Mike Hildebrand will be your tour guides today, speaking with you about some very handy Azure IaaS VM management tools and a big "bonus nugget" at the end that almost makes this post a two'fer (no skipping ahead).

As an IT Pro, you're likely getting more and more comfortable with the cloud paradigm and your brain is sparking with ideas of how your current IT architecture could benefit from integration with cloud services. That's great!

You've signed up for an Azure trial and you quickly blow through the UI wizard to create a VM. Now, you eagerly await the provisioning process to finish. Tick. Tock. This is worse than waiting for the toaster to pop.

You pound a can of Mountain Dew KickStart and a few minutes later, you see in the portal that your shiny new VM is up and running. Ohhhh man!! You're like a kid on Christmas. A moth to the flame. A dog to a bone. Hilde to a theater-sized box of Mike and Ikes. You get the idea.

You click the "connect" button at the bottom of the portal and up pops an RDP connection dialog box for your VM.

Then …

Your mind races but it's coming up empty …

"What credentials did I set up on the VM to login?"

Your mind is blank. You try some of your usual suspects for test credentials but none work.

You think out loud, "Seriously?? I don't remember what username and password I specified when I set this thing up two minutes ago?!"

Perhaps you fat-fingered the password the same way twice when you entered it? It certainly can happen. I know sometimes my brain thinks "the" but my fingers type "then" or "they." It's as if my fingers have a mind of their own. Muscle memory is weird.

Maybe you had to choose a VM name that wasn't your 1st, 2nd or 12th choice due to uniqueness requirements of cloud service names? Maybe it's just me; maybe it's early-onset dementia; maybe it's the Azure version of the Jedi Mind-trick.

Maybe you are coming back to a VM that you provisioned a while back and now, your brain draws a blank for the user ID and/or password.

Or, maybe you aren't as absent minded as I am? Maybe you have a solid naming convention for Azure resources that enables you to precisely keep track of and know what's-what (that's a suggestion, by the way… Smile ).

Well, the folks working on Azure aren't a lazy bunch, and they've created some out-of-band tools to help you manage your VMs. Collectively, these are enabled by an agent you can install on the VM called, creatively enough, the "VM Agent." For you virtualization techs out there, this won't be a foreign idea – it is along the lines of the Hyper-V "Integration Services" and the "VMware Tools." Those drivers and other software light up features and functions for virtualized systems. The VM Agent does the same type of thing for VMs running in Azure.

The focus of this post is IaaS Windows VMs but there is an agent for LINUX VMs, too, and the Agent is always installed on PaaS VMs (where it is known as the "GuestAgent").

If you use the 'quick-create' option, the agent is automatically part of the VM deployment but if you use the gallery to build your VM, you can check the box to install the agent (default) or clear the check to not install it.

 

NOTE: At the moment, it is not supported to uninstall the VMAgent from a VM after it's been installed. If you don't want the Agent on your VM, don't install it when provisioning the VM.

NOTE: If you don't want the VMAgent on your VM and you're provisioning VMs via Azure PowerShell, use the –DisableGuestAgent switch with either the New-AzureQuickVM or New-AzureVM/Add-AzureProvisioningConfig cmdlets, as in the sketch below.
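For example, a hedged sketch of skipping the agent at provisioning time with the classic (Service Management) Azure PowerShell cmdlets might look like this; the service, VM, image, credential and location values are all placeholders:

# Provision a Windows VM without the VM Agent (classic/Service Management cmdlets)
$imageName = "<gallery image name>"
New-AzureQuickVM -Windows -ServiceName "CloudService01" -Name "VM-01" `
    -ImageName $imageName -AdminUsername "azureadmin" -Password "S0meH@rdPassW0rd" `
    -Location "Central US" -DisableGuestAgent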

 

Once installed, there are a few indications the VMAgent is there:

  • The presence of BGInfo details on the desktop – this is one default 'extension' of the VMAgent.

    NOTE: Currently, BGInfo options aren't configurable – you get what you get

     

  • A specific file/folder structure on the OS drive.

     

  • A couple of "Windows Azure" Services

     

If you didn't choose to have the VMAgent installed when you provisioned the VM, you can install it after the fact via an MSI and some Azure PowerShell.

Here's the process to get the VMAgent on an existing VM:

  • RDP into the VM, download the VMAgent MSI and run it within the VM. This installs the client software within the OS of the VM.
  • If you haven't done so in the past, download/update Azure PowerShell from here and install it (can be done on the target VM itself or any Internet-connected workstation/system – you need connectivity to Azure)
  • Open an elevated Azure PowerShell console and connect it to the Azure subscription where the target VM lives by typing the following commands:
    • Get-AzurePublishSettingsFile (then press enter)
      • Provide credentials for the Azure subscription
      • Read, understand and consider the following WARNING

      • Download/save the settings file via the pop-up prompt
    • Import-AzurePublishSettingsFile "C:\users\user03\Downloads\Windows Azure-credentials.publishsettings" (using the name/path to your settings file; press enter)

       

      !! SECOND WARNING !! This is one of two ways to connect Azure PowerShell to an Azure subscription. The subscription settings file you download contains your Azure credentials and should be well-protected and/or deleted when you're done connecting. Be sure to read more about connection methods and note the bright pink warning here: https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/

     

  • Now, we use Azure PowerShell to hook into the specific cloud service where the VM resides and update a property of the VM object. This informs the Azure 'Fabric' that this VM now has the VMAgent running/available.
    From your connected Azure PowerShell console, type the following:
    • Get-AzureVM | Select ServiceName,Name (then press enter; this will return the values for your VMs which you can use for the next commands)
    • $vm = Get-AzureVM –ServiceName CloudService01 –Name VM-01 (then press enter)
    • $vm.VM.ProvisionGuestAgent = $true (then press enter)
    • Update-AzureVM –Name VM-01 –VM $vm.VM –ServiceName CloudService01 (then press enter)

You can now use Azure PowerShell to interact with the VM 'out-of-band', via the VMAgent, and perform tasks such as resetting the local admin username and password, re-enabling remote access defaults and turning on/off BGInfo.

    NOTE: Most of these actions require a sign-out/sign-in or possibly a reboot of the VM to take effect.

Rename local admin account and reset password

  • Get-AzureVM -ServiceName CloudService01 –Name VM-01 | Set-AzureVMAccessExtension –UserName NewLocalAdminUserName –Password S0meH@rdPassW0rd | Update-AzureVM

    NOTE: If this is performed against a Domain Controller, it will affect the built-in Administrator account of the domain, including enabling it (if it is disabled) and renaming the SAMAccount attribute of that ID. In my testing, doing this against a DC also resulted in the DC being gracefully rebooted.

     

Reset Remote Access to IaaS defaults (enable RDP and open a firewall port)

  • Get-AzureVM –ServiceName CloudService01 –Name VM-01 | Set-AzureVMAccessExtension | Update-AzureVM
    • NOTE: This enables RDP and opens a firewall port on the Windows Firewall. It does NOT recreate the "End Point" port mapping. If you deleted that, you'll need to manually recreate it.

 

Disable BGInfo

  • Get-AzureVM -ServiceName CloudService01 -Name VM-01 | Set-AzureVMBGInfoExtension -Disable | Update-AzureVM

     

Enable BGInfo

  • Get-AzureVM -ServiceName CloudService01 -Name VM-01 | Set-AzureVMBGInfoExtension | Update-AzureVM

     

 

Logging

Activities are logged to specific log files on the VM itself:

In this specific log clip, I used Azure PowerShell to rename the local admin account to mhilde and reset the password:

 

Additionally, there are Azure-specific Event Logs but they don't track specific changes at the moment; these are more about health of the Agent.

 

Obviously, the OS itself will capture certain changes, too (in this case, as above, an account rename from user-002 to mhilde):

Lastly, there is logging within the Azure Portal:

  • Select "Management Services" from the list of all items (on the left of the portal) then click "Operation Logs"
  • A change can be correlated between the OPERATION ID within the Portal and the OperationId in the Azure PowerShell window:

     

And now for the bonus nugget/teaser …

In times past, the only choice you had was to use Azure PowerShell to work with most of the VMAgent functions.

However, you can now do these things via the GUI in the Azure preview portal!


    NOTE: The Azure Preview Portal is NOT currently supported. For your production workloads, obviously we recommend you continue to use the "vCurrent" portal (https://manage.azure.com) and/or the PowerShell commands above to avoid any issues where support isn't available. If you like living your Azure-life on the edge, though, or you're interested in a sneak-peek at the near-future, the preview portal is a really good time.

Disable BGInfo

  • Open the details tile/panel (or "blade" as they're called) for your VM and click "All settings" from the VM details
  • Click "Extensions" from the Settings blade
  • Click to select/highlight the BGInfo Extension
  • Click "Delete"
  • Click "Yes" to accept the verification prompt

 

  • Check the Notifications in the portal

     

    BGInfo is no longer active…

 

Enable BGInfo

  • From the "Settings" Blade of the target VM, click "Extensions"
  • Click "Add"
  • Select "BgInfo"
  • Click "Create"
  • Click "OK" on the blade that flies out on the far left – that blade didn't fit in the screen shot

     

Reset local admin account and password

  • From the "Settings" blade for the VM, click "Password reset"
  • Enter a new local admin account username and password
    • Password requirements are:
      • At least 8 characters long and
      • 3 of the following:
        • Upper
        • Lower
        • Number
        • Special character
    • If you know the current local admin account username and you don't want to change it, simply re-enter that username here.
  • Click "Reset password"

    NOTE: If this is performed against a Domain Controller, it will affect the built-in Administrator account of the domain, including enabling it (if it is disabled) and renaming the SAMAccount attribute of that ID. In my testing, doing this against a DC also resulted in the DC being gracefully rebooted.

 

Reset Remote Access to IaaS defaults (enable RDP and open a firewall port)

  • From the main blade for the target VM, click "Reset Remote Access"
  • Click "Yes" to the confirmation prompt
    • NOTE: This enables RDP and opens a firewall port on the Windows Firewall. It does NOT recreate the "End Point" port mapping. If you deleted that, you'll need to manually recreate it.

     

 

Wrap-up

Well, that about does it for today. The Azure VMAgent, along with some Azure PowerShell, can be quite helpful for managing your IaaS VMs if/when you've painted yourself into a corner.

You can perform some of those admin tasks via the preview portal but that UI isn't supported yet.

The addition of these capabilities to the preview portal was a nice surprise to me but, equally as exciting (or more), were the other style changes and feature-adds to the preview portal. If you haven't been out there recently, I URGE you to go have a look (https://portal.azure.com).

You'll find a very polished and eye-friendly web UI with a high degree of operational maturity and functionality to manage your Azure VMs. I, for one, am looking forward to GA for the preview portal (no, I don't know when that will be).

Cheers from Gary and Hilde!

ADFS Deep-Dive: Troubleshooting


In case you haven’t seen this series, I’ve been writing an ADFS Deep-Dive series for the past 10 months. Here are links to the previous articles:

Before you start troubleshooting, ask the users that are having issues the following questions and take note of their answers as they will help guide you through some additional things to check:

  • Where are you when trying to access this application? At home? Office?
  • Are you connected to VPN or DirectAccess?
  • Can you log into the application while physically present within a corporate office?
  • What browser are you using?
  • How are you trying to authenticate to the application? Username/password, smartcard, PhoneFactor?

If you’re not the ADFS Admin but still troubleshooting an issue, ask the ADFS administrators the following questions:

  • Is the problematic application SAML or WS-Fed?
  • Who is responsible for the application? Someone in your company or vendor?
  • Is the issue happening for everyone or just a subset of users?
  • Can you get access to the ADFS servers’ and Proxy/WAP servers’ event logs?

The best advice I can give you for troubleshooting SSO transactions with ADFS is to first pinpoint where the error is being thrown or where the transaction is breaking down. Is the transaction erroring out on the application side or the ADFS side? That will cut down the number of configuration items you’ll have to review. The user won’t always be able to answer this question because they may not be able to interpret the URL and understand what it means. Grab a copy of Fiddler, the HTTP debugger, which will quickly tell you where it’s breaking down:

http://www.telerik.com/download/fiddler

Make sure to enable SSL decryption within Fiddler by going to Fiddler options:

image

Then check “Decrypt HTTPS traffic”. I also check “Ignore server certificate errors”.

image

Warning: Fiddler will break a client trying to perform Windows integrated authentication via the internal ADFS servers so the only way to use Fiddler and test is under the following scenarios:

  1. The user that you’re testing with is going through the ADFS Proxy/WAP because they’re physically located outside the corporate network.
  2. You have hardcoded a user to use the ADFS Proxy/WAP for testing purposes.
  3. The application is configured to have ADFS use an alternative authentication mechanism.
  4. ADFS is hardcoded to use an alternative authentication mechanism than integrated authentication.
  5. You have disabled Extended Protection on the ADFS servers, which allows Fiddler to continue to work during integrated authentication. This is not recommended.

The classic symptom if Fiddler is causing an issue is the user will continuously be prompted for credentials by ADFS and they won’t be able to get past it.

If you recall from my very first ADFS blog in August 2014, SSO transactions are a series of redirects or HTTP POST’s, so a fiddler trace will typically let you know where the transaction is breaking down.

http://blogs.technet.com/b/askpfeplat/archive/2014/08/25/adfs-deep-dive.aspx

image

Frame 1: I navigate to https://claimsweb.cloudready.ms. It performs a 302 redirect of my client to my ADFS server to authenticate.

Frame 2: My client connects to my ADFS server https://sts.cloudready.ms. My client submits a Kerberos ticket to the ADFS server or uses forms-based authentication to the ADFS WAP/Proxy server.

Frame 3: Once I’m authenticated, the ADFS server sends me back some HTML with a SAML token and some JavaScript that tells my client to HTTP POST it over to the original claims-based application – https://claimsweb.cloudready.ms.

Frame 4: My client sends that token back to the original application: https://claimsweb.cloudready.ms. Claimsweb checks the signature on the token, reads the claims, and then loads the application.

 

Simplified

Just remember that the typical SSO transaction should look like the following:

  1. Initial request to application.
  2. Redirect to ADFS for authentication
  3. User sent back to application with SAML token.

Identify where the transaction broke down – On the application side on step 1? When redirected over to ADFS on step 2? Or when being sent back to the application with a token during step 3? Also, to make things easier, all the troubleshooting we do throughout this blog will fall into one of these three categories.

Everything we go through now will look familiar because in my last blog I outlined everything required by both parties (ADFS and the application owner) to make SSO happen, but not everything in that checklist will cause things to break down. Consequently, I pared that list down to only the items that will break SSO and then reorganized them into the above troubleshooting categories; we’re now going to step through each:

http://blogs.technet.com/b/askpfeplat/archive/2015/03/02/adfs-deep-dive-onboarding-applications.aspx

 

1.) The SSO Transaction is Breaking during the Initial Request to Application

If the transaction is breaking down when the user is just navigating to the application, check the following:

 

Is RP Initiated Sign-on Supported by the Application?

If the transaction is breaking down when the user first goes to the application, you obviously should ask the vendor or application owner whether there is an issue with the application. But if you find out that this request is only failing for certain users, the first question you should ask yourself is “Does the application support RP-Initiated Sign-on?”

I know what you’re thinking, “Why the heck would that be my first question when troubleshooting?” Well, sometimes the easiest answers are the ones right in front of us but we overlook them because we’re super-smart IT guys. You know as much as I do that sometimes user behavior is the problem and not the application. Smile

If the application doesn’t support RP-initiated sign-on, the user won’t be able to navigate directly to the application to gain access; they will need special URLs. Ask the user how they got to the application: through a portal the company created that (hopefully) contains these special URLs, or through a shortcut or favorite in their browser that navigates them directly to the application? Doh! If they answer with one of the latter two, you’ll need to have them access the application the correct way – using the intranet portal that contains the special URLs. We often overlook these easy ones. Smile

There can obviously be other issues here that I won’t cover like DNS resolution, firewall issues, etc.

 

2.) The SSO Transaction is Breaking when Redirecting to ADFS for Authentication

If the transaction is breaking down when the user is redirected to ADFS for authentication, then check the following items:

 

Is the ADFS Logon URL correctly configured within the application?

Applications differ, especially in how you configure them. Some you can configure for SSO yourself, and sometimes the vendor has to configure them. Consequently, I can’t tell you exactly how to make changes to the application, but I can at least guide you on what might be wrong. If the application is redirecting the user to the wrong URL, that user will never authenticate against ADFS and they’ll receive an HTTP 404 error – Page not found. This should be easy to diagnose in Fiddler; just look at what URL the user is being redirected to and confirm it matches your ADFS URL.

Also make sure that your ADFS infrastructure is online both internally and externally. Test from both internal and external clients by trying to reach https://<sts.domain.com>/federationmetadata/2007-06/federationmetadata.xml.

Key Takeaway: Regardless of whether the application is SAML or WS-Fed, the ADFS Logon URL should be https://<sts.domain.com>/adfs/ls with the correct WS-FED or SAML request appended to the end of the URL.

 

Is a SAML request signing certificate being used and is it present in ADFS? (Optional)

How do you know whether a SAML request signing certificate is actually being used? Look at the SAML request URL: if you see a Signature parameter along with the request, then a signing certificate was used:

image

https://sts.cloudready.ms/adfs/ls/?SAMLRequest=jZFRT4MwFIX%2FCun7KC3OjWaQ4PbgkqlkoA%2B%2BmAKdNCkt9hZ1%2F14GmkwfFl%2Fv%0APfc7p6cr4K3qWNq7Ru%2FFWy%2FAeZ%2Bt0sDGRYx6q5nhIIFp3gpgrmJ5erdj1A9Y%0AZ40zlVHISwGEddLotdHQt8Lmwr7LSjzudzFqnOuAYQyNLP1Kmb62gtdHvwWc%0AD6PSKOEaH8DgE5ni7CEvkLcZokjNT9AzhIM%2FBF4fACvAyNtuYvRSRSIiZXlN%0AwrlY0CriSxKGhNLDFeXhYjkfZAC92GpwXLsY0YCEM0JnQVQESxaEjCyekZd9%0AP%2BxG6lrq18stlJMI2G1RZLMp%2FJOwMAYfBChZnbpko7E9a%2Fcylv9UipJ%2FFbjC%0AZy6TZcfuB%2Bx2kxklq6OXKmU%2B1sOpEzEiCCfTye%2FfT74A%0A&RelayState=cookie%3A29002348&SigAlg=http%3A%2F%2Fwww.w3.org%2F2000%2F09%2Fxmldsig%23rsa-sha1&Signature=M0xoWQfcN3Yp94T2HiqIdJzEkxYqGc6hhopqi8xOI%2B2BtPSLufFDdQIF7z6Xjm6XdLq1MH9Av5xz2QWYs84ZYhlG3fHtZCjjaoI2wZqplRszHla%2BjtZoW20NGDepDsCRT0AKNkhe%2B4Yj3LshrM6EX5O3obx2Mypy8EcsoURkTF3kf1dwKqsGA3ka7ehbRmUQGJUXD0u4iFBog7YgkL4Q9FYMTanZeRo2X4%2FkAeNxT8ormKWJfYnAzg0F4Ku60zDd5N7jYu4XeyOsXDthEFI5H4WYucAprREl2hgSUI21J782kKzrslalIaJ5BKPIO50NPCIb5Sf6Zw4maLpZrFEfrw%3D%3

Now check to see whether ADFS is configured to require SAML request signing:

Get-ADFSRelyingPartyTrust –name “shib.cloudready.ms”

image

 

By default, relying parties in ADFS don’t require that SAML requests be signed. Although it may not be required, let’s see whether we have a request signing certificate configured:

image

Even though the trust isn’t configured to require a signed request, this would still be a problem: the application is signing the request, but I don’t have a signing certificate configured on this relying party trust. If the application is signing the request and you don’t have the necessary certificate to verify the signature, ADFS will throw an Event ID 364 stating that no signature verification certificate was found:

image

Key Takeaway: Make sure request signing is in order. It isn’t required on the ADFS side, but if the application signs its requests, make sure you have the correct certificate on the relying party trust’s signature tab to verify the signature. You would need to obtain the public portion of the application’s signing certificate from the application owner.
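If you do need to add the application’s request signing certificate to the trust, a rough PowerShell sketch might look like the following; the .cer path and trust name are placeholders.

# Attach the application's public request-signing certificate to the relying party trust
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("C:\temp\app-request-signing.cer")
Set-AdfsRelyingPartyTrust -TargetName "shib.cloudready.ms" -RequestSigningCertificate $cert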

 

Is the Request Signing Certificate passing Revocation?

Also, ADFS may check the validity and the certificate chain for this request signing certificate. This configuration is separate on each relying party trust. To check, run:

Get-adfsrelyingpartytrust –name <RP Name>

image

You can see here that ADFS will check the chain on the request signing certificate. If you would like to confirm this is the issue, test this setting by doing either of the following:

1.) Temporarily Disable Revocation Checking entirely and then test:

Set-adfsrelyingpartytrust –targetidentifier “https://shib.cloudready.ms” –signingcertificaterevocationcheck “None”

2.) Or export the request signing certificate and run certutil to check the validity and chain of the cert:

certutil –urlfetch –verify c:\requestsigningcert.cer

I even had a customer where only the ADFS servers in the DMZ couldn’t verify a certificate chain, but he could verify the certificate from his own workstation. You can imagine what the problem was – the DMZ ADFS servers didn’t have the right network access to verify the chain.

 

Is the application sending the right identifier?

If the application does support RP-initiated sign-on, the application will have to send ADFS an identifier so ADFS knows which application to invoke for the request. The methods for troubleshooting this identifier are different depending on whether the application is SAML or WS-FED. We need to ensure that ADFS has the same identifier configured for the application.

SAML

From fiddler, grab the URL for the SAML transaction; it should look like the following:

image

https://sts.cloudready.ms/adfs/ls/?SAMLRequest=jZFRT4MwFIX%2FCun7KC3OjWaQ4PbgkqlkoA%2B%2BmAKdNCkt9hZ1%2F14GmkwfFl%2Fv%0APfc7p6cr4K3qWNq7Ru%2FFWy%2FAeZ%2Bt0sDGRYx6q5nhIIFp3gpgrmJ5erdj1A9Y%0AZ40zlVHISwGEddLotdHQt8Lmwr7LSjzudzFqnOuAYQyNLP1Kmb62gtdHvwWc%0AD6PSKOEaH8DgE5ni7CEvkLcZokjNT9AzhIM%2FBF4fACvAyNtuYvRSRSIiZXlN%0AwrlY0CriSxKGhNLDFeXhYjkfZAC92GpwXLsY0YCEM0JnQVQESxaEjCyekZd9%0AP%2BxG6lrq18stlJMI2G1RZLMp%2FJOwMAYfBChZnbpko7E9a%2Fcylv9UipJ%2FFbjC%0AZy6TZcfuB%2Bx2kxklq6OXKmU%2B1sOpEzEiCCfTye%2FfT74A%0A&RelayState=cookie%3A29002348&SigAlg=http%3A%2F%2Fwww.w3.org%2F2000%2F09%2Fxmldsig%23rsa-sha1&Signature=M0xoWQfcN3Yp94T2HiqIdJzEkxYqGc6hhopqi8xOI%2B2BtPSLufFDdQIF7z6Xjm6XdLq1MH9Av5xz2QWYs84ZYhlG3fHtZCjjaoI2wZqplRszHla%2BjtZoW20NGDepDsCRT0AKNkhe%2B4Yj3LshrM6EX5O3obx2Mypy8EcsoURkTF3kf1dwKqsGA3ka7ehbRmUQGJUXD0u4iFBog7YgkL4Q9FYMTanZeRo2X4%2FkAeNxT8ormKWJfYnAzg0F4Ku60zDd5N7jYu4XeyOsXDthEFI5H4WYucAprREl2hgSUI21J782kKzrslalIaJ5BKPIO50NPCIb5Sf6Zw4maLpZrFEfrw%3D%3

See that SAMLRequest value that I highlighted above? It’s a base64-encoded value, but SSOCircle.com or sometimes the Fiddler TextWizard will decode it:

https://idp.ssocircle.com/sso/toolbox/samlDecode.jsp

image

Select Redirect and then click decode:

image

If it doesn’t decode properly, the request may be encrypted. If it does decode properly and we click on XML View, it should look like this:

image

Here you can see that my relying party trust has the same identifier as the value in the SAML request, so the identifier the application is sending us here is fine:

image
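As a side note, if you’d rather decode the SAMLRequest value without leaving PowerShell, here’s a rough sketch. The redirect binding compresses the request with DEFLATE before base64-encoding it, so we have to inflate it; paste the URL-decoded SAMLRequest value into $encoded.

$encoded = "<URL-decoded SAMLRequest value>"
$bytes   = [Convert]::FromBase64String($encoded)
$stream  = New-Object System.IO.MemoryStream(,$bytes)
$deflate = New-Object System.IO.Compression.DeflateStream($stream, [IO.Compression.CompressionMode]::Decompress)
(New-Object System.IO.StreamReader($deflate)).ReadToEnd()   # prints the decoded XML request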

WS-FED

If the identifier is wrong, I’m not even given a chance to provide my credentials and am immediately given an error screen:

image

I captured the following URL from Fiddler during a WS-FED transaction with the wrong identifier being passed from the application:

image

https://sts.cloudready.ms/adfs/ls/?wa=wsignin1.0&wtrealm=https%3a%2f%2fclaims.cloudready.ms&wctx=rm%3d0

If you URL decode this highlighted value, you get https://claims.cloudready.ms. And you can see that ADFS has a different identifier configured:

image

Another clue would be an Event ID 364 in the ADFS event logs on the ADFS server that was used stating that the relying party trust is unspecified or unsupported:

image

Key Takeaway: The identifier for the application must match on both the application configuration side and the ADFS side. Look for event ID’s that may indicate the issue. If you don’t have access to the Event Logs, use Fiddler and depending on whether the application is SAML or WS-Fed, determine the identifier that the application is sending ADFS and ensure it matches the configuration on the relying party trust.

 

Is the correct Secure Hash Algorithm configured on the Relying Party Trust?

This one typically only applies to SAML transactions and not WS-FED. In the SAML request below, there is a sigalg parameter that specifies what algorithm the request supports:

image

https://sts.cloudready.ms/adfs/ls/?SAMLRequest=jZFRT4MwFIX%2FCun7KC3OjWaQ4PbgkqlkoA%2B%2BmAKdNCkt9hZ1%2F14GmkwfFl%2Fv%0APfc7p6cr4K3qWNq7Ru%2FFWy%2FAeZ%2Bt0sDGRYx6q5nhIIFp3gpgrmJ5erdj1A9Y%0AZ40zlVHISwGEddLotdHQt8Lmwr7LSjzudzFqnOuAYQyNLP1Kmb62gtdHvwWc%0AD6PSKOEaH8DgE5ni7CEvkLcZokjNT9AzhIM%2FBF4fACvAyNtuYvRSRSIiZXlN%0AwrlY0CriSxKGhNLDFeXhYjkfZAC92GpwXLsY0YCEM0JnQVQESxaEjCyekZd9%0AP%2BxG6lrq18stlJMI2G1RZLMp%2FJOwMAYfBChZnbpko7E9a%2Fcylv9UipJ%2FFbjC%0AZy6TZcfuB%2Bx2kxklq6OXKmU%2B1sOpEzEiCCfTye%2FfT74A%0A&RelayState=cookie%3A29002348&SigAlg=http%3A%2F%2Fwww.w3.org%2F2000%2F09%2Fxmldsig%23rsa-sha1&Signature=M0xoWQfcN3Yp94T2HiqIdJzEkxYqGc6hhopqi8xOI%2B2BtPSLufFDdQIF7z6Xjm6XdLq1MH9Av5xz2QWYs84ZYhlG3fHtZCjjaoI2wZqplRszHla%2BjtZoW20NGDepDsCRT0AKNkhe%2B4Yj3LshrM6EX5O3obx2Mypy8EcsoURkTF3kf1dwKqsGA3ka7ehbRmUQGJUXD0u4iFBog7YgkL4Q9FYMTanZeRo2X4%2FkAeNxT8ormKWJfYnAzg0F4Ku60zDd5N7jYu4XeyOsXDthEFI5H4WYucAprREl2hgSUI21J782kKzrslalIaJ5BKPIO50NPCIb5Sf6Zw4maLpZrFEfrw%3D%3

If we URL decode the above value, we get:

SigAlg=http://www.w3.org/2000/09/xmldsig#rsa-sha1

In this instance, make sure this SAML relying party trust is configured for SHA-1 as well:

image
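You can also check (and, if needed, correct) this from PowerShell; a minimal sketch, with the trust name as a placeholder:

Get-AdfsRelyingPartyTrust -Name "shib.cloudready.ms" | Select-Object Name, SignatureAlgorithm
# Switch the trust to SHA-1 if the application signs its requests with rsa-sha1
Set-AdfsRelyingPartyTrust -TargetName "shib.cloudready.ms" -SignatureAlgorithm "http://www.w3.org/2000/09/xmldsig#rsa-sha1"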

 

Is the Application sending a problematic AuthnContextClassRef?

In this case, the user can successfully log in to the application through the internal ADFS servers but not the WAP/Proxy, or vice-versa. Ultimately, the application can pass certain values in the SAML request that tell ADFS what authentication to enforce. ADFS and the WAP/Proxy servers must support that authentication protocol for the logon to be successful. The following values can be passed by the application:

https://msdn.microsoft.com/en-us/library/hh599318.aspx

image

image

I copy the SAMLRequest value and paste it into SSOCircle decoder:

image

Then click XML View:

image

The highlighted value above would ensure that users could only log in to the application through the internal ADFS servers, since the external-facing WAP/Proxy servers don’t support integrated Windows authentication. You would also see an Event ID 364 stating that the ADFS and/or WAP/Proxy server doesn’t support this authentication mechanism:

image

 

Is there a problem with an individual ADFS Proxy/WAP server?

This one only applies if the user responded to your initial questions that they are coming from outside the corporate network and you haven’t yet resolved the issue based on any of the above steps. There are known scenarios where an ADFS Proxy/WAP will just stop working with the backend ADFS servers. Look at the following on all ADFS Proxy/WAP servers:

  1. ADFS event logs for errors or warnings,
  2. Make sure the ADFS service is running.
  3. Make sure the Proxy/WAP server can resolve the backend ADFS server or VIP of a load balancer.
  4. Obviously make sure the necessary TCP 443 ports are open.  :)
  5. Are you using a gMSA with Windows 2012 R2? There is a known issue where ADFS will stop working shortly after a gMSA password change. The following update will resolve this: http://support.microsoft.com/en-us/kb/3032590
  6. There are some known issues where the WAP servers have proxy trust issues with the backend ADFS servers:

http://blogs.technet.com/b/applicationproxyblog/archive/2014/05/28/understanding-and-fixing-proxy-trust-ctl-issues-with-ad-fs-2012-r2-and-web-application-proxy.aspx#pi148362=2

 

3.) The SSO Transaction is Breaking when the User is Sent Back to Application with SAML token

Many of the issues on the application side can be hard to troubleshoot since you may not own the application, and the level of support you can get from the application vendor can vary greatly. With the multitude of cloud applications out there, I won’t be able to demonstrate troubleshooting any of them in particular, but we’ll cover the most prevalent issues.

 

Is the URL/endpoint that the token should be submitted back to correct?

If the user is getting an error when trying to POST the token back to the application, the issue could be any of the following:

  • The endpoint on the relying party trust in ADFS could be wrong.
  • The endpoint on the relying party trust should be configured for POST binding

If you suspect either of these, review the endpoint tab on the relying party trust and confirm the endpoint and the correct Binding (POST or GET) are selected:

image

  • The client may be having an issue with DNS
  • The application endpoint that accepts tokens just may be offline or having issues. Contact the owner of the application.

 

Is the Token Encryption Certificate configuration correct? (Optional)

This one is hard to troubleshoot because the application enforces whether token encryption is required, and depending on the application, it may not provide any feedback about what the issue is. If you suspect that you have token encryption configured, but the application doesn’t require it and this is causing an issue, there are only two things you can do to troubleshoot:

 

  1. Ask the owner of the application whether they require token encryption and if so, confirm the public token encryption certificate with them. Don’t compare names, compare thumbprints.
  2. It’s very possible they don’t require token encryption but still sent you a token encryption certificate. Remove the token encryption certificate from the configuration on your relying party trust and see whether that resolves the issue. You may find that you can’t remove the encryption certificate because the Remove button is grayed out; the way to get around this is to first uncheck “Monitor relying party”:

image

To ensure you have a backup of the certificate, export the token encryption certificate first by View>Details>Copy to File. Then you can remove the token encryption certificate:

image

Now test the SSO transaction again to see whether an unencrypted token works.

 

Is the Token Encryption Certificate passing revocation?

Also, ADFS may check the validity and the certificate chain for this token encryption certificate. This configuration is separate on each relying party trust. To check, run:

Get-adfsrelyingpartytrust –name <RP Name>

image

You can see here that ADFS will check the chain on the token encryption certificate. If you would like to confirm this is the issue, test this setting by doing either of the following:

1.) Temporarily disable revocation checking entirely and then test:

Set-adfsrelyingpartytrust –targetidentifier “https://shib.cloudready.ms” –encryptioncertificaterevocationcheck “None”

2.) Or run certutil to check the validity and chain of the cert:

certutil –urlfetch –verify c:\users\dgreg\desktop\encryption.cer

 

Does the application have the correct token signing certificate?

This one is hard to troubleshoot because the transaction will bomb out on the application side, and depending on the application, you may not get any good feedback or error messages about the issue. Just make sure that the application owner has the correct, current token signing certificate. Confirm the thumbprint and make sure to get them the certificate in the right format - .cer or .pem.

Here is a .Net web application based on the Windows Identity Foundation (WIF) throwing an error because it doesn’t have the correct token signing certificate configured:

image

 

Does the application have the correct ADFS identifier?

When this is misconfigured, everything will work until the user is sent back to the application with a token from ADFS because the issuer in the SAML token won’t match what the application has configured. At that time, the application will error out. Applications based on the Windows Identity Foundation (WIF) appear to handle ADFS Identifier mismatches without error so this only applies to SAML applications. The default ADFS identifier is:

http://<sts.domain.com>/adfs/services/trust

Notice there is no HTTPS. Confirm what your ADFS identifier is and ensure the application is configured with the same value:

image
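A quick way to confirm the identifier on the ADFS side is from PowerShell on one of the ADFS servers:

# The Identifier property is the issuer value that will appear in tokens ADFS issues
Get-AdfsProperties | Select-Object Identifier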

 

What claims, claim types, and claims format should be sent? (Optional)

This one is nearly impossible to troubleshoot because most SaaS applications don’t provide detailed enough error messages to know whether the claims you’re sending them are the problem. If we’ve gone through all the above troubleshooting steps and still haven’t resolved it, I get a copy of the SAML token, save it as an .xml file, send it to the application owner and tell them:

This is the SAML token I am sending you and your application will not accept it. Tell me what needs to be changed to make this work – claims, claims types, claim formats?

Once again, open up Fiddler and capture a trace that contains the SAML token you’re trying to send them:

 

image

If you remember from my first ADFS post, I mentioned how the client receives an HTML form with some JavaScript that instructs the client to POST the SAML token back to the application; that’s the HTML we’re looking for here:

image

Copy the entire SAMLResponse value and paste into SSOCircle decoder and select POST this time since the client was performing a form POST:

image

And then click XML view and you’ll get the XML-based SAML token you were sending the application:

image

Save the file from your browser and send this to the application owner and have them tell you what else is needed. It is their application and they should be responsible for telling you what claims, types, and formats they require. A lot of the time, they don’t know the answer to this question so press on them harder.

 

Environmental or User-specific Issues?

Be sure to check the following:

  • Make sure the service principal name (SPN) is only on the ADFS service account or gMSA:

     Setspn –L <service Account Name or gMSA name>

     Example Service Account: Setspn –L SVC_ADFS

     Setspn –x –f (checks the entire forest for duplicate SPNs)

  • Make sure their browser supports integrated Windows authentication and, if so, make sure the ADFS URL is in their intranet zone in Internet Explorer.
  • Make sure the DNS record for ADFS is a Host (A) record and not a CNAME record. CNAME records are known to break integrated Windows authentication.
  • Don’t make your ADFS service name match the computer name of any servers in your forest. It will create a duplicate SPN issue and no one will be able to perform integrated Windows Authentication against the ADFS servers.

If the users are external, you should check the event log on the ADFS Proxy or WAP they are using, which brings up a really good point: if you have an ADFS WAP/Proxy farm behind a load balancer, how will you know which server they’re using? For this reason, we recommend you modify the sign-in page of every ADFS WAP/Proxy server so the server name appears at the bottom of the page. Then you can ask the user which server they’re on and you’ll know which event log to check.

 

How is the user authenticating to the application?

Check the following things:

  • If using PhoneFactor, make sure their user account in AD has a phone number populated.
  • If using smartcard, do your smartcards require a middleware like ActivIdentity that could be causing an issue?
  • If using username and password and if you’re on ADFS 2012 R2, have they hit the soft lockout feature, where their account is locked out at the WAP/Proxy but not in the internal AD? Here is another Technet blog that talks about this feature:
  • Or perhaps their account is just locked out in AD. Smile

 

David “Troublemaker” Gregory

Windows Server DHCP Server Migration – Two Issues from the Field


Hey folks – Hilde here with a post about some DHCP migration snafus I've bumped into recently.

Many of my large, enterprise customers have DHCP servers that were deployed over a decade ago, humming along on some flavor of the Windows Server 2003 OS (WS 2003).

For the most part, those DHCP services have been pretty much dial-tone over the years but with the end of support coming for WS 2003 (that date is July 14, 2015, by the way), they are migrating the service to a newer OS version.

You know what they say about sleeping dogs? Well, there's nothing like walking by the sleeping DHCP dog and stomping on his tail.

TOPIC #1 – existing DHCP-registered DNS records get deleted when you disable/delete the scope on the old DHCP server.

  • Scenario –
    • You migrate scopes from OLD-DHCP-01 to NEW-DHCP-01 but you're leaving OLD-DHCP-01 on the wire as it isn't quite ready for decom yet.
    • As part of your migration, once you're done migrating the scope(s), you disable or delete the scope(s) from OLD-DHCP-01
    • Shortly after, your helpdesk lights up like a Christmas tree and there is a flood of issues which all seem to track back to name resolution problems.
    • A quick check of your DNS zone shows a large number of dynamic entries have been deleted from DNS.
    • A reboot or some other kick of the end-point device re-registers the records and for that specific device, service returns to normal.
  • Complications/impacts/Issues –
    • A handful of devices isn't an epic fail but there were 1000s of devices affected; the outage was massive and impacted multiple business units.
    • Some/many of the end-point devices were non-Windows devices (printers, handhelds, etc) and are hard, if not impossible, to remotely reboot/manage at scale and/or in an automated fashion.
    • This was totally unexpected – NEW-DHCP-01 is operational, IP Helper statements are all fixed up, the reservations/leases are all over on that new server. Why did this happen?!?
  • Root Cause –
    • There is a setting in the DHCP server/scope options that says "Discard A and PTR records when lease is deleted"
    • This was checked (and is the default setting) on OLD-DHCP-01 and that server was still active
    • When the scope/lease/reservation was disabled/deleted from OLD-DHCP-01, the next time the server's DHCP cleanup process/thread ran, it went out and deleted the DNS records, as that is what it had in its DHCP DB.
    • The value of this checkbox is recorded for each lease/reservation in the DHCP DB at the time the lease/reservation is created. In my experience, toggling this setting has no impact on existing leases/reservations (see the PowerShell sketch after this list for a way to check the setting on a newer DHCP server).
    • Painful. Consider yourself warned.
  • A possible work-around –
    • Decommission OLD-DHCP-01 as part of the migration effort:
      • Shut the server down
      • Unplug the server from power
      • Remove the power supply(ies) from the server
      • Pull the network cable
      • Put tape over the NIC port(s)
      • Fill the NIC port(s) with hotglue
      • Remove the NIC
      • Cut the server in half with a sawzall
      • All of the above
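As an aside, if your new DHCP server is running Windows Server 2012 or later, a quick way to review the DNS deregistration behavior is with the DhcpServer PowerShell module (a hedged sketch; the server name is a placeholder, and on the legacy WS 2003 box the equivalent is the "Discard A and PTR records..." checkbox in the DHCP MMC):

# Review the server-wide DNS update/deregistration settings on the new DHCP server
Get-DhcpServerv4DnsSetting -ComputerName "NEW-DHCP-01"
# DeleteDnsRROnLeaseExpiry controls whether the server deletes the client's DNS records
# when a lease is deleted or expires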

TOPIC #2 – DHCP authorization details in AD are showing up inconsistently and/or in different locations.  You're wondering what's going on and how/why they got there.

  • Scenario –
    • As part of your migration/clean-up efforts, you have a look at the DHCP Server Authorization records either via the DHCP UI, AD Sites and Services and/or ADSIEdit. Depending on the OS versions used, you may see inconsistent entry formats and/or relics of old DHCP servers (long gone, renamed or re-IP'd).

     

    • For WS 2003 RTM DHCP Servers, the authorization record would register as an IP address, not a FQDN (as shown below in the AD Sites and Services MMC and ADSIEdit)

    • In WS 2003 R2 or SP2 (not sure which/when) this was changed and the authorization record would register as a FQDN (as shown below in the UIs)

    • If you look at the NetServices container via ADSIEdit or AD Sites and Services MMC, you might notice some differences here, too:
      • In the Windows 2000 DHCP Server service, the DHCP authorization process would register an entry directly under NetServices, as well as add an entry in the dhcpServers attribute of the DhcpRoot object under NetServices:

         

        Windows 2000-era ADSIEdit (remember this UI?)

         

        Windows 2003-era ADSIEdit

    • In the Windows 2003 DHCP Server service, the DHCP authorization process would write a record directly under NetServices but not add an entry in the dhcpServers attribute of the DhcpRoot object under NetServices.
      • In the screen shots below, notice there is an entry for all three DHCP Servers under NetServices but there is only an entry for the WS 2000 DHCP server in the dhcpServers attribute of the DhcpRoot object – neither the Windows Server 2003 nor Windows 2003 R2 DHCP servers registered a record there.

    • There was also a difference between the Windows 2000 and WS 2003 DHCP MMCs in how they'd display the DHCP "authorized servers" list
      • Windows 2000 Server – Manage Authorized Servers UI
        • It would only look at the dhcpServers attribute on the DhcpRoot object and only list the record(s) it found there (the WS 2000 DHCP server in this case)

      • WS 2003 – Manage Authorize Servers UI (same AD forest as above)
        • It looks at both the dhcpServers attribute (and lists the records there) as well as the NetServices container (and lists the WS 2003 and WS 2003 R2 DHCP server records there).

 

So, what's the bottom line here?

  • As always, test your changes when you can in a lab that is representative of your production environment
    • This is where I plug (again) the idea of testing your DR plans to establish an isolated lab from production backups. This way, you're validating your DR plans and producing a 'like-production' lab for testing. Schema extensions? Got 'em. Specific service accounts, OU structures, GPOs, etc? Got 'em. Throw in restores of your DHCP servers and you've got a lab to test your DHCP migration.
    • Many large enterprises have been around for a long time and have infrastructure that began life as Windows 2000, NT 4.0, or even earlier.
    • In this case, the Windows 2000 Operating System is loooong out of support from Microsoft. I only used that OS so I would have a "historically accurate" representation of a typical infrastructure that has been around for a long time and to illustrate the differences across OS versions of DHCP. If I'd built my lab DHCP servers from the newer OSes, I wouldn't have been able to reproduce these idiosyncrasies.
  • Unexpected things can happen when you stomp on the tail of a dog that's been sleeping for 10 years.

Cheers!

Mike Hildebrand

Active Directory Risk Assessments – Lessons and Tips from the Field – Volume #1?


Greetings – Hilde here to pass along some wisdom for AD shops everywhere.

Recently, I was part of a conversation with a handful of true Active Directory rock-stars here in Premier Field Engineering who have done a lot of AD Risk Assessment Program (RAP) deliveries.

  • As a reminder, the "RAP as a Service" delivery includes a very in-depth scan of a technology (AD, GPO, Failover Cluster, Desktop OS, etc.) and provides a thorough review of the scan results, as well as a conference call with a RAP-accredited PFE to discuss the results. Reports and remediation plans are generated for the environment, and you are licensed to use the RAP as a Service "client" (the scanning tool) for a year to re-scan/review the environment as much as you'd like. Another value-add is the portal for reviewing, analyzing and interacting with your data, results and reports. The portal has evolved to become a superb aspect of the RAP as a Service platform and benefits from frequent updates and continuous improvement.

Bryan Zink and the YYZ PFE have been delivering AD Risk Assessments since they helped originate the program almost 15 years ago.

Two other veteran PFEs – Doug Gabbard and David Morillo - have been delivering AD Risk Assessments for years and also have tremendous depth and experience in AD.

Some of those names you may recognize from this blog or perhaps from your own AD Risk Assessment; these guys are all members of the Hall of Justice for AD.

As you review this, take note of the patterns and use the information to improve the AD environments you are responsible for.

Bryan Zink

After more than 12 years and 500 on-site assessments of customer Active Directory environments, lots of unusual and interesting experiences come to mind. I've had the pleasure of working with customers across all sorts of Industries with AD Forests ranging in size from two Domain Controllers all the way up to more than 3,000. What's probably most noteworthy though are the common scenarios. In no particular order:

Membership counts – As an AD Admin, invest your time and effort to really understand delegation and deploy a manageable least privileged access model. Also, make it your business to keep your groups manageable.

Ignorance is no excuse – Know your subnets, where they're in use and make sure they map to the correct Active Directory sites. LANs and WANs tend to change much more frequently than your AD Topology. If you don't stay on top of this, Users and the helpdesk will be the first to let you know. They will experience everything from slow logons to poor performance to lost data and applications from policies not applying.

Embrace your inner 'Bob Ross' – Performance analysis is more art than science. Size your Domain Controllers with a purpose. Understand how Windows uses CPU, Memory and Disk. Measure and understand what normal really looks like. This will keep late night troubleshooting down to a minimum.

 

YYZ PFE from AZ 

Strict Replication is your friend…

Don't ignore FRS problems and get moving towards DFSR for SYSVOL repl (if you haven't already)…

Backups are cool…

Monitor AD replication…

Subnet definitions are critical…

Change notification should be evaluated …

Preferred Bridgeheads are usually a bad idea…

Limit your time sync/drift thresholds…

Review your application partitions in terms of DNS (DomainDNSZones and ForestDNSZones) – you just might find conflicts, duplicates or other stale/half-moved DNS data …

There aren't many (any?) good reasons for more than two Sites per Site Link…

 

Doug Gabbard

Educate multiple engineers on how to update/use the RAP as a Service client/scanning tool to collect data. Too often, only one person knows how to use the tool.

Likewise, educate multiple engineers how to use the portal to review collected data, reports, issues, etc.

Budget time or plan time or call it whatever you like to perform remediation. Going back to the beginning days of risk assessments, all the way up to today, data is collected, reports are delivered, but little to no remediation is accomplished. Also, many times, the environment is never scanned again and the data collection tools sit idle.

Trust and verify – this is a twist on the "trust but verify" – just a little more friendly. This is one of the best ways to learn more deeply how AD works.

 

David Morillo

My top recommendation after doing 5 years of risk assessments is to decide on an interval, create a reminder and rerun the scanning tool on a regular basis. 7-9 out of 10 customers I've visited don't do much (if anything) with the tool after the initial engagement concludes.

Enforcing and blocking inheritance on GPOs should be used sparingly as these are advanced features and can complicate troubleshooting.

Using SYSVOL to house/replicate file types such as .exe, .msi or DLLs is not recommended as doing so could delay the promotion of a DC, increase replication traffic and cause excessive disk utilization, among other things.

Many times, there is a lack of clarity/understanding between the "backup team" and the "AD Team" and proper backups are not configured, not configured properly or don't follow best practices. Backing up 2 domain controllers per domain with Windows Server Backup is a simple thing to configure and a very cost-effective insurance policy in the event a recovery is needed.
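If you want a starting point, here is a minimal sketch of a system state backup using the Windows Server Backup PowerShell cmdlets (this assumes the Windows Server Backup feature is installed and that E: is a dedicated backup volume; adjust for your environment and test it first):

Import-Module WindowsServerBackup
$policy = New-WBPolicy
Add-WBSystemState -Policy $policy                      # system state includes the AD database
$target = New-WBBackupTarget -VolumePath "E:"
Add-WBBackupTarget -Policy $policy -Target $target
Start-WBBackup -Policy $policy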

Environments without subnets defined in AD Sites & Services are very common. Missing subnets can have a big impact on client performance and are very easy to remediate in most situations.

Keeping up with patches should be a normal process for most IT shops. However, we often find one or two DCs that are missing one or more critical security patches and/or DCs that are missing non-security updates such as the rollups that have become very important for proactive operational health. One such example is the "Enterprise hotfix rollup" for Windows Server 2008 R2 (it applies to Windows 7, too, so get it on your clients) - https://support.microsoft.com/en-us/kb/2775511/en-us. No good conversation about patches should stop at the OS – we see old drivers, old firmware, and another item often overlooked is the VM integration components – they're either out of date or mismatched (or both) across DCs. Be disciplined.

 

A few from Hilde

Practice DR – set up Outlook calendar entries with reminders to test your recovery processes (restore a user account, a group, an OU, a GPO, a DC, the whole Forest) at recurring intervals throughout the year. Don't wait until you need your recovery skills, documentation and data to discover that there is a problem with one or more of those aspects.
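As one small drill you could run, here is a minimal sketch (it assumes the AD Recycle Bin is enabled and that 'DRTestUser' is a throw-away test account you deleted on purpose) for restoring a deleted user with PowerShell:

# Find the deleted object in the Recycle Bin and bring it back
Get-ADObject -Filter 'samAccountName -eq "DRTestUser"' -IncludeDeletedObjects | Restore-ADObject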

Accurately document the environmental and system settings (and maintain the docs).  AD Sites, Site Links, DC/DNS configurations and many other settings are automatically captured for an entire AD Forest via the RAP as a Service client tools - which is supremely helpful.  You can use that collected data and copy/paste it into Word/Excel files as an excellent starting point for documentation but don't stop there - consider GP Links, DNS zone replication details, member server NIC settings, DHCP Scope settings, etc.  We all know there tends to be configuration drift and subtle (or drastic) environmental changes over the years; take a look at your docs (if there are any).  I'd bet they are likely no longer accurate.

Protect AD from accidents – broadly enable "accidental deletion" preventions on AD objects including OUs, DNS zones, etc. We all make mistakes unless we are prevented from doing so.
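A minimal sketch for sweeping that protection onto every OU in the domain (run from a machine with the AD PowerShell module; as always, test in a lab first):

# Flag all OUs so they can't be deleted or moved until the protection is cleared
Get-ADOrganizationalUnit -Filter * | Set-ADOrganizationalUnit -ProtectedFromAccidentalDeletion $true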

 

There you have it folks. A handful of great tips for improved AD health and reduced risk.

Notice, there was nothing unexpected here. No magic wands or secret potions. No cryptic registry settings or secret hotfixes.

Just ditches to be dug.

Here's your shovel … get to work!

If this post goes over well, keep your ear to the tracks for a "Volume #2" ... 

Cheers!

 

Hilde

Connect to a Home Network from Anywhere via Azure, Point to Site and Site to Site VPNs


Hi all, this is JJ Streicher-Bremer, Platforms PFE. A long-time reader, first-time poster.

I was recently onsite with a customer and using their guest wireless network. I wanted to connect into my lab network at home for a demo and like many of you, I have enabled port forwarding on my Internet router for the RDP protocol into my home lab.

Typically, this works great and I have direct connectivity from the Internet at-large into my home lab but on that day, it failed via the guest network. I tried connecting through the Internet-sharing feature of my phone, though, and I was able to connect.

I asked the customer about port restrictions on the guest network and sure enough, their security team was blocking everything except "standard web ports". This explained why I couldn't connect to RDP via their network but my phone tether was fine.

Since I have been digging more and more into Azure IaaS and, in particular, networking in Azure, I thought there might be a way to solve this port blocking issue with "The Cloud."

I had already followed some great blog posts to get my on-premises home lab network connected into a network up in Azure through the use of a Site to Site VPN (S2S).

http://blogs.technet.com/b/askpfeplat/archive/2014/03/03/connect-an-on-premises-network-to-azure-via-site-to-site-vpn-and-extend-your-active-directory-onto-an-iaas-vm-dc-in-azure.aspx

Also, I had a way to connect from my Internet-connected client directly into my Azure network through a Point to Site (P2S) VPN.

http://blogs.msdn.com/b/piyushranjan/archive/2013/06/01/point-to-site-vpn-in-azure-virtual-networks.aspx and http://blogs.technet.com/b/cbernier/archive/2013/08/21/windows-azure-how-to-point-to-site-vpn-walk-through.aspx

With all this hard work done for me by my esteemed colleagues, I had a VPN that could connect my client into Azure and another VPN that could connect from Azure to my lab. It seemed I should take the next logical step. In my mind, that step was figuring out how to VPN from my client into Azure and gain access to my "on-premises" home lab network.

Thus, we have the topic of this blog post.

To start with, let's set up prerequisites, define some terms, and document some of the network settings.

The expectation is that you, like me, have already created a S2S (Site to Site) VPN connecting your "Lab Under The Stairs," or LUTS, into Azure and a P2S (Point to Site) VPN connecting a client PC into Azure. If you have not, please follow the above links to complete those steps and then come back here.

The Setup

Here are the networks and addresses I'll be working with in this post - use appropriate network addresses for your specific setup.

LUTS network – 172.16.16.0/24

LUTS network default gateway – 172.16.16.1

LUTS network RRAS VPN internal address – 172.16.16.164

Azure network – 172.16.17.0/24

P2S VPN Client network – 172.16.18.0/24

 

 

I know that the Azure network understands how to send packets to my LUTS network (through the definition of the local networks used in the Azure Gateway configuration) and how to send packets to my P2S VPN clients.

From here, let's start by looking at my P2S client side routing table (below) while I was VPNed into my Azure network (the gray laptop in the above diagram).

I can see from the highlighted routes that my client knows that to send packets to the 172.16.17.0/24 network (in Azure), they need to be sent via the 172.16.18.2 NIC on my client PC.

Indeed, this works because I can ping a server on that Azure network. I also notice that my client knows that to send packets to anything on the entire 172.16.0.0/16 network, the next hop is 172.16.18.0.

So, in theory, my client knows how to send packets all the way to my LUTS network (172.16.16.0/24).

Unfortunately, a ping from my client PC shows that something is amiss here.

 

The Problem

Based on the above, the client knows how to send packets to a system on the LUTS network but perhaps there isn't a route back from the LUTS network to the client on the P2S VPN.

As you can see in the route print below, taken from a server in my lab, the server does not know of a route to the P2S VPN network.

Interestingly, there is no route to the Azure network either.

Taking a quick step back, I want to note that in Mike Hildebrand's post (referenced above), he took the simple path and used the RRAS server as his default gateway: all network traffic leaving his lab network has to go through the RRAS server.

In my case, I didn't take the simple path, so my default gateway IP (172.16.16.1) does not equal the internal IP address of my RRAS server (172.16.16.164).

The Fix – Part I

I had a couple of options to make this work: edit each host or edit the network.

  • Edit the hosts
    • I could add a static route on all my LUTS systems, essentially telling them that traffic bound for my Azure network has to go through my RRAS server.
    • The command would look similar to this: "route add 172.16.17.0 mask 255.255.255.0 172.16.16.164".
    • This does work and if I added the "-p" switch to the command it would even make the route persistent across reboots.

     

  • Edit the network
    • The other option is to fix the network, rather than each individual host, so I looked at my network router to see what I could do there.
    • My router allows me to add static routes - your router may/may not have this feature
    • This means that a packet leaving my LUTS server headed for the Azure network would first go to the default gateway (172.16.16.1) where it would be re-directed to my RRAS server (172.16.16.164) for forwarding up to Azure.

 

Knowing that I can fix my network (rather than configuring each host) I figured I just needed to add another route that points traffic headed for my P2S VPN clients to the RRAS server as well (router UI screenshot below):

I set that up but still no joy; my pings were still timing out…

The next troubleshooting step for me was to get a network trace from my LUTS server to make sure that packets were actually getting to the LUTS server and to see where they headed after they left.

Using one of my favorite commands "netsh trace start capture=yes", I captured some traffic while I was pinging from my P2S VPN client to the LUTS server.
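If you want to reproduce this, a minimal sketch of that capture workflow on the LUTS server looks like the following (the trace file path is just an example):

netsh trace start capture=yes tracefile=C:\temp\luts-ping.etl
# ...reproduce the pings from the P2S VPN client, then...
netsh trace stop

The resulting .etl file can then be opened in Message Analyzer for review.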

Looking at the ICMP traffic in that trace, it shows that the LUTS server does get the "echo request" packet from the P2S VPN client (great news!).

 

I see that the LUTS server tries to send a reply packet back to 172.16.18.8 (the P2S VPN client) and is redirected by my default gateway to the RRAS system (172.16.16.164).

WAIT - why is the reply being re-directed from my RRAS system back to the default gateway??

Could it be that my RRAS system doesn't know how to route packets back to the Azure P2S VPN network?

The Fix – Part II

A quick check of the routing table on the RRAS server shows that it really has no clue how to route traffic back to the P2S VPN network. It was just handing the traffic back to its default gateway, which redirected it back to the RRAS server.

The good news is that I can rectify that issue with a simple addition to the static routes in the RRAS console:

After that last piece, SUCCESS!!!

 

With this post, my intent was to discuss connectivity to a home lab through Azure and some of the routing challenges I encountered, as well as some of the troubleshooting and resolution steps.

Thanks for reading and please let me know your thoughts.

JJ "PING -T" Streicher-Bremer

DNS Policies in Windows Server 2016 Tech Preview 2


Hello - Gary Green and Mike Kline here to bring you Ask PFE Plat's very first post regarding Windows Server 2016 (well, Technical Preview #2, to be specific)!

Over the years, Microsoft Windows Server DNS has provided excellent functionality and a frequently-expanding feature-set for our customers.  Our friends in the DNS Product Group are hard at work on some GREAT new features for the next version of Windows Server.

One such feature is DNS Policies.

DNS Policies allow you to control how a DNS Server handles queries/responses based on various parameters such as client IP subnet, the IP address of the network interface which received the DNS request, or even the time of day.

One use-case for a DNS Policy is the ability to provide clients geographically-appropriate resources for a given name, based on the client's IP address.

Another common configuration for many customers is some sort of "split-brain" DNS where the same DNS domain name (e.g., CONTOSO.COM) is used both on the Internet and on the internal corporate network but the name may resolve to different internal/external IP addresses. With DNS Policies, this configuration can be more easily set up.

One of the advantages of an elastic infrastructure is the ability to scale resources up or down as needed. One way DNS Policies can help with this is via the "time of day" parameter – it can shift load to certain IP addresses during certain times, such as off-hours.

Some clarifying details/notes:

  • As mentioned, this information applies to Technical Preview #2 - and is subject to change
  • Currently, DNS Policies can only be configured via PowerShell
  • DNS Policies will work only on Windows Server vNext/2016 DNS servers
    • Also, all DNS servers hosting a policy-controlled zone must be WS 2016 to take advantage of this functionality.
    • Clients can be any version
  • At present, DNS Policies are configured and stored locally on each DNS server, but they can be easily deployed across DNS servers using PowerShell
  • Zones and their scopes (note: not referring to DHCP scopes here) must be file-backed zones; we're working on AD-integrated zone support
  • You cannot add scopes on Conditional forwarders
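To make the geo-location scenario above more concrete, here is a hedged PowerShell sketch (the subnet, zone scope and IP values are invented, and the cmdlet syntax reflects Technical Preview 2, so it may change):

# Define where the "America" clients come from
Add-DnsServerClientSubnet -Name "AmericaSubnet" -IPv4Subnet "192.0.2.0/24"
# Create a scope of the zone to hold the America-specific records
Add-DnsServerZoneScope -ZoneName "contoso.com" -Name "AmericaZoneScope"
Add-DnsServerResourceRecord -ZoneName "contoso.com" -ZoneScope "AmericaZoneScope" -A -Name "www" -IPv4Address "203.0.113.10"
# Tie it together: clients in AmericaSubnet get their answers from AmericaZoneScope
Add-DnsServerQueryResolutionPolicy -Name "AmericaPolicy" -Action ALLOW -ClientSubnet "eq,AmericaSubnet" -ZoneScope "AmericaZoneScope,1" -ZoneName "contoso.com"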

 

The DNS Product Group published several great blog posts for DNS Policy implementation:

 

Also, take a look at Microsoft PowerShell MVP Jan Egil Ring's post about DNS Policies:

 

We'll certainly be blogging more about Windows Server 2016 (and Windows 10, of course) but while we've got your ear about DNS, we're planning a DNS Q & A with one of the PMs for the DNS Product Group at Microsoft.

Use the comments below to post your burning DNS questions (about these new Policies or anything else Windows DNS related) and look for a future post where we'll discuss some of those questions.

Gary Green and Mike Kline signing off for now…

Mailbag: Hi Mom! (Issue #12)


Greetings all – Mike Hildebrand, Mike Kline and a few others here with some musings for an out of sorts "Monday Mailbag." 

 

It's the end of an era…

Last week, Windows Server 2003 reached its "End of Extended Support" date (July 14, 2015). Hopefully, you have mostly fond memories of it; I know I do.

 

As one era comes to an end, another begins …

We're getting closer and closer to Windows 10 launch day and I, for one, am super excited.  Gabe Aul has some insights for you about build 10240 on the Windows Insider blog

 

In preparation for Windows 10 launch day on July 29, get ready to "Upgrade your World"

 

Q – I can't get enough Windows 10 technical details. Where can I find out more?

A – The Windows 10 Hardware Developer site has a lot of excellent technical depth and it's not only for devs. For example, anyone interested in Start menu customization? There is a ton of information there.

 

Q – We use Key Management Service (KMS) for product activation. What's the skinny on activating Windows 10 clients via KMS?

A - If your KMS host is running on Windows Server 2012/R2, you'll need to apply a hotfix to your KMS host to support Windows 10 activation

 

Q – My mom continues to call me every few days, usually around 5:30 a.m. ("I didn't wake you up, did I?") with questions about her "specially reserved" free copy of Windows 10. Can I just send her a link with answers to all of her questions?

A – Not sure we can state we have "all the answers" to your mom's questions but these FAQ resources are pretty darn helpful.

 

As the OS train of progress moves on down the tracks, so does our list of authors …

Mark Morowczynski, one of the founders of this blog and the current leader of the band, will be heading to Redmond, where he talked his way into a gig with the Azure AD Product Group. He's leaving PFE and Chicago behind him and while the Windy City will be a little less windy without him, he will still be twitting/tweeting/twitching - https://twitter.com/markmorow - and look for him to start making noise over on the AAD blog http://blogs.technet.com/b/ad/. We all wish him well on the next phase of his career and we'll miss his ballyhoo and harassment contributions.

As we reflected back on our writers and the body of work here, we felt we should take a moment to recognize other contributors who have also climbed aboard the train to somewhere else:

  • Doug Symalla – one of the founders
  • Marty Lucas
  • Jake Mowrer
  • Jeff Stokes
  • Joao Botto
  • Milad Aslaner
  • Anyone else who slipped out the back door 

 

Cheers and Happy Monday!


Mailbag: The Windows 10 Double-Bubble Edition (Issue #13)


Welcome to the last Friday before Windows 10 launches! If you've been playing along at home, you likely realized that we did a mailbag post just a few days ago - on Monday of this week. Well, what can we say? It's busy-times here at Microsoft.  Many of us are involved in some form of Windows 10 mania, so we have some fresh Windows 10 info we wanted to share with you. We have some other topics, too, so get yourself a cold can of Mountain Dew KickStart and let's roll.

 

Q – I've heard some interesting things about the new Active Directory security analysis and incident detection product but I'm unclear if it's an on-prem solution or is it cloud-based?

A - Microsoft Advanced Threat Analytics or "ATA" is an on-prem solution that is pretty amazing. Don't take my word for it, though, take it straight from the Principal PM's mouth:

http://blogs.technet.com/b/ad/archive/2015/07/22/microsoft-advanced-threat-analytics-coming-next-month.aspx

 

Q – We have had DirSync in place for a while for synchronizing on-prem AD to Azure AD for O365. We're looking to get to Azure AD Connect but are leery of the upgrade process … how should we move forward?

A – We're not ones to re-invent the wheel if we can help it – there is a very thorough post that's got you covered (of course, you should always test any significant changes in a lab).

https://azure.microsoft.com/en-us/documentation/articles/active-directory-aadconnect-dirsync-upgrade-get-started/

 

Q - I have set up a custom Azure RemoteApp Cloud deployment that I need to update so I can publish a couple more applications.  I don't have the original template image available any longer.  Can I do that and if so, how?

A – Our own Jim Kelly has encountered this very question, so he wrote up the process here for everyone's benefit

http://www.jkazure.com/

 

Q – I want to get (re) certified but I have some "known unknowns" as well as likely some "unknown unknowns". I don't really know what to expect. Any advice?

A – We're PFEs; we always have advice!

  1. This is the main page for Microsoft certification info and should likely be an early stop for you 
    1. https://www.microsoft.com/learning/en-us/certification-overview.aspx
  2. Then, go here and find the exam/cert you're looking to get – this provides the skills measured and offers many links to resources for prep
    1. https://www.microsoft.com/learning/en-us/exam-list.aspx
  3. Visit the TechNet eval center to get fully-functional trial versions of products (and other resources) to gain knowledge and hands-on experience with the product(s)
    1. https://www.microsoft.com/en-us/evalcenter/
  4. With the free 'second shot' offer, you can take a test between July 12, 2015 and Jan 12, 2016 and if the result is "less than passing," double-down on your prep efforts and then sign up to take the exam again at no additional cost. Another way to look at this is – you pay for the second test and the first one is a no-cost practice test. In any event, I'll bet $5 of Moser's money that as long as you prep appropriately, you'll pass one or the other.
    1. https://borntolearn.mslearn.net/b/weblog/archive/2015/07/12/the-return-of-second-shot-in-july-2015
  5. One of our prior posts, even though it is a bit dated, covers some training options
    1. http://blogs.technet.com/b/askpfeplat/archive/2014/02/19/alternative-microsoft-training-options-many-free.aspx

 

Q – I'm confused about the Windows 10 in-place upgrade. There were some updates for Windows 7 and/or 8.1 that are related to the upgrade but the details in those KBs were sketchy at best. I've seen forums and other sites refer to various registry keys, GPOs/ADMX files, the Windows logo tray app (GWX), etc. Help me help myself.

A – Very recently, we've updated two KBs and published a new one with a plethora of details and guidance for how to control the many aspects of the Windows 10 in-place upgrade.

 

Q – I'm excited for the Windows 10 launch. How can I be a part of the experience?

A - In our last post, we mentioned the UpgradeYourWorld program and another great way to be a part of the action is to head to your nearest Microsoft Store:

http://blogs.windows.com/bloggingwindows/2015/07/20/join-microsoft-stores-to-celebrate-windows-10-with-special-events-guest-appearances-workshops-and-more/

 

Lastly, if you're in IT, you've likely been using or working with/on Windows in some form or fashion for quite some time. You may have been a part of past Windows releases, deployments, and/or operations from WFW, 3.1, 4.0, 95, 98, Millennium, 2000, XP/2003, Vista/2008, 7/2008 R2, 8.0/2012, 8.1/2012 R2.  

For us platforms-focused PFEs, a new release of the Windows OS is always exciting but Windows 10 seems to have a higher-than usual level of excitement and positive energy.  We think you'll love it.

 

Cheers and Happy Friday from Hilde, Dan Cuomo, Mike Kline and Jim Kelly

Third-party Active Directory Migration Tools and KB 3070083


Hello, Chad Munkelt here with my very first post for the Ask PFE Platforms blog.

I wanted to discuss a new hotfix that Microsoft released recently:

This hotfix was created to address an issue with third-party Active Directory migration tools that receive a duplicate Service Principal Name (SPN) error when trying to migrate users or computers within the same forest.

In Windows Server 2012 R2, we introduced SPN uniqueness checks/blocks which ensure applications or administrators aren't able to create objects in Active Directory with the same SPN as another object.

Typically, preventing duplicate SPNs is a great idea. Duplicate SPNs can cause issues, including Kerberos authentication problems or application failures. However, there are some situations/tools that require the ability to bypass the duplicate SPN check in order to function properly.

A prime example would be a third-party Active Directory migration tool or even the built-in command NETDOM. When these tools are used, the SPN uniqueness check prevents the application from fully moving or migrating computers and users, and will often error out.

In order for these applications to work properly, the hotfix alters AD behavior via the dSHeuristics setting in Active Directory and allows the SPN uniqueness check to be bypassed.

This may be useful to individuals who are running all Windows Server 2012 R2 domain controllers, and need to do an intra-forest migration using third party Active Directory migration tools.

  • Note - Microsoft's Active Directory Migration Tool (ADMT) isn't impacted by this issue

!! CAUTIONARY NOTES !!

  1. This is a temporary change to get through a very specific scenario. Once complete, it is STRONGLY recommended to revert the change to ensure the AD retains its built-in protection mechanisms. Take a screenshot of the current/pre-changed value so you can revert the change when you no longer need this setting enabled.
  2. Changing the dSHeuristics settings in this manner is for a very specific scenario and is not recommended unless you are experiencing the same issues.
  3. Changing this setting should not be necessary if you are using Microsoft ADMT.
  4. Test all changes in a non-production environment or test lab first.
  5. This hotfix requires a restart of the target domain controller after it is installed.
  6. Ensure you have gone through all of your change control processes prior to implementing the change in the production environment.
  7. Changing or modifying the Configuration Container is a forest-wide change. It is replicated out to all domain controllers; change with caution.
  8. Before you ever change or modify the Configuration Container ensure you have a current, valid Active Directory backup and you've verified you can restore.
  9. This does not disable all SPN uniqueness checks across the board – it only affects manually setting a duplicate SPN in ADSI Edit or using NETDOM.

 

Details

We didn't scare you away? Carry on, wayward son…

You can configure dSHeuristics to bypass UPN checking, SPN checking, or both. The supported values tied to this function and this specific aspect of dSHeuristics are listed below. You can find out more about the various dSHeuristics settings from here:

The following are the supported dSHeuristics values for this situation:

  1. dSHeuristic 21st char = 1: AD DS allows adding duplicate user principal names (UPNs)
  2. dSHeuristic 21st char = 2: AD DS allows adding duplicate service principal names (SPNs)
  3. dSHeuristic 21st char = 3: AD DS allows adding duplicate SPNs and UPNs
  4. dSHeuristic = Any other value: AD DS enforces uniqueness check for both SPNs and UPNs

Examples (assuming you have the default dSHeuristic values to begin with – which you may/may not have):

  1. For disabling UPN uniqueness check, set the 21st character of dSHeuristics to "1" (000000000100000000021)
  2. For disabling SPN uniqueness check, set the 21st character of dSHeuristics to "2" (000000000100000000022)
  3. For disabling UPN and SPN uniqueness checks, set the 21st character of dSHeuristics to "3" (000000000100000000023)

 

Download and Install the Hotfix

In order to make the changes necessary, you need to download the hotfix from KB3070083 and ensure you have met the KB prerequisites. It is recommended that you install the hotfix on all domain controllers that will be used during the Active Directory migration and recall that the install WILL require a reboot.

 

Configure dSHeuristics to Disable SPN Uniqueness Check

You can modify the attribute through ADSI Edit, LDP.exe, or the "Get-ADObject/Set-ADObject" AD PowerShell cmdlets. Please note that in all of the examples I am using the domain Contoso.com; ensure you replace that with your domain.

Using PowerShell to modify the dSHeuristics.

One of the easiest ways to modify the dSHeuristics attribute is through PowerShell. The commands for using PowerShell are as follows:

Note: This value may not be the exact same value you have; the examples below replace the default value my lab has, changing only the 21st character. Ensure you do a Get-ADObject first and accurately record the value, taking care to change only the 21st character (see the sketch after this list).

  • To check the dSHeuristics value using Get-ADObject
    • (Get-ADObject "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=com" -Properties dSHeuristics).dSHeuristics
  • For disabling UPN uniqueness check, set the 21st character of dSHeuristics to "1".
    • Set-ADObject 'CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=com' -Replace @{dSHeuristics='000000000100000000021'}
  • For disabling SPN uniqueness check, set the 21st character of dSHeuristics to "2".
    • Set-ADObject 'CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=com' -Replace @{dSHeuristics='000000000100000000022'}
  • For disabling UPN and SPN uniqueness checks, set the 21st character of dSHeuristics to "3".
    • Set-ADObject 'CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=com' -Replace @{dSHeuristics='000000000100000000023'}
  • For setting dSHeuristics back to the default value of <not set>
    • Set-ADObject 'CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=com' -Clear dSHeuristics
    • Note: If you had a value set for dSHeuristics prior to making this change, that will also be removed with the clear command.
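If you'd rather not hard-code the distinguished name, here is a small additional sketch (nothing environment-specific is assumed) that builds the path for whatever forest you are in and records the pre-change value, so reverting later is a copy/paste exercise:

# Build the path to the Directory Service object for the current forest
$dsPath = "CN=Directory Service,CN=Windows NT,CN=Services," + (Get-ADRootDSE).configurationNamingContext
# Capture and display the pre-change value; keep it somewhere safe
$before = (Get-ADObject $dsPath -Properties dSHeuristics).dSHeuristics
"Current dSHeuristics value: '$before'"
# When the migration is done, put the original value back (or use -Clear if it was empty)
# Set-ADObject $dsPath -Replace @{dSHeuristics = $before}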

 

Below is an example using the PowerShell ISE to set and view the dSHeuristics value. We used line continuation (back tick `) for readability purposes.

 

 

For those that are not comfortable with using PowerShell or want to use another method, you can use ADSI Edit or LDP.exe.

Below are two sets of instructions – quick, for those familiar with ADSI Edit and LDP and a longer set for those not as familiar with ADSI Edit or LDP.

Quick Steps - Using ADSI Edit

  1. In the left pane of ADSIEdit, right-click ADSI Edit and select Connect to from the menu.
  2. Select Configuration from the Select a well-known Naming Context menu and click OK.
  3. In the left pane, expand Configuration, CN=Services, CN=Windows NT. In the right pane, right-click CN=Directory Service, and select Properties from the menu.
  4. In the CN=Directory Service Properties dialog box, select dSHeuristics on the Attribute Editor tab and click Edit.
  5. In the String Attribute Editor dialog box, type 000000000100000000022 (adjusting only the 21st character of your existing value) to disable the SPN uniqueness check, and click OK.
  6. Click OK in the CN=Directory Service Properties dialog box and close ADSI Edit.

 

Quick Steps - Using LDP

  1. Open LDP.exe
  2. Click on Connection and select Connect
  3. Enter the name or IP address of a domain controller
  4. Click on connection and select Bind…
  5. Choose a Bind Type and click OK.
  6. Click on View and select Tree.
  7. Select CN=Configuration,DC=CONTOSO,DC=COM for Base DN, click OK.
  8. Drill down to CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=Contoso,DC=Com and right click and select Modify.
  9. Enter the below information: Please note that you must know your current value and only change the 21st character in the values field.
    1. Attribute: dSHeuristics
    2. Values: 000000000100000000022
    3. Operation: Replace
  10. Click on Enter and then Run.

 

Longer Steps – Using ADSI Edit (captured here with the awesome Problem Steps Recorder)

Step 1: Open Server Manager

Step 2: Click on "Tools"

 

 

Step 3: Click on "ADSI Edit".

 

 

Step 4: Click on "ADSI Edit (tree item)" in "ADSI Edit".

 

 

Step 5: Click on "Connect to... ".

 

 

Step 6: Under "Select a well known Naming Context" click on the drop down in "Connection Settings".

 

 

Step 7: Click on "Configuration (list item)".

 

 

Step 8: Click on "OK (button)" in "Connection Settings".

 

 

Step 9: Click on "Configuration [CONTOSO-DC01.CONTOSO.COM] (tree item)" in "ADSI Edit".

 

 

Step 10: Double click on "CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ADSI Edit". Choosing your domain, obviously.

 

 

Step 11: Double click on "CN=Services (tree item)" in "ADSI Edit".

 

 

Step 12: Click on "CN=Windows NT (tree item)" in "ADSI Edit".

 

 

Step 13: Right click on "CN=Directory Service (edit)" in "ADSI Edit".

 

 

Step 14: Click on "Properties (menu item)".

 

 

Step 15: Click on "dSHeuristics" in "CN=Directory Service Properties".

 

 

Step 16: Click on "Edit (button)" in "CN=Directory Service Properties" and take a screenshot of the current value so you can revert the change when you no longer need this setting enabled.

 

 

Step 17: Input dSHeuristics value.

  • For disabling UPN uniqueness check, set the 21st character of dSHeuristics to "1" (000000000100000000021)
  • For disabling SPN uniqueness check, set the 21st character of dSHeuristics to "2" (000000000100000000022)
  • For disabling UPN and SPN uniqueness checks, set the 21st character of dSHeuristics to "3" (000000000100000000023)

     

 

Step 18: Click on "OK" in "String Attribute Editor".

 

 

Step 19: Click on "OK" in "CN=Directory Service Properties".

 

Step 20: Allow the setting to replicate out to all domain controllers prior to testing the setting.

 

Longer Steps - Using LDP

Step 1: Right click PowerShell and select "Administrator: Windows PowerShell ".

 

 

 

Step 2: Enter ldp.exe in the PowerShell window.

 

 

Step 3: Click on Connection.

 

Step 4: Click on "Connect... (menu item)"

 

 

Step 5: Click on "Server: (edit)" in "Connect".

  

 

 

Step 6: Input the name or IP Address of your domain Controller in "Server: (edit)" in "Connect". Click on "OK (button)" in "Connect".

 

 

Step 7: Click on "Connection (menu item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM". Please note that this will be your domain name and not Contoso.

 

 

Step 8: Click on "Bind... (menu item)".

 

 

Step 9: Click on "Bind as currently logged on user (radio button)" in "Bind". At this point ensure you use credentials that have permissions to modify the Configuration Container. If you are not logged in with the applicable rights, choose "Bind with credentials".

 

 

Step 10: Click on "OK (button)" in "Bind".

 

 

Step 11: Click on "View (menu item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

  

 
 

Step 12: Click on "Tree (menu item)".

 

 

Step 13: Click on "Open (button)" in "Tree View".

 

 

Step 14: Click on "CN=Configuration,DC=CONTOSO,DC=COM (list item)". Please note this is your domain, not Contoso.

 

 

Step 15: Click on "OK (button)" in "Tree View".

 

 

Step 16: Click on "CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 17: Double click on "CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 18: Double click on "CN=Windows NT,CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 19: Click on "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 20: Right click on "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 21: Click on "Search (menu item)".

  

 

 

Step 22: Click on "Open (button)" in "Search".

 

 

Step 23: Click on "CN=Configuration,DC=CONTOSO,DC=COM (list item)".

 

 

Step 24: Left click on "Filter: (edit)" in "Search".

 

 

Step 25: Click on "Filter: (edit)" in "Search".

 

 

Step 26: Input (dSHeuristics=*) in "Filter: (edit)" in "Search".

 

 

Step 27: Click on "Subtree (radio button)" in "Search".

 

 

Step 28: Click on "Attributes: (edit)" in "Search" and enter a "*" without quotes.

 

 

Step 29: Click on "Run (button)" in "Search".

 

 

Step 30: You can see the current value on the right hand side. Record the pre-change value so you can change it back when you are done. Click on "Close (button)" in "Search".

 

 

Step 31: Click on "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 32: Right click on "CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=CONTOSO,DC=COM (tree item)" in "ldap://CONTOSO-DC02.CONTOSO.COM/DC=CONTOSO,DC=COM".

 

 

Step 33: Click on "Modify (menu item)".

 

 

Step 34: Click on "Attribute: (edit)" in "Modify".

 

 

Step 35: Input dSHeuristics on "Attribute: (edit)" in "Modify".

 

 

Step 36: Click on "Values: (edit)" in "Modify".

 

 

Step 37: Input the new dSHeuristics Value "Values: (edit)" in "Modify".

  • For disabling UPN uniqueness check, set the 21st character of dSHeuristics to "1" (000000000100000000021)
  • For disabling SPN uniqueness check, set the 21st character of dSHeuristics to "2" (000000000100000000022)
  • For disabling UPN and SPN uniqueness checks, set the 21st character of dSHeuristics to "3" (000000000100000000023)

 

 

Step 38: Click on "Replace (radio button)" in "Modify".

 

 

Step 39: Click on "Enter (button)" in "Modify".

 

 

Step 40: Click on "Run (button)" in "Modify".

 

 

Step 41: Click on "Close (button)" in "Modify".

 

 

Step 42: To confirm the value has changed, re-run the search you performed earlier.

 

Allow the changes to replicate across AD and you should be able to now use your third party AD migration tools without getting the duplicate SPN error!
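If you're working in a lab and don't want to wait for normal replication, a quick (hedged) way to nudge things along from a DC is the following; it pushes the change out to all partners, across sites, for all partitions:

repadmin /syncall /AdeP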

 

Thanks to Mike Kline and Mike Hildebrand for the help and review.  Cheers!

 

Chad

Diving into the Netlogon Parser (v3.5) for Message Analyzer


Brandon Wilson here again talking to you about the next generation of the Netlogon parser for Message Analyzer, which is available with the installation of Message Analyzer 1.3.1. Some of this is going to sound familiar if you read my blog on the v1.1.4 parser…you’ll also notice the format of this blog is pretty much the same, but there are some additions on how to properly filter when using the parser (for the better, I promise). Before I continue on, if you for some reason can’t move to Message Analyzer 1.3.1, I should also mention that the link at the bottom of this page can be used to download the Netlogon parser v3.5 as well as the “Netlogon View” for the analysis grid so you can implement the changes into Message Analyzer 1.1, 1.2, or 1.3.

This next generation version of the Netlogon parser (v3.5) was developed with significant enhancements to both performance and problem diagnosis/troubleshooting. When I say significant improvements, I mean just that! I would like to take the opportunity to give a shout out to the Message Analyzer product group for all of their assistance with everything from development to sanity checks to blog reviews as well! As with the release of the Netlogon parser v1.1.4, this version is compatible with Message Analyzer 1.3.1, and is backwards compatible with Message Analyzer 1.1, 1.2, and 1.3 (with some caveats that are explained below). Since the test platform for this particular version of the parser was Message Analyzer 1.3.1, we will use that for examples in this blog.

The interface for Message Analyzer 1.3 and 1.3.1 has changed a bit, and I'll try to touch on the areas pertinent to the Netlogon parser here. Outside of the GUI changes, the pertinent methods for troubleshooting and parsing using the Netlogon parser are the same as we've gone over in the previous blog posts; some of the updates, however, include the ability to easily filter out account warnings and problem identifications. If you haven't reviewed the previous blog posts, they are essential reading for proper usage of the Netlogon parser: review the Introduction blog, the Troubleshooting Basics for the Netlogon Parser for Message Analyzer blog, and the New Features in the Netlogon Parser (v1.1.4) for Message Analyzer blog as pre-requisites, which cover some of the main features and troubleshooting techniques that were available in v1.0.1 (the initial public release) and v1.1.4 of the Netlogon parser.

It would also be a good idea to get a handle on Netlogon error codes from the Quick Reference: Troubleshooting Netlogon Error Codes blog and troubleshooting MaxConcurrentApi issues in the Quick Reference: Troubleshooting, Diagnosing, and Tuning MaxConcurrentApi Issues blog, both of which can help guide you to proper troubleshooting and root cause analysis for Netlogon related issues.

As I said in my last blog on v1.1.4, I talk about versions a lot when it comes to the Netlogon parser but in reality, as of the date of this post, they are all named Netlogon.config, and the only way for you to truly know the version you have is to open the file and look at the version table at the top. Trust me, I keep that table up to date (if I didn’t, I would lose track of what I’m working on….again)! The previous versions (1.0.1 and 1.1.4) had many features to help you understand and diagnose Netlogon issues (with v1.1.4 having significant advantages). I’ve said it already, but because it excites me for some oddball reason, I can’t stress how significant the updates are in this version.

As with all of our parsers, this is provided as is; however, there are a few different feedback mechanisms for you to use (and I DO want to see your feedback). You can use the feedback link/button in Message Analyzer, reach out in the Message Analyzer forum, send an email to MANetlogon@microsoft.com with ideas and any problems or buggy behavior you may run across, or of course leave a comment here. I highly recommend reaching out through one of the available methods to provide your suggestions for additional detections to add, problems you encounter, etc.

You can also read up more on Message Analyzer itself at http://blogs.technet.com/MessageAnalyzer

In this walkthrough, we will cover the following:

GUI changes in Message Analyzer 1.3 and 1.3.1

Updates and New Detection Features in the Netlogon Parser v3.5

Known issues

Filtering your output

How to update the Netlogon parser manually to v3.5

How to add the new "Netlogon Analysis" grid view

Reference links

GUI changes in Message Analyzer 1.3 and 1.3.1

The primary UI in Message Analyzer 1.3 and 1.3.1 is much the same as Message Analyzer 1.2. The “Hide Operations” function is, ironically, hidden a bit more and has been renamed to “Show Messages Only”. BUT, with the Netlogon parser v3.5, it becomes unnecessary to use. That’s because in this iteration, very few operations are used. With that being said, here’s a basic view of the GUI before you open any logs:

image

The “Show Messages Only” option is found in the Tools menu under Windows -> Viewpoint. Once you select it, the Viewpoint tab will appear in the bottom right. As you can see in the below screenshot, all you need to do is click on the Operations dropdown and select “Show Messages Only”.

image

 

Updates and New Detection Features in the Netlogon Parser v3.5

As I mentioned, there are numerous new features and updates added to v3.5 of the Netlogon parser. That being said, I also had to remove some functionality unfortunately. We will be adding these features later using another mechanism, so you will get the option back in the future.

Before I show you the guts of the new features, I want to give you an idea of the updates:

1. First, the things we had to remove…

a. It was with a lot of hesitation, and a lot of frustration in trying to work around problems, that I removed the feature for authentication attempts to be brought together by an operation. There was a catch 22 in the function in that, while it worked flawlessly, there are certain backend items that need to be addressed before it can be re-instituted in order to accomplish the expected performance (unrelated to Message Analyzer).

2. No more 100MB log size limitation!

3. Significantly improved performance!

4. Slight changes to the wording for NO_CLIENT_SITE detection

5. Added multiple error code identifications (and provide the meaning of these codes in the summary output)

6. Evaluate inconsistent/unexpected format lines to still provide valid output for errors and account warnings detected

7. Adjusted wording for multiple summary messages

8. Provided an easier method for filtering to identify problems and account “discrepancies”

9. Adjusted summary wording for Netlogon service startup from “SVC STARTUP” to “SERVICE STARTUP”

10. New analysis grid layout added to the “Layout” dropdown for Netlogon Analysis

So, let’s go over the additions in a bit more detail:

1. No more 100MB log size limitation!

a. Previous versions of the Netlogon parser struggled with file sizes beyond 100MB. Although you could run the parser against larger files, doing so meant it was time to put Message Analyzer in the background and fix something else. That is no longer the case!

2. Significantly improved performance!

a. When I say significant, I do mean significant. To give you an idea, on my test machines, what used to take 24 minutes and some change to parse took just over a minute!

3. Slight changes to the wording for NO_CLIENT_SITE detection

a. This is where your input comes into play! There was a request to add some wording to indicate lines were related to no client site identification being made. Since the no client site detection is still an operation, wording has been added to the summary of each detection to state “no client site detected” in order to simplify spotting these lines when showing messages only and not using operations.

4. Added multiple error code identifications (and provide the meaning of these codes in the summary output)

a. Most error codes added relate to identifying account issues (including helping you hunt down account lockouts) and potential attempts to compromise security (using invalid accounts for instance). However there are also a couple of new error codes added for troubleshooting purposes. Here is a listing of the errors added for this version (this list also includes a few non-error code related detections that have been added):

Status/Return Code – Technical Meaning

0xC000006D – STATUS_LOGON_FAILURE

0xC002001B – RPC_NT_CALL_FAILED

0xC0000072 – STATUS_ACCOUNT_DISABLED

0xC0020030 – RPC_NT_UNKNOWN_AUTHN_SERVICE

0xC000006F – STATUS_INVALID_LOGON_HOURS

0xC0000193 – STATUS_ACCOUNT_EXPIRED

0xC0000001 – STATUS_UNSUCCESSFUL

0xC000006A – STATUS_WRONG_PASSWORD

0xC0000064 – STATUS_NO_SUCH_USER

0xC0000071 – STATUS_PASSWORD_EXPIRED

0xC0000199 – STATUS_NOLOGON_WORKSTATION_TRUST_ACCOUNT

NeverPing setting detection – This detects whether or not the NeverPing registry value is set in the Netlogon\Parameters key. In order to make this determination, a service restart must have occurred while Netlogon logging is enabled.

“error” detection – Generic parsing for the word “error”. NOTE: This may come up with some false positives due to its generic nature and it is not case sensitive. For instance, if the phrase “ERROR_SUCCESS” was in a line, this would be flagged by this parsing mechanism. All detections flagged by this parsing operation will have a prefix of “DIAGNOSIS:” in the summary wording.

“failed” detection – Generic parsing for the word “failed”. NOTE: This may come up with some false positives due to its generic nature and it is not case sensitive. A real example within a Netlogon log would be during service startup when the DnsFailedDeregisterTimeout value is read; although this is NOT a problem indicator, it registers with a “DIAGNOSIS:” prefix in the summary wording because the word “failed” was detected.

5. Evaluate inconsistent/unexpected format lines to still provide valid output for errors and account warnings detected

a. In previous versions of the parser, the unexpected formats/inconsistent log lines would be brought into an operational grouping together. With this version, we now break out some of the specifics and identify errors within those lines so you don’t miss a beat!

6. Adjusted wording for multiple summary messages

7. Provided an easier method for filtering to identify problems and account “discrepancies”

a. While the old methods of filtering still work, you are now able to filter on “DIAGNOSIS”, “WARNING”, “failure”, “failed”, and “error” to bring back account warnings or problems that have been identified. We’ll talk about that a bit more later on in the blog.

8. Adjusted summary wording for Netlogon service startup from “SVC STARTUP” to “SERVICE STARTUP”

9. New analysis grid layout added to the “Layout” dropdown for Netlogon Analysis

a. Starting with Message Analyzer 1.3.1, the Layouts menu will contain the “Netlogon Analysis” grid view, as shown below in the “How to add the “Netlogon Analysis” grid view” section (note that section is for implementing the Netlogon Analysis grid view on Message Analyzer 1.3 and below). The Netlogon Analysis grid view provides a streamlined analysis grid that contains only the columns needed for reviewing Netlogon logs, saving you from adjusting columns each time you open a Netlogon log to review!

So now with some of the explanations of the updates out of the way, let’s take a look at the new detections that are available, along with the new view of the new and existing detections. If you need a recap on the other detections not listed in this blog, please review the Introduction blog, the Troubleshooting Basics for the Netlogon Parser for Message Analyzer blog, and the New Features in the Netlogon Parser (v1.1.4) for Message Analyzer blog.

First, let’s take a look at the one item that does still perform operational grouping; the detection of NO_CLIENT_SITE entries in the Netlogon log. As I mentioned before, the change made here is only to reflect that no client site was detected in the summary wording. You can of course use this information to determine where you may be missing proper site/subnet assignments within Active Directory, which can lead to slow or failing authentication attempts.

image

image

Before we move onto the new items that are included, let’s take a look at the new look in the Netlogon parser v3.5 for the detections that were already available in v1.1.4:

image

image

As you can see, the way the information is presented has changed a bit. This is especially true of authentication attempts. The way the look is broken down now, in simplistic terms, is that if an error is detected that is NOT in an authentication (but may be in response to an authentication), then the error is reported back with the reason for the error, and in some cases some potential problem areas to look at. For example, in the first screenshot (2 screenshots above this paragraph), you can see RPC call cancellation detections, along with no logon server available detections, RPC bad stub data, etc. All lines now contain the real text from the log after the meaning of the error/failure identified with the exception of authentication requests, which still provide you a translated view that has now been enhanced to also provide the meaning of the error code within the line.

The key to note here though, is that all of these are preceded with the word “DIAGNOSIS”. This can greatly ease finding problems, because now we can simplify filtering to find all problems, essentially at a glance. We will get more into filtering through the logs later in the blog.

Another thing to note here is the lines that state “ACCOUNT WARNING”. Many of the new additions to the parser have this at the beginning of the lines. Filtering on “WARNING” or “ACCOUNT WARNING” can show you all the authentication problems. This includes possible security or account compromises (attempting to use an invalid username, an invalid password, attacks resulting in account lockouts, etc). An additional account related prefix added to the summary that you can see above is “WRONG PASSWORD”, which is pretty self-explanatory…

For authentication attempts, the format is a tad different as problems do not contain the “DIAGNOSIS” key at the beginning (however filtering on “DIAGNOSIS” should still point you to the account logon failure reasons). Instead, you will be informed that an authentication was entered, and that an authentication failed, along with the reason for the failure, followed by a simplified translation of the authentication attempt. However, there is an exception to this, and that exception is the authentication failures that are due to account issues, which will still contain the “ACCOUNT WARNING” or “WRONG PASSWORD” wording prefix in the summary. This was done in order to filter out account problems or possible security risks in their entirety.

The highlighted frames in this screenshot show a successful authentication with the new look:

image

A failed authentication will look similar to the highlighted lines below:

image

So what are you seeing here? As mentioned above, it shows you that the authentication attempt was entered. The next line tells you that there was an authentication failure, and what the error code translation for the error code is. After that is the actual line from the Netlogon log. Notice though how the line is not prefixed by any “DIAGNOSIS” wording. This was done primarily because it seemed easier to read through authentication attempts without the additional wording (not to mention that filtering on “DIAGNOSIS” will bring back these failures anyways).

As far as authentications that come back with a warning regarding the account, this is shown in the highlighted frames below (it is also shown in the screenshot above as well if you were looking closely):

image

There is one more status I briefly mentioned above, and that is if an authentication attempt is returned with the wrong password. Those returns will be prefixed by the words “WRONG PASSWORD” in order to make them stand out as seen below:

image

image

There is also one more change in this version of the Netlogon parser. There is some code to capture inconsistencies/unexpected syntaxes within the Netlogon log which occur with some Netlogon.dll binary versions. In previous versions of the Netlogon parser, these lines would be captured and compiled into an operational grouping titled "The lines grouped here are typically not useful for troubleshooting! Please expand grouping for details", which contained all the detected lines. This version of the parser expands significantly on that. Now the parser is coded to go through these lines to identify authentication attempts and to search for indication of any problems in those lines as well. These lines are now reported with a prefix of “LOG INCONSISTENCY”. If an “account warning” (or potential security risk) occurs, the line will be prefixed with “LOG INCONSISTENCY ACCOUNT WARNING”. If a non-account related error is identified, those lines will be prefixed with “LOG INCONSISTENCY DIAGNOSIS”. All of these lines will be moved to the top of the analysis grid due to a lack of a timestamp. So, if you are filtering, the same filtering methods discussed above will ALSO bring back these other lines as well.

Here is a screenshot to provide an example of what these lines look like:

image

Above, I also mentioned a change to the wording for Netlogon service startups. The syntax is still the same, only the wording has changed:

image

NOTE: The service startup lines are also where the NeverPing status is detected!

image

Rather than bore you with 9 million more screenshots of examples of the new functionality, I will provide a few screenshots that contain the new detection feature frames highlighted. Hopefully by this point in the blog, you have a decent understanding of how the format has changed, and what to look for. Later in the blog, we will also take a look at more filtering techniques.

In the below screenshot, you can clearly see that we found some log inconsistencies containing both account warnings and problems, as well as lines with expected syntax that contain problems and account warnings. If we look closer at this example, we can see that an unknown authentication service is attempting to authenticate a user, a disabled account is attempting to authenticate, that we have an RPC call failure, a failure to find a logon server, a failure to share SYSVOL, an account lockout, and a couple more RPC errors. Can you spot these issues?? Never mind the fact that the lines are highlighted; the wording is pretty straightforward as to what was identified.

image

The next few screenshots show both new and existing functionality with the new wording format for the summary:

image

image

image

Oh and before I forget, I did want to show you an example of the “false positive” detection due to the generic filters to look for “failed” or “error” (in this case, “failed”). I admit, it’s somewhat annoying at times, but is pretty easy to identify typically (I’ve only seen the false positive with service startup so far), and the benefit outweighs the false detection risk.

image

I think that about covers the new features ramp up....

Are you still with me? Asleep? Drooling from boredom yet?? Let’s just assume you’re still awake and have some interest shall we…!

Known issues

Although the parser is significantly improved, there are still a few known issues:

1. Message Analyzer performance

a. There are known issues with using Message Analyzer on single core virtual machines where the CPU can (and will) spike up to 99-100%.

b. Message Analyzer, when used with the Netlogon parser, can have a decent memory footprint. I recommend having at least 4GB of RAM, but as we all know, the more RAM the better!

2. Netlogon parser performance

a. In certain (rare) scenarios, Netlogon parser performance and functionality can be impacted if there are non-contiguous timestamps within the log file being reviewed. Put another way, if you have temporarily enabled Netlogon logging in the past, and then re-enable it later, you may impact performance and functionality due to the differing timestamps.

i. If you experience this situation, you can stop the Netlogon service, delete or rename Netlogon.log, then start the Netlogon service once again to start from scratch with your file (see the sketch at the end of this list). NOTE: For application servers and domain controllers, this will push authentication and requests over to other servers, so make sure you don’t do this during production hours!

3. Timestamps (only when used with Message Analyzer 1.1)

a. When using the Netlogon parser v3.5 with Message Analyzer 1.1, the timestamp/UTC time issue (which is corrected when the parser is used with Message Analyzer 1.2 and above) still exists. You still gain the additional functionality, however.

4. False positive “DIAGNOSIS” detections due to the generic queries

a. This is a known issue, but the benefit seems to outweigh the risk in ensuring that no stone is left unturned!
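For the log reset mentioned in item 2 above, a minimal sketch from an elevated PowerShell prompt looks like this (the log lives in %windir%\debug by default):

# Stop Netlogon, set the old log aside, then restart the service to start a fresh log
Stop-Service Netlogon
Rename-Item C:\Windows\debug\netlogon.log netlogon_old.log
Start-Service Netlogon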

Filtering your output

First things first…you need to know what fields are available so you know how to fine tune your filters. My goal here is not to show you every single way to filter, as there are various methods, but to get you started on how to do some simple and more advanced filtering with the Netlogon parser. The syntax in some cases can be changed to simplify even complex filters, but as I said, this is to get you started. As you become more familiar with Message Analyzer, or if you are already familiar with Message Analyzer, then you will learn the ins and outs of filtering even better.

The fields available for filtering are listed below. For each field, typical filter examples (EX) are shown, followed by an explanation.

Msgtype

EX: *msgtype == "CRITICAL"
EX: *msgtype contains "CRITICAL"

Contains the type of message being conveyed within the log file, such as CRITICAL, MAILSLOT, SESSION, LOGON, PERF, MISC, etc. This will be found in nearly all messages, but isn’t quite as useful for filtering given the other capabilities of the parser.

RemainingText

EX: *RemainingText contains "failed"

Can contain many variations of text, anything from a complete line of the log to random portions of text that aren’t very interesting for troubleshooting and problem analysis. This can be found in nearly all messages, but is rarely needed given the other capabilities of the parser.

domainName

EX: *domainname == "CONTOSO\"
EX: *domainName contains "CONTOSO"

Contains the domain name for the user attempting to authenticate. For null authentication attempts that do not contain a domain name, this variable will be unpopulated. This is useful as a filter to trend authentication failures when they are failing to a specific trusted domain.

NOTE: When filtering on the domainName value, you must include the backslash "\" character in the filter if using "==". This is not required if using "contains" in place of "==".

userName

EX: *userName == "User1"
EX: *userName contains "User1"

Contains the user attempting to authenticate. This is extremely useful to identify any trending patterns for specific users that are failing authentication.

originMachine

EX: *originMachine == "Win7Client22"
EX: *originMachine contains "Win7Client22"

Contains the name of the device the user is attempting to authenticate from (i.e., the source machine or device). Note that this value is not always provided (for example, some authentications from 3rd-party operating systems). This is useful to trend on a specific source device or machine.

relayMachine

EX: *relayMachine == "USEXCHCAS01"
EX: *relayMachine contains "USEXCHCAS01"

Contains the machine that is proxying the authentication on behalf of the user and source machine. This will typically be an application server (Exchange, IIS, SharePoint, SQL, etc.) or a domain controller in a trusted domain. This is useful to help trend authentication attempts from a specific application server in order to identify where bottlenecks may be occurring.

otherText

EX: *otherText contains "Package"

Contains additional text such as flags for authentication (ExFlags) or, in the case of non-NTLM authentication, the authentication package being used (i.e., Kerberos, Microsoft Unified Security Protocol Provider, etc.). This is useful for narrowing out non-NTLM authentication requests.

errorCode

EX: *errorCode == "0x0"
EX: *errorCode == "0xC000005E"
EX: *errorCode contains "C000005E"

Contains the error code returned for the authentication attempt. Use "0x0" to identify successful authentications, or use the specific error code to identify specific failed authentication attempts. While you can use this method, it is typically unnecessary due to the other filtering methods outlined below in this blog.

Summary

EX: *Summary contains "LOCKED OUT"
EX: *Summary contains "user1"
EX: *Summary contains "SOME-MACHINE"
EX: *Summary contains "failure"

This is a general filtering method provided by Message Analyzer. The Netlogon parser (all versions) exposes the information necessary for all of the fields mentioned above within the summary area. The takeaway regarding this filtering method is that you can use *Summary contains "<any string>" to find nearly all items of interest; however, it may return more information than you are looking for. An example of this would be a user and a machine with the same name; filtering on *Summary contains "samename" would bring back all items related to both the user and the machine named "samename", which may not be the desired output.

In this version of the parser, I have put notes directly into the parser to help you with some very basic filtering; but so you don’t have to actually go open the parser. Here is the breakdown for filters related to accounts and potential security risks:

*Summary contains "WARNING"
Finds: account issues (expired passwords, disabled account authentication attempts, invalid username, etc.)

*Summary contains "WRONG"
Finds: "wrong password" authentication attempts

*Summary contains "LOCKED"
Finds: account lockout events

The filters above, as mentioned, cover only very basic filtering. So if we look at an example of filtering on “WARNING”, this is what we get:

image

Or if we filter on “WRONG”, this is what we get:

image

Now, let’s look at filtering a little more deeply. Let’s say we want to filter out account warnings specific to Kerberos PAC validation or Schannel authentication, in which case the username would be “(null)”. We can do that by using a simple filter of: *Summary contains "WARNING" and *Summary contains "(null)"

Alternatively you can adjust that filter to: *Summary contains "WARNING" and *userName contains "(null)"

Here is an example:

image

And if we wanted to filter this down even further to look only at Kerberos PAC validation account warnings, we could use the filter *Summary contains "WARNING" and *Summary contains "(null)" and *Summary contains "Kerberos". Alternatively, you could use the filter *Summary contains "WARNING" and *userName contains "(null)" and *otherText contains "Kerberos" as seen in the below example:

image

Here is the breakdown for basic items related to potential problems identified by the parser:

*Summary contains "DIAGNOSIS"
Finds: all potential problems found

*Summary contains "failure"
Finds: authentication failures

*Summary contains "authentication" (can also use: *Summary contains "SamLogon:")
Finds: all authentication calls

*Summary contains "failed"
Finds: any line containing the term "failed" (general query)

*Summary contains "error"
Finds: any line containing the term "error" (general query)

Filtering the output for troubleshooting issues is basically the same as what we discussed above for the more basic scenarios. But let’s say we want to dig deeper and review multiple accounts…

Let’s say, for instance, we want to filter out account lockout issues (this could also be tied into the above topic, of course) for a user named “user1” or for any Kerberos or Schannel authentication attempts. In that case, I need to supply parentheses around parts of the filter, which results in a filter of:

*Summary contains "WRONG" and (*userName == "user1" or *userName == "(null)") or *Summary contains "LOCKED" and (*Summary contains "user1" or *Summary contains "(null)")

-or you could use-

*Summary contains "WRONG" and (*userName == "user1" or *userName == "(null)") or *Summary contains "LOCKED" and (*userName == "user1" or *userName == "(null)")

What does this filter do, you might ask? Well, it looks for the “WRONG” in “WRONG PASSWORD” or the “LOCKED” in the summary wording of “ACCOUNT LOCKED OUT”, and then, in the case of the first filter example, specifies that the words “user1” or “(null)” must also be included in the summary. The second example is a bit more refined. It still looks for “WRONG” or “LOCKED” in the summary field, but it then looks specifically at the userName variable to see if “user1” or “(null)” appears in that field.

Now, and this is a bit outside of the scope of this blog, keep in mind that in the case of an account lockout, the DC you are looking at may not be the source of the lockout, so it may not contain all (or any) of the wrong password attempts that actually led to the account lockout. In that case, you will need to determine the DC that the account was locked out from and review the Netlogon logs on that DC as well. Here is a simple example:

image

In this example, you can see 2 wrong password requests for a user named user1 in the child.domain.com domain, followed by an account lockout. If your account lockout threshold is, let’s say, 10, then wrong passwords must also have been submitted to other domain controllers, so there may be more hunting to do because this accounts for only 2 of the 10 bad attempts.
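
One way to start that hunt, sketched here purely as an example under stated assumptions (it uses the ActiveDirectory PowerShell module and the example account “user1”, neither of which this blog requires), is to ask every DC for its local bad password counters, since badPwdCount is not replicated:

#A minimal sketch, assuming the ActiveDirectory module is installed and "user1" is the account in question
#badPwdCount and LastBadPasswordAttempt are tracked per DC, so each DC is queried individually
Import-Module ActiveDirectory

Get-ADDomainController -Filter * | ForEach-Object {
    $dc = $_.HostName
    $u  = Get-ADUser -Identity user1 -Server $dc -Properties badPwdCount, LastBadPasswordAttempt, LockedOut
    [pscustomobject]@{
        DC                     = $dc
        BadPwdCount            = $u.badPwdCount
        LastBadPasswordAttempt = $u.LastBadPasswordAttempt
        LockedOut              = $u.LockedOut
    }
} | Sort-Object BadPwdCount -Descending | Format-Table -AutoSize

The DCs showing the highest counts are the ones whose Netlogon logs are most likely to hold the remaining wrong password attempts.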

The bottom line:

For simple filtering where you know the values you are looking for, you can use a simple filter with “and” or “or” separating the terms. But if there are multiple constraints, such as searching for several possible strings in the summary and then narrowing that view to a specific machine/device (or more than one) or user, you have to “and” the filter, open a parenthesis, put in your additional constraints, then close the parenthesis.

Looking at the previous example provided you can see this syntax:

*Summary contains "WRONG" and (*userName == "user1" or *userName == "(null)") or *Summary contains "LOCKED" and (*userName == "user1" or *userName == "(null)")

Notice in the example there are constraints to look for any summary containing “WRONG”, with the additional constraint that the line also must contain the userName of “user1” or “(null)”, which is then followed by the search for the phrase “LOCKED” with the same userName constraints. You have to be specific!

Now, let’s look at the same example, but this time let’s say I don’t want to include the “(null)” user account, but I do want to see all other locked out accounts. For this, we use a syntax like this:

*Summary contains "WRONG" and (not *userName == "(null)") or *Summary contains "LOCKED" and (not *userName == "(null)")

-or-

*Summary contains "WRONG" and (not *userName contains "(null)") or *Summary contains "LOCKED" and (not *userName contains "(null)")

-or-

*Summary contains "WRONG" and (not *Summary contains "(null)") or *Summary contains "LOCKED" and (not *Summary contains "(null)")

A lot of options there, right! The result of this type of query, using the same sample file as we’ve been using, is this:

image

Notice in this example how we get only the authentication attempts for “user1”, BUT we also get back other lines for wrong passwords being submitted and account lockouts. That happens because those summaries contain the same keywords we are searching the summary for. It’s a bit of noise, but you could reduce that noise a bit by altering the filter to something like this:

*Summary contains "WRONG" and (not *userName == "(null)") and (*Summary contains "failure") or *Summary contains "LOCKED" and (not *userName == "(null)") and (*Summary contains "failure")

NOTE: You can also use other variations as discussed above!

This method results in output such as this:

image

 

How to update the Netlogon parser manually to v3.5

If you are using Message Analyzer 1.1, 1.2, or 1.3, but still want to take advantage of the new features introduced in the Netlogon parser v3.5, then you can follow the 4 steps below to implement the updated parser. Please keep in mind that the Netlogon parser v3.5 is written for Message Analyzer 1.3 and beyond, so there may be bugs that were not identified in testing and are not covered in the known issues list above!

NOTE: No version of the Netlogon parser will function on any Message Analyzer version earlier than Message Analyzer 1.1. It is highly suggested that you find a way around your deployment blocker so that you can upgrade to Message Analyzer 1.3.1 as soon as possible!

With that being said, here’s how you manually update the parser:

1. If Message Analyzer is running, please shut it down and ensure the process is no longer listed in Task Manager

2. Download the Netlogon-35-config.zip file in this blog (this zip file contains v3.5 of the Netlogon parser, as well as the “Netlogon Analysis” grid view files)

a. The files within Netlogon-35-config.zip are: Netlogon.config, Netlogon-Analysis-View.asset, and Netlogon-Analysis-View.metadata

3. Unzip Netlogon-35-config.zip to a location of your choosing

4. Copy the Netlogon.config file that you unzipped into:

a. If using Message Analyzer 1.2 or 1.3: %userprofile%\AppData\Local\Microsoft\MessageAnalyzer\OPNAndConfiguration\TextLogConfiguration\DevicesAndLogs (when prompted to overwrite the file, select the option to replace the file in the destination)

b. If using Message Analyzer 1.1: %userprofile%\AppData\Local\Microsoft\MessageAnalyzer\OPNAndConfiguration\TextLogConfiguration\AdditionalTextLogConfigurations (when prompted to overwrite the file, select the option to replace the file in the destination)

After following the above 4 steps, the Netlogon parser v3.5 should now be implemented and available for use once you reopen Message Analyzer.
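
If you’d rather script those copy steps, here is a minimal sketch for Message Analyzer 1.2/1.3; the zip location is only an example, and it assumes PowerShell 5.0 or later (for Expand-Archive) and that Message Analyzer is already closed (step 1):

#A minimal sketch of steps 2-4 above for Message Analyzer 1.2/1.3 (adjust the zip path to wherever you saved the download)
$zip  = "$env:USERPROFILE\Downloads\Netlogon-35-config.zip"
$temp = Join-Path $env:TEMP 'Netlogon-35-config'
$dest = Join-Path $env:LOCALAPPDATA 'Microsoft\MessageAnalyzer\OPNAndConfiguration\TextLogConfiguration\DevicesAndLogs'

Expand-Archive -Path $zip -DestinationPath $temp -Force
Copy-Item -Path (Join-Path $temp 'Netlogon.config') -Destination $dest -Force

For Message Analyzer 1.1, point $dest at the AdditionalTextLogConfigurations folder mentioned in step 4b instead.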

How to add the new “Netlogon Analysis” grid view

As an added bonus to the new parser, an analysis grid view with a more refined layout specific to the needs of analyzing Netlogon logs is available to download in this blog. This analysis grid contains the message number, the diagnosis (i.e., diagnosis types), time elapsed, summary, trace source (the name of the source Netlogon log), and the trace source path (the path to the source Netlogon log).

Here is how you manually import the Netlogon Analysis grid view (assumes you have already imported v3.5 of the Netlogon parser and are running Message Analyzer 1.3 or 1.3.1):

1. Open Message Analyzer

2. Open a Netlogon log to begin a new session (you can drag and drop the file in or open it using the File menu or shortcuts on the start page); select the Netlogon parser and click Start

3. Above the analysis grid, click the Layout dropdown (or use the Session|Analysis Grid|Layout option)

4. Select Manage Layouts, then click Manage…

image

5. In the Manage View Layout screen, select Import

image

6. Browse to the path where you unzipped Netlogon-35-config.zip to find the file Netlogon-Analysis-View.asset, then select it and click Open, or simply double click the .asset file

7. In the Select Items to Import screen, just click the OK button

image

8. You should now be back in the Manage View Layout window; if you scroll down, you should see a new Netlogon category, with the Netlogon Analysis grid view listed

image

9. Click the OK button

10. You should now be able to select the Netlogon Analysis grid view. You can select this view from the “Layout” selection available above the analysis grid when opening a log file (1st screenshot) or from the Session | Analysis Grid | Layout menu (2nd screenshot).

image

image

 

Reference links

Message Analyzer v1.3.1 download (highly recommended!)

http://www.microsoft.com/en-us/download/details.aspx?id=44226

New Features in the Netlogon Parser (v1.1.4) for Message Analyzer

http://blogs.technet.com/b/askpfeplat/archive/2015/01/19/new-features-in-the-netlogon-parser-v1-1-4-for-message-analyzer.aspx

Introducing the Netlogon Parser (v1.0.1) for Message Analyzer 1.1 (By: Brandon Wilson)

http://blogs.technet.com/b/askpfeplat/archive/2014/10/06/introducing-the-netlogon-parser-v1.0.1-for-message-analyzer-1.1.aspx

Troubleshooting Basics for the Netlogon Parser (v1.0.1) for Message Analyzer (By: Brandon Wilson)

http://blogs.technet.com/b/askpfeplat/archive/2014/11/10/troubleshooting-basics-for-the-netlogon-parser-v1-0-1-for-message-analyzer.aspx

Quick Reference: Troubleshooting Netlogon Error Codes (By: Brandon Wilson)

http://blogs.technet.com/b/askpfeplat/archive/2013/01/28/quick-reference-troubleshooting-netlogon-error-codes.aspx

Quick Reference: Troubleshooting, Diagnosing, and Tuning MaxConcurrentApi Issues (By: Brandon Wilson)

http://blogs.technet.com/b/askpfeplat/archive/2014/01/13/quick-reference-troubleshooting-diagnosing-and-tuning-maxconcurrentapi-issues.aspx

Message Analyzer Forum

http://social.technet.microsoft.com/Forums/en-US/home?forum=messageanalyzer

Message Analyzer blog site

http://blogs.technet.com/MessageAnalyzer

Memory usage with Message Analyzer

http://blogs.technet.com/b/messageanalyzer/archive/2015/07/06/memory-usage-with-message-analyzer.aspx

Just to recap: please send us any suggestions or problems you identify through the comments below, the Message Analyzer forum, via email to MANetlogon@microsoft.com, or using the integrated feedback button in Message Analyzer as seen below (circled in green at the top right)!

image

image

 

Thanks, and talk to you folks next time!

-Brandon Wilson

Leveraging Windows Native Functionality to Capture Network Traces Remotely


 

Happy Monday everybody! Victor Zapata here again to share some network tracing and PowerShell love.

Like many of you, I’m frequently called on to troubleshoot issues that involve network connectivity of some sort. These issues can range from poor network performance to complete network failure… and almost always require collecting network traces on two or more computers simultaneously.

I still remember the days when collecting simultaneous network traces required installing network sniffer software on multiple computers, along with the seemingly impossible task of coordinating various resources to start and stop the data collections at the same time. Oh boy… what a hassle!

Well, we can happily put those sad days in our rear-view mirror! Beginning in Windows 7/Windows Server 2008 R2, installing network sniffer software is no longer necessary: Netsh functionality was extended and is now capable of leveraging Event Tracing for Windows (ETW) to collect network trace data. If you’re not familiar with this functionality, check out Netsh Commands for Network Trace in Windows Server 2008 R2 and Windows 7 on TechNet.

 

For this implementation, we will be using the following two commands to start and stop network tracing on Windows Servers & Clients.

To start network tracing: NETSH TRACE START CAPTURE=YES TRACEFILE=<PATH>\<FILENAME.etl>

To stop network tracing: NETSH TRACE STOP

NOTE: There are some scenarios where network connectivity is dropped briefly when starting a network trace session using Netsh.
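
If you are worried about trace files growing unchecked during a long repro, netsh trace start also accepts size and file-mode options. As an illustrative (not prescriptive) example with made-up values, the following caps the trace file at 512 MB and wraps it in a circular buffer:

NETSH TRACE START CAPTURE=YES TRACEFILE=C:\TEMP\TRACES\EXAMPLE.etl MAXSIZE=512 FILEMODE=CIRCULAR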

 

Things get even better beginning with Windows Server 2012, because PowerShell remoting is enabled by default. Because of this, we can use PowerShell to start, stop, and collect network trace data programmatically, all from a central location, without having to do anything special.

Here are the prerequisites to get this working:

1. Windows Server 2008 R2 / Windows 7 or later

2. PowerShell Remoting must be enabled ("winrm quickconfig"); a quick connectivity check is sketched after this list

3. Administrator rights on all target computers

4. Network connectivity to each target computer (Firewall allows PS Remoting)
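
Before kicking off the collection, a minimal sketch like the following (computer names are just placeholders) can confirm that WinRM/PowerShell remoting answers on each target:

#A minimal sketch to confirm PowerShell remoting is reachable on each target (names are placeholders)
$computers = "COMPUTER1","COMPUTER2","COMPUTER3","COMPUTER4"

foreach ($computer in $computers) {
    if (Test-WSMan -ComputerName $computer -ErrorAction SilentlyContinue) {
        Write-Host "$computer : PowerShell remoting is reachable"
    }
    else {
        Write-Host "$computer : not reachable - enable remoting (winrm quickconfig / Enable-PSRemoting) and check the firewall"
    }
}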

 

The following is an example of how to script the network trace data collection. To keep things simple, no error handling is included in this example.

DISCLAIMER: The sample code described herein is provided on an "as is" basis, without warranty of any kind.

#************************************************************************************#
#                                                                                    #
# This sample PowerShell script performs the following tasks                         #
#                                                                                    #
# 1. defines output path & folder on each target computer                            #
# 2. creates the output path & folder on each target computer if not already present #
# 3. starts network tracing on each target computer                                  #
# 4. stops network tracing on each target computer when "x" key is pressed by user   #
# 5. copies network traces from each remote computer to local directory              #
#                                                                                    #
#************************************************************************************#

#START NETWORK TRACES ON REMOTE COMPUTERS

#Specify target computer names
$computers = "COMPUTER1","COMPUTER2","COMPUTER3","COMPUTER4"

#Drive letter on remote computers to create output folder under
$drive = "C"

#Folder path on remote computers to save output file to
$directory = "TEMP\TRACES"
$path = $drive + ":\" + $directory

#SCRIPTBLOCK TO BE EXECUTED ON EACH TARGET COMPUTER
$scriptBlockContent =
{
    param ($localoutputpath, $tracefullpath)

    #Verify that output path & folder exists. If not, create it.
    if (!(Test-Path $localoutputpath))
    {
        New-Item -Path $localoutputpath -ItemType directory
    }

    #Start network trace and output file to specified path
    netsh trace start capture=yes tracefile=$tracefullpath
}

#Loop to execute scriptblock on all remote computers
ForEach ($computer in $computers)
{
    $file = $computer + ".etl"
    $output = $path + "\" + $file
    Invoke-Command -ComputerName $computer -ScriptBlock $scriptBlockContent -ArgumentList $path, $output
}

#Loop to check for "X" key
While ($True)
{
    $Continue = Read-Host "Press 'X' To Stop the Tracing"
    If ($Continue.ToLower() -eq "x")
    {
        #STOP NETWORK TRACES ON REMOTE COMPUTERS
        #Run 'netsh trace stop' on each target computer
        ForEach ($computer in $computers)
        {
            Invoke-Command -ComputerName $computer -ScriptBlock {netsh trace stop}
        }

        #COLLECT TRACES
        #Copy network traces from each target computer to a folder on the local server
        ForEach ($computer in $computers)
        {
            $file = $computer + ".etl"
            $unc = "\\" + $computer + "\" + $drive + "$\" + $directory

            #Specify directory on local computer to copy all network traces to
            #NOTE: There is no check to verify that this folder exists.
            $localdirectory = "C:\TRACES"

            $tracefile = $unc + "\" + $file
            Copy-Item $tracefile $localdirectory
            Write-Host $file "copied successfully to" $localdirectory
        }
        break
    }
}

 

When the script runs, network traces are kicked off on all target computers using the computer’s name in the output file name. The tracing status should show as “Running” on each target.

clip_image002

NOTE: When collecting a series of traces (multiple traces in the same day) you may wish to ensure name uniqueness by adding something like the current time to the output file name. For example, the following will set the variable $TIMESTAMP with the current Month, Date, Year, Hours, Minutes and Seconds as a string value.

$timestamp = ((get-date).toString('MMddyyyyHHmmss'))

Then simply include $timestamp in the file name (for example: $file = $computer + $timestamp + ".etl").

 

Once the network tracing is started on all targets, simply reproduce the issue then press the “X” key followed by ENTER to stop tracing on all target computers.

clip_image004

 

Next, NETSH correlates the data collected and stops the trace sessions on all targets.

clip_image006

 

Lastly, all network traces are copied to the local directory defined in the script.

clip_image008

 

That’s it! Now the fun begins! Let’s open each one of these puppies and resolve this thing!

 

Victor Zapata

PSA: Great Series on Windows Performance
