There are many reports of this on TechNet but no solution. I wonder if there is a way to instrument the build to get more details on the reason for the errors and the session disconnects.
My environment:
site A:
- physical DC
- physical mgmt
- VMware (ESXi 5.0) server running multiple instances of Windows 2008 R2: Exchange (DAG with the Exchange running on VMware in site B), file server (DFS replication with file server running on VMware in site B), multiple Terminal Server instances (replicating using VEEAM to site B)
site B:
- physical DC
- VMware (ESXi 5.0) server running multiple instances of Windows 2008 R2: Exchange, file server, mgmt server, ...
Sites A and B are connected over the Internet (30 Mbps) through a VPN tunnel. Purpose: data replication for DR.
All hardware is enterprise scale with lots of resources available. The VMware server in site A is a dual-processor machine, 16 cores each, with 96 GB RAM, connected to an external array over dual SAS. Mgmt network: 2 NICs. VM access network: 4 NICs. All NIC teams go to a properly configured ProCurve switch (non-LACP trunks). All systems peak at less than 50% utilization. Firmware and drivers are all the latest. Replication between sites A and B has no issues at all except during a network disruption (rarely the case), and the notification system reports those immediately.
Clients access multiple Terminal Servers (Windows 2008 R2) running on the VMware server in site A. Each of the Terminal Servers logs 1 to 4 TermDD errors per day (at different times of the day on different servers) and drops client connections. In 99% of cases the binary data of event 56 is D00000B5 (The specified I/O operation on %hs was not completed before the time-out period expired.), but occasionally it is D00A0006 (A close operation is pending on the Terminal Connection.) or D000020D (The transport connection has been reset.). Clients run either XP or Windows 7, are on the latest mstsc.exe, and have no "Desktop Composition" etc. running. RDP is unencrypted and in its most basic configuration (Desktop Experience is enabled).
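For anyone comparing their own event 56 entries against the three codes above, here is a minimal Python sketch of pulling the status code out of the event's binary data. It rests on an assumption that is not confirmed anywhere in this post: that the code is stored as a little-endian DWORD at byte offset 0x28 of the data, so D00000B5 would appear in the dump as `B5 00 00 D0`. The function names are made up for illustration and the sample bytes are synthetic, not taken from a real log.

```python
import struct

# Status codes and messages exactly as quoted in the post.
KNOWN_STATUS = {
    0xD00000B5: "The specified I/O operation on %hs was not completed "
                "before the time-out period expired.",
    0xD00A0006: "A close operation is pending on the Terminal Connection.",
    0xD000020D: "The transport connection has been reset.",
}

# ASSUMPTION: byte offset of the status DWORD inside the event's binary data.
# Adjust this if your dump is laid out differently.
STATUS_OFFSET = 0x28


def parse_hex_bytes(text):
    """Turn a hex dump fragment such as 'B5 00 00 D0' into raw bytes."""
    return bytes.fromhex("".join(text.split()))


def status_from_event_data(data, offset=STATUS_OFFSET):
    """Read a little-endian 32-bit status code at the given offset."""
    if len(data) < offset + 4:
        raise ValueError("event data too short for the assumed offset")
    (code,) = struct.unpack_from("<I", data, offset)
    return code


def describe(code):
    """Format a status code with the message quoted in the post, if known."""
    message = KNOWN_STATUS.get(code, "unknown (not one of the codes listed in the post)")
    return "{:08X}: {}".format(code, message)


if __name__ == "__main__":
    # Synthetic sample: 0x28 zero bytes of padding, then the status DWORD.
    sample = bytes(STATUS_OFFSET) + parse_hex_bytes("B5 00 00 D0")
    print(describe(status_from_event_data(sample)))
```

Keeping the offset as a parameter means the same helper can be pointed at a different position if the assumed layout turns out to be wrong for a given build.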
This thing is ridiculous. The network appears to be tip-top (at the same moment that one client accessing TS1 is dropped, another accessing TS2 is happily working with no perf issues). We have tried many things, including disallowing drive redirection (to eliminate potential issues associated with slower USB devices etc.), moving certain TS virtual machines to dedicated VMware NICs, disabling replication between sites, ... Same thing. No changes. The standard MS solution of making sure the network is OK does not appear to be applicable here.
I wish there was a way to turn on more detailed debugging of what causes the TermDD disconnects, or maybe even to change the sensitivity of the disconnects: you think something is slow, OK, don't kill the connection, just keep waiting longer.
Ideas???
Thx much