Good morning, all. I've been trying to find a fix for this for the last 8 months with no success. I've read the questions asked by other users with the same issue but the answers haven't been helpful. I'll give a little background first.
We have several RDS server farms, all running fully patched Windows Server 2008 R2 Enterprise SP1. We use a two-factor authentication system (smartcards) for users to log into these farms. We have Symantec Endpoint Protection 11 as our enterprise AV solution (we'll be moving to the most recent version in the first quarter of next year). All of these servers are VMs, and most of them were cloned from the first two servers built for our two-factor authentication system.
Our first farm, which is the most heavily used, has the most problems because of this issue. The others have it as well, but not nearly as much. The other farms are clones of the first farm (yes, I went through all the motions of renewing the SIDs and such).
Here's the problem: sometimes within a few days, and other times after as much as a month, something will trigger a memory leak in the LSM.exe process on a single server. On a server that has been running for almost a week without the trigger, LSM.exe sits at 4-5 MB of memory and uses very little CPU. Once the leak is triggered, LSM.exe grows to 50 MB or more within a matter of hours and uses more and more CPU. Once it hits 80-90 MB, there is a noticeable degradation in performance. If it gets to 200 MB or more, the server becomes unresponsive and the CPU is pegged. The more people log in and reconnect to sessions, the faster it climbs. The only way I know to reset the LSM.exe process is to restart the server.
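To put numbers on the growth, here is a rough sketch of how the LSM.exe memory figure could be tracked over time. It assumes Python is available somewhere the output of `tasklist /FI "IMAGENAME eq lsm.exe" /FO CSV /NH` can be fed in. The function names and status labels are made up for illustration, and the thresholds are simply the ones I described above, not anything official.

```python
import csv
import io


def parse_tasklist_csv(output):
    """Return {pid: memory_kb} parsed from `tasklist /FO CSV /NH` output.

    Assumes the default column order: image name, PID, session name,
    session number, memory usage (for example "4,520 K").
    """
    usage = {}
    for row in csv.reader(io.StringIO(output)):
        if len(row) < 5:
            continue  # blank line or an "INFO: No tasks..." message
        # Keep only the digits so thousands separators and the "K" suffix drop out.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            usage[int(row[1])] = int(digits)
    return usage


def classify(memory_kb):
    """Map a memory reading to the stages seen on our servers (illustrative labels)."""
    memory_mb = memory_kb / 1024
    if memory_mb >= 200:
        return "unresponsive"
    if memory_mb >= 80:
        return "degraded"
    if memory_mb >= 50:
        return "leaking"
    return "normal"


if __name__ == "__main__":
    # Illustrative sample line, not captured from a real server.
    sample = '"lsm.exe","504","Services","0","4,520 K"\r\n'
    for pid, kb in parse_tasklist_csv(sample).items():
        print(pid, kb, classify(kb))
```

Run on a schedule and logged with a timestamp, something like this would at least show when the climb starts, which might help line it up with whatever triggers it.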
I read a few months ago that Terminal Services Manager in Windows XP or 2003 can trigger this memory leak. I sent a notice to our network group asking them to use Remote Desktop Services Manager in Windows 7 or 2008 to manage these servers. I've seen a decline in the issue since then, but it still occurs. It doesn't happen to all servers, only one or two at a time, and never an entire farm. As I said, it happens mostly on our most heavily used farm, which was used to clone the others. The others haven't really had the issue because of the lighter traffic on them.
Is there a patch that I'm missing? Is there a way to troubleshoot it further? Is there a way to create a dump file for the process like "userdump.exe" from XP and 2003? What am I missing? I'm attaching the specs of one of our servers to this question. The other servers have the same specs.
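As a sketch of the kind of dump capture I have in mind, assuming the Sysinternals ProcDump tool is installed and on the PATH (I have not confirmed it behaves well against LSM.exe on these servers): the helper below only builds the command line, and its name, the dump path, and the 50 MB threshold are placeholders. Note that the `-m` threshold applies to commit size, which may not match the figure shown in Task Manager.

```python
import subprocess


def build_procdump_command(process_name, dump_path, commit_mb=None):
    """Build a ProcDump command line for a full memory dump (hypothetical helper).

    -accepteula  suppresses the license prompt on first run
    -ma          writes a full dump including all process memory
    -m <MB>      waits until memory commit reaches the given size in MB
    """
    command = ["procdump", "-accepteula", "-ma"]
    if commit_mb is not None:
        command += ["-m", str(commit_mb)]
    command += [process_name, dump_path]
    return command


if __name__ == "__main__":
    # Placeholder path and threshold; needs an elevated prompt on the server.
    cmd = build_procdump_command("lsm.exe", r"C:\dumps\lsm.dmp", commit_mb=50)
    print(" ".join(cmd))
    # Uncomment to actually run it:
    # subprocess.run(cmd, check=True)
```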