The Legend of Heroes: Trails of Cold Steel PC Guest Blog #1 – Performance

When XSEED approached me about contributing to their in-progress The Legend of Heroes: Trails of Cold Steel port, I was immediately excited about the prospect. The Trails in the Sky series features some of my favourite JRPGs on PC, so I looked forward to making this later game in the franchise the best it can be on PC.

Of course, back at that point I didn’t quite anticipate just how involved I would get – I was expecting to do some optimization here and there, maybe amounting to a week or two of full-time work. Reality would turn out differently, and in this series of articles I’ll give you some idea of why.

The series is currently planned in 3 parts, leading up to the release of the game on August 2:

  • The first part, which you are reading right now, deals with performance aspects, primarily framerates and loading times.
  • The second part will describe the graphical enhancements and options available in the PC version, and how they came about.
  • Finally, the third part will go into some specific features of the PC port that aren’t direct graphical enhancements, and explain some of the challenges in implementing them.

The Beginning of the Performance Story

It seems appropriate for a story about a program to begin with loading, and the initial issue that I was consulted on was in fact loading times. In the PC version of the game at that point, even on a fast machine, loading would routinely take upwards of 20 seconds. And these were not some infrequent large loads, but rather loading which occurred e.g. every single time a battle started and ended. Additionally, significant loading stutters were present frequently throughout the game.

This was of course not an acceptable state of affairs. After a lengthy analysis, I figured out that the primary reason for both the stutters and the loading was that the game’s engine used Nvidia Cg (a – by now – very outdated and unsupported high-level shading language toolkit) to compile and load shaders at runtime. By caching and reusing shader compilation results, I was able to reduce loading times (after the initial load) to ~2-3 seconds, and also eliminate most stutter after a startup phase. Satisfied with the progress on the particular issue I was contacted about, I reported my findings and code.
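To make the caching idea more concrete, here is a minimal sketch of what such a cache could look like. It is not the code used in the port: the class name ShaderCache, the compile_fn callback and the choice to persist results to disk are all assumptions made for illustration. The core idea is just that the compiled result is stored under a key derived from the shader source and the target profile, so the expensive compile step only runs the first time a given shader is requested.

```python
import hashlib
from pathlib import Path


class ShaderCache:
    """Caches compiled shader binaries in memory and on disk (illustrative only)."""

    def __init__(self, cache_dir, compile_fn):
        # compile_fn is a hypothetical stand-in for the slow runtime compiler:
        # it takes (source, profile) and returns the compiled bytes.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.compile_fn = compile_fn
        self.memory = {}

    def _key(self, source, profile):
        # The key must cover everything that affects the compiled output.
        digest = hashlib.sha256()
        digest.update(profile.encode("utf-8"))
        digest.update(b"\0")
        digest.update(source.encode("utf-8"))
        return digest.hexdigest()

    def get(self, source, profile):
        key = self._key(source, profile)
        if key in self.memory:
            return self.memory[key]  # fastest path: already loaded this run
        path = self.cache_dir / (key + ".bin")
        if path.exists():
            binary = path.read_bytes()  # compiled during an earlier run
        else:
            binary = self.compile_fn(source, profile)  # slow path, taken once
            path.write_bytes(binary)
        self.memory[key] = binary
        return binary
```

With a cache like this, only the first request for a given shader pays the compilation cost, which matches the pattern of a slow initial load followed by much shorter ones.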

Standards

Some time later, I was tasked with polishing up the game for its eventual release. While I spent some time improving graphical aspects in the game’s Cg-based version which existed at that point, with more playtesting I grew increasingly dissatisfied with its performance.

As the game was originally released on PS3 and even Vita (though the PC version only uses PS3-level assets and effects or better), you would expect a fast desktop system to churn through it with incredible ease. However, at that point I had already discovered a specific scene and camera perspective in which my PC dropped down to 45 frames per second, completely CPU limited. Using a variety of profiling tools I discovered that the issue was primarily related to how the OpenGL/Cg rendering backend of the engine managed shader state, ending up with dozens of state setting calls for each individual draw call. By doing some of the more obvious optimizations and tweaks, I brought performance in my testing scene up to 55 FPS. At that point, I estimated that by fully optimizing the Cg-based renderer I might get the game up to around 80 FPS on my PC in that scene at best.
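One common way to cut down on per-draw state setting calls is to remember the last value applied for each piece of state and skip calls that would not change anything. The sketch below illustrates that general technique only; I am not claiming it is what was done in the engine, and the names StateCache and apply_fn are hypothetical.

```python
class StateCache:
    """Forwards a state change to the graphics API only if the value differs."""

    def __init__(self, apply_fn):
        # apply_fn is a hypothetical stand-in for the real (expensive) API call.
        self.apply_fn = apply_fn
        self.current = {}
        self.issued = 0   # calls that actually reached the API
        self.skipped = 0  # redundant calls that were filtered out

    def set(self, name, value):
        if name in self.current and self.current[name] == value:
            self.skipped += 1
            return
        self.apply_fn(name, value)
        self.current[name] = value
        self.issued += 1
```

When consecutive draw calls share most of their state, a filter like this turns dozens of calls per draw into a handful, at the cost of one dictionary lookup per request.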

That wasn’t going to be good enough.

I could never accept a port with my name on it for a game that ran on PS3 at 30 FPS which only gets up to 80 FPS on my fast 2015 desktop system. In fact, I’d personally like to run a game like this on a low-power portable like the GPD Win, and with that level of performance this wouldn’t be possible.

Changing Horses Midstream

The only true solution to the performance issue would be to completely replace the rendering backend. The underlying engine already had a DX11 backend, but unlike OpenGL/Cg it was clearly not used by Falcom during the development of Trails of Cold Steel, and the game and its assets used a very large array of features not available or not functioning in the same way in the DX11 backend. As such, switching to the different renderer was actually a larger change in some ways than all of the PC porting work that had been done up to this point.

To give you a better understanding of what this means, here is the first screenshot I took during development of the DX11 version – and note that this was already after fixing a number of issues that would prevent the game from even starting:

 

In this screenshot you can see over a dozen separate rendering issues, some of which required fundamental engine extensions and reworking to fix. However, they were still just a subset of a final tally of 57 separate classes of rendering problems (not individual instances) related to changing the rendering backend. There’s no way I can go into all of them, but here is a particularly amusing one I was tracking at a much later point during development – as you can tell by everything no longer being a horrible mess:

 

The Result

Regardless of all these issues and the effort required for porting to an entirely different rendering backend, it was all worth it in the end. The following chart gives you an idea of the (CPU) performance of the game on my PC at various stages in development:

 

The current state of the game, designated as “Optimized DX11 version”, is more in line with what you would expect from a good PS3 to PC port.

I’d like to note one important fact about this chart: please don’t quote it as some kind of argument for how much faster DX11 is compared to OpenGL – this result is a direct consequence of how these APIs are used in their respective rendering backends in the underlying engine. I assume that the GL/Cg version is designed more as a development aid to very closely resemble the console targets than for performance on PC.

Reaching almost 300 FPS on a high-end PC is nice, but ultimately rather pointless in a turn-based JRPG. What is more interesting and the real fruit of all this effort is performance on a really low-end system, such as the GPD Win portable. This video shows the game running on that device, and as you can see the mission of smooth gameplay on a portable at native 1280×720 resolution was accomplished. In terms of settings, this video uses the game’s “portable” settings, and what exactly that means – and also some ways in which the PC version will allow you to spend the massive performance headroom on a fast desktop PC – will be the topic of my next post about Trails of Cold Steel on PC.