Doc generation performance

Topics: General Questions, Sandcastle
Nov 1, 2007 at 9:50 PM
I am evaluating several document generation options for my company, and I need clarification on what I perceive to be a serious performance issue with DocProject and/or Sandcastle.

When I generate docs using Doc-O-Matic, the process is taking approximately 15 minutes (sometimes less) to complete. This results in approximately 12,000 topic files. When I generate docs with DocProject, it is taking hours (3+) to complete, with roughly the same number of topic files.

To be fair to DocProject and Sandcastle, I want to make sure there is nothing that I am doing wrong to cause the performance problems.

Should it be taking this long to generate this number of topic files?
Coordinator
Nov 2, 2007 at 3:32 AM
Hi,

15 minutes is certainly impressive :)

You're probably not doing anything wrong though. It seems that Sandcastle is just slow compared to some of the other tools out there. See the following threads for more information:

June CTP - Performance
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1794856&SiteID=1

Performance on large assemblies
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1281744&SiteID=1

- Dave
Nov 7, 2007 at 3:47 PM
Hello

I think this performance problem is caused by the generic generation of the HTML files.
I'm documenting only about 6 assemblies from our software (filtered with API Topic Management). If I do not filter with API Topic Management, the build process just knocks me out (the topic management is slow, OK, but once you have excluded a topic there, it no longer appears in the build process and no HTML file is generated for it).

If you just want to document all the assemblies in your DocProject, there is only one way to get better performance.
Don't save your DocProject on a hard drive; use a solid-state disk, a RAM drive, USB sticks with SLC NAND chips and more than one controller, or a RAID controller with a block size matched to your needs.

In my case the HTML files are no bigger than 32 kilobytes, so it would be fine to initialize a RAID system with a block size of 32 KB and save your project there (it's still a mechanical drive, though, and file "create, fill, save" operations are still expensive on it).
So for best performance, use a drive technology where it is not very expensive to create hundreds of thousands of very small files.

Or a RAM disk:
http://www.cenatek.com/ -> sorry for this ad, I do not earn money for this, but this is just a really good RAM disk tool (the only one I know that really works properly).
I think more speed is not possible (we sometimes use these RAM disks in our ASP.NET 2 environment).

And you can later copy the .chm file and the needed HTML files to a disk.
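A rough way to compare drives for this kind of workload is to time the creation of many small files on each of them. The following is only a sketch in Python, not part of DocProject or Sandcastle; the function name, the file naming pattern, the file count and the 32 KB size are assumptions chosen to resemble the topic files described above. It does not force writes to the physical disk, so operating system caching may hide part of the difference between drives.

```python
import os
import tempfile
import time


def time_small_file_creation(directory, file_count=1000, file_size=32 * 1024):
    """Create file_count files of file_size bytes in directory.

    Returns the elapsed wall-clock time in seconds.
    """
    payload = b"x" * file_size
    start = time.perf_counter()
    for index in range(file_count):
        # Hypothetical naming scheme; real topic file names will differ.
        path = os.path.join(directory, f"topic_{index:06d}.htm")
        with open(path, "wb") as handle:
            handle.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass dir="..." to TemporaryDirectory to test a specific drive
    # (hard drive, solid-state disk, RAM disk).
    with tempfile.TemporaryDirectory() as scratch:
        elapsed = time_small_file_creation(scratch)
        print(f"1000 files of 32 KB: {elapsed:.2f} s")
```

Running it once per candidate drive gives a like-for-like number, although the real build also reads and transforms data, so the result is only an indication.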

I hope this helps. (I think there is no other way to get better speed.)

Greetings Cis
Coordinator
Nov 7, 2007 at 8:33 PM
Hi Cis,


I think this performance problem is caused by the generic generation of the HTML files.

Yes.


I'm documenting only about 6 assemblies from our software (filtered with API Topic Management). If I do not filter with API Topic Management, the build process just knocks me out (the topic management is slow, OK, but once you have excluded a topic there, it no longer appears in the build process and no HTML file is generated for it).

The new dynamic filters feature in DocProject 1.9.0 will help because you can edit the filters in the Help\Settings\dynamicFilters.xml file without having to open the dialog. The file isn't present until you save a filter in the dialog, but you can create it manually instead. See my blog for more info.

Thanks for your suggestions about improving hardware :)

- Dave
Nov 8, 2007 at 3:53 PM
Hello Dave

I've noticed a strange performance improvement in step 8 of the build process (help file version 1).
First, don't laugh at me :-)

When build step 8 begins, there is some initialization, and after that it begins to build the topics;
I mean these output statements:
"Info: Building topic AllMembers.T:PROCOS.Components.UITools.Shared.ProcosPivot.Script.ScriptDimension
Warn: Invalid referenceLink element.
...
...."

When this building begins and I open the Visual Studio 2005 Tools / Options menu and select the DocProject entry, there is a massive build speed improvement. It's enough to just open this window!?

When the builder finishes step 8 and I still have the window open, the builder "stops" at the end of this process.
If I close the options window, it continues with step 9. I think this options window stops the builder from doing some reparsing or whatever, and this speeds the process up?

I don't stay under drugs, (a little bit coffein) but nothing other ;-).
Do you have the same speedups?

I'm using Visual Studio 2005 SP1, with DocProject 1.7, I think (if you can tell me where to find the version, I will tell you exactly which one I have).

Greetings

Confused Cis
Coordinator
Nov 8, 2007 at 9:43 PM
Hi Cis,

Sorry, but that is funny :)

I haven't tried it myself, but I think the reason for the speed-up is that Build Assembler is being executed on a background thread, and when you open a modal dialog Visual Studio no longer has to process Windows messages for the main UI, which gives the background thread more processor time. When Build Assembler is finished, control is returned to the main UI thread, which is currently blocked by the dialog, so it waits for you to close it before starting the next build step, #9.

I don't think it has anything to do with what version of DocProject you are using, and I don't think it's a bug. But if you find that it improves performance on your system then you should try building outside of VS using the DocProject External UI. I'd like to know what the difference is in performance on your system between building within VS and outside.

- Dave
Nov 9, 2007 at 8:18 AM
Hello

I've checked the behavior with the standalone docproject.exe and there is exactly the same speedup (I just click view/build items....) when I keep the window open until the builder finishes step 8.

Greetings
Coordinator
Nov 9, 2007 at 1:25 PM
Hi Cis,

Well that still makes sense because Build Assembler is still being executed in the background and showing a modal dialog on the UI thread is still probably giving it more processor time.

But I thought that because DocProject's UI has less stuff than VS, it may be faster even without the open-dialog trick :)

I suspect on multi-processor systems that opening the dialog may not have any effect, or at least not much at all. Do you have only a single processor?

- Dave
Coordinator
Nov 11, 2007 at 6:01 PM
Hi Cis,

I was able to reproduce this behavior - thanks for bringing it to my attention :)

I'm not sure at this point whether I'll be able to take advantage of the speed-up programmatically, but I'm looking into it.

- Dave
Coordinator
Nov 11, 2007 at 6:02 PM
This discussion has been copied to a work item.
Nov 13, 2007 at 8:35 AM
Edited Nov 13, 2007 at 12:22 PM
Hello Dave

@OffTopic
I've checked the issue tracker; could you please remove this statement: "I don't stay under drugs, (a little bit coffein) but nothing other ;-)."
This was only a joke, and I think it should not be in the public issue tracker ;-)

It would be nice if the following entry could be removed too (it is company-specific):
"T:PROCOS.Components.UITools.Shared.ProcosPivot.Script.ScriptDimension"

PROCOS is our company...

Thanks

Greetings
Coordinator
Nov 13, 2007 at 2:34 PM
Ok, it's fixed :)
Coordinator
Nov 30, 2007 at 2:28 PM
Hi everyone,

To be perfectly honest, performance has not been a concern of mine since I've been focusing on process and integration. My plan was to establish a feature set that I was satisfied with and then work on performance, but it seems (now that I look at it) that DocProject's performance when building in Visual Studio or in the DocProject External UI is pretty terrible compared to building on the command-line.

I just ran a couple of tests, building a DocProject using MSBuild on the command-line; I was able to generate over 11000 topics in about 11 minutes (without code comments). I think that's impressive. My apologies to the Sandcastle team for placing most of the blame on their product when in fact it seems to be an issue with my code.

My recommendation to DocProject users is to build your DocProjects and DocSites on the command-line if performance is really an issue for you. For example:
msbuild.exe "my_docsite.csproj"
Inside VS or the External UI you can use Cis's workaround instead (see above), for now :)
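To compare the command-line build with a build inside VS on your own project, it helps to measure the wall-clock time of the command. Below is a minimal sketch in Python, assuming only that the build can be started as an external command; time_command is a hypothetical helper, not part of DocProject or MSBuild.

```python
import subprocess
import time


def time_command(args):
    """Run a command, wait for it to finish, and return (exit_code, seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(args)
    return completed.returncode, time.perf_counter() - start


# Example (assumes msbuild.exe is on PATH and the project file exists):
# exit_code, seconds = time_command(["msbuild.exe", "my_docsite.csproj"])
# print(f"exit code {exit_code}, {seconds / 60:.1f} minutes")
```

The same helper can time any other command, which makes it easy to repeat the measurement after changing settings or hardware.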

Performance is still not my number one priority (conceptual topics are), but I will try to address this for 1.10.0, or 1.11.0 at the latest. If I had to guess, I'd say the problem is probably related to the use of Application.DoEvents and thread context switches. Remember, Build Assembler is being executed on a background thread and in its own AppDomain, and calls are being marshaled cross-AppDomain to the UI thread to write messages to the VS output window. My first attempt will be to redesign this to see if performance can be improved.
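To illustrate why per-message marshaling to a UI thread can be costly, here is a small Python sketch. It is not DocProject's code and does not use AppDomains; run_build, the message text and the batching scheme are all assumptions made for illustration. A worker hands messages to a "UI" thread and waits for each handoff to complete, and the function counts how many of those synchronous round trips occur with and without batching.

```python
import queue
import threading


def run_build(topic_count, batch_size):
    """Simulate a worker that reports progress to a 'UI' thread.

    Returns (messages_received, handoffs), where a handoff is one
    synchronous round trip from the worker to the UI thread.
    """
    inbox = queue.Queue()
    received = []

    def ui_thread():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: the build is finished
                break
            batch, done = item
            received.extend(batch)
            done.set()  # unblock the waiting worker

    ui = threading.Thread(target=ui_thread)
    ui.start()

    handoffs = 0
    pending = []

    def flush():
        nonlocal handoffs
        if not pending:
            return
        done = threading.Event()
        inbox.put((list(pending), done))
        done.wait()  # the worker stalls until the UI thread has processed the batch
        pending.clear()
        handoffs += 1

    for index in range(topic_count):
        pending.append(f"Info: Building topic {index}")
        if len(pending) >= batch_size:
            flush()
    flush()  # send any remaining messages
    inbox.put(None)
    ui.join()
    return len(received), handoffs
```

With batch_size=1 every message costs one blocking round trip; with a larger batch size the same messages arrive with far fewer context switches. Whether a redesign along these lines applies to DocProject is an open question here.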

Thanks for the feedback,
Dave
Coordinator
Jan 9, 2008 at 1:39 PM
Hi everyone,

I just wanted to post an update about the performance issues. Here's an example from a test build that I just ran in the 1.10.0 RC:

Step 7 of 9: Build Assembler {sandcastle.help1x.config}
 
Preparing...
Executing...
 
Topics processed: 12165
 
Step 7 Time Elapsed: 00:09:45.7160000
That's under 10 minutes for 12,165 topics, although XML documentation wasn't included in this test. That is a performance gain of roughly 8 times compared to the last version of DocProject.
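For reference, the throughput behind those numbers can be worked out from the build output above. This is a small Python calculation using only the topic count and elapsed time shown in the log; the previous-version figure at the end is merely what "roughly 8 times" would imply, not a measured value.

```python
from datetime import timedelta

topics = 12165
elapsed = timedelta(minutes=9, seconds=45, milliseconds=716)

seconds = elapsed.total_seconds()
topics_per_second = topics / seconds
ms_per_topic = 1000 * seconds / topics

print(f"{topics_per_second:.1f} topics/s, {ms_per_topic:.1f} ms per topic")
# prints "20.8 topics/s, 48.1 ms per topic"

# Implied by "roughly 8 times"; an estimate, not a measurement.
implied_previous = elapsed * 8
print(f"implied previous build time: about {implied_previous.total_seconds() / 60:.0f} minutes")
```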

As you can see, no information is traced to the build output window, and warnings are no longer added to Visual Studio's Error List by default. But there's a new project option named Build assembler options that you can use to adjust these settings, which affect the performance, behavior and diagnostic output of the Build Assembler step. By default, only error information is traced and the step cannot be canceled; i.e., the UI is not responsive while the step is executing. However, there is a huge performance gain, as you can see. This new default behavior may also have an effect on VS Express and command-line builds, though I haven't run any performance tests to find out.

It's worth noting that even with all of the features enabled (i.e., when the Build Assembler step functions as in previous releases) there is still a performance gain of up to 5 times in my tests, due to a bug fix that you can apply yourself in 1.9.0 RC if you feel like rebuilding the source code. Just change the 200 to 0 in the BuildStepExecutionEngine class's ExecuteStepAsync method. Doing this allows writes to the UI thread to occur as fast as possible, instead of at 200-millisecond intervals.

For help building DocProject, see How To Use The Source Code.

- Dave