Querying AutoCAD objects of a particular type using .NET

Late last week I received an interesting email from Bruno Saboia, who's been experiencing some performance issues with code he'd written to return all the objects of a particular type – in his case, Lines – from the model-space of the active drawing.

We exchanged a few emails on the topic, and Bruno kindly allowed me to post his code and my suggestions for changes. In today's post, we're going to look at both the code and my suggestions, while in the next post (in this mini-series, at least) we'll look at a simple execution framework that allows you to record crude performance timings of different command implementations.

Here's Bruno's original code, for us to look at in detail.

 1 public static IEnumerable<Line> GetAllLines(Database db)
 2 {
 3   using (
 4     var docLock =
 5       Application.DocumentManager.MdiActiveDocument.LockDocument()
 6     )
 7   {
 8     using (var tr = db.TransactionManager.StartTransaction())
 9     {
10       var bt =
11         tr.GetObject(db.BlockTableId, OpenMode.ForRead)
12         as BlockTable;
13       var btr =
14         tr.GetObject(
15           bt[BlockTableRecord.ModelSpace],
16           OpenMode.ForRead
17         ) as BlockTableRecord;
18
19       foreach (var obj in btr)
20       {
21         var line =
22           tr.GetObject(obj, OpenMode.ForRead) as Line;
23
24         if (line != null)
25         {
26           yield return line;
27         }
28       }
29     }
30   }
31 }

The first thing to notice is that the function returns an IEnumerable. This is a "lazy" type – it simply exposes an enumerator back to the calling function that can then be used to step through the result set – and this may have an impact on the logic of the code.

Skipping ahead, you'll see on line 26 that we "yield" the results to the calling function. What actually happens is that the calling function, as it steps through the results, essentially executes the loop one iteration at a time. The reason IEnumerable is lazy – and in many situations therefore more efficient – is that we're executing "when needed" rather than "in case it's needed". This is a very good strategy when passing back what might be a very long – perhaps even infinite – set of data, such as paging through search results. It's also a very good way to spread the load of performing complex calculations throughout the execution of a loop. If the calling function is actually going to enumerate the whole list – which is likely in this case – then it may prove a better strategy to return a List<>. This thread provides some useful background information on this.
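To see this laziness in action outside of AutoCAD, here's a minimal, self-contained sketch (the names are hypothetical and nothing here is AutoCAD-specific): a yield-based method does no work up front, and only as much work as the caller actually pulls from it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyDemo
{
  static int produced = 0;

  // Lazily yields squares; work happens only as items are requested
  static IEnumerable<int> Squares(int max)
  {
    for (int i = 1; i <= max; i++)
    {
      produced++;          // track how many items we've actually computed
      yield return i * i;
    }
  }

  static void Main()
  {
    var seq = Squares(1000000);   // no work done yet: just an iterator
    Console.WriteLine(produced);  // 0

    // Pulling just the first three items executes only three iterations
    var first3 = seq.Take(3).ToList();
    Console.WriteLine(produced);  // 3
  }
}
```

Only three of the potential million iterations ever run, which is the "when needed" rather than "in case it's needed" behaviour described above.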

In this case, the code is returning an IEnumerable of Lines. I strongly recommend not doing this: the Lines have been opened by the current transaction, and so will become invalid as soon as the transaction completes. Because we're actually getting the results lazily – and therefore the calling function is getting the results one at a time, and so any function you call right afterwards will still have the transaction in scope – then this should be OK in this specific context. But code written in this way ends up being much less composable and reusable: it's far safer to have the function return a set of ObjectIds that can be opened and queried for information when needed.

There are different strategies for this, too: you might pass a Transaction object into the function, to make sure it's clearly in scope, or you might pass in a lambda to evaluate and return the information you need to collect for each Line. In this case I'm going to go with returning an IEnumerable<ObjectId> and leave it to the calling function to (re)open the objects as it needs to.
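As a rough illustration of the lambda-based strategy, here's a hypothetical, AutoCAD-free sketch in which a disposable Session stands in for the transaction: the selector runs while the resource is still open, so only plain data (never objects tied to the resource's lifetime) escapes its scope:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a transaction-scoped resource: its items are only
// meaningful while the session is open
class Session : IDisposable
{
  public bool IsOpen { get; private set; } = true;
  public IEnumerable<string> Items => new[] { "Line1", "Line2", "Line3" };
  public void Dispose() => IsOpen = false;
}

static class Query
{
  // Projects each item to a plain value *inside* the session's scope,
  // so the results remain valid after the session is disposed
  public static List<R> Collect<R>(Func<string, R> selector)
  {
    using (var session = new Session())
    {
      return session.Items.Select(selector).ToList(); // forces evaluation now
    }
  }
}

class Program
{
  static void Main()
  {
    var lengths = Query.Collect(name => name.Length);
    Console.WriteLine(string.Join(",", lengths)); // 5,5,5
  }
}
```

The caller never sees the Session itself, only the values its selector extracted, which is what makes this pattern composable.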

Right then – onwards to lines 3-6: I would tend to avoid explicit locking of a document in a lower-level function. This is done automatically for the active document in all commands that are not registered as "session context" commands (i.e. any "document context" command has an implicit lock on the active document). I would assume that any session context command that calls this function would have locked the document manually beforehand. To be clear: this is not going to cause a measurable performance impact, but it's cleaner to remove it.

On line 8: we're using a traditional Transaction (started via StartTransaction()) to get the information we need for the contents of the model-space. Standard transactions provide some very helpful capabilities – for instance, they're nestable and can be aborted to roll-back any changes that have been made within the scope of that particular transaction – but they come with some performance overhead. As we're not making use of any advanced capabilities, we can gain some additional performance by using an OpenCloseTransaction (started via StartOpenCloseTransaction()) instead. See this post on the AutoCAD DevBlog for a more detailed discussion of the performance benefits of this change.

It should also be noted – and this can cause a big impact – that you should always call Commit() on a successful Transaction, even when it's nominally read-only in nature. If Commit() is not called then the Transaction will be aborted, and depending on what you're doing this can be a performance killer. This call would be added between lines 28 and 29, just before the Transaction drops out of scope.

Lines 10-17 are OK, although for the sake of being thorough I'll suggest a few minor changes. I very rarely bother doing this, but rather than opening the BlockTable to get the ObjectId of the model-space from it, it is also possible to use SymbolUtilityServices.GetBlockModelSpaceId(), passing in the database object. This calls through into native code to get the ObjectId of the model-space for that database, which may shave off a few virtual function calls, here and there. Once again, probably not a change with a measurable impact, but hey.

Another somewhat pedantic change would be to use a cast rather than using the "as" keyword (which checks runtime type information before performing any reference conversion) on lines 12 and 17. As we're not actually checking the pointer returned by "as" to see whether it's null, we may as well perform a traditional cast (which will avoid a check or two).
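A quick standalone snippet (not AutoCAD-specific) makes the semantic difference concrete: "as" performs a runtime check and yields null on failure, while a cast throws InvalidCastException, which is the behaviour you want when failure "can't happen":

```csharp
using System;

class CastDemo
{
  static void Main()
  {
    object o = "hello";

    // "as" performs a runtime type check and yields null on failure...
    var notAUri = o as Uri;
    Console.WriteLine(notAUri == null);  // True

    // ...whereas a cast simply converts (throwing on failure), so it's
    // the natural choice when we don't intend to check for null anyway
    var s = (string)o;
    Console.WriteLine(s.Length);         // 5
  }
}
```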

On lines 21 and 22, we open the object to check whether it is a Line or not. I tend to avoid this unless specifically opening the object for another reason (which admittedly Bruno wants to do, as he wants to collect some information about each of the Lines in the model-space). My personal preference is to change this line to check the ObjectClass property on the ObjectId, to see whether it IsDerivedFrom() the Line class. This avoids opening the object – which clearly will improve performance – although it shifts responsibility to the calling function to go ahead and open it, as needed.

So here's my version of the function. Remember that this doesn't do as much as the prior version – which leaves the Line open – and so any performance comparison will, of course, be apples-to-oranges:

public static IEnumerable<ObjectId> ObjectsOfType1(
  Database db, RXClass cls
)
{
  using (
    var tr = db.TransactionManager.StartOpenCloseTransaction()
  )
  {
    var btr =
      (BlockTableRecord)tr.GetObject(
        SymbolUtilityServices.GetBlockModelSpaceId(db),
        OpenMode.ForRead
      );
    foreach (ObjectId id in btr)
    {
      if (id.ObjectClass.IsDerivedFrom(cls))
      {
        yield return id;
      }
    }
    tr.Commit();
  }
}

You'll notice that the function has been generalised to take an RXClass for the type of object for which we wish to query. This just makes it a bit more flexible.

Building on this function, here's a slightly different version that functions nearly as quickly (I found it to run a few percent slower, on average) but uses LINQ:

public static IEnumerable<ObjectId> ObjectsOfType2(
  Database db, RXClass cls
)
{
  using (
    var tr = db.TransactionManager.StartOpenCloseTransaction()
  )
  {
    var btr =
      (BlockTableRecord)tr.GetObject(
        SymbolUtilityServices.GetBlockModelSpaceId(db),
        OpenMode.ForRead
      );

    var lineIds =
      from ObjectId id in btr
      where id.ObjectClass.IsDerivedFrom(cls)
      select id;

    tr.Commit();

    return lineIds;
  }
}

We could also force the ObjectsOfType2() function to return a List<ObjectId> by changing the last line to "return lineIds.ToList<ObjectId>();", but I found that this additional step slows down execution marginally. My somewhat limited knowledge of the underpinnings of LINQ leads me to believe that, as they stand, ObjectsOfType1() and ObjectsOfType2() are basically equivalent – you apparently also have lazy evaluation of select statements in LINQ – but I'd be very happy for someone to educate me if that isn't the case. For that matter, I'd be happy to hear from anyone who has further suggestions for Bruno – I may very well have missed something.
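The deferred evaluation of LINQ queries can be demonstrated in isolation (a plain list stands in for the model-space here): the query is just a description until something enumerates it, whereas ToList() takes an immediate snapshot:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
  static void Main()
  {
    var source = new List<int> { 1, 2, 3 };

    // The query is only a *description* at this point: nothing has run
    var evens = from n in source where n % 2 == 0 select n;

    // ToList() evaluates the query immediately, taking a snapshot
    var snapshot = evens.ToList();

    source.Add(4);  // modify the source *after* defining the query

    // The deferred query sees the new element; the snapshot does not
    Console.WriteLine(evens.Count());    // 2  (the values 2 and 4)
    Console.WriteLine(snapshot.Count);   // 1  (just the value 2)
  }
}
```

This is also why ObjectsOfType2() has to call Commit() before returning: the caller will enumerate the query later, but the transaction's bookkeeping happens when it drops out of scope.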

Here's the complete listing, with some commands implemented to call the various functions and measure/report the time taken for their execution:

using Autodesk.AutoCAD.ApplicationServices;
using Autodesk.AutoCAD.DatabaseServices;
using Autodesk.AutoCAD.Runtime;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System;

namespace ObjectEnumeration
{
  public class Commands
  {
    public void MeasureTime(Document doc, Func<int> func)
    {
      // Get the name of the running command(s)
      // (might also have queried the CommandMethod attribute
      // via reflection, but that would be a lot more work)

      var cmd = (string)Application.GetSystemVariable("CMDNAMES");

      // Start a Stopwatch to time the execution

      var sw = new Stopwatch();
      sw.Start();

      // Run the function, getting back the count of the results

      var cnt = func();

      // Stop the Stopwatch and print the results to the command-line

      sw.Stop();
      doc.Editor.WriteMessage(
        "\n{0}: found {1} lines in {2}.", cmd, cnt, sw.Elapsed
      );
    }

    [CommandMethod("LL1")]
    public void ListLines1()
    {
      var doc = Application.DocumentManager.MdiActiveDocument;
      MeasureTime(
        doc,
        () =>
        {
          var ids = GetAllLines(doc.Database);
          return ids.Count<Line>();
        }
      );
    }

    [CommandMethod("LL2")]
    public void ListLines2()
    {
      var doc = Application.DocumentManager.MdiActiveDocument;
      MeasureTime(
        doc,
        () =>
        {
          var ids =
            ObjectsOfType1(
              doc.Database, RXObject.GetClass(typeof(Line))
            );
          return ids.Count<ObjectId>();
        }
      );
    }

    [CommandMethod("LL3")]
    public void ListLines3()
    {
      var doc = Application.DocumentManager.MdiActiveDocument;
      MeasureTime(
        doc,
        () =>
        {
          var ids =
            ObjectsOfType2(
              doc.Database, RXObject.GetClass(typeof(Line))
            );
          return ids.Count<ObjectId>();
        }
      );
    }

    public static IEnumerable<Line> GetAllLines(Database db)
    {
      using (
        var docLock =
          Application.DocumentManager.MdiActiveDocument.LockDocument()
        )
      {
        using (var tr = db.TransactionManager.StartTransaction())
        {
          var bt =
            tr.GetObject(db.BlockTableId, OpenMode.ForRead)
            as BlockTable;
          var btr =
            tr.GetObject(
              bt[BlockTableRecord.ModelSpace],
              OpenMode.ForRead
            ) as BlockTableRecord;

          foreach (var obj in btr)
          {
            var line =
              tr.GetObject(obj, OpenMode.ForRead) as Line;

            if (line != null)
            {
              yield return line;
            }
          }
        }
      }
    }

    public static IEnumerable<ObjectId> ObjectsOfType1(
      Database db, RXClass cls
    )
    {
      using (
        var tr = db.TransactionManager.StartOpenCloseTransaction()
      )
      {
        var btr =
          (BlockTableRecord)tr.GetObject(
            SymbolUtilityServices.GetBlockModelSpaceId(db),
            OpenMode.ForRead
          );
        foreach (ObjectId id in btr)
        {
          if (id.ObjectClass.IsDerivedFrom(cls))
          {
            yield return id;
          }
        }
        tr.Commit();
      }
    }

    public static IEnumerable<ObjectId> ObjectsOfType2(
      Database db, RXClass cls
    )
    {
      using (
        var tr = db.TransactionManager.StartOpenCloseTransaction()
      )
      {
        var btr =
          (BlockTableRecord)tr.GetObject(
            SymbolUtilityServices.GetBlockModelSpaceId(db),
            OpenMode.ForRead
          );

        var lineIds =
          from ObjectId id in btr
          where id.ObjectClass.IsDerivedFrom(cls)
          select id;

        tr.Commit();

        return lineIds;
      }
    }
  }
}

Bear in mind that as we're dealing with lazy results, we need to include the Count<>() method in the measurement for the results to be representative (even if, as we've mentioned, Bruno's code includes the operation to open the Lines rather than just deal with ObjectIds). This clearly defeats the object of using a lazy structure in the first place, but it's ultimately more fair.
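This point is easy to verify in isolation: "running" a lazy function takes essentially no time, because all the work happens during enumeration. A small self-contained sketch (hypothetical names, no AutoCAD dependency):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MeasureDemo
{
  static int iterations = 0;

  // A lazy sequence that counts how many iterations have executed
  static IEnumerable<int> Work(int n)
  {
    for (int i = 0; i < n; i++)
    {
      iterations++;
      yield return i;
    }
  }

  static void Main()
  {
    // Calling the function costs nothing measurable: it only builds
    // an iterator, so timing this line alone would be meaningless
    var results = Work(100000);
    Console.WriteLine(iterations);       // 0

    // It's the Count() call that actually walks the sequence, so
    // that's what has to fall inside the timed region
    Console.WriteLine(results.Count());  // 100000
    Console.WriteLine(iterations);       // 100000
  }
}
```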

Now I don't actually think that these changes will have helped resolve the core problem in Bruno's application – for some reason he's seeing querying of 1K entities taking seconds to complete (whereas his code executes on 100K+ entities in tenths of a second on my system). I'm looking forward to understanding more about Bruno's specific problem, but felt that posting this in the meantime would be of interest to people.

On a side note, due to the performance issues he's been experiencing, Bruno looked into using Parallel.For – from the TPL – to harness multiple cores to speed up execution. Unfortunately this is not possible with AutoCAD: as we've seen in a number of previous posts, AutoCAD is not thread-safe and its APIs really need to be called on the UI thread.

That's it for today: please post comments if you have any further suggestions (or find any of the suggestions I've made to be inappropriate). In the post that follows from this, we'll look at extending the MeasureTime() method to keep track of execution information so we can (for instance) more easily bring it across into Excel for analysis.

43 responses to “Querying AutoCAD objects of a particular type using .NET”

  1. Scott McFarlane

    I have encountered issues when using yield inside a transaction in this way. It works great if everything goes well, but have you tried to see what happens if an exception is thrown in the calling code?

    Generally when you need this kind of pattern (getting all the objects of a particular type) you are ultimately going to *do something* with those objects. So I would recommend passing a delegate into these functions, such as Action<Line> (and the function itself would return void.) This still gives you the performance advantage of only iterating through the objects once, but it also gives you much more control over the behavior when exceptions occur.

    Bruno, download the code from my AU2012 class CP2657. It's full of examples of this!

  2. Kean Walmsley

    Great stuff - thanks for sharing that, Scott. 🙂

    Kean

  3. MexicanCustard

    Kean if you haven't seen it already check out
    theswamp.org/index.php?topic=41371.msg473279#msg473279

  4. Kean Walmsley

    Hadn't seen that - thank you.

    It's not clear to me whether the additional boxing/unboxing that Tony's identified also happens on IsDerivedFrom() (I just had a quick skim through the thread - there's a lot there). It certainly contains some interesting analysis and information.

    Kean

  5. Hello Scott,

    Thanks for the advice. Where can I get your class?

    Also, I would like to say that it was other part of the code that was causing the huge delay. I will discuss it with Kean as well, I think it is very interesting.

  6. Thanks 🙂

  7. Thanks for the great post, Kean.

    Some people could ask "Why use an IEnumerable<T> instead of a List<T>?". My line of thought when approaching that kind of problem is to use the most generic available class to represent my method's argument.

    Why? If you specify that the user should pass a List<T>, for instance, you are creating a bound that is perhaps unnecessary. What if the user wants the GetAllLines data as a Queue, for instance? Then he would need to cast it, but as I see it, this constraint is somewhat clumsy from the code approach. I hope that this makes sense to you.

    But you are right: changing it to List<T> did impact my performance. Once again, thanks.

  8. Tony Tanzillo

    "In this case, the code is returning an IEnumerable of Lines. I strongly recommend not doing this: the Lines have been opened by the current transaction, and so will become invalid as soon as the transaction completes. Because we’re actually getting the results lazily – and therefore the calling function is getting the results one at a time, and so any function you call right afterwards will still have the transaction in scope"

    Sorry, that's not true. If, for example, we use a foreach() loop to iterate over the objects yielded by Bruno's code, the transaction is still active within the body of the foreach() loop. The transaction isn't Disposed until the foreach() loop is completed or exited.

    I would suggest spending a bit more time researching how LINQ works internally. What happens is that the compiler essentially rewrites the method that contains the yield return call, and from that, generates a class called an 'iterator', which implements IEnumerable. When foreach() iterates over an IEnumerable, it asks the IEnumerable for an instance of an IEnumerator, which is the object that is actually used to access the elements. The IEnumerator implements IDisposable, which is called by the code generated for a foreach() loop, upon exiting the body of the foreach() loop, and it is in the call to that IEnumerator's Dispose() method that the transaction (or any IDisposable whose scope contains the loop that calls yield return) is disposed.

  9. You're missing my point, Tony. Nothing you've just said is a surprise to me, or contradicts what I've said in this post.

    The problem is with the assumption that one can return a collection of objects that would then remain valid after the iteration has completed, which is not the case. If the calling function was to copy one or more of the Lines elsewhere, things would go wrong if they were accessed after the transaction was disposed of. With some discipline it should be fine, but I dislike creating functions that can only be used in such limited scenarios (as I was trying to make clear).

    Kean

  10. Gilles Chanteau

    It seems to me that this kind of method should always be called from a transaction in the calling method (otherwise, what to do with a collection of lines?). So why not just use the Database.TopTransaction in the query to open the objects?

  11. You could certainly approach it that way, yes. I tend to prefer making the usage of a transaction explicit - whether passing it as an argument or making it available via a member variable - but you can certainly query it from the transaction manager, if you prefer.

    In this case, though, my choice to use an OpenCloseTransaction means we don't get that capability (another reason why I tend to avoid relying on TopTransaction, in case I end up switching mechanisms).

    It's good you pointed this option out, as I hadn't included it.

    Thanks, Gilles!

    Kean

  12. I was referring to this:

    "and so any function you call right afterwards will still have the transaction in scope".

    Any function called right afterwards (of iterating over the items), will not have the transaction in scope. If you call some LINQ method such as Where(), the items are iterated and the transaction ends at the point when Where() returns a result.

  13. OK, I think I see where the misunderstanding arose. I didn't mean "after the foreach loop", I meant "after the call to access the item in the set, but still in the foreach loop". I'm strongly recommending not relying on the transaction being available, but mentioned that you may happen to have access to it while the loop is active (at least that was my intention).

    Kean

  14. "for some reason he’s seeing querying of 1K entities taking seconds to complete (whereas his code executes on 100K+ entities in tenths of a second on my system). I’m looking forward to understanding more about Bruno’s specific problem, but felt that posting this in the meantime would be of interest to people."

    Hi Kean. Bruno hasn't shown the method he was using to time the execution of his code, and your timing methodology has some significant problems. Your MeasureTime() function will, on first use for a given set of arguments, measure both the execution time of the passed function, and the time required by the CLR to just-in-time compile both the function, and any methods it calls that haven't been called yet in the current executable session. Like LINQ, just-in-time compilation is also 'lazy', and happens on a method-by-method basis, the first time a method runs. So, it is not possible to accurately measure the execution time of code, if it hasn't executed at least once in the current session, prior to taking a measurement.

    The other problem with it, is that you use it to measure the execution time of different methods of accessing the same data in a single executable session, which can very-likely require significantly-more time on the initial access, but can happen more quickly on the second and subsequent access, for a variety of reasons, such as in-memory caching of frequently-accessed data that initially has to be read from persistent storage.

    Measuring execution times in this way is not terribly realistic, and can easily explain the discrepancy you talk about above.

  15. Hi Tony,

    Quite right - it's important to look at multiple executions if using this crude approach (as I've already described it in this post).

    The next post simplifies collection of multiple runs to assess performance. But it's not meant to replace a professional profiling tool.

    Bruno's problem isn't related to measurement - it's elsewhere in his code.

    Kean

  16. "I have encountered issues when using yield inside a transaction in this way. It works great if everything goes well, but have you tried to see what happens if an exception is thrown in the calling code?"

    Hi Scott. What sort of issues have you encountered? I've been using yield return inside of transactions for the better part of 5 years (since AutoCAD's API supported .NET 2.0).

    Many standard LINQ functions like Select(), Where(), etc., are implemented using compiler-generated iterator classes that are functionally equivalent to those generated by the compiler when you use yield, and so, aren't any different from user-defined functions that use yield.

    For example, this is the Where() method:

    IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
      foreach (T item in source)
      {
        if (predicate(item))
          yield return item;
      }
    }

    So, going by your observation it would seem to me that using any standard or built-in LINQ method would have the same issues that yield has.

  17. "but I dislike creating functions that can only be used in such limited scenarios (as I was trying to make clear)."

    By limited scenarios do you mean, for example, a reusable helper API that needlessly and pointlessly restricts itself to operating only on the model space block, as opposed to any block in a Database ?

  18. "It's not clear to me whether the additional boxing/unboxing that Tony's identified also happens on IsDerivedFrom()".

    No, but IsDerivedFrom() isn't needed in cases where the runtime class has no child classes, or where all of the items in the source must be instances of the given class (for example, if the runtime class is "Entity" and the source is a BlockTableRecord, you don't have to test anything because all the elements must be Entitys).

    While IsDerivedFrom() is implemented in native code, it must walk up the runtime class tree to find out if the given RXClass is derived from the argument, so its use is pointless in either of the above cases.

    And while I'm at it, you might want to suggest to Albert that AcRxClass::hasChildren() would be a useful improvement, because of the aforementioned reasons. The value would be cached each time the runtime class tree is rebuilt, and would eliminate the need for IsDerivedFrom() to walk the class tree to produce its result, in cases where the argument has no child classes.

  19. There you go. Well done, Tony.

    Kean

  20. "On line 8: we’re using a traditional Transaction (started via StartTransaction()) to get the information we need for the contents of the model-space. Standard transactions provide some very helpful capabilities – for instance, they’re nestable and can be aborted to roll-back any changes that have been made within the scope of that particular transaction – but they come with some performance overhead."

    Hi Kean. What you say above is a myth. In reality and as the numbers below show, both the 'traditional' and 'openclose' flavors of a Transaction are neck-and-neck with larger amounts of data, and because of that, and because with smaller amounts of data, the edge that OpenCloseTransaction has over 'traditional' transactions is imperceptible, there is little served by their use in code that is not going to be called from event handlers/reactor callbacks, or overrule overrides.

    I was helping Gilles with some spatial indexing code in a thread on theswamp, and he was looking at ways to speed up the processing of a very large number of DBPoint entities. When I tested his code with both traditional and OpenClose transactions, using a dataset consisting of 1/2 million DBPoints, here is the time to execute the code that opens each DBPoint and extracts its Position property:

    (TT1 = traditional transaction, TT2 = OpenCloseTransaction, times are in ms):

    Command: RNDPOINTS
    Number of points: 500000
    Command: TT1
    Standard transaction: 1522
    Command: TT2
    OpenCloseTransaction: 1434
    Command: TT1
    Standard transaction: 1584
    Command: TT2
    OpenCloseTransaction: 1425
    Command: TT1
    Standard transaction: 1505
    Command: TT2
    OpenCloseTransaction: 1430

    We're talking about a difference of < 100ms with a half-million entities, which falls into the 'imperceptible' category. More important is that even though OpenCloseTransaction has a more significant edge with smaller amounts of data, the total time required to process smaller amounts of data is itself imperceptible, and so, using OpenCloseTransaction really doesn't serve any useful purpose, unless you are operating on objects in the handler of an event, versus iterating over the contents of entire BlockTableRecords.

  21. Hi Tony,

    Interesting - thanks for sharing this additional information.

    "What you say above is a myth."

    Well, not really. There is "some overhead" - the question is really whether it's significant enough with your particular use-case to care about. I found a ~10% difference in execution time between a traditional transaction and an open/close transaction (for my particular use-case dealing with 100K objects). Admittedly the average absolute difference is very modest (~20 ms - which is in line with what you found), so you could easily argue that it's an unnecessary trade-off.

    You should really post your comments regarding this to the AutoCAD DevBlog - if you're interested in busting this particular myth, that's the place to do so.

    Kean

  22. Scott McFarlane

    Hi Tony,

    You are correct. I guess I had it in my mind that Dispose would not get called in the event of an Exception being thrown. But as you point out, understanding the internals of LINQ as well as the compiler magic that comes with using yield really helps get your head around the whole thing. Thanks!

  23. Hi Scott. Dispose will always be called if it's in a using, but the real nasty part of yield, is that if the caller, for example, breaks out of a foreach() loop before the entire sequence has been enumerated, the code that follows the loop that contains the call to yield return, will not run. This one bit me, and I found that it was necessary to use try/finally to get it to work as I expected.

  24. Hi Kean. I usually express opinions about what I perceive to be 'myth' in places where I see them propagating, but I'll make a point of posting a comment on DevBlog as well.

  25. BTW, I and others that are seeing the same thing when making comparisons would probably be interested in knowing more about your use case where there is a ~10% difference. The numbers I show, are for a 'do-nothing' loop that only opens each object for read. If you add the time taken by the actual processing of each object (something most real apps do), that 2-3% difference becomes even smaller.

  26. Hi Tony,

    Great - it'll be interesting to see how the discussion develops. I know (from a recent email I saw from Fenton) that he avoids full transactions in utility functions, and as he's generally working with small numbers of objects he prefers not to incur their overhead otherwise, also.

    But that's his choice, and he will no doubt say more via the DevBlog's comments.

    I personally tend not to use OpenCloseTransactions very much - typically when needed in event handlers or when performance is critical (which was originally the case with the code in this post, even if the performance killer ended up lying elsewhere).

    Kean

  27. Sure thing. There's no mystery to it (and in fact I had thought it was obvious, but clearly not): it's the code in this post - the ObjectsOfType1() function. I simply created a new version with StartTransaction() and timed them both over 10 or so runs, excluding the earlier ones. I ran it against a drawing containing a little over 100K lines.

    Kean

  28. Hi Kean, and Happy 40th.

    First, shame on me for not noticing this before (and of course, not bothering to test your code): You might want to take a closer look at both of your ObjectsOfType1() and ObjectsOfType2() functions.

    Both of them suffer from the same problem, which results in an exception being thrown when the caller tries to open the yielded ObjectIds in a transaction which they started before calling your function.

    When the user enumerates the ObjectIds yielded by your functions in a foreach() loop, the TopTransaction is the one that your function starts internally (which IMO, is not a good idea), rather than the transaction the caller started before they entered the foreach() loop.

    Implicitly starting and using transactions inside of a block iterator (a function that uses yield return) is not good idea, for that reason.

  29. Hi Tony,

    Thanks!

    I'm a bit confused: my code uses OpenCloseTransactions, which - while we generally don't recommend mixing open/close with transactions - should be safe enough, as the data being passed from one model to another is via ObjectIds rather than pointers.

    I've just tried it in some test code and it worked fine. Are you using a different version of the function(s) that makes use of traditional Transactions? Please provide some more information - it'd be good to reproduce the problem here.

    (I fully agree that starting your own transaction under these circumstances isn't a very good idea. As mentioned earlier, I personally tend to either use one that has been passed into the function or stored as a member variable or - which is the case here - go with open/close to avoid it completely.)

    Regards,

    Kean

  30. Hi Kean, Oops.

    Shame on me again for not noticing that you're using OpenCloseTransaction, which means that the problem I noted above isn't a problem at all, but it might be helpful for others to know that using a regular transaction would have that result.

  31. Hi Tony,

    Great - thanks for clearing that up.

    Yes - definitely good to reinforce it's a bad idea to use a standard Transaction, here.

    Kean

  32. Hi Kean. Well, not so fast :d

    Actually, there is a problem in your code. The problem is the same one I described in a reply to Scott's comments about using yield and transactions.

    The problem is that your code calls Commit() on the transaction after the loop containing the call to yield return completes. The way block iterators work (and this is definitely not intuitive or the least bit obvious) is that the code following the call to yield return will only execute if the entire sequence is yielded.

    So, if you try your function and iterate over the items it yields using a foreach() loop, but exit that loop before you've iterated over all the items (a fairly common case when you've found something you were looking for), the call to Commit() never happens.
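
    [A minimal sketch of this failure mode, with hypothetical names - the point is the C# iterator semantics rather than any particular AutoCAD call, and the code assumes the usual Autodesk.AutoCAD.DatabaseServices namespace:]

    ```csharp
    // Assumes: using Autodesk.AutoCAD.DatabaseServices;
    public static IEnumerable<ObjectId> ModelSpaceIds(Database db)
    {
      using (var tr = db.TransactionManager.StartOpenCloseTransaction())
      {
        var btr =
          (BlockTableRecord)tr.GetObject(
            SymbolUtilityServices.GetBlockModelSpaceId(db),
            OpenMode.ForRead);

        foreach (ObjectId id in btr)
          yield return id;

        // Skipped if the caller abandons the enumeration early -
        // only the using block's Dispose() (an abort) happens.
        tr.Commit();
      }
    }

    // Caller exits early, so the Commit() above never runs:
    //
    // foreach (var id in ModelSpaceIds(db))
    //   break; // found what we were looking for
    ```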

  33. Hi Tony,

    I see - thanks for the reminder on that one.

    Luckily - at least in this implementation - it shouldn't cause much (if any) of a problem.

    We're using what's ultimately a read-only OpenCloseTransaction. When it's implicitly disposed of, its open objects will simply have cancel() called on them (and as there've been no edits there shouldn't be much overhead to the call).

    I do agree it's not ideal behaviour, and that if you want to be sure the transaction gets committed (which is of course more important for a regular Transaction) you should use a finally clause.

    Kean

  34. Hi Kean.

    Finally doesn't work either, because you want the transaction to abort if the foreach() loop was exited as a result of an exception.

    Don't use transactions in block iterators - that's the correct solution. More generally, transactions should be managed by the outer-most callers in all but the most bizarre or extraordinary cases.

  35. Hi Tony,

    OK, got it. You had said in an earlier comment to Scott that this was working well for you, so I assume this is a relatively new discovery.

    I also assume that OpenCloseTransactions are OK, assuming you're just reading information and don't want to have to hand-code open/close everywhere. Let me know if you see things differently.

    By the way - could you elaborate on what you mean by outer-most callers managing transactions? I think I get what you're saying, but want to make sure.

    Kean

  36. Hi Kean. Yes, I mentioned it to Scott (about using finally to ensure that code in a block iterator which follows the call to yield - and which must always run - does run), but of course, Commit() doesn't fall into that category, since I don't want it to happen if, for some reason, an exception stops the code before that point.

    By outer-most callers I meant that transactions are (IMO) best managed at the entry point to the code that requires them. Entry points are things like handlers of registered commands, event handlers, and so forth. I've had great success with the reusability of my own LINQ code by following that pattern very religiously: all of my reusable APIs require an active transaction when they are called, and they get it from the TransactionManager's top transaction, allowing their caller to manage the transaction. While that may not always be the best course, it has proven to work for me, with the exception of APIs that are used both from command handlers and from events, and must use an OpenCloseTransaction in the latter case. Those APIs accept an optional Transaction as an argument for the cases when I need to use them from an event handler, where a regular Transaction can't be used.
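
    [A hedged sketch of how that pattern might look - CountLines is a hypothetical helper, and the code assumes the standard Autodesk.AutoCAD.DatabaseServices and Autodesk.AutoCAD.Runtime namespaces:]

    ```csharp
    // Assumes: using Autodesk.AutoCAD.DatabaseServices;
    //          using Autodesk.AutoCAD.Runtime;
    //
    // A reusable API that requires an active transaction rather than
    // starting its own. The optional argument covers event handlers,
    // where the caller passes an OpenCloseTransaction explicitly.
    public static int CountLines(Database db, Transaction tr = null)
    {
      // Use the caller's transaction: either the one passed in or
      // the TransactionManager's top transaction.
      var t = tr ?? db.TransactionManager.TopTransaction;

      var btr =
        (BlockTableRecord)t.GetObject(
          SymbolUtilityServices.GetBlockModelSpaceId(db),
          OpenMode.ForRead);

      var lineClass = RXObject.GetClass(typeof(Line));
      var count = 0;
      foreach (ObjectId id in btr)
        if (id.ObjectClass.IsDerivedFrom(lineClass))
          count++;
      return count;
    }

    // The entry point (e.g. a command handler) manages the transaction:
    //
    // using (var tr = db.TransactionManager.StartTransaction())
    // {
    //   var n = CountLines(db);
    //   tr.Commit();
    // }
    ```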

  37. Just so you know, we've identified the reason why OpenCloseTransaction is marginally faster than AcDbTransaction. It has more to do with how the managed wrappers for ObjectId and Transaction operate than anything specific to the underlying native objects. If you look at the source in Reflector, you will see that Transaction creates a managed wrapper for TransactionManager and then invokes the managed methods on it, rather than taking the native route.

    If you look at the implementation of ObjectId.GetObject(), you will see what amounts to the software equivalent of 'follow the yellow brick road'. The implementation first creates a managed wrapper for the AcDbDatabase and then invokes its TransactionManager property, which in turn creates a managed wrapper for the TransactionManager, whose TopTransaction property is fetched, which in turn creates a managed wrapper for the Transaction, and then calls GetObject() on that. The Transaction's implementation of GetObject() then takes one step backwards, creating another managed wrapper for the TransactionManager, and delegates the call to the TransactionManager's GetObject() implementation.

    When I saw that, I simply could not believe my eyes.

    With Transaction.GetObject(), the implementation could have been done entirely in native code, not requiring any managed wrappers to be created and used, and that is the reason why OpenCloseTransaction is faster than the wrapper for AcDbTransaction.

  38. Very interesting - thanks for doing this analysis. We'll be having some internal discussions on this.

    Kean

  39. Very interesting. Normally I open the object and then use "as" to check whether it's a Line. This is somewhat of a revelation: using id.ObjectClass.IsDerivedFrom(cls) to get the class you want.

    1. It was for me, too!

      Kean
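
    [For anyone wanting to try the ObjectClass filter mentioned in the previous comment, here's a minimal sketch - assuming the standard Autodesk.AutoCAD.DatabaseServices and Autodesk.AutoCAD.Runtime namespaces, and runnable only inside AutoCAD:]

    ```csharp
    // Assumes: using Autodesk.AutoCAD.DatabaseServices;
    //          using Autodesk.AutoCAD.Runtime;
    //
    // Filter by runtime class before opening anything: only ids whose
    // ObjectClass derives from Line get opened at all.
    var lineClass = RXObject.GetClass(typeof(Line));

    using (var tr = db.TransactionManager.StartOpenCloseTransaction())
    {
      var btr =
        (BlockTableRecord)tr.GetObject(
          SymbolUtilityServices.GetBlockModelSpaceId(db),
          OpenMode.ForRead);

      foreach (ObjectId id in btr)
      {
        if (!id.ObjectClass.IsDerivedFrom(lineClass))
          continue; // skipped without a GetObject() call

        var line = (Line)tr.GetObject(id, OpenMode.ForRead);
        // ... use line ...
      }
      tr.Commit();
    }
    ```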

  40. Hi Kean:

    Thanks for the post. To quote:

    "This is done automatically for the active document in all commands that are not registered as “session context” commands (i.e. any “document context” command has an implicit lock on the active document). I would assume that any session context command that calls this function would have locked the document manually beforehand."

    Why does the session context require locking? I.e. what is the consequence of not locking (I mean, apart from an error)?

    1. In the session context you're not document-centric, so whenever you need to access a document you should lock it. Aside from the error, you could be causing all kinds of damage to the document state if you were able to proceed.

      Kean
