An Opinion blog

Disclaimer

This site is operated by Mike Deem. The opinions expressed here are mine. They are not necessarily my employer's or anybody else's.

Tuesday, January 06, 2004 #

Where did I go?

Just busy at work and then an extra long break around the holidays. We are snowed in up here in Redmond today, so I'm at home catching up on the blogs. You know, Microsoft has really good support for working from home. You can VPN in or access e-mail via a web site. But when all 20,000 or so local employees try to use it at the same time, it can be a little slow. Actually, I'm pretty impressed it works at all.

posted @ 10:32 AM | Feedback (18)

Monday, November 24, 2003 #

Overhead

In response to my WinFS API Lean and Mean post, Ken Brubaker points out that for ObjectSpaces, “Andrew Conrad has lead us to believe that the 30% performance loss is due to object creation.” The full quote from Andrew's posting is:

In general, this performance difference comes from the materializing of the objects and some overhead from the mapping layer.  Note – if one was to materialize their own objects over the SqlDataReader, they probably won’t see a significant difference between ObjectSpaces and their custom solution.

Some of the overhead is mapping. Some is object creation. The WinFS API reduces the object creation overhead to a minimum by taking advantage of the fact that WinFS objects are derived from our base class.

Andrew goes on to say:

So unless one does a join on the server and then normalizes the results them self, it will be hard to beat the ObjectSpaces’ performance. As the hierarchy becomes deeper and more complex, this performance gain will be more noticeable.

In the end, objects have costs and benefits. I assert the benefits outweigh the costs. Achieving good performance is a priority zero requirement for us, and we'll probably still be tweaking things as they pry the bits from our carpal-tunneled hands for pressing onto CDs (DVDs?).

posted @ 8:58 AM | Feedback (28)

Sunday, November 23, 2003 #

WinFS API Lean and Mean

Shawn Wildermuth is right, “WinFS has to be quick...not just fast...lightning fast.” As previously stated, we are using OPath, but if you read carefully I don't think anyone has said that we are actually using the full ObjectSpaces stack from top to bottom. Given the prescriptive nature of the WinFS data and storage model, we don't need a flexible mapping layer and it has essentially been optimized out of the WinFS API stack.

posted @ 10:10 AM | Feedback (75)

Friday, November 21, 2003 #

Properties vs. Methods Take Two

In a private e-mail, Brad Abrams asked me to clarify the property vs. method issue in the WinFS API. I put together this hopefully simplified explanation. What follows is essentially a conceptual description; I’m glossing over a lot of details and the attributes shown aren’t exactly the ones we use.

In WinFS we have an item table and a relationship table. The item table has an ItemId column. The relationship table has SourceItemId and TargetItemId columns. At the SQL level, when querying for related items you join through the relationship table.

When mapping items and relationships to objects we end up with something like the classes shown below. This lets us use properties in the classes to represent the joins.

public class Relationship
{
  public ItemId SourceItemId { get; }
  public ItemId TargetItemId { get; }

  [Join("SourceItemId", "Item.ItemId")]
  public Item Source { get; }

  [Join("TargetItemId", "Item.ItemId")]
  public Item Target { get; }
}

public class Item
{
  public ItemId ItemId { get; }

  public string DisplayName { get; set; }

  [Join("ItemId", "Relationship.SourceItemId")]
  public VirtualRelationshipCollection OutgoingRelationships { get; }

  [Join("ItemId", "Relationship.TargetItemId")]
  public VirtualRelationshipCollection IncommingRelationships { get; }
}

These properties show up in two places: in applications which use them to “navigate” from object to object in memory and in queries that traverse the joins between the underlying tables. Let’s look at the query case. If I want to find all items targeted by a relationship from any item with the display name “foo”, I can use the following code:

foreach( Item item in Item.FindAll( "IncommingRelationships.Source.DisplayName='foo'" ) ) ...;

What this does is cause us to generate a query against the item table that joins to the relationship table using Item.ItemId = Relationship.TargetItemId and then back to the Item table using Relationship.SourceItemId = Item.ItemId.
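
To make the shape of that two-step join concrete, here is a small in-memory sketch. It uses LINQ to Objects rather than WinFS or SQL, and the ItemRow and RelationshipRow types, the JoinSketch class, and the sample data are all hypothetical stand-ins for the item and relationship tables. The real store performs the join in the database; this only mirrors the join conditions described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for rows of the item and relationship tables.
public class ItemRow
{
    public int ItemId;
    public string DisplayName;
}

public class RelationshipRow
{
    public int SourceItemId;
    public int TargetItemId;
}

public static class JoinSketch
{
    // Finds the display names of items targeted by a relationship whose
    // source item has the given display name.
    public static List<string> TargetsOfSourceNamed(
        IEnumerable<ItemRow> items,
        IEnumerable<RelationshipRow> relationships,
        string sourceName)
    {
        var query =
            from target in items
            // Item.ItemId = Relationship.TargetItemId
            join rel in relationships on target.ItemId equals rel.TargetItemId
            // Relationship.SourceItemId = Item.ItemId
            join source in items on rel.SourceItemId equals source.ItemId
            where source.DisplayName == sourceName
            select target.DisplayName;
        return query.Distinct().ToList();
    }

    public static void Main()
    {
        var items = new List<ItemRow>
        {
            new ItemRow { ItemId = 1, DisplayName = "foo" },
            new ItemRow { ItemId = 2, DisplayName = "bar" },
            new ItemRow { ItemId = 3, DisplayName = "baz" },
        };
        var relationships = new List<RelationshipRow>
        {
            new RelationshipRow { SourceItemId = 1, TargetItemId = 2 }, // foo -> bar
            new RelationshipRow { SourceItemId = 3, TargetItemId = 1 }, // baz -> foo
        };

        foreach (string name in TargetsOfSourceNamed(items, relationships, "foo"))
        {
            Console.WriteLine(name); // prints "bar"
        }
    }
}
```

Only "bar" comes back: it is the one item that is the target of a relationship whose source is named "foo".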

Now, it is very common for an item to be related to many other items. Hence we use our VirtualRelationshipCollection type to represent the relationships in the Item class. This type doesn’t load all the relationship objects when the item object is loaded. They are not even loaded when the application accesses the property, as they can be used for Add and Remove operations without loading all relationships into memory. The relationships are loaded only if the application tries to enumerate over the collection.

Similarly, there are scenarios where an application may want to work with a relationship object without actually loading the source and/or target item object into memory. The source/target object is loaded only when the Source/Target property is first accessed.
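
The load-on-first-access behavior can be sketched with the standard Lazy<T> type. This is only an illustration of the pattern, not how the WinFS API implements it: LazyRelationship, IsSourceLoaded, and the loadSource delegate (standing in for a round trip to the store) are hypothetical names.

```csharp
using System;

public class Item
{
    public string DisplayName { get; set; }
}

// Hypothetical relationship whose Source is fetched only on first access.
public class LazyRelationship
{
    private readonly Lazy<Item> source;

    // loadSource stands in for a round trip to the store.
    public LazyRelationship(Func<Item> loadSource)
    {
        source = new Lazy<Item>(loadSource);
    }

    // The first read triggers the load; later reads reuse the loaded object.
    public Item Source
    {
        get { return source.Value; }
    }

    public bool IsSourceLoaded
    {
        get { return source.IsValueCreated; }
    }
}

public static class LazyLoadSketch
{
    public static void Main()
    {
        int loads = 0;
        var rel = new LazyRelationship(() =>
        {
            loads++; // count simulated store round trips
            return new Item { DisplayName = "foo" };
        });

        Console.WriteLine(rel.IsSourceLoaded);     // False
        Console.WriteLine(rel.Source.DisplayName); // foo
        Console.WriteLine(rel.Source.DisplayName); // foo
        Console.WriteLine(loads);                  // 1
    }
}
```

The relationship object can be created and passed around without touching the store; the cost is paid once, at the first read of Source.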

We could use GetSource and GetTarget methods instead of properties in the Relationship class, but what should we then do in OPath? One of the goals of OPath is to allow queries to exactly mirror the object models. If we use GetSource in the object, the OPath should become “IncommingRelationships.GetSource().DisplayName”. The problem is, in general, we cannot let you use the methods on a class in OPath.

The principle we are trying to follow in the WinFS API is: if you see a property in the object browser, you can use that property in your OPath but you can’t use any methods in OPath. We have gone out of our way to make things you can’t use in queries accessible only by calling methods.
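
One way to picture the "properties yes, methods no" rule is a checker that walks a dotted path and accepts it only if every segment is a public property. The sketch below uses .NET reflection and is purely illustrative: PathChecker and IsPropertyPath are hypothetical names, the two small classes are simplified stand-ins, and the real OPath processing is certainly more involved.

```csharp
using System;
using System.Reflection;

public class Item
{
    public string DisplayName { get; set; }
}

public class Relationship
{
    public Item Source { get; set; }
    public Item GetTarget() { return null; } // a method, so not usable in a path
}

public static class PathChecker
{
    // Returns true only if every segment of the dotted path names a public
    // instance property, starting from rootType.
    public static bool IsPropertyPath(Type rootType, string path)
    {
        Type current = rootType;
        foreach (string segment in path.Split('.'))
        {
            PropertyInfo property = current.GetProperty(
                segment, BindingFlags.Public | BindingFlags.Instance);
            if (property == null)
            {
                return false; // a method or unknown name ends the walk
            }
            current = property.PropertyType;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsPropertyPath(typeof(Relationship), "Source.DisplayName"));    // True
        Console.WriteLine(IsPropertyPath(typeof(Relationship), "GetTarget.DisplayName")); // False
    }
}
```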

So we have conflicting requirements: using properties/methods to convey to the user what can and cannot be accessed efficiently and with minimal chance of failure vs. using properties/methods to convey to the user what they can and cannot use in OPath. Given that the former is the established rule, it should take precedence. However, that leaves us with the problem of trying to identify what can be used in OPath.

Maybe the current situation isn’t so bad. The Source and Target properties in relationship classes are the only ones that are loaded on first access. There are no other properties in all of the WinFS API like this. Similarly, the number of places we used methods instead of properties because a value cannot be used in OPath is also limited.

posted @ 5:21 PM | Feedback (25)

Thursday, November 20, 2003 #

WinFS API, OPath, and Properties

The recent posts about WinFS using OPath (which is correct) and about the use of properties vs. methods in an API juxtapose in an interesting way in the WinFS API. Basically OPath lets me write a boolean expression using the properties exposed by a class. OPath doesn't let you use methods in the expression because OPath is actually converted to a SQL query and executed in the database.

In the WinFS API we have a Relationship class. The Relationship type in WinFS has two properties: source item id and target item id. When writing an OPath expression, we want you to be able to navigate from an item through relationships to the source or target item. To make this easy, we want to expose properties that represent the source and target item objects themselves. For example, to find all the people who live in the New York metropolitan region, you would write code like the following:

Person.FindAll("IncomingRelationships.Cast(System.Storage.Contacts.HouseholdMember).Household.OutgoingRelationships.Cast(System.Storage.Core.ContactLocations).Location.MetropolitanRegion = 'New York'" );

Follow along in the diagram below. Start with the Person items. IncomingRelationships.Cast(...HouseholdMember) gets you to the HouseholdMembership relationships that target all Person items. Household is a property that gets you to the Household item that is the source of the relationship. OutgoingRelationships.Cast(...ContactLocations) gets you to ContactLocation relationships. Location gets you to Location items. The Location items have a MetropolitanRegion property which is compared to the string 'New York'.

Now, here is the dilemma: we added Source and Target properties to Relationship to make writing a query easy, but actually providing a value for these properties may require a round trip to the store (it can take a while and fail in unexpected ways). In a query this is fine (we actually ask the store to do a join based on the item ids). In application code, though, each property access can hit the store. For example:

foreach( HouseholdMember hm in HouseholdMember )
{
  Person p = hm.Person; // could take some time
  Household h = hm.Household; // could take some time
}

It is the property vs. method debate, but with the twist that we want some things to be properties so they can be used in a query. 

posted @ 9:33 AM | Feedback (30)

Wednesday, November 19, 2003 #

All Pretty But One

About my posts, Sam Gentile says “Most are worth reading.” Now I'm just dying to know which ones aren't. Maybe this one.

posted @ 11:14 PM | Feedback (22)

Warm Fuzzy

People have been asking me for days if I have read Ray Ozzie's post on WinFS. I finally took the time to read it slowly and enjoy it. Makes me feel good about what I'm working so hard to accomplish.

But you know what... posts like Matthew Mastracci's are the really valuable ones. Challenging our assumptions. There are some interesting comments in Robert Scoble's response as well.

A few specific points:

1) “...I can't see how calling the WinFS APIs is any different than calling the vCard/EXIF/ID3 APIs directly”: Each of those APIs (if you can call a file format an API) is different. The WinFS API provides a common programming model for working with all kinds of meta-data. This programming model fits in with the rest of WinFX, creating yet another kind of network effect resulting in increased programmer productivity. Also note that WinFS doesn't devalue these formats, it just provides a common way to program and search them.

2) “How does the record executive get his contact for 'Marvin Gaye' to link to a list of albums and lyrics, while ensuring that my friend Marvin Gaye's address isn't associated for any of the same things.”: Yep, this is one of our hard questions. What fault is there in WinFS for trying to solve this problem or even simply providing a platform that can be used by an ISV to solve the problem?

3) “The existance of an open data format doesn't mean that your favorite (or mission-critical) application stores its data in the open!”: But providing a simple standard store that is integrated with the shell and many Windows applications will provide a reason for such mission-critical applications to use WinFS for “offline” scenarios. WinFS security will make this data at least as secure as saving it in an XML document, an Excel spreadsheet or an Access database, which is what people do with this data today when they go on a business trip.

4) “What I see in WinFS is the Windows-centric design philosophy”: Yes. So? We are investing a lot in making Windows the best platform out there. If it ends up being better than Linux... sorry. But I don't buy into the second part of the statement: “everyone is using Windows, so we can just assume that all of their data will be somewhere in the Windows domain!” The WinFS sync infrastructure will allow data to be moved into and out of WinFS as needed. The back end data sources don't have to be Windows. The work we are doing to support XML data in WinFS is also an indication that we don't believe all data is somewhere in the Windows domain.

posted @ 11:11 PM | Feedback (22)

WinFS Schema Language

In his recent interview with Jonathan Schwartz, Steve Gillmor makes the statement that “XSD [is] being baked into Office, but is being deprecated in favor of a new schema structure for WinFS.” This reflects some statements by Jon Udell. Dare had what I thought was a good response, and John Montgomery posted my opinion on this before I had my blog set up.

I'll say it as plainly as I can: the choice to not use XSD to describe WinFS types in no way deprecates the use of XSD to describe XML documents or data.

For a number of really good reasons, we decided early on that the first-class things WinFS would store would be items, relationships, and extensions, not XML elements and attributes. As such, we needed a language to describe our items, relationships, and extensions. Trying to do this with XSD was sort of like trying to describe dance movement using the language of physics.

Sure, we could have used some sort of XSD + Annotations - Features We Can't Support. But this is also a bad idea and would result in other kinds of criticism. It is interesting that nobody is saying we should have used a modified SQL grammar (CREATE ITEMTYPE Foo ...).

I want to make one final point: even though the first class things stored in WinFS are not elements and attributes, we are working on a really good XML storage story for WinFS. With the emphasis of Office on XML, it would just be plain stupid for us to do anything else. Oh, and I also think it is what our customers will want.

posted @ 10:41 PM | Feedback (27)

WinFS and Interfaces

Richard Tallent asks whether, in addition to a type hierarchy, there will be inherent support for interfaces instead of classes. This is a really interesting question, and answering it gives me an opportunity to talk about the third major type hierarchy in WinFS: extensions (the other two are items and relationships).

In CLR, interfaces are typically used to describe a behavioral contract. Since a class can implement multiple interfaces (but can be derived from only one class), interfaces allow for a kind of extensibility over time. An existing class can implement a new interface when adding new behavior allowing it to play a new role in a system.
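
As a quick illustration of that point in plain C# (nothing here is WinFS-specific, and every type name is made up for the example): a class has exactly one base class but can pick up additional interfaces, so an existing class can take on a new role later without changing its place in the class hierarchy.

```csharp
using System;

public interface IPrintable
{
    string Print();
}

// A contract added later; existing classes can opt in without changing
// their base class.
public interface IArchivable
{
    string Archive();
}

public class Document
{
    public string Title { get; set; }
}

// One base class, but any number of interfaces.
public class Report : Document, IPrintable, IArchivable
{
    public string Print() { return "Printing " + Title; }
    public string Archive() { return "Archiving " + Title; }
}

public static class InterfaceSketch
{
    public static void Main()
    {
        var report = new Report { Title = "Q4" };
        IPrintable printable = report;   // viewed through one role
        IArchivable archivable = report; // and through another
        Console.WriteLine(printable.Print());    // Printing Q4
        Console.WriteLine(archivable.Archive()); // Archiving Q4
    }
}
```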

WinFS schemas are primarily about describing data that will be stored, not behavior, so interfaces don't really fit into the WinFS data model. However, the API classes generated from the schema can implement behavior and can leverage interfaces just like any other CLR class.

But WinFS items do provide for an extensibility mechanism. In a schema I can define an extension type with properties. Instances of extensions can be attached to item instances. For example, if I wanted to add a “hair color” property to the standard Person item type I could define an extension as follows:

<ExtensionType Name="PhysicalDescription" BaseType="System.Storage.Extension">
  <Property Name="HairColor" Type="String" Size="50"/>
</ExtensionType>

I can add an instance of this extension to a Person item as follows:

ItemContext ic = ItemContext.Open();
Person p = Person.FindItemById( id );
PhysicalDescription d = new PhysicalDescription();
d.HairColor = "Brown";
p.Extensions.Add( d );
ic.SaveChanges();

I can find all Person items representing people with brown hair using the code:

ItemContext ic = ItemContext.Open();
FindResult result = Person.FindAll(
  "Extensions.Cast(PhysicalDescription).HairColor='Brown'");

foreach( Person p in result ) {
  Console.WriteLine( p.DisplayName );
}

posted @ 9:52 PM | Feedback (27)

Monday, November 17, 2003 #

Relationships

Shawn Smith and Robert McLaws have both asked questions which are hard to answer without explaining WinFS relationships. 

Relationships are used to construct complex structures from individual items. Relationships are just like the sticks that you use between the round spool thingies in Tinker Toys (the spool thingies are the items). A relationship has two ends (we call them source and target), and is always stuck between two items. An item can be the source and target of any number of relationships.

Like items, all relationships have a type. The relationship type hierarchy is rooted in the System.Storage.Relationship type. Also like item types, relationship types can define properties that will be stored with relationship instances. In addition, relationship types can specify the required source and target item types. 

Here is an example relationship type I could use to create a graph of Foo items:

<RelationshipType Name="FooToFoo" BaseType="System.Storage.Relationship">
    <Source Name="SourceFoo" Type="Foo"/>
    <Target Name="TargetFoo" Type="Foo"/>

    <Property Name="X" Type="System.Storage.WinFSTypes.Int32"/>
</RelationshipType>

If I have two Foo items, I can relate them as follows:

ItemContext ic = ItemContext.Open();
Foo f1 = ic.FindItemById( id1 );
Foo f2 = ic.FindItemById( id2 );
f1.OutRelationships.Add( new FooToFoo( f2 ) );
ic.SaveChanges();

So, now to answer the questions. WinFS comes with an item type “Folder” and relationship type “FolderMember”. The FolderMember relationship requires that the source item type be Folder but allows any target item type. So, Robert, because items are put in folders using relationships and an item can be targeted by more than one relationship, you can put an item in more than one folder. And Shawn, in WinFS foldering isn't really obsolete, just expanded to encompass a much more powerful concept: relationships.
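
Here is a toy in-memory model of that idea, assuming nothing about the real API: the Item, Folder, and FolderMember classes below are simplified stand-ins, and FoldersContaining is a hypothetical helper. Because folder membership is just a relationship that targets the item, the same item can show up in any number of folders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Name;
}

public class Folder : Item { }

// Hypothetical stand-in for a FolderMember relationship: the source must be
// a Folder, the target may be any Item.
public class FolderMember
{
    public Folder Source;
    public Item Target;
}

public static class FolderSketch
{
    // Returns the names of all folders that hold the given item.
    public static List<string> FoldersContaining(
        IEnumerable<FolderMember> members, Item item)
    {
        return members
            .Where(m => object.ReferenceEquals(m.Target, item))
            .Select(m => m.Source.Name)
            .ToList();
    }

    public static void Main()
    {
        var work = new Folder { Name = "Work" };
        var travel = new Folder { Name = "Travel" };
        var itinerary = new Item { Name = "Itinerary" };

        // Two relationships target the same item, one per folder.
        var members = new List<FolderMember>
        {
            new FolderMember { Source = work, Target = itinerary },
            new FolderMember { Source = travel, Target = itinerary },
        };

        Console.WriteLine(string.Join(", ", FoldersContaining(members, itinerary)));
        // Work, Travel
    }
}
```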

I'm glossing over some important details, such as the fact that there are three different relationship “modes” (holding, embedding, and relationship) that determine how the relationship controls the lifetime of the target object, how the name space exposed by WinFS is built up, and how security is inherited. Read this section of the Longhorn SDK and look at this PDC presentation for details.

posted @ 11:10 PM | Feedback (37)

Sunday, November 16, 2003 #

Items Without Files

Kirk Marple asks about WinFS being an “object storage layer over top of terabytes of geographically distributed files.” This is one of the scenarios we are thinking about. In my explanation of WinFS, I talked about schemas that are “likely to be used for file backed items most of the time.” The idea is that the schema doesn't dictate if it is a file backed item or not (file backedness isn't part of the type). That allows for situations where you have the object, but want to point to a file in a different location. Of course, this leads to synchronization issues but is acceptable in many circumstances.

posted @ 2:17 PM | Feedback (24)

Keeping Files and Items in Sync

In a comment to my explanation of WinFS, Shawn Smith wonders why we would not store all the data (including the file) in the database. In fact we are. The file storage itself is managed by the database engine. In the relational schema, there is a column with a binary type and an attribute that tells the system to store the data in a file in NTFS. Transactions are coordinated between the database and NTFS to maintain consistency. The advantage you get over just a normal binary column is that you can ask the system for a special UNC path that can be used to open the file for I/O, leveraging all the performance of NTFS. It really is the best of both worlds.

posted @ 2:10 PM | Feedback (18)

Saturday, November 15, 2003 #

WinFS = Windows Foo System

I need to clear up a misconception about WinFS: it isn't a system for storing metadata associated with files.

WinFS is an item store. Items have a type which defines the properties that make up that item. Item types are arranged in a type hierarchy, rooted with a base Item type. For example, here is the WinFS schema for a Foo item type:

<Schema Name="FooSchema" xmlns="http://schemas.microsoft.com/winfs/2002/11/18/schema">

  <Using Namespace="System.Storage"/>
  <Using Namespace="System.Storage.WinFSTypes"/>

  <ItemType Name="Foo" BaseType="System.Storage.Item">
    <Property Name="Bar" Type="System.Storage.WinFSTypes.Int32"/>
  </ItemType>

</Schema>

If I run this schema through our API generator and install the schema in WinFS (neither of which are possible with the PDC release, sorry), I can write the following code to create a Foo item in the store:

ItemContext ic = ItemContext.Open();
Foo foo = new Foo();
foo.Bar = 42;
Folder folder = Folder.GetRootFolder( ic );
folder.OutFolderMemberRelationships.AddMember( foo );

ic.Update();

I can find a Foo item using this code:

ItemContext ic = ItemContext.Open();
Foo foo = Foo.FindOne( ic, "Bar = 42" );

At no time was a file created in order to store my Foo item. The best conceptual analogy (and more or less the technical truth) is that a row was inserted into a database.

Now, it so happens that we are using this item store to store meta-data associated with a file. We do this as follows:

1) A file system redirector is part of WinFS. That is the thing that handles UNC paths like \\localhost\DefaultStore (if you are on a Longhorn box, you should be able to click on this link to open your WinFS store).

2) When a file is created through this redirector, WinFS creates an item that represents that file. This item and the file are tightly bound. When the file is deleted, the item is deleted.

3) WinFS looks up a registered file property handler using the file's extension. The handler knows what type of item to create and how to move meta-data between the file content and the item. In the PDC release, this happens only in one direction: from the file to the item. It will work in both directions before WinFS is finished.

In WinFS lingo, such an item is called a file backed item. It is distinct from the Foo item I created in the example above which we call a native item. Many of the types in the WinFS schemas we intend to ship with Windows are likely to be used for file backed items most of the time. For example, Document, Track, and Image. Other types will rarely be used for file backed items. Person, Organization, and Group come to mind.
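
The handler lookup in step 3 can be pictured as a dictionary keyed by file extension. This is a sketch under stated assumptions, not the actual registration mechanism: HandlerRegistrySketch and ItemTypeFor are hypothetical names, and the extension-to-type entries are guesses chosen to match the Document, Track, and Image types mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HandlerRegistrySketch
{
    // Maps a file extension to the item type a handler would create.
    // The entries are illustrative, not the real WinFS registrations.
    private static readonly Dictionary<string, string> ItemTypeByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", "Document" },
            { ".mp3", "Track" },
            { ".jpg", "Image" },
        };

    // Returns the item type for a file name, or null when no handler is
    // registered for its extension.
    public static string ItemTypeFor(string fileName)
    {
        string itemType;
        return ItemTypeByExtension.TryGetValue(Path.GetExtension(fileName), out itemType)
            ? itemType
            : null;
    }

    public static void Main()
    {
        Console.WriteLine(ItemTypeFor("song.MP3"));                     // Track
        Console.WriteLine(ItemTypeFor("notes.txt") ?? "(no handler)"); // (no handler)
    }
}
```

The case-insensitive comparer is a design choice for the sketch, since file extensions on Windows are not case sensitive.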

Note: I decided to change the schema and code examples in this post to a version that is a little newer than the one that shipped with the PDC release. We are making changes for good reason... like the new stuff makes more sense and is easier to explain.

posted @ 11:30 PM | Feedback (32)

WinFS at Scale

Jon Honeyball asks some questions about how well WinFS will handle very large amounts of data. Basically the thing to keep in mind is that WinFS is, at its heart, a relational database. There are very large databases built on the same technology. I would expect WinFS to ultimately have roughly the same capabilities. However, I can't say (because we don't yet know) exactly how much of this will be achieved in the Longhorn release.

posted @ 10:08 PM | Feedback (17)

WinFS Sync

Chris Adams asks some questions about how WinFS sync works. Andrej Budja has been doing some digging into the Longhorn SDK material on this subject and has pulled out some content that may answer these questions.

posted @ 9:50 PM | Feedback (22)