Tuesday, March 20, 2012

VirtualizingDataList – Displaying Large Datasets

In the last few posts I have introduced a number of features from the data framework included as part of the open-source Cocoon framework for Windows 8 Metro-style apps. Following an introduction to the framework, I described the SimpleDataListSource and PagedDataListSource.
To recap, the IDataListSource implementations provide a simple way of accessing data from web APIs. They have been designed to reflect the typical structure of such APIs, where data is generally retrieved through a series of stateless HTTP requests, often a page at a time.
The DataList implementations sit on top of an IDataListSource: they request its data and display it to the user. The aim here is to provide the user experience expected of a client application, with large scrolling lists of items and suitable support for on-demand data retrieval in the background.

Introducing The VirtualizingDataList

The first DataList implementation provided in the Cocoon data framework is the VirtualizingDataList<T> class. It suits situations where a large scrolling list displays only a small subset of the data at any one time. Only the data in view is retrieved, with more fetched on demand as the user scrolls through the list. In addition, it supports the WinRT ISupportPlaceholder interface, which tells the displaying UI to show a placeholder element whilst data is being retrieved in the background. It builds on top of Cocoon’s VirtualizingVector class, so more information on data virtualization and placeholder support can be found in the associated post.
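To make the idea of on-demand retrieval with placeholders a little more concrete, here is a minimal sketch of the general technique. It is not Cocoon's VirtualizingVector or VirtualizingDataList code: the OnDemandPages<T> class, its members and the fetch delegate are all hypothetical names chosen for illustration, and it assumes it is only ever used from a single (UI) thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration of on-demand paging; not part of Cocoon.
public class OnDemandPages<T> where T : class
{
    private readonly int pageSize;
    private readonly Func<int, Task<IList<T>>> fetchPageAsync; // takes a 0-based page index
    private readonly Dictionary<int, IList<T>> loaded = new Dictionary<int, IList<T>>();
    private readonly HashSet<int> pending = new HashSet<int>();

    // Raised with the page index once that page's items are available.
    public event Action<int> PageLoaded;

    public OnDemandPages(int pageSize, Func<int, Task<IList<T>>> fetchPageAsync)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        if (fetchPageAsync == null) throw new ArgumentNullException("fetchPageAsync");
        this.pageSize = pageSize;
        this.fetchPageAsync = fetchPageAsync;
    }

    // Returns the item if its page has been loaded; otherwise returns null as a
    // placeholder and starts fetching the page that contains the item.
    public T GetItemOrPlaceholder(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException("index");

        int pageIndex = index / pageSize;
        IList<T> page;
        if (loaded.TryGetValue(pageIndex, out page))
        {
            int offset = index % pageSize;
            return offset < page.Count ? page[offset] : null;
        }

        // Only start one fetch per page, however many items in it are requested.
        if (pending.Add(pageIndex))
        {
            Task ignored = LoadPageAsync(pageIndex);
        }
        return null;
    }

    // Error handling is omitted: a failed fetch leaves the page marked as pending.
    private async Task LoadPageAsync(int pageIndex)
    {
        IList<T> page = await fetchPageAsync(pageIndex);
        loaded[pageIndex] = page;
        pending.Remove(pageIndex);

        Action<int> handler = PageLoaded;
        if (handler != null) handler(pageIndex);
    }
}
```

A list control would ask for items by index as they scroll into view, draw a placeholder for each null, and redraw the affected items when PageLoaded fires.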
To use a VirtualizingDataList<T> you simply create a new instance, passing the IDataListSource via its constructor. It can then be bound to any of the many controls for displaying lists of information, most commonly the GridView and ListView controls.
If you are following the MVVM presentation pattern then you will expose the data list via a view model:

public class MyViewModel
{
    // *** Fields ***
 
    private IList<Person> employees;
 
    // *** Constructors ***
 
    public MyViewModel()
    {
        IDataListSource<Person> source = new EmployeesDataListSource();
        employees = new VirtualizingDataList<Person>(source);
    }
 
    // *** Properties ***
 
    public IList<Person> Employees
    {
        get
        {
            return employees;
        }
    }
}

Then in your XAML you can define a CollectionViewSource that binds to the data list, and connect that to your items control:

<UserControl.Resources>
    <CollectionViewSource x:Name="itemsViewSource" Source="{Binding Employees}"/>
</UserControl.Resources>
 
...
 
<GridView ItemsSource="{Binding Source={StaticResource itemsViewSource}}" ... />

In fact, this is exactly what the Visual Studio 11 Metro-style XAML templates will produce for you.

Summary

In the discussion above I have shown how easy it is to consume any IDataListSource and display it to the user through the fluid scrollable user experience expected of modern applications. When I introduced the Cocoon data framework I discussed “bridging the data divide” between the stateless HTTP calls of web APIs and the continuous scrollable lists displayed to the user. By coding the former as IDataListSource implementations, and the latter as DataLists, Cocoon provides the bridge to span these two worlds.

As usual the code is freely available for download from the Cocoon CodePlex site (to get the latest version go to the “Source Code” tab, select the first change set and use the “Download” link).

Notes

Please note that, due to a known issue (see this forum post), the Visual Studio templates in the Consumer Preview disable virtualization support for the GridView. Therefore, although the VirtualizingVector and VirtualizingDataList classes will still work, they unfortunately fetch all items in the collection rather than fetching on demand. To re-enable the virtualization support you can remove the ScrollViewer that surrounds the GridView; however, this will clip elements a little strangely. Hopefully this will be resolved in the final release of Visual Studio, although I am working on a temporary workaround.

4 comments:

jorney said...

I have tried to use Cocoon to show my local photos. When the app starts up it consumes 200MB of memory, but when I scroll to the end it consumes 1.4GB, and it never decreases. Is this the behavior as designed? Thanks very much!

Unknown said...

jorney,

The short answer to your question is - yes.

The long answer is that the virtualization support in Cocoon is based upon the VirtualizingList implementation internally. This currently is fairly dumb and is optimised for the case where the user is likely to scroll through the first few pages of items in the list. As you have observed, once items are fetched they are kept in memory until the list is disposed.

Obviously for cases where the user scrolls through the whole list, this is a poor experience and one that I am looking to address in future versions of Cocoon. In particular,

1. Ensuring that if the user scrolls quickly past a large number of items only those that the user stops at are retrieved (should help for common cases with large lists)

2. A better virtualization strategy for VirtualizingList whereby items are disposed if they are not being used for a period of time.

I hope this answers your question - please be assured that the intent with Cocoon is to improve on these areas as the project develops.

GMan said...

Andy,
Are you able to put together a sample that illustrates the use of placeholders and data virtualization, retrieving a simple in-memory list of 2000 items and displaying them in a fluid ListView with the most minimal code required? It's not just access over remote APIs that is slow; I am hitting a local DB via Entity Framework, and even in-memory bindings are slow.

I'm also hoping to somehow get the collection of items in code-behind prior to the data binding taking place, because I then need to group the items for a new collection view source afterwards, and no matter what I do my items.Count == 0 in your sample or in mine. It seems to fire the threading process to perform the API request as the late binding is occurring, but that's too late for me :(

Unknown said...

GMan,

Although it wasn't the main aim, you should be able to use the Cocoon data list support to access local data in the same way as it does remote data.

As far as I am aware the Entity Framework is not (currently) available from Metro-style applications, so I'm not sure how this would look in practice. In principle you have two options (both by implementing the PagedDataListSource),

1. Return a page size of one and retrieve each item from the database (asynchronously) in the FetchPageAsync(...) method.

2. For performance you may be better retrieving the items in batches, in which case you can choose a suitable page size and place the relevant logic in FetchPageAsync(...).
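As a rough illustration of the second option, the sketch below serves a local in-memory collection in batches. It deliberately does not derive from the Cocoon types: the LocalPagedSource<T> class, its constructor and the FetchPageAsync(int pageNumber) signature shown here are hypothetical stand-ins, so the members you actually override on PagedDataListSource may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a paged source; not the Cocoon PagedDataListSource API.
public class LocalPagedSource<T>
{
    private readonly IList<T> items;
    private readonly int pageSize;

    public LocalPagedSource(IList<T> items, int pageSize)
    {
        if (items == null) throw new ArgumentNullException("items");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        this.items = items;
        this.pageSize = pageSize;
    }

    // The total is already known for local data, so it is returned immediately.
    public Task<int> GetCountAsync()
    {
        return Task.FromResult(items.Count);
    }

    // Returns the items for a 1-based page number, doing the work off the calling thread.
    // A page number past the end of the data yields an empty list.
    public Task<IList<T>> FetchPageAsync(int pageNumber)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");

        return Task.Run<IList<T>>(() =>
            items.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToList());
    }
}
```

With a database the Skip/Take pair would become the equivalent paged query, and the page size can be tuned so that each batch roughly covers a screenful or two of items.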

Regarding your second question, the VirtualizingDataList is designed to work well with UI controls, which expect results to be returned synchronously. Therefore the first call to items.Count always returns zero (since we don't yet know how many items there will be). This kicks off the retrieval in the background which, when complete, updates the Count property with the correct value before raising property and collection changed events. The UI control then re-queries the Count property, which now returns the actual value.
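The pattern is perhaps easier to see in code. The following is only a sketch of the general idea, not the VirtualizingDataList implementation: the LazyCount class and its delegate parameter are hypothetical, it raises only a property changed event (a real list would also raise collection changed events), and it assumes Count is read from the UI thread.

```csharp
using System;
using System.ComponentModel;
using System.Threading.Tasks;

// Hypothetical illustration only; not part of Cocoon.
public class LazyCount : INotifyPropertyChanged
{
    private readonly Func<Task<int>> getCountAsync;
    private int count;      // zero until the background fetch completes
    private Task loadTask;  // non-null once the fetch has been started

    public event PropertyChangedEventHandler PropertyChanged;

    public LazyCount(Func<Task<int>> getCountAsync)
    {
        if (getCountAsync == null) throw new ArgumentNullException("getCountAsync");
        this.getCountAsync = getCountAsync;
    }

    // Returns the value known so far and starts the fetch on the first read.
    public int Count
    {
        get
        {
            if (loadTask == null)
            {
                loadTask = LoadCountAsync();
            }
            return count;
        }
    }

    private async Task LoadCountAsync()
    {
        count = await getCountAsync();

        // Tell any bound control to re-query Count now the real value is known.
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs("Count"));
        }
    }
}
```

Code that reads Count once and carries on will only ever see the initial zero; it needs either to listen for the change notification or to await the count itself.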

For your use case you need to be able to await the count (using the new C# async support). You can do this by connecting to the IDataListSource directly with:

int count = await dataListSource.GetCountAsync();