Implement data access

  • 10/11/2018

Skill 4.5: Store data in and retrieve data from collections

C# programs can create variables that can store single data values, but there are many situations in which data values need to be grouped together. Objects provide one form of grouping. You can create types that describe a particular item, for example the MusicTrack type that has been used throughout the preceding examples.

Collections are a different way of grouping data. A collection stores a large number of objects that are usually all of the same type, such as a list of MusicTrack instances. Unlike a database, which can be regarded as a service that provides data storage for an application, a collection is a structure that is held in computer memory, alongside the program text and other variables.

You have used collections throughout the text so far. Now it is time to bring together your knowledge of how they work, the different collection types, and most importantly, how to select the right kind of collection for a particular job.

Store and retrieve data by using dictionaries, arrays, lists, sets, and queues

Before deciding what kind of data storage to use in your programs, you need an understanding of the collection types available to programs. Let’s take a look at each in turn, starting with the simplest.

Use an array

An array is the simplest way to create a collection of items of a particular type. An array is assigned a size when it is created and the elements in the array are accessed using an index or subscript value. An array instance is a C# object that is managed by reference. A program creates an array by declaring the array variable and then making the variable refer to an array instance. Square brackets ([ and ]) are used to declare the array and also to create the array instance. The following statements create an array variable called intArray that can refer to arrays of integer values. The array variable intArray is then made to refer to a new array that contains five elements.

int [] intArray;
intArray = new int[5];

These statements can be combined into a single statement:

int [] intArray = new int[5];

An array of value types (for example an array of integers) holds the values themselves within the array, whereas for an array of reference types (for example an array of objects) each element in the array holds a reference to the object. When an array is created, each element in the array is initialized to the default value for that type. Numeric elements are initialized to 0, reference elements to null, and Boolean elements to false. Elements in an array can be updated by assigning a new value to the element.

Arrays implement the IEnumerable interface, so they can be enumerated using the foreach construction.

Once created, an array has a fixed length that cannot be changed, but an array reference can be made to refer to a new array object. An array can be initialized when it is created. An array provides a Length property that a program can use to determine the length of the array.

Listing 4-55 creates a new array, puts values into two elements, and then uses a for loop and a foreach construction to print out the contents of the array. It then replaces the existing array with a new one, which is initialized with four values, and prints out the contents of the new array using a foreach construction.

LISTING 4-55 Array example

// Array of integers
int[] intArray = new int[5];

intArray[0] = 99; // put 99 in the first element
intArray[4] = 100; // put 100 in the last element

// Use an index to work through the array
for (int i = 0; i < intArray.Length; i++)
    Console.WriteLine(intArray[i]);

// Use a foreach to work through the array
foreach (int intValue in intArray)
    Console.WriteLine(intValue);

// Initialize a new array
intArray = new int[] { 1, 2, 3, 4 };

// Use a foreach to work through the array
foreach (int intValue in intArray)
    Console.WriteLine(intValue);

Any array that uses a single index to access the elements in the array is called a one-dimensional array. It is analogous to a list of items. Arrays can have more than one dimension.

Multi-dimensional arrays

An array with two dimensions is analogous to a table of data that is made up of rows and columns. An array with three dimensions is analogous to a book containing a number of pages, with a table on each page. If you find yourself thinking that your program needs an array with more than three dimensions, you should think about arranging your data differently. The following code creates a two-dimensional array called compass, which holds the points of the compass. Elements in the array are accessed using a subscript value for each of the array dimensions. Note the use of the comma between the brackets in the declaration of the array variable. This denotes that the array has multiple dimensions.

string [,] compass = new string[3, 3]
{
    { "NW","N","NE" },
    {"W", "C", "E" },
    { "SW", "S", "SE" }
};

Console.WriteLine(compass[0, 0]);  // prints NW
Console.WriteLine(compass[2, 2]);  // prints SE
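
The fixed subscripts used above can be replaced by loops. The following is a minimal sketch, not part of the original listing, that assumes a helper method called FormatRows (a name invented for this illustration). It uses the GetLength method of the array to find the size of each dimension; the Length property gives the total number of elements.

```csharp
using System;

static class CompassGridDemo
{
    // Hypothetical helper: joins every row of a rectangular array into one line of text.
    public static string[] FormatRows(string[,] grid)
    {
        int rows = grid.GetLength(0);   // size of the first dimension
        int cols = grid.GetLength(1);   // size of the second dimension
        string[] lines = new string[rows];

        for (int r = 0; r < rows; r++)
        {
            string[] cells = new string[cols];
            for (int c = 0; c < cols; c++)
                cells[c] = grid[r, c];
            lines[r] = string.Join(" ", cells);
        }
        return lines;
    }

    static void Main()
    {
        string[,] compass =
        {
            { "NW", "N", "NE" },
            { "W",  "C", "E"  },
            { "SW", "S", "SE" }
        };

        foreach (string line in FormatRows(compass))
            Console.WriteLine(line);

        Console.WriteLine(compass.Length);  // total number of elements: 9
    }
}
```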

Jagged arrays

You can view a two-dimensional array as an array of one-dimensional arrays. A “jagged array” is an array of arrays in which the rows can have different lengths. You can see how to initialize one here:

int[][] jaggedArray = new int[][]
{
    new int[] { 1, 2, 3, 4 },
    new int[] { 5, 6, 7 },
    new int[] { 11, 12 }
};
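
Elements of a jagged array are accessed with one pair of brackets per level, and each row has its own Length. The sketch below is an illustration only; the Total method is a hypothetical helper that is not part of the original text.

```csharp
using System;

static class JaggedArrayDemo
{
    // Hypothetical helper: adds up every value, whatever the row lengths are.
    public static int Total(int[][] rows)
    {
        int total = 0;
        foreach (int[] row in rows)
            foreach (int value in row)
                total += value;
        return total;
    }

    static void Main()
    {
        int[][] jaggedArray =
        {
            new int[] { 1, 2, 3, 4 },
            new int[] { 5, 6, 7 },
            new int[] { 11, 12 }
        };

        // Each row reports its own length
        for (int r = 0; r < jaggedArray.Length; r++)
            Console.WriteLine("Row {0} has {1} elements", r, jaggedArray[r].Length);

        Console.WriteLine(jaggedArray[1][2]);   // third element of the second row: 7
        Console.WriteLine(Total(jaggedArray));  // 51
    }
}
```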

Use an ArrayList

The usefulness of an array is limited by the way you must decide in advance the number of items that are to be stored in the array. The size of an array cannot be changed once it has been created (although you can use a variable to set the dimension of the array if you wish). The ArrayList class was created to address this issue. An ArrayList stores data in a dynamic structure that grows as more items are added to it.

Listing 4-56 shows how it works. An ArrayList is created and three items are added to it. The items in an ArrayList can be accessed with a subscript in exactly the same way as elements in an array. The ArrayList provides a Count property that can be used to count how many items are present.

LISTING 4-56 ArrayList example

ArrayList arrayList = new ArrayList();

arrayList.Add(1);
arrayList.Add(2);
arrayList.Add(3);

for (int i = 0; i < arrayList.Count; i++)
    Console.WriteLine(arrayList[i]);

The ArrayList provides an Add method that adds items to the end of the list. There is also an Insert method that can be used to insert items in the list and a Remove method that removes items. There is more detail on the operations that can be performed on an ArrayList in the “Add and remove items from a collection” section.

Items in an ArrayList are managed by reference and the reference that is used is of type object. This means that an ArrayList can hold references to any type of object, since the object type is the base type of all of the types in C#. However, this can lead to some programming difficulties. This is discussed later in the “Use typed vs. non-typed collections” section.

Use a List

The List type makes use of the “generics” features of C#. You can find out more about generics in the “Generic types” section in Skill 2.1. When a program creates a List the type of data that the list is to hold is specified using C# generic notation. Only references of the specified type can be added to the list, and values obtained from the list are of the specified type. Listing 4-57 shows how a List is used. It creates a list of names, adds two names, and then prints out the list using a for loop. It then updates one of the names in the list and uses a foreach construction to print out the changed list.

LISTING 4-57 List example

List<string> names = new List<string>();

names.Add("Rob Miles");
names.Add("Immy Brown");

for (int i = 0; i < names.Count; i++)
    Console.WriteLine(names[i]);

names[0] = "Fred Bloggs";
foreach (string name in names)
    Console.WriteLine(name);

The List type implements the ICollection and IList interfaces. You can find out more about these interfaces later in the “Implement collection interfaces” section.

Use a dictionary

A Dictionary allows you to access data using a key. The name Dictionary is very appropriate. If you want to look up a definition of a word, you find the word in a dictionary and read the definition. In the case of an application, the key might be an account number or a username. The data can be a bank account or a user record.

Listing 4-58 shows how a Dictionary is used to implement bank account management. Each BankAccount object has a unique AccountNo property that can be used to locate the account. The program creates two accounts and then uses the AccountNo value to find them. You can think of the key as being a subscript that identifies the item in the dictionary. The key value is enclosed in square brackets, as it is for an array or a list.

LISTING 4-58 Dictionary example

BankAccount a1 = new BankAccount { AccountNo = 1, Name = "Rob Miles" };
BankAccount a2 = new BankAccount { AccountNo = 2, Name = "Immy Brown" };

Dictionary<int, BankAccount> bank = new Dictionary<int, BankAccount>();

bank.Add(a1.AccountNo, a1);
bank.Add(a2.AccountNo, a2);

Console.WriteLine(bank[1]);

if (bank.ContainsKey(2))
    Console.WriteLine("Account located");
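
The listing does not show the BankAccount class or what happens when a key is missing. The sketch below fills those gaps under stated assumptions: the BankAccount class shown here is hypothetical (only the AccountNo and Name properties are taken from the listing), and FindName is a helper invented for this illustration. It uses the TryGetValue method, which avoids the KeyNotFoundException that the indexer throws for an unknown key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical BankAccount class; the original listing does not show its definition.
class BankAccount
{
    public int AccountNo { get; set; }
    public string Name { get; set; }

    // Without this override, printing an account would show only the type name
    public override string ToString()
    {
        return AccountNo + ": " + Name;
    }
}

static class BankLookupDemo
{
    // Hypothetical helper: returns the holder's name, or null when the key is not present.
    public static string FindName(Dictionary<int, BankAccount> bank, int accountNo)
    {
        BankAccount account;
        if (bank.TryGetValue(accountNo, out account))
            return account.Name;
        return null;
    }

    static void Main()
    {
        var bank = new Dictionary<int, BankAccount>();
        bank.Add(1, new BankAccount { AccountNo = 1, Name = "Rob Miles" });

        Console.WriteLine(bank[1]);                     // uses ToString
        Console.WriteLine(FindName(bank, 1));           // Rob Miles
        Console.WriteLine(FindName(bank, 99) == null);  // True

        try
        {
            Console.WriteLine(bank[99]);                // the key is missing
        }
        catch (KeyNotFoundException)
        {
            Console.WriteLine("No account 99");
        }
    }
}
```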

A dictionary can be used to implement data storage, but it is also useful in very many other contexts. Listing 4-59 shows how to use a dictionary to count the frequency of words in a document. In this case the dictionary is indexed on a word and contains an integer that holds the count of that word. The document is loaded from a file into a string. The program extracts each word from the string. If the word is present in the dictionary the count for that word is incremented. If the word is not present in the dictionary, an entry is created for that word with a count of 1.

LISTING 4-59 Word counter

Dictionary<string, int> counters = new Dictionary<string, int>();

string text = File.ReadAllText("input.txt");

string[] words = text.Split(new char[] { ' ', '.', ',' },
    StringSplitOptions.RemoveEmptyEntries);

foreach (string word in words)
{
    string lowWord = word.ToLower();
    if (counters.ContainsKey(lowWord))
        counters[lowWord]++;
    else
        counters.Add(lowWord, 1);
}

Suppose that you would like the word counter to produce a sorted list of word counts. A List provides a Sort method, and an array can be sorted using the Array.Sort method. Unfortunately, the Dictionary class does not provide a sort behavior. However, you can use a LINQ query on a dictionary to produce a sorted iteration of the dictionary contents. This can be used by a foreach loop to generate sorted output. The code to do this is shown next. It requires careful study. Items in a Dictionary have Key and Value properties that are used for sorting and output. When trying the code on an early version of this text I found that the word “the” was used around twice as many times as the next most popular word, which was “a.”

var items = from pair in counters
            orderby pair.Value descending
            select pair;

foreach (var pair in items)
{
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
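
To experiment with the counting and sorting steps without an input file, the two fragments can be combined as in the sketch below. It is an illustration under assumptions: the CountWords method and the sample sentence are invented here, the counting uses TryGetValue rather than ContainsKey, and the sort uses the OrderByDescending method, which is the method-syntax form of the orderby query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordCountDemo
{
    // Hypothetical helper: counts each word in the text, ignoring case.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counters = new Dictionary<string, int>();
        string[] words = text.Split(new char[] { ' ', '.', ',' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            string lowWord = word.ToLower();
            int count;
            counters.TryGetValue(lowWord, out count);  // count is 0 if the word is new
            counters[lowWord] = count + 1;             // the indexer adds or updates
        }
        return counters;
    }

    static void Main()
    {
        var counters = CountWords("The cat sat on the mat. The end.");

        // Method-syntax equivalent of the orderby ... descending query
        foreach (var pair in counters.OrderByDescending(p => p.Value))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```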

Use a set

A set is an unordered collection of items. Each of the items in a set will be present only once. You can use a set to contain tags or attributes that might be applied to a data item. For example, information about a MusicTrack can include a set of style attributes. A track can be “Electronic,” “Disco,” and “Fast.” Another track can be “Orchestral,” “Classical,” and “Fast.” A given track is associated with a set that contains all of the attributes that apply to it. A music application can use set operations to select all of the music that meets particular criteria, for example you can find tracks that are both “Electronic” and “Disco.”

Some programming languages, such as Python, provide a set type that is part of the language. C# doesn’t have a built-in set type, but the HashSet class can be used to implement sets. Listing 4-60 shows how to create three sets of strings that hold the names of style attributes that can be applied to music tracks. The first two, t1Styles and t2Styles, give style information for two tracks. The third set is the search set that contains two style elements that you might want to search for in tracks. The HashSet class provides methods that implement set operations. The IsSubsetOf method returns true if the set it is called on is a subset of the set given as the argument. The program uses this method to determine which of the two tracks matches the search criteria.

LISTING 4-60 Set example

HashSet<string> t1Styles = new HashSet<string>();
t1Styles.Add("Electronic");
t1Styles.Add("Disco");
t1Styles.Add("Fast" );

HashSet<string> t2Styles = new HashSet<string>();
t2Styles.Add("Orchestral");
t2Styles.Add("Classical");
t2Styles.Add("Fast");

HashSet<string> search = new HashSet<string>();
search.Add("Fast");
search.Add("Disco");

if (search.IsSubsetOf(t1Styles))
    Console.WriteLine("All search styles present in T1");

if (search.IsSubsetOf(t2Styles))
    Console.WriteLine("All search styles present in T2");

Other set methods can be used to combine sets to produce unions and differences, and to test for supersets and subsets.
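
A minimal sketch of some of these other operations follows. Note that UnionWith, IntersectWith, and ExceptWith modify the set they are called on, so the sketch copies a set before combining it. The SharedStyles method is a hypothetical helper written for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class SetOperationsDemo
{
    // Hypothetical helper: returns the styles both sets share, leaving the inputs unchanged.
    public static HashSet<string> SharedStyles(HashSet<string> a, HashSet<string> b)
    {
        var shared = new HashSet<string>(a);  // copy, because IntersectWith modifies the set
        shared.IntersectWith(b);
        return shared;
    }

    static void Main()
    {
        var t1Styles = new HashSet<string> { "Electronic", "Disco", "Fast" };
        var t2Styles = new HashSet<string> { "Orchestral", "Classical", "Fast" };

        var shared = SharedStyles(t1Styles, t2Styles);
        Console.WriteLine(string.Join(", ", shared));      // Fast

        var all = new HashSet<string>(t1Styles);
        all.UnionWith(t2Styles);                           // union
        Console.WriteLine(all.Count);                      // 5

        var onlyT1 = new HashSet<string>(t1Styles);
        onlyT1.ExceptWith(t2Styles);                       // difference
        Console.WriteLine(onlyT1.Count);                   // 2

        Console.WriteLine(t1Styles.IsSupersetOf(shared));  // True
        Console.WriteLine(t1Styles.Overlaps(t2Styles));    // True
    }
}
```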

Use a queue

A queue provides short-term storage for data items. It is organized as a first-in-first-out (FIFO) collection. Items can be added to the queue using the Enqueue method and removed from the queue using the Dequeue method. There is also a Peek method that allows a program to look at the item at the front of the queue without removing it from the queue. A program can iterate through the items in a queue, and a queue also provides a Count property that gives the number of items in the queue. Listing 4-61 shows the usage of a simple queue that contains strings.

LISTING 4-61 Queue example

Queue<string> demoQueue = new Queue<string>();

demoQueue.Enqueue("Rob Miles");
demoQueue.Enqueue("Immy Brown");

Console.WriteLine(demoQueue.Dequeue());
Console.WriteLine(demoQueue.Dequeue());

The program will print “Rob Miles” first when it runs, because of the FIFO behavior of a queue. One potential use of a queue is for passing work items from one thread to another. If you are going to do this you should take a look at the ConcurrentQueue, which is described in Skill 1.1.
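
The Peek method, the Count property, and iteration can be sketched as follows. This is an illustration only; Drain is a hypothetical helper. It checks Count before each call to Dequeue because calling Dequeue on an empty queue throws an InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;

static class QueueDemo
{
    // Hypothetical helper: removes and returns every item, in the order the items were added.
    public static List<string> Drain(Queue<string> queue)
    {
        var items = new List<string>();
        while (queue.Count > 0)
            items.Add(queue.Dequeue());
        return items;
    }

    static void Main()
    {
        var demoQueue = new Queue<string>();
        demoQueue.Enqueue("Rob Miles");
        demoQueue.Enqueue("Immy Brown");

        Console.WriteLine(demoQueue.Peek());   // Rob Miles, still in the queue
        Console.WriteLine(demoQueue.Count);    // 2

        foreach (string name in demoQueue)     // iterating does not remove items
            Console.WriteLine(name);

        foreach (string name in Drain(demoQueue))
            Console.WriteLine(name);
        Console.WriteLine(demoQueue.Count);    // 0
    }
}
```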

Use a stack

A stack is very similar in use to a queue. The most important difference is that a stack is organized as last-in-first-out (LIFO). A program can use the Push method to push items onto the top of the stack and the Pop method to remove items from the stack. Listing 4-62 shows simple use of a stack. Note that the program prints out “Immy Brown” first, because that is the item on the top of the stack when the printing is first performed. There is a ConcurrentStack implementation that should be used if different Tasks are using the same stack.

LISTING 4-62 Stack example

Stack<string> demoStack = new Stack<string>();

demoStack.Push("Rob Miles");
demoStack.Push("Immy Brown");

Console.WriteLine(demoStack.Pop());
Console.WriteLine(demoStack.Pop());
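
One simple way to see the LIFO behavior at work is to reverse a sequence. The sketch below is not from the original text; the Reverse method is a hypothetical helper that pushes each character of a string onto a stack and then pops the characters back off.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StackDemo
{
    // Hypothetical helper: reverses a string using the LIFO order of a stack.
    public static string Reverse(string text)
    {
        var stack = new Stack<char>();
        foreach (char ch in text)
            stack.Push(ch);

        var result = new StringBuilder();
        while (stack.Count > 0)
            result.Append(stack.Pop());   // last character pushed comes out first
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Reverse("abc"));   // cba
    }
}
```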

Choose a collection type

The type of collection to use normally falls naturally from the demands of the application. If you need to store a list of values, use a List in preference to an array or ArrayList. An array is fixed in size and an ArrayList does not provide type safety. A List can only contain objects of the list type and can grow and contract. It is also very easy to remove a value from the middle of a list or insert an extra value.

Use an array if you are concerned about performance and you are holding value types, since the data will be accessed more quickly. The other occasion where arrays are useful is if a program needs to store two-dimensional data (for example a table of values made up of rows and columns). Alternatively, you can create an object that implements a row (and contains a List of the elements in the row) and then store a List of these objects.

If there is an obvious value in an object upon which it can be indexed (for example an account number or username), use a dictionary to store the objects and then index on that value. A dictionary is less useful if you need to locate a data value based on different elements, such as needing to find a customer based on their customer ID, name, or address. In that case, put the data in a List and then use LINQ queries on the list to locate items.
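
A search on more than one field might look like the following sketch. Everything in it is an assumption made for illustration: the Customer class, its property names, the sample data, and the FindByName and FindByAddress helpers do not appear in the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, used only for this illustration.
class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

static class CustomerSearchDemo
{
    // Returns the first customer with the given name, or null if there is none.
    public static Customer FindByName(List<Customer> customers, string name)
    {
        return customers.FirstOrDefault(c => c.Name == name);
    }

    // Returns every customer whose address matches.
    public static List<Customer> FindByAddress(List<Customer> customers, string address)
    {
        return customers.Where(c => c.Address == address).ToList();
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Id = 1, Name = "Rob Miles", Address = "1 High Street" },
            new Customer { Id = 2, Name = "Immy Brown", Address = "2 Low Road" },
            new Customer { Id = 3, Name = "Fred Bloggs", Address = "1 High Street" }
        };

        Console.WriteLine(FindByName(customers, "Immy Brown").Id);            // 2
        Console.WriteLine(FindByAddress(customers, "1 High Street").Count);   // 2
        Console.WriteLine(FindByName(customers, "Nobody") == null);           // True
    }
}
```

Each of these queries works through the whole list, whereas a dictionary lookup goes directly to the entry for a key. That is the trade-off behind the advice above.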

Sets can be useful when working with tags. Their built-in operations are much easier to use than writing your own code to match elements together. Queues and stacks are used when the needs of the application require FIFO or LIFO behavior.

Initialize a collection

The examples that you have seen have added values to collections by calling the collection methods to add the values. For example, use the Add method to add items to a List. However, there are quicker ways to initialize each type of object. Listing 4-63 shows the initialization process for each of the collections that we have just discussed.

LISTING 4-63 Collection initialization

int[] arrayInit = { 1, 2, 3, 4 };

ArrayList arrayListInit = new ArrayList { 1, "Rob Miles", new ArrayList() };

List<int> listInit = new List<int> { 1, 2, 3, 4 };

Dictionary<int, string> dictionaryInit = new Dictionary<int, string>
{
    { 1, "Rob" },
    { 2, "Immy" }
};

HashSet<string> setInit = new HashSet<string> { "Electronic", "Disco", "Fast" };

Queue<string> queueInit = new Queue<string>(new string[] { "Rob", "Immy" });

Stack<string> stackInit = new Stack<string>(new string[] { "Rob", "Immy" });
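
When a queue or a stack is initialized from an array, the items are added in array order. The sketch below, which uses two hypothetical helper methods, shows the consequence: the queue gives back the first array element first, while the stack gives back the last one first.

```csharp
using System;
using System.Collections.Generic;

static class InitOrderDemo
{
    // Hypothetical helper: first item removed from a queue built from the array.
    public static string FirstFromQueue(string[] items)
    {
        var queue = new Queue<string>(items);
        return queue.Dequeue();
    }

    // Hypothetical helper: first item removed from a stack built from the array.
    public static string FirstFromStack(string[] items)
    {
        var stack = new Stack<string>(items);
        return stack.Pop();
    }

    static void Main()
    {
        string[] names = { "Rob", "Immy" };

        Console.WriteLine(FirstFromQueue(names));  // Rob: first item in the array
        Console.WriteLine(FirstFromStack(names));  // Immy: last item in the array
    }
}
```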

Add and remove items from a collection

Some of the collection types that we have discussed provide support for adding and removing elements. Here are the methods for each collection type.

Add and remove items from an array

An array does not provide any methods that can add or remove elements. The size of an array is fixed when the array is created. The only way to modify the size of an existing array is to create a new array of the required type and then copy the elements from one to the other. Arrays provide a CopyTo method that will copy the contents of an array into another array. The first parameter of CopyTo is the destination array. The second parameter is the start position in the destination array for the copied values. Listing 4-64 shows how this can be used to migrate an array into a larger one. The new array has one extra element, but you can make it much larger than this if required. Note that because arrays are objects managed by reference, you can make the dataArray reference refer to the newly created array.

LISTING 4-64 Grow an array

int[] dataArray = { 1, 2, 3, 4 };
int[] tempArray = new int[5];
dataArray.CopyTo(tempArray, 0);
dataArray = tempArray;

Add and remove items in ArrayList and List

There are a number of methods that can be used to modify the contents of the ArrayList and List collections. Listing 4-65 shows them in action.

LISTING 4-65 List modification

List<string> list = new List<string>();
list.Add("add to end of list");    // add to the end of the list
list.Insert(0, "insert at start"); // insert an item at the start
list.Insert(1, "insert new item 1"); // insert at position
list.InsertRange(2, new string[] { "Rob", "Immy" }); // insert a range
list.Remove("Rob");               // remove first occurrence of "Rob"
list.RemoveAt(0);                 // remove element at the start
list.RemoveRange(1, 2);           // remove two elements
list.Clear();                     // clear entire list

Add and remove items from a Dictionary

The Dictionary type provides Add and Remove methods, as shown in Listing 4-66.

LISTING 4-66 Dictionary modification

Dictionary<int, string> dictionary = new Dictionary<int, string>();
dictionary.Add(1, "Rob Miles");  // add an entry
dictionary.Remove(1);            // remove the entry with the given key
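
A few details of these methods are worth seeing in action: Add throws an ArgumentException if the key is already present, the indexer can be used to replace a value, and Remove returns a value indicating whether an entry was removed. The sketch below illustrates them; TryAddEntry is a hypothetical helper written for this example.

```csharp
using System;
using System.Collections.Generic;

static class DictionaryModifyDemo
{
    // Hypothetical helper: adds the entry only if the key is free.
    public static bool TryAddEntry(Dictionary<int, string> dictionary, int key, string value)
    {
        if (dictionary.ContainsKey(key))
            return false;
        dictionary.Add(key, value);
        return true;
    }

    static void Main()
    {
        var dictionary = new Dictionary<int, string>();

        Console.WriteLine(TryAddEntry(dictionary, 1, "Rob Miles"));   // True
        Console.WriteLine(TryAddEntry(dictionary, 1, "Immy Brown"));  // False

        try
        {
            dictionary.Add(1, "Immy Brown");   // Add throws for an existing key
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Key 1 is already in use");
        }

        dictionary[1] = "Fred Bloggs";             // the indexer replaces the value
        Console.WriteLine(dictionary[1]);          // Fred Bloggs

        Console.WriteLine(dictionary.Remove(1));   // True: entry removed
        Console.WriteLine(dictionary.Remove(1));   // False: nothing to remove
    }
}
```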

Add and remove items from a Set

The HashSet type provides Add, Remove, and RemoveWhere methods. Listing 4-67 shows how they are used. The RemoveWhere method is given a predicate (a behavior that generates either true or false) to determine which elements are to be removed. In the listing the predicate is a lambda expression that evaluates to true if the element in the set starts with the character 'R'.

LISTING 4-67 Set modification

HashSet<string> set = new HashSet<string>();
set.Add("Rob Miles");    // add an item
set.Remove("Rob Miles"); // remove an item
set.RemoveWhere(x => x.StartsWith("R")); // remove all items that start with 'R'
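
The return values of these methods tell a program what actually happened. In the sketch below, which uses invented sample names and a hypothetical RemoveByPrefix helper, Add returns false for an item that is already present, and RemoveWhere returns the number of items that it removed.

```csharp
using System;
using System.Collections.Generic;

static class SetModifyDemo
{
    // Hypothetical helper: removes every name that starts with the prefix
    // and returns how many were removed.
    public static int RemoveByPrefix(HashSet<string> names, string prefix)
    {
        return names.RemoveWhere(x => x.StartsWith(prefix, StringComparison.Ordinal));
    }

    static void Main()
    {
        var set = new HashSet<string>();

        Console.WriteLine(set.Add("Rob Miles"));     // True: item added
        Console.WriteLine(set.Add("Rob Miles"));     // False: already present
        set.Add("Ruth Smith");
        set.Add("Immy Brown");

        Console.WriteLine(RemoveByPrefix(set, "R")); // 2
        Console.WriteLine(set.Count);                // 1
        Console.WriteLine(set.Remove("Nobody"));     // False: not in the set
    }
}
```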

Add and remove items in Queue and Stack

The most important aspect of the behavior of queues and stacks is the way that items are added and removed. For this reason, the only actions that allow their contents to be changed are the ones you have seen earlier.

Use typed vs. non-typed collections

When discussing the ArrayList collection type we noted that there is nothing to stop a programmer putting objects of different types in the same ArrayList. The following code adds an integer, a string, and even an ArrayList to the ArrayList called messyList. While this may be something you want to do, it is not good programming practice.

ArrayList messyList = new ArrayList();
messyList.Add(1); // add an integer to the list
messyList.Add("Rob Miles"); // add a string to the list
messyList.Add(new ArrayList()); // add an ArrayList to the list

Another difficulty caused by the untyped storage provided by the ArrayList is that all of the references in the list are references to objects. When a program retrieves an item from an ArrayList it must cast the item into its proper type before the item can be used. In other words, to use the int value at subscript 0 of the messyList above, you must cast it to an int first, as shown here:

int messyInt = (int) messyList[0];

Note that if you are confused to see the value type int being used in an ArrayList, where the contents are managed by reference, you should read the “Boxing and unboxing” section in Skill 2.2.

These problems occur because an ArrayList is an untyped collection. For this reason, the ArrayList has been superseded by the List type, which uses the generics features in later versions of C# to allow a programmer to specify the type of item that the list should hold. It is recommended that you use the List type when you want to store collections of items.
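
The difference shows up when a cast goes wrong. In the sketch below, written for illustration with a hypothetical SumInts helper, a bad cast on an ArrayList element compiles but fails at run time with an InvalidCastException, whereas the equivalent mistake with a List is rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class TypedVsUntypedDemo
{
    // Hypothetical helper: adds up only the int values in an untyped list.
    public static int SumInts(ArrayList items)
    {
        int total = 0;
        foreach (object item in items)
            if (item is int)
                total += (int)item;   // unboxing cast
        return total;
    }

    static void Main()
    {
        ArrayList messyList = new ArrayList { 1, "Rob Miles", 2 };

        try
        {
            int bad = (int)messyList[1];   // compiles, but the element is a string
            Console.WriteLine(bad);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Element 1 is not an int");
        }

        Console.WriteLine(SumInts(messyList));   // 3

        List<int> tidyList = new List<int> { 1, 2 };
        // tidyList.Add("Rob Miles");            // would not compile
        int first = tidyList[0];                 // no cast required
        Console.WriteLine(first);                // 1
    }
}
```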

Implement custom collections

A custom collection is a collection that you create for a specific purpose that has behaviors that you need in your application. One way to make a custom collection is to create a new type that implements the ICollection interface. This can then be used in the same way as any other collection, such as with a program iterating through your collection using a foreach construction. We will describe how to implement a collection interface in the next section.

Another way to create a custom collection is to use an existing collection class as the base (parent) class of a new collection type. You can then add new behaviors to your new collection and, because it is based on an existing collection type, your collection can be used in the same way as any other collection.

Listing 4-68 shows how to create a custom MusicTrack store, which is based on the List collection type. The method RemoveArtist has been added to the new type so that a program can easily remove all the tracks by a particular artist. Note how the RemoveArtist method creates a list of items to be removed and then removes them. This is to prevent problems that can be caused by removing items in a collection at the same time as iterating through the collection. If you investigate the sample program for Listing 4-68, you will find that the TrackStore class also contains a ToString method and a static GetTestTrackStore method that can be used to create a store full of sample tracks.

LISTING 4-68 Custom collection

class TrackStore : List<MusicTrack>
{
    public int RemoveArtist(string removeName)
    {
        List<MusicTrack> removeList = new List<MusicTrack>();
        foreach (MusicTrack track in this)
            if (track.Artist == removeName)
                removeList.Add(track);

        foreach (MusicTrack track in removeList)
            this.Remove(track);

        return removeList.Count;
    }
}
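
The sketch below shows how such a store might be used. It is an illustration under assumptions rather than the sample program itself: the MusicTrack class here is a hypothetical minimal version, the sample data is invented, and RemoveArtist is written as a variant that uses the RemoveAll method inherited from List, which removes matching items and returns how many were removed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal MusicTrack; the class used in the text has more members.
class MusicTrack
{
    public string Artist { get; set; }
    public string Title { get; set; }
}

class TrackStore : List<MusicTrack>
{
    // Variant of RemoveArtist that relies on the inherited RemoveAll method.
    public int RemoveArtist(string removeName)
    {
        return RemoveAll(track => track.Artist == removeName);
    }
}

static class TrackStoreDemo
{
    // Hypothetical helper: builds a small store of invented sample tracks.
    public static TrackStore CreateSampleStore()
    {
        return new TrackStore
        {
            new MusicTrack { Artist = "Rob Miles", Title = "My Way" },
            new MusicTrack { Artist = "Immy Brown", Title = "Your Way" },
            new MusicTrack { Artist = "Rob Miles", Title = "His Way" }
        };
    }

    static void Main()
    {
        TrackStore store = CreateSampleStore();

        Console.WriteLine(store.RemoveArtist("Rob Miles"));  // 2
        Console.WriteLine(store.Count);                      // 1

        // The store is still a List, so it can be enumerated and indexed
        foreach (MusicTrack track in store)
            Console.WriteLine(track.Artist + " " + track.Title);
    }
}
```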

Implement collection interfaces

The behavior of a collection type is expressed by the ICollection interface. The ICollection interface is a child of the IEnumerable interface. Interface hierarchies work in exactly the same way as class hierarchies, in that a child of a parent interface contains all of the methods that are described in the parent. This means that a type that implements the ICollection interface is capable of being enumerated. For more details on the IEnumerable interface, consult the “IEnumerable” section in Skill 2.4.

The class in Listing 4-69 implements the methods in the ICollection interface. It contains an array of fixed values that give four points of a compass. It can be used in the same way as any other collection, and can be enumerated as it provides a GetEnumerator method. Note that the collection interface does not specify any methods that determine how (or indeed whether) a program can add and remove values.

LISTING 4-69 ICollection interface

class CompassCollection : ICollection
{
    // Array containing values in this collection
    string[] compassPoints = { "North", "South", "East", "West" };

    // Count property to return the length of the collection
    public int Count
    {
        get { return compassPoints.Length; }
    }

    // Returns an object that can be used to synchronize
    // access to this object
    public object SyncRoot
    {
        get { return this; }
    }

    // Returns true if the collection is thread safe
    // This collection is not
    public bool IsSynchronized
    {
        get { return false; }
    }

    // Provide a copyto behavior
    public void CopyTo(Array array, int index)
    {
        foreach (string point in compassPoints)
        {
            array.SetValue(point, index);
            index = index + 1;
        }
    }

    // Required for IEnumerable
    // Returns the enumerator from the embedded array
    public IEnumerator GetEnumerator()
    {
        return compassPoints.GetEnumerator();
    }
}

Note that if you want the new collection type to be used with LINQ queries it must implement the IEnumerable<type> interface. This means that the type must contain a GetEnumerator() method that returns an IEnumerator<type>; for a collection of strings this is an IEnumerator<string>.
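
A minimal sketch of a compass collection that supports LINQ is shown below. It is an illustration only: the CompassPoints class and the PointsContaining helper are hypothetical names. The class provides the generic GetEnumerator method and also an explicit non-generic one, which is needed because IEnumerable<string> is itself a child of IEnumerable.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that can be used as the source of a LINQ query.
class CompassPoints : IEnumerable<string>
{
    string[] compassPoints = { "North", "South", "East", "West" };

    // Generic enumerator, used by foreach and by LINQ
    public IEnumerator<string> GetEnumerator()
    {
        foreach (string point in compassPoints)
            yield return point;
    }

    // Non-generic version, required because IEnumerable<T> extends IEnumerable
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

static class CompassLinqDemo
{
    // Hypothetical helper: sorted list of the points that contain the given text.
    public static List<string> PointsContaining(string text)
    {
        var query = from point in new CompassPoints()
                    where point.Contains(text)
                    orderby point
                    select point;
        return query.ToList();
    }

    static void Main()
    {
        foreach (string point in PointsContaining("th"))
            Console.WriteLine(point);   // North, then South
    }
}
```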