Implement data access
- 10/11/2018
Skill 4.4: Serialize and deserialize data by using binary serialization, custom serialization, XML Serializer, JSON Serializer, and Data Contract Serializer
We have already explored the serialization process in Skill 2.5, “The Serializable attribute,” Skill 3.1, “JSON and C#,” Skill 3.1, “JSON and XML,” and Skill 3.1, “Validate JSON data.” You should read these sections before continuing with this one.
Serialization does not store any of the active elements in an object. The behaviors (methods) in a class are not stored when it is serialized. This means that the application deserializing the data must have implementations of the classes that can be used to manipulate the data after it has been read.
Serialization is a complex process. If a data structure contains a graph of objects that have a large number of associations between them, the serialization process will have to persist each of these associations in the stored file.
Serialization is best used for transporting data between applications. You can think of it as transferring the “value” of an object from one place to another. Serialization can be used for persisting data, and a serialized stream can be directed into a file, but this is not normally how applications store their state. Using serialization can lead to problems if the structure or behavior of the classes implementing the data storage changes during the lifetime of the application. In this situation developers may find that previously serialized data is not compatible with the new design.
Sample data
We are going to use some sample music track data to illustrate how serialization works. The code shown next defines the MusicTrack, Artist, and MusicDataStore classes that you are going to be working with. The MusicDataStore type holds lists of MusicTrack and Artist values. It also provides a static method called TestData that creates a test music store that can be used in our examples.
class Artist
{
    public string Name { get; set; }
}

[Serializable]
class MusicTrack
{
    public Artist Artist { get; set; }
    public string Title { get; set; }
    public int Length { get; set; }
}

[Serializable]
class MusicDataStore
{
    List<Artist> Artists = new List<Artist>();
    List<MusicTrack> MusicTracks = new List<MusicTrack>();

    public static MusicDataStore TestData()
    {
        MusicDataStore result = new MusicDataStore();
        // create the same test data set as used for the LINQ examples
        return result;
    }
}
Use binary serialization
There are essentially two kinds of serialization that a program can use: binary serialization and text serialization. In Skill 4.1, “Convert text to binary data with Unicode,” we noted that a file actually contains binary data (a sequence of 8-bit values). A Unicode text file contains 8-bit values that represent text. Binary serialization imposes its own format on the data that is being serialized, mapping the data onto a stream of 8-bit values. The data in a stream created by a binary serializer can only be read by a corresponding binary deserializer. Binary serialization can provide a complete “snapshot” of the source data. Both public and private data in an object will be serialized, and the type of each data item is preserved.
Classes to be serialized by the binary serializer must be marked with the [Serializable] attribute as shown below for the Artist class.
[Serializable]
class Artist
{
    public string Name { get; set; }
}
The binary serialization classes are held in the System.Runtime.Serialization.Formatters.Binary namespace. The following code shows how binary serialization is performed. It creates a test MusicDataStore object and then saves it to a binary file. An instance of the BinaryFormatter class provides a Serialize behavior that accepts a stream and an object as parameters. The Serialize behavior serializes the object to the stream.
MusicDataStore musicData = MusicDataStore.TestData();
BinaryFormatter formatter = new BinaryFormatter();

// FileMode.Create replaces any existing file, so no bytes from an older, longer file are left behind
using (FileStream outputStream =
    new FileStream("MusicTracks.bin", FileMode.Create, FileAccess.Write))
{
    formatter.Serialize(outputStream, musicData);
}
An instance of the BinaryFormatter class also provides a behavior called Deserialize that accepts a stream and returns an object that it has deserialized from that stream. Listing 4-48 shows how to deserialize an object from a binary file. The code uses a cast to convert the object returned by the Deserialize method into a MusicDataStore. The sample file for this listing also contains the previously shown serialize code.
LISTING 4-48 Binary serialization
MusicDataStore inputData;

using (FileStream inputStream =
    new FileStream("MusicTracks.bin", FileMode.Open, FileAccess.Read))
{
    inputData = (MusicDataStore)formatter.Deserialize(inputStream);
}
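If there are data elements in a class that should not be stored, they can be marked with the NonSerialized attribute as shown next. The attribute can only be applied to fields, so tempData is declared as a field rather than a property. The tempData field will not be serialized.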
[Serializable]
class Artist
{
    public string Name { get; set; }

    [NonSerialized]
    int tempData;
}
Binary serialization is the only serialization technique that serializes private data members by default (that is, without the developer asking). A file created by a binary serializer can contain private data members from the object being serialized. Note, however, that once an object has been serialized there is nothing to stop a devious programmer from working with the serialized data, perhaps viewing and tampering with the values inside it. This means that a program should treat deserialized inputs with suspicion. Furthermore, any security-sensitive information in a class should be explicitly marked NonSerialized. One way to improve the security of a binary serialized file is to encrypt the stream before it is stored and decrypt it before deserialization.
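The encrypt-then-store idea can be sketched as follows. This is a minimal illustration, not the book's sample code: the EncryptedStore class and its Save and Load methods are hypothetical names, the key and initialization vector are simply generated in memory, and DataContractSerializer stands in for BinaryFormatter because BinaryFormatter is obsolete in recent .NET releases. The same wrapping should apply to any serializer that writes to a Stream.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Security.Cryptography;

[DataContract]
public class Artist
{
    [DataMember] public string Name { get; set; }
}

// Hypothetical helper: serializes through a CryptoStream so only encrypted bytes are stored.
public static class EncryptedStore
{
    public static byte[] Save(Artist artist, byte[] key, byte[] iv)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Artist));
        using (Aes aes = Aes.Create())
        using (ICryptoTransform encryptor = aes.CreateEncryptor(key, iv))
        using (MemoryStream output = new MemoryStream())
        {
            using (CryptoStream crypto =
                new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            {
                serializer.WriteObject(crypto, artist);
            } // disposing the CryptoStream writes the final encrypted block

            return output.ToArray();
        }
    }

    public static Artist Load(byte[] encrypted, byte[] key, byte[] iv)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Artist));
        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        using (MemoryStream input = new MemoryStream(encrypted))
        using (CryptoStream crypto =
            new CryptoStream(input, decryptor, CryptoStreamMode.Read))
        using (MemoryStream plain = new MemoryStream())
        {
            // Decrypt everything first, then deserialize the plain data
            crypto.CopyTo(plain);
            plain.Position = 0;
            return (Artist)serializer.ReadObject(plain);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] key, iv;
        using (Aes aes = Aes.Create())
        {
            key = aes.Key; // randomly generated for this run only
            iv = aes.IV;
        }

        byte[] stored = EncryptedStore.Save(new Artist { Name = "Rob Miles" }, key, iv);
        Artist restored = EncryptedStore.Load(stored, key, iv);
        Console.WriteLine(restored.Name);
    }
}
```

In a real application the key would have to be kept somewhere safe and separate from the data. Encryption of this kind hides the content, but it does not remove the need to treat deserialized input with suspicion.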
Use custom serialization
Sometimes it might be necessary for code in a class to get control during the serialization process. You might want to add checking information or encryption to data elements, or you might want to perform some custom compression of the data. There are two ways to do this. The first way is to create your own implementation of the serialization process by making a data class implement the ISerializable interface.
A class that implements the ISerializable interface must contain a GetObjectData method. This method will be called when an object is serialized. It must take data out of the object and place it in the output stream. The class must also contain a constructor that will initialize an instance of the class from the serialized data source.
The code in Listing 4-49 is an implementation of custom serialization for the Artist class in our example application. The GetObjectData method has two parameters. The SerializationInfo parameter info provides AddValue methods that can be used to store named items in the serialization stream. The StreamingContext parameter provides the serialization method with context about the serialization. The GetObjectData method for the Artist just stores the Name value in the Artist as a value that is called “name.”
LISTING 4-49 Custom serialization
[Serializable]
class Artist : ISerializable
{
    public string Name { get; set; }

    // Called during deserialization to rebuild the object from the stored values
    protected Artist(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("name");
    }

    protected Artist()
    {
    }

    // Called during serialization to store the values of the object by name
    [SecurityPermissionAttribute(SecurityAction.Demand, SerializationFormatter = true)]
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("name", Name);
    }
}
The constructor for the Artist type accepts info and context parameters and uses the GetString method on the info parameter to obtain the name information from the serialization stream and use it to set the value of the Name property of the new instance.
The GetObjectData method must access private data in an object in order to store it. This can be used to read the contents of private data in serialized objects. For this reason, the GetObjectData method definition should be preceded by the security permission attribute you can see in Listing 4-49 to control access to this method.
The second way of customizing the serialization process is to add methods that will be called during serialization. These are identified by attributes as shown in Listing 4-50. The OnSerializing method is called before the serialization is performed and the OnSerialized method is called when the serialization is completed. The same format of attributes is used for the deserialize methods. These methods allow code in a class to customize the serialization process, but they don’t have access to the serialization stream, only the streaming context information. If you run the example program in Listing 4-50 you will see messages displayed as each stage of serialization is performed on the data.
LISTING 4-50 Customization methods
[Serializable]
class Artist
{
    [OnSerializing()]
    internal void OnSerializingMethod(StreamingContext context)
    {
        Console.WriteLine("Called before the artist is serialized");
    }

    [OnSerialized()]
    internal void OnSerializedMethod(StreamingContext context)
    {
        Console.WriteLine("Called after the artist is serialized");
    }

    [OnDeserializing()]
    internal void OnDeserializingMethod(StreamingContext context)
    {
        Console.WriteLine("Called before the artist is deserialized");
    }

    [OnDeserialized()]
    internal void OnDeserializedMethod(StreamingContext context)
    {
        Console.WriteLine("Called after the artist is deserialized");
    }
}
Manage versions with binary serialization
The OnDeserializing method can be used to set values of fields that might not be present in data that is being read from a serialized document. You can use this to manage versions. In Skill 4.3, in the “Modify data with LINQ to XML” section, you added a new data element to MusicTrack. You added the “style” of the music, whether it is “Pop,” “Rock,” or “Classical.” This causes a problem when the program tries to deserialize old MusicTrack data without this information.
You can address this by marking the new field with the [OptionalField] attribute and then setting a default value for this element in the OnDeserializing method as shown in Listing 4-51. The OnDeserializing method is performed during deserialization. The method is called before the data for the object is deserialized and can set default values for data fields. If the input stream contains a value for a field, this will overwrite the default set by OnDeserializing.
LISTING 4-51 Binary versions
[Serializable]
class MusicTrack
{
    public Artist Artist { get; set; }
    public string Title { get; set; }
    public int Length { get; set; }

    [OptionalField]
    public string Style;

    [OnDeserializing()]
    internal void OnDeserializingMethod(StreamingContext context)
    {
        Style = "unknown";
    }
}
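The same default-then-overwrite pattern can be tried out with the data contract serializer, which also honors the [OnDeserializing] attribute. The sketch below is an illustration under stated assumptions rather than the book's sample: MusicTrackV1, MusicTrackV2, and VersionDemo are hypothetical names standing in for the "old" and "new" versions of a type, and the data contract serializer is used instead of BinaryFormatter so that the example runs on recent .NET releases.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical "old" version of the track, written before Style existed.
[DataContract(Name = "MusicTrack", Namespace = "")]
public class MusicTrackV1
{
    [DataMember] public string Title { get; set; }
    [DataMember] public int Length { get; set; }
}

// Hypothetical "new" version, which adds an optional Style member.
[DataContract(Name = "MusicTrack", Namespace = "")]
public class MusicTrackV2
{
    [DataMember] public string Title { get; set; }
    [DataMember] public int Length { get; set; }
    [DataMember(IsRequired = false)] public string Style { get; set; }

    // Runs before any members are read, so it supplies the default value.
    [OnDeserializing]
    internal void OnDeserializingMethod(StreamingContext context)
    {
        Style = "unknown";
    }
}

public static class VersionDemo
{
    // Writes data in the old format and reads it back as the new type.
    public static MusicTrackV2 ReadOldData(MusicTrackV1 oldTrack)
    {
        DataContractSerializer writer = new DataContractSerializer(typeof(MusicTrackV1));
        DataContractSerializer reader = new DataContractSerializer(typeof(MusicTrackV2));
        using (MemoryStream stream = new MemoryStream())
        {
            writer.WriteObject(stream, oldTrack);
            stream.Position = 0;
            return (MusicTrackV2)reader.ReadObject(stream);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        MusicTrackV1 oldTrack = new MusicTrackV1 { Title = "My Way", Length = 164 };
        MusicTrackV2 track = VersionDemo.ReadOldData(oldTrack);
        Console.WriteLine(track.Title + " has style " + track.Style);
    }
}
```

Because the old data holds no Style element, the value set in the OnDeserializing method is left in place. Data written by the new version would overwrite it with the stored style.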
Use XML serializer
You have already seen the XML serializer in use in Skill 3.1, in the “JSON and XML” section. A program can serialize data into an XML stream in much the same way as with a binary formatter. Note, however, that when an XmlSerializer instance is created to perform the serialization, the constructor must be given the type of the data being stored. Listing 4-52 shows how this works.
LISTING 4-52 XML Serialization
MusicDataStore musicData = MusicDataStore.TestData();
XmlSerializer formatter = new XmlSerializer(typeof(MusicDataStore));

// FileMode.Create replaces any existing file, so text from an older, longer file cannot corrupt the XML
using (FileStream outputStream =
    new FileStream("MusicTracks.xml", FileMode.Create, FileAccess.Write))
{
    formatter.Serialize(outputStream, musicData);
}

MusicDataStore inputData;

using (FileStream inputStream =
    new FileStream("MusicTracks.xml", FileMode.Open, FileAccess.Read))
{
    inputData = (MusicDataStore)formatter.Deserialize(inputStream);
}
The XML serializer is called a text serializer, because the serialization process creates text documents.
The serialization process handles references to objects differently from binary serialization. Consider the class in Listing 4-53. The MusicTrack type contains a reference to the Artist describing the artist that recorded the track.
LISTING 4-53 XML References
class MusicTrack
{
    public Artist Artist { get; set; }
    public string Title { get; set; }
    public int Length { get; set; }
}
If this track is serialized using binary serialization, the Artist reference is preserved, with a single Artist instance being referred to by all the tracks that were recorded by that artist. However, if this type of track is serialized using XML serialization, a copy of the Artist value is stored in each track. In other words, a MusicTrack value is represented as follows, with the contents of the artist information copied into the XML that is produced.
<MusicTrack>
  <ID>1</ID>
  <Artist>
    <Name>Rob Miles</Name>
  </Artist>
  <Title>My Way</Title>
  <Length>164</Length>
</MusicTrack>
When the XML data is deserialized each MusicTrack instance will contain a reference to its own Artist instance, which might not be what you expect. In other words, all of the data serialized using a text serializer is serialized by value. If you want to preserve references you must use binary serialization.
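The effect of serializing by value can be checked with a short experiment. The following is a minimal sketch: the ReferenceDemo class and its RoundTrip method are hypothetical names, and the data is held in a MemoryStream rather than a file. Two tracks share one Artist instance before serialization, and the program tests whether they still share it afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Artist
{
    public string Name { get; set; }
}

public class MusicTrack
{
    public Artist Artist { get; set; }
    public string Title { get; set; }
    public int Length { get; set; }
}

public static class ReferenceDemo
{
    // Serializes the tracks to XML in memory and reads them straight back.
    public static List<MusicTrack> RoundTrip(List<MusicTrack> tracks)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<MusicTrack>));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, tracks);
            stream.Position = 0;
            return (List<MusicTrack>)serializer.Deserialize(stream);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Artist rob = new Artist { Name = "Rob Miles" };
        List<MusicTrack> tracks = new List<MusicTrack>
        {
            new MusicTrack { Artist = rob, Title = "My Way", Length = 164 },
            new MusicTrack { Artist = rob, Title = "Your Way", Length = 200 }
        };

        // Both tracks refer to the same Artist object before serialization
        Console.WriteLine(ReferenceEquals(tracks[0].Artist, tracks[1].Artist)); // True

        List<MusicTrack> copy = ReferenceDemo.RoundTrip(tracks);

        // After the round trip each track has its own copy of the artist
        Console.WriteLine(ReferenceEquals(copy[0].Artist, copy[1].Artist)); // False
    }
}
```

A change made to the artist of one deserialized track would therefore not be seen by the other track, even though both artists hold the same name.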
Note that the sample program in Listing 4-52 uses an ArtistID value to connect a given MusicTrack with the artist that recorded it.
Use JSON Serializer
The JSON serializer uses the JavaScript Object Notation to store serialized data in a text file. Note that we have already discussed this serializer in detail in Skill 3.1, “JSON and XML,” and Skill 3.1, “Validate JSON data.”
Use Data Contract Serializer
The data contract serializer is provided as part of the Windows Communication Foundation (WCF). It is located in the System.Runtime.Serialization library. Note that this library is not included in a project by default. It can be used to serialize objects to XML files. It differs from the XML serializer in the following ways:
- Data to be serialized is selected using an “opt in” mechanism, so only items marked with the [DataMember] attribute will be serialized.
- It is possible to serialize private class elements (although of course they will be public in the XML text produced by the serializer).
- The XML serializer gives programmers more control over the shape of the XML that is produced, for example by writing a value as an XML attribute rather than an element. The data contract serializer offers fewer such options, although the order of items can be set using the Order property of the [DataMember] attribute.
The classes here have been given the data contract attributes that are used to serialize the data in them.
[DataContract]
public class Artist
{
    [DataMember]
    public int ID { get; set; }
    [DataMember]
    public string Name { get; set; }
}

[DataContract]
public class MusicTrack
{
    [DataMember]
    public int ID { get; set; }
    [DataMember]
    public int ArtistID { get; set; }
    [DataMember]
    public string Title { get; set; }
    [DataMember]
    public int Length { get; set; }
}
Once the fields to be serialized have been specified they can be serialized using a DataContractSerializer. Listing 4-54 shows how this is done. Note that the methods to serialize and deserialize are called WriteObject and ReadObject respectively.
LISTING 4-54 Data contract serializer
MusicDataStore musicData = MusicDataStore.TestData();
DataContractSerializer formatter = new DataContractSerializer(typeof(MusicDataStore));

// FileMode.Create replaces any existing file, so text from an older, longer file cannot corrupt the XML
using (FileStream outputStream =
    new FileStream("MusicTracks.xml", FileMode.Create, FileAccess.Write))
{
    formatter.WriteObject(outputStream, musicData);
}

MusicDataStore inputData;

using (FileStream inputStream =
    new FileStream("MusicTracks.xml", FileMode.Open, FileAccess.Read))
{
    inputData = (MusicDataStore)formatter.ReadObject(inputStream);
}
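The “opt in” behavior of the data contract serializer, and its ability to store private members, can be seen by looking at the XML it produces. The sketch below makes some assumptions for the sake of illustration: the playCount field, the Notes property, and the ContractDemo class are hypothetical and are not part of the sample music store.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract]
public class Artist
{
    [DataMember]
    public string Name { get; set; }

    // Private, but marked with [DataMember], so it is serialized
    [DataMember]
    private int playCount = 3;

    // Public, but not marked, so it is left out of the XML
    public string Notes { get; set; }
}

public static class ContractDemo
{
    // Returns the XML text that the data contract serializer writes for an artist.
    public static string ToXml(Artist artist)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Artist));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, artist);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Artist artist = new Artist { Name = "Rob Miles", Notes = "not stored" };
        string xml = ContractDemo.ToXml(artist);

        Console.WriteLine(xml);
        Console.WriteLine("Contains playCount: " + xml.Contains("playCount"));
        Console.WriteLine("Contains Notes: " + xml.Contains("Notes"));
    }
}
```

The XML serializer would make the opposite choices for this class, since it works with public members and ignores the private playCount field.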