Arrays, Linked Lists, and Dictionaries
Posted by in Computer Science onData Structures are digital collections that help programmers store, organize, and access data. Each data structure follows different rules, may have different interfaces, and are typically suited for different roles and purposes. Since they are all different, a developer should understand what each of those data structures is and how they can be used. Of those data structures, most will inherit from at least one of three main prototypes: the array, the dictionary, or the linked list. As such, it's important for a developer to understand what these structures are, when to use them, and (most importantly) when not to use them.
By the end of this tutorial, my goal is for you to be able to do two things. First, you should be able to describe the basic high-level differences between an array, a linked list, and a dictionary. Second, you should be able to explain a practical application for each structure. But first, let's examine why these structures exist at all.
Justifying Data Structures
The easiest way to justify the existence of these data structures is to take a look at a simple analogy: a deck of cards. A deck of poker cards has a specific set of attributes that we expect to be true before every game. We should expect that every card is one of four suits: hearts, clubs, spades, or diamonds. We expect there to be 52 unique cards, where each card holds a numerical value of 2 through 10, or else it should have a J, Q, K, or A. Finally, we should expect that the cards be in a random order. If we were to virtualize these rules using flat variables, we would need to define 52 separate variables:
// Really inefficient way to store variables
var card_1, card_2, card_3, /* ... */ card_52;
This format quickly becomes unwieldy to work with. Any code we wrote that interacted with this virtual deck of cards would need to track all 52 variables individually. We would need to maintain their state, their value, who had them, etc. manually. This is extremely inefficient, and data structures greatly simplify this process.
The Array
Without diving too deep into the theory and implementation (we'll save that for another day), an array can be described as a set of key-value pairs where each key is an integer and identifies exactly one value. Instead of defining 52 separate variables, we can define one array with 52 positions:
// Much better way to store variables
var cards = [];
There is a combination of three of traits that make arrays unique from dictionaries and linked lists. First, all of the indexes in an array are a continuous set of integers from the first index ("0" in JavaScript) through the nth index. Second, an array has a distinct length associated with it. Third, any element in an array can be accessed directly if you know its index without a significant performance overhead.
// Arrays have a distinct, calculable length
var array = [];
alert('array.length: ' + array.length); // array.length: 0
// All of the indexes in an array are continuous
array[1000] = true;
alert('array.length: ' + array.length); // array.length: 1001
// You can directly access any element in an array if you know its index
var startTimer = window.performance.now();
array[42] = 'the meaning of life';
var endTimer = window.performance.now();
alert('duration: ' + (endTimer - startTimer) + 'ms'); // Pretty close to zero
This combination of traits makes Arrays an ideal candidate for situations where you do not necessarily know which indexes you will be accessing; and where the order or number of elements in the collection needs to be constantly available. For example, each player must hold 5 cards from the deck, and you must be able to see and interact with any of the 5 cards in your hand. However, arrays are not the only way to hold this kind of data.
The Dictionary
Dictionaries are called by many different names in several different languages, but the most common variants I have encountered are "Collection", "Map", and "Associative Array". On the surface, a dictionary works very much like an array in C-type languages; and some languages, in fact, actually use Dictionaries in lieu of arrays. However, there are some important differences that distinguish a dictionary from a "true" array.
First, a dictionary is not inherently continuous, since you can arbitrarily assign any position n in the array without needing all of the indexes from 0 through n - 1 to exist. Second, since a dictionary doesn't need to fill in all of its indexes, it doesn't have a true "length" like an array does. Third, not every key in a dictionary needs to be an integer; these keys are often implemented with text strings, instead.
// In JavaScript, all Objects are dictionaries
// Dictionaries do not have a distinct length
var dictionary = {};
alert('typeof dictionary.length: ' + typeof dictionary.length); // typeof dictionary.length: undefined
// Dictionary keys do not need to be continuous, nor do their keys need to be integers
dictionary['string key'] = true;
alert('number of keys: ' + Object.keys(dictionary).length); // number of keys: 1
// A dictionary has a similar access overhead as an array
var startTimer = window.performance.now();
dictionary['fourty-two'] = 'the answer to life, the universe, and everything';
var endTimer = window.performance.now();
alert('duration: ' + (endTimer - startTimer) + 'ms'); // Also pretty close to zero
This combination of traits makes dictionaries ideal for situations where you need to be able to randomly access content in a data structure, you can't reliably ensure that the keys will be valid integers, and the number of elements in the collection is not important. For example, you could place each "hand" in a dictionary, where the keys corresponded to the player's name:
// a dictionary of arrays
var hands = {
"tom": [],
"dick": [],
"harry": []
};
The Linked Lists
A linked list is a type of data structure that is easy to expand, contract, and reindex; but they are typically less performant than arrays and dictionaries when performing random-access type tasks. Linked lists are rarely implemented in languages like JavaScript and PHP for performance reasons, but are often exceedingly useful in languages like C++ or Java. Despite this, a developer should understand how these structures differ from arrays and dictionaries.
First, a linked list does not hold direct references to every element in the list. Instead, a linked list consists of independent nodes that track a value and the position of the next node in the chain. Depending on the implementation, a node may track one neighbor, but could theoretically track many neighbors simultaneously. Since each node tracks its neighbor, the list only needs to track the first and/or last node in the chain to be able to access all of the nodes.
Second, the only way to access the nth node in the chain is to start at one end and count your way to it. Since the data structure doesn't track every node, random access in a linked list will always be slower than doing the same thing in an array or dictionary; so you should minimize the amount of random access you perform on a linked list.
Third, since the linked list only tracks the end(s) of a chain, the process of reindexing a linked list is markedly faster than the process of reindexing an array. This is especially important in languages like C, C++, or Java where indexes have a fixed length. If we were to use an array to represent our deck of cards, then we would always have to track what the "last" card in the deck was:
var top = 52;
var deck = [/* cards */];
Since we only access the top of the deck, we could use a linked list to track that effort for us. Fewer things to track means that fewer things can go wrong. Fewer opportunities for something to go wrong often translates into more stable, more intuitive code with fewer bugs.
var deck = new Stack(/* cards */);
var card = deck.drawCard();
This makes Linked Lists an ideal candidate for situations where you will only be accessing the first or last element in a collection, and you will be routinely adding and/or removing elements from one or both ends of the structure.
// Linked Lists are not standard structures in JavaScript
// These structures are being loaded in by a script
// a linked list is a chain of elements
var chain = new LinkedList(1000);
alert('typeof chain.head: ' + typeof chain.head);
// random access can be very slow
var startTimer = window.performance.now();
chain.get(1000);
var endTimer = window.performance.now();
alert('random access: ' + (endTimer - startTimer) + 'ms');
// they're great for queues and stacks
var startTimer = window.performance.now();
chain.pop();
var endTimer = window.performance.now();
alert('popped stack: ' + (endTimer - startTimer) + 'ms');
Summary
To summarize:
- Data Structures are tools that programmers can use to collect many related variables in one place to write cleaner, more maintainable code.
- Arrays are data structures that are great for when you need to be able to access any arbitrary value at any time, the order of values is important, and you care about how many values are in the data structure.
- Dictionaries are data structures that are great for when you need to be able to access values by a name instead of an index, you don't care about the order of values in the data structure, and the number of keys is not always important.
- Linked Lists are data structures that are great for when you need to be able to randomly insert or remove values from one point, the order of values is important, and you do not care about how many values are in the data structure.
Code for this Entry
Code for this entry can be found at GitHub: https://github.com/stevethedev/cs-data-structures-1