Tuesday, September 24, 2013

Java Collection : ArrayList, Vector, LinkedList, HashSet, LinkedHashSet, TreeSet, HashMap, Hashtable, LinkedHashMap, TreeMap

In this post, we'll dig into the basic collections that we use day to day java programming. I'll hope the this post will help to understand and remember the concept of collection framework.

What is Collection?
In simple term, you can say collection is a container — is simply an object that groups multiple elements into a single unit.
For instance, Collection correspond to a bag. Typically, they represent data items that form a natural group such as a collection of cards,a collection of letters, and a mapping of names to phone numbers.
  • Like C++'s Standard Template Library (STL)
  • Can grow as necessary
  • Contain only Objects
  • Heterogeneous
  • Can be made thread safe
  • Can be made not-modifiable
There are really three overloaded uses of the word "collection"
  1. collection(lowercase c), which represents any of the data structures in which objects are stored and iterated over.
  2. Collection (capital C), which is actually the java.util.Collection interface from which Set, List, and Queue extend. (That's right, extend, not implement. There are no direct implementations of Collection.)
  3. Collections (capital C and ends with s) is the java.util.Collections class that holds a pile of static utility methods for use with collections

The Collections Framework in the java.util package is loaded with interfaces and utilities.

What Is a Collections Framework?
A collections framework is a unified architecture for representing and manipulating collections. All collections frameworks contain the following:
  • Interfaces  Represent different types of collections, such as sets, lists, and maps. These interfaces form the basis of the framework.
  • Implementations  Primary implementations of the collection interfaces.
  • Algorithms Static methods that perform useful functions on collections, such as sorting a list.

Advantages of a collections framework
  • A usable set of collection interfaces : By implementing one of the basic interfaces -- CollectionSetList, or Map -- you ensure your class conforms to a common API and becomes more regular and easily understood
  • A basic set of collection implementations : Using an existing, common implementation makes your code shorter and quicker to download. Also, using existing Core Java code core ensures that any improvements to the base code will also improve the performance of your code.
  • Reduces programming effort by providing data structures and algorithms so you don't have to write them yourself.
  • Provides interoperability between unrelated APIs by establishing a common language to pass collections back and forth.
  • Reduces the effort required to learn APIs by requiring you to learn multiple ad hoc collection APIs.
  • Reduces the effort required to design and implement APIs by not requiring you to produce ad hoc collections APIs.

Collection is an interface with declarations of the methods common to most collections including add(), remove(), contains(), size(), and iterator().


Key Interfaces and Classes of the Collections Framework


Key Classes of the Collections Framework


Collections come in four basic flavors
  • Lists Lists of things (classes that implement List)
  • Sets Unique things (classes that implement Set).
  • Maps Things with a unique ID (classes that implement Map).
  • Queues Things arranged by the order in which they are to be processed.




List interface

A List cares about the index.

An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.








  • Lists typically allow duplicate elements
  • Access to elements via indexes, like arrays  add (int, Object), get(int), remove(int), set(int, Object) 
  • Search for elements : indexOf(Object), lastIndexOf(Object)
  • Specialized Iterator, call ListIterator
  • Extraction of sublist : subList(int fromIndex, int toIndex)
  • add(Object) adds at the end of the list
  • remove(Object) removes at the start of the list.
  • list1.equals(list2) the ordering of the elements is  taken into consideration
  • Extra requirements to the method hashCode
    list1.equals(list2) implies that list1.hashCode()==list2.hashCode()

The List interface provides a special iterator, called a ListIterator, that allows element insertion and replacement, and bidirectional access in addition to the normal operations that the Iterator interface provides. A method is provided to obtain a list iterator that starts at a specified position in the list.

Category of List operation


Positional access — manipulates elements based on their numerical position in the list

  • Object get(int index);
  • Object set(int index, Object element); // Optional
  • void add(int index, Object element); // Optional
  • Object remove(int index); // Optional
  • abstract boolean addAll(int index, Collection c);  // Optional

Search — searches for a specified object in the list and returns its numerical position.
  • int indexOf(Object o);
  • int lastIndexOf(Object o);

Iteration — extends Iterator semantics to take advantage of the list's sequential nature
  • ListIterator listIterator();
  • ListIterator listIterator(int index);

Range-view — performs arbitrary range operations on the list.
  • List subList(int from, int to);


==============================================================================================
Arraylist
This is Resizable-array implementation of the List interface
  • ArrayList is an array based implementation where elements can be accessed directly via the get and set methods.
  • Default choice for simple sequence.
  • It gives you fast iteration and fast random access
  • It is an ordered collection (by index), but not sorted
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

We'll see fail-fast in the following example.
Output :


Choose this over a LinkedList when you need fast iteration but aren't as likely to be doing a lot of insertion and deletion

==============================================================================================
Vector


  • A Vector is basically the same as an ArrayList, but Vector methods are synchronized for thread safety.
  • You'll normally want to use ArrayList instead of Vector because the synchronized methods add a performance hit you might not need.



Output:


Difference between ArrayList and Vector? *interview question

  • Vectors are synchronized, ArrayLists are not.
    If multiple threads access an ArrayList concurrently then we must externally synchronize the block of code which modifies the list either structurally or simply modifies an element. Structural modification means addition or deletion of element(s) from the list. Setting the value of an existing element is not a structural modification.
  • Data Growth Methods
    A Vector defaults to doubling the size of its array, while the ArrayList increases its array size by 50 percent.
    Depending on how you use these classes, you could end up taking a large performance hit while adding new elements. It's always best to set the object's initial capacity to the largest capacity that your program will need.
Vector is often considered deprecated nowadays.

==============================================================================================

LinkedList

  • A LinkedList is ordered by index position, like ArrayList, except that the elements are doubly-linked to one another.
  • This linkage gives you new methods (beyond what you get from the List interface) for adding and removing from the beginning or end, which makes it an easy choice for implementing a stack or queue.

We'll see few methods example demo

Output:




Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally


A palindrome is a word or phrase that reads the same forward and backward. For instance, "abcba". We'll see how to do this with the help of LinkedList and ListIterator


Example of ListIterator


Difference between ArrayList and LinkedList? *interview question
LinkedList and ArrayList are two different implementations of the List interface.
  • LinkedList implements it with a doubly-linked list.
  • ArrayList implements it with a dynamically resizing array.
  • LinkedList<E> allows for constant-time insertions or removals using iterators, but only sequential access of elements. In other words, you can walk the list forwards or backwards, but finding a position in the list takes time proportional to the size of the list.
  • ArrayList<E>, on the other hand, allow fast random read access, so you can grab any element in constant time. But adding or removing from anywhere but the end requires shifting all the latter elements over, either to make an opening or fill the gap.
  • Use LinkedList when you need fast insertion and deletion whereas which is slow in ArrayList
  • Use ArrayList when you need fast iteration and fast random access whereas which is slow in LinkedList

==============================================================================================


Set Interface


A Set cares about uniqueness—it doesn't allow duplicates. Your good friend the equals() method determines whether two objects are identical.

HashSet
  • A HashSet is an unsorted, unordered Set.
  • It uses the hashcode of the object being inserted, so the more efficient your hashCode() implementation the better access performance you'll get.
  • Use this class when you want a collection with no duplicates and you don't care about order when you iterate through it.
  • This class permits the null element
  • If you attempt to add an element to a set that already exists in the set, the duplicate element will not be added, and the add() method will return false
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets
Output
==============================================================================================



LinkedHashSet
  • A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements. 
  • Use this class instead of HashSet when you care about the iteration order.
  • When you iterate through a HashSet the order is unpredictable, while a LinkedHashSet lets you iterate through the elements in the order in which they were inserted.
  • permits null elements

It provides constant-time performance for the basic operations (addcontains and remove), assuming the hash function disperses elements properly among the buckets. 

As you have already seen in the above program that order is unpredictable, now we'll in the below program LinkedHashSet maintain insertion order.

Output

Note HashSet, LinkedHashSet implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally 
Difference between HashSet and LinkedHashSet? *interview question
LinkedHashSet performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list with one exception:
  • Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. 
  • Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
==============================================================================================

TreeSet

The TreeSet is one of two sorted collections (the other being TreeMap). It uses a Red-Black tree structure, and guarantees that the elements will be in ascending order, according to natural order.

  • Guarantees log(n) time cost for the basic operations (add, remove and contains).
  • Offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc
  • Tress set will not allow null object. if you try to add null value i will be throw null pointer exception

Note that the ordering maintained by a set must be consistent with equals if it is to correctly implement the Set interface.
This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.

The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

Few functions of TreeSet with the help of example

  • ceiling(E e) : Returns the least element in this set greater than or equal to the given element, or null if there is no such element.
  • floor(E e) : Returns the greatest element in this set less than or equal to the given element, or null if there is no such element.
  • higher(E e) : Returns the least element in this set strictly greater than the given element, or null if there is no such element.
  • lower(E e) : Returns the greatest element in this set strictly less than the given element, or null if there is no such element.
  • SortedSet<E> headSet(E toElement) : Returns a view of the portion of this set whose elements are strictly less than toElement.
  • SortedSet<E> tailSet(E fromElement) : Returns a view of the portion of this set whose elements are greater than or equal to fromElement.


Output



Difference between HashSet and TreeSet? *interview question
HashSet
  • Hash set allow null object
  • class offers constant time performance for the basic operations (add, remove, contains and size).
  • Does not guarantee that the order of elements will remain constant over time
  • Iteration performance depends on the initial capacity and the load factor of the HashSet.
  • It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow
TreeSet
  • Tress set will not allow null object .if you try to add null value i will be throw null pointer exception
  • log(n) time cost for the basic operations (add, remove and contains).
  • elements of set will be sorted (ascending, natural, or the one specified by you via it's constructor)
  • Doesn't offer any tuning parameters for iteration performance
==============================================================================================


Map Interface

  • A Map cares about unique identifiers.
  • The Map implementations let you do things like search for a value based on the key, ask for a collection of just the values, or ask for a collection of just the keys.
  • Like Sets, Maps rely on the equals() method to determine whether two keys are the same or different.




  • A map cannot contain duplicate keys; each key can map to at most one value.
  • The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings.
  • Great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map


HashMap

  • The HashMap gives you an unsorted, unordered Map, it does not guarantee that the order will remain constant over time.
  • HashMap allows one null key and multiple null values in a collection
  • When you need a Map and you don't care about the order (when you iterate through it), then HashMap is the way to go; the other maps add a little more overhead
  • The more efficient your hashCode() implementation, the better access performance you'll get.
  • The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.
  • This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.


An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs.

Output
==============================================================================================


Hashtable
  • Just as Vector is a synchronized counterpart to the sleeker, more modern ArrayList, Hashtable is the synchronized counterpart to HashMap.
  • Remember that you don't synchronize a class, so when we say that Vector and Hashtable are synchronized, we just mean that the key methods of the class are synchronized

How Hashtable differ from HashMap? *interview question
  • Hashtable doesn't let you have anything that's null whereas HashMap allows one null key and multiple null values in a collection
  • HashMap is not Synchronized  where as HashTable is Synchronized.
  • HashMap cannot be shared with multiple thread without proper synchronization where HashTable is thread safe and can be shared between multiple thread
  • Iterator in HashMap if fail fast where as Enumerator in HashTable is not fail fast


Output
==============================================================================================

LinkedHashMap
  • Like its Set counterpart, LinkedHashSet, the LinkedHash- Map collection maintains insertion order (or, optionally, access order). 
  • Slower than HashMap for adding and removing elements, you can expect faster iteration with a LinkedHashMap.
Example will be same as above.

TreeMap
  • TreeMap is a sorted Map.
And you already know that by default, this means "sorted by the natural order of the elements.
Like TreeSet, TreeMap lets you define a custom sort order (via a Comparable or Comparator) when you construct a TreeMap, that specifies how the elements should be compared to one another when they're being ordered

Output

The iterators returned by this all the above class's iterator  methods are fail-fast as you have already seen in the above examples.



Related Post
4 ways to iterate over Map in java
120+ Core java interview Question including Collection related also
7 new feature of Java 7
Garbage Collection from Interview perspective


If you know anyone who has started learning Java, why not help them out! Just share this post with them. 
Thanks for studying today!...

3 comments:

  1. Map is not a apart Of Collection.The content is really good carrying all the useful information.

    ReplyDelete
  2. Nice articles.You are explain very well.java collection programs http://www.javaproficiency.com/2015/05/java-collections-framework-tutorials.html

    ReplyDelete
  3. perfect explanation about java programming .its very useful.thanks for your valuable information.java training in chennai | java training in velachery

    ReplyDelete