A cache is a block of memory for temporary storage of data likely to be used again. The CPU and hard drive frequently use a cache, as do web browsers and web servers.
A cache is made up of a pool of entries. Each entry has a datum (a nugget of data) which is a copy of the datum in some backing store. Each entry also has a tag, which specifies the identity of the datum in the backing store of which the entry is a copy.
When the cache client (a CPU, web browser, operating system) wishes to access a datum presumably in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired datum, the datum in the entry is used instead. This situation is known as a cache hit. So, for example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the contents of the web page is the datum. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.
The alternative situation, when the cache is consulted and found not to contain a datum with the desired tag, is known as a cache miss. The previously uncached datum fetched from the backing store during miss handling is usually copied into the cache, ready for the next access.
During a cache miss, the CPU usually ejects some other entry in order to make room for the previously uncached datum. The heuristic used to select the entry to eject is known as the replacement policy. One popular replacement policy, least recently used (LRU), replaces the least recently used entry (see cache algorithms). More efficient caches compute use frequency against the size of the stored contents, as well as the latencies and throughputs for both the cache and the backing store. While this works well for larger amounts of data, long latencies, and slow throughputs, such as experienced with a hard drive and the Internet, it's not efficient to use this for cached main memory (RAM).[citation needed]
When a datum is written to the cache, it must at some point be written to the backing store as well. The timing of this write is controlled by what is known as the write policy.
In a write-through cache, every write to the cache causes a synchronous write to the backing store.
Alternatively, in a write-back (or write-behind) cache, writes are not immediately mirrored to the store. Instead, the cache tracks which of its locations have been written over (these locations are marked dirty). The data in these locations is written back to the backing store when those data are evicted from the cache, an effect referred to as a lazy write. For this reason, a read miss in a write-back cache (which requires a block to be replaced by another) will often require two memory accesses to service: one to retrieve the needed datum, and one to write replaced data from the cache to the store.
Data write-back may be triggered by other policies as well. The client may make many changes to a datum in the cache, and then explicitly notify the cache to write back the datum.
No-write allocation is a cache policy where only processor reads are cached, thus avoiding the need for write-back or write-through when the old value of the datum was absent from the cache prior to the write.
The data in the backing store may be changed by entities other than the cache, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers which keep the data consistent are known as coherency protocols.
Source(s):
wikipedia