diff -puN /dev/null fs/reiser4/znode.c
--- /dev/null	Thu Apr 11 07:25:15 2002
+++ 25-akpm/fs/reiser4/znode.c	Wed Mar 30 14:55:08 2005
@@ -0,0 +1,1141 @@
/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
 * reiser4/README */

/* Znode manipulation functions. */

/* A znode is the in-memory header for a tree node. It is stored
   separately from the node itself so that it does not get written to
   disk. In this respect a znode is like a buffer head or a page head.
   We also use znodes for additional reiser4-specific purposes:

    . they are organized into a tree structure which is part of the
      whole reiser4 tree.
    . they are used to implement node-grained locking.
    . they are used to keep additional state associated with a node.
    . they contain links to lists used by the transaction manager.

   A znode is attached to a block number, a variable of type
   fs/reiser4/tree.h:reiser4_block_nr. A znode can exist without the
   corresponding node actually being loaded into memory. The existence
   of the znode itself is governed by its reference count, ->x_count.
   Each time a thread acquires a reference to a znode through a call to
   zget(), ->x_count is incremented; each call to zput() decrements it.

   Node data (the content of the node) is brought into memory by a call
   to zload(), which also increments the ->d_count reference counter.
   zload() can block waiting on I/O. A call to zrelse() decrements this
   counter.

   Also, ->c_count keeps track of the number of child znodes and
   prevents a parent znode from being recycled until all of its
   children are. ->c_count is decremented whenever a child goes out of
   existence (is actually recycled in zdestroy()), which can happen some
   time after the last reference to that child dies if some form of LRU
   cache for znodes is supported.
*/

/* EVERY ZNODE'S STORY

   1. His infancy.

   Once upon a time, the znode was born deep inside of zget() by a call
   to zalloc(). On return from zget() the znode had:

    . a reference counter (x_count) of 1
    . an assigned block number, marked as used in the bitmap
    . a pointer to its parent znode.
      The root znode's parent pointer points to its father, the "fake"
      znode, which in turn has a NULL parent pointer.
    . hash table linkage
    . no data loaded from disk
    . no node plugin
    . no sibling linkage

   2. His childhood.

   Each node is either brought into memory as a result of a tree
   traversal or created afresh, creation of the root being a special
   case of the latter. In either case it is inserted into the sibling
   list. This typically requires some ancillary tree traversal, but
   ultimately both sibling pointers will exist and the
   JNODE_LEFT_CONNECTED and JNODE_RIGHT_CONNECTED bits will be set in
   zjnode.state.

   3. His youth.

   If the znode is bound to an already existing node in the tree, its
   content is read from disk by a call to zload(). At that moment the
   JNODE_LOADED bit is set in zjnode.state and zdata() starts to return
   non-NULL for this znode. zload() further calls zparse(), which
   determines which node layout this node is rendered in and sets
   ->nplug on success.

   If the znode is for a newly created node, memory for it is allocated
   and zinit_new() is called to initialise the data according to the
   selected node layout.

   4. His maturity.

   After this point, the znode lingers in memory for some time. Threads
   can acquire references to it either by block number, through a call
   to zget(), or by following a pointer to an unallocated znode from an
   internal item. Each time a reference to the znode is obtained,
   x_count is increased. A thread can read- or write-lock the znode.
   Znode data can be loaded through calls to zload(), and d_count is
   increased accordingly. If all references to the znode are released
   (x_count drops to 0), the znode is not recycled immediately. Rather,
   it stays cached in the hash table in the hope that it will be
   accessed again shortly.

   There are two ways in which a znode's existence can be terminated:

    . sudden death: the node bound to this znode is removed from the tree
    . overpopulation: the znode is purged from memory due to memory
      pressure

   5. His death.

   Death is a complex process.
   When we irrevocably commit ourselves to the decision to remove a
   node from the tree, the JNODE_HEARD_BANSHEE bit is set in
   zjnode.state of the corresponding znode. This is done either in the
   ->kill_hook() of an internal item or in kill_root() when the tree
   root is removed.

   At this moment the znode still has:

    . locks held on it, necessarily write ones
    . references to it
    . a disk block assigned to it
    . data loaded from the disk
    . pending lock requests

   But once the JNODE_HEARD_BANSHEE bit is set, the last call to
   unlock_znode() performs node deletion. Node deletion has two phases.
   First, all ways to obtain references to the znode (sibling and
   parent links, and hash lookup using the block number stored in the
   parent node) are removed. This is done through sibling_list_remove();
   we also assume that nobody uses the down link from the parent node,
   either because it no longer exists or because the parent node is
   properly locked, and that nobody uses parent pointers from children,
   because there are none. Second, we invalidate all pending lock
   requests still on the znode's lock request queue; this is done by
   invalidate_lock(). Another znode status bit, JNODE_IS_DYING, is used
   to invalidate pending lock requests: once it is set, all requesters
   are forced to return -EINVAL from longterm_lock_znode(). Future
   locking attempts are not possible because all ways to obtain
   references to the znode have already been removed. Finally, the node
   is uncaptured from its transaction.

   When the last reference to the dying znode is about to be released,
   the block number of this znode is released and the znode is removed
   from the hash table. Now the znode can be recycled.

   [It is possible to free the bitmap block and remove the znode from
   the hash table when the last lock is released. This would result in
   a referenced but completely orphaned znode.]

   6. Limbo.

   As mentioned above, znodes with a reference count of 0 are still
   cached in the hash table. Once memory pressure increases, they are
   purged from it [this requires something like an LRU list for an
   efficient implementation.
   An LRU list would also greatly simplify the implementation of the
   coord cache, which in this case would morph into just scanning some
   initial segment of the LRU list]. Data loaded into an unreferenced
   znode is flushed back to durable storage if necessary, and its
   memory is freed. Znodes themselves can be recycled at this point
   too.
*/
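/* A minimal user-space sketch of the life cycle described above, for
   illustration only. Every name in it (struct toy_znode, toy_born(),
   toy_zget(), toy_zput(), toy_zload(), toy_zrelse(), toy_lock(),
   toy_unlock(), toy_try_purge() and the TOY_ bits) is hypothetical and
   not part of reiser4; the bits only loosely mirror JNODE_LOADED,
   JNODE_HEARD_BANSHEE and JNODE_IS_DYING. It assumes a single thread,
   omits the real hash table, lock request queue and transaction
   manager, and assumes a cached znode may be purged only when both
   x_count and d_count are 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

// Hypothetical state bits; they only loosely mirror JNODE_LOADED,
// JNODE_HEARD_BANSHEE and JNODE_IS_DYING. Not the reiser4 API.
enum {
	TOY_LOADED        = 1u << 0,
	TOY_HEARD_BANSHEE = 1u << 1,
	TOY_IS_DYING      = 1u << 2,
};

struct toy_znode {
	unsigned state;
	int x_count;         // existence references: toy_zget()/toy_zput()
	int d_count;         // data references: toy_zload()/toy_zrelse()
	int nr_locks;        // locks currently held
	bool sibling_linked; // reachable through sibling links
	bool hashed;         // reachable through the (omitted) hash table
	bool has_block;      // disk block still allocated
};

// Infancy: x_count 1, block assigned, hashed, no data, no siblings.
static struct toy_znode toy_born(void)
{
	struct toy_znode z = { .x_count = 1, .hashed = true, .has_block = true };
	return z;
}

static void toy_zget(struct toy_znode *z)
{
	z->x_count++;
}

// Dropping the last reference to a live znode leaves it hashed (limbo);
// for a dying znode it releases the block and unhashes it.
static void toy_zput(struct toy_znode *z)
{
	assert(z->x_count > 0);
	if (--z->x_count == 0 && (z->state & TOY_IS_DYING)) {
		z->has_block = false;
		z->hashed = false;
	}
}

// The real zload() may block on I/O and calls zparse(); this only
// records that data is present and takes a data reference.
static void toy_zload(struct toy_znode *z)
{
	z->state |= TOY_LOADED;
	z->d_count++;
}

static void toy_zrelse(struct toy_znode *z)
{
	assert(z->d_count > 0);
	z->d_count--;
}

// Lock requests on a dying znode fail with -EINVAL.
static int toy_lock(struct toy_znode *z)
{
	if (z->state & TOY_IS_DYING)
		return -EINVAL;
	z->nr_locks++;
	return 0;
}

// The last unlock of a znode that heard the banshee deletes the node:
// cut the sibling links, then mark it dying so lock requests fail.
static void toy_unlock(struct toy_znode *z)
{
	assert(z->nr_locks > 0);
	if (--z->nr_locks == 0 && (z->state & TOY_HEARD_BANSHEE)) {
		z->sibling_linked = false;
		z->state |= TOY_IS_DYING;
	}
}

// Memory pressure: reclaim a cached znode only if nobody references it
// or its data (an assumption of this sketch). Returns true if purged.
static bool toy_try_purge(struct toy_znode *z)
{
	if (z->x_count != 0 || z->d_count != 0 || !z->hashed)
		return false;
	z->state &= ~TOY_LOADED; // data flushed if dirty, then freed
	z->hashed = false;       // the znode itself is recycled
	return true;
}
```

   Typical sequence: toy_born() yields x_count 1; toy_zload() and
   toy_zrelse() bracket data access; after the final toy_zput() a live
   znode stays hashed (limbo) until toy_try_purge() reclaims it, while
   a znode that heard the banshee becomes dying at its last toy_unlock()
   and is unhashed and loses its block at its last toy_zput().
*/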