Multi-Processor Locking in 4.4BSD Lite²

Abstract

In a multi-processor system locking between processors to serialize access to critical data structures and variables is necessary. In 4.4BSD Lite² this has been implemented by the addition of a generic lock manager, and the existing file system code has been updated to use it. In this paper we will attempt to describe the lock manager implementation thoroughly, and look at how some of the file system code uses it.

In particular, we will examine how vnode reclamation is handled in situations where more than one CPU may be trying to free a vnode. We will start by examining the low-level atomic locks in detail. After that we will see how the atomic locks are used by the generic locking package, and examine how multiple, shared and exclusive locks are implemented.

The locking package also supports upgrading a shared lock to an exclusive lock, by draining out other shared locks. This will also be examined in detail.

We will also examine how the locking code is currently used in the updated file system code, specifically in the functions vgone(), vclean(), vrele(), vget() and getnewvnode() to protect against multiple CPUs trying to reclaim a vnode at the same time.

Annotated pseudo-code

The following is pseudo-code describing the lock manager, as well as part of the file system code. Following that there will be annotations of each part of the pseudo-code, describing in detail what the code does.

Low-level locking code

All multi-processor systems need some amount of locking between processors to make sure access to some data structures or hardware is done atomically. The low-level locking code described here is responsible for serializing such access. Note that the low-level locking code is using spin-locks (a processor will busy-wait while trying to acquire such a lock); an operation that can tie up resources on some architectures. Because of this, and also to implement shared locks, the low-level locking routines described here is usually used only by a high-level lock manager and to protect short critical sections of code.

simple_lock_init

The simple_lock_init() function is used to initialize a low-level spinlock. This must be done before any other operation on a lock can be performed. The contents of a struct simplelock is highly architecture dependant, and usually consist of the smallest data type that can be atomically modified.

        /*
         * Initialize a simple lock structure so it can be used
         */
        void
        simple_lock_init(struct simplelock *lock)
        {
[1]             clear lock->data
        }
[1]
Initialize simple lock state by setting it to the equivalent of unlocked. This operation is machine dependant, as is the structure of simple locks.

simple_lock

The simple_lock() function is used to acquire a low-level spinlock. It will spin waiting for a lock to be released if it is currently held by another processor. This is the only kind of lock that can be used inside an interrupt handler. For a low-level lock to be used in an interrupt handler, care must be taken to disable the interrupt, acquire the lock, do any processing, release the lock and re-enable the interrupt. This is necessary to avoid deadlock between the interrupt handler and other code executing on the same processor. Note that if there are no other processors in a system and the lock has already obtained, the function will never return as the lock cannot be released.

The test_and_set() function must be able to test and set the contents of a struct simplelock atomically across processors regardless of what bus is used to connect them, and return TRUE as long as the lock is already being held by a process.

        /*
         * Attempt to acquire a lock, spin until lock obtained
         */
        void
        simple_lock(struct simplelock *lock)
        {
[2]             while (test_and_set returns TRUE)
                        ;
        }
[2]
Spin, trying to acquire a simple lock. This operation will busy-wait until such time as the lock could be obtained.

simple_lock_try

The simple_lock_try() function is used to acquire a low-level spinlock. If the lock has already been acquired the function will return failure immediately. Look at simple_lock() for a description of test_and_set().

        /*
         * Attempt to acquire a lock
         *
         * Return TRUE if success, FALSE otherwise
         */
        int
        simple_lock_try(struct simplelock *lock)
        {
[3]             if (test_and_set returns FALSE)
                        return TRUE
                return FALSE
        }
[3]
Call test_and_test() once, attempting to acquire a lock. If the lock is acquired, return TRUE, otherwise FALSE. This operation will never block.

simple_unlock

simple_unlock() will essentially perform the same operation as simple_lock_init() above. However, some architectures might need to have the state of a low-level spinlock propagated to other processors in ways that would not be needed for the initialization of a lock, but only when setting or clearing it.

Many multiprocessor architectures, like systems based on Intel®'s MP specification do not need to do this, but need only clear the contents of the struct simplelock.

        /*
         * Unlock
         */
        void
        simple_unlock(struct simplelock *lock)
        {
[4]             clear lock->data
        }
[4]
Reset a lock previously acquired with simple_lock() or simple_lock_try(). This will set the state of the lock to the same as when the lock was first initialized with simple_lock_init().

High-level locking code

The following section will describe the lock manager as found in 4.4BSD Lite². This lock manager supplies both exclusive and shared locks, with recursive exclusive locks within a single process. It also allows upgrading a shared lock to an exclusive lock, as well as downgrading an exclusive lock to a shared lock.

lockinit

As with the low-level spinlocks, a high-level lock needs to be initialized before use. In the 4.4BSD Lite² implementation, this will initialize the associated struct simplelock and save some parameters for the lock.

The parameters that can be specified for a high-level lock are priority, which is used to determine a process' priority when it is woken up after sleeping on a lock, a message used when a process goes to sleep waiting for a lock, so that the exact reason a process is sleeping can easily be identified, a timeout, and flags specifying options governing the lock's behavior.

        /*
         * Initialize a lock
         */
        void
        lockinit(struct lock *lock, int pri, char *msg, int timeout, int flags)
        {
[5]             call simple_lock_init on lock->simplelock
                set lock->priority to pri
                set lock->wmesg to msg
                set lock->timo to timeout
                set lock->flags to flags
        }
[5]
First call simple_lock_init() on the embedded low-level lock. Initialize the remaining fields in a lock structure from the arguments provided; the priority to wake up at after being put to sleep on a lock, the message to pass to the sleep routing when blocking on a lock, the timeout and any auxiliary flags on a lock.

lockmgr

The lockmgr() function is used to perform all operations on a high-level lock after it has been initialized with lockinit(). Operations supported are LK_SHARED to obtain a shared lock, LK_DOWNGRADE to downgrade an exclusive lock to a shared lock, LK_EXCLUPGRADE and LK_UPGRADE to upgrade a shared lock to an exclusive lock, LK_EXCLUSIVE to obtain an exclusive lock, LK_RELEASE to free a lock and LK_DRAIN to drain out all locks active locks.

The LK_EXCLUPGRADE and LK_UPGRADE calls are essentially the same, but LK_EXCLUPGRADE will fail if any other process has already requested an upgrade to an exclusive lock.

A process can also specify that it does not want to be put to sleep waiting for a lock, such that if a lock request can not be honored immediately, the lock manager will return an EBUSY error to the calling function. This is described by the boolean condition polling in the following pseudo-code.

If at any time the process is put to sleep waiting for a lock to become available, the message specified in lockinit() will be used to provide information about the reason the process was put to sleep. When a process wakes up after sleeping on a lock, it's priority will be set to that specified when the lock was first initialized.

        /*
         * Acquire, change or release a lock
         *
         * Return 0 if successful, error code if not
         */
        int
        lockmgr(struct lock *lock, u_int flags, struct simplelock *slock, struct proc *p)
        {
                clear error;
[6]             call simple_lock on lock->interlock
[7]             if (flags & LK_INTERLOCK)
                        call simple_unlock on slock
[8]             save combined new and old lock flags
                switch (lock request) {
                        case LK_SHARED:
[9]                             if (current process does not own a lock) {
                                        if (someone else is waiting)
[10]                                            if (polling) {
                                                        set error to EBUSY
                                                        break;
                                                }
[11]                                    sleep until exclusive locks and upgrades are cleared;
[12]                                    if (error is set)
                                                break;
[13]                                    increment shared lock count;
                                        break;
                                }
[14]                            increment shared lock count;
                        case LK_DOWNGRADE:
[15]                            increment shared lock count by exclusive lock count;
[16]                            reset exclusive lock count and flag;
[17]                            if (anyone waiting for exclusive locks to clear)
                                        wake them up;
                                break;
                        case LK_EXCLUPGRADE:
[18]                            if (anyone else is waiting for exclusive upgrade) {
                                        reduce shared lock count by one;
                                        set error to EBUSY;
                                        break;
                                }
                        case LK_UPGRADE:
[19]                            reduce shared lock count by one;
                                if (we would block)
[20]                                    if (polling) {
                                                set error to EBUSY;
                                                break;
                                        }
[21]                            if (noone is waiting for upgrade) {
                                        set WANT_UPGRADE flag
                                        sleep until shared locks are drained;
                                        if (error is set)
                                                break;
[22]                                    set exclusive lock flag and count;
                                        break;
                                }
[23]                            if (anyone else is waiting for upgrade)
                                        wake them up;
                        case LK_EXCLUSIVE:
[24]                            if (we have exclusive lock) {
                                        increment exclusive lock count;
                                        break;
                                }
                                if (we would block)
[25]                                    if (polling) {
                                                set error to EBUSY;
                                                break;
                                        }
[26]                            if (anyone is waiting for exclusive lock) {
                                        sleep until WANT_EXCLUSIVE is clear;
                                        if (error is set)
                                                break;
                                }
[27]                            set WANT_EXCLUSIVE flag;
                                sleep until shared locks and upgrades are drained;
                                reset WANT_EXCLUSIVE flag;
                                if (error is set)
                                        break;
[28]                            set exclusive lock count to one;
                                break;
                        case LK_RELEASE:
[29]                            if (exclusive locks exist) {
                                        decrement exclusive lock count;
[30]                                    if (lock count zero)
                                                clear exclusive lock flag;
                                }
[31]                            else if (shared locks exist)
                                        decrement shared lock count;
[32]                            if (anyone waiting for locks to drain)
                                        wake them up;
                                break;
                        case LK_DRAIN:
                                if (we would block)
[33]                                    if (polling) {
                                                set error to EBUSY;
                                                break;
                                        }
[34]                            sleep until waits for exclusive locks and upgrades are drained;
                                reset error;
[35]                            while (any exclusive locks, upgrades or shared locks exist) {
                                        set WAIT_DRAIN flag;
[36]                                    call simple_unlock on lock->simplelock;
[37]                                    sleep until a lock has drained;
                                        if (error is set)
                                                return error;
[38]                                    call simple_lock on lock->simplelock;
                                }
[39]                            set draining and exclusive lock flag;
                                set exclusive lock count to one;
                                break;
                }
[40]            if (we are draining locks and all locks are gone)
                        wake them up anyone waiting for locks to change;
[41]            call simple_unlock on lock->interlock
                return error;
        }
[6]
Make sure the low-level interlock is acquired to avoid other changes being done to the lock while we are changing it.
[7]
If we were passed a low-level lock that was held, release it now that we have made sure no changes will be made to the locking structure.
[8]
Store externally settable flags on the lock, combining the currently set flags and the flags passed into the lock manager.
[9]
If the current process is holding an exclusive lock already, changing the lock to a shared lock is simple.
[10]
Another process is holding an exclusive lock or is waiting to acquire one. If we would block waiting for the lock to clear and were told not to wait, return EBUSY to indicate this.
[11]
Go to sleep until exclusive locks are cleared and any upgrades are completed. This will loop until a lock could be acquired, incrementing the count of sleepers and releasing the interlock immediately before going to sleep, acquiring the interlock and decrementing the count of sleepers (in that order) when woken.
[12]
If we got an error while sleeping, waiting for the locks to clear, stop processing the lock request.
[13]
Increment the count of shared locks and break out of the switch statement to finish processing the lock request.
[14]
Increment the number of shared locks held and fall through to the code handling the downgrading of exclusive locks to shared locks.
[15]
Increment the shared lock count by the number of exclusive locks held by the calling process. Note that it is only possible to have more than one exclusive lock by the same process acquiring recursive locks.
[16]
Reset the count of exclusive locks held and clear the exclusive lock flag, changing the lock to a shared lock.
[17]
If anyone is waiting for the exclusive lock to be cleared, wake them up, then break out of the switch statement to finish processing the lock request.
[18]
If anyone else is waiting to upgrade a lock from shared to exclusive, decrement the shared lock count (releasing the shared lock) and return EBUSY. Otherwise fall through to the generic upgrade code.
[19]
Decrement the shared lock count, releasing the shared lock held by the current process.
[20]
If we are just polling for the upgrade and someone else has an exclusive lock or is waiting for an upgrade to exclusive lock, return EBUSY instead of blocking.
[21]
If no other process is waiting for a lock upgrade, we can do the upgrade to exclusive lock in a simpler and faster way. Otherwise, set the flag indicating that we are waiting to do an upgrade of the lock and go to sleep until all shared locks have been drained. If we get an error, abort processing and return the error code.
[22]
Set the exclusive lock flag and count of all currently held exclusive locks to 1 and break out of the switch statement to complete processing of the lock request.
[23]
If anyone else is waiting for an upgrade to exclusive lock, and we held the last shared lock (released above), wake them up now. Then fall through to the exclusive lock code to obtain the exclusive lock.
[24]
If we already have an exclusive lock, this is a recursive lock request. Just increase the count of exclusive locks held and break out of the switch statement to finish lock processing.
[25]
If we would block waiting for the lock (someone already has a shared or exclusive lock) and we are polling, just return EBUSY.
[26]
If anyone else has already started processing an exclusive lock request, go to sleep until is has been cleared. If we get an error while waiting, abort processing.
[27]
Set the WANT_EXCLUSIVE flag on the lock and go to sleep until all other upgrades and shared locks have been drained. When we wake up, clear the WANT_EXCLUSIVE flag. If we get an error while sleeping, abort processing. Note that we clear the WANT_EXCLUSIVE flag regardless of whether we get an error or not.
[28]
Set exclusive lock flag, set exclusive lock count to one and assert that we are holding the exclusive lock. Break out of the switch to finish processing the lock request.
[29]
If we are holding any exclusive locks, decrement the exclusive lock count.
[30]
If the exclusive lock count reaches zero, clear the exclusive lock flag.
[31]
If we do not hold any exclusive locks, but we do hold shared locks, decrement the shared lock count.
[32]
If anyone is waiting for locks to drain, wake them up before breaking out of the switch statement and finish processing the lock request.
[33]
If we would block waiting for locks or upgrades to clear, and we are just polling, return EBUSY.
[34]
Go to sleep until all upgrades and requests waiting for exclusive locks have been drained.
[35]
Clear the error status and repeat steps [36] through [38] while there are any exclusive locks, shared locks or lock upgrades in progress.
[36]
Set the WAIT_DRAIN flag and release the interlock on the lock structure.
[37]
Go to sleep until the lock flags have been changed, signaling a change in upgrades or exclusive locks. If we get an error while sleeping, abort processing.
[38]
Acquire the interlock on the lock structure again, making sure we hold it while we continue the processing of lock structure members which might otherwise change.
[39]
Set the draining and exclusive lock flags, making sure no other process squires a lock until we have finished processing the request.
[40]
If we have been draining locks, and there is no one waiting for exclusive locks or upgrades, clear the draining flag on the lock and wake up any processes waiting for the lock to change.
[41]
Release the interlock we held throughout processing the request and return any error status that might have been set.

lockstatus

The lockstatus() function is used to retrieve the status of a lock. The lock must have been previously initialized by lockinit(), but need not otherwise have been used.

The function will return a status value indicating whether any locks are held, and if so, what kind of lock (shared or exclusive). All accesses to internal lock data are protected by asserting the high-level lock's interlock while examining the lock data.

        /*
         * Get the status of a lock
         */
        int
        lockstatus(struct lock *lock)
        {
[42]            call simple_lock on lock->simplelock;
[43]            if (lock->exclusive_count) {
                        call simple_unlock on lock->simplelock;
                        return exclusive status;
                }
[44]            if (lock->shared_count) {
                        call simple_unlock on lock->simplelock;
                        return shared status;
                }
[45]            call simple_unlock on lock->simplelock;
                return unlocked status;
        }
[42]
Call simple_lock() on the embedded interlock to guarantee the lock does not change out from under us.
[43]
If there are any exclusive locks held, unlock the embedded interlock and return a value indicating that exclusive locks are held.
[44]
If there are any shared locks held, unlock the embedded interlock and return a value indicating that shared locks are held.
[45]
Unlock the embedded interlock and return a value indicating that no locks are currently held.

File system code

The file system code in 4.4BSD Lite² has been modified to be multi-processor safe. In the following section we will examine how the vnode reclamation code uses locks around critical sections to serialize access to vnodes.

vgone

The system calls vgone() to eliminate all activity on a vnode. This is done whenever a vnode will be re-used, possibly for a different file system. The operation needs to be protected to avoid race conditions where the system might attempt to re-use the vnode due to a process re-opening the file previously associated with it on a different processor or another processor attempts to reclaim the vnode at the same time.

We do this by utilizing the struct simplelock embedded in the vnode and using the low-level lock functions to serialize access to it. Note that a low-level spinlock may not be held for the duration of the reclamation process. We also have to verify that the vnode is not already being reclaimed after acquiring the spinlock. If we determine that reclamation is already in progress, we release the spinlock and go to sleep until the reclamation process has completed.

We will also move the vnode to the head of the free list if it is already on it, to force the system to re-use it as soon as possible, since it is no longer associated with any underlying file system. Access to the vnode free list also needs to be serialized using a struct simplelock.

At the end of processing, the vnode will be marked as being "BAD". We have already associated the vnode with the "dead" file system by calling vclean() on it.

        /*
         * Eliminate all activity associated with a vnode
         * in preparation for re-use
         */
        void
        vgone(struct vnode *vp)
        {
[46]            call simple_lock on vp->interlock;
[47]            if (vgone or vclean in progress) {
                        call simple_unlock on vp->interlock;
[48]                    wait for vgone or vclean to finish;
                        return;
                }
[49]            call vclean on vp to close file;
[50]            remove from mount list if on it;
                if (vnode on freelist) {
                        call simple_lock on free_list_lock;
[51]                    move vnode to head of free list;
                        call simple_unlock on free_list_lock;
                }
[52]            set vnode type to BAD;
        }
[46]
Acquire the vnode interlock to ensure the vnode does not change out from under us.
[47]
If the vnode is already being reclaimed we must let that complete and then release the vnode interlock.
[48]
Sleep until the concurrent vgone() or vclean() is done and return.
[49]
Use vclean() to close the underlying file and clear out any underlying file system specific data from the vnode.
[50]
If the vnode is on the mount point vnode list, remove it.
[51]
If the vnode is already on the free list, we want to move it to the head of the list so that it will be re-used sooner. Lock the free list, move the vnode to the head of the list and unlock the free list.
[52]
Set the vnode type to "BAD". It has already been associated with the "dead" file system by the call to vclean().

vclean

The vclean() function is used to disassociate the attached underlying file system from a vnode. Note that the embedded struct simplelock is always held on entry to the function. We also set the VX_LOCK flag on the vnode, marking it as currently being reclaimed before we continue processing the reclamation.

If we have been asked to close the file associated with the vnode, we will invalidate all associated buffers, forcing them to be written to the underlying file system. If the file associated with the vnode was open on entry, we call the underlying file system, closing the file, and deactivating the vnode. This operation will release the embedded struct simplelock. If the vnode is not currently active, we release the embedded lock.

We then call the underlying file system to reclaim the file system specific portion of the vnode, and if the vnode was active we call vrele() to release it and invalidate any references to the vnode in the buffer cache.

If there were any lock structures associated with the vnode, we free them, and point the vnode at the "dead" file system so that any attempt to do operations on the vnode will fail.

The vnode has now been completely reclaimed, and we can clear the lock flag we set at the start of processing and wake up any processes waiting for the reclamation to complete.

        /*
         * Disassociate the underlying file system from a vnode.
         * The vnode interlock is held on entry
         */
        void
        vclean(struct vnode *vp, int flags, struct proc *p)
        {
[53]            save vnode active count;
[54]            obtain lock on vnode;
[55]            if (vnode is closing)
                        invalidate associated buffers;
[56]            if (vnode is active) {
[57]                    if (vnode is closing)
                                call close routine;
[58]                    inactivate vnode;
                }
                else
                        release lock on vnode;
[59]            reclaim vnode;
                if (vnode was active)
[60]                    call vrele to release vnode;
[61]            purge cache information about vnode;
[62]            free lock structures associated with vnode;
[63]            make vnode reference dead filesystem;
[64]            clear the reclaimation lock;
                wake up any sleepers on vnode;
        }
[53]
Save away the current active count on the vnode so we can use this to decide how to proceed after we have deactivated it. Also increment the reference count on the vnode so that no other process tries to recycle it before we have completed.
[54]
Mark the vnode as being locked for reclamation. Note that the low-level vnode interlock has already been acquired at this point. This lock ensures that no new processes will attempt to acquire any locks on the vnode. Thus, once all shared and exclusive high-level locks has been drained, it will be safe to deallocate the high-level lock structures in the VOP_RECLAIM() routine.
[55]
If the file associated with the vnode is being closed, invalidate all associated buffers in the buffer cache.
[56]
Use the active count saved above to determine whether or not the vnode is currently active. If it is not, release the vnode interlock.
[57]
If we are closing the associated file, call the underlying file system to close it. This will release the vnode interlock that was acquired on entry.
[58]
Mark the vnode as inactive.
[59]
Call the underlying file system to reclaim the file system specific portion of the vnode.
[60]
If the active count saved above indicated that the vnode was currently active, call vrele() to release it.
[61]
Purge all information about the vnode from the vnode cache.
[62]
Free all lock structures associated with the vnode.
[63]
Point the vnode at the "dead" file system so that any references to the vnode will fail.
[64]
Clear the lock flag that was previously set in [52]. If any processes are waiting for the lock to be cleared, wake them up.

vrele

vrele() is called to remove references to a vnode. Changing the reference count has to be done inside a spinlock to serialize access to the vnode. If we removed the last reference to the vnode, acquire a spinlock on the vnode free list, append the vnode to the free list, and, release the free list spinlock. If we can get an exclusive lock on the vnode, call the underlying file system to inform it that we removed the last reference to the vnode.

        /*
         * Release a vnode.  If count drops to zero, call inactive
         * routine and return to free list
         */
        void
        vrele(struct vnode *vp)
        {
                call simple_lock on vp->interlock;
[65]            decrement vp->usecount;
[66]            if (vp->usecount > 0) {
                        call simple_unlock on vp->interlock;
                        return;
                }
[67]            call simple_lock on free_list_lock;
[68]            append vp to free list;
                call simple_unlock on free_list_lock;
[69]            if (exclusive lock on vp succeeds)
                        call inactive on vp;
        }
[65]
After setting the vnode interlock, decrement the count of active references to the vnode.
[66]
If there are still active references to the vnode, release the vnode interlock and return.
[67]
Acquire the lock on the free list to make sure no other processor attempts to add or remove vnodes from it.
[68]
Add the vnode to the vnode free list and release the lock on the free list.
[69]
Try to acquire an exclusive lock on the vnode; if successful mark it as inactive. Note that if the exclusive lock fails, the vnode interlock is released. The vnode inactivation will release the vnode interlock.

vget

The vget() function is called when we want to reuse a specific vnode, i.e. when we open a file that has been open recently and for which the associated vnode has not been reclaimed for a different file.

The first thing that is done is to acquire the vnode interlock unless it was already set before vget() was called. We then check to see whether reclamation is in progress by vclean(). If it is, we flag that we want to be woken up once reclamation has finished, after which we release the vnode interlock. We then go to sleep until the reclamation process has finished and return ENOENT to indicate the vnode is no longer usable.

If there are currently no references to the vnode, it is on the vnode free list. Acquire a lock on the free list, remove the vnode and release the lock.

Increment the reference count on the vnode, and see of we were told to acquire a high-level lock on it. If so, try acquiring the requested locks, and call vrele() is not successful. Note that acquiring the high-level lock has the side effect of clearing the vnode interlock. Return the status from attempting to acquire the high level vnode lock.

All processing has now been completed, release the vnode interlock and return.

        /*
         * Grab a particular vnode from the free list, increment its
         * reference count and lock it. If the vnode lock bit is set the

         * vnode is being eliminated in vgone. The process is awakened
         * when the transition is completed, and an error returned to
         * indicate that the vnode is no longer usable (possibly having
         * been changed to a new file system type).
         */
        int
        vget(struct vnode *vp, int flags, struct proc *p)
        {
[70]            if (vnode interlock not set)
                        acquire interlock;
[71]            if (vnode reclamation in progress) {
                        flag we want it;
                        release vnode interlock;
[72]                    sleep until vnode reclamation done;
                        return ENOENT status;
                }
[73]            if (no references to vnode) {
                        acquire vnode free list lock;
[74]                    remove vnode from free list;
                        release vnode free list lock;
                }
                increment number of references to vnode;
[75]            if (high-level locks wanted) {
                        if (we cannot acquire the high-level lock)
                                call vrele on vnode;
                        return status from acquiring high-level lock;
                }
[76]            release vnode interlock;
                return OK status;
        }
[70]
Unless we are told that the vnode interlock was already set at entry, acquire the vnode interlock now.
[71]
If the VXLOCK flag is set in the vnode, reclamation is in progress. We then set the VXWANT flag indicating that we will be sleeping until reclamation is complete and release the interlock.
[72]
Go to sleep on the vnode until the reclamation has been completed. When we wake up, return ENOENT indicating that the vnode is no longer usable.
[73]
If there were currently no references to the vnode, it has been put on the free list and we need to take it off.
[74]
Acquire a lock on the vnode free list, remove the specified vnode from it and release the lock.
[75]
If we were told to acquire a high-level lock on the vnode, attempt to do so now. Note that attempting to acquire the high-level lock will release the vnode interlock we acquired at the start. If we were unable to acquire the high-level lock, call vrele() to put the vnode back on the free list. Return the status code from attempting to acquire the high-level lock on the vnode.
[76]
All processing has been completed, and the vnode is ready for reuse. Release the interlock and return with the good news.

getnewvnode

The getnewvnode() is used to allocate a new vnode. It will attempt to reclaim vnodes from the vnode free list unless the free list is currently empty or the current number of vnodes is less than the desired number of vnodes on the system.

Start out by acquiring the lock on the vnode free list. Then, if the free list is empty and the current number of vnodes is less than twice the desired maximum number of vnodes on the system, or the current count is less than the desired number of vnodes, release the lock on the vnode free list. Allocate a new vnode and clear it and increment the total number of vnodes on the system.

If we did not allocate a new vnode as outlined above, we need to find one on the vnode free list. Walk down the free list, attempting to acquire the interlock on each vnode in turn. If we find a vnode we could lock, stop the process. If we walk to the end of the free list without being able to lock a vnode, release the lock on the free list and return ENFILE status.

Having found a vnode on the free list, we take it off the list. We also mark the vnode so that vgone() will not move it on the free list. We are now done with the free list, and release the lock we acquired at the start of processing.

If the vnode found above has not been completely cleared out, we now call vgone() on it to release any file system specific data associated with it. Note that this will release the vnode interlock. If the vnode had already been cleared out, we simply release the interlock.

Regardless of whether we allocated a new vnode or picked one out from the free list, we now set values according to the arguments passed. We start by setting the type to NONE, indicating that it is not currently associated with any file. We call cache_purge() to make sure any old data associated with the vnode has removed from the cache. We then set up the vnode operations table according to the vops argument passed and insert the vnode on the specified mount queue.

Last, we set the current number of references to the vnode to one and return status indicating success.

        /*
         * Return the next vnode from the free list
         */
        int
        getnewvnode(enum vtagtype tag, struct mount *mp, int (**vops)(), struct vnode **vpp)
        {
                acquire lock on vnode free list;
[77]            if ((vnode free list empty && current vnode count < twice the desired count) ||
                    current vnode count < desired vnode count) {
                        release lock on vnode free list;
[78]                    allocate new vnode and clear it;
                        increment the vnode count;
                }
                else {
[79]                    walk vnode free list until we can acquire interlock on vnode;
[80]                    if (no vnode could be locked) {
                                release lock on vnode free list;
                                return ENFILE status;
                        }
[81]                    remove vnode from vnode free list;
[82]                    mark vnode so vgone() will skip it over;
                        release lock on vnode free list;
[83]                    if (vnode not pointing to dead file system)
                                call vgone to release associated file system data;
                        else
                                release vnode interlock;
[84]                    clear vnode data;
                }
[85]            set vnode type to NONE;
[86]            call cache_purge on vnode;
                initialize vnode from arguments;
[87]            insert vnode into the specified mount queue;
                set reference count on vnode to 1;
                return OK;
        }
[77]
Start out by acquiring a lock on the vnode free list. Then, if the current number of vnodes is less than the desired number of vnodes or the free list is empty and the current number of vnodes is less than twice the number of desired number, allocate a new vnode.
[78]
Release the lock on the vnode free list and allocate a new vnode. Clear the newly allocated vnode and increment the total count of vnodes we have.
[79]
We have found that we need to get a vnode from the free list. Start walking down the free list, attempting to acquire an interlock on each vnode visited. If we find one we are able to lock, stop walking the free list and continue processing.
[80]
If we could not find and acquire the interlock on any vnode above, release the lock on the free list and return ENFILE, indicating that no more new vnodes can be allocated or opened.
[81]
Having found a usable vnode on the free list, take it off the free list.
[82]
Mark the vnode with the value 0xdeadb which is special-cased in vgone(), making sure it does not get get reordered on the free list.
[83]
If the vnode from the free list was not marked as "BAD", indicating it has been disassociated from the underlying file system, call vgone() on it to release any associated file system data. This has the side effect of releasing the interlock on the vnode. If it had already been disassociated from the underlying file system, just release the interlock acquired above.
[84]
Now that we are ready to reuse the vnode, clear out fields that might have been used previously.
[85]
Set the vnode type to "NONE", indicating it is not currently associated with any specific file type.
[86]
Call cache_purge() on the vnode, ensuring that any data previously associated with it is invalidated. Then use the arguments passed into getnewvnode() to associate the vnode with the specified file system.
[87]
Use insmntque() to insert the vnode onto the specified mount queue and set the reference count to one. Return success to the calling function.