From b6bde0dc4079a17c0ce1e6d604f5c9a5e2ae5a8b Mon Sep 17 00:00:00 2001 From: Daniel Hillenbrand Date: Tue, 2 Apr 2013 16:17:47 +0000 Subject: block: add row iosched Change-Id: I01857784f0506a0b443dd01c801908e0beea5f66 --- Documentation/block/row-iosched.txt | 116 ++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 Documentation/block/row-iosched.txt (limited to 'Documentation') diff --git a/Documentation/block/row-iosched.txt b/Documentation/block/row-iosched.txt new file mode 100644 index 00000000000..272887944fc --- /dev/null +++ b/Documentation/block/row-iosched.txt @@ -0,0 +1,116 @@ +Introduction +============ + +The ROW scheduling algorithm will be used in mobile devices as default +block layer IO scheduling algorithm. ROW stands for "READ Over WRITE" +which is the main requests dispatch policy of this algorithm. + +The ROW IO scheduler was developed with the mobile devices needs in +mind. In mobile devices we favor user experience upon everything else, +thus we want to give READ IO requests as much priority as possible. +The main idea of the ROW scheduling policy is: +If there are READ requests in pipe - dispatch them but don't starve +the WRITE requests too much. + +Software description +==================== +The requests are kept in queues according to their priority. The +dispatching of requests is done in a Round Robin manner with a +different slice for each queue. The dispatch quantum for a specific +queue is defined according to the queues priority. READ queues are +given bigger dispatch quantum than the WRITE queues, within a dispatch +cycle. + +At the moment there are 6 types of queues the requests are +distributed to: +- High priority READ queue +- High priority Synchronous WRITE queue +- Regular priority READ queue +- Regular priority Synchronous WRITE queue +- Regular priority WRITE queue +- Low priority READ queue + +If in a certain dispatch cycle one of the queues was empty and didn't +use its quantum that queue will be marked as "un-served". If we're in a +middle of a dispatch cycle dispatching from queue Y and a request +arrives for queue X that was un-served in the previous cycle, if X's +priority is higher than Y's, queue X will be preempted in the favor of +queue Y. This won't mean that cycle is restarted. The "dispatched" +counter of queue X will remain unchanged. Once queue Y uses up it's quantum +(or there will be no more requests left on it) we'll switch back to queue X +and allow it to finish it's quantum. + +For READ requests queues we allow idling in within a dispatch quantum in +order to give the application a chance to insert more requests. Idling +means adding some extra time for serving a certain queue even if the +queue is empty. The idling is enabled if we identify the application is +inserting requests in a high frequency. + +For idling on READ queues we use timer mechanism. When the timer expires, +if there are requests in the scheduler we will signal the underlying driver +(for example the MMC driver) to fetch another request for dispatch. + +The ROW algorithm takes the scheduling policy one step further, making +it a bit more "user-needs oriented", by allowing the application to +hint on the urgency of its requests. For example: even among the READ +requests several requests may be more urgent for completion then others. +The former will go to the High priority READ queue, that is given the +bigger dispatch quantum than any other queue. + +ROW scheduler will support special services for block devices that +supports High Priority Requests. That is, the scheduler may inform the +device upon urgent requests using new callback make_urgent_request. +In addition it will support rescheduling of requests that were +interrupted. For example, if the device issues a long write request and +a sudden high priority read interrupt pops in, the scheduler will +inform the device about the urgent request, so the device can stop the +current write request and serve the high priority read request. In such +a case the device may also send back to the scheduler the reminder of +the interrupted write request, such that the scheduler may continue +sending high priority requests without the need to interrupt the +ongoing write again and again. The write remainder will be sent later on +according to the scheduler policy. + +Design +====== +Existing algorithms (cfq, deadline) sort the io requests according LBA. +When deciding on the next request to dispatch they choose the closest +request to the current disk head position (from handling last +dispatched request). This is done in order to reduce the disk head +movement to a minimum. +We feel that this functionality isn't really needed in mobile devices. +Usually applications that write/read large chunks of data insert the +requests in already sorted LBA order. Thus dealing with sort trees adds +unnecessary complexity. + +We're planing to try this enhancement in the future to check if the +performance is influenced by it. + +SMP/multi-core +============== +At the moment the code is acceded from 2 contexts: +- Application context (from block/elevator layer): adding the requests. +- Underlying driver context (for example the mmc driver thread): dispatching + the requests and notifying on completion. + +One lock is used to synchronize between the two. This lock is provided +by the underlying driver along with the dispatch queue. + +Config options +============== +1. hp_read_quantum: dispatch quantum for the high priority READ queue +2. rp_read_quantum: dispatch quantum for the regular priority READ queue +3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous + WRITE queue +4. rp_swrite_quantum: dispatch quantum for the regular priority + Synchronous WRITE queue +5. rp_write_quantum: dispatch quantum for the regular priority WRITE + queue +6. lp_read_quantum: dispatch quantum for the low priority READ queue +7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous + WRITE queue +8. read_idle: how long to idle on read queue in Msec (in case idling + is enabled on that queue). +9. read_idle_freq: frequency of inserting READ requests that will + trigger idling. This is the time in Msec between inserting two READ + requests -- cgit v1.2.3