There are many angles from which you can look at a system to predict its performance. The model Baron has published, for example, is good for measuring the scalability of a system as concurrency grows. In many cases, however, we need to answer the question of how much load a given system can handle while the current load is low and we cannot run a reliable benchmark.
Before I get into further details I'd like to look at the basics – what resources are really needed to serve a given query? It surely needs CPU cycles, and it may need disk IO. It may also need other resources such as network IO or memory for temporary tables, but let us ignore those for a moment. The amount of resources the system has places a limit on the number of queries the system can run. For example, if we have a query which requires 1 CPU second and 1 IO to execute, and we have a 16-core system with a hard drive which can do 100 IOPS, we will be consuming all available CPU when we're running 16 queries per second.
Of course no system scales perfectly, so you would be unlikely to get 16 queries per second on such a system. There are internal scaling aspects of the system, which include latching as well as inevitable application-specific scalability restrictions such as row-level locks. There is also a load aspect – "random arrivals" mean the amount of work the system has to do will vary significantly over time. Baron's model deals with some of these pretty well, but for the sake of this discussion we will reduce them to a single workload- and hardware-specific constant. For example, we can use a factor of 0.7 and state we can safely run 0.7*16 ~= 11 queries per second. For some ideas about what factor may make sense for your system, check out Thinking Clearly About Performance by Cary Millsap.
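To make the arithmetic concrete, here is a minimal sketch in Python; all the numbers are the illustrative assumptions from above, not measurements:

# Back-of-the-envelope capacity estimate for the example above.
cpu_cores = 16        # cores available
disk_iops = 100       # IOs the drive can sustain per second
cpu_per_query = 1.0   # CPU seconds one query needs
io_per_query = 1.0    # IOs one query needs
scaling_factor = 0.7  # workload- and hardware-specific constant

# Each resource caps throughput independently; the tightest cap wins.
cpu_limit = cpu_cores / cpu_per_query  # 16 queries/sec
io_limit = disk_iops / io_per_query    # 100 queries/sec
raw_capacity = min(cpu_limit, io_limit)

print("raw capacity: %.0f queries/sec" % raw_capacity)                      # 16
print("safe capacity: %.0f queries/sec" % (scaling_factor * raw_capacity))  # 11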
So how can we use this data to estimate the capacity of a MySQL system? We can look at CPU and IO consumption per query and compare it to the estimated (or benchmarked) system performance to produce our estimates.
If we're running InnoDB with MySQL we can use Innodb_data_reads, Innodb_data_writes, and Innodb_os_log_fsyncs for the disk IO estimate. You can then divide them by the number of Questions (or Com_select) to get the amount of IO per query or per select. It is good to check this over several intervals – for some workloads it will be a very stable value, for others it might go back and forth a lot.
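As a sketch of that calculation, assume we captured these counters from SHOW GLOBAL STATUS at the start and end of an interval; the values below are made up for illustration:

# Per-select IO from two SHOW GLOBAL STATUS snapshots (values are made up).
before = {"Innodb_data_reads": 1000000, "Innodb_data_writes": 200000,
          "Innodb_os_log_fsyncs": 50000, "Com_select": 400000}
after  = {"Innodb_data_reads": 1090000, "Innodb_data_writes": 215000,
          "Innodb_os_log_fsyncs": 53000, "Com_select": 430000}

delta = {k: after[k] - before[k] for k in before}
ios = (delta["Innodb_data_reads"] + delta["Innodb_data_writes"]
       + delta["Innodb_os_log_fsyncs"])

print("%.2f IOs per select" % (ios / float(delta["Com_select"])))  # 3.60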
How do you get CPU consumption per query? You can take a look at procfs for the MySQL process:
root@ubuntu:/var/log/mysql# cat /proc/19018/stat
19018 (mysqld) S 1 19018 19018 0 -1 4202752 198731 0 0 0 347 5303 0 0 20 0 20 0 75673117 472850432 12361 18446744073709551615 4194304 11903564 140737335329904 140737335328304 139790070763411 0 552967 4096 26345 18446744073709551615 0 0 17 0 0 0 0 0 0
Fields #14 and #15 here are the user and kernel CPU usage of the MySQL process, in 1/100ths of a second (this is a pretty idle test system). So 347 and 5303 correspond to 3.47 seconds of user time and 53.03 seconds of system time consumed by the process. Collecting these at regular intervals and correlating them with the number of queries executed will give you the average CPU usage per query.
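Here is a minimal sketch of that sampling in Python. The get_question_count() helper is hypothetical – it stands in for reading the Questions counter from SHOW GLOBAL STATUS:

import time

def cpu_seconds(pid):
    with open("/proc/%d/stat" % pid) as f:
        fields = f.read().split()
    # Fields #14 and #15 (1-based) are utime and stime in clock ticks;
    # 100 ticks per second is the usual USER_HZ value.
    return (int(fields[13]) + int(fields[14])) / 100.0

pid = 19018  # mysqld pid from the example above
t0, q0 = cpu_seconds(pid), get_question_count()  # hypothetical helper
time.sleep(60)
t1, q1 = cpu_seconds(pid), get_question_count()

print("%.0fus of CPU per query" % (1e6 * (t1 - t0) / (q1 - q0)))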
If you're running Percona Server you can get the values from the User Statistics table:
*************************** 1. row ***************************
                  USER: user
     TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 0
        CONNECTED_TIME: 800
             BUSY_TIME: 775
              CPU_TIME: 49
        BYTES_RECEIVED: 21847267
            BYTES_SENT: 336986112
  BINLOG_BYTES_WRITTEN: 0
          ROWS_FETCHED: 485139
          ROWS_UPDATED: 0
       TABLE_ROWS_READ: 610954
       SELECT_COMMANDS: 181243
       UPDATE_COMMANDS: 0
        OTHER_COMMANDS: 0
   COMMIT_TRANSACTIONS: 181243
 ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
      LOST_CONNECTIONS: 0
         ACCESS_DENIED: 0
         EMPTY_QUERIES: 13099
In this case I can see this user consumed 49 CPU seconds for 181243 select queries, which is about 270us per select query. We can also get BUSY_TIME here; subtracting CPU time from it gives us "wait time", which in this case is 775-49=726 seconds, or about 4005us per select. Wait time is often IO (which you can see separately through the number of IOPS), but it can also be row-level locks, etc. The ratio between wait time and CPU time is very helpful for seeing how "wait free" your system is. If the system already has a low wait ratio, increasing the amount of memory, for example, is unlikely to help.
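The arithmetic is simple enough to script. A minimal sketch, plugging in the numbers from the row above:

# Derived per-query metrics from the USER_STATISTICS row above.
busy_time = 775.0  # seconds the server spent working for this user
cpu_time = 49.0    # seconds of CPU consumed
selects = 181243   # SELECT_COMMANDS

wait_time = busy_time - cpu_time  # 726 seconds
print("CPU per select:  %.0fus" % (1e6 * cpu_time / selects))   # ~270us
print("wait per select: %.0fus" % (1e6 * wait_time / selects))  # ~4ms
print("wait/CPU ratio:  %.1f" % (wait_time / cpu_time))         # ~14.8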
One helpful way to use this information is to compare systems with different amounts of memory running the same workload. You will often see that increasing the amount of memory not only reduces wait time and the number of IOs per query, but also reduces the CPU time spent, because IO handling itself requires a significant amount of CPU.
With Percona Server, full query logging enabled, and log_slow_verbosity=full, you can also get a great amount of related data from an mk-query-digest report:
# Overall: 1.79M total, 115 unique, 0 QPS, 0x concurrency ________________
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time      8338s     1us    284s     5ms    13ms   298ms   185us
# Lock time        71s       0     3ms    39us    54us    28us    35us
# Rows sent      7.15M       0  56.36k    4.19    0.99  360.95    0.99
# Rows examine  33.77M       0   2.21M   19.81    0.99   3.53k    0.99
# Rows affecte       0       0       0       0       0       0       0
# Rows read      5.25M       0 507.86k    3.08    1.96  469.31    0.99
# Bytes sent     8.77G      11  55.41M   5.15k   3.88k 258.55k   1.46k
# Merge passes       0       0       0       0       0       0       0
# Tmp tables    21.36k       0       2    0.01       0    0.11       0
# Tmp disk tbl  21.21k       0       2    0.01       0    0.11       0
# Tmp tbl size 554.04M       0  12.83M  324.99       0  32.65k       0
# Query size   294.90M      14     783  158.14  621.67  149.04  107.34
# InnoDB:
# IO r bytes    14.81G       0   1.10G   8.69k  15.96k 910.44k       0
# IO r ops     947.67k       0  70.34k    0.54    0.99   56.64       0
# IO r wait      7127s       0    266s     4ms    12ms   238ms       0
# pages distin  17.51M       1  44.28k   10.27    9.83  250.90    7.70
# queue wait         0       0       0       0       0       0       0
# rec lock wai       0       0       0       0       0       0       0
# Boolean:
# Filesort       0% yes,  99% no
# Full scan      0% yes,  99% no
# Tmp table      1% yes,  98% no
# Tmp table on   1% yes,  98% no
In this case I can see the average time per query is 5ms; on average a query requires 0.54 read operations, which take 4ms – the numbers add up pretty well. We can also see the average query examines about 20 rows, which means about 1 IO per 40 rows examined… a pretty IO-bound load in my book.
But an average is only an average. It is a lot more interesting to look at the per-query information from mk-query-digest (I omit the query text for client privacy):
# Query 1: 0 QPS, 0x concurrency, ID 0x382A5F3785EB3CEE at byte 114085880
# Scores: Apdex = 1.00 [1.0], V/M = 0.02
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         98 1753697
# Exec time     52   4337s    71us   466ms     2ms    12ms     7ms   176us
# Lock time     96     69s    17us     3ms    39us    49us    27us    33us
# Rows sent     21   1.57M       0       1    0.94    0.99    0.24    0.99
# Rows examine   4   1.57M       0       1    0.94    0.99    0.24    0.99
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     32   1.70M       0       3    1.02    1.96    0.33    0.99
# Bytes sent    32   2.88G      80 127.65k   1.72k   3.88k   1.23k   1.46k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size    61 180.07M      95     142  107.67  107.34    3.05  107.34
# InnoDB:
# IO r bytes    50   7.50G       0 128.00k   4.49k  15.96k   8.37k       0
# IO r ops      50 480.18k       0       8    0.28    0.99    0.52       0
# IO r wait     55   3949s       0   466ms     2ms    12ms     7ms       0
# pages distin  65  11.43M       1      18    6.84    9.83    2.20    7.70
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# String:
# Databases
# Hosts        localhost
# InnoDB trxID 3BBF3B55 (1/0%), 3BBF3B5A (1/0%)... 1753695 more
# Last errno   0
# Users        user
# Query_time distribution
#   1us
#  10us  #
# 100us  ################################################################
#   1ms  #############
#  10ms  ######
# 100ms  #
#    1s
#  10s+
We can see this query takes 2ms to respond on average, most of which is spent on IO, and that it performs 0.28 IOs per query on average. It is also a simple query which touches less than 1 row on average, which makes it very IO bound.
So what if I am planning for load growth and need the system to handle another 1000 such queries per second? I will need to do another 280 reads per second, which I can use to judge whether the current IO subsystem can handle it or whether it needs an upgrade.
The query time distribution histogram is also very interesting here. We can see this query, which examines no more than 1 row, may take up to 8 IO requests (which could happen due to looking into undo space, etc.) and can take as long as 10-100ms. The queries in the 100us range are the ones where no IO was needed at all, so the histogram also gives us a good clue as to how many queries needed no IO, how many needed 1 IO which was not queued (less than 10ms), and how many needed more than that.
Going from this we can also estimate the cost of such a query. Let's assume it is restricted by IO performance (which it is in this case), and that a system which can run 1000 IOPS costs $500 per month to run (including leasing, power, etc). Such a system will be able to do 2,592,000,000 IOs per month, and (using our 0.7 factor) it can comfortably run about 6,480,000,000 such queries in a month. This gives us 12,960,000, or about 13M, queries per dollar.
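A minimal sketch of both the growth question and the cost math, using the figures assumed above:

# Cost model from the example: an assumed $500/month system sustaining
# 1000 IOPS, serving the 0.28-IO query profiled above.
iops = 1000
monthly_cost = 500.0  # $, leasing + power etc (assumed)
io_per_query = 0.28   # from the mk-query-digest report
scaling_factor = 0.7  # safe-utilization constant from above

print("extra IOPS for 1000 more qps: %.0f" % (1000 * io_per_query))  # 280

ios_per_month = iops * 86400 * 30  # 2,592,000,000
queries_per_month = ios_per_month * scaling_factor / io_per_query
print("queries per month: %.2e" % queries_per_month)                    # ~6.48e9
print("queries per dollar: %.0f" % (queries_per_month / monthly_cost))  # ~13M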
To summarize: it is often very helpful to take a close look at your workload and understand how much your queries (at least the most important ones) cost you in terms of CPU and IO. From this you can easily work out what kind of hardware will take you to the required performance, which hardware provides a better balance of CPU vs IO utilization, and even something as simple as how much it costs to run a query. With cloud computing being hot, a lot of directors would like to know costs in a "utility" model, and you do not have to be in the cloud to provide them with such estimates.