Package cherrypy :: Package lib :: Module cpstats
[hide private]
[frames] | no frames]

Source Code for Module cherrypy.lib.cpstats

  1  """CPStats, a package for collecting and reporting on program statistics. 
  2   
  3  Overview 
  4  ======== 
  5   
  6  Statistics about program operation are an invaluable monitoring and debugging 
  7  tool. Unfortunately, the gathering and reporting of these critical values is 
  8  usually ad-hoc. This package aims to add a centralized place for gathering 
  9  statistical performance data, a structure for recording that data which 
 10  provides for extrapolation of that data into more useful information, 
 11  and a method of serving that data to both human investigators and 
 12  monitoring software. Let's examine each of those in more detail. 
 13   
 14  Data Gathering 
 15  -------------- 
 16   
 17  Just as Python's `logging` module provides a common importable for gathering 
 18  and sending messages, performance statistics would benefit from a similar 
 19  common mechanism, and one that does *not* require each package which wishes 
 20  to collect stats to import a third-party module. Therefore, we choose to 
 21  re-use the `logging` module by adding a `statistics` object to it. 
 22   
 23  That `logging.statistics` object is a nested dict. It is not a custom class, 
 24  because that would 1) require libraries and applications to import a third- 
 25  party module in order to participate, 2) inhibit innovation in extrapolation 
 26  approaches and in reporting tools, and 3) be slow. There are, however, some 
 27  specifications regarding the structure of the dict. 
 28   
 29      { 
 30     +----"SQLAlchemy": { 
 31     |        "Inserts": 4389745, 
 32     |        "Inserts per Second": 
 33     |            lambda s: s["Inserts"] / (time() - s["Start"]), 
 34     |  C +---"Table Statistics": { 
 35     |  o |        "widgets": {-----------+ 
 36   N |  l |            "Rows": 1.3M,      | Record 
 37   a |  l |            "Inserts": 400,    | 
 38   m |  e |        },---------------------+ 
 39   e |  c |        "froobles": { 
 40   s |  t |            "Rows": 7845, 
 41   p |  i |            "Inserts": 0, 
 42   a |  o |        }, 
 43   c |  n +---}, 
 44   e |        "Slow Queries": 
 45     |            [{"Query": "SELECT * FROM widgets;", 
 46     |              "Processing Time": 47.840923343, 
 47     |              }, 
 48     |             ], 
 49     +----}, 
 50      } 
 51   
 52  The `logging.statistics` dict has four levels. The topmost level is nothing 
 53  more than a set of names to introduce modularity, usually along the lines of 
 54  package names. If the SQLAlchemy project wanted to participate, for example, 
 55  it might populate the item `logging.statistics['SQLAlchemy']`, whose value 
 56  would be a second-layer dict we call a "namespace". Namespaces help multiple 
 57  packages to avoid collisions over key names, and make reports easier to read, 
 58  to boot. The maintainers of SQLAlchemy should feel free to use more than one 
 59  namespace if needed (such as 'SQLAlchemy ORM'). Note that there are no case 
 60  or other syntax constraints on the namespace names; they should be chosen 
 61  to be maximally readable by humans (neither too short nor too long). 
 62   
 63  Each namespace, then, is a dict of named statistical values, such as 
 64  'Requests/sec' or 'Uptime'. You should choose names which will look 
 65  good on a report: spaces and capitalization are just fine. 
 66   
 67  In addition to scalars, values in a namespace MAY be a (third-layer) 
 68  dict, or a list, called a "collection". For example, the CherryPy StatsTool 
 69  keeps track of what each request is doing (or has most recently done) 
 70  in a 'Requests' collection, where each key is a thread ID; each 
 71  value in the subdict MUST be a fourth dict (whew!) of statistical data about 
 72  each thread. We call each subdict in the collection a "record". Similarly, 
 73  the StatsTool also keeps a list of slow queries, where each record contains 
 74  data about each slow query, in order. 
 75   
 76  Values in a namespace or record may also be functions, which brings us to: 
 77   
 78  Extrapolation 
 79  ------------- 
 80   
 81  The collection of statistical data needs to be fast, as close to unnoticeable 
 82  as possible to the host program. That requires us to minimize I/O, for example, 
 83  but in Python it also means we need to minimize function calls. So when you 
 84  are designing your namespace and record values, try to insert the most basic 
 85  scalar values you already have on hand. 
 86   
 87  When it comes time to report on the gathered data, however, we usually have 
 88  much more freedom in what we can calculate. Therefore, whenever reporting 
 89  tools (like the provided StatsPage CherryPy class) fetch the contents of 
 90  `logging.statistics` for reporting, they first call `extrapolate_statistics` 
 91  (passing the whole `statistics` dict as the only argument). This makes a 
 92  deep copy of the statistics dict so that the reporting tool can both iterate 
 93  over it and even change it without harming the original. But it also expands 
 94  any functions in the dict by calling them. For example, you might have a 
 95  'Current Time' entry in the namespace with the value "lambda scope: time.time()". 
 96  The "scope" parameter is the current namespace dict (or record, if we're 
 97  currently expanding one of those instead), allowing you access to existing 
 98  static entries. If you're truly evil, you can even modify more than one entry 
 99  at a time. 
100   
101  However, don't try to calculate an entry and then use its value in further 
102  extrapolations; the order in which the functions are called is not guaranteed. 
103  This can lead to a certain amount of duplicated work (or a redesign of your 
104  schema), but that's better than complicating the spec. 
105   
106  After the whole thing has been extrapolated, it's time for: 
107   
108  Reporting 
109  --------- 
110   
111  The StatsPage class grabs the `logging.statistics` dict, extrapolates it all, 
112  and then transforms it to HTML for easy viewing. Each namespace gets its own 
113  header and attribute table, plus an extra table for each collection. This is 
114  NOT part of the statistics specification; other tools can format how they like. 
115   
116  You can control which columns are output and how they are formatted by updating 
117  StatsPage.formatting, which is a dict that mirrors the keys and nesting of 
118  `logging.statistics`. The difference is that, instead of data values, it has 
119  formatting values. Use None for a given key to indicate to the StatsPage that a 
120  given column should not be output. Use a string with formatting (such as '%.3f') 
121  to interpolate the value(s), or use a callable (such as lambda v: v.isoformat()) 
122  for more advanced formatting. Any entry which is not mentioned in the formatting 
123  dict is output unchanged. 
124   
125  Monitoring 
126  ---------- 
127   
128  Although the HTML output takes pains to assign unique id's to each <td> with 
129  statistical data, you're probably better off fetching /cpstats/data, which 
130  outputs the whole (extrapolated) `logging.statistics` dict in JSON format. 
131  That is probably easier to parse, and doesn't have any formatting controls, 
132  so you get the "original" data in a consistently-serialized format. 
133  Note: there's no treatment yet for datetime objects. Try time.time() instead 
134  for now if you can. Nagios will probably thank you. 
135   
136  Turning Collection Off 
137  ---------------------- 
138   
139  It is recommended each namespace have an "Enabled" item which, if False, 
140  stops collection (but not reporting) of statistical data. Applications 
141  SHOULD provide controls to pause and resume collection by setting these 
142  entries to False or True, if present. 
143   
144   
145  Usage 
146  ===== 
147   
148  To collect statistics on CherryPy applications: 
149   
150      from cherrypy.lib import cpstats 
151      appconfig['/']['tools.cpstats.on'] = True 
152   
153  To collect statistics on your own code: 
154   
155      import logging 
156      # Initialize the repository 
157      if not hasattr(logging, 'statistics'): logging.statistics = {} 
158      # Initialize my namespace 
159      mystats = logging.statistics.setdefault('My Stuff', {}) 
160      # Initialize my namespace's scalars and collections 
161      mystats.update({ 
162          'Enabled': True, 
163          'Start Time': time.time(), 
164          'Important Events': 0, 
165          'Events/Second': lambda s: ( 
166              (s['Important Events'] / (time.time() - s['Start Time']))), 
167          }) 
168      ... 
169      for event in events: 
170          ... 
171          # Collect stats 
172          if mystats.get('Enabled', False): 
173              mystats['Important Events'] += 1 
174   
175  To report statistics: 
176   
177      root.cpstats = cpstats.StatsPage() 
178   
179  To format statistics reports: 
180   
181      See 'Reporting', above. 
182   
183  """ 
184   
185  # -------------------------------- Statistics -------------------------------- # 
186   
187  import logging 
188  if not hasattr(logging, 'statistics'): logging.statistics = {} 
189   
190 -def extrapolate_statistics(scope):
191 """Return an extrapolated copy of the given scope.""" 192 c = {} 193 for k, v in list(scope.items()): 194 if isinstance(v, dict): 195 v = extrapolate_statistics(v) 196 elif isinstance(v, (list, tuple)): 197 v = [extrapolate_statistics(record) for record in v] 198 elif hasattr(v, '__call__'): 199 v = v(scope) 200 c[k] = v 201 return c
202 203 204 # --------------------- CherryPy Applications Statistics --------------------- # 205 206 import threading 207 import time 208 209 import cherrypy 210 211 appstats = logging.statistics.setdefault('CherryPy Applications', {}) 212 appstats.update({ 213 'Enabled': True, 214 'Bytes Read/Request': lambda s: (s['Total Requests'] and 215 (s['Total Bytes Read'] / float(s['Total Requests'])) or 0.0), 216 'Bytes Read/Second': lambda s: s['Total Bytes Read'] / s['Uptime'](s), 217 'Bytes Written/Request': lambda s: (s['Total Requests'] and 218 (s['Total Bytes Written'] / float(s['Total Requests'])) or 0.0), 219 'Bytes Written/Second': lambda s: s['Total Bytes Written'] / s['Uptime'](s), 220 'Current Time': lambda s: time.time(), 221 'Current Requests': 0, 222 'Requests/Second': lambda s: float(s['Total Requests']) / s['Uptime'](s), 223 'Server Version': cherrypy.__version__, 224 'Start Time': time.time(), 225 'Total Bytes Read': 0, 226 'Total Bytes Written': 0, 227 'Total Requests': 0, 228 'Total Time': 0, 229 'Uptime': lambda s: time.time() - s['Start Time'], 230 'Requests': {}, 231 }) 232 233 proc_time = lambda s: time.time() - s['Start Time'] 234 235
236 -class ByteCountWrapper(object):
237 """Wraps a file-like object, counting the number of bytes read.""" 238
239 - def __init__(self, rfile):
240 self.rfile = rfile 241 self.bytes_read = 0
242
243 - def read(self, size=-1):
244 data = self.rfile.read(size) 245 self.bytes_read += len(data) 246 return data
247
248 - def readline(self, size=-1):
249 data = self.rfile.readline(size) 250 self.bytes_read += len(data) 251 return data
252
253 - def readlines(self, sizehint=0):
254 # Shamelessly stolen from StringIO 255 total = 0 256 lines = [] 257 line = self.readline() 258 while line: 259 lines.append(line) 260 total += len(line) 261 if 0 < sizehint <= total: 262 break 263 line = self.readline() 264 return lines
265
266 - def close(self):
267 self.rfile.close()
268
269 - def __iter__(self):
270 return self
271
272 - def next(self):
273 data = self.rfile.next() 274 self.bytes_read += len(data) 275 return data
276 277 278 average_uriset_time = lambda s: s['Count'] and (s['Sum'] / s['Count']) or 0 279 280
281 -class StatsTool(cherrypy.Tool):
282 """Record various information about the current request.""" 283
284 - def __init__(self):
285 cherrypy.Tool.__init__(self, 'on_end_request', self.record_stop)
286
287 - def _setup(self):
288 """Hook this tool into cherrypy.request. 289 290 The standard CherryPy request object will automatically call this 291 method when the tool is "turned on" in config. 292 """ 293 if appstats.get('Enabled', False): 294 cherrypy.Tool._setup(self) 295 self.record_start()
296
297 - def record_start(self):
298 """Record the beginning of a request.""" 299 request = cherrypy.serving.request 300 if not hasattr(request.rfile, 'bytes_read'): 301 request.rfile = ByteCountWrapper(request.rfile) 302 request.body.fp = request.rfile 303 304 r = request.remote 305 306 appstats['Current Requests'] += 1 307 appstats['Total Requests'] += 1 308 appstats['Requests'][threading._get_ident()] = { 309 'Bytes Read': None, 310 'Bytes Written': None, 311 # Use a lambda so the ip gets updated by tools.proxy later 312 'Client': lambda s: '%s:%s' % (r.ip, r.port), 313 'End Time': None, 314 'Processing Time': proc_time, 315 'Request-Line': request.request_line, 316 'Response Status': None, 317 'Start Time': time.time(), 318 }
319
320 - def record_stop(self, uriset=None, slow_queries=1.0, slow_queries_count=100, 321 debug=False, **kwargs):
322 """Record the end of a request.""" 323 resp = cherrypy.serving.response 324 w = appstats['Requests'][threading._get_ident()] 325 326 r = cherrypy.request.rfile.bytes_read 327 w['Bytes Read'] = r 328 appstats['Total Bytes Read'] += r 329 330 if resp.stream: 331 w['Bytes Written'] = 'chunked' 332 else: 333 cl = int(resp.headers.get('Content-Length', 0)) 334 w['Bytes Written'] = cl 335 appstats['Total Bytes Written'] += cl 336 337 w['Response Status'] = getattr(resp, 'output_status', None) or resp.status 338 339 w['End Time'] = time.time() 340 p = w['End Time'] - w['Start Time'] 341 w['Processing Time'] = p 342 appstats['Total Time'] += p 343 344 appstats['Current Requests'] -= 1 345 346 if debug: 347 cherrypy.log('Stats recorded: %s' % repr(w), 'TOOLS.CPSTATS') 348 349 if uriset: 350 rs = appstats.setdefault('URI Set Tracking', {}) 351 r = rs.setdefault(uriset, { 352 'Min': None, 'Max': None, 'Count': 0, 'Sum': 0, 353 'Avg': average_uriset_time}) 354 if r['Min'] is None or p < r['Min']: 355 r['Min'] = p 356 if r['Max'] is None or p > r['Max']: 357 r['Max'] = p 358 r['Count'] += 1 359 r['Sum'] += p 360 361 if slow_queries and p > slow_queries: 362 sq = appstats.setdefault('Slow Queries', []) 363 sq.append(w.copy()) 364 if len(sq) > slow_queries_count: 365 sq.pop(0)
366 367 368 import cherrypy 369 cherrypy.tools.cpstats = StatsTool() 370 371 372 # ---------------------- CherryPy Statistics Reporting ---------------------- # 373 374 import os 375 thisdir = os.path.abspath(os.path.dirname(__file__)) 376 377 try: 378 import json 379 except ImportError: 380 try: 381 import simplejson as json 382 except ImportError: 383 json = None 384 385 386 missing = object() 387 388 locale_date = lambda v: time.strftime('%c', time.gmtime(v)) 389 iso_format = lambda v: time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(v)) 390
391 -def pause_resume(ns):
392 def _pause_resume(enabled): 393 pause_disabled = '' 394 resume_disabled = '' 395 if enabled: 396 resume_disabled = 'disabled="disabled" ' 397 else: 398 pause_disabled = 'disabled="disabled" ' 399 return """ 400 <form action="pause" method="POST" style="display:inline"> 401 <input type="hidden" name="namespace" value="%s" /> 402 <input type="submit" value="Pause" %s/> 403 </form> 404 <form action="resume" method="POST" style="display:inline"> 405 <input type="hidden" name="namespace" value="%s" /> 406 <input type="submit" value="Resume" %s/> 407 </form> 408 """ % (ns, pause_disabled, ns, resume_disabled)
409 return _pause_resume 410 411
412 -class StatsPage(object):
413 414 formatting = { 415 'CherryPy Applications': { 416 'Enabled': pause_resume('CherryPy Applications'), 417 'Bytes Read/Request': '%.3f', 418 'Bytes Read/Second': '%.3f', 419 'Bytes Written/Request': '%.3f', 420 'Bytes Written/Second': '%.3f', 421 'Current Time': iso_format, 422 'Requests/Second': '%.3f', 423 'Start Time': iso_format, 424 'Total Time': '%.3f', 425 'Uptime': '%.3f', 426 'Slow Queries': { 427 'End Time': None, 428 'Processing Time': '%.3f', 429 'Start Time': iso_format, 430 }, 431 'URI Set Tracking': { 432 'Avg': '%.3f', 433 'Max': '%.3f', 434 'Min': '%.3f', 435 'Sum': '%.3f', 436 }, 437 'Requests': { 438 'Bytes Read': '%s', 439 'Bytes Written': '%s', 440 'End Time': None, 441 'Processing Time': '%.3f', 442 'Start Time': None, 443 }, 444 }, 445 'CherryPy WSGIServer': { 446 'Enabled': pause_resume('CherryPy WSGIServer'), 447 'Connections/second': '%.3f', 448 'Start time': iso_format, 449 }, 450 } 451 452
453 - def index(self):
454 # Transform the raw data into pretty output for HTML 455 yield """ 456 <html> 457 <head> 458 <title>Statistics</title> 459 <style> 460 461 th, td { 462 padding: 0.25em 0.5em; 463 border: 1px solid #666699; 464 } 465 466 table { 467 border-collapse: collapse; 468 } 469 470 table.stats1 { 471 width: 100%; 472 } 473 474 table.stats1 th { 475 font-weight: bold; 476 text-align: right; 477 background-color: #CCD5DD; 478 } 479 480 table.stats2, h2 { 481 margin-left: 50px; 482 } 483 484 table.stats2 th { 485 font-weight: bold; 486 text-align: center; 487 background-color: #CCD5DD; 488 } 489 490 </style> 491 </head> 492 <body> 493 """ 494 for title, scalars, collections in self.get_namespaces(): 495 yield """ 496 <h1>%s</h1> 497 498 <table class='stats1'> 499 <tbody> 500 """ % title 501 for i, (key, value) in enumerate(scalars): 502 colnum = i % 3 503 if colnum == 0: yield """ 504 <tr>""" 505 yield """ 506 <th>%(key)s</th><td id='%(title)s-%(key)s'>%(value)s</td>""" % vars() 507 if colnum == 2: yield """ 508 </tr>""" 509 510 if colnum == 0: yield """ 511 <th></th><td></td> 512 <th></th><td></td> 513 </tr>""" 514 elif colnum == 1: yield """ 515 <th></th><td></td> 516 </tr>""" 517 yield """ 518 </tbody> 519 </table>""" 520 521 for subtitle, headers, subrows in collections: 522 yield """ 523 <h2>%s</h2> 524 <table class='stats2'> 525 <thead> 526 <tr>""" % subtitle 527 for key in headers: 528 yield """ 529 <th>%s</th>""" % key 530 yield """ 531 </tr> 532 </thead> 533 <tbody>""" 534 for subrow in subrows: 535 yield """ 536 <tr>""" 537 for value in subrow: 538 yield """ 539 <td>%s</td>""" % value 540 yield """ 541 </tr>""" 542 yield """ 543 </tbody> 544 </table>""" 545 yield """ 546 </body> 547 </html> 548 """
549 index.exposed = True 550
551 - def get_namespaces(self):
552 """Yield (title, scalars, collections) for each namespace.""" 553 s = extrapolate_statistics(logging.statistics) 554 for title, ns in sorted(s.items()): 555 scalars = [] 556 collections = [] 557 ns_fmt = self.formatting.get(title, {}) 558 for k, v in sorted(ns.items()): 559 fmt = ns_fmt.get(k, {}) 560 if isinstance(v, dict): 561 headers, subrows = self.get_dict_collection(v, fmt) 562 collections.append((k, ['ID'] + headers, subrows)) 563 elif isinstance(v, (list, tuple)): 564 headers, subrows = self.get_list_collection(v, fmt) 565 collections.append((k, headers, subrows)) 566 else: 567 format = ns_fmt.get(k, missing) 568 if format is None: 569 # Don't output this column. 570 continue 571 if hasattr(format, '__call__'): 572 v = format(v) 573 elif format is not missing: 574 v = format % v 575 scalars.append((k, v)) 576 yield title, scalars, collections
577
578 - def get_dict_collection(self, v, formatting):
579 """Return ([headers], [rows]) for the given collection.""" 580 # E.g., the 'Requests' dict. 581 headers = [] 582 for record in v.itervalues(): 583 for k3 in record: 584 format = formatting.get(k3, missing) 585 if format is None: 586 # Don't output this column. 587 continue 588 if k3 not in headers: 589 headers.append(k3) 590 headers.sort() 591 592 subrows = [] 593 for k2, record in sorted(v.items()): 594 subrow = [k2] 595 for k3 in headers: 596 v3 = record.get(k3, '') 597 format = formatting.get(k3, missing) 598 if format is None: 599 # Don't output this column. 600 continue 601 if hasattr(format, '__call__'): 602 v3 = format(v3) 603 elif format is not missing: 604 v3 = format % v3 605 subrow.append(v3) 606 subrows.append(subrow) 607 608 return headers, subrows
609
610 - def get_list_collection(self, v, formatting):
611 """Return ([headers], [subrows]) for the given collection.""" 612 # E.g., the 'Slow Queries' list. 613 headers = [] 614 for record in v: 615 for k3 in record: 616 format = formatting.get(k3, missing) 617 if format is None: 618 # Don't output this column. 619 continue 620 if k3 not in headers: 621 headers.append(k3) 622 headers.sort() 623 624 subrows = [] 625 for record in v: 626 subrow = [] 627 for k3 in headers: 628 v3 = record.get(k3, '') 629 format = formatting.get(k3, missing) 630 if format is None: 631 # Don't output this column. 632 continue 633 if hasattr(format, '__call__'): 634 v3 = format(v3) 635 elif format is not missing: 636 v3 = format % v3 637 subrow.append(v3) 638 subrows.append(subrow) 639 640 return headers, subrows
641 642 if json is not None:
643 - def data(self):
644 s = extrapolate_statistics(logging.statistics) 645 cherrypy.response.headers['Content-Type'] = 'application/json' 646 return json.dumps(s, sort_keys=True, indent=4)
647 data.exposed = True 648
649 - def pause(self, namespace):
650 logging.statistics.get(namespace, {})['Enabled'] = False 651 raise cherrypy.HTTPRedirect('./')
652 pause.exposed = True 653 pause.cp_config = {'tools.allow.on': True, 654 'tools.allow.methods': ['POST']} 655
656 - def resume(self, namespace):
657 logging.statistics.get(namespace, {})['Enabled'] = True 658 raise cherrypy.HTTPRedirect('./')
659 resume.exposed = True 660 resume.cp_config = {'tools.allow.on': True, 661 'tools.allow.methods': ['POST']}
662