PCTR(4)			BSD Programmer's Manual (i386)		       PCTR(4)

NAME
     pctr - driver for CPU performance counters

SYNOPSIS
     pseudo-device pctr 1

DESCRIPTION
     The pctr device provides access to the performance counters on Intel
     processors, and to the time stamp counter (TSC) on other processors that
     have one.

     Intel processors have two 40-bit performance counters which can be pro-
     grammed to count events such as cache misses, branch target buffer hits,
     TLB misses, dual-issues, interrupts, pipeline flushes, and more.

     There is one ioctl to read the state of all counters, and one ioctl per
     counter to program its function. All of them require the following
     includes:

	   #include <sys/types.h>
	   #include <machine/cpu.h>
	   #include <machine/pctr.h>

     The current state of all counters can be read with the PCIOCRD ioctl,
     which takes an argument of type struct pctrst:

	   #define PCTR_NUM 2
	   struct pctrst {
		   u_int pctr_fn[PCTR_NUM];
		   pctrval pctr_tsc;
		   pctrval pctr_hwc[PCTR_NUM];
		   pctrval pctr_idl;
	   };

     In this structure, pctr_fn contains the functions of the two counters, as
     previously set by the PCIOCS0 and PCIOCS1 ioctls (see below). pctr_hwc
     contains the actual values of the two hardware counters. pctr_tsc is a
     free-running, 64-bit cycle counter. Finally, pctr_idl is a 64-bit count
     of idle-loop iterations.

     The function of each counter can be programmed with the PCIOCS0 and
     PCIOCS1 ioctls, which require a writable file descriptor and take an
     argument of type unsigned int. The meaning of this integer depends on
     the particular CPU.

  Time stamp counter
     The time stamp counter is available on all machines with Pentium and
     Pentium Pro counters, as well as on some 486s and non-Intel CPUs. It is
     set to zero at boot time and then increments with each cycle. Because
     the counter is 64 bits wide, it does not overflow in practice.
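     To put "does not overflow" in numbers: a counter that is w bits wide and
     increments f times per second wraps after 2^w / f seconds. The sketch
     below does that arithmetic. The clock rates in the note after it are
     example values, not figures from this page. The same function shows
     that the 40-bit event counters, unlike the TSC, can wrap within hours.

```c
/*
 * Seconds until a counter `bits` wide, incrementing `hz` times per
 * second from zero, wraps around to zero.
 */
double
counter_wrap_seconds(int bits, double hz)
{
	double range = 1.0;

	for (int i = 0; i < bits; i++)
		range *= 2.0;		/* range = 2^bits, exact in a double */
	return range / hz;
}

/* The same interval in years of 365.25 days. */
double
counter_wrap_years(int bits, double hz)
{
	return counter_wrap_seconds(bits, hz) / (365.25 * 24.0 * 60.0 * 60.0);
}
```

     At 1 GHz the 64-bit TSC lasts roughly 584 years. A 40-bit counter that
     counts cycles at 200 MHz wraps after about 5,500 seconds, which is
     roughly an hour and a half.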

     The time stamp counter can be read directly from user-mode using the
     rdtsc() macro, which returns a 64-bit value of type pctrval. The follow-
     ing example illustrates a simple use of rdtsc to measure the execution
     time of a hypothetical subroutine called functionx():

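	   #include <sys/types.h>
	   #include <machine/cpu.h>
	   #include <machine/pctr.h>
	   #include <stdio.h>

	   void functionx(void);	/* the code being measured */

	   void
	   time_functionx(void)
	   {
		   pctrval tsc;

		   tsc = rdtsc();
		   functionx();
		   tsc = rdtsc() - tsc;
		   printf("functionx took %llu cycles.\n",
		       (unsigned long long)tsc);
	   }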

     The value of the time stamp counter is also returned by the PCIOCRD
     ioctl, so that one can get an exact timestamp on readings of the hardware
     event counters.
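     Because each PCIOCRD reading includes its own TSC value, two readings
     taken around a piece of code are enough to compute a rate such as cycles
     per counted event. The sketch below uses a hypothetical struct
     pctr_sample that copies only the pctr_tsc and pctr_hwc fields out of
     struct pctrst, declared with uint64_t so that it compiles without
     <machine/pctr.h>. It also assumes the hardware counters are 40 bits
     wide, as stated above, and masks their difference so that one wrap
     between the two readings is tolerated.

```c
#include <stdint.h>

#define PCTR_EX_NUM	2				/* mirrors PCTR_NUM */
#define PCTR_EX_MASK40	((UINT64_C(1) << 40) - 1)	/* 40-bit counters */

/* Hypothetical copy of the struct pctrst fields used here. */
struct pctr_sample {
	uint64_t tsc;			/* from pctr_tsc */
	uint64_t hwc[PCTR_EX_NUM];	/* from pctr_hwc[] */
};

/*
 * Cycles elapsed per event counted by hardware counter `ctr` between
 * two readings. Returns 0.0 if the counter did not advance.
 */
double
cycles_per_event(const struct pctr_sample *before,
    const struct pctr_sample *after, int ctr)
{
	uint64_t cycles = after->tsc - before->tsc;
	uint64_t events = (after->hwc[ctr] - before->hwc[ctr]) & PCTR_EX_MASK40;

	if (events == 0)
		return 0.0;
	return (double)cycles / (double)events;
}
```

     On a Pentium, programming a counter with event 0x16 (instructions
     executed) and applying this to readings taken before and after the code
     under test gives an approximate cycles-per-instruction figure. The
     PCIOCRD system call itself adds some cycles to each measurement.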

  Pentium counters
     The Pentium counters are programmed with a 9-bit function. The top three
     bits contain the following flags:

     P5CTR_K  Enables counting of events that occur in kernel mode.

     P5CTR_U  Enables counting of events that occur in user mode. You must set
	      at least one of P5CTR_U and P5CTR_K to count anything.

     P5CTR_C  When this flag is set, the counter attempts to count the number
	      of cycles spent servicing a particular event, rather than simply
	      the number of occurrences of that event.

     The bottom 6 bits select the event to be counted. The permissible values
     for these bits, and the event each one counts, are:

	   0x00	 Data read
	   0x01	 Data write
	   0x02	 Data TLB miss
	   0x03	 Data read miss
	   0x04	 Data write miss
	   0x05	 Write (hit) to M or E state lines
	   0x06	 Data cache lines written back
	   0x07	 Data cache snoops
	   0x08	 Data cache snoop hits
	   0x09	 Memory accesses in both pipes
	   0x0a	 Bank conflicts
	   0x0b	 Misaligned data memory references
	   0x0c	 Code read
	   0x0d	 Code TLB miss
	   0x0e	 Code cache miss
	   0x0f	 Any segment register load
	   0x12	 Branches
	   0x13	 BTB hits
	   0x14	 Taken branch or BTB hit
	   0x15	 Pipeline flushes
	   0x16	 Instructions executed
	   0x17	 Instructions executed in the V-pipe
	   0x18	 Bus utilization (clocks)
	   0x19	 Pipeline stalled by write backup
	   0x1a	 Pipeline stalled by data memory read
	   0x1b	 Pipeline stalled by write to E or M line
	   0x1c	 Locked bus cycle
	   0x1d	 I/O read or write cycle
	   0x1e	 Non-cacheable memory references
	   0x1f	 AGI (Address Generation Interlock)
	   0x22	 Floating-point operations
	   0x23	 Breakpoint 0 match
	   0x24	 Breakpoint 1 match
	   0x25	 Breakpoint 2 match
	   0x26	 Breakpoint 3 match
	   0x27	 Hardware interrupts
	   0x28	 Data read or data write
	   0x29	 Data read miss or data write miss
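     A Pentium counter function is therefore a 6-bit event code combined with
     one or more of the flag bits. The sketch below assembles and checks such
     a value. The EX_P5_* constants are hypothetical stand-ins: their values
     (K, U and C as bits 6, 7 and 8) are an assumption made for illustration,
     and real programs should use P5CTR_K, P5CTR_U and P5CTR_C from
     <machine/pctr.h>.

```c
/*
 * Hypothetical stand-ins for P5CTR_K, P5CTR_U and P5CTR_C. The bit
 * positions are assumed; use the macros from <machine/pctr.h>.
 */
#define EX_P5_K		0x040u	/* count in kernel mode */
#define EX_P5_U		0x080u	/* count in user mode */
#define EX_P5_C		0x100u	/* count cycles instead of occurrences */
#define EX_P5_FLAGS	(EX_P5_K | EX_P5_U | EX_P5_C)
#define EX_P5_EVENT	0x03fu	/* bottom 6 bits: event code */

/*
 * Build a 9-bit Pentium counter function from an event code and flags.
 * Returns -1 if the event does not fit in 6 bits, if unknown flag bits
 * are set, or if neither the user nor the kernel flag is set (such a
 * counter would count nothing).
 */
long
p5_function(unsigned int event, unsigned int flags)
{
	if (event & ~EX_P5_EVENT)
		return -1;
	if (flags & ~EX_P5_FLAGS)
		return -1;
	if ((flags & (EX_P5_K | EX_P5_U)) == 0)
		return -1;
	return (long)(flags | event);
}
```

     For example, p5_function(0x16, EX_P5_U | EX_P5_K) requests instructions
     executed in both modes. The result would then be passed to the PCIOCS0
     or PCIOCS1 ioctl. This check is only a convenience; the driver performs
     its own validation (see ERRORS).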

  Pentium Pro counters
     The Pentium Pro counter functions consist of several parts. The most
     significant byte (an 8-bit integer shifted left by P6CTR_CM_SHIFT)
     contains a counter mask. If non-zero, this sets a threshold for the
     number of times an event must occur in one cycle for the counter to be
     incremented. The counter mask can therefore be used to count cycles in
     which an event occurs at least some number of times. The next byte
     contains several flags:

     P6CTR_U   Enables counting of events that occur in user mode.

     P6CTR_K   Enables counting of events that occur in kernel mode. You must
	       set at least one of P6CTR_K and P6CTR_U to count anything.

     P6CTR_E   Counts edges rather than cycles. For some functions this allows
	       you to get an estimate of the number of events rather than the
	       number of cycles occupied by those events.

     P6CTR_EN  Enable counters. This bit must be set in the function for
	       counter 0 in order for either of the counters to be enabled.
	       This bit should probably be set in counter 1 as well.

     P6CTR_I   Inverts the sense of the counter mask. When this bit is set,
	       the counter only increments on cycles in which there are fewer
	       events than specified in the counter mask.

     The next byte, also known as the unit mask, contains flags specific to
     the event being counted. For events dealing with the L2 cache, the fol-
     lowing flags are valid:

     P6CTR_UM_M	 Count events involving modified cache lines.

     P6CTR_UM_E	 Count events involving exclusive cache lines.

     P6CTR_UM_S	 Count events involving shared cache lines.

     P6CTR_UM_I	 Count events involving invalid cache lines.

     To measure all L2 cache activity, all these bits should be set. They can
     be set with the macro P6CTR_UM_MESI, which contains the bitwise OR of all
     of the above.

     For event types dealing with bus transactions, there is another flag that
     can be set in the unit mask:

     P6CTR_UM_A	 Count all appropriate bus events, not just those initiated by
		 the processor.

     Finally, the least significant byte of the counter function is the event
     type to count. The following values are available:
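     The four byte-wide fields described above can be packed into a single
     function value as sketched below. The sketch takes raw byte values and
     shifts each into place itself. Its EX_* constants are hypothetical
     stand-ins whose values are assumed from Intel's published PerfEvtSel
     register layout. They are not the P6CTR_* macros, which in
     <machine/pctr.h> may already be shifted into their byte, so use the
     header's macros in real code.

```c
#include <stdint.h>

/* Hypothetical flag-byte bits (cf. P6CTR_U, _K, _E, _EN, _I). */
#define EX_P6_U		0x01u	/* count in user mode */
#define EX_P6_K		0x02u	/* count in kernel mode */
#define EX_P6_E		0x04u	/* count edges */
#define EX_P6_EN	0x40u	/* enable (required in counter 0) */
#define EX_P6_I		0x80u	/* invert counter mask */

/* Hypothetical unit-mask bits (cf. P6CTR_UM_M, _E, _S, _I, _A). */
#define EX_UM_M		0x08u
#define EX_UM_E		0x04u
#define EX_UM_S		0x02u
#define EX_UM_I		0x01u
#define EX_UM_MESI	(EX_UM_M | EX_UM_E | EX_UM_S | EX_UM_I)
#define EX_UM_A		0x20u	/* all bus agents, not just this CPU */

/*
 * Pack a Pentium Pro counter function:
 *   bits 31-24 counter mask, 23-16 flags, 15-8 unit mask, 7-0 event.
 */
uint32_t
p6_function(uint8_t event, uint8_t umask, uint8_t flags, uint8_t cmask)
{
	return (uint32_t)cmask << 24 | (uint32_t)flags << 16 |
	    (uint32_t)umask << 8 | event;
}
```

     For example, p6_function(0xc0, 0, EX_P6_U | EX_P6_EN, 2) would ask
     counter 0 to count user-mode cycles in which at least two instructions
     retire, using the counter mask as a threshold.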

     0x03 LD_BLOCKS
	   Number of store buffer blocks.
     0x04 SB_DRAINS
	   Number of store buffer drain cycles.
     0x05 MISALIGN_MEM_REF
	   Number of misaligned data memory references.
     0x06 SEGMENT_REG_LOADS
	   Number of segment register loads.
     0x10 FP_COMP_OPS_EXE (ctr0 only)
	   Number of computational floating-point operations executed.
     0x11 FP_ASSIST (ctr1 only)
	   Number of floating-point exception cases handled by microcode.
     0x12 MUL (ctr1 only)
	   Number of multiplies.
     0x13 DIV (ctr1 only)
	   Number of divides.
     0x14 CYCLES_DIV_BUSY (ctr0 only)
	   Number of cycles during which the divider is busy.
     0x21 L2_ADS
	   Number of L2 address strobes.
     0x22 L2_DBUS_BUSY
	   Number of cycles during which the data bus was busy.
     0x23 L2_DBUS_BUSY_RD
	   Number of cycles during which the data bus was busy transferring
	   data from L2 to the processor.
     0x24 L2_LINES_IN
	   Number of lines allocated in the L2.
     0x25 L2_M_LINES_INM
	   Number of modified lines allocated in the L2.
     0x26 L2_LINES_OUT
	   Number of lines removed from the L2 for any reason.
     0x27 L2_M_LINES_OUTM
	   Number of modified lines removed from the L2 for any reason.
     0x28 L2_IFETCH/mesi
	   Number of L2 instruction fetches.
     0x29 L2_LD/mesi
	   Number of L2 data loads.
     0x2a L2_ST/mesi
	   Number of L2 data stores.
     0x2e L2_RQSTS/mesi
	   Number of L2 requests.
     0x43 DATA_MEM_REFS
	   All memory references, both cacheable and non-cacheable.
     0x45 DCU_LINES_IN
	   Total lines allocated in the DCU.
     0x46 DCU_M_LINES_IN
	   Number of M state lines allocated in the DCU.
     0x47 DCU_M_LINES_OUT
	   Number of M state lines evicted from the DCU. This includes evic-
	   tions via snoop HITM, intervention or replacement.
     0x48 DCU_MISS_OUTSTANDING
	   Weighted number of cycles while a DCU miss is outstanding.
     0x60 BUS_REQ_OUTSTANDING
	   Number of bus requests outstanding.
     0x61 BUS_BNR_DRV
	   Number of bus clock cycles during which the processor is driving
	   the BNR pin.
     0x62 BUS_DRDY_CLOCKS/a
	   Number of clocks during which DRDY is asserted.
     0x63 BUS_LOCK_CLOCKS/a
	   Number of clocks during which LOCK is asserted.
     0x64 BUS_DATA_RCV
	   Number of bus clock cycles during which the processor is receiving
	   data.
     0x65 BUS_TRAN_BRD/a
	   Number of burst read transactions.
     0x66 BUS_TRAN_RFO/a
	   Number of read for ownership transactions.
     0x67 BUS_TRANS_WB/a
	   Number of write back transactions.
     0x68 BUS_TRAN_IFETCH/a
	   Number of instruction fetch transactions.
     0x69 BUS_TRAN_INVAL/a
	   Number of invalidate transactions.
     0x6a BUS_TRAN_PWR/a
	   Number of partial write transactions.
     0x6b BUS_TRANS_P/a
	   Number of partial transactions.
     0x6c BUS_TRANS_IO/a
	   Number of I/O transactions.
     0x6d BUS_TRAN_DEF/a
	   Number of deferred transactions.
     0x6e BUS_TRAN_BURST/a
	   Number of burst transactions.
     0x6f BUS_TRAN_MEM/a
	   Number of memory transactions.
     0x70 BUS_TRAN_ANY/a
	   Number of all transactions.
     0x79 CPU_CLK_UNHALTED
	   Number of cycles during which the processor is not halted.
     0x7a BUS_HIT_DRV
	   Number of bus clock cycles during which the processor is driving
	   the HIT pin.
     0x7b BUS_HITM_DRV
	   Number of bus clock cycles during which the processor is driving
	   the HITM pin.
     0x7e BUS_SNOOP_STALL
	   Number of clock cycles during which the bus is snoop stalled.
     0x80 IFU_IFETCH
	   Number of instruction fetches, both cacheable and non-cacheable.
     0x81 IFU_IFETCH_MISS
	   Number of instruction fetch misses.
     0x85 ITLB_MISS
	   Number of ITLB misses.
     0x86 IFU_MEM_STALL
	   Number of cycles that the instruction fetch pipe stage is stalled,
	   including cache misses, ITLB misses, ITLB faults, and victim cache
	   evictions.
     0x87 ILD_STALL
	   Number of cycles that the instruction length decoder is stalled.
     0xa2 RESOURCE_STALLS
	   Number of cycles during which there are resource-related stalls.
     0xc0 INST_RETIRED
	   Number of instructions retired.
     0xc1 FLOPS (ctr0 only)
	   Number of computational floating-point operations retired.
     0xc2 UOPS_RETIRED
	   Number of UOPs retired.
     0xc4 BR_INST_RETIRED
	   Number of branch instructions retired.
     0xc5 BR_MISS_PRED_RETIRED
	   Number of mispredicted branches retired.
     0xc6 CYCLES_INT_MASKED
	   Number of processor cycles for which interrupts are disabled.
     0xc7 CYCLES_INT_PENDING_AND_MASKED
	   Number of processor cycles for which interrupts are disabled and
	   interrupts are pending.
     0xc8 HW_INT_RX
	   Number of hardware interrupts received.
     0xc9 BR_TAKEN_RETIRED
	   Number of taken branches retired.
     0xca BR_MISS_PRED_TAKEN_RET
	   Number of taken mispredicted branches retired.
     0xd0 INST_DECODED
	   Number of instructions decoded.
     0xd2 PARTIAL_RAT_STALLS
	   Number of cycles or events for partial stalls.
     0xe0 BR_INST_DECODED
	   Number of branch instructions decoded.
     0xe2 BTB_MISSES
	   Number of branches that miss the BTB.
     0xe4 BR_BOGUS
	   Number of bogus branches.
     0xe6 BACLEARS
	   Number of times BACLEAR is asserted.

     Events marked /mesi require the P6CTR_UM_[MESI] bits in the unit mask.
     Events marked /a can take the P6CTR_UM_A bit.
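     Several events above are restricted to one counter. A program that
     accepts event codes from its user might check this before issuing
     PCIOCS0 or PCIOCS1. The sketch below encodes only the restrictions
     listed on this page; it does not know which other codes are valid, and
     this page does not say how the driver treats an event placed on the
     wrong counter, so the check is a convenience only.

```c
#include <stdbool.h>

/*
 * Return true if Pentium Pro event code `event` may be used on counter
 * `ctr` (0 or 1), according to the "(ctr0 only)" and "(ctr1 only)"
 * annotations in the list above. Other codes are accepted on either
 * counter.
 */
bool
p6_event_allowed(unsigned int event, int ctr)
{
	switch (event) {
	case 0x10:	/* FP_COMP_OPS_EXE */
	case 0x14:	/* CYCLES_DIV_BUSY */
	case 0xc1:	/* FLOPS */
		return ctr == 0;
	case 0x11:	/* FP_ASSIST */
	case 0x12:	/* MUL */
	case 0x13:	/* DIV */
		return ctr == 1;
	default:
		return ctr == 0 || ctr == 1;
	}
}
```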

     Unlike the Pentium counters, the Pentium Pro counters can be read
     directly from user mode without invoking the kernel. The macro
     rdpmc(ctr) takes 0 or 1 as an argument to specify a counter, and returns
     that counter's 40-bit value (of type pctrval). This is generally
     preferable to making a system call, as it introduces less distortion into
     measurements. However, be aware that an interrupt may occur between
     invocations of rdpmc() and/or rdtsc().
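     The effect of such interrupts can be reduced by repeating a measurement
     several times and keeping the smallest difference, since the fastest run
     is the one least likely to have been disturbed. The sketch below assumes
     the start and end readings have already been collected into arrays (for
     example with rdpmc(0) or rdtsc()), and takes the counter width as a mask
     so that a wrap of a 40-bit counter between two readings is handled. The
     function and macro names are hypothetical, not part of the pctr API.

```c
#include <stddef.h>
#include <stdint.h>

#define CTR_MASK40	((UINT64_C(1) << 40) - 1)	/* rdpmc() counters */
#define CTR_MASK64	UINT64_MAX			/* rdtsc() */

/*
 * Smallest of n differences end[i] - start[i], computed modulo the
 * counter width given by mask. Returns UINT64_MAX if n is 0.
 *
 * Possible use on a system with pctr (not compiled here):
 *	for (i = 0; i < N; i++) {
 *		s[i] = rdpmc(0);
 *		functionx();
 *		e[i] = rdpmc(0);
 *	}
 *	best = min_counter_delta(s, e, N, CTR_MASK40);
 */
uint64_t
min_counter_delta(const uint64_t *start, const uint64_t *end, size_t n,
    uint64_t mask)
{
	uint64_t best = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		uint64_t d = (end[i] - start[i]) & mask;

		if (d < best)
			best = d;
	}
	return best;
}
```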

FILES
     /dev/pctr

ERRORS
     [ENODEV]  An attempt was made to set the counter functions on a CPU that
	       does not support counters.

     [EINVAL]  An invalid counter function was provided as an argument to the
	       PCIOCS0 or PCIOCS1 ioctl.

     [EPERM]   An attempt was made to set the counter functions, but the
	       device was not open for writing.

SEE ALSO
     pctr(1), ioctl(2)

HISTORY
     A pctr device first appeared in OpenBSD 2.0.

AUTHORS
     The pctr device was written by David Mazieres <dm@lcs.mit.edu>.

BUGS
     Not all counter functions are completely accurate. Some of the functions
     don't seem to make any sense at all.

MirOS BSD #10-current	       August 15, 1996				     5