kprofile man page on Tru64


uprofile(1)							   uprofile(1)

NAME
       uprofile,  kprofile - Profile a program (uprofile) or kernel (kprofile)
       with Alpha on-chip performance counters

SYNOPSIS
       uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all | -each |
       -one] [-stride n] [-average] [-pixie] [-display | prof-option...]
       [statistic...] program [argument...]

       kprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all | -each |
       -one] [-stride n] [-average] [-pixie] [-display | prof-option...]
       [-k kernel_name] [-t] [-ra] [statistic...] [program [argument...]]

DESCRIPTION
       See prof_intro(1) for an introduction to the application performance
       tuning tools provided with Tru64 UNIX.

       The uprofile command uses the Alpha on-chip performance counters to
       produce a finely grained program-counter profile of a user program. The
       command runs the program you specify with the arguments you specify,
       collecting the selected statistics on the program's process and its
       descendants. By default, it writes the profile data to the umon.out
       file. If the program calls shared libraries, those libraries are not
       profiled.

       The kprofile command uses the Alpha on-chip performance counters to
       produce a detailed program-counter profile of the kernel. If you
       specify a program, kprofile runs the program with the arguments you
       specify and collects the selected statistics on the kernel for the
       duration of the program's execution. If you do not specify a program,
       kprofile collects the selected statistics on the kernel until you enter
       Ctrl/C or the kprofile process receives a SIGTERM signal. Note that if
       SIGINT (usually generated by entering Ctrl/C at the controlling
       terminal) is currently being ignored, it continues to be ignored, and
       SIGTERM must be used to terminate data collection. By default, kprofile
       writes the profile data to the kmon.out file.
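       The SIGINT rule above follows a common UNIX convention: a program that
       stops on an interrupt should not override a SIGINT disposition that
       its parent has set to "ignore". The following C sketch shows that
       idiom using POSIX sigaction(). It is an illustration under that
       assumption, not code taken from kprofile. The names request_stop,
       stop_requested, and install_stop_handlers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set from the handler; a hypothetical collection loop would poll it
 * and stop gathering statistics once it becomes nonzero. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Catch SIGTERM unconditionally, but catch SIGINT only if it is not
 * already being ignored, so an inherited SIG_IGN stays in effect.
 * Returns 0 on success and -1 if any sigaction() call fails. */
int install_stop_handlers(void)
{
    struct sigaction sa, old;

    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    /* Query the current SIGINT disposition without changing it. */
    if (sigaction(SIGINT, NULL, &old) != 0)
        return -1;
    if (old.sa_handler == SIG_IGN)
        return 0;               /* leave SIGINT ignored */
    return sigaction(SIGINT, &sa, NULL) == 0 ? 0 : -1;
}
```

       A collection loop would run until stop_requested becomes nonzero.
       When the parent has ignored SIGINT, only SIGTERM (for example, from
       kill) ends it, which is the behavior described above.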

       If you specify -display or any of the prof-options, the uprofile and
       kprofile commands display the profile by running the prof tool (with
       any specified prof-options).

       You can also run the prof command separately to help analyze the data
       in the umon.out or kmon.out file. The following examples show how to
       invoke the prof command to analyze data in the respective files:

              % prof a.out umon.out
              % prof /vmunix kmon.out

       The CPU-time profile displayed by prof will not be accurate if the
       processors that executed the application do not all run at the same
       speed, as in certain multiprocessor systems containing EV67 or later
       processors. You can avoid the inaccuracy by using the hiprof (sampling)
       or cc -p/-pg profilers, or by running the application on a subset of
       the processors in either of these ways:

              Select a single processor using the runon command.

              Check the processor speeds using the psrinfo -v command, and run
              the application in a processor set comprising only processors
              that run at the same speed (see processor_sets(4)).
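       To see why unequal speeds matter, assume (this page does not state
       it) that prof converts a procedure's cycle count to seconds by
       dividing by a single clock rate f. Suppose the procedure spent C_1
       cycles on processors running at f_1 and C_2 cycles on processors
       running at f_2. The displayed time and the actual time are then:

```latex
\hat{t} = \frac{C_1 + C_2}{f},
\qquad
t = \frac{C_1}{f_1} + \frac{C_2}{f_2}
```

       The two agree for every split of cycles only when f_1 = f_2 = f. For
       example, take C_1 = C_2 = 10^9 cycles, f = f_1 = 1 GHz, and
       f_2 = 500 MHz. The displayed time is 2 seconds, but the actual time
       is 3 seconds.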

OPERANDS
       statistic
              The name of an event that your particular Alpha hardware can
              profile, as detailed in the STATISTICS section, below. If no
              statistic is named, machine cycles are counted, giving a
              CPU-time profile. One statistic can be specified for each of the
              hardware counters on your machine.

       program
              The name of the executable to run while profiling operations are
              being performed.

       argument
              An argument to pass to the program that is run. Multiple
              arguments can be specified, as needed by the program.

OPTIONS
       Options can be abbreviated to three characters, except the
       prof-options, which can be abbreviated (usually to one character) as in
       a prof command. For example, -qui is interpreted as -quiet, but -q is
       interpreted as -quit. (See the -display option for the supported
       prof-options.)
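       The abbreviation rule can be sketched in C as a prefix match with a
       three-character minimum. Anything that does not match falls back to
       prof. This is a hypothetical illustration. The option table is copied
       from the SYNOPSIS, and match_option is not a function provided by
       Tru64. In this sketch, options shorter than three characters (-v, -k,
       -t, -ra) must be spelled in full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Option names from the SYNOPSIS, without the leading '-'.
 * -k, -t, and -ra apply to kprofile only. */
static const char *const own_options[] = {
    "v", "quiet", "dirname", "pids", "nopids", "all", "each", "one",
    "stride", "average", "pixie", "display", "k", "t", "ra"
};

/* Return the full uprofile/kprofile option name that ARG (given without
 * its leading '-') abbreviates, or NULL if ARG should be passed to prof
 * as a prof-option instead. An abbreviation must be at least three
 * characters long unless it spells out a shorter option name in full. */
const char *match_option(const char *arg)
{
    size_t len, i;

    assert(arg != NULL);
    len = strlen(arg);
    for (i = 0; i < sizeof own_options / sizeof own_options[0]; i++) {
        const char *name = own_options[i];
        size_t name_len = strlen(name);

        if (len == 0 || len > name_len || strncmp(arg, name, len) != 0)
            continue;           /* ARG is not a prefix of this name */
        if (len >= 3 || len == name_len)
            return name;        /* long enough, or an exact match */
    }
    return NULL;
}
```

       With this table, match_option("qui") returns "quiet", while
       match_option("q") returns NULL. The -q option is therefore handed to
       prof, where it means -quit.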

       For options that specify a procedure name (proc), C++ procedures can
       omit the argument type list, though this will match all overloaded
       procedures with that name. To select a specific procedure, specify the
       full symbol name (as printed by the nm command). Symbol names
       containing spaces, *, and so on must be quoted.

       -v     Engages verbose mode, which prints some useful information about
              the program being profiled.

       -quiet Prevents informational and progress messages from being printed.

       -dirname path
              Specifies the directory path in which the profiling data file or
              files are created.

       -[no]pids
              [Disables] or enables the addition of the process-id number to
              the name of the profiling data file or files.

       -all | -each | -one
              Specifies which mode to use for profiling on multiprocessor
              machines. Using the -all option (the default) aggregates the data
              for all CPUs into one umon.out file. Using the -each option
              collects separate profiles for each CPU and writes the output
              into a set of files named umon.out.n, where n is the CPU number.
              Using the -one option profiles only the current CPU. For the -one
              option to work, the uprofile or kprofile program must be run
              using the runon command.

       -stride n
              Sets the granularity of the sample counts, where n is the number
              of consecutive instructions grouped together for each sample
              count. The default is -stride 4. The -asm, -heavy, and -lines
              prof-options need a separate sample count for each instruction
              (for their reports to be precise enough), so these options imply
              -stride 1. This makes the output file four times bigger than the
              default size. The -stride argument must be a power of two (for
              example, 1, 2, 4, 8).

       -average
              Attempts to average samples within basic blocks so that each
              instruction within a basic block shows the same number of
              samples. Ensures fine-grained profiles by setting stride to 1.

       -pixie Produces files similar to those produced by running an
              executable instrumented with pixie (see pixie(1)). Uses the
              cycles0 statistic (freq on EV67) by default. Ensures
              fine-grained profiles by setting stride to 1.

       -k kernel_name
              Overrides the name of the kernel to profile. (The default is the
              booted kernel.)

       -t     Enables triggered mode for kprofile. This option sets up all
              required information for running the performance counters, but
              does not invoke them. See the STATISTICS section for additional
              information.

       -ra    Enables PCNTCALLER mode for kprofile. Collects profiling data on
              the caller of certain kernel utility routines (for example,
              bcopy, bzero, simple_lock), instead of on the routine itself.

       -display
              Runs prof on the resulting profile data file(s). The following
              prof options are supported:

              -asm   Reports the profile as an annotated disassembly.

              -exclude proc
                     Excludes procedure proc from the profile but includes its
                     CPU time or other statistic in the total.

              -Exclude proc
                     Excludes procedure proc from the profile and from the
                     total.

              -heavy Profiles source lines, printing those with the highest
                     CPU time or other statistic first.

              -lines Reports the profile per source line within each
                     procedure.

              -merge file
                     Merges all profile data files into file.

              -numbers
                     Prints each procedure's starting line number.

              -only proc
                     Includes only procedure proc in the profile, but totals
                     all procedures.

              -Only proc
                     Includes only procedure proc in the profile and in the
                     total.

              -procedures
                     Profiles procedures, printing those with the highest CPU
                     time or other statistic first.

              -quit n
                     Truncates the reports after n lines or after (cumulative)
                     n percent of the whole.
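       The -stride and -average descriptions can be made concrete with a
       small C sketch. It assumes a hypothetical buffer layout with one
       sample counter per group of stride consecutive instructions, starting
       at the beginning of the profiled text. The actual umon.out layout is
       not documented here. The only architectural fact the sketch relies on
       is that every Alpha instruction is 4 bytes long. That is why halving
       the stride doubles the number of counters, and why -stride 1 needs
       four times as many counters as the default -stride 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every Alpha instruction is 32 bits (4 bytes) long. */
#define ALPHA_INSN_BYTES 4u

/* Nonzero if n is a power of two (1, 2, 4, 8, ...), as -stride requires. */
int stride_is_valid(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Index of the sample counter that a PC falls into, assuming one counter
 * per group of STRIDE consecutive instructions starting at TEXT_START. */
size_t bucket_index(uint64_t pc, uint64_t text_start, unsigned stride)
{
    assert(stride_is_valid(stride) && pc >= text_start);
    return (size_t)((pc - text_start) / ((uint64_t)ALPHA_INSN_BYTES * stride));
}

/* Number of counters needed for a text segment of NINSNS instructions. */
size_t bucket_count(size_t ninsns, unsigned stride)
{
    assert(stride_is_valid(stride));
    return (ninsns + stride - 1) / stride;
}

/* The -average idea at stride 1: give every instruction of one basic
 * block, counts[first] .. counts[first + len - 1], the block's mean
 * sample count. Integer division discards any remainder. */
void average_block(unsigned long *counts, size_t first, size_t len)
{
    unsigned long total = 0;
    size_t i;

    assert(counts != NULL && len > 0);
    for (i = first; i < first + len; i++)
        total += counts[i];
    for (i = first; i < first + len; i++)
        counts[i] = total / len;
}
```

       With 1000 instructions, for instance, stride 4 needs 250 counters and
       stride 1 needs 1000. That is the fourfold growth mentioned for -asm,
       -heavy, and -lines.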

STATISTICS
       You  specify  the  statistics  that you want to collect for the program
       being profiled in one or more statistic operands.

       If you specify multiple statistics, uprofile  and  kprofile  accumulate
       their results. You cannot then view the results of any single statistic
       separately. Because collected data is  merged  into  a  single  buffer,
       interpretation of multiply collected statistics may be difficult.

       The Alpha architecture implemented on your machine determines which
       statistics can be collected and the number of counters available for
       collecting multiple statistics at the same time. The implementation is
       indicated by the Alpha chip number, which can be displayed with the
       show config console command before booting Tru64 UNIX, or, after
       booting, by using the psrinfo -v command or by calling
       getsysinfo(GSI_PROC_TYPE). Also, if the uprofile command is run without
       arguments, it will show how many counters and what statistics are
       available on your machine.

       All of the chips in the EV4 family (21064 [EV4], 21064A [EV45],
       21066/21068 [LCA4]) have two performance counter registers, each of
       which can be separately programmed. The statistics that each counter
       can collect are shown in the following table:

       ──────────────────────────────────
       Counter 0 Stats   Counter 1 Stats
       ──────────────────────────────────
       0disabled         1disabled
       issues            dcache
       pipedry           icache
       loads             dualissues
       pipefrozen        mispredicts
       branches          floatops
       cycles            intops
       PALcycles         stores
       nonissues         novictims
       victims
       ──────────────────────────────────

       All of the chips in the EV5 family (21164 [EV5], 21164A [EV56], and
       21164PC [PCA56]) have three performance counter registers, each of
       which can be separately programmed. Some of the counters are common to
       all EV5 implementations, some are specific to EV5 and EV56, and some
       are specific to PCA56.

       The statistics that each of the common EV5  counters  can  collect  are
       shown in the following table:

       ─────────────────────────────────────────────────────
       Counter 0 Stats   Counter 1 Stats   Counter 2 Stats
       ─────────────────────────────────────────────────────
       0disabled         1disabled         2disabled
       cycles0           nonissues         longstalls
       issues            splitissue        pcmispredicts
                         pipedry           branchmispredicts
                         replay            icachemisses
                         singleissues      itbmisses
                         dualissues        dcacheldmisses
                         tripleissues      dtbmisses
                         quadissues        ldsmerged
                         flowchanges       ldureplays
                         intops            fullreplays
                         floatops          externalinput
                         loads             cycles2
                         stores            memorybarriers
                         icacheacc         lockedloads
                         dcacheacc
       ─────────────────────────────────────────────────────

       The statistics that each of the EV5- and EV56-specific counters can
       collect are shown in the following table:

       ───────────────────────────────────
       Counter 1 Stats   Counter 2 Stats
       ───────────────────────────────────
       scacheacc         scachemisses
       scachereads       scachereadmisses
       scachewrites1     scachewritemisses
       scachevictim      scachesharedwrites
       bcacheref         scachewrites2
       bcachevictim      bcachemisses
       sysreqs           systeminvalidates
                         systemreadrequests
       ───────────────────────────────────

       The statistics that each of the PCA56-specific counters can collect are
       shown in the following table:

       ──────────────────────────────────────────
       Counter 1 Stats        Counter 2 Stats
       ──────────────────────────────────────────
       bcachereads            bcachedreads
       bcachedreadhits        bcachereadhits
       bcachedreadfills       bcachereadfills
       bcachewrites           bcachewritehits
       bcachecleanwritehits   bcachewritefills
       bcachevictims          sysreadflushhits
       readmisstwo            sysreadflushmisses
                              readmissthree
       ──────────────────────────────────────────

       The EV6 chip has two performance counter registers, each of which can
       be separately programmed. The statistics that each of the EV6-specific
       counters can collect are shown in the following table:

       ──────────────────────────────────
       Counter 0 Stats   Counter 1 Stats
       ──────────────────────────────────
       0disabled         1disabled
       cycles0           cycles1
       retinst           retcondbranch
                         retdtb1miss
                         retdtb2miss
                         retitbmiss
                         retunaltrap
                         replay
       ──────────────────────────────────

       The default is to gather cycle statistics in the 0th counter and to
       disable other counters.

       The EV67 chip has two kinds of performance counters: traditional
       aggregate counters and profile-me counters. The traditional aggregate
       statistics that each of the EV67-specific counters can collect are
       shown in the following table. Any one statistic or statistic
       combination may be selected.

       ──────────────────────────────────
       Counter 0 Stats   Counter 1 Stats
       ──────────────────────────────────
       0disabled         1disabled
       cycles0           replay
       retinst           cycles1
       retinst           bcachemisses
       ──────────────────────────────────

       If no aggregate statistics are selected, one profile-me statistic may
       be selected:

       ─────────────────────────────────────────────────────────────────────────────────
       Profile-me Statistics
       ─────────────────────────────────────────────────────────────────────────────────
       2disabled             abort                 abort_per_ret         arith_trap
       cbr_taken             cbr_taken_per_ret     cycles                cycles_per_ret
       delay                 delay_per_ret         dstream_fault         dtb_miss
       dtb_miss_per_ret      dtb_miss3             dtb_miss4             early_kill
       early_kill_per_ret    fp_disabled           freq                  icache_miss
       icache_miss_per_ret   icache_parity         inflt_bcache          inflt_replays
       inflt_retires         interrupt             istream_accvio        itb_miss
       ldst_order            ldst_unalign          map_stall             map_stall_per_ret
       mispredict            mispredict_per_ret    opcdec                replay_trap
       replay_trap_per_ret   retire                trap                  trap_per_ret
       valid
       ─────────────────────────────────────────────────────────────────────────────────

       The default is to gather cycle statistics in the 0th counter and to
       disable other counters.

       For descriptions of the statistics for all EV4, EV5, and EV6
       implementations, refer to pfm(7).

       You can disable any counter by specifying 0disabled, 1disabled, or
       2disabled as the counter statistic. You can use this feature to
       isolate specific event types, such as loads, without extraneous data
       being generated. You cannot disable all counters at the same time,
       choose two statistics for the same counter, or disable a counter once
       its statistic is specified.
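       The counter-selection rules above can be checked mechanically. The
       following C sketch validates a list of statistic names for an
       EV4-family chip against the EV4 table earlier in this section.
       ev4_select and its return convention are hypothetical, and other chips
       would need their own tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EV4_NCOUNTERS 2
#define EV4_MAXSTATS 10

/* EV4-family statistics per counter, from the table above. Entry 0 of
 * each row is that counter's "disabled" pseudo-statistic. */
static const char *const ev4_stats[EV4_NCOUNTERS][EV4_MAXSTATS] = {
    { "0disabled", "issues", "pipedry", "loads", "pipefrozen",
      "branches", "cycles", "PALcycles", "nonissues", "victims" },
    { "1disabled", "dcache", "icache", "dualissues", "mispredicts",
      "floatops", "intops", "stores", "novictims", NULL }
};

/* Return the counter that can collect NAME, or -1 if none can. */
static int ev4_counter_of(const char *name)
{
    int c, j;

    for (c = 0; c < EV4_NCOUNTERS; c++)
        for (j = 0; j < EV4_MAXSTATS && ev4_stats[c][j] != NULL; j++)
            if (strcmp(name, ev4_stats[c][j]) == 0)
                return c;
    return -1;
}

/* Assign each named statistic to its counter in chosen[]; counters left
 * NULL are unused. Returns 0 if the selection is valid, -1 otherwise. */
int ev4_select(const char *const *names, size_t n,
               const char *chosen[EV4_NCOUNTERS])
{
    size_t i;
    int c, enabled = 0;

    assert(chosen != NULL && (n == 0 || names != NULL));
    for (c = 0; c < EV4_NCOUNTERS; c++)
        chosen[c] = NULL;

    if (n == 0) {               /* default: count cycles on counter 0 */
        chosen[0] = "cycles";
        return 0;
    }
    for (i = 0; i < n; i++) {
        c = ev4_counter_of(names[i]);
        if (c < 0)
            return -1;          /* not an EV4 statistic */
        if (chosen[c] != NULL)
            return -1;          /* second choice for one counter; this also
                                   rejects disabling it after a statistic */
        chosen[c] = names[i];
    }
    for (c = 0; c < EV4_NCOUNTERS; c++)
        if (chosen[c] != NULL && strcmp(chosen[c], ev4_stats[c][0]) != 0)
            enabled++;
    return enabled > 0 ? 0 : -1;   /* cannot disable every counter */
}
```

       For example, the pair loads stores is accepted because it uses one
       statistic per counter. The pair loads issues is rejected because both
       statistics belong to counter 0.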

       When you specify no counter statistics, uprofile and kprofile count
       cycles on counter 0 by default, and display (through prof) a profile in
       terms of seconds used by each procedure in the program, except for any
       shared libraries.

       For noncycle statistics, the displayed profile shows the number of
       samples recorded, the sampling interval (events per second), and the
       total number of events that this implies. Most noncycle statistics of
       the EV5 family CPUs are recorded about six cycles after the instruction
       that triggered the sample. So, when using prof's -asm or -lines option,
       the samples should be associated with one of the few previously
       executed instructions or lines. The icacheacc, icachemisses, and
       dtbmisses statistics are usually attributed precisely.

       To perform a detailed analysis of short sections of kernel code, use
       the kprofile command with triggered mode (invoked with the -t option).
       When you use this mode, kprofile performs all of the required setup for
       enabling the counters as normal, but does not invoke them. You can
       insert counter start or stop commands into the kernel code to be
       instrumented as follows:

              Turn counters on:   wrperfmon (PFOPT, 1)
              Turn counters off:  wrperfmon (0)

       You can turn the counters on and off repeatedly to collect data over
       many iterations or multiple sections of code.

       The macro PFOPT is defined in <sys/pfcntr.h>.

NOTES
       The interrupt load that profiling places on the system may affect
       performance, but usually the effect is insignificant.

       The kernel in use must have the pfm pseudo-device configured into it.
       To do this, use one of the following methods:

              Add the following line to the kernel configuration file, and
              rebuild the kernel. Do not use this method if CPU hot-swap is
              supported by the system, because it does not allow pfm to be
              easily unconfigured, as required for a hot-swap; instead, use
              the sysconfig method below.

                     pseudo-device   pfm

              Enter the following command from the root account. Do not
              configure pfm if CPU hot-swap is anticipated.

                     # sysconfig -c pfm

              If pfm is configured, the CPU hot-swap procedure requires that
              it be unconfigured, using the following command, before any CPU
              is swapped:

                     # sysconfig -u pfm

              The autosysconfig program can be used to automatically load the
              configurable pfm device at each system startup.

       The format of the data files produced by uprofile in Tru64 UNIX is
       different from the format produced in versions of DIGITAL UNIX prior to
       Version 4.0. The Tru64 UNIX data files include the names of selected
       statistics in profile displays. To convert these data files to the
       industry-standard format, at the expense of losing the names of the
       statistics, use the pdtostd command.

RESTRICTIONS
       The EV4 victim and novictim statistics rely on the external performance
       counter pin connections as described in the EV4 chip specification. The
       DEC 3000/400, /500, /600, and /800 workstations have these connections.
       Attempts to display either  of  these  statistics  on  other  platforms
       (while allowed) will typically generate empty data.

       The uprofile command is only supported on EV4 Pass 3 or later
       processors. Attempts to use it on a Pass 2 processor will gather PC
       samples for every process running on the system.

       Using kprofile to generate statistics for a single command is only
       possible on EV4 Pass 3 or later processors. Attempts to do this on a
       Pass 2 processor will gather statistics for the entire system, as if no
       command had been specified.

       Using kprofile with triggered mode also requires an EV4 Pass 3 or later
       processor and cannot be performed with per-process monitoring.

       Only one tool can use the performance counters at a time. A message
       similar to “the counter device is busy” indicates that some other tool
       is using the performance counters (or has used them but not cleaned up
       properly). If you are sure no one else is using the performance
       counters, running uprofile or kprofile with superuser privilege will
       attempt to reset the busy status and proceed.

FILES
       The performance counter device file.

       umon.out
              The statistics file(s) generated by uprofile.

       kmon.out
              The statistics file(s) generated by kprofile.

       The statistics file(s) generated with the -pids option.

       The default kernel to profile.
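       The naming rules for the data files can be summarized in a short C
       sketch. data_file_name is hypothetical. It assumes that kprofile's
       -each files follow the same .n pattern documented for umon.out. It
       does not model the -pids naming, whose exact form this page does not
       give.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum mp_mode { MODE_ALL, MODE_EACH, MODE_ONE };

/* Write into BUF the data file name for one CPU. BASE is "umon.out"
 * (uprofile) or "kmon.out" (kprofile); DIR is the -dirname path, or NULL
 * to omit a directory prefix. Only -each appends ".n", where n is the
 * CPU number. Returns snprintf's result (the untruncated length). */
int data_file_name(char *buf, size_t size, const char *dir,
                   const char *base, enum mp_mode mode, int cpu)
{
    const char *sep = "/";

    assert(buf != NULL && base != NULL);
    if (dir == NULL || dir[0] == '\0')
        dir = sep = "";         /* no directory prefix */
    if (mode == MODE_EACH)
        return snprintf(buf, size, "%s%s%s.%d", dir, sep, base, cpu);
    return snprintf(buf, size, "%s%s%s", dir, sep, base);
}
```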

SEE ALSO
       Introduction: prof_intro(1)

       pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1), sysconfig(8),
       autosysconfig(8), processor_sets(4)

       Programmer's Guide

								   uprofile(1)