webgrab man page on Inferno

WEBGRAB(1)							    WEBGRAB(1)

NAME
       webgrab - fetch web page content as files

SYNOPSIS
       webgrab [ -r ] [ -v ] [ -o stem ] [ -p body ] url

DESCRIPTION
       Webgrab connects to the web server named in the url.  It fetches the
       content of the web page also determined by the url, and stores it
       locally in a file.  If the page is written in HTML, webgrab reads it to
       build a list of sub-component pages (eg, frames) and images.  It
       fetches those, saving the content in separate files.  It adds a comment
       to the end of each HTML file giving the time and the file's origin.
       It automatically follows redirections offered by the server.
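
       The sub-component scan described above can be sketched as follows.
       This is an illustration in Python, not webgrab's own Limbo code; the
       names SubcomponentLister and list_subcomponents are hypothetical, and
       treating only frame and img tags with a src attribute as
       sub-components is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubcomponentLister(HTMLParser):
    """Collect the URLs of frames and images referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only <frame src=...> and <img src=...> count.
        if tag in ("frame", "img"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page's own URL.
                self.found.append(urljoin(self.base_url, src))


def list_subcomponents(html_text, base_url):
    """Return absolute URLs of sub-components, in document order."""
    lister = SubcomponentLister(base_url)
    lister.feed(html_text)
    return lister.found
```

       Each URL returned would then be fetched and saved in its own file.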

       The stem of the names of the output files is normally derived from a
       component of the url.  If the url contains a path name, the stem is the
       last component of that path, less any dot-separated suffix and prefix.
       For example, given

	      http://www.vitanuova.com/inferno/old.index.html

       the stem would be index.  If there is no path name, but the url
       contains a domain name, the stem is the penultimate component of the
       domain name (eg, excluding trailing .com, and initial www, etc).  For
       example, given

	      www.innerhost.vitanuova.com

       the  stem would be vitanuova.  If all else fails, webgrab uses the stem
       webgrab.
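
       The stem rules above can be sketched in Python as below.  The function
       name derive_stem is hypothetical, and the handling of cases the text
       does not cover (names with more than two dots, a path ending in a
       slash, a url given without a scheme) is an assumption, not necessarily
       what webgrab does.

```python
from urllib.parse import urlsplit


def derive_stem(url):
    """Derive an output-file stem from a url, following the rules above."""
    # Assumption: a url given without a scheme is treated as http.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Rule 1: last path component, less dot-separated suffix and prefix.
    last = parts.path.rstrip("/").rsplit("/", 1)[-1]
    pieces = [p for p in last.split(".") if p]
    if len(pieces) == 1:
        return pieces[0]
    if len(pieces) >= 2:
        # Assumption: with several dots, keep the piece before the suffix.
        return pieces[-2]

    # Rule 2: penultimate component of the domain name.
    labels = [label for label in (parts.hostname or "").split(".") if label]
    if len(labels) >= 2:
        return labels[-2]

    # Rule 3: if all else fails.
    return "webgrab"
```

       With the two examples from the text, derive_stem yields index and
       vitanuova respectively.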

       Given a stem, the initial page is stored in stem.suffix where suffix is
       the suffix (eg, .html) of the name of the original page.  Subordinate
       pages are saved in a similar way in files named stem_1.suffix1,
       stem_2.suffix2, ... .
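
       A minimal Python sketch of this naming scheme follows.  The helper
       names suffix_of and output_names, and the sub-component urls in the
       usage note, are hypothetical; using an empty suffix when the original
       name has none is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def suffix_of(url):
    """Return the suffix of the url's path, including the dot (eg, '.html')."""
    return posixpath.splitext(urlsplit(url).path)[1]


def output_names(stem, page_url, sub_urls):
    """Name the initial page stem.suffix and sub-pages stem_N.suffixN."""
    names = [stem + suffix_of(page_url)]
    for n, sub_url in enumerate(sub_urls, start=1):
        names.append(f"{stem}_{n}{suffix_of(sub_url)}")
    return names
```

       For the stem index, a page old.index.html with sub-components logo.gif
       and menu.html would be saved as index.html, index_1.gif and
       index_2.html.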

       The options are:

       -r     do not fetch subcomponents (just the `raw' source of url itself)

       -v     print a progress report

       -vv    print a chatty progress report

       -o stem
	      use the stem as given

       -p body
	      use HTTP POST instead of GET, posting body as the data
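
       The effect of -p on the request method can be illustrated with
       Python's standard urllib.  This is only an analogy for the option's
       behaviour; the function name build_request is hypothetical, and
       encoding the body as UTF-8 is an assumption.

```python
from urllib.request import Request


def build_request(url, body=None):
    """Build a GET request, or a POST request when a body is supplied."""
    # urllib selects POST automatically when request data is present.
    data = body.encode("utf-8") if body is not None else None
    return Request(url, data=data)
```

       Calling get_method() on the result gives GET when no body is supplied
       and POST otherwise.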

       Webgrab reads the configuration file /services/webget/config (if it
       exists), to look for the address of an optional HTTP proxy (in the
       httpproxy entry), and a list of domains for which a proxy should not be
       used (in the noproxy or noproxydoms entry).  If symbolic network and
       service names might be involved, the connection server lib/cs needs to
       be already running.
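
       The proxy decision can be sketched in Python as below.  The function
       name use_proxy is hypothetical, and matching a host against each
       listed domain by suffix is an assumption; this page does not specify
       how webgrab compares them, nor the syntax of the configuration file.

```python
def use_proxy(host, noproxy_domains):
    """Return False if host falls within any domain exempt from the proxy."""
    host = host.lower().rstrip(".")
    for domain in noproxy_domains:
        domain = domain.lower().strip(".")
        # Assumption: a host matches a domain exactly or as a subdomain.
        if domain and (host == domain or host.endswith("." + domain)):
            return False
    return True
```

       Under these assumptions, a host such as www.vitanuova.com would be
       contacted directly when vitanuova.com is listed, and through the proxy
       otherwise.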

FILES
       /services/webget/config

SOURCE
       /appl/cmd/webgrab.b

BUGS
       It should read the proxy name from the charon(1) configuration file and
       not the webget configuration file.
       It cannot do `secure' transfers (https).
       Its  HTML parsing is naive, but on the other hand, it is less likely to
       trip over HTML novelties.

SEE ALSO
       cs(8)

								    WEBGRAB(1)

Copyright (c) for man pages and the logo by the respective OS vendor.