
wget

Linux system download file tool

Supplementary instructions

The wget command downloads files from a specified URL. wget is very stable and adapts well to narrow bandwidth and unstable networks. If a download fails for network reasons, wget keeps retrying until the entire file has been retrieved. If the server interrupts the download, wget contacts the server again and resumes from where it stopped. This is useful when downloading large files from servers that limit connection time.
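
For example, a minimal sketch that combines resuming with unlimited retries on a flaky connection (the URL is illustrative):

wget -c -t 0 http://www.jsdig.com/testfile.zip

With -t 0 wget retries indefinitely, and -c makes each retry continue from the bytes already received instead of starting over.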

wget supports the HTTP, HTTPS and FTP protocols and can work through an HTTP proxy. "Automatic download" means that wget can keep running after the user logs out of the system: you can log in, start a wget download task, and log out, and wget will run in the background until the task completes. Compared with most browsers, which require constant user attention when downloading large amounts of data, this saves a great deal of trouble.

wget is used to download resources from the network. If no directory is specified, downloaded resources are saved to the current directory. Although wget is powerful, it is still relatively simple to use:

  1. Resumable downloads. This was the biggest selling point of NetAnts and FlashGet back in the day; wget supports it too, so users with poor network connections can rest easy.
  2. Both FTP and HTTP downloads. Although most software can now be downloaded over HTTP, sometimes you still need to fetch software over FTP.
  3. Proxy server support. Systems with strict security requirements generally do not expose themselves directly to the Internet, so proxy support is a must-have for a download tool (see the sketch after this list).
  4. Easy to set up. Users accustomed to graphical interfaces may no longer be used to the command line, but the command line actually has advantages for configuration: at the very least, it means far fewer mouse clicks, and no worrying about clicking the wrong thing.
  5. Small and completely free. The small size may not matter much now that disks are huge, but completely free is worth considering; even though there is plenty of so-called free software on the Internet, the advertisements bundled with it are not something anyone enjoys.
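
As a sketch of the proxy support mentioned in item 3, wget honors the standard http_proxy environment variable; the proxy address below is hypothetical:

export http_proxy=http://proxy.example.com:8080/
wget http://www.jsdig.com/testfile.zip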

Syntax

wget [options] [URL]

Options

Startup parameters:

-V, --version display the version of wget and exit
-h, --help print this help
-b, --background go to background after startup
-e, --execute=COMMAND execute a command in .wgetrc style; for the wgetrc format, see /etc/wgetrc or ~/.wgetrc
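
For example, -e can pass any wgetrc-style setting on the command line; a common sketch is telling wget to ignore robots.txt (use this responsibly; the URL is illustrative):

wget -e robots=off http://www.jsdig.com/testfile.zip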

Logging and input file parameters:

-o, --output-file=FILE write log messages to FILE
-a, --append-output=FILE append log messages to FILE
-d, --debug print debug output
-q, --quiet quiet mode (no output)
-v, --verbose verbose mode (this is the default)
-nv, --non-verbose turn off verbosity, without being quiet
-i, --input-file=FILE download the URLs found in FILE
-F, --force-html treat the input file as HTML
-B, --base=URL use URL as the prefix for relative links in the file specified by -F/-i
--sslcertfile=FILE optional client certificate
--sslcertkey=KEYFILE optional client certificate key file
--egd-file=FILE file name of the EGD socket
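
A small sketch combining these options: treat a local file of links as HTML (-F) and resolve its relative links against a base URL (-B); the file name links.html is hypothetical:

wget -F -B http://www.jsdig.com/ -i links.html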

Download parameters:

--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on the local host, for machines with multiple IPs or names
-t, --tries=NUMBER set the maximum number of retries (0 means unlimited)
-O, --output-document=FILE write documents to FILE
-nc, --no-clobber do not overwrite existing files
-c, --continue resume getting a partially-downloaded file
--progress=TYPE select the progress bar type
-N, --timestamping do not re-download files unless they are newer than the local copy
-S, --server-response print server responses
--spider do not download anything
-T, --timeout=SECONDS set the read timeout to SECONDS
-w, --wait=SECONDS wait SECONDS between retrievals
--waitretry=SECONDS wait 1...SECONDS between retries of a retrieval
--random-wait wait 0...2*WAIT seconds between retrievals
-Y, --proxy=on/off turn proxy on or off
-Q, --quota=NUMBER set the download quota to NUMBER
--limit-rate=RATE limit the download rate to RATE
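
For instance, a sketch combining several of these options: resume (-c), retry up to 10 times (-t), time out after 30 seconds (-T), wait 2 seconds between retrievals (-w), and cap the rate (--limit-rate); the URL and values are illustrative:

wget -c -t 10 -T 30 -w 2 --limit-rate=200k http://www.jsdig.com/testfile.zip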

Directory parameters:

-nd, --no-directories do not create directories
-x, --force-directories force creation of directories
-nH, --no-host-directories do not create host directories
-P, --directory-prefix=PREFIX save files to PREFIX/…
--cut-dirs=NUMBER ignore NUMBER remote directory components
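
For instance, a sketch that forces directory creation but drops the host name and the first remote directory component, saving everything under ./downloads (the path is illustrative):

wget -x -nH --cut-dirs=1 -P ./downloads http://www.jsdig.com/path/testfile.zip

This saves the file as ./downloads/testfile.zip rather than ./www.jsdig.com/path/testfile.zip.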

HTTP option parameters:

--http-user=USER set the HTTP user name to USER
--http-passwd=PASS set the HTTP password to PASS
-C, --cache=on/off allow or disallow server-side data caching (allowed by default)
-E, --html-extension save all text/html documents with the .html extension
--ignore-length ignore the Content-Length header field
--header=STRING insert STRING among the request headers
--proxy-user=USER set the proxy user name to USER
--proxy-passwd=PASS set the proxy password to PASS
--referer=URL include a Referer: URL header in the HTTP request
-s, --save-headers save the HTTP headers to the file
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION
--no-http-keep-alive disable HTTP keep-alive (persistent connections)
--cookies=off do not use cookies
--load-cookies=FILE load cookies from FILE before the session
--save-cookies=FILE save cookies to FILE after the session
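
A sketch combining a few of these options: send a custom header and a Referer, and save the session cookies for later requests (the file name and URL are illustrative):

wget --header="Accept-Language: en" --referer=http://www.jsdig.com/ --save-cookies=cookies.txt http://www.jsdig.com/download.aspx?id=1080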

FTP options parameters:

-nr, --dont-remove-listing Do not remove .listing files
-g, --glob=on/off Turn on or off the globbing mechanism of file names
--passive-ftp Use passive transfer mode (default).
--active-ftp Use active transfer mode
--retr-symlinks when recursing, retrieve the files that symbolic links point to (not directories)

Recursive download parameters:

-r, --recursive recursive download -- use with caution!
-l, --level=NUMBER maximum recursion depth (inf or 0 means unlimited)
--delete-after delete downloaded files locally after retrieval
-k, --convert-links convert non-relative links to relative ones
-K, --backup-converted before converting file X, back it up as X.orig
-m, --mirror equivalent to -r -N -l inf -nr
-p, --page-requisites download all files needed to display the HTML page (inline images, style sheets, and so on)
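
For example, a sketch that recursively fetches two levels of a site and rewrites links for local browsing (the URL is illustrative):

wget -r -l 2 -k -p http://www.jsdig.com/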

Inclusion and exclusion (accept/reject) in recursive downloads:

-A, --accept=LIST comma-separated list of accepted extensions
-R, --reject=LIST comma-separated list of rejected extensions
-D, --domains=LIST comma-separated list of accepted domains
--exclude-domains=LIST comma-separated list of rejected domains
--follow-ftp follow FTP links in HTML documents
--follow-tags=LIST comma-separated list of HTML tags to follow
-G, --ignore-tags=LIST comma-separated list of HTML tags to ignore
-H, --span-hosts go to foreign hosts when recursing
-L, --relative follow relative links only
-I, --include-directories=LIST list of allowed directories
-X, --exclude-directories=LIST list of excluded directories
-np, --no-parent do not ascend to the parent directory

Note: wget -S --spider URL downloads nothing and only shows the retrieval process.
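
A sketch of these filters: recurse without ascending to the parent directory, stay within one domain, and exclude one directory (the /docs/ and /private paths are hypothetical):

wget -r -np -D www.jsdig.com -X /private http://www.jsdig.com/docs/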

Parameters

URL: the URL of the file to download.

Example

Use wget to download a single file

wget http://www.jsdig.com/testfile.zip

The following example downloads a single file from the network and saves it in the current directory. A progress bar is shown during the download, including the completion percentage, the number of bytes downloaded, the current download speed, and the estimated time remaining.

Download and save with different file name

wget -O wordpress.zip http://www.jsdig.com/download.aspx?id=1080

By default, wget names the saved file after the last part of the URL (the component after the final /). For dynamically generated links, this file name is usually wrong.

Wrong: the following example downloads a file and saves it under the name download.aspx?id=1080:

wget http://www.jsdig.com/download.aspx?id=1080

Even though the downloaded file is a zip archive, it is still saved as download.aspx?id=1080.

Correct: to solve this problem, use the -O option to specify a file name:

wget -O wordpress.zip http://www.jsdig.com/download.aspx?id=1080

wget speed limited download

wget --limit-rate=300k http://www.jsdig.com/testfile.zip

When you run wget, it uses all available bandwidth by default. When you are downloading a large file and still need to download other files, it becomes necessary to limit the rate.

Use wget to resume downloading from breakpoints

wget -c http://www.jsdig.com/testfile.zip

Restarting an interrupted download with wget -c is very helpful when a large download is suddenly cut off by network problems: we can continue the download instead of fetching the file from scratch. Use the -c option whenever you need to resume an interrupted download.

Use wget background download

wget -b http://www.jsdig.com/testfile.zip

Continuing in the background, pid 1840.
Output will be written to 'wget-log'.

When downloading very large files, we can use the parameter -b to perform background downloading. You can use the following command to check the download progress:

tail -f wget-log

Disguise the user agent for the download

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" http://www.jsdig.com/testfile.zip

Some websites reject download requests when the User-Agent does not look like a browser, but you can disguise it with the --user-agent option.

Test download link

When you plan a scheduled download, you should test beforehand whether the download link will be valid at the scheduled time. We can add the --spider option to check.

wget --spider URL

If the download link is correct, it will be displayed:

Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

This ensures that the download will succeed at the scheduled time. If you give a wrong link, the following error is shown:

wget --spider url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

You can use the --spider parameter in the following situations:

  • Check before scheduled download
  • Check whether the website is available at intervals
  • Check website pages for dead links
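
For example, a sketch of a scheduled check from cron (the schedule and URL are illustrative): verify the link at 3 a.m. and start a background download only if it exists:

0 3 * * * wget -q --spider http://www.jsdig.com/testfile.zip && wget -q -b http://www.jsdig.com/testfile.zip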

Increase the number of retries

wget --tries=40 URL

A download may also fail because of network problems or because the file is large. By default, wget retries 20 times to connect and finish the download. If necessary, you can raise this number with --tries.
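
A sketch that also spaces the retries out, waiting up to 10 seconds between attempts and timing out slow reads (the values are illustrative):

wget --tries=40 --waitretry=10 --timeout=60 URL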

Download Multiple Files

wget -i filelist.txt

First, save the download links to a file (end the cat input with Ctrl-D):

cat > filelist.txt
url1
url2
url3
url4

Then use this file and the parameter -i to download.

Mirror Website

wget --mirror -p --convert-links -P ./LOCAL URL

Download the entire website to your local computer.

  • --mirror turns on mirroring.
  • -p downloads all files needed to display the HTML pages properly.
  • --convert-links converts links to local links after downloading.
  • -P ./LOCAL saves all files and directories under the specified local directory.

Filter downloads in specified formats

wget --reject=gif url

Use this command when you want to download a website but skip its images.

Save download information into log file

wget -o download.log URL

Use this if you want the download information written to a log file instead of shown directly in the terminal.

Limit total download file size

wget -Q5m -i filelist.txt

Use this when you want wget to stop once the total download exceeds 5 MB. Note: this quota has no effect when downloading a single file; it only applies to recursive downloads or downloads from a URL list.

Download the specified format file

wget -r -A.pdf url

This feature can be used in the following situations:

  • Download all images from a website.
  • Download all videos from a website.
  • Download all PDF files of a website.
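
For instance, a sketch for the image case, accepting several extensions at once (comma-separated, as -A expects; the URL is illustrative):

wget -r -A jpg,jpeg,png,gif http://www.jsdig.com/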

FTP Download

You can use wget to download files from FTP links, either anonymously or with username and password authentication.

Anonymous ftp download using wget:

wget ftp-url

FTP download with wget using username and password authentication:

wget --ftp-user=USERNAME --ftp-password=PASSWORD url