Introduction
curl is a powerful command-line tool for network transfer operations over protocols like HTTP, supporting scripted HTTP requests and providing a rich set of options for users to customize requests and handle responses. Today’s article in the Frontend Morning Reading Course is shared by @Daniel Stenberg and translated by @Piao Piao.
Background
This article assumes you are already familiar with HTML and basic networking knowledge.
As more and more applications migrate to the web platform, “HTTP scripting” has become increasingly common and popular. Today, the ability to automatically extract information from web pages, simulate user behavior, and submit or upload data to servers is very important.
curl is a command-line tool that can handle various URL operations and data transfers, but this article focuses on how to use it to send HTTP requests, whether for fun or for practical purposes. We assume you already know how to use curl --help or curl --manual to get basic information.
curl does not do everything for you. It is responsible for sending requests, retrieving data, sending data, and receiving information. You may need to use some scripting language or make multiple manual calls to string the entire process together.
HTTP Protocol
HTTP is the protocol used to retrieve data from web servers. It is a simple protocol built on TCP/IP. Through this protocol, clients can also send information to servers in several different ways, which will be briefly introduced below.
In HTTP/1.1, the client sends a few lines of plain ASCII text to the server to request a particular action, and the server replies with a few lines of text of its own before sending the actual requested content. (HTTP/2 and HTTP/3 carry the same information in a binary format.)
As a client, curl sends an HTTP request. The request includes a method (such as GET, POST, or HEAD), a number of request headers, and sometimes a request body. The HTTP server responds with a status line (indicating success or failure), response headers, and, most of the time, a response body. The body is the actual data you requested, such as the HTML of a web page or an image.
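To make this request/response structure concrete, here is a minimal Python sketch that builds the text lines of a simple HTTP/1.1 GET request and splits a sample response into status code, headers, and body. It is an illustration only, not how curl is implemented: the function names, host name, and header values are assumptions chosen for the example, and nothing is sent over the network.

```python
# Sketch of HTTP/1.1 as plain text lines; no network access involved.


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text lines a client sends for a simple GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",  # request headers follow, one per line
        "User-Agent: curl/8.11.0",
        "Accept: */*",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK"
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


if __name__ == "__main__":
    print(build_get_request("www.example.com").decode("ascii"))
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello"
    )
    print(parse_response(sample))
```

Running curl with --verbose against a real server shows the same kind of lines, with whatever extra headers your curl version adds.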
Viewing Protocol Content
With curl’s --verbose option (-v for short), you can see exactly which commands curl sends to the server, along with some extra informational text.
When you need to debug, or just want to understand what is happening between curl and the server, --verbose is the most useful option.
Sometimes, however, --verbose is not detailed enough. In that case, use --trace or --trace-ascii, which write a complete log of everything curl sends and receives to the file you name. For example:
curl --trace-ascii debugdump.txt http://www.example.com/
Viewing Time Information
Sometimes you may want to know how much time each step took, or just see how many milliseconds passed between two points in a transfer. In such cases, add the --trace-time option, which prefixes each line of trace output with a timestamp:
curl --trace-ascii d.txt --trace-time http://example.com/
Identifying Transfers
When you run several transfers concurrently, you need to know which transfer each piece of output belongs to, especially when reading response headers or logs. The --trace-ids option prefixes each line of trace output with a transfer identifier and a connection identifier:
curl --trace-ascii d.txt --trace-ids http://example.com/
Viewing Response Content
By default, curl writes the response body to standard output (stdout). If you do not want it printed in the terminal, you need to redirect it, which is usually done with -o (save to a file name you choose) or -O (save using the file name from the URL).
URL
Specification
A Uniform Resource Locator (URL) specifies the address of a resource on the internet. You have probably seen many such addresses, such as https://curl.se or https://example.com. URLs are defined in [RFC 3986], where the formal name is actually URI (Uniform Resource Identifier), not URL.
Host
The hostname is usually resolved to an IP address through DNS (the Domain Name System) or the local /etc/hosts file, and curl then connects to the server at that address. You can also write an IP address directly in the URL and skip hostname resolution.
During development or testing, you can use curl’s --resolve option to force a hostname to resolve to a specific IP address, for example:
curl --resolve www.example.org:80:127.0.0.1 http://www.example.org/
Port Number
Each protocol curl supports has a default port number, which is usually a TCP port and sometimes a UDP port. Normally you do not need to think about the port number, but if you run a test server or use a non-default port for some other reason, you can specify it by adding a colon and the port number after the hostname:
curl http://www.example.org:1234/
The port number in the URL is the port the server listens on. If you use a proxy server, you may also need to give the proxy’s port number separately. For example, to use an HTTP proxy running on port 4321:
curl --proxy http://proxy.example.org:4321 http://remote.example.org/
Username and Password
Some services require HTTP authentication, in which case you need to provide a username and password. curl sends these credentials to the remote server in a form that depends on the authentication scheme in use.
You can choose to write the username and password directly in the URL:
curl http://user:password@example.org/
Or you can specify them separately with the -u option:
curl -u user:password http://example.org/
Note that this type of HTTP authentication is not the method used by most user-facing websites today. Modern websites more commonly use forms and cookies for authentication.
Path Part
The path part is the resource path that curl sends to the server to request the corresponding content. The path is the part after the slash following the hostname (or port number).
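The URL parts described above (scheme, credentials, host, port, and path) can be pulled apart with Python’s standard urllib.parse.urlsplit. This is a sketch for illustration only: the describe_url helper and its default-port table, which covers just HTTP and HTTPS, are assumptions, and curl does its own URL parsing internally.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes used in this article (an assumption
# limited to HTTP and HTTPS; curl knows defaults for many more protocols).
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Break a URL into the parts discussed in this section."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Fall back to the scheme's default port when none is given
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # The path sent to the server; "/" if the URL has none
        "path": parts.path or "/",
        "query": parts.query,
    }


if __name__ == "__main__":
    print(describe_url("http://user:password@www.example.org:1234/when/junk.cgi?press=OK"))
    print(describe_url("https://curl.se"))
```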
Getting Page Content
GET
The simplest and most common HTTP request method is the GET request. The URL can point to a webpage, image, or file. The client sends a GET request to the server, and the server returns the corresponding content.
For example, this command:
curl https://curl.se
fetches that page and prints its HTML in your terminal.
Every HTTP response includes a set of response headers, but curl does not display them by default. Use the --include (-i) option to display the response headers together with the response body.
HEAD
If you only want the response headers without downloading the content itself, use the --head (-I) option, which makes curl send a HEAD request.
Note that support for HEAD varies: some servers reject HEAD requests while others answer them properly, which can make this a troublesome compatibility issue.
The HEAD method asks the server to return the same response headers it would send for a GET, but without the content. For example, you may see a Content-Length: header even though the response body itself is empty.
Sending Multiple URLs in One Command
curl can handle one or more URLs in a single command. Although a single URL is the most common case, there is no limit on how many you can give, and curl requests them one after another, in the order given.
For example, sending two GET requests in one command:
curl http://url1.example.com http://url2.example.com
If you use the --data option to send a POST, every URL in the command receives the same data.
For example, sending two POST requests in one command:
curl --data name=curl http://url1.example.com http://url2.example.com
Using Multiple HTTP Methods in One Command
Sometimes you may need to use different HTTP methods for different URLs in a single command. In this case, use the --next option. It splits the command line into segments, and each segment can use its own options and method.
All URLs before --next use the same method and options.
When curl encounters --next, it resets the method and options and starts a new setup.
For example, first send a HEAD request, then send a GET request:
curl -I http://example.com --next http://example.com
Or, first send a POST request, then send a GET request:
curl -d score=10 http://example.com/post.cgi --next http://example.com/results.html
HTML Forms
Forms
Forms are the most common way for a website to present an HTML page with fields where users can enter data and then click a “Confirm” or “Submit” button to send that data to the server. The server then typically acts on the data, for example by searching a database for the entered keywords, adding an entry to a bug tracker, showing an address on a map, or checking login credentials.
Of course, the server must run some program to receive this data; it cannot receive it out of thin air.
GET Request
GET forms use the GET method in HTML, for example:
<form method="GET" action="junk.cgi">
<input type=text name="birthyear">
<input type=submit name=press value="OK">
</form>
In a browser, this form displays a text box and a button labeled “OK”. If you enter 1905 and click OK, the browser constructs a new URL, for example:
junk.cgi?birthyear=1905&press=OK
If this form originally appeared on the page www.example.com/when/birth.html, the submitted address becomes:
www.example.com/when/junk.cgi?birthyear=1905&press=OK
This is the method used by most search engines.
To simulate a GET form request with curl, you just need to use the constructed URL:
curl "http://www.example.com/when/junk.cgi?birthyear=1905&press=OK"
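A browser builds this URL by resolving the form’s action against the page address and appending the encoded fields. The sketch below, using a hypothetical form_get_url helper, reproduces the junk.cgi example with Python’s urljoin and urlencode. Note that urlencode writes spaces as +, as browsers do for form data.

```python
from typing import List, Tuple
from urllib.parse import urlencode, urljoin


def form_get_url(page_url: str, action: str, fields: List[Tuple[str, str]]) -> str:
    """Build the URL a browser requests when a GET form is submitted."""
    # A relative action is resolved against the page the form appeared on
    target = urljoin(page_url, action)
    # Fields are encoded in form order and appended as the query string
    return target + "?" + urlencode(fields)


if __name__ == "__main__":
    url = form_get_url("http://www.example.com/when/birth.html", "junk.cgi",
                       [("birthyear", "1905"), ("press", "OK")])
    print(url)  # http://www.example.com/when/junk.cgi?birthyear=1905&press=OK
```

The resulting URL is what you pass to curl, in quotes so the shell does not interpret the &.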
POST Request
When using the GET method, all input fields will appear in the browser’s address bar. This is convenient if you want users to bookmark this page (with data); however, if it contains sensitive information, or if there are too many fields or the URL is too long, it becomes less appropriate.
The HTTP protocol also provides the POST method, where the data does not appear in the URL but is sent through the request body.
The form format is similar to GET, just with the method changed to POST:
<form method="POST" action="junk.cgi">
<input type=text name="birthyear">
<input type=submit name=press value=" OK ">
</form>
To simulate this POST request with curl, you can write:
curl --data "birthyear=1905&press=%20OK%20" http://www.example.com/when/junk.cgi
This POST request’s Content-Type is application/x-www-form-urlencoded, which is the most common kind.
You must make sure the data you send is correctly encoded; for example, spaces must be written as %20. curl does not encode this data for you, and if you get it wrong, the server may receive incorrect data.
Newer versions of curl can URL-encode the data for you with --data-urlencode, for example:
curl --data-urlencode "name=I am Daniel" http://www.example.com
If you use the --data option several times in one command, curl joins the pieces with the & symbol:
curl --data name=daniel --data score=10 http://www.example.com
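These encoding rules can be approximated in Python. This sketch models only the text forms of --data-urlencode (content, =content, and name=content) and ignores the variants that read from a file; the helper names data_urlencode and join_data are hypothetical. quote with safe="" leaves only letters, digits, and -._~ unencoded, so a space becomes %20.

```python
from urllib.parse import quote


def data_urlencode(arg: str) -> str:
    """Approximate curl's --data-urlencode for text arguments.

    Handles "content", "=content" and "name=content"; the "@file" and
    "name@file" forms, which read from a file, are not modeled here.
    """
    name, sep, content = arg.partition("=")
    if not sep:
        # "content": no "=", so the whole argument is encoded
        return quote(arg, safe="")
    if not name:
        # "=content": the leading "=" is not sent
        return quote(content, safe="")
    # "name=content": the name is assumed to be encoded already
    return name + "=" + quote(content, safe="")


def join_data(*pieces: str) -> str:
    """Combine several --data pieces the way curl does, separated by "&"."""
    return "&".join(pieces)


if __name__ == "__main__":
    print(data_urlencode("name=I am Daniel"))   # name=I%20am%20Daniel
    print(join_data("name=daniel", "score=10"))  # name=daniel&score=10
```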
File Upload (POST)
As early as 1995, HTTP added a new way to POST data specifically designed for file uploads. This is detailed in [RFC 1867], so it is sometimes called RFC 1867 POST.
The form used for file uploads is roughly as follows:
<form method="POST" enctype='multipart/form-data' action="upload.cgi">
<input name=upload type=file>
<input type=submit name=press value="OK">
</form>
The form specifies that the data is sent as multipart/form-data.
To upload a file with curl, you can write:
curl --form upload=@localfilename --form press=OK [URL]
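To see what a multipart/form-data body looks like, here is a minimal Python sketch that encodes form parts between boundary lines. It approximates the shape of what curl sends with --form, not its exact bytes: the boundary format, the build_multipart name, and the use of application/octet-stream for every file part are assumptions (curl may choose a different content type based on the file name).

```python
import uuid
from typing import List, Optional, Tuple

# Each part is (field name, content, filename or None for a plain field).
Part = Tuple[str, bytes, Optional[str]]


def build_multipart(parts: List[Part], boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Encode form parts as a multipart/form-data body.

    Returns the Content-Type header value and the body bytes.
    """
    # The boundary must not occur inside any part; a random token makes
    # that very unlikely (this sketch does not check for it).
    boundary = boundary or "------------------------" + uuid.uuid4().hex
    chunks = []
    for name, content, filename in parts:
        headers = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
        if filename is not None:
            # File parts carry the original file name and a content type
            headers += f'; filename="{filename}"\r\nContent-Type: application/octet-stream'
        chunks.append(headers.encode("utf-8") + b"\r\n\r\n" + content + b"\r\n")
    # The closing delimiter has two extra dashes at the end
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


if __name__ == "__main__":
    ctype, body = build_multipart(
        [("upload", b"file contents\n", "localfilename"), ("press", b"OK", None)],
        boundary="XyZ",
    )
    print(ctype)
    print(body.decode("utf-8"))
```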
Hidden Fields
HTML-based applications often pass state information between pages using hidden fields. These fields are not shown on the page, but they are sent along with the rest of the form when it is submitted.
For example, the following form contains a visible field, a hidden field, and a submit button:
<form method="POST" action="foobar.cgi">
<input type=text name="birthyear">
<input type=hidden name="person" value="daniel">
<input type=submit name="press" value="OK">
</form>
When submitting this type of form with curl, you do not need to care whether the fields are hidden or visible; curl treats them the same:
curl --data "birthyear=1905&press=OK&person=daniel" [URL]
How to Analyze Browser POST Requests
When you want to simulate a browser submitting a form with curl, you naturally want it to send data in the same format as the browser.
A simple method is to save the HTML page containing the form locally, change the form’s method to GET, and then click the submit button (changing the action address as well if necessary).
This way, you can clearly see the data from the GET request appended to the end of the URL, which helps you understand how to construct the POST request with curl.
HTTP Upload
PUT Method
Perhaps the best way to upload data to an HTTP server is to use the PUT method. Of course, the server must also have a program or script that supports receiving PUT data.
To upload a file with curl:
curl --upload-file uploadfile http://www.example.com/receive.cgi
HTTP Authentication
Basic Authentication
HTTP authentication refers to providing your username and password to the server so that the server can verify whether you have the right to perform the request you are executing. The basic authentication used in HTTP (which is the type curl uses by default) is based on plaintext, meaning that the username and password sent are only slightly obfuscated but can still be fully read by network sniffers between you and the remote server.
Using curl to perform basic authentication:
curl --user name:password http://www.example.com
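The “slightly obfuscated” encoding is Base64: the Authorization header carries user:password encoded in Base64, which anyone who sees the request can decode. A short Python sketch (the helper names are hypothetical) shows both directions for the credentials used in the command above:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(value: str):
    """Recover (user, password) from a Basic header, as any eavesdropper could."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization value")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("name", "password")
    print("Authorization: " + header)  # Authorization: Basic bmFtZTpwYXNzd29yZA==
    print(decode_basic_auth(header))   # ('name', 'password')
```

This is why Basic authentication should only be used over HTTPS.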
Other Authentication
A website may require a different authentication method (check the headers the server returns), in which case --ntlm, --digest, --negotiate, or even --anyauth may be the right option for you.
Proxy Authentication
Sometimes, HTTP requests must access the network through a proxy server, which is especially common in corporate environments. Some proxy servers also require username and password authentication.
You can specify proxy authentication information in curl like this:
curl --proxy-user proxyuser:proxypassword curl.se
If the proxy requires NTLM authentication, use --proxy-ntlm; if it requires Digest, use --proxy-digest.
Note that if you use these username + password options but omit the password, curl will prompt you to enter it.
Hiding Credentials
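Keep in mind that while a program runs, its command-line arguments may be visible to other users on the same system, so they may be able to see your plaintext password.
There are ways to avoid this, for example by storing the credentials in a .netrc file (used with --netrc) or in a config file read with --config (-K) instead of typing them on the command line. Either way, be careful.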
Also keep in mind that although these are HTTP’s own authentication methods, many modern websites do not use them for logins and instead implement login with web forms and cookies. See the “Web Login” section later for more details.
More HTTP Headers
Referer
An HTTP request can include a Referer header (the misspelling is historical), which tells the server which page the client came from when it requested the current resource. Some programs and scripts check this header to verify that a request came from a particular page or site. Although it is easy to forge and therefore not very reliable, many scripts still rely on it.
You can easily set the Referer field in curl to “fool” the server:
curl --referer http://www.example.com http://www.example.com
User-Agent
Like Referer, an HTTP request can include a User-Agent header, which tells the server what browser or tool the client is using. Many websites use this header to decide how to render a page, and some web developers serve different pages to different browsers to keep them compatible.
Sometimes you may find that the page obtained using curl is different from what you see in the browser; in this case, you may need to disguise curl as a certain browser.
By default, curl’s User-Agent is curl/ followed by its version number, for example:
User-Agent: curl/8.11.0
If you want curl to appear as if it is using Internet Explorer 5 on Windows 2000, you can write:
curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]
Or disguise it as Netscape 4.73 running on an older version of Linux:
curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
Redirects
Location Header
When a server returns a page, it may include a Location: header that tells the client to go to another page or fetch a different resource.
By default, curl does not follow these redirects; it shows the response like any other. You can use --location to make curl follow them automatically:
curl --location http://www.example.com
If you use --data or --form to POST data to a website and want to follow redirects, you can safely combine them with --location. curl uses POST only for the first request: when it follows a 301, 302, or 303 redirect it switches to GET, while other redirects such as 307 and 308 keep the original method.
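The method switch on redirects can be summarized in a small Python function. It is a sketch that models only the POST case, following curl’s documented default for --location (a POST answered with 301, 302, or 303 is followed with a GET; other 3xx codes keep the method); the next_request name is hypothetical. A relative Location value is resolved against the current URL.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def next_request(method: str, url: str, status: int, location: str):
    """Return (method, url) for the request that follows a redirect."""
    if status not in REDIRECT_CODES:
        raise ValueError(f"{status} is not a redirect status handled here")
    # A relative Location is resolved against the URL that was requested
    target = urljoin(url, location)
    if method == "POST" and status in (301, 302, 303):
        # Browser-compatible default: the follow-up request is a GET
        return "GET", target
    # Other cases (including 307 and 308) keep the original method
    return method, target


if __name__ == "__main__":
    print(next_request("POST", "http://www.example.com/login.cgi", 302, "/welcome.html"))
    print(next_request("POST", "http://www.example.com/login.cgi", 307, "retry.cgi"))
```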
Other Redirect Methods
Browsers typically support two types of redirects that curl does not:
- The HTML <meta refresh> tag, which sets up an automatic redirect after a few seconds;
- JavaScript code that redirects the page.
Cookies (Client State Management)
Cookie Basics
Browsers manage client session states through cookies. The server sends cookies to the client and specifies properties such as the applicable path, hostname, expiration time, etc.
When the client visits a path and hostname that meet the criteria again, it automatically sends the cookies back to the server. Many websites use this method to implement “stay logged in” functionality.
To use these cookies with curl, you need to record them and send them back, just as a browser would.
Cookie Related Options
Sending Cookies:
The simplest way is to directly add cookies in the command line:
curl --cookie "name=Daniel" http://www.example.com
Recording Cookies:
curl can record response headers with --dump-header (-D); these include any cookies the server sets:
curl --dump-header headers_and_cookies http://www.example.com
(For storing cookies specifically, however, --cookie-jar is the better choice.)
Reading and Reusing Cookies:
curl has a full built-in cookie parsing engine, so it can read cookies saved in a file by an earlier run and send them again:
curl --cookie stored_cookies_in_file http://www.example.com
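If you only want curl to understand the cookies it receives, without reading any from a file, give --cookie a file name that does not exist. This turns on the cookie engine, so that, for example, when following redirects curl can send back the cookies it received along the way: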
curl --cookie nada --location http://www.example.com
Saving and Sharing Cookies:
curl supports reading and writing cookies in Netscape/Mozilla format, allowing different scripts to share cookies.
Reading old cookies and saving new cookies:
curl --cookie cookies.txt --cookie-jar newcookies.txt http://www.example.com
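The Netscape cookie file format stores one cookie per line as seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry as Unix time, name, value), and curl marks HttpOnly cookies with a #HttpOnly_ prefix on the line. The Python sketch below parses such a file. The Cookie class and helper names are hypothetical, and cookie_header simply joins every cookie without the domain, path, and expiry checks a real cookie engine performs.

```python
from dataclasses import dataclass
from typing import List

HTTPONLY_PREFIX = "#HttpOnly_"


@dataclass
class Cookie:
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix time; 0 means a session cookie
    name: str
    value: str
    http_only: bool = False


def parse_netscape_cookies(text: str) -> List[Cookie]:
    """Parse Netscape-format cookie file contents into Cookie objects."""
    cookies = []
    for line in text.splitlines():
        http_only = line.startswith(HTTPONLY_PREFIX)
        if http_only:
            line = line[len(HTTPONLY_PREFIX):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip lines that do not have the seven fields
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append(Cookie(domain, subdomains == "TRUE", path,
                              secure == "TRUE", int(expires), name, value,
                              http_only))
    return cookies


def cookie_header(cookies: List[Cookie]) -> str:
    """Join cookies into a Cookie header value (no domain/path/expiry checks)."""
    return "; ".join(f"{c.name}={c.value}" for c in cookies)


if __name__ == "__main__":
    sample = (
        "# Netscape HTTP Cookie File\n"
        ".example.com\tTRUE\t/\tFALSE\t0\tname\tDaniel\n"
    )
    jar = parse_netscape_cookies(sample)
    print(jar)
    print("Cookie: " + cookie_header(jar))
```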
HTTPS (Secure HTTP)
HTTPS is Encrypted HTTP
There are several ways to do secure HTTP transfers. The most common is HTTPS, which is HTTP over SSL/TLS. It encrypts all data sent and received over the network, which makes it much harder for attackers to eavesdrop on sensitive information.
SSL (or TLS, its successor and the current standard) provides a set of advanced features for secure transfers over HTTP.
When built with a TLS library, curl supports encrypted transfers. It can be built with one of several libraries, and running curl -V shows which one (if any) your curl uses. To retrieve a page from an HTTPS server, simply run curl like this:
curl https://secure.example.com
Certificates
In HTTPS, client certificates can be used for further identity verification. curl supports this method.
Certificates are usually password-protected. You can append the password to the file name after a colon (mycert.pem:password); otherwise curl prompts you for it, as with this command:
curl --cert mycert.pem https://secure.example.com
curl also verifies that the server’s certificate can be trusted, by default checking it against the locally installed CA certificates. If verification fails, curl refuses to connect unless you add --insecure (-k):
curl --insecure https://example.com
If you have your own CA certificate file, you can let curl use it like this:
curl --cacert ca-bundle.pem https://example.com/
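Python’s ssl module exposes the same two switches, which can help clarify what --cacert and --insecure change. In this sketch (the tls_context helper is hypothetical), the default context verifies the server certificate and host name against the system CA store or a given CA file, while the insecure variant turns both checks off, roughly what -k does in curl.

```python
import ssl
from typing import Optional


def tls_context(cafile: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build a client TLS context, loosely mirroring --cacert and --insecure."""
    # With cafile=None the system's default CA certificates are loaded;
    # with a path, only that CA bundle is trusted (like --cacert)
    ctx = ssl.create_default_context(cafile=cafile)
    if insecure:
        # Like -k: skip host name and certificate checks.
        # check_hostname must be turned off before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    for ctx in (tls_context(), tls_context(insecure=True)):
        print(ctx.verify_mode, ctx.check_hostname)
```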
Custom Request Elements
Modifying Methods and Headers
When performing complex operations, you may need to add or modify individual elements in a curl request. For example:
Change the POST method to PROPFIND and send XML content:
curl --data "<xml>" --header "Content-Type: text/xml" --request PROPFIND example.com
Remove a header curl would otherwise send (e.g., Host):
curl --header "Host:" http://www.example.com
Add new headers, such as Destination:
curl --header "Destination: http://nowhere" http://example.com
More Notes on Methods
Note that curl picks the method automatically based on the options you use: -d makes a POST request, -I makes a HEAD request, and so on. The --request / -X option changes the method keyword curl sends, but it does not change curl’s behavior. For example, if you use -d "data" to make a POST request and then add -X to change the method to PROPFIND, curl still behaves as it would for a POST. To turn a plain GET into a POST, you only need to add -X POST to the command line, for example:
curl -X POST http://example.org/
But curl’s behavior does not change: if you use -X POST without -d, curl still sends the request without a body.
Web Login
Login Tips
Although logging in does not entirely belong to HTTP itself, many people often encounter such issues, so here is a brief explanation of how to simulate login with curl.
Websites usually track login status with cookies, so you need to first obtain and save these cookies. Many websites also set special cookies on the login page (to prevent skipping the login process), so the first step is to visit the login page and record the cookies set by the server.
Some login pages may set or modify cookies through JavaScript. In this case, you can:
- Study the HTML and JS code;
- Or capture and analyze the real requests made by the browser (including the cookies and form fields sent).
Login forms often have hidden fields (such as session IDs, tokens, etc.), so you need to first obtain the source code of the login page, extract all hidden fields, and then submit a POST request, remembering to URL encode the data.
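Extracting the hidden fields can be scripted. This minimal Python sketch uses the standard html.parser module to collect every <input type="hidden"> name and value, then URL-encodes them together with the fields a user would type, producing a body you could pass to curl --data. The helper names and the token, sid, and user field names are hypothetical, and fields that JavaScript adds at runtime will not be found this way.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect (name, value) pairs from <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.fields.append((name, attributes.get("value") or ""))


def hidden_fields(html: str) -> List[Tuple[str, str]]:
    """Return the hidden form fields found in an HTML page, in page order."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def login_body(html: str, user_fields: Dict[str, str]) -> str:
    """URL-encode hidden fields plus user-supplied fields for a POST body."""
    return urlencode(hidden_fields(html) + list(user_fields.items()))


if __name__ == "__main__":
    page = ('<form method="POST" action="login.cgi">'
            '<input type=hidden name="token" value="a b">'
            '<input type=text name="user">'
            '</form>')
    print(login_body(page, {"user": "daniel"}))  # token=a+b&user=daniel
```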
Debugging Tips
When you make requests to certain sites with curl and find that the returned results are different from those in the browser, you need to make curl’s requests more like a browser.
Some debugging suggestions:
- Use --trace-ascii to save a complete log of the requests for analysis;
- Use --cookie and --cookie-jar to manage cookies;
- Set a User-Agent commonly used by browsers (with -A);
- Set the Referer (with -e);
- When sending a POST, keep the fields in the same order and format as the browser.
Checking What the Browser Sends:
Use the browser’s developer tools (F12) to view all HTTP headers sent and received (including HTTPS).
Or use a tool such as Wireshark or tcpdump to capture the network traffic and analyze the actual requests the browser makes. (For HTTPS, set the SSLKEYLOGFILE environment variable so the browser logs its TLS session keys, which Wireshark can then use to decrypt the traffic.)
About this article: Translator: @Piao Piao; Author: @Daniel Stenberg; Original: https://curl.se/docs/httpscripting.html