http://linuxdevcenter.com/pub/a/linux/2005/05/05/libcurl.html?page=1
Simplify Network Programming with libCURL
by Ethan McCallum, 05/05/2005
The curl command-line tool is a one-stop shop for data transfer. It supports HTTP, FTP, LDAP, and other protocols. However, people who use it as just a download tool don't do it justice.
curl's inner workings use the libCURL client library. So can your programs, to make them URL aware. libCURL-enabled tools can perform downloads, replace fragile FTP scripts, and otherwise take advantage of networking without any (explicit) socket programming. The possibilities are endless, especially with libCURL using an MIT/X-style license agreement.
This article explains how to use libCURL's "easy" API, which is simple and should suit most needs. (I plan to cover the more powerful but more complex "shared" and "multi" interfaces in a future article.) It uses the following scenarios to demonstrate libCURL programming:
- HTTP GET: to fetch content from a URL
- Anonymous FTP download: to fetch a remote file
- HTTP POST: to simulate a web form, such as a search engine call
- Authenticated FTP upload: to log in to a remote host and push a file
Stubs though they may be, the samples are working tools that you can use as building blocks for your own libCURL experiments. Feel free to download the example code and join in.
libCURL is a C library. My examples are in C++, but a proficient C programmer should be able to follow along. That said, I've discovered a template technique that should make libCURL a little easier for C++ programmers.
I tested the sample code under Fedora Core 3, libCURL version 7.12.3. As libCURL is under active development, the examples may require slight modifications to work under different library versions.
curl "Easy" Interface Basics
A typical client/server scenario involves a connection, plus one or many request/response iterations. Consider an HTTP transfer:
- The client establishes a connection with the server
- The client sends a request (usually a GET or POST operation)
- The server sends back some data (HTML or an error message)
- The client and server terminate the connection
libCURL sits in the middle of this process. To use it, configure a context
object with request data (URL, parameters) and response handlers (callback functions). Pass this context to the library, which handles low-level network transport (connection initiation and teardown, data transfer) and calls your response handler(s).
Notice that libCURL doesn't really do anything with the data; it's more of a data transfer framework that fires your callbacks to do the heavy lifting. This clean separation of transport and handling abstracts your development from low-level networking and protocol concerns so that you can focus on writing your application.
Using libCURL's "easy" interface, then, involves the following sequence of API calls:
- curl_global_init(), to initialize the curl library (once per program)
- curl_easy_init(), to create a context
- curl_easy_setopt(), to configure that context
- curl_easy_perform(), to initiate the request and fire any callbacks
- curl_easy_cleanup(), to clean up the context
- curl_global_cleanup(), to tear down the curl library (once per program)
The function curl_easy_setopt() deserves attention:
curl_easy_setopt(CURL* ctx , CURLoption key , value )
The parameters are the context, the option name, and the option value, respectively. Think of value as a void* (it's really not, but bear with me), because it can be any data type. That data should, however, match the type that key expects.
HTTP GET: Fetch a Web Page
The stub program step1 performs a simple HTTP GET operation. It prints the response headers to standard error and the body to standard output.
First, step1 calls curl_easy_init() to create a context object (CURL*):
CURL* ctx = curl_easy_init() ;
It then calls curl_easy_setopt() several times to configure the context. (CURLOPT_URL is the target URL.)
curl_easy_setopt( ctx , CURLOPT_URL , argv[1] ) ;
CURLOPT_WRITEHEADER is an open FILE* to which libCURL will write the response headers. step1 sends them to stderr.
Similarly, CURLOPT_WRITEDATA is a FILE* destination (here, stdout) for the response body. This is text data for HTTP requests but may be binary data for FTP or other transfer types. Note that libCURL defines "read" as "sent data" and "write" as "received data"; some people may find these terms confusing.
CURLOPT_VERBOSE is helpful for debugging. This option tells libCURL to print low-level diagnostic messages to standard error.
curl_easy_perform() makes the actual URL call. In the event of an error, curl_easy_strerror() provides an error message:
const CURLcode rc = curl_easy_perform( ctx ) ;
// for curl v7.11.x and earlier, look into
// the option CURLOPT_ERRORBUFFER instead
if( CURLE_OK != rc ){
std::cerr << "Error from CURL: "
<< curl_easy_strerror( rc) << std::endl ;
} ...
Otherwise, you can call curl_easy_getinfo() to fetch transfer statistics. Similar to curl_easy_setopt(), it takes a constant as a key and a void* in which to store the data:
long statLong ;
curl_easy_getinfo( ctx , CURLINFO_HTTP_CODE , &statLong ) ;
std::cout << "HTTP response code: " << statLong << std::endl ;
You must match the key constant to the pointer you provide: for example, CURLINFO_HTTP_CODE (the numeric HTTP response code, such as 200 or 404) requires a long variable, whereas CURLINFO_SIZE_DOWNLOAD (the number of bytes downloaded) requires a double.
Call curl_easy_cleanup() to clean up the context object. Do this after any calls to curl_easy_getinfo(), or you risk a segmentation fault.
curl_easy_setopt() doesn't copy any of the pointers you assign to context values, nor does curl_easy_cleanup() destroy them. You are responsible for ensuring pointer validity throughout the context's lifetime and for cleaning up any resources after the context's teardown.
FTP Download (Fetch a Remote File)
Web services are gaining steam, but plenty of systems still use plain old FTP jobs to transfer data between applications.
Scripts typically feed the ftp command instructions via standard input. expect offers better error handling, because it simulates an interactive session, but in my experience high-end expect skills are fairly rare. Most annoying, these scripts run outside of the main application, so they bypass any tracing or error-handling facilities.
step2 addresses these concerns by moving the FTP pull into the native-code application itself. (Pretend that step2 is a code excerpt from a larger, long-running app.) It also demonstrates how to process the data as it downloads, so you don't have to store it in a temporary file.
step2 and step1 share a lot of code, some of which I've already explained.
The CURLOPT_WRITEFUNCTION context option specifies the function libCURL will call as it downloads the remote file:
size_t showSize( ... ) ;
curl_easy_setopt( ctx , CURLOPT_WRITEFUNCTION , showSize ) ;
The for() loop in lines 202-239 sets up the FTP calls. The CURLOPT_URL option is a URL created by concatenating the name of the target file with the name of the server. libCURL will try to use the same network connection for all of the FTP calls, because they share a context.
The value assigned to CURLOPT_WRITEDATA is available in the CURLOPT_WRITEFUNCTION callback (here, showSize()). This can be any data type, either native or user defined. The callback uses the value as a means to keep state between invocations. In step2, this is a custom XferInfo* object that stores information about the downloaded file and the number of times the library has invoked the callback:
class XferInfo {
public:
void add( int more ) ;
int getBytesTransferred() const ;
int getTimesCalled() const ;
} ;
...
XferInfo info ;
curl_easy_setopt( ctx , CURLOPT_WRITEDATA , &info ) ;
In turn, the showSize() callback does all of the work. It tracks the size of the files downloaded from the FTP server. Note its signature:
extern "C"
size_t showSize(
void *source ,
size_t size ,
size_t nmemb ,
void *userData
)
All CURLOPT_WRITEFUNCTION callbacks use this signature.
C++ users must expose callback functions with C linkage, hence the extern "C" declaration. You can't specify an object member function as a callback, but I've found a template technique to pass the work to an object indirectly.
source is a buffer of data. I usually cast this to a char* because I process text data (HTML, XML). This example doesn't use this parameter, because showSize() doesn't do anything with the data itself.
Because source is not NULL-terminated, you can't use standard string functions to determine its length. Instead, use the product of size * nmemb.
userData is the value assigned to the CURLOPT_WRITEDATA context option. Note that the libCURL manual calls this parameter stream, likely because it's a FILE* when using the default (libCURL internal) write function. I call it userData because that's a little less confusing.
As userData is a void*, you must cast it back to its proper data type. showSize() casts it to an XferInfo object and calls its add() member function to record the number of bytes transferred in this call:
extern "C"
size_t showSize( void* source , size_t size , size_t nmemb , void* userData ){
XferInfo* info = static_cast< XferInfo* >( userData ) ;
const int bufferSize = size * nmemb ;
info->add( bufferSize ) ;
return( bufferSize ) ;
}
On success, your callback should return the number of bytes it processed (size * nmemb). libCURL compares this with the number of bytes it passed your function and aborts the transfer if they don't match. Return 0 to indicate the end of processing, or some number less than size * nmemb to indicate that an error occurred.
A callback may fire several times for the same download, because the library hands you the file data in chunks. This is memory efficient if your code operates on piecemeal data, such as with low-level text parsing. Otherwise, you must store the data yourself as it comes in and handle it after the download, after the call to curl_easy_perform() returns.
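One common way to store the data as it comes in, sketched below, is to accumulate the chunks into a std::string supplied through CURLOPT_WRITEDATA. collectChunk() is a hypothetical name; only the signature is fixed by libCURL.

```cpp
#include <cstddef>
#include <string>

// A CURLOPT_WRITEFUNCTION-style callback that appends each chunk to a
// std::string passed in via CURLOPT_WRITEDATA. collectChunk() is a
// hypothetical name; libCURL fixes only the signature.
extern "C"
size_t collectChunk( void* source , size_t size , size_t nmemb , void* userData ){
    std::string* buffer = static_cast< std::string* >( userData ) ;
    const size_t bytes = size * nmemb ;  // source is NOT NULL-terminated
    buffer->append( static_cast< const char* >( source ) , bytes ) ;
    return bytes ;  // any other value tells libCURL to abort the transfer
}
```

Register it with curl_easy_setopt( ctx , CURLOPT_WRITEFUNCTION , collectChunk ) and curl_easy_setopt( ctx , CURLOPT_WRITEDATA , &myString ), then process myString after curl_easy_perform() returns.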
FTP Upload (Push a File to a Legacy System)
Legacy system uploads and downloads go hand in hand. The stub program step3 uses libCURL to log in to a remote FTP host and upload a file. It also describes a way to use a true C++ object as the callback handler (albeit indirectly).
step3 takes the remote FTP host, login, and password as arguments. (A real app would read in this data from a config file; for now, pretend it's not a security problem to specify it on the command line.) The example merges the hostname with the target file name to form the URL:
ftp://host/file
It also concatenates the login and password into a string used as the context option CURLOPT_USERPWD:
login:password
Note that you can also put the login info in the URL:
ftp://login:password@host/file
This URL format is simply another means to pass the login information to the API. Unlike a browser, libCURL doesn't keep a cache or history that prying eyes can later discover. (Hopefully, the remote FTP server doesn't log that kind of information, either.) Nor is the information available via ps browsing.
For firewall-friendly transfers, the context option CURLOPT_FTP_USE_EPSV tells the library to use passive FTP.
libCURL doesn't limit you to file transfers alone. You can also send arbitrary FTP commands, such as mkdir or cwd. Store the commands in a libCURL linked list (curl_slist*):
struct curl_slist* commands = NULL ;
commands = curl_slist_append( commands , "mkdir /some/path" ) ;
commands = curl_slist_append( commands , "mkdir /another/path" ) ;
...
The CURLOPT_QUOTE context option specifies a list of commands to execute after logging in to the FTP server but before transferring data. CURLOPT_POSTQUOTE specifies a list of commands to execute after the transfer.
curl_easy_setopt( ctx , CURLOPT_QUOTE , commands ) ;
// ... call curl_easy_perform() to run the FTP session ...
curl_slist_free_all( commands ) ;
You can use these context options to curl-ify your old FTP scripts. step3 uses CURLOPT_QUOTE to call cwd / so that the file uploads relative to the root directory of the FTP server. (Without an explicit directory change, uploads go to a path relative to the user's home directory.)
The context option CURLOPT_UPLOAD tells the library this will be an upload call. CURLOPT_FTPAPPEND tells libCURL to append to the target file instead of overwriting it. It's not necessary in this example, but it's something you often see in legacy FTP jobs.
Similar to downloading data, when uploading you have a choice between passing the libCURL library a file handle or creating the data yourself in a callback.
To upload data from an existing file handle, set that FILE* as the context option CURLOPT_READDATA.
To use a callback instead (for example, to generate upload data on the fly), assign a function to the context option CURLOPT_READFUNCTION. The function signature is very similar to that of CURLOPT_WRITEFUNCTION:
size_t function(
char* buffer ,
size_t size ,
size_t nitems ,
void* userData
) ;
The difference is that buffer is where you store the data in this case, and the product size * nitems is the maximum number of bytes you can put there. (Return the number of bytes you put in the buffer.) userData is the value assigned to CURLOPT_READDATA.
step3's callback function is rather brief. I employ a C++ template technique to use an object indirectly as a callback handler. If you're not familiar with templates, note that the declaration
template< typename T >
class UploadHandler {
...
means the class UploadHandler is incomplete as written. The data type T comes from elsewhere in the code, when registering the function with libCURL:
curl_easy_setopt(
ctx ,
CURLOPT_READFUNCTION ,
UploadHandler< UploadData >::execute
);
Here, UploadData is the type of object the handler function will use to do the work.
In turn, the static class function UploadHandler::execute() is a mere pass-through: it casts the userData value to type T and invokes T::execute() to do the actual work.
static size_t execute(
char* buffer ,
size_t size ,
size_t nitems ,
void* userData
){
T* realHandler = static_cast< T* >( userData ) ;
return( realHandler->execute( buffer , size , nitems ) ) ;
}
UploadHandler will work with any class that implements a fitting execute member function. I could have used standard inheritance instead:
// ... inside UploadHandler::execute() ...
Handler* h = static_cast< Handler* >( userData ) ;
return( h->execute( ... ) ) ;
I prefer the flexibility of templates, though. Inheritance would tie all handler objects to the Handler interface.
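The template pass-through can be exercised end to end with a small handler class. UploadData below is a hypothetical example, not step3's actual class; it fills the buffer from a fixed message in one shot, then returns 0 to signal the end of the upload data.

```cpp
#include <cstddef>
#include <cstring>

// The template pass-through described above: casts userData to T and
// forwards the call to T's execute() member function.
template< typename T >
class UploadHandler {
public:
    static size_t execute( char* buffer , size_t size , size_t nitems , void* userData ){
        T* realHandler = static_cast< T* >( userData ) ;
        return realHandler->execute( buffer , size , nitems ) ;
    }
} ;

// Hypothetical handler: supplies a fixed message as the upload data.
class UploadData {
public:
    explicit UploadData( const char* msg ) : message_( msg ) , sent_( false ) {}
    size_t execute( char* buffer , size_t size , size_t nitems ){
        if( sent_ ){ return 0 ; }                 // 0 == no more upload data
        const size_t max = size * nitems ;        // buffer capacity
        const size_t len = std::strlen( message_ ) ;
        const size_t count = ( len < max ) ? len : max ;
        std::memcpy( buffer , message_ , count ) ;
        sent_ = true ;
        return count ;                            // bytes placed in the buffer
    }
private:
    const char* message_ ;
    bool sent_ ;
} ;
```

As in the article, you would register UploadHandler< UploadData >::execute as the CURLOPT_READFUNCTION and set CURLOPT_READDATA to the address of an UploadData instance.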
Similar to download callbacks, the library may call upload callbacks several times for a single file. Code accordingly.
HTTP POST (Populate a Web Form)
HTTP POST operations send form data to a web server, as well as making code-to-code calls such as those in web services. The request body comprises the POST data.
This article's final example, step4, demonstrates how to use libCURL for an HTTP POST. It also explains how to set up custom HTTP request headers, such as browser identification.
The POST body is just a string with & characters between key=value pairs:
const char* postData = "param1=value1&param2=value2&..." ;
Pass this string to the library by assigning it to the CURLOPT_POSTFIELDS option:
curl_easy_setopt( ctx , CURLOPT_POSTFIELDS , postData ) ;
Assign a curl_slist* to CURLOPT_HTTPHEADER to set custom HTTP headers:
curl_slist* responseHeaders = NULL ;
responseHeaders = curl_slist_append(
responseHeaders ,
"Expect: 100-continue"
) ;
// ... other curl_slist_append() calls ...
curl_easy_setopt(
ctx ,
CURLOPT_HTTPHEADER ,
responseHeaders
) ;
Note that libCURL clients skip the intermediate step of downloading and processing a form's HTML. As a result, they are unaware of any hidden fields or client-side technologies (such as JavaScript) used therein. Put another way, you have to know what fields the web server expects before you can use a libCURL client to POST data.
Conclusion
libCURL provides clean, simple networking for your native-code applications. With this API in your toolbox, you can incorporate one-off FTP operations into your main application, automate HTTP POST requests, and more.
There's much more to libCURL than I've presented here. The examples should, however, give you a head start in putting libCURL to use in your own apps.
Resources
- The article's sample code includes the source for the stub programs, as well as a JSP and PHP page with which to test step4. (The JSP requires a servlet spec 2.4 container, such as Tomcat 5, and a proper 2.4 web.xml.) The pages simply echo the request headers and POST parameters received from the client.
- The curl web site has documentation and tutorials.
- The TCPMon utility ships with Apache's Axis web services project. It's a listening proxy that shows client/server conversations in a GUI window. I've found it invaluable for debugging problems with my curl code, especially HTTP POST operations. TCPMon is a Java application and is thus portable to any Java-enabled platform that meets the JDK version requirements.
- libCURL is especially useful for creating REST-based web services clients. Also known as XML over HTTP, REST web service calls use plain HTTP GET or POST requests instead of wrapping them in a SOAP envelope. Amazon.com, for example, offers its public web services API via REST as well as SOAP. Yahoo's web services use REST exclusively.
Ethan McCallum grew from curious child to curious adult, turning his passion for technology into a career.