'programming'에 해당되는 글 387건

  1. 2011.07.14 Simplify Network Programming with libCURL

반응형

http://linuxdevcenter.com/pub/a/linux/2005/05/05/libcurl.html?page=1

 

Simplify Network Programming with libCURL

by Ethan McCallum
05/05/2005

The curl command-line tool is a one-stop shop for data transfer. It supports HTTP, FTP, LDAP, and other protocols. However, people who use it as just a download tool don't do it justice.

curl's inner workings use the libCURL client library. So can your programs, to make them URL aware. libCURL-enabled tools can perform downloads, replace fragile FTP scripts, and otherwise take advantage of networking without any (explicit) socket programming. The possibilities are endless, especially with libCURL using a MIT/X-style license agreement.

This article explains how to use libCURL's "easy" API, which is simple and should suit most needs. (I plan to cover the more powerful but more complex "shared" and "multi" interfaces in in a future article.) It uses the following scenarios to demonstrate libCURL programming:

  • HTTP GET: to fetch content from a URL
  • Anonymous FTP download: to fetch a remote file
  • HTTP POST: to simulate a web form, such as a search engine call
  • Authenticated FTP upload: to log in to a remote host and push a file

Stubs though they may be, the samples are working tools that you can use as building blocks for your own libCURL experiments. Feel free to download the example code and join in.

libCURL is a C library. My examples are in C++, but a proficient C programmer should be able to follow along. That said, I've discovered a template technique that should make libCURL a little easier for C++ programmers.

I tested the sample code under Fedora Core 3, libCURL version 7.12.3. As libCURL is under active development, the examples may require slight modifications to work under different library versions.

curl "Easy" Interface Basics

A typical client/server scenario involves a connection, plus one or many request/response iterations. Consider an HTTP transfer:

  1. The client establishes a connection with the server
  2. The client sends a request (usually a GET or POST operation)
  3. The server sends back some data (HTML or an error message)
  4. The client and server terminate the connection

libCURL sits in the middle of this process. To use it, configure a context object with request data (URL, parameters) and response handlers (callback functions). Pass this context to the library, which handles low-level network transport (connection initiation and teardown, data transfer) and calls your response handler(s).

Notice that libCURL doesn't really do anything with the data; it's more of a data transfer framework that fires your callbacks to do the heavy lifting. This clean separation of transport and handling abstracts your development from low-level networking and protocol concerns so that you can focus on writing your application.

Using libCURL's "easy" interface, then, involves the following sequence of API calls:

  1. curl_global_init(), to initialize the curl library (once per program)
  2. curl_easy_init(), to create a context
  3. curl_easy_setopt(), to configure that context
  4. curl_easy_perform(), to initiate the request and fire any callbacks
  5. curl_easy_cleanup(), to clean up the context
  6. curl_global_cleanup(), to tear down the curl library (once per program)

The function curl_easy_setopt() deserves attention:

curl_easy_setopt(CURL* ctx , CURLoption key , value )

The parameters are the context, the option name, and the option value, respectively. Think of value as a void* (it's really not, but bear with me), because it can be any data type. That data should, however, befit whatever key sets.

HTTP GET: Fetch a Web Page

The stub program step1 performs a simple HTTP GET operation. It prints the response headers to standard error and the body to standard output.

First, step1 calls curl_easy_init() to create a context object (CURL*):

CURL* ctx = curl_easy_init() ;

It then calls curl_easy_setopt() several times to configure the context. (CURLOPT_URL is the target URL.)

curl_easy_setopt( ctx , CURLOPT_URL , argv[1] ) ;

CURLOPT_WRITEHEADER is an open FILE* to which libCURL will write the response headers. step1 sends them to stderr.

Similarly, CURLOPT_WRITEDATA is a FILE* destination (here, stdout) for the response body. This is text data for HTTP requests but may be binary data for FTP or other transfer types. Note that libCURL defines "read" as "sent data" and "write" as "received data"; some people may find these terms confusing.

CURLOPT_VERBOSE is helpful for debugging. This option tells libCURL to print low-level diagnostic messages to standard error.

curl_easy_perform() makes the actual URL call. In the event of an error, curl_easy_strerror() prints an error message:

const CURLcode rc = curl_easy_perform( ctx ) ;

// for curl v7.11.x and earlier, look into
// the option CURLOPT_ERRORBUFFER instead
if( CURLE_OK != rc ){
  std::cerr << "Error from CURL: "
    << curl_easy_strerror( rc) << std::endl ;
} ... 

Otherwise, you can call curl_easy_getinfo() to fetch transfer statistics. Similar to curl_easy_setopt(), it takes a constant as a key and a void* in which to store the data:

long statLong ;
curl_easy_getinfo( ctx , CURLINFO_HTTP_CODE , &statLong )
std::cout << "HTTP response code: " << statLong << std::endl ;

You must match the key constant to the pointer you provide: for example, CURLINFO_HTTP_CODE (the numeric HTTP response code, such as 200 or 404) requires a long variable, whereas CURLINFO_SIZE_DOWNLOAD (the number of bytes downloaded) requires a double. Call curl_easy_cleanup() to clean up the context object. Do this after any calls to curl_easy_getinfo(), or you risk a segmentation fault.

curl_easy_setopt() doesn't copy any of the pointers you assign to context values, nor does curl_easy_cleanup() destroy them. You are responsible for ensuring pointer validity throughout the context's lifetime and for cleaning up any resources after the context's teardown.

FTP Download (Fetch a Remote File)

Web services are gaining steam, but plenty of systems still use plain old FTP jobs to transfer data between applications.

Scripts typically feed the ftp command instructions via standard input. expect offers better error handling, because it simulates an interactive session; but in my experience high-end expect skills are fairly rare. Most annoying is that these scripts run outside of the main application, so they bypass any tracing or error-handling facilities.

step2 addresses these concerns by moving the FTP pull into the native-code application itself. (Pretend that step2 is a code excerpt from a larger, long-running app.) It also demonstrates how to process the data as it downloads, so you don't have to store it in a temporary file.

step2 and step1 share a lot of code, some of which I've already explained.

The CURLOPT_WRITEFUNCTION context option specifies the function libCURL will call as it downloads the remote file:

size_t showSize( ... ) ;
curl_easy_setopt( ctx , CURLOPT_WRITEFUNCTION , showSize ) ;

The for() loop in lines 202-239 sets up the FTP calls. The CURLOPT_URL option is a URL created by concatenating the name of the target file with the name of the server. libCURL will try to use the same network connection for all of the FTP calls, because they share a context.

The value assigned to CURLOPT_WRITEDATA is available in the CURLOPT_WRITEFUNCTION callback (here, showSize()). This can be any data type, either native or user defined. The callback uses the value as a means to keep state between invocations. In step2, this is a custom XferInfo* object that stores information about the downloaded file and the number of times the library has invoked the callback:

class XferInfo {
  void add( int more ) ;
  int getBytesTransferred() const {
  int getTimesCalled(){
} ;

...

XferInfo info ;
curl_easy_setopt( ctx , CURLOPT_WRITEDATA , &info ) ;

In turn, the showSize() callback does all of the work. It tracks the size of the files downloaded from the FTP server. Note its signature:

extern "C"
size_t showSize(
  void *source ,
  size_t size ,
  size_t nmemb ,
  void *userData
)

All CURLOPT_WRITEFUNCTION callbacks use this signature.

C++ users must expose callback functions with C linkage, hence the extern "C" declaration. You can't specify an object member function as a callback, but I've found a template technique to pass the work to an object indirectly.

source is a buffer of data. I usually cast this to a char* because I process text data (HTML, XML). This example doesn't use this parameter because showSize() doesn't do anything with the data itself.

Because source is not NULL-terminated, you can't use standard string functions to determine its length. Instead, use the product of size*nmemb.

userData is the value assigned to the CURLOPT_WRITEDATA context option. Note that the libCURL manual calls this parameter stream, likely because it's a FILE* when using the default (libCURL internal) write function. I call it userData because that's a little less confusing.

As userData is void*, you must cast it back to its proper data type. showSize() casts it to an XferInfo object and calls its add() member function to record the number of bytes transferred in this call:

extern "C"
size_t showSize( ... ){

  XferInfo* info = static_cast< XferInfo* >( userData ) ;
  const int bufferSize = size * nmemb ;

  info->add( bufferSize ) ;

On success, your callback should return the number of bytes it processed (size*nmemb). libCURL compares this with the number of bytes it passed your function and aborts the transfer if they don't match. Return 0 to indicate the end of processing or some number less than size*nmemb to indicate that an error occurred.

A callback may fire several times for the same download, because the library hands you the file data in chunks. This is memory efficient if your code operates on piecemeal data, such as with low-level text parsing. Otherwise, you must store the data yourself as it comes in and handle it after the download, after the call to curl_easy_perform() returns.

FTP Upload (Push a File to a Legacy System)

Legacy system uploads and downloads go hand in hand. The stub program step3 uses libCURL to log in to a remote FTP host and upload a file. It also describes a way to use a true C++ object as the callback handler (albeit indirectly).



step2 takes the remote FTP host, log in, and password as arguments. (A real app would read in this data from a config file; for now, pretend it's not a security problem to specify it on the command line.) The example merges the hostname with the target file name to form the URL:

ftp://host/file

It also concatenates the log in and password into a string used as the context option CURLOPT_USERPWD:

login:password

Note that you can also put the log in info in the URL:

ftp://login:password@host/file

This URL format is simply another means to pass the log in information to the API. Unlike a browser, libCURL doesn't use a cache or history bar that prying eyes can later discover. (Hopefully, the remote FTP server doesn't log that kind of information, either.) Nor is the information available via ps browsing.

For firewall-friendly transfers, the context option CURLOPT_FTP_USE_EPSV tells the library to use passive FTP.

libCURL doesn't limit you to file transfers alone. You can also send arbitrary FTP commands, such as mkdir or cwd. Store the commands in a libCURL linked list (curl_slist*):

struct curl_slist* commands = NULL ;
commands = curl_slist_append( commands , "mkdir /some/path" ) ;
commands = curl_slist_append( commands , "mkdir /another/path" ) ;
... 

The CURLOPT_QUOTE context option executes commands after logging in to the FTP server but before transferring data. CURLOPT_POSTQUOTE specifies a list of commands to execute after having transferred data.

curl_easy_setopt( ctx , CURLOPT_QUOTE , commands ) ;
// ... call curl_easy_perform() to run the FTP session ...
curl_slist_free_all( commands ) ;

You can use these context options to curl-ify your old FTP scripts. step3 uses CURLOPT_QUOTE to call cwd / such that the file uploads relative to the root directory of the FTP server. (Without an explicit directory change, uploads operate to a path relative to the user's home directory.)

The context option CURLOPT_UPLOAD tells the library this will be an upload call.

CURLOPT_FTPAPPEND tells libCURL to append to the target file instead of overwriting it. It's not necessary in this example, but it's something you often see in legacy FTP jobs.

Similar to downloading data, when uploading you have a choice between passing the libCURL library a file handle or creating the data yourself in a callback.

To upload data from an existing file handle, set that FILE* as the context option CURLOPT_READDATA.

To use a callback instead (for example, to generate upload data on the fly), assign a function to the context option CURLOPT_READFUNCTION. The function signature is very similar to that of CURLOPT_WRITEFUNCTION:

size_t function(
  char* buffer ,
  size_t size ,
  size_t nitems ,
  void* userData
) ;

The difference is that buffer is where you store data in this case, and the product size*nmemb is the maximum number of bytes you can put there. (Return the number of bytes you put in the buffer.) userData is the value assigned to CURLOPT_READDATA.

step3's callback function is rather brief. I employ a C++ template technique to use an object indirectly as a callback handler. If you're not familiar with templates, note that the declaration

template< typename T > 
class UploadHandler {
  ...

means the class UploadHandler is incomplete as written. The data type T comes from elsewhere in the code, when registering the function with libCURL:

curl_easy_setopt(
  ctx ,
  CURLOPT_READFUNCTION ,
  UploadHandler< UploadData >::execute
);

Here, UploadData is the type of object the handler function will use to do the work.

In turn, the static class function UploadHandler::execute() is a mere pass-through: it casts the userData value to type T and invokes T::execute() to do the actual work.

static size_t execute(
  char* buffer ,
  size_t size ,
  size_t nitems ,
  void* userData
){

  T* realHandler = static_cast< T* >( userData ) ;
  return( realHandler->execute( buffer , size , nitems ) ) ;

}

UploadHandler will work with any class that implements a fitting execute member function. I could have used standard inheritance instead:

// ... inside UploadHandler::execute() ...

Handler* h = static_cast< Handler* >( userData ) ;
return( h->execute( ... ) ) ;

I prefer the flexibility of templates, though. Inheritance would tie all handler objects to the Handler interface.

Similar to download callbacks, the library may call upload callbacks called several times for a single file. Code accordingly.

HTTP POST (Populate a Web Form)

HTTP POST operations send form data to a web server as well as making code-to-code calls such as those in web services. The request body comprises the POST data.

This article's final example, step4, demonstrates how to use libCURL for an HTTP POST. It also explains how to set up custom HTTP request headers, such as browser identification.

The POST body is just a string with & characters between key=value pairs:

const char* postData = "param1=value1&param2=value2&..." ;

Pass this string to the library by assigning it to the CURLOPT_POSTFIELDS option:

curl_easy_setopt( ctx , CURLOPT_POSTFIELDS , postData ) ;

Assign a curl_slist* to CURLOPT_HTTPHEADER to set custom HTTP headers:

curl_slist* responseHeaders = NULL ;

responseHeaders = curl_slist_append(
  responseHeaders ,
  "Expect: 100-continue"
) ;

// ... other curl_slist_append() calls ...

curl_easy_setopt(
  ctx ,
  CURLOPT_HTTPHEADER ,
  responseHeaders
) ;

Note that libCURL clients skip the intermediate step of downloading and processing a form's HTML. In turn, it is unaware of any hidden fields or client-side technologies used therein (such as JavaScript). Put another way, you have to know what fields the web server expects before you can use a libCURL client to POST data.

Conclusion

libCURL provides clean, simple networking for your native-code applications. With this API in your toolbox, you can incorporate one-off FTP operations into your main application, automate HTTP POST requests, and more.

There's much more to libCURL than I've presented here. The examples should, however, give you a head start in putting libCURL to use in your own apps.

Resources

  • The article's sample code includes the source for the stub programs, as well as a JSP and PHP page with which to test step4. (The JSP requires a servlet spec 2.4 container, such as Tomcat 5, and a proper 2.4 web.xml.)

    The pages simply echo the request headers and POST parameters received from the client.

  • The curl web site has documentation and tutorials.

  • The TCPMon utility ships with Apache's Axis web services project. It's a listening proxy that shows client/server conversations in a GUI window. I've found it invaluable for debugging problems with my curl code, especially HTTP POST operations.

    TCPMon is a Java application and is thus portable to any Java-enabled platform that meets the JDK version requirements.

  • libCURL is especially useful for creating REST-based web services clients. Also known as XML over HTTP, REST web service calls encapsulate HTTP GET or POST requests instead of wrapping them in a SOAP envelope. Amazon.com, for example, offers its public web services API via REST as well as SOAP. Yahoo's web services use REST exclusively.

Ethan McCallum grew from curious child to curious adult, turning his passion for technology into a career.


반응형
Posted by 공간사랑
,