Migrating Solaris Applications to Linux
With the increasing popularity of Linux, companies are enthusiastic about migrating their existing applications and development environments to it. An application that follows the standards will be easier to port than a non-conformant one. Because Linux and Solaris share the common API sets of the UNIX system, migrating from Solaris to Linux is an easier path than migrating from an operating system with different API sets, such as the Microsoft Windows platform.
This chapter outlines the advantages of such a migration and the path that can be taken. In addition to describing the toolsets that can be used in the migration process, it discusses some standard porting issues.
Issues with Porting
Porting applications from the Solaris platform to the Linux platform may be considered an easy task because of the application-level portability provided by the GNU C libraries on Linux. In most cases, the application can be ported by simply recompiling the program. But if the code has any system- or hardware-dependent constructs, some modification is needed to complete the port. The complexity of such a porting effort is directly proportional to the amount of system- and hardware-dependent code. If the application uses only standard language constructs and makes no platform-specific assumptions, it will be less complicated to port. If the application uses non-POSIX constructs or platform-specific optimizations, the porting will be harder. Typically, a Java application falls in the first category and a non-POSIX C program falls in the second. Some of the issues with porting are explained in the following sections.
Byte Ordering
Byte ordering, or endianness, is the property of a data element that refers to how its bytes are stored or addressed in memory. This property is determined by the CPU architecture of the platform. There are two kinds of endianness: big endian and little endian.
Big Endian
In this type of byte ordering the most significant byte (MSB) is stored at the lowest storage address; that is, the MSB is stored in the leftmost position. Sun SPARC, HP Precision, and the IBM PPC use this type of ordering.
Example 1. Big Endian Ordering
Data: 0x33445566 |
Storage: |
Byte 0 | Byte 1 | Byte 2 | Byte 3 |
33(MSB) | 44 | 55 | 66 |
Little Endian
In this type of byte ordering the least significant byte (LSB) is stored at the lowest storage address; that is, the MSB is stored in the rightmost position. Intel and DEC architectures use this ordering.
Example 2. Little Endian Ordering
Data: 0x33445566 |
Storage: |
Byte 0 | Byte 1 | Byte 2 | Byte 3 |
66 (LSB) | 55 | 44 | 33 |
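The two layouts in Examples 1 and 2 can be told apart at run time by looking at the first stored byte of a known value. The following is a minimal sketch, not part of the original guide; the function name host_is_little_endian is a hypothetical helper chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x33445566u;   /* same value as in Examples 1 and 2 */
    unsigned char bytes[sizeof probe];

    /* Copy the object representation so each stored byte can be inspected. */
    memcpy(bytes, &probe, sizeof probe);

    /* Byte 0 holds the LSB (0x66) on little endian and the MSB (0x33) on big endian. */
    return bytes[0] == 0x66;
}
```

A check like this is mainly useful for diagnostics; code that avoids depending on the stored byte order in the first place is usually easier to port.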
Byte Ordering and Union Data Structure
The byte ordering discussed in the previous sections becomes an issue in porting if the code uses the union data structure. A union is a data structure used to manipulate different types of data in the same storage area. It is entirely the compiler's responsibility to manage the size and alignment of the data stored in that storage area. For example, consider the following union:
Example 3. Union Data Structure
union char_int {
    char a[4];
    int i;
} char_int;
Consider the following code that uses the above union:
Example 4. Union Data Structure Code
// define an instance variable of the union
union char_int cint;
// define a variable to hold the computation result
int result;
// assign a value to the union using the integer member
cint.i = 0x10394500;
// divide the constant 100 by the MSB, i.e. the first byte
result = 100 / cint.a[0];
The previous example uses knowledge of the byte ordering to retrieve the most significant byte of the integer data. This code works properly on a big-endian architecture, but when it is ported to a little-endian machine, the first byte is the LSB (0x00) and the code fails with a divide-by-zero error. To make the code portable, either the byte ordering information should not be used inside the code, or it has to be defined through conditional compilation parameters.
Example 5. Code
// Portability code added
#ifdef BIGENDIAN
const int MSBBYTE = 0;
#else
const int MSBBYTE = 3;
#endif
// define an instance variable of the union
union char_int cint;
// define a variable to hold the computation result
int result;
// assign a value to the union using the integer member
cint.i = 0x10394500;
// divide the constant 100 by the MSB, whose index depends on the byte ordering
result = 100 / cint.a[MSBBYTE];
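The other option mentioned above, not using byte ordering information at all, can be sketched with a shift. Shifts operate on the value of the integer rather than on its memory layout, so the result is the same on big-endian and little-endian hosts. The function names below are hypothetical, and the sketch assumes a 32-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: returns the most significant byte of a 32-bit value. */
uint8_t msb_of_u32(uint32_t value)
{
    /* The shift works on the numeric value, independent of storage order. */
    return (uint8_t)(value >> 24);
}

/* Divides the constant 100 by the MSB; returns -1 instead of dividing by zero. */
int divide_100_by_msb(uint32_t value)
{
    uint8_t msb = msb_of_u32(value);
    return msb != 0 ? 100 / msb : -1;
}
```

For the value 0x10394500 used in the examples, msb_of_u32() yields 0x10 on either kind of host, so no conditional compilation is needed.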
Byte ordering may also cause problems with network data transfer. Most modern network protocols take endianness into account. These protocols transfer data in network byte order, which is big endian and is also the ordering used by the External Data Representation (XDR). Data from the source environment is converted to this representation when a transfer starts and is converted back to the native representation of the target environment when the transfer finishes. Distributed applications that do raw data transfers should also account for endianness. The standard C library provides routines to do this conversion. APIs such as the following provide a means of platform independence while doing the network transfer:
- ntohl() - converts an unsigned 32-bit integer from network byte order to native
- ntohs() - converts an unsigned 16-bit integer (unsigned short) from network byte order to native
- htonl() - converts an unsigned 32-bit integer from native to network byte order
- htons() - converts an unsigned 16-bit integer (unsigned short) from native to network byte order
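As an illustration of these routines, the following sketch packs a 32-bit value into a byte buffer for transmission and reads it back. The helper names put_u32 and get_u32 are hypothetical; htonl() and ntohl() are the standard calls declared in <arpa/inet.h>.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores host_value in buf in the byte order used on the wire. */
void put_u32(unsigned char *buf, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);   /* native to wire order */
    memcpy(buf, &wire, sizeof wire);
}

/* Hypothetical helper: reads a 32-bit value from buf and returns it in native order. */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);                  /* wire order to native */
}
```

Because the conversion happens at the boundary, the rest of the application can keep working with native integers on both SPARC and Intel hosts.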
Signal Handling
Signals are the means of notifying a process or thread of the occurrence of an event. Linux supports most of the signals supported by UNIX systems such as SVR4 and the BSD implementations. However, there are some exceptions that need to be considered:
- SIGEMT, which represents a hardware fault, is not supported.
- SIGINFO, which represents keyboard information requests, is not supported.
- SIGSYS, which represents an invalid system call, is not supported.
- SIGABRT and SIGIOT are identical.
- SIGIO, SIGPOLL, and SIGURG are identical.
- SIGBUS is defined as SIGUNUSED because there is no "bus error" on the Linux platform.
The following table lists various signals and their meaning in both the Solaris and Linux environments. This table can be used to verify whether the signals used in the code have different semantics in Linux.
Table 1. Semantics of Signals in Solaris and Linux
SIGNAL | Solaris | Linux |
---|---|---|
SIGHUP | Terminate | Ignore |
SIGINT | Terminate | Ignore |
SIGQUIT | Terminate, core | Terminate, core |
SIGILL | Terminate, core | Terminate, core |
SIGTRAP | Terminate, core | Ignore |
SIGABRT | Terminate, core | Terminate, core |
SIGEMT | Terminate, core | Not supported on Linux |
SIGFPE | Terminate, core | Terminate, core |
SIGKILL | Terminate | Terminate |
SIGBUS | Terminate, core | Terminate, core |
SIGSEGV | Terminate, core | Terminate, core |
SIGSYS | Terminate, core | Not supported on Linux |
SIGPIPE | Terminate | Ignore |
SIGALRM | Terminate | Ignore |
SIGTERM | Terminate | Terminate |
SIGUSR1 | Terminate | Ignore |
SIGUSR2 | Terminate | Ignore |
SIGCHLD | Ignore | Ignore |
SIGPWR | Ignore | Ignore |
SIGWINCH | Ignore | Process stop |
SIGURG | Ignore | Ignore |
SIGPOLL | Terminate | Not supported on Linux |
SIGSTOP | Process stop | Process stop |
SIGTSTP | Process stop | Process stop |
SIGCONT | Ignore | Ignore |
SIGTTIN | Process stop | Process stop |
SIGTTOU | Process stop | Process stop |
SIGVTALRM | Terminate | Terminate, core |
SIGPROF | Terminate | Ignore |
SIGXCPU | Terminate, core | Terminate, core |
SIGXFSZ | Terminate, core | Terminate, core |
SIGWAITING | Ignore | Not supported on Linux |
SIGLWP | Ignore | Not supported on Linux |
SIGFREEZE | Ignore | Not supported on Linux |
SIGTHAW | Ignore | Not supported on Linux |
SIGCANCEL | Ignore | Not supported on Linux |
SIGRTMIN | Terminate | Not supported on Linux |
SIGRTMAX | Terminate | Not supported on Linux |
Note: This table is also available in the Technical guide for porting applications from Solaris to Linux, Version 1.0
Runtime Libraries
The runtime linker is responsible for linking executables with the shared libraries. Though both environments follow the same method for linking, there are some subtle differences. Linux maintains two different sets of libraries, system libraries and user libraries. Solaris has only one set of libraries.
Table 2. Runtime library differences between Solaris and Linux
Function | Solaris | Linux |
---|---|---|
Runtime Linker | /usr/lib/ld.so.1 | /lib/ld-linux.so.1 |
Runtime Linker configurator | crle | ldconfig |
File Systems
Data can be migrated from one file system to another either with data migration tools or by transferring it over the network. Transferring the data over the network is the more time consuming of the two. In Solaris the predominant file system is UFS; in Linux it is EXT2. Linux supports about thirty different file systems, such as: adfs, affs, autofs, coda, coherent, cramfs, devpts, efs, ext, ext2, ext3, hfs, hpfs, iso9660, jfs, minix, msdos, ncpfs, nfs, ntfs, proc, qnx4, reiserfs, romfs, smbfs, sysv, tmpfs, udf, ufs, umsdos, vfat, xenix, xfs, and xiafs. The following table gives some filesystem differences between Linux and Solaris:
Table 3. Filesystem differences between Solaris and Linux
File System | Solaris | Linux |
---|---|---|
Common file system | UFS | EXT2/EXT3 |
Journaling file systems | Veritas | EXT3, REISERFS |
File system for reading CDs | HSFS | ISO9660 |
System information | PROCFS | PROC |
MSDOS file system | PCFS | MSDOS or VFAT |
Threads
Solaris provides a threading model under which all processes are based on threads called lightweight processes (LWPs). In Solaris, a process is associated with one or more LWPs. In Linux, all threads are based on processes, and each thread is mapped to a process.
Solaris supports not only its native threading model but also the POSIX threading model; POSIX threads are also supported on Linux. If the application uses the Solaris native thread model, it needs a lot of rework before it can be ported. If an application uses nonstandard proprietary functions, a Solaris Thread Library (STL) available on Linux can be used to ease the migration. This library provides the Solaris thread interfaces built upon the POSIX thread library. [1]
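Code written against the POSIX thread interfaces is the portable case described above. The following minimal sketch creates one thread and waits for it using pthread_create() and pthread_join(), which are available on both platforms. The worker and wrapper names are hypothetical, and the program is assumed to be built with the compiler's thread option (for example -pthread with GCC).

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: squares the integer that arg points to. */
static void *square_worker(void *arg)
{
    int *value = arg;
    *value = *value * *value;
    return NULL;
}

/* Hypothetical helper: runs square_worker on its own thread.
 * Returns the squared value, or -1 if the thread could not be created or joined. */
int square_on_thread(int input)
{
    pthread_t tid;
    int value = input;

    if (pthread_create(&tid, NULL, square_worker, &value) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)   /* wait until the worker has finished */
        return -1;
    return value;
}
```

An application that uses the Solaris native calls such as thr_create() and thr_join() would need these calls replaced with their POSIX counterparts, or routed through a compatibility library, before it builds on Linux.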
Absolute addresses
Applications that use hard-coded addresses might be difficult to port. Some applications use the mmap() call for page fixing. Since each platform has its own way of handling the program stack, heap, system libraries, and so on, such a call to mmap() with a hard-coded address might result in a segmentation violation.
For instance, some addressing schemes ignore the high-order bits, so a hard-coded address of 0x80000000 might get translated to 0x00000000 and cause unintended results. It is desirable to eliminate hard-coded addresses from the code. If the application mandates the use of hard-coded addresses, it has to be changed to use only an allowed memory range, which you can obtain from /proc/<application_pid>/maps.
Padding
When a structure or an aggregate data type is used in the code, each platform has its own way of laying out the constituent elements. This arrangement depends on the structure, architectural limitations, efficiency, and the compiler. Most processor architectures cannot read data from odd addresses. Architectures such as PowerPC are inefficient at reading data that starts at an address not divisible by four. So for efficiency, compilers add so-called pad bytes. The only function of these bytes is to maintain the byte alignment so that the code can perform efficiently.
Example 6. Pad Bytes
struct struct1 {
    DATA2 d1; // Data of size 2 bytes
    DATA4 d2; // Data of size 4 bytes
} struct1;
For a 4-byte alignment platform, the compiler would lay out the previous structure as shown in the following table:
Table 4. 4-byte alignment
d10 | d11 | P | P |
d20 | d21 | d22 | d23 |
... | ... | ... | ... |
Where P is padding, d10 and d11 are the bytes of d1, and d20 through d23 are the bytes of d2.
In Table 4, the data structure occupies 8 bytes because of the padding. For a platform that does not need any alignment, the compiler would lay out the same structure differently.
Table 5. No alignment
d10 | d11 | d20 | d21 |
d22 | d23 | ... | ... |
... | ... | ... | ... |
Where d10 and d11 are the bytes of d1, and d20 through d23 are the bytes of d2.
In Table 5, the data structure occupies only 6 bytes. When using functions such as sizeof() to measure the size of the structure, this behavior has to be taken into account while porting. Most compilers provide options for disabling or enabling byte alignment (including padding) that can be used as required. But disabling alignment will usually result in performance degradation.
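The effect of padding can be measured rather than assumed. The following sketch uses sizeof and offsetof to report how the compiler laid out a structure like the one in Example 6. It assumes uint16_t and uint32_t as stand-ins for the 2-byte and 4-byte types, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the structure in Example 6, using fixed-width types. */
struct struct1 {
    uint16_t d1;   /* data of size 2 bytes */
    uint32_t d2;   /* data of size 4 bytes */
};

/* Hypothetical helper: number of pad bytes the compiler added to struct1. */
size_t struct1_padding(void)
{
    return sizeof(struct struct1) - (sizeof(uint16_t) + sizeof(uint32_t));
}

/* Hypothetical helper: offset of d2 from the start of the structure. */
size_t struct1_d2_offset(void)
{
    return offsetof(struct struct1, d2);
}
```

On a platform with 4-byte alignment, the structure is expected to occupy 8 bytes with d2 at offset 4; with alignment disabled it would occupy 6 bytes with d2 at offset 2. Code that writes such structures to files or sockets should not assume either layout.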
Toolset
C/C++ Resources
On the Linux platform, the development tools are mostly those of the GNU Compiler Collection (GCC). GCC includes tools for both C and C++. GNU tools are also available for Solaris (Solaris GNU tools: www.sunfreeware.com). All the Solaris applications can first be compiled with the GNU versions instead of the proprietary Solaris versions. However, the construct differences between the Solaris make utility and the GNU make utility (gmake) have to be resolved. The respective documentation can be used for this purpose. Once the makefiles work with gmake, rebuild the application. The errors thrown from a dry run with this makefile can be classified into command-line option errors and code errors. Command-line option errors are easy to resolve using the following table, which contains the common options for the C compilers on Solaris and the GNU GCC. For additional options, consult the respective compiler manual.
Table 6. Differences between options for Sun Workshop and GCC
Sun Workshop | GCC | Description |
---|---|---|
-# | -v | Turns on verbose mode, showing each component as it is invoked. |
-Xa | -ansi | Specifies compliance with the ANSI/ISO standard. The GCC supports all ISO C89 programs. You can use '-std' to specify the particular version of ISO C. |
-xinline | -finline-functions | Inlines only those functions specified. |
-xa | -ax | Generates extra code to write profile information for basic blocks, which will record the number of times each basic block is executed. |
-xspace | -O0 | Does not optimize. |
-xunroll= | -funroll-loops | Performs the optimization of loop unrolling only for loops in which the number of iterations can be determined at compile time or run time. |
-xtarget=name | -b=machine | The argument machine specifies the target machine for compilation. In addition, each of these target machine types can have its own special options, starting with 'm', to choose among various hardware models or configurations. |
-xO | -O, -O1, -O2, -O3, -Os | Controls various sorts of optimizations. |
-O | Same as above | Controls various sorts of optimizations. |
-xmaxopt | | Ensures that GCC does not use #pragma. |
-xnolib | -nostdlib | Does not link any libraries by default. |
-fsingle | -fsingle-precision-constant | Treats floating point constant as single precision constant instead of implicitly converting it to double precision. |
-C | -C | Tells the preprocessor not to discard comments. Used with the '-E' option. |
-xtrigraphs | -trigraphs | Supports ISO C trigraphs. |
-E | -E | Preprocesses all C source files specified and outputs the result to standard output or to the specified output file. |
-xM | -M | Runs only the preprocessor on the named C programs, requesting that it generate makefile dependencies and send the result to the standard output. |
-xpg | -pg | Generates extra code to write profile information suitable for the analysis program gprof. |
-c | -c | Directs the compiler to suppress linking with ld and to produce the .o file for each source file. |
-o | -o | Names the output file. |
-S | -S | Directs cc to produce an assembly source file but not to assemble the program. |
-xtemp | TMPDIR | Specifies the directory to use for temporary files if the TMPDIR environment variable is set. |
-xhelp=f | --help | Displays online help information. |
-xtime | -time | Reports the CPU time taken by each subprocess in the compilation sequence. |
-w | -w | Suppresses compiler warnings. |
-erroff=%none | -W | Displays the warning messages. |
-errwarn | -Werror | Makes all warnings into errors. |
Table 6: From Technical guide for porting applications from Solaris to Linux, version 1.0
Command-line option errors could also be thrown by the linker (if code compilation was successful). The most common options of the SPARCworks linker and the corresponding functionality of the GNU linker are as follows:
Table 7. Differences between options in SPARCworks and GNU
Solaris Workshop | gld Option | Description |
---|---|---|
-a | -static | Enables the default behavior in the static mode and prevents linking with the shared libraries. The linker creates an executable, and undefined symbols cause error messages. GNU ld has the option -static that also enables this behavior. |
-b | | Achieves the equivalent with the GNU linker by not compiling the source code with the option -fPIC/-fpic. |
-g | -g | Produces debugging information in the operating system native format. |
-m | (-M) | Prints a linker map. The -M option prints something comparable but with a different format and slightly different content. |
-s | -S/-s | Achieves the equivalent of the -s option by using -S with the GNU linker, which removes only the debugging information. |
-h name | -soname name | Sets name as the shared object name. With the GNU linker, the option -soname must be used. |
-o filename | -o filename | Places output in a file. This applies regardless of the type of output being produced. The default is to put an executable file in a.out |
-L directory | -Ldir | Adds directory dir to the list of directories. |
-R path | -rpath path | Specifies the search directories for the runtime linker. The GNU linker uses the option -rpath. |
Table 7: From Technical guide for porting applications from Solaris to Linux, version 1.0
The code-related error messages and warnings are mainly due to differences in constructs and API semantics between the compilers. These errors should be resolved by referring to the compiler manuals.
(Source: http://ldn.linuxfoundation.org/article/migrating-solaris-applications-linux-0)