University of Minnesota
Statistics
info@stat.umn.edu
612-625-8046




Introduction

One of the most basic distinctions among files is whether the file is a text file or a binary file. Text files contain readable text characters, and can be downloaded to a Mac or PC to be printed or imported into a word processing program.

These notes will help you identify and handle files when you are working on a Unix system. A few paragraphs at the end cover some basic file types for the Mac and PC.

If you are not sure whether a file is text or binary, try typing more followed by the file name. If the file is readable, it's probably text; if it's just a jumble of characters, it's probably binary.
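The more test relies on your eyes. Here is a rough, scriptable stand-in. It assumes that a file containing NUL (zero) bytes is probably binary. The helper name is_probably_text and the sample files are made up for illustration. The heuristic is not perfect: some legitimate text encodings, such as UTF-16, contain NUL bytes and would be misclassified.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains no NUL bytes,
# i.e. it is "probably text" under the heuristic described above.
is_probably_text() {
    # tr -d '\000' deletes every NUL byte; if the result is identical
    # to the original file (cmp -s succeeds), there were no NULs.
    tr -d '\000' < "$1" | cmp -s - "$1"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'Hello, world\n' > "$workdir/notes.txt"        # plain text
printf 'GIF89a\000\001\002\003' > "$workdir/pic.bin"  # contains NUL bytes

for f in "$workdir/notes.txt" "$workdir/pic.bin"; do
    if is_probably_text "$f"; then
        echo "$(basename "$f"): probably text"
    else
        echo "$(basename "$f"): probably binary"
    fi
done
```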

The individual bytes in a binary file represent program instructions, pixels, or other information. A binary file is either a program you can run or data that must be interpreted by a specialized program (e.g., a picture encoded in GIF format or a movie in SGI format).

Since binary files usually must be transmitted perfectly to be useful, they are often encoded so they can be sent by email or downloaded by a web browser. In most cases the browser takes care of this when downloading, and mail agents take care of it when sending attachments. Some standard types of binary files are:

	Archive -- A group of files packaged together
	Compressed -- File(s) with redundant information compressed
	Executable -- Program in machine language, ready to run
	Graphics -- Pixel data, colors, etc.
	PostScript -- Text and printer commands
	Sound -- Sound wave, patch, etc.

If you are not sure where a file came from, never double-click a file of these types on a personal computer: double-clicking an executable file runs it. Instead, open the file from within the application that handles its format.

Utilities for manipulating files fall roughly into three groups: those that reduce the size of the data (creating .zip or .Z files), those that pack several files into one archive (creating .tar or .arj files), and those that change a file's format, for example from ASCII to PostScript or from binary to an encoded text form.

Figuring out what to do with a file.

It isn't always obvious that a file needs further processing before it can be used. The extensions to the filename often indicate what needs to be done, or what type of file it is.

If the file name has no extension, look at it with more to see whether it is a text file, or use the Unix file command. For a shell script, for example, the output looks like:

 >file /usr/lib/firefox/firefox.sh
 /usr/lib/firefox/firefox.sh: Bourne shell script text

and for a compiled binary, like this:

 >file /usr/lib/firefox/firefox-bin
 /usr/lib/firefox/firefox-bin: ELF 32-bit LSB executable, Intel 
 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked 
 (uses shared libs), for GNU/Linux 2.6.4, stripped

The following table lists some common extensions:

 Extension   Type, or process by running

 .arc        compress/archive (PC)
 .arj        arj (PC)
 .au         audio
 .exe        PC/DOS executable (may be self-extracting)
 .gz         gunzip (Unix)
 .hqx        BinHex (Mac encoding)
 .lzh        lharc (PC)
 .pdf        Portable Document Format (all platforms)
 .ps         PostScript language (all platforms)
 .sea        run on Mac (self-extracting)
 .sit        Stuffit (probably for Mac)
 .tar        tar archive (Unix)
 .tex        TeX/LaTeX source (plain ASCII text)
 .uue        uudecode (Unix)
 .wax        audio (Windows Media)
 .Z          uncompress or gunzip (Unix, PC)
 .zip        unzip (Unix), pkunzip (PC)
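To make the table concrete, here is a small sketch of how such a lookup could be scripted. describe_ext is a hypothetical helper that covers only some rows of the table. Like the table, it looks only at the last extension of the name.

```shell
#!/bin/sh
# Hypothetical helper: print the tool (or file type) associated with
# the LAST extension of a file name, using a subset of the table.
describe_ext() {
    case "$1" in
        *.gz)  echo "gunzip (Unix)" ;;
        *.Z)   echo "uncompress or gunzip (Unix)" ;;
        *.tar) echo "tar (Unix)" ;;
        *.uue) echo "uudecode (Unix)" ;;
        *.zip) echo "unzip (Unix) or pkunzip (PC)" ;;
        *.hqx) echo "BinHex (Mac)" ;;
        *.sit) echo "Stuffit (Mac)" ;;
        *.ps)  echo "PostScript viewer or printer" ;;
        *.pdf) echo "PDF viewer" ;;
        *)     echo "unknown: try 'file' or 'more'" ;;
    esac
}

for name in paper.ps data.tar photos.zip report.tar.gz README; do
    printf '%s -> %s\n' "$name" "$(describe_ext "$name")"
done
```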
   

Some files have more than one extension. Process the file one extension at a time, starting with the last extension used.


For example, cubs.gif.uue.Z should be uncompressed first and then decoded, which leaves the GIF picture cubs.gif.
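The rule of working backwards from the last extension can be sketched as a loop that strips one extension at a time. The step names and the few recognized extensions are assumptions chosen for illustration. The loop only works out the order of the steps; it does not run any decoder.

```shell
#!/bin/sh
# Sketch: work out the processing order for a multi-extension name by
# peeling extensions off the END of the name, one at a time.
name="cubs.gif.uue.Z"
steps=""
while :; do
    ext=${name##*.}                 # text after the last dot
    [ "$ext" = "$name" ] && break   # no dot left: stop
    case "$ext" in
        Z|gz) step="uncompress" ;;
        uue)  step="uudecode" ;;
        tar)  step="untar" ;;
        *)    break ;;              # e.g. .gif: the file is ready to use
    esac
    steps="$steps $step"
    name=${name%.*}                 # drop the extension just handled
done
echo "steps:$steps"                 # steps: uncompress uudecode
echo "final file: $name"            # final file: cubs.gif
```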

Compressing a file.

Compressing files saves disk space. On Unix you can compress a file with gzip by typing:

 >gzip filename

gzip replaces the original file with a compressed version that has the extension .gz. For better (though slower) compression, use the --best flag:

 >gzip --best filename

For more options, see the gzip man page. You might find gzip --recursive useful: given a directory, it compresses all the files in that directory and all its subdirectories.
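To see the effect of gzip for yourself, the sketch below builds a throwaway text file in a temporary directory, compresses it with --best, and compares the sizes. It creates the directory with mktemp -d, which is widely available but not part of every Unix. The file name notes.txt is made up.

```shell
#!/bin/sh
# Sketch: compress a scratch file with gzip --best and compare sizes.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Repetitive text compresses well.
i=0
while [ "$i" -lt 200 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > notes.txt

before=$(wc -c < notes.txt | tr -d ' ')
gzip --best notes.txt            # notes.txt is replaced by notes.txt.gz
after=$(wc -c < notes.txt.gz | tr -d ' ')

ls                               # only notes.txt.gz remains
echo "original: $before bytes, compressed: $after bytes"
```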

Restoring a compressed file.

To uncompress a file, type:

 > gunzip my_file.gz

The gunzip command uncompresses both the old compress format (extension .Z) and the newer gzip format (extension .gz). Note that gunzip replaces the compressed file with its decompressed version. That is, after you run it, my_file.gz is no longer in your directory; its decompressed version, my_file, is there instead.

For more information, see the gunzip man pages.
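The following sketch checks this behavior on a scratch file. gzip followed by gunzip gives back the original contents, and the .gz file disappears. The names my_file and original_copy are hypothetical.

```shell
#!/bin/sh
# Sketch: gzip/gunzip round trip on a scratch file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > my_file
cp my_file original_copy         # reference copy for comparison

gzip my_file                     # my_file    -> my_file.gz
gunzip my_file.gz                # my_file.gz -> my_file

ls                               # my_file and original_copy; no .gz left
cmp my_file original_copy && echo "round trip OK"
```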

Reading a compressed file without uncompressing it.

The gunzip -c command reads .Z or .gz files and writes their decompressed contents to standard output, leaving the files themselves unchanged. For example,

 >gunzip -c filename.Z | more

displays a compressed file without changing it.

Note that viewing the output with more is useful only when the compressed file is a regular text file.
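You can try reading a compressed file in place on a small text file. This sketch uses gunzip -c, which handles both .gz and .Z files and leaves the compressed file on disk. The name report.txt is made up.

```shell
#!/bin/sh
# Sketch: read a compressed text file without uncompressing it on disk.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'first line\nsecond line\n' > report.txt
gzip report.txt                          # now only report.txt.gz exists

gunzip -c report.txt.gz | head -n 1      # prints: first line
first=$(gunzip -c report.txt.gz | head -n 1)
ls                                       # report.txt.gz is still there
```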

Packaging files with tar.

The tar command packs separate files into a single archive. To create a tar archive, type tar -cvf (c for create, v for verbose, which lists each file as it is processed, and f for "use the named file for the archive"), then the name of the archive, followed by the files to be put into it.

For example, if you have the files homework1.txt and homework2.txt, you can package these in an archive called hwork.tar by typing

 >tar -cvf hwork.tar *.txt

Since tar does not compress the files, you can then compress the archive by typing:

 >gzip --best hwork.tar 

and the result will be a compressed file called hwork.tar.gz.

Note that tar does not remove the original files. It just places a copy of them in the archive.

You can archive a full directory structure by using the directory's name as tar input:

 >tar cvf my_files.tar my_files

and compress the tar file to save space:

 >gzip my_files.tar
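You can try the directory example on a scratch directory tree. The names my_files, homework1.txt, and homework2.txt are illustrative only.

```shell
#!/bin/sh
# Sketch: archive a directory tree with tar, then gzip the archive.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p my_files/sub
echo "homework 1" > my_files/homework1.txt
echo "homework 2" > my_files/sub/homework2.txt

tar cvf my_files.tar my_files    # pack the whole directory tree
gzip my_files.tar                # my_files.tar -> my_files.tar.gz

ls                               # my_files (untouched) and my_files.tar.gz
```

Most current versions of tar, including GNU tar and BSD tar, can also do both steps at once with the z flag: tar czvf my_files.tar.gz my_files.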

Unpacking a tar file archive.

Before unpacking a tar file archive, first see what it contains by listing the files in the archive using:

 >tar tvf my_directory.tar

Once you have an idea of what the archive contains, change tvf to xvf to extract all the files. Make sure you are in the directory where you want the archive to be extracted, and then type:

 >tar xvf my_directory.tar

or

 >tar xf my_directory.tar

if you do not want tar to list the files as it extracts them.

Many tar archives are transferred or saved as compressed files. If you have a file of a combined type such as .tar.gz, you first need to decompress it and then "untar" it. So, before running tar, use gunzip to decompress the file:

 >gunzip my_directory.tar.gz
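Putting the unpacking steps together, the sketch below decompresses a .tar.gz, lists it, and extracts it into a separate directory. To be self-contained, it first creates a small sample archive. All file and directory names are made up.

```shell
#!/bin/sh
# Sketch: decompress, list, then extract a .tar.gz archive.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Build a sample my_directory.tar.gz to practice on.
mkdir my_directory
echo "some data" > my_directory/data.txt
tar cf my_directory.tar my_directory
gzip my_directory.tar
rm -r my_directory               # pretend we only received the .tar.gz

gunzip my_directory.tar.gz       # step 1: .tar.gz -> .tar
tar tvf my_directory.tar         # step 2: look before extracting

mkdir unpacked && cd unpacked    # step 3: go where the files should land
tar xvf ../my_directory.tar      #         and extract them there
```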

Compressing and uncompressing files for a PC.

To compress and uncompress files on Unix for the PC, use the zip and unzip programs.

To unzip a file, type unzip followed by the name of the compressed (.zip) file; e.g., unzip prog.zip. The original prog.zip file is not altered.

To make your own .zip archive, type zip followed by the name of the archive file you wish to create, and then the names of the files to be included in the archive. For example,

 >zip prog.zip prog*

will include all files whose names begin with prog.

Most PCs can extract archived files with their standard system software. If you need additional tools for certain file types or older system versions, see the Windows Internet Software page from University Academic Computing Services.

Compressing and uncompressing files for a Macintosh.

The Macintosh's Unix-based system can uncompress and unpack .tar.gz files created on other Unix systems, but web browsers and other applications may use specific formats to move files to and from these machines.

The two most popular programs for the Macintosh are BinHex and Stuffit. BinHex is an encoding and decoding program which turns binary files into text files so they can be transmitted.

Stuffit is a multipurpose compression program that handles files processed by BinHex and a variety of other methods. Stuffit Lite is available as shareware from ftp sites and bulletin boards, as well as from the Macintosh Internet Software page from University Academic Computing Services.

Assuming you have downloaded the Stuffit archive to your Mac, run Stuffit and choose Open Archive from the File menu. From the Open dialog box, choose the archive you want to work with; specify the file name for the unstuffed file or click on Unstuff or Unstuff All.

To create a Stuffit archive, run Stuffit and choose New Archive from the File menu. Fill in the name of the archive and where it is to reside in your file system.

Once you have created an archive, you may add files or remove them from the archive.

Some history: how things work

Most web browsers and mail agents take care of converting binary files to formats suitable for transfer from one system to another. Once the file is received, the application recognizes the file's format and opens it with the required software.

In the past, to transfer a binary file, you had to encode it on the local system and decode it on the target system. The Unix programs uuencode and uudecode were used for that: uuencode converts a binary file to ASCII text, and uudecode converts it back to binary.

You can try this and see the results with any file, such as the plain text file test.tex used in this example. To encode test.tex, type:

 >uuencode test.tex test.tex > test.tex.uue

and to decode it back to its original format:

 >uudecode < test.tex.uue

Note that the file name appears twice in the uuencode command. The first is the file to encode, and the second is the name stored inside the encoded file. uudecode names the decoded file using that stored name, not the name of the .uue file, so here it recreates test.tex.

Viewing and printing different file formats

PostScript (PS) is a page description programming language, and files written in it usually have the extension .ps. PDF (Portable Document Format) is a format used to transmit electronic copies of documents. PDF files are self-contained and look the same on any printer or screen. Their extension is .pdf.

Neither .ps nor .pdf files are "readable" text files. That means you cannot usefully view them with the more command (the output looks like garbage), nor search their text with grep or an editor. They are, however, printable files, because our printers understand the PostScript language; this may not be true of all printers, particularly older ones. To view these files on Unix, use X Window applications such as xpdf, acroread, gs, or ghostview.

There are Unix utilities, such as pdf2ps and ps2pdf, that convert files between the PDF and PS formats.

In addition, if you use LaTeX, for example, it generates .dvi files. To print a .dvi file, use dvips (or print it from the xdvi viewer). Do not send a .dvi file directly to the printer, because you will get several pages of unreadable (garbage) text! The dvips command converts the file to PostScript (.ps) format; use its -o option if you want to save the result to a file instead of printing it.

 >dvips -omy_ps_file.ps my_file.dvi

There is another useful converter for printable files if you want to save paper while making working copies of your documents: a2ps. Originally, a2ps was written to convert ASCII text to PostScript, with a number of options for the final page layout (landscape, borders, margins, etc.), but it now handles .ps and .pdf files as well. By default, a2ps prints your input file with the pages reduced so that two fit on one sheet. Give it a try: it helps save trees!

You can move text files, as well as .dvi, .ps, .pdf, and other files, from one computer to another. For details, see the Logging into the Statistics Network document.

You can also email or receive different file types as attachments. Most mailers handle attachments properly, but some old Unix programs, such as elm, display the full attachment as part of the message text. Be careful not to print these messages from elm or other old mailers, because you will send pages and pages of unreadable text to the printer.

New Location

The Statistical Clinic now operates in two locations, in Ford Hall on the Minneapolis campus for CLA consulting, and in McNeal Hall on the St. Paul campus for all other units.

Status of Free Statistical Consulting

Priority for our no cost consulting is limited to researchers from sponsoring units, currently CLA and CVM. Other researchers may be able to be seen on a limited basis depending on availability.

To ensure availability for researchers from your college, department, or graduate program, your unit can become a sponsor of this service by funding the portion of our costs proportional to your usage.

For specific projects, the best way to ensure availability of service is to include funding for statistical support in grant proposals.

Contact the Consulting Manager, Aaron Rendahl at arendahl@stat.umn.edu or 612-625-1062 with any questions.