With RPM’s instructions and tip video open for reference, I opened my ARST5100 virtual machine in VirtualBox. Under the Places dropdown menu, I opened the Home Folder and created a new folder named PeDALS. I then opened a Firefox browser and went to the address http://mas.clayton.edu/collections/PeDALS/PeDALS.zip. A dialog box opened giving me the option to either Open or Save the file. I chose the Save File option, which downloaded the compressed file to the Downloads folder. I used the cut command to remove the zip file from the Downloads folder and pasted it into the PeDALS folder located in my home (hoswald) folder.
I double-clicked the zip file in the PeDALS folder to open it with Archive Manager. This gave me a set of options at the top of the window, including Extract. I chose to extract all files and unchecked the Overwrite existing files box under Actions. I then clicked the Extract button. This process created a second folder named PeDALS inside the /home/hoswald/PeDALS directory. To make the files as accessible as possible, without multiple layers of folders, I moved all of the files from the inner PeDALS folder into the upper PeDALS directory and deleted the extraneous folder. This made the full path /home/hoswald/PeDALS.
I opened the command line by going to the Applications dropdown menu and opening Terminal. I input the following commands in order to create an inventory of file names in the PeDALS folder.
$ cd ~
$ find PeDALS -exec basename {} \; > inv_filenames_hlo.txt
Because I copied and pasted the second command from RPM’s instructions, I had to edit the line so that the dash was a hyphen rather than an en dash.
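As a minimal sketch of what this command produces, the following builds a tiny throwaway folder and runs the same find command against it. The folder and file names are invented for illustration and are not taken from the PeDALS collection.

```shell
# Hypothetical demo tree; the names are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/PeDALS/minutes"
touch "$demo/PeDALS/readme.txt" "$demo/PeDALS/minutes/jan.doc"

# basename strips the leading directories from every path that find reports.
# Directories (including PeDALS itself) are listed alongside the files.
(cd "$demo" && find PeDALS -exec basename {} \; > inv_demo.txt)

cat "$demo/inv_demo.txt"
```

The demo list has four lines for two files, because the two folders are counted as well. The same is true of the real inventory, which is why a later step filters for names that contain a period.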
I then attempted to create a list of the directories into which the files are organized using the following commands.
$ cd ~
$ find PeDALS -exec > dirname {} \; > inv_dirlist_hlo.txt
However, at this time, I received a long list of documents followed by the notation: Permission denied. I went to the text file created out of this process, and it was blank. At this point, I reviewed both the assignment and the video from the September 7 ARST 5100 class. I noticed there was an extra > in the command that was not used in the command to create an inventory of filenames. I removed the extra > and the command worked.
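One way to see why the extra > caused this, sketched on a throwaway folder with made-up names: the shell treats the word after > as a file to write to, so dirname stops being the command handed to -exec, and find is left trying to run each path as if it were a program. Sending the error messages to errors.txt and adding || true are my additions so the sketch runs quietly.

```shell
# Hypothetical demo tree; the names are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/PeDALS"
echo "sample" > "$demo/PeDALS/report.doc"
cd "$demo"

# Broken form: "> dirname" is a redirection, so find only receives "-exec {} \;"
# and tries to execute each path. The error messages are captured in errors.txt.
find PeDALS -exec > dirname {} \; > broken.txt 2> errors.txt || true

# Corrected form: dirname is the command that -exec runs on each path.
find PeDALS -exec dirname {} \; > fixed.txt

ls -l dirname broken.txt   # both files exist and are empty
cat errors.txt             # the "Permission denied" messages
cat fixed.txt              # the directory of each path
```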
In order to sort and find only unique file directories, I used the command:
$ find PeDALS -exec dirname {} \; | sort | uniq > inv_dirlist_hlo.txt
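The sort step matters here because uniq only collapses repeated lines that sit next to each other. A small sketch with a made-up directory list shows the difference, along with sort -u, which does both steps in one.

```shell
# Hypothetical directory list with a repeat that is not adjacent.
dirs=$(printf '%s\n' PeDALS/a PeDALS/b PeDALS/a)

# uniq alone only collapses adjacent repeats, so all three lines survive.
unsorted_count=$(printf '%s\n' "$dirs" | uniq | wc -l | tr -d ' ')

# Sorting first places the repeats next to each other, so uniq can drop them.
sorted_count=$(printf '%s\n' "$dirs" | sort | uniq | wc -l | tr -d ' ')

# sort -u is a one-step equivalent of sort | uniq.
short_count=$(printf '%s\n' "$dirs" | sort -u | wc -l | tr -d ' ')

echo "uniq only: $unsorted_count, sort | uniq: $sorted_count, sort -u: $short_count"
```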
I also realized it would be useful to see the inventory of filenames in some sort of order, so I created a second, sorted text file using the command:
$ find PeDALS -exec basename {} \; | sort > inv_filenames_hlo1.txt
Now that I had only unique directories and a sorted inventory, I determined the number of lines in each of the files using the commands:
$ wc -l inv_filenames_hlo1.txt
$ wc -l inv_dirlist_hlo.txt
I found 598 lines in the inv_filenames_hlo1.txt file and 203 lines in the inv_dirlist_hlo.txt file.
To characterize the types of files found in the collection, I worked through a number of steps, following RPM’s directions as seen below.
1. I switched to the parent directory of PeDALS.
$ cd /home/hoswald
2. I then created a list of the filenames without the directory.
$ find PeDALS -exec basename {} \; > inv1.txt
3. I eliminated (imperfectly) lines that are likely not file names by keeping only lines that contain a period.
$ cat inv1.txt | grep '\.' > inv2.txt
4. Some files used a period elsewhere in the filename, so I used the stream editor command (sed) to strip everything up to and including the first period.
$ cat inv2.txt | sed 's/^[^.]*\.//g' > inv3.txt
5. I ran the same command again to strip the rest of the file name, hopefully leaving just the extension.
$ cat inv3.txt | sed 's/^[^.]*\.//g' > inv4.txt
6. I sorted the list.
$ sort inv4.txt > inv5.txt
7. I removed duplicates.
$ uniq inv5.txt > inv6.txt
8. I changed the name inv6.txt to inv_hlo.txt using the right-click, Rename option.
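Steps 3 through 7 above can also be sketched as a single pipeline that skips the intermediate files. This is only an illustration under my own assumptions: the sample names and the output file inv_demo.txt are made up, and sort -u stands in for sort followed by uniq.

```shell
demo=$(mktemp -d)
# Hypothetical basenames standing in for inv1.txt.
printf '%s\n' minutes report.doc budget.v2.xls notes.doc > "$demo/inv1.txt"

# Keep names that contain a period, strip up to the first period twice,
# then sort and drop duplicates in one step.
grep '\.' "$demo/inv1.txt" \
  | sed 's/^[^.]*\.//' \
  | sed 's/^[^.]*\.//' \
  | sort -u > "$demo/inv_demo.txt"

cat "$demo/inv_demo.txt"
```

The second sed pass leaves a line such as doc untouched, because the pattern only matches when a period is still present.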
At this point, I opened the txt file to view my completed work. It showed two lines that appeared to be errors.
Both the 3.xls line and the 1_2009-11-14).doc line were left in because the two sed passes only remove text through the second period, and these filenames contained more than two periods. In order to present the most accurate final product, I made a copy of the file and named it inv_hlotest.txt.
I then repeated the command line:
$ cat inv_hlotest.txt | sed 's/^[^.]*\.//g' > inv_hlotest2.txt
This stripped the remainder of these two filenames, but I still needed to sort and then ensure only unique file extensions were listed. I used the following command lines:
$ sort inv_hlotest2.txt > inv_hlotest3.txt
$ uniq inv_hlotest3.txt > inv_hlofinal.txt
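An alternative I did not use, sketched here with made-up filenames, is to strip everything through the last period in a single pass. Because .* is greedy, this handles names with any number of periods without repeating the command.

```shell
# Hypothetical names with one, two, and three periods.
exts=$(printf '%s\n' report.doc budget.v2.xls a.b.3.xls \
  | sed 's/.*\.//' \
  | sort -u)

printf '%s\n' "$exts"
```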
Finally, I did a quality control check by doing a line count on both inv_hlo.txt and inv_hlofinal.txt using the following command lines:
hoswald@ARST5100:~$ wc -l inv_hlo.txt
29 inv_hlo.txt
hoswald@ARST5100:~$ wc -l inv_hlofinal.txt
27 inv_hlofinal.txt
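The line counts show that two lines were removed, but not which ones. As a hedged sketch, comm can list the lines that appear only in the first of two sorted files. The file names and contents below are stand-ins I made up for inv_hlo.txt and inv_hlofinal.txt.

```shell
demo=$(mktemp -d)
# Hypothetical contents standing in for inv_hlo.txt and inv_hlofinal.txt.
printf '%s\n' '1_2009-11-14).doc' 3.xls doc xls | sort > "$demo/before.txt"
printf '%s\n' doc xls | sort > "$demo/after.txt"

# comm -23 prints lines found only in the first file; both inputs must be sorted.
only_before=$(comm -23 "$demo/before.txt" "$demo/after.txt")
printf '%s\n' "$only_before"
```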