Removing files extracted with tar

I untarred an archive today that didn’t have its files stored in a root directory, but instead heaved the 603 odd files and directories all over my Downloads folder. This often happens when unzipping as well, so I decided to string a few commands together to get rid of the “flotsam”.

The first step to get rid of the offenders is to get the names of all the files in the archive:

    tar tf arch.tar.gz

The `t` argument tells `tar` to list the contents of the archive to standard output, including

    ./

Since we’re ultimately going to be sending files to `rm -rf`, it would be disastrous to leave `./` amongst the lines, since it will remove the contents of the entire directory. We’ll trim it (the first line of the output) from the list by specifying

    tar tf arch.tar.gz | tail +2

`tail` will ignore everything up until the n-th line when given `+n`, so we effectively ignore the first line by using `+2`. Now we can send each line to `rm -rf` using

    tar tf arch.tar.gz | tail +2 | xargs rm -rf

`xargs` runs `rm -rf` on each line that is piped to it, as if we specified each file manually. Once the files are removed we can `cd` to a new directory and extract the files within it, leaving the containing directory clean and tidy.

Categories: General, Linux Tags: , , , ,

Vim Diffing Files from Different GUI Folders

gVim has the built-in ability to diff files in the same Windows directory by selecting two or more files, right clicking, and selecting ‘Diff with Vim’. Sometimes, however, you want to compare two files in two drastically different folders and you don’t want to have to run vimdiff from the command line, specifying the two files separately, especially if they have completely different directory paths. To do so, right click the first file to be compared and select ‘Edit with Vim’. When the window comes up, enter the :vert diffsplit command to split the window vertically, or simply :diffsplit to split the window horizontally. Then right-click the second file of the comparison and select ‘Edit with existing Vim – <filename>’, where <filename> is the name of the first file you opened. This will open the second file in one pane of the existing gVim window, so now you can run :diffthis to compare the new file to the file that was already opened.

Categories: Vim, Windows Tags: , , ,

Generating an Unused Filename

To create a new filename, read the names of all the files of the directory the new file is to be written to. Find the “highest” filename; strings can initially be compared by length for efficiency. Once the “highest” filename is found, “increment” it, possibly wrapping the characters if required. The incremented filename will be unique.

Categories: General

Another Example of Range Termination

Continuing on from my previous post on Range Termination in Go, I’d just like to present another example I’ve come across on how it can be helpful, especially with declarative programming. The following code is for a Graph data structure I’m working on in Scala (available on my github). At the moment, without range termination, a definition for the boolean function, `isDirectedForest`, looks like this:

def isDirectedForest: Boolean =
  if ( this is cyclic )
    return false
  else {
    for ( v <- BFS over this from lastVertex )
      if ( ( this indegreeOf v ) > 1 )
        return false
    return true
  }

It works, but there are several problems. First, the most obvious, is that it’s not immediately clear what the function is actually doing. Secondly, there are several exit points; clean coding style encourages having only one exit point for each function. Third, for the loop, this is the only way we can actually achieve early termination (sorry, I refuse to use `break`). The range termination syntax I presented in the previous post can be used here to give a much more elegant definition for this function:

def isDirectedForest: Boolean = {
  var maxIndegreeIs1 = true
  if ( this isNot cyclic )
    for ( v <- BFS over this from lastVertex; maxIndegreeIs1 )
      maxIndegreeIs1 = ( this indegreeOf v ) <= 1
  return ( this isNot cyclic ) && maxIndegreeIs1
}

Which is a much more declarative definition of the function – we’re not defining it so much as we are describing what it does. The final (and only) return statement succinctly tells us what makes a graph a directed forest – it is such if the graph (this) is acyclic and the maximum indegree of any vertex is 1.

Binary Hash Tree

I’m not sure if this exists already, but it’s a straightforward data structure for maintaining a set of objects that don’t have any natural ordering, that is, they can’t be sorted and therefore can’t be put in a binary tree under normal circumstances. Take for instance a universe of furniture types, F, which contains all the different types of furniture in existence, and you want to maintain a subset of this universe, F = { f | f ∈ U }. The typical options would be linear array, hash table, or a tree to maintain the set. Lookup in the linear array would have to be performed using sequential sort, since there’s no natural ordering of the elements, so no sorting, so no binary sort, so lookup in the linear array has and O(n). A hash table would give a decided improvement initially, but since there will be so many members the initial gains will be quickly forgotten as clustering and long buckets (again unsortable) give rise to O(n) complexity once again. A tree is the next option, but the question is how to distribute the elements.

Elements of a balanced binary hash tree, with their hash subscripted to the right

The binary hash tree is simply a binary tree that has the hash of the element as its key. Theoretically, all hash codes of elements of a universe are unique, so theoretically no collisions can occur in this tree, as we are not mapping the hash codes to a domain, as would be the case in a hash table (in practice, we are bounded by the number of bits we use to hold the hash code). The data structure is very straightforward, and can be enhanced by balancing the tree to give O(log2(n)) time. We can also use this hash code “ordering” to sort the collection and then perform a binary search through the array (which is essentially the same as searching the binary tree, just using a linear data structure), which will give us O(log2(n)) again. Note that we can’t use this structure as a bucket for hashing, as all elements in a bucket have the same hash code already.

Range Termination in Go

I’ve had an simple idea for a while that I’d like to see implemented in modern languages – a terminating condition for iteration statements. Instead of breaking out of a loop with an `if` and `break` (cousin to `goto`, and just as bad in my opinion), include a terminating condition in the loop definition. I haven’t seen this feature implemented in any mainstream languages so far, and it seems to me to be a handy one to have. Take, for example, sequential sort with early termination in Java, using an iterator:

public static<T> boolean collectionContains( Collection<T> ts, final T e ) {
    int i = 0;
    for ( T t : ts ) {
        if ( e == t ) {
            break;
        }
        i++;
    }
    return ts.size() == i;
}

I would prefer to be able to write something like the following:

public static<T> boolean collectionContains( Collection<T> ts, final T e ) {
    int i = 0;
    for ( T t : ts; e == t ) {
        i++;
    }
    return ts.size() == i;
}

A very simple concept, but I feel it tidies up the code dramatically, and is a feature I feel they should have included when they first introduced the extended for loop.

I wrote a small hack to introduce the feature into the current Go compiler. I’ve only recently looked into learning Go, but it’s philosophy appeals to me and it’s thinking seems to lend itself to the idea I’m presenting here. The following are changes I’ve made to the Go compiler to add on the range termination:

In go/src/cmd/go.y (the lexer and parser), line 604, in rule 'for_header':
}
|   range_stmt ';' osimple_stmt
    {
        // osimple_stmt is the early termination condition
        $1->ntest = $3;
    }
|   range_stmt

This simply checks if a semi-colon appeared after the range, and if it does, adds the conditional that appears after it to the range node (range nodes don’t have the `ntest` member populated initially). Like I say, this is only a hack, if I were to actually implement this change I wouldn’t modify the range node after it was created, I would instead modify it during its  construction in the range_stmt rule. Anyway, moving on:

In go/src/cmd/range.c (typechecking for ranges), line 99, in function 'void walkrange(Node *n)':
    Node *fn, *tmp;
    Node *ntest;
    NodeList *body, *init;

This simply adds the `ntest` node for storing the current contents of `n->ntest` that we created above, as the contents of `n->ntest` are overwritten in the `switch` further on in the function.

Line 131:
    }

    ntest = N;
    if (n->ntest) {
        typecheck(&n->ntest, Erv);
        ntest = n->ntest;
    }

    switch(t->etype) {

Here we initialise `ntest` to n->ntest if it exists, after we check its validity.

Line 160:
        n->ntest = nod(OLT, hv1, hn);
        if (ntest != N) {
            typecheck(&n->ntest, Erv);
            n->ntest = nod(OANDAND, n->ntest, ntest);
        }
        n->nincr = nod(OASOP, hv1, nodintconst(1));

In the first switch statement, in the case that we are iterating over an array, `n->ntest` is overwritten and the terminating condition is set to be when the index is greater than or equal to the size of the array. So to add our extra terminating condition (if there is one), we first check the validity of the existing `n->ntest` (this would be carried out at the bottom of the function, but since I’m modifying the node I’ll do it here as well) and then “AND” it with our terminating condition (stored in `ntest`). The approach is the same for map and string, but not for channel, as the channel case doesn’t actually modify `n->ntest`, so we needn’t bother with it. Again, this is only a quick hack to implement the feature.

Line 191:
        n->ntest = nod(ONE, nod(OINDEX, hit, nodintconst(0)), nodnil());
if (ntest != N) {
typecheck(&n->ntest, Erv);
n->ntest = nod(OANDAND, n->ntest, ntest);
}

Line 251:
        n->ntest = nod(ONE, hv1, nodintconst(0));
        if (ntest != N) {
            n->ntest = nod(OANDAND, n->ntest, ntest);
            typecheck(&n->ntest, Erv);
        }
        n->ntest->ninit = list(list1(nod(OAS, ohv1, hv1)), a);

Now all that’s left to do is compile the project with `go/src/all.bash`. If you set your `$GOROOT` to the `go` directory you were working in you should have a new `6g` that will compile for loops with range termination. For instance:

package main

import "fmt"

func main() {
        m := map[string]float64{"0":0.0, "1":1.0, "pi":3.1415, "4":4.0}
        for k, v := range m; v != 1.0 {
                fmt.Printf( "key %s, value %g\n", k, v )
        }
        s := "abcde"
        for i, c := range s; c != 'c' {
                fmt.Printf( "\tfound value %q at %d\n", c, i )
        }
        var a [5]string
        a[0] = "a"
        a[1] = "b"
        a[2] = "c"
        a[3] = "d"
        a[4] = "e"
        for i, c := range a; c != "d" {
                fmt.Printf( "found value %q at %d\n", c, i )
        }
}
Output:
key pi, value 3.1415
key 4, value 4
key 1, value 1
found value 'a' at 0
found value 'b' at 1
found value 'c' at 2
found value "a" at 0
found value "b" at 1
found value "c" at 2
found value "d" at 3
Categories: Code, Go

Haskell Mindfunk

Just a line of Haskell that one of my lecturers gave me once, which defines a variable in terms of itself:

x = 1 : zipWith (*) [1..] x

Categories: Code, Haskell Tags: , , , ,

Python Substrings

Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error. An Informal Introduction To Python

"Hello World!"[0]     # Returns 'H'
"Hello World!"[1]     # Returns 'e'
"Hello World!"[5]     # Returns ' '
"Hello World!"[6]     # Returns 'W'
"Hello World!"[11]    # Returns '!'
"Hello World!"[12]    # Causes IndexError: string index out of range
"Hello World!"[-1]    # Returns '!'
"Hello World!"[-2]    # Returns 'd'
"Hello World!"[-6]    # Returns 'W'
"Hello World!"[-7]    # Returns ' '
"Hello World!"[-11]   # Returns 'e'
"Hello World!"[-12]   # Returns 'H'
"Hello World!"[-13]   # Causes IndexError: string index out of range
"Hello World!"[0:2]   # Returns 'He'
"Hello World!"[1:5]   # Returns 'ello'
"Hello World!"[5:1]   # Returns ''
"Hello World!"[3:]    # Returns 'lo World!'
"Hello World!"[:4]    # Returns 'Hell'
"Hello World!"[-5:-3] # Returns 'or'
"Hello World!"[-5:]   # Returns 'orld!'
"Hello World!"[:-4]   # Returns 'Hello Wo'

Perl Strings

'Hello World!'      # Returns 'Hello World!', supports escaping of ' and \
'It\'s\\It is'      # Returns 'It's\It is'
'Don\'t "escape"'   # Returns 'Don't "escape"'
'Hello\nWorld!'     # Returns 'Hello\nWorld!'
"Hello World!"      # Returns 'Hello World!', supports escaping multiple
                    # characters, including \n and \t, and interpolation
"Hello\nWorld!"     # Returns 'Hello
                    #          World!'
'Hello' . ' ' . 'World!' # Returns 'Hello World!'
$hi = "Hello";
# The two types of interpolation are equivalent. The first type is used
# for variable names that would cause ambiguity.
"${hi} World!"      # Returns 'Hello World!'
"$hi World!"        # Returns 'Hello World!'

Python Strings

A ‘ and ” are interpreted as being the same in Python. “”” delimits a multi-line string. There is no reason to use one style of string delimiter over the other, although I personally follow the style presented by Will Harris in StackOverflow (see bottom for summary).

'Hello World!'      # Returns 'Hello World!'
"Hello World!"      # Returns 'Hello World!'
"""Hello World!"""  # Returns 'Hello World!'

'Hello              # Error: ' doesn't support unescaped newlines
World!'

'Hello\nWorld!'     # Returns 'Hello\nWorld!'

"Hello              # Error: " doesn't support unescaped newlines
World!"

"Hello\nWorld!"     # Returns 'Hello\nWorld!'

"""Hello            # Returns 'Hello\nWorld!'
World!"""

"""Hello\nWorld!""" # Returns 'Hello\nWorld!'

'Hello' + ' ' + 'World!'      # Returns 'Hello World!'

'%s %s!' % ('Hello', 'World') # Interpolation, returns 'Hello World!'
Will Harris’ style for Python string, summary
  • Single quotes for small symbol-like strings.
  • Double quotes around strings that are used for interpolation or that are natural language messages.
  • Triple double quotes for docstrings and raw string literals for regular expressions.