Monday, February 24, 2014

Python code to analyze U.S. Social Security baby names database

On prooffreader.com, I've blogged a few analyses of the U.S. Social Security Administration baby names database; those wishing to replicate my results or get a leg up towards more and better analyses are welcome to download the raw data and view the code in this IPython notebook. I've only been doing Python for a couple of months, and I'm not a professional programmer, but the code works, and I'm totally open to suggestions and improving my skills. Warning: I know python programmers like to be pythony or pythonesque or pythonish or something... I'm working on it, I promise!

Just a screenshot of the IPython notebook, because the post looked lonely with only text.
• • •

Wednesday, February 12, 2014

Excel VBA macro to copy cell values downwards only into contiguous blank cells

Like many non-programmers, I started out doing everything in Excel, and frustration forced me to learn quite a bit of VBA to extend its possibilities. Once I was introduced to Python, it was like a breath of fresh air, but Excel definitely still has its uses. There's a lot to be said for instant visual feedback about what you're doing.

Over the years I've collected lots of snippets of code to do things I wish Excel did out of the box, so when I come across one I think may be useful to others, I'll make them presentable (without training, I can be lazy when it comes to coding) and post them here. I always build at least a simple undo function into my macros, as well as an explanation of the macro in a message box up from with a chance to exit.

Obligatory caveat: if you use my macro, I make no guarantees the world will not explode, or your data will not be irretrievably deep fried. Save early, save often.

If you don't know how to install an Excel macro, never fear! I'm working on a really simple visual tutorial that I should post here soon... ish.

I used this macro tonight when I copied and pasted the Wikipedia List of countries and territories by land and maritime borders for a possible blog idea (stay tuned at prooffreader.com!) After munging with regex, the data looked like a decent Excel table, but the leftmost column only had values in the topmost cell. So my macro just fills all the cells down. Here's a GIF that explains it way better than words:


EDIT 13 Feb 2014 11:45 AM EDT: Thank you to BornOnFeb2nd in the Reddit Excel Forum for pointing out that this macro would work much faster if I read everything into an array instead of interating over cells; I have made the changes. I first wrote this a couple of years ago, so I guess I figured I wouldn't be using it on huge datasets or I was too lazy to program in how to deal with an entire selected column -- but heck, if I'm going to stick it on my blog, I think it behooves me to make it efficient. Even with only a few hundred rows, the speed difference is palpable.

And here's the code:

Sub PropagateDownwardIfBlank()
' By Prooffreader, prooffreaderplus.blogspot.com
 
Dim ToExit As Integer
ToExit = MsgBox("INSTRUCTIONS: Every value in the current selected column " & _
    "will be pasted into every contiguous blank cell below it. If an entire " & _
    "column is selected, the procedure will stop if the first time it encounters " & _
    "a blank cell with no non-blank neighbors to the left or right. You will be " & _
    "able to undo the results. Do you wish to continue?", vbYesNo)
If ToExit = vbCancel Then Exit Sub
 
Dim Selected As Range
Set Selected = Selection
 
If Selected.Columns.Count > 1 Then
    MsgBox ("More than one column is selected. Procedure halted.")
    Exit Sub
    End If
 
If Selected.Rows.Count = 1 Then
    MsgBox ("Only one row is selected. Procedure halted.")
    Exit Sub
    End If
     
Dim RowsUsed As Long
RowsUsed = Selected.Rows.Count
 
'If an entire column is selected, figure out where to stop
'based on neighboring cells so the user does not end up writing
'a million-cell array to memory for no reason
If Selected.Count = Rows.Count Then
    Dim LastRow As Long
    LastRow = Selected.Cells(RowsUsed, 1).End(xlUp).Row
    Dim StopCondition As Boolean
    Do While StopCondition = False
        ActiveSheet.Cells(LastRow, Selected.Column).Select
        If Selected.Column = 1 Then
            If ActiveSheet.Cells(LastRow, Selected.Column).Offset(0, 1).Value = "" _
                Then StopCondition = True
            Else
                If ActiveSheet.Cells(LastRow, Selected.Column).Offset(0, 1).Value = "" _
                    And ActiveSheet.Cells(LastRow, Selected.Column).Offset(0, -1).Value = "" _
                    Then StopCondition = True
            End If
            If StopCondition = False Then LastRow = LastRow + 1
        Loop
    Set Selected = Range(Selected.Cells(1, 1), ActiveSheet.Cells(LastRow - 1, Selected.Column))
    End If
 
Dim SelectedArray() As Variant
Dim UndoArray() As Variant
SelectedArray = Selected
UndoArray = Selected
Dim StoredValue As Variant
Dim i As Long
 
For i = 1 To UBound(SelectedArray, 1)
    If SelectedArray(i, 1) <> "" Then
        StoredValue = SelectedArray(i, 1)
        Else
            SelectedArray(i, 1) = StoredValue
        End If
    Next i
 
Selected.Value = SelectedArray
 
Dim ToUndo As Integer
ToUndo = MsgBox("Do you wish to keep the results? If you select No, the contents " & _
    "of the selection will be returned to their original state. This is the only " & _
    "chance to undo.", vbYesNo)
If ToUndo = vbNo Then Selected.Value = UndoArray
 
End Sub

I realize it's possible to do this within Excel, but I find it a lot of steps to remember for something I only do every now and then. Your mileage may vary.
• • •