Kaija Gahm: Changing individual column names

Kaija Gahm

Changing individual column names

How to rename individual columns in a data frame, based on the previous names and without using previous names.

Author

Affiliation

Kaija Gahm

Published

July 11, 2018

Citation

Gahm, 2018

This post has been slightly modified from its original form on woodpeckR.

Problem

How do I change the name of just one column in a data frame?

Context

This is a simple one that keeps coming up. Sometimes, whoever put together my data decided to capitalize the first letter of some column names and not others. Sometimes I’ve merged several data frames together and I need to distinguish the columns from each other.

Say my data frame is p8_0 and I’d like to change the column Area to area.

In the past, I’ve done this in one of two ways. Either I change all of the column names at once (if all of them need to be changed), or I use numerical column indexing. The latter makes a lot more sense if I have a lot of columns to deal with, but it means I have to know the number of the column whose name I have to change.

To find this out, I first have to look at all of the column names. Okay, no problem.

# See column names and numerical indices
names(p8_0)

  [1] "FID"                     "Join_Count"             
  [3] "TARGET_FID"              "Field1"                 
  [5] "barcode"                 "stratum"                
  [7] "lcode"                   "sdate"                  
  [9] "utm_e"                   "utm_n"                  
 [11] "snag"                    "OBJECTID"               
 [13] "uniq_id"                 "aa_num"                 
 [15] "AQUA_CODE"               "AQUA_DESC"              
 [17] "pool"                    "Area"                   
 [19] "Perimeter"               "bath_pct"               
 [21] "max_depth"               "avg_depth"              
 [23] "sd_depth"                "tot_vol"                
 [25] "area_gt50"               "area_gt100"             
 [27] "area_gt200"              "area_gt300"             
 [29] "avg_fetch"               "shoreline_density_index"
 [31] "econ"                    "sill"                   
 [33] "min_rm"                  "max_rm"                 
 [35] "len_met"                 "len_prm_lotic"          
 [37] "pct_prm_lotic"           "num_lotic_outl"         
 [39] "len_prm_lentic"          "pct_prm_lentic"         
 [41] "num_lentic_outl"         "pct_aqveg"              
 [43] "pct_opwat"               "len_terr"               
 [45] "pct_terr"                "pct_aq"                 
 [47] "len_wetf"                "pct_prm_wetf"           
 [49] "pct_terr_shore_wetf"     "len_wd"                 
 [51] "wdl_p_m2"                "num_wd"                 
 [53] "scour_wd"                "psco_wd"                
 [55] "len_revln"               "rev_p_m2"               
 [57] "num_rev"                 "pct_terr_shore_rev"     
 [59] "pct_prm_rev"             "area_tpi1"              
 [61] "pct_tpi1"                "area_tpi2"              
 [63] "pct_tpi2"                "area_tpi3"              
 [65] "pct_tpi3"                "area_tpi4"              
 [67] "pct_tpi4"                "sinuosity"              
 [69] "year_phot"               "NEAR_TERR_FID"          
 [71] "NEAR_TERR_DIST"          "NEAR_TERR_CLASS_31"     
 [73] "NEAR_TERR_CLASS_15"      "NEAR_TERR_CLASS_7"      
 [75] "NEAR_TERR_CLASS_31_N"    "NEAR_TERR_CLASS_15_N"   
 [77] "NEAR_TERR_CLASS_7_N"     "NEAR_TERR_HEIGHT_N"     
 [79] "NEAR_FOREST_FID"         "NEAR_FOREST_DIST"       
 [81] "NEAR_FOREST_CLASS_31"    "NEAR_FOREST_CLASS_15"   
 [83] "NEAR_FOREST_CLASS_7"     "NEAR_FOREST_CLASS_31_N" 
 [85] "NEAR_FOREST_CLASS_15_N"  "NEAR_FOREST_CLASS_7_N"  
 [87] "NEAR_FOREST_HEIGHT_N"    "year.p"                 
 [89] "depth.p"                 "current.p"              
 [91] "gear.p"                  "stageht.p"              
 [93] "substrt.p"               "wingdike.p"             
 [95] "riprap.p"                "trib.p"                 
 [97] "snagyn"                  "area_le50"              
 [99] "area_le100"              "area_le200"             
[101] "area_le300"              "pct_area_le100"         
[103] "pct_area_le50"           "pct_area_le200"         
[105] "pct_area_le300"          "stratum_name"

Okay, yes problem.

It’s not that hard to see that Area is the 18th column. But there are a bunch of columns that start with NEAR_TERR_ and NEAR_FOREST_ that would be easy to confuse. And what if I later modify my data cleaning script, insert new columns, and mess up the numerical indexing?

Solution

The first solution I came up with is simple but pretty clunky. At least it solves the problem of numerical indices getting misaligned. And if you mistype the column name or try to change the name of a column that doesn’t exist, it doesn’t throw an error.

# Change "Area" column name to "area"
names(p8_0)[names(p8_0) == "Area"] <- "area"

This works well, but it gets annoying if you have more than one column name to change. Every column requires typing names(p8_0) twice, and that adds up to a lot of lines of code.

To no one’s surprise, dplyr has a more elegant solution, using the rename function.

# Load dplyr 
library(dplyr) 

# Rename variable (new name first) 
p8_0 <- p8_0 %>% 
  rename(area = Area)

A quick note on rename: somewhat counterintuitively, the new name comes before the old name. General example:

# General syntax for rename 
#df %>% 
#  rename(newname = oldname)

rename saves a whole bunch of keystrokes and also scales very well to multiple columns.

Let’s say I wanted to change Area and Perimeter to area and perimeter, respectively, and I also wanted to change the rather clunky shoreline_density_index to sdi. And while we’re at it, snagyn, a factor variable that indicates whether a large piece of wood was present at the site (“yes” or “no”), might be clearer as snag_yn, and sinuosity could be shortened to sinu

Without dplyr:

# Change each column name individually
names(p8_0)[names(p8_0) == "Area"] <- "area"
names(p8_0)[names(p8_0) == "Perimeter"] <- "perimeter"
names(p8_0)[names(p8_0) == "shoreline_density_index"] <- "sdi"
names(p8_0)[names(p8_0) == "snagyn"] <- "snag_yn"
names(p8_0)[names(p8_0) == "sinuosity"] <- "sinu"

With dplyr:

# Change any column names you want to, all at once
p8_0 <- p8_0 %>% rename(area = Area, 
                perimeter = Perimeter,
                sdi = shoreline_density_index, 
                snag_yn = snagyn,
                sinu = sinuosity)

So pretty. As an added bonus, you’re saved from both quotation marks and the dreaded double equals sign (!!!).

In case anyone was counting, that’s 102 characters vs. 238 (spaces not included). 116 if you include loading dplyr, but you already had it loaded because you’re using it throughout your code, of course.

Outcome

Now I can rename only the columns I want, by name instead of numerical index, without fear of having to change everything if I insert or delete some columns later on.

Resources

More thoughts on changing individual variable names, including a couple other packages if you feel like trying them: https://stackoverflow.com/questions/7531868/how-to-rename-a-single-column-in-a-data-frame

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Gahm (2018, July 11). Kaija Gahm: Changing individual column names. Retrieved from https://kaijagahm.netlify.app/posts/2018-07-11-changing-individual-column-names/

BibTeX citation

@misc{gahm2018changing,
  author = {Gahm, Kaija},
  title = {Kaija Gahm: Changing individual column names},
  url = {https://kaijagahm.netlify.app/posts/2018-07-11-changing-individual-column-names/},
  year = {2018}
}

Changing individual column names

Author

Affiliation

Published

Citation

Problem

Context

Solution

Outcome

Resources

Footnotes

Corrections

Citation