python - Most efficient method for updating multiple columns in a single dataframe row

Question

asked Jan 24, 2021 in Technique by 深蓝 (71.8m points)

line_profiler shows a result that surprised me: updating two columns in a single row runs faster as two separate statements than as one combined statement.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   696      6907   42029943.0   6085.1      4.7    df_work.loc[self.iRow, 'status'] = 'X'
   697      6907   68856814.0   9969.1      7.7    df_work.loc[self.iRow, 'clock'] = self.dClock
   698      6907  178155598.0  25793.5     19.9    df_work.loc[self.iRow, ['status', 'clock']] = ['L', self.dClock]

Lines 696 and 697 take a combined 11 seconds, versus 18 seconds for the equivalent line 698, so the two separate updates take about 40% less time than the single combined statement. I see this pattern repeatedly. I had assumed the single update would run faster. Before I revert my code, I want to check whether there is an even more efficient method than updating one column at a time within a row. Thanks!

1 Answer

深蓝 · Answer 1 · 2021-01-24T02:55:09+0000

After further research, the solution was to switch from loc to iat.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   673      6907    5209397.0    754.2      1.7  df_work.iat[self.iRow, cols_work['clock']] = self.dClock

The per-hit time for the clock update dropped from 9969.1 to 754.2, roughly a 13-fold reduction.
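To check the comparison outside the original program, something like the following timeit sketch may help. It is not the original code: the small DataFrame, the row number, the clock value, and the function names are all made up for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for df_work: one string column and one float column.
df = pd.DataFrame({"status": ["A"] * 1000, "clock": [0.0] * 1000})
row = 500      # with the default RangeIndex, label and position coincide
clock = 12.5   # made-up value standing in for self.dClock

# iat needs integer column positions; look them up once, outside the hot path.
status_pos = df.columns.get_loc("status")
clock_pos = df.columns.get_loc("clock")


def two_loc():
    # Two separate label-based scalar updates (profiler lines 696 and 697).
    df.loc[row, "status"] = "X"
    df.loc[row, "clock"] = clock


def combined_loc():
    # One label-based update covering both columns (profiler line 698).
    df.loc[row, ["status", "clock"]] = ["L", clock]


def two_iat():
    # Two separate position-based scalar updates.
    df.iat[row, status_pos] = "X"
    df.iat[row, clock_pos] = clock


timings = {}
for fn in (two_loc, combined_loc, two_iat):
    timings[fn.__name__] = timeit.timeit(fn, number=200)
    print(f"{fn.__name__:>12}: {timings[fn.__name__]:.4f} s for 200 calls")
```

Note that loc selects by label while iat selects by integer position, so the two are only interchangeable here because the row labels happen to equal the row positions.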

I built a dictionary that maps each column name to its integer position, for use with iat, as follows:

    # Map each column name to its integer position in df_work.
    cols_work = {col: i for i, col in enumerate(df_work.columns)}
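For the original two-column case, the same lookup dictionary can serve both updates. The sketch below assumes a tiny made-up df_work with a default integer index, and i_row and d_clock are hypothetical stand-ins for self.iRow and self.dClock.

```python
import pandas as pd

# Made-up example frame; the real df_work is not shown in the question.
df_work = pd.DataFrame({"status": ["A", "B", "C"], "clock": [0.0, 0.0, 0.0]})

# Column name -> integer position, built once and reused for every update.
cols_work = {col: i for i, col in enumerate(df_work.columns)}

i_row = 1       # integer row position (stand-in for self.iRow)
d_clock = 12.5  # stand-in for self.dClock

# One iat call per column; iat sets a single cell by integer position.
df_work.iat[i_row, cols_work["status"]] = "L"
df_work.iat[i_row, cols_work["clock"]] = d_clock

# The dictionary agrees with pandas' own name-to-position lookup.
assert cols_work["clock"] == df_work.columns.get_loc("clock")
print(df_work)
```

If the row index is not a plain 0..n-1 range, the row argument to iat must be the row's position rather than its label.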
