Welcome to Vigges Developer Community: Open, Learning, Share

python - Going through the same logic by order

I have a piece of code as below:

a = df[['col1', 'col2_1', 'col2_2', 'col2_3', 'col3']]
a_indices = np.argmax(a.ne(0).values, axis=1)
a_df = pd.DataFrame(a.values[np.arange(len(a)), a_indices])

b = df[['col2_1', 'col2_2', 'col2_3', 'col3', 'col1']]
b_indices = np.argmax(b.ne(0).values, axis=1)
b_df = pd.DataFrame(b.values[np.arange(len(b)), b_indices])

....

This code is repetitive, and I would like to run it in a loop instead. The idea is to cover every combination of orders of col1, col2 (col2_1, col2_2, col2_3), and col3. The result should be a single dataframe combining a_df, b_df, and so on.

Note: col2_1, col2_2, and col2_3 can appear in different orders, but they always stay next to each other. Is there any way to make this code simpler?



1 Answer


Start by fixing the number of iterations to loop over. You currently have 5 columns to loop over.

list_columns = ['col1', 'col2_1', 'col2_2', 'col2_3', 'col3']
print(len(list_columns))  # prints 5

Next, choose the column names for the output dataframe. With 5 iterations, they could be ['A', 'B', 'C', 'D', 'E']; this list is what you would pass as the columns argument of the dataframe. An easier way to build several columns at once is to create a dictionary first, with each column name as a key and a list as its value, all lists having the same length.

list_columns = ['col1', 'col2_1', 'col2_2', 'col2_3', 'col3']
new_columns = ['A', 'B', 'C', 'D', 'E']

# Build the dictionary with a comprehension: one empty list per output column
data_dict = {column: [] for column in new_columns}

n = 50  # number of rows to generate; arbitrary here

for i in range(n):
    for col in new_columns:
        something = ...  # placeholder: compute the value for this row and column
        data_dict[col].append(something)
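
For a version of this pattern that runs end to end, here is a minimal sketch. The appended value (row index times column position) is a made-up placeholder for whatever you actually compute, and n is shortened to 3 so the result is easy to inspect.

```python
import pandas as pd

new_columns = ['A', 'B', 'C', 'D', 'E']

# One empty list per output column
data_dict = {column: [] for column in new_columns}

n = 3  # hypothetical number of rows

for i in range(n):
    for j, col in enumerate(new_columns):
        # Placeholder computation, standing in for the real per-row logic
        data_dict[col].append(i * j)

# Every list has length n, so the dictionary converts directly to a dataframe
result = pd.DataFrame(data_dict)
```

Because every key receives exactly one value per outer iteration, the lists stay the same length, which is what pd.DataFrame needs when it is given a dictionary of lists.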
    

In your case, it looks like you can skip the per-row appends and assign a whole NumPy array to each key instead:

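import numpy as np
import pandas as pd

list_cols = ['col1', 'col2_1', 'col2_2', 'col2_3', 'col3']
new_cols = ['A', 'B', 'C', 'D', 'E']
data_df = {}

for i, (col, new_col) in enumerate(zip(list_cols, new_cols)):
    print(col, list_cols[:i] + list_cols[i+1:])
    # Put the current column first; the others keep their original order
    temp_df = df[[col] + list_cols[:i] + list_cols[i+1:]]
    # Position of the first non-zero value in each row (0 if the row is all zeros)
    temp_indices = np.argmax(temp_df.ne(0).values, axis=1)
    # Pick that value from each row of the reordered frame
    data_df[new_col] = temp_df.values[np.arange(len(temp_df)), temp_indices]

final_df = pd.DataFrame(data_df)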

The loop combines enumerate, which gives the index, with zip, which pairs each source column with its new name, and unpacks both at once. On each iteration the current column is moved to the front, and the remaining columns follow in their original order.
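
Moving one column to the front does not keep col2_1, col2_2, and col2_3 adjacent in every case, and it does not cover every order the question asks about. If that constraint matters, one possible approach is to treat the col2 columns as a single block and permute at two levels with itertools.permutations. The sketch below is only an illustration: the helper names first_nonzero and all_orderings, the sample data, and the choice to name each output column after its ordering are all assumptions, not part of the original code.

```python
from itertools import permutations

import numpy as np
import pandas as pd


def first_nonzero(df, cols):
    """Return, for each row, the first non-zero value among cols (in that order)."""
    values = df[cols].to_numpy()
    idx = np.argmax(values != 0, axis=1)  # 0 when a row has no non-zero value
    return values[np.arange(len(values)), idx]


def all_orderings(col2_group=('col2_1', 'col2_2', 'col2_3')):
    """Yield every column order in which the col2 columns stay adjacent."""
    for inner in permutations(col2_group):      # order inside the col2 block
        blocks = [['col1'], list(inner), ['col3']]
        for outer in permutations(blocks):      # order of the three blocks
            yield [c for block in outer for c in block]


# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    'col1':   [1, 0, 0],
    'col2_1': [0, 2, 0],
    'col2_2': [3, 0, 0],
    'col2_3': [0, 4, 0],
    'col3':   [5, 6, 7],
})

# One output column per ordering, named after the order it was computed from
combined = pd.DataFrame({
    '|'.join(cols): first_nonzero(df, cols) for cols in all_orderings()
})
```

With three blocks and three columns inside the col2 block, this yields 3! x 3! = 36 orderings, so combined has 36 columns. If you would rather stack the results as rows, the same generator can feed pd.concat instead of a dictionary.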

