Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others



python - For data with a `set[int]` value, what fast means exist for grouping based on having at least one common member?

Currently, I tackle this by iterating over each set and each of its members. Each member is checked against a `memory` set (to see whether it has already been handled while processing some other set) and is either skipped or added. Every set that contains that member is then "reindexed" to the union of all such sets.

In code:

from typing import Set

from pandas import DataFrame

df = DataFrame({"set": [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])], 'data': [1, 2, 3]})
memory: Set[int] = set()
membership: frozenset
for membership in df["set"]:  # "for each set"
    localMembers = membership
    for i in membership:  # "for each element if not in memory"
        if i not in memory:
            memory.add(i)  # remember the member so it is not processed again
            others: frozenset
            for others in [m for m in df["set"] if i in m]:
                superset = localMembers.union(others)
                for toChange in df.index[df["set"] == localMembers].tolist():
                    df.at[toChange, "set"] = superset
                for toChange in df.index[df["set"] == others].tolist():
                    df.at[toChange, "set"] = superset
                localMembers = superset


>> df
         set  data
0  (1, 2, 3)     1
1  (1, 2, 3)     2
2     (4, 5)     3

This is, of course, extremely slow, so I was wondering what other approaches I could look into to speed it up. I imagine one approach would be to compute the groups (categories) first and then assign them all at the end.
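
One possible way to realise the "compute the groups first, assign at the end" idea is a union-find (disjoint-set) pass over the members. The sketch below is an illustration under assumptions, not something from the original post: the function name `group_by_shared_member` is hypothetical, it works on a plain list of frozensets rather than on the DataFrame, and it treats sets as linked transitively (two sets end up together if a chain of shared members connects them).

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def group_by_shared_member(sets: Iterable[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Return, for each input set, the union of all sets linked to it
    (directly or transitively) by at least one common member.

    Hypothetical helper for illustration only.
    """
    sets = list(sets)
    parent: Dict[int, int] = {}  # union-find forest over the members

    def find(x: int) -> int:
        # Locate the root of x, shortening the path as we go (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        # Merge the components containing a and b.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Pass 1: link every member of a set to that set's first member.
    for s in sets:
        members = iter(s)
        first = next(members, None)
        if first is None:
            continue  # an empty set shares nothing with any other set
        find(first)  # register the member even if the set has one element
        for other in members:
            union(first, other)

    # Pass 2: collect the members of each component under its root.
    components: Dict[int, Set[int]] = {}
    for member in list(parent):
        components.setdefault(find(member), set()).add(member)
    merged = {root: frozenset(m) for root, m in components.items()}

    # Pass 3: map each original set to its merged component.
    return [merged[find(next(iter(s)))] if s else frozenset() for s in sets]


example = [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])]
grouped = group_by_shared_member(example)
```

Each member is touched a small number of times, so the work grows roughly with the total number of members rather than with repeated scans of the whole column. Assuming the DataFrame from the question, the result could then be written back in one step with something like `df["set"] = group_by_shared_member(df["set"])`.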


1 Answer

