Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

performance - How can I speed up fetching the results after running an sqlite query?

Following the answers to my earlier question, "Is it normal that sqlite.fetchall() is so slow?", it seems that fetchall() and fetchone() can be incredibly slow with sqlite.

As I mentioned there, I have the following query:

import time

# Every spectrum of one msrun is matched against the rt/mz bounding box of each feature's convex hull.
query = """
    SELECT spectrum_id, feature_table_id
    FROM spectrum AS s
    INNER JOIN feature AS f
        ON f.msrun_msrun_id = s.msrun_msrun_id
    INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) AS mzMax
                FROM convexhull GROUP BY feature_feature_table_id) AS t
        ON t.feature_feature_table_id = f.feature_table_id
    WHERE s.msrun_msrun_id = ?
      AND s.scan_start_time >= t.rtMin
      AND s.scan_start_time <= t.rtMax
      AND base_peak_mz >= t.mzMin
      AND base_peak_mz <= t.mzMax
"""

# Time the execute() call on its own.
time0 = time.time()
self.cursor.execute(query, spectrumFeature_InputValues)
print('query took: %s seconds' % (time.time() - time0))

# Time the fetchall() call on its own.
time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()
print('fetchall took: %s seconds' % (time.time() - time0))

Executing the SELECT statement takes about 50 seconds (acceptable). However, fetchall() then takes 788 seconds to fetch only 981 rows.

The approach proposed in an answer to that earlier question, using fetchmany() instead of fetchall(), has not improved the speed of fetching the results.
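
For reference, batched fetching with fetchmany() generally looks like the sketch below. It is only an illustration: the `demo` table, its contents, the helper name `fetch_in_batches` and the batch size are made up, and it uses the standard-library sqlite3 module rather than the real database from this question.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=1000):
    """Yield rows from an already executed cursor, batch_size rows per call."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        for row in batch:
            yield row


# Hypothetical in-memory table, present only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO demo (id, value) VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(10)])

cursor = conn.cursor()
cursor.execute("SELECT id, value FROM demo ORDER BY id")
rows = list(fetch_in_batches(cursor, batch_size=4))
print(len(rows), "rows fetched")
```

Batching only changes how many rows are handed back to Python per call. Each row still has to be produced by SQLite, which would be consistent with fetchmany() making no difference here.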

How can I speed up fetching the results after running an sqlite query?


The SQL exactly as I tried to execute it on the command line:

sqlite> SELECT spectrum_id, feature_table_id
   ...> FROM spectrum AS s 
   ...> INNER JOIN feature AS f 
   ...> ON f.msrun_msrun_id = s.msrun_msrun_id 
   ...> INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax 
   ...> FROM convexhull GROUP BY feature_feature_table_id) AS t 
   ...> ON t.feature_feature_table_id = f.feature_table_id 
   ...> WHERE s.msrun_msrun_id = 1
   ...> AND s.scan_start_time >= t.rtMin
   ...> AND s.scan_start_time <= t.rtMax
   ...> AND base_peak_mz >= t.mzMin
   ...> AND base_peak_mz <= t.mzMax;

Update:

I started running the query on the command line about 45 minutes ago and it is still running, so it is also very slow there.


1 Answer


From reading this question, it sounds like you could benefit from using the APSW sqlite module. Your current sqlite module may somehow be causing the query to be executed in a less performant way.

I was curious so I tried using apsw myself. It wasn't too complicated. Why don't you give it a try?

To install it I had to:

  1. Extract the latest version.
  2. Have the installation package fetch the latest sqlite amalgamation.

    python setup.py fetch --sqlite
    
  3. Build and install.

    sudo python setup.py install
    
  4. Use it in place of the other sqlite module.

    import apsw
    <...>
    conn = apsw.Connection('foo.db')
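
As a rough sketch of step 4, the query from the question could then be run through an APSW cursor as shown below. The schema, column types and sample rows are assumptions made up for illustration (the question does not show them), and an in-memory database stands in for 'foo.db'. The fallback to the standard-library sqlite3 module is there only so the sketch still runs where APSW is not installed.

```python
import time

try:
    import apsw  # third-party module, installed as in the steps above

    def connect(path):
        return apsw.Connection(path)
except ImportError:
    # Fallback so the sketch still runs without APSW; not part of the advice.
    import sqlite3

    def connect(path):
        return sqlite3.connect(path)

# The query from the question, unchanged apart from layout.
QUERY = """
SELECT spectrum_id, feature_table_id
FROM spectrum AS s
INNER JOIN feature AS f
    ON f.msrun_msrun_id = s.msrun_msrun_id
INNER JOIN (SELECT feature_feature_table_id,
                   min(rt) AS rtMin, max(rt) AS rtMax,
                   min(mz) AS mzMin, max(mz) AS mzMax
            FROM convexhull
            GROUP BY feature_feature_table_id) AS t
    ON t.feature_feature_table_id = f.feature_table_id
WHERE s.msrun_msrun_id = ?
  AND s.scan_start_time >= t.rtMin
  AND s.scan_start_time <= t.rtMax
  AND base_peak_mz >= t.mzMin
  AND base_peak_mz <= t.mzMax
"""

conn = connect(":memory:")  # in-memory stand-in for 'foo.db'
cur = conn.cursor()

# Assumed schema and sample rows, for illustration only.
cur.execute("CREATE TABLE spectrum (spectrum_id INTEGER, msrun_msrun_id INTEGER, "
            "scan_start_time REAL, base_peak_mz REAL)")
cur.execute("CREATE TABLE feature (feature_table_id INTEGER, msrun_msrun_id INTEGER)")
cur.execute("CREATE TABLE convexhull (feature_feature_table_id INTEGER, rt REAL, mz REAL)")
cur.executemany("INSERT INTO spectrum VALUES (?, ?, ?, ?)",
                [(1, 1, 10.0, 500.0), (2, 1, 50.0, 700.0), (3, 2, 10.0, 500.0)])
cur.executemany("INSERT INTO feature VALUES (?, ?)", [(100, 1), (200, 1)])
cur.executemany("INSERT INTO convexhull VALUES (?, ?, ?)",
                [(100, 5.0, 450.0), (100, 15.0, 550.0),
                 (200, 40.0, 400.0), (200, 60.0, 450.0)])

# Rows are produced while the cursor is iterated, so running the query
# and fetching its results are timed together here.
time0 = time.time()
pairs = sorted(cur.execute(QUERY, (1,)))
print("query and fetch took: %.3f seconds" % (time.time() - time0))
print(pairs)
```

Both modules accept the same `?` placeholder and a tuple of bindings, so the SQL itself needs no changes when switching.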
    
