Find out the relative contribution of base data sets through global fitting using Matlab?

I have obtained two data curves (y1 and y2) from my experiment; both curves have the same x. Each curve should be described by its own base data set:

 y1 = a*bd1_y1 + b*bd1_y2 + (1-a-b)*bd1_y3
 y2 = a*bd2_y1 + b*bd2_y2 + (1-a-b)*bd2_y3

where

- bd1_y1, bd1_y2, and bd1_y3 are the base data sets for y1,
- bd2_y1, bd2_y2, and bd2_y3 are the base data sets for y2,
- a and b are the relative contributions (0 < a, b < 1) of the base data sets, shared between the two curves.

I have tried the method posted here: https://www.mathworks.com/matlabcentral/answers/1056-nonlinear-fit-to-multiple-data-sets-with-shared-parameters#comment_542407, but after working on it for days I still don't know how to incorporate the base data sets into the fit function (f0), and thus I have had no luck extracting a and b.
How can I fit y1 and y2 to obtain a, b, and their confidence intervals? Your help is greatly appreciated.
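In code form, the model reads as follows (a minimal Python sketch added for illustration, matching the Python answer below; the base curves are assumed to be numpy arrays):

import numpy as np

# The model as stated: each measured curve is a weighted sum of its three
# base curves, with the weights a and b shared between both curves.
def model( a, b, bd_y1, bd_y2, bd_y3 ):
    return a * bd_y1 + b * bd_y2 + ( 1.0 - a - b ) * bd_y3

# y1 should match model( a, b, bd1_y1, bd1_y2, bd1_y3 )
# y2 should match model( a, b, bd2_y1, bd2_y2, bd2_y3 )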

Data:
x = [1     2     3     4     5     6     7     8     9  ]; 
y1 = [0.4304    0.2249    0.1283    0.0794    0.0484    0.0326    0.0203    0.0125    0.0072]; 
y2 = [0.2179    0.1699    0.1410    0.1101    0.0871    0.0679    0.0515    0.0385    0.0296]; 
bd1_y1 = [0.5587    0.2244    0.1023    0.0520    0.0276    0.0155    0.0089    0.0053    0.0033]; 
bd1_y2 = [0.4580    0.2788    0.1198    0.0642    0.0342    0.0197    0.0115    0.0069    0.0043]; 
bd1_y3 = [0.3584    0.3102    0.1540    0.0755    0.0440    0.0248    0.0148    0.0091    0.0056]; 
bd2_y1 = [0.3266    0.1778    0.1255    0.0975    0.0777    0.0612    0.0478    0.0367    0.0281]; 
bd2_y2 = [0.2985    0.2086    0.1268    0.0939    0.0722    0.0580    0.0470    0.0383    0.0313]; 
bd2_y3 = [0.2451    0.2221    0.1434    0.0999    0.0775    0.0609    0.0494    0.0406    0.0335]; 
Question from: https://stackoverflow.com/questions/66052010/find-out-the-relative-contribution-of-base-data-set-through-global-fitting-using


1 Answer


Usually, I try to avoid constraints as much as possible. So the first step should be to check whether the unconstrained solution happens to satisfy the constraints anyway. I'll do the programming in Python, but it is quite generic, so it should be easy to translate to Matlab.

import numpy as np

x = np.arange( 9 ) + 1  # the common x axis (not actually needed for the fit)

y1 = np.array([
    0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072
]) 
y2 = np.array([
    0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296
])
bd1_y1 = np.array([
    0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033
])
bd1_y2 = np.array([
    0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043
])
bd1_y3 = np.array([
    0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056
])
bd2_y1 = np.array([
    0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281
])
bd2_y2 = np.array([
    0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313
])
bd2_y3 = np.array([
    0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335
])

"""
if y = a x1 + b x2 + (1 - a - b) x3, then
y - x3 = a ( x1 - x3 ) + b ( x2 - x3 ),
i.e. a simple transformation gives a homogeneous linear system
"""

y1k = y1 - bd1_y3
y11k = bd1_y1 - bd1_y3
y12k = bd1_y2 - bd1_y3

y2k = y2 - bd2_y3
y21k = bd2_y1 - bd2_y3
y22k = bd2_y2 - bd2_y3

"""
So we just want to solve a simple linear system
y = a x1 + b x2
with y = ( y1, ..., yn ) and x1 = ( x11, x12, ..., x1n ), etc.
Handling the two measurements with shared parameters is done by just
concatenating the corresponding vectors, so yp and yq go into
y_tot = ( yp1, ..., ypn, yq1, ..., yqn ), and the same for the x.
The x vectors form the columns of the matrix A, such that we can write the
whole thing as y = A.(a, b). This is standard linear optimization!
This is done in short in the following (first joining the data sets).
"""

y = np.append( y1k, y2k )
z1 = np.append( y11k, y21k )
z2 = np.append( y12k, y22k )
AT = np.array( [ z1, z2 ] )
A = np.transpose( AT )

U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )

v = np.dot( K, y )
a = v[0]
b = v[1]
print( a, b )
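
# Added cross-check (not part of the original answer): np.linalg.lstsq
# solves the same least-squares problem via SVD and should agree with
# the normal-equation solution (a, b) computed above.
v_check = np.linalg.lstsq( A, y, rcond=None )[0]
print( v_check )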

"""
in detail this should be done more carefully to avoid singular matrices etc.,
e.g. via singular value decomposition

in a next step it is usually assumed that all measurements share the
same base error s, which we calculate
"""

diff = y - a * z1 - b * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )

"""
The covariance matrix is now calculated via error propagation,
namely like s_a = s * sqrt( sum_i ( da / dy_i )^2 ), which gives the classical

s^2 * K K^T
"""

cov = s2 * np.dot( K, np.transpose( K ) )
print( cov )

"""
the error of a (and of b accordingly) is
"""

sa = np.sqrt( cov[0,0] )
sb = np.sqrt( cov[1,1] )

print( sa, sb )
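
Since the question also asks for confidence intervals: continuing the script above, the standard errors can be turned into, e.g., 95% confidence intervals. This is an added sketch, assuming Gaussian residuals and using scipy, which the original script does not require:

from scipy import stats

# 95% confidence intervals from the standard errors above; the degrees of
# freedom are the number of joined data points minus the two parameters
dof = len( y ) - 2
t95 = stats.t.ppf( 0.975, dof )
print( a - t95 * sa, a + t95 * sa )
print( b - t95 * sb, b + t95 * sb )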

This code already joins the two data sets, which could be extended to n sets easily. The final solution looks like this:

[Figure: no constraints. Upper graphs: original data; the target is blue and the three base data sets are yellow, green, and red, respectively. Lower graphs: the blue target and the yellow best fit without constraints.]

The fit returns the solution

>> 1.9712440231598143 -3.163257723318949

So this is not within the boundaries: actually, a > 1 and b < 0, so we know that the constrained solution lies on the boundary of the (a, b)-domain, either on the right edge or on the bottom edge. Least-squares minimization of a truly linear problem gives a chi-square surface that is a true paraboloid, so there is only one minimum and no local minima.

Now, instead of trying to solve the constrained problem with a dedicated algorithm, we can simply force the solution onto the boundary.

"""
We just set one parameter to the corresponding boundary value
and solve as before. Here the matrix formalism is slight
overkill, but for consistency it is a good way of writing it.
"""

"""
case 1: b = 0
"""
AT = np.array( [ z1 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, y )
aopt1 = v[0]

diff = y - aopt1 * z1 - 0 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( aopt1 )
print( cov, np.sqrt( cov[0,0] ) )

"""
case 2: a=1
"""
ymod = y - z1
AT = np.array( [ z2 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, ymod )
bopt1 = v[0]

diff = ymod - bopt1 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( bopt1 )
print( cov, np.sqrt(cov[0,0]) )

Which gives for a

>> 0.3815249997379142
>> [[0.00807797]] 0.08987750010601071

and for b

>> -1.2997646861507886
>> [[0.01736744]] 0.13178558294218914

So on the a = 1 line the optimal value for b again violates the constraints, while on b = 0 there is a valid minimum for a.

Graphically this looks as follows:

[Figure: 1D search. Chi-square surface shown as a density plot; the boundary is drawn in red. The true minimum is visible via the elliptic contour lines; the local minima on a = 1 and b = 0 are marked as blue dots.]

Here one can see the parabolic minimum of the chi-square surface. In case this has to be applied to many different data sets, the decision making should be easy to automate; the calculation is fast, as everything is linear and can be computed in one shot. If the true solution is outside the border and one border has a local minimum, there will be a second one, but it lies on the other side and is therefore not relevant. Finally, there is the case that the minimum on each border fails the constraints; then the solution is the corresponding corner. To some extent, this simple case allows one to perform manually the steps that an algorithm for inequality constraints (equipped for much more complicated situations) would otherwise perform.
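If one prefers to let a solver handle the inequality constraints directly, scipy's bounded linear least squares should reproduce the boundary solution found above. This is an added sketch under the assumption that the box bounds 0 <= a, b <= 1 are the relevant constraints (the constraint a + b <= 1 is inactive here); scipy is not used in the original script:

from scipy.optimize import lsq_linear

# Rebuild the full (18 x 2) design matrix, since A was overwritten in the
# boundary-case calculations above, and solve with box constraints.
A2 = np.transpose( np.array( [ z1, z2 ] ) )
res = lsq_linear( A2, y, bounds=( 0, 1 ) )
print( res.x )  # expected to land close to ( 0.3815, 0 )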

Eventually, the best fit within the constraints looks like this:

[Figure: optimum with constraints. As above, but with the constraints applied.]

