How to convert a Python dict to a scipy sparse matrix


My data is read from a database and is already in sparse form: (doc_a, doc_b, count).

For example:

    doc_term_dict={('d1','t1'):12, ('d2','t3'):10, ('d3','t2'):5}
    <type 'dict'>

I am clustering with scikit-learn, and the input format the clustering expects is scipy.sparse.csr.csr_matrix, like this:

    (0, 2164)   0.245793088885
    (0, 2076)   0.205702177467
    (0, 2037)   0.193810934784
    (0, 2005)   0.14547028437
    (0, 1953)   0.153720023365
    ...
    <class 'scipy.sparse.csr.csr_matrix'>

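How do I do this conversion? After a long search, everything I found covers turning an ordinary dict into a scipy sparse matrix. I could not find how to convert a dict that is already in sparse form into a scipy sparse matrix.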

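I figured out the correct conversion, and it is fairly simple.

1. Convert the dict to a COO matrix first, then convert that to a CSR matrix.

    # COO keeps three parallel lists so that A[row[k], col[k]] = data[k]
    # Create the COO matrix
    coo = coo_matrix((data, (row, col)))
    # Let SciPy convert COO to CSR format
    return csr_matrix(coo)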
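To make the COO triple layout concrete, here is a minimal sketch that builds the three lists by hand for the example dict, assuming d1..d3 map to rows 0..2 and t1..t3 map to columns 0..2. The explicit `shape=(3, 3)` is an addition for illustration; without it SciPy infers the shape from the largest indices.

```python
from scipy.sparse import coo_matrix, csr_matrix

# Parallel lists: entry k says A[row[k], col[k]] = data[k]
data = [12, 10, 5]
row = [0, 1, 2]
col = [0, 2, 1]

# Build the COO matrix, then convert it to CSR
coo = coo_matrix((data, (row, col)), shape=(3, 3))
csr = csr_matrix(coo)

# Dense view, only sensible for small matrices
dense = csr.toarray()
print(dense)
# [[12  0  0]
#  [ 0  0 10]
#  [ 0  5  0]]
```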

Code:

    from scipy.sparse import csr_matrix, coo_matrix

    def convert(term_dict):
        ''' Convert a dictionary with elements of form ('d1', 't1'): 12 to a CSR type matrix.
        The element ('d1', 't1'): 12 becomes entry (0, 0) = 12.
        * Conversion from 1-indexed to 0-indexed.
        * d is row
        * t is column.
        '''
        # Create the appropriate format for the COO format.
        data = []
        row = []
        col = []
        for k, v in term_dict.items():
            # Strip the leading letter ('d' or 't') and parse the number
            r = int(k[0][1:])
            c = int(k[1][1:])
            data.append(v)
            row.append(r - 1)
            col.append(c - 1)
        # Create the COO-matrix
        coo = coo_matrix((data, (row, col)))
        # Let Scipy convert COO to CSR format and return
        return csr_matrix(coo)

    if __name__ == '__main__':
        doc_term_dict = {
            ('d1', 't1'): 12,
            ('d2', 't3'): 10,
            ('d3', 't2'): 5,
        }
        print(convert(doc_term_dict))
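The code above relies on keys shaped like 'd1' and 't3', where the number after the first letter is the index. If the labels coming out of the database are arbitrary strings, one possible variation is to assign indices from the labels themselves. The sketch below is an assumption, not part of the original answer: `dict_to_csr` is a hypothetical helper name, and it sorts labels as plain strings (so 'd10' sorts before 'd2') and expects a non-empty dict.

```python
from scipy.sparse import coo_matrix

def dict_to_csr(pair_counts):
    """Hypothetical helper: map (row_label, col_label) -> value to a CSR matrix.

    Returns the matrix plus the sorted label lists, so that row i / column j
    can be traced back to the original labels.
    """
    # Collect the distinct labels and give each one a 0-based index
    row_labels = sorted({r for r, _ in pair_counts})
    col_labels = sorted({c for _, c in pair_counts})
    row_index = {label: i for i, label in enumerate(row_labels)}
    col_index = {label: j for j, label in enumerate(col_labels)}

    # Keys and values of a dict iterate in the same order
    rows = [row_index[r] for r, _ in pair_counts]
    cols = [col_index[c] for _, c in pair_counts]
    data = list(pair_counts.values())

    shape = (len(row_labels), len(col_labels))
    return coo_matrix((data, (rows, cols)), shape=shape).tocsr(), row_labels, col_labels

doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}
matrix, docs, terms = dict_to_csr(doc_term_dict)
print(matrix)
```

Returning the label lists alongside the matrix makes it possible to look up which document a clustered row corresponds to, for example `docs[0]` for row 0.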

