IndexedRowMatrix

class pyspark.mllib.linalg.distributed.IndexedRowMatrix(rows: pyspark.rdd.RDD[Union[Tuple[int, VectorLike], pyspark.mllib.linalg.distributed.IndexedRow]], numRows: int = 0, numCols: int = 0)

Represents a row-oriented distributed matrix with indexed rows.

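Parameters

rows pyspark.RDD: An RDD of IndexedRows or (int, vector) tuples, or a DataFrame consisting of an int-typed column of indices and a vector-typed column.
numRows int, optional: Number of rows in the matrix. A non-positive value means the count is unknown, in which case the number of rows is determined by the maximum row index plus one.
numCols int, optional: Number of columns in the matrix. A non-positive value means the count is unknown, in which case the number of columns is determined by the size of the first row.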
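
The defaults for numRows and numCols amount to a simple rule. The sketch below illustrates that rule on a local Python list rather than on an RDD; `infer_dims` is a hypothetical helper written for this illustration and is not part of PySpark.

```python
from typing import Sequence, Tuple


def infer_dims(
    rows: Sequence[Tuple[int, Sequence[float]]],
    num_rows: int = 0,
    num_cols: int = 0,
) -> Tuple[int, int]:
    """Mimic the documented defaults on a local list of (index, vector) pairs."""
    if num_rows <= 0:
        # Unknown row count: use the largest row index plus one.
        num_rows = max(index for index, _ in rows) + 1
    if num_cols <= 0:
        # Unknown column count: use the size of the first row.
        num_cols = len(rows[0][1])
    return num_rows, num_cols


rows = [(0, [1, 2, 3]), (6, [4, 5, 6])]
print(infer_dims(rows))        # (7, 3)
print(infer_dims(rows, 7, 6))  # (7, 6)
```

Note that a row index gap (here, indices 1 through 5) still counts toward the row total, which is why two stored rows give a matrix with 7 rows.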

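Methods

`columnSimilarities`()	Compute all cosine similarities between columns.
`computeGramianMatrix`()	Computes the Gramian matrix A^T A.
`computeSVD`(k[, computeU, rCond])	Computes the singular value decomposition of the IndexedRowMatrix.
`multiply`(matrix)	Multiply this matrix by a local dense matrix on the right.
`numCols`()	Get or compute the number of columns.
`numRows`()	Get or compute the number of rows.
`toBlockMatrix`([rowsPerBlock, colsPerBlock])	Convert this matrix to a BlockMatrix.
`toCoordinateMatrix`()	Convert this matrix to a CoordinateMatrix.
`toRowMatrix`()	Convert this matrix to a RowMatrix.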

Attributes

`rows`	Rows of the IndexedRowMatrix stored as an RDD of IndexedRows.

Methods Documentation

columnSimilarities() → pyspark.mllib.linalg.distributed.CoordinateMatrix

Compute all cosine similarities between columns.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(6, [4, 5, 6])])
    >>> mat = IndexedRowMatrix(rows)
    >>> cs = mat.columnSimilarities()
    >>> print(cs.numCols())
    3
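
To make the quantity concrete, here is a minimal local sketch of cosine similarity between columns, using the two stored rows from the example. It is plain Python, not Spark code; `column_cosine_similarities` is a hypothetical name, and the choice to list only pairs with i < j is an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple


def column_cosine_similarities(rows: List[List[float]]) -> Dict[Tuple[int, int], float]:
    """Cosine similarity for every column pair (i, j) with i < j.

    Assumes every column has a non-zero norm.
    """
    n = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    sims = {}
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


# Rows missing from the distributed matrix are all zeros, so they add
# nothing to the dot products or norms and are left out here.
sims = column_cosine_similarities([[1, 2, 3], [4, 5, 6]])
for pair, value in sorted(sims.items()):
    print(pair, round(value, 4))
```

A matrix with 3 columns yields a 3 x 3 similarity matrix, which is consistent with `cs.numCols()` returning 3 in the example.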
            

computeGramianMatrix() → pyspark.mllib.linalg.Matrix

Computes the Gramian matrix A^T A.

Notes

This cannot be computed on matrices with more than 65535 columns.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(1, [4, 5, 6])])
    >>> mat = IndexedRowMatrix(rows)

    >>> mat.computeGramianMatrix()
    DenseMatrix(3, 3, [17.0, 22.0, 27.0, 22.0, 29.0, 36.0, 27.0, 36.0, 45.0], 0)
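
The values in that result can be checked by hand. The following sketch computes A^T A locally for the same two rows and flattens it in column-major order, which is the order the DenseMatrix values are listed in. It is plain Python for illustration; `gramian_column_major` is a hypothetical helper, not a PySpark function.

```python
from typing import List


def gramian_column_major(rows: List[List[float]]) -> List[float]:
    """Return A^T A as a flat list in column-major order."""
    n = len(rows[0])
    values = []
    for j in range(n):      # column of the result
        for i in range(n):  # row of the result
            # Entry (i, j) is the dot product of columns i and j of A.
            values.append(float(sum(row[i] * row[j] for row in rows)))
    return values


print(gramian_column_major([[1, 2, 3], [4, 5, 6]]))
# [17.0, 22.0, 27.0, 22.0, 29.0, 36.0, 27.0, 36.0, 45.0]
```

Because the Gramian is symmetric, row-major and column-major flattening give the same list here.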
            

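computeSVD(k: int, computeU: bool = False, rCond: float = 1e-09) → pyspark.mllib.linalg.distributed.SingularValueDecomposition[pyspark.mllib.linalg.distributed.IndexedRowMatrix, pyspark.mllib.linalg.Matrix]

Computes the singular value decomposition of the IndexedRowMatrix.

The given row matrix A of dimension (m x n) is decomposed into U * s * V^T, where:

U: (m x k) (left singular vectors) is an IndexedRowMatrix whose columns are the eigenvectors of A * A^T.

s: DenseVector consisting of the square roots of the eigenvalues (singular values), in descending order.

V: (n x k) (right singular vectors) is a Matrix whose columns are the eigenvectors of A^T * A.

For more specific implementation details, refer to the Scala documentation.

Parameters

k int: Number of leading singular values to keep (0 < k <= n). Fewer than k values may be returned if there are numerically zero singular values, or if not enough Ritz values converge before the maximum number of Arnoldi update iterations is reached (which can happen when matrix A is ill-conditioned).
computeU bool, optional: Whether or not to compute U. If set to True, U is computed as A * V * s^-1.
rCond float, optional: Reciprocal condition number. All singular values smaller than rCond * s[0] are treated as zero, where s[0] is the largest singular value.

Returns

SingularValueDecomposition

Examples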

    >>> rows = [(0, (3, 1, 1)), (1, (-1, 3, 1))]
    >>> irm = IndexedRowMatrix(sc.parallelize(rows))
    >>> svd_model = irm.computeSVD(2, True)
    >>> svd_model.U.rows.collect()
    [IndexedRow(0, [-0.707106781187, 0.707106781187]), IndexedRow(1, [-0.707106781187, -0.707106781187])]
    >>> svd_model.s
    DenseVector([3.4641, 3.1623])
    >>> svd_model.V
    DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0...], 0)
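
The singular values in this example can be reproduced without Spark. The sketch below relies on the relation stated in the description (singular values are the square roots of the eigenvalues of A * A^T) and on the closed-form eigenvalues of a symmetric 2 x 2 matrix, so it only handles matrices with exactly two rows. It also shows one plausible reading of the rCond cutoff. Both function names are hypothetical, and this is not how Spark computes the decomposition internally.

```python
import math
from typing import List, Sequence, Tuple


def singular_values_two_rows(a: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Singular values of a 2 x n matrix via the eigenvalues of A A^T."""
    # A A^T is the symmetric 2 x 2 matrix [[p, q], [q, r]].
    p = sum(x * x for x in a[0])
    q = sum(x * y for x, y in zip(a[0], a[1]))
    r = sum(y * y for y in a[1])
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix, largest first.
    mean = (p + r) / 2.0
    spread = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    # Clamp at zero to guard against tiny negative rounding errors.
    return (math.sqrt(max(mean + spread, 0.0)), math.sqrt(max(mean - spread, 0.0)))


def keep_significant(s: Sequence[float], r_cond: float = 1e-9) -> List[float]:
    """Keep leading values that are at least r_cond * s[0] (s sorted descending)."""
    if not s:
        return []
    threshold = r_cond * s[0]
    kept = []
    for value in s:
        if value < threshold:
            break  # this value and all later ones are treated as zero
        kept.append(value)
    return kept


s = singular_values_two_rows([(3, 1, 1), (-1, 3, 1)])
print([round(v, 4) for v in s])                    # [3.4641, 3.1623]
print(keep_significant([3.4641, 3.1623, 1e-12]))   # [3.4641, 3.1623]
```

For this matrix, A * A^T is [[11, 1], [1, 11]] with eigenvalues 12 and 10, whose square roots match the DenseVector in the example.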
            

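multiply(matrix: pyspark.mllib.linalg.Matrix) → pyspark.mllib.linalg.distributed.IndexedRowMatrix

Multiply this matrix by a local dense matrix on the right.

Parameters

matrix pyspark.mllib.linalg.Matrix: A local dense matrix whose number of rows must match the number of columns of this matrix.

Returns

IndexedRowMatrix

Examples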

    >>> mat = IndexedRowMatrix(sc.parallelize([(0, (0, 1)), (1, (2, 3))]))
    >>> mat.multiply(DenseMatrix(2, 2, [0, 2, 1, 3])).rows.collect()
    [IndexedRow(0, [2.0, 3.0]), IndexedRow(1, [6.0, 11.0])]
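
The result depends on DenseMatrix values being read in column-major order, so `[0, 2, 1, 3]` denotes the matrix with rows (0, 1) and (2, 3). A small local sketch of the same product, in plain Python with hypothetical helper names, makes this visible:

```python
from typing import List, Sequence, Tuple


def dense_from_column_major(num_rows: int, num_cols: int,
                            values: Sequence[float]) -> List[List[float]]:
    """Unpack a flat column-major list into a list of rows."""
    return [[float(values[j * num_rows + i]) for j in range(num_cols)]
            for i in range(num_rows)]


def right_multiply(rows: Sequence[Tuple[int, Sequence[float]]],
                   matrix: List[List[float]]) -> List[Tuple[int, List[float]]]:
    """Multiply each (index, vector) row by a local matrix on the right."""
    inner = len(matrix)
    if any(len(vec) != inner for _, vec in rows):
        raise ValueError("matrix row count must equal the number of columns")
    num_cols = len(matrix[0])
    return [
        (index, [sum(vec[k] * matrix[k][j] for k in range(inner))
                 for j in range(num_cols)])
        for index, vec in rows
    ]


local = dense_from_column_major(2, 2, [0, 2, 1, 3])
print(local)  # [[0.0, 1.0], [2.0, 3.0]]
print(right_multiply([(0, (0, 1)), (1, (2, 3))], local))
# [(0, [2.0, 3.0]), (1, [6.0, 11.0])]
```

Row indices pass through unchanged; only the vector in each row is transformed.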
            

numCols() → int

Get or compute the number of columns.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(1, [4, 5, 6]),
    ...                        IndexedRow(2, [7, 8, 9]),
    ...                        IndexedRow(3, [10, 11, 12])])

    >>> mat = IndexedRowMatrix(rows)
    >>> print(mat.numCols())
    3

    >>> mat = IndexedRowMatrix(rows, 7, 6)
    >>> print(mat.numCols())
    6
            

numRows() → int

Get or compute the number of rows.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(1, [4, 5, 6]),
    ...                        IndexedRow(2, [7, 8, 9]),
    ...                        IndexedRow(3, [10, 11, 12])])

    >>> mat = IndexedRowMatrix(rows)
    >>> print(mat.numRows())
    4

    >>> mat = IndexedRowMatrix(rows, 7, 6)
    >>> print(mat.numRows())
    7
            

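toBlockMatrix(rowsPerBlock: int = 1024, colsPerBlock: int = 1024) → pyspark.mllib.linalg.distributed.BlockMatrix

Convert this matrix to a BlockMatrix.

Parameters

rowsPerBlock int, optional: Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock int, optional: Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.

Examples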

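    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(6, [4, 5, 6])])
    >>> mat = IndexedRowMatrix(rows).toBlockMatrix()

    >>> # This IndexedRowMatrix will have 7 effective rows, due to
    >>> # the highest row index being 6, and the ensuing
    >>> # BlockMatrix will have 7 rows as well.
    >>> print(mat.numRows())
    7

    >>> print(mat.numCols())
    3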
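
The remark that edge blocks may be smaller than the requested size can be illustrated with simple arithmetic. The sketch below assumes the matrix is tiled on a regular grid starting at row 0 and column 0; `block_grid` and `block_shape` are hypothetical helpers for this illustration and do not describe how Spark stores or skips blocks.

```python
import math
from typing import Tuple


def block_grid(num_rows: int, num_cols: int,
               rows_per_block: int = 1024, cols_per_block: int = 1024) -> Tuple[int, int]:
    """Number of block rows and block columns needed to tile the matrix."""
    return (math.ceil(num_rows / rows_per_block),
            math.ceil(num_cols / cols_per_block))


def block_shape(block_row: int, block_col: int, num_rows: int, num_cols: int,
                rows_per_block: int = 1024, cols_per_block: int = 1024) -> Tuple[int, int]:
    """Shape of the block at grid position (block_row, block_col)."""
    # Blocks on the last grid row or column only cover what is left over.
    height = min(rows_per_block, num_rows - block_row * rows_per_block)
    width = min(cols_per_block, num_cols - block_col * cols_per_block)
    return height, width


print(block_grid(7, 3))               # (1, 1)
print(block_grid(7, 3, 4, 2))         # (2, 2)
print(block_shape(1, 1, 7, 3, 4, 2))  # (3, 1)
```

With the default 1024 x 1024 block size, the 7 x 3 matrix from the example fits in a single block.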
            

toCoordinateMatrix() → pyspark.mllib.linalg.distributed.CoordinateMatrix

Convert this matrix to a CoordinateMatrix.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 0]),
    ...                        IndexedRow(6, [0, 5])])
    >>> mat = IndexedRowMatrix(rows).toCoordinateMatrix()
    >>> mat.entries.take(3)
    [MatrixEntry(0, 0, 1.0), MatrixEntry(0, 1, 0.0), MatrixEntry(6, 0, 0.0)]
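
As the output shows, dense rows are expanded into one entry per element, zeros included. A minimal local sketch of that flattening, in plain Python with the hypothetical helper `to_entries`, reproduces the same triples for dense input:

```python
from typing import List, Sequence, Tuple


def to_entries(rows: Sequence[Tuple[int, Sequence[float]]]) -> List[Tuple[int, int, float]]:
    """Flatten (index, dense vector) rows into (i, j, value) triples."""
    return [(index, j, float(value))
            for index, vec in rows
            for j, value in enumerate(vec)]


entries = to_entries([(0, [1, 0]), (6, [0, 5])])
print(entries[:3])
# [(0, 0, 1.0), (0, 1, 0.0), (6, 0, 0.0)]
```

This sketch keeps the input order; on a real cluster the order returned by `take` depends on how the RDD is partitioned.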
            

toRowMatrix() → pyspark.mllib.linalg.distributed.RowMatrix

Convert this matrix to a RowMatrix.

Examples

    >>> rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                        IndexedRow(6, [4, 5, 6])])
    >>> mat = IndexedRowMatrix(rows).toRowMatrix()
    >>> mat.rows.collect()
    [DenseVector([1.0, 2.0, 3.0]), DenseVector([4.0, 5.0, 6.0])]
            

Attributes Documentation

rows

Rows of the IndexedRowMatrix stored as an RDD of IndexedRows.

Examples

    >>> mat = IndexedRowMatrix(sc.parallelize([IndexedRow(0, [1, 2, 3]),
    ...                                        IndexedRow(1, [4, 5, 6])]))
    >>> rows = mat.rows
    >>> rows.first()
    IndexedRow(0, [1.0, 2.0, 3.0])
            
