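I am trying to get percentile values on different splits, but the results I get from the Databricks PERCENTILE_DISC() function are not accurate. I ran the same query on MS SQL and got a different result set.
Here are the queries and result sets for both PySpark and MS SQL.
(1) Databricks query and result set.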
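##================================
# `spark` is the SparkSession that a Databricks notebook provides.
ringlist = [("ring3", 1418),
            ("ring3", 8014),
            ("ring3", 4270)]
columns = ["ring", "value"]
df1 = spark.createDataFrame(ringlist, columns)
df1.createOrReplaceTempView("combineActionData")
df_Percentile = spark.sql("""
    SELECT ring,
           PERCENTILE_DISC(0.90) WITHIN GROUP (ORDER BY value)
               OVER (PARTITION BY ring) AS Percentile90,
           PERCENTILE_DISC(0.70) WITHIN GROUP (ORDER BY value)
               OVER (PARTITION BY ring) AS Percentile70,
           PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY value)
               OVER (PARTITION BY ring) AS Percentile50,
           PERCENTILE_DISC(0.30) WITHIN GROUP (ORDER BY value)
               OVER (PARTITION BY ring) AS Percentile30
    FROM combineActionData
""").distinct()
# display() returns None, so call it separately from the assignment.
df_Percentile.display()
##================================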
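(2) MS SQL query and result set.
//======================
CREATE TABLE TestRing
(
    ring VARCHAR(50),
    value INT
);
INSERT INTO TestRing
VALUES ('ring3', 1418),
       ('ring3', 8014),
       ('ring3', 4270);
SELECT ring,
       PERCENTILE_DISC(0.90) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY ring) AS Percentile90,
       PERCENTILE_DISC(0.70) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY ring) AS Percentile70,
       PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY ring) AS Percentile50,
       PERCENTILE_DISC(0.30) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY ring) AS Percentile30
FROM TestRing;
//======================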
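For reference, here is a minimal pure-Python sketch of the values I would expect for this data. It assumes the usual SQL definition of PERCENTILE_DISC: the smallest value whose cumulative distribution is at least the requested fraction. The helper name percentile_disc_reference is hypothetical and is not part of Spark or SQL Server.

```python
def percentile_disc_reference(values, fraction):
    """Return the smallest value whose cumulative distribution >= fraction.

    Hypothetical helper for illustration only; it assumes the standard
    discrete-percentile definition and ascending order.
    """
    ordered = sorted(values)
    n = len(ordered)
    for k, value in enumerate(ordered, start=1):
        # The cumulative distribution of the k-th smallest value is k / n.
        if k / n >= fraction:
            return value
    return ordered[-1]


values = [1418, 8014, 4270]
for fraction in (0.90, 0.70, 0.50, 0.30):
    print(fraction, percentile_disc_reference(values, fraction))
```

With three rows the cumulative distributions are 1/3, 2/3 and 1, so under this definition the 0.90 and 0.70 percentiles both fall on 8014, the 0.50 percentile on 4270, and the 0.30 percentile on 1418.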
Note: The output result sets are attached.
Could anyone help me with this if you have any ideas?