Solved: Re: Finding multiple substrings from a DataFrame c… - Databricks - 13757

Tejas1987 · ‎07-12-2022

Hello friends,

I have a DataFrame whose rows contain query text, and I am trying to extract specific values from that text.

**I/P -**

| ID | text |
|:--|:--|
| 1 | select distinct Col1 OrderID from Table1 where ((Col3 like '%ABC%') or (Col3 like '%DEF%') or (Col3 like '%EFG%')) |
| 2 | select distinct Col1 OrderID from Table1 where (Col2 = 1234) and ((Col3 like '%XYZ%') or (Col3 like '%PQR%')) |
| 3 | select distinct Col1 OrderID from Table1 where ((Col3 like '%MNO%') or (Col3 like '%PQR%')) |
| 4 | select distinct Col1 OrderID, Duration from Table1 as Load where (Col3 like '%PQR%') |
| 5 | select distinct Col1 as OrderID from Table1 where ((Col4 = 'AA') or (Col4 = 'BB') or (Col4 = 'CC') or (Col4 = 'DD')) and (Col3 like '%XYZ%') |
| 6 | select distinct Col1 as OrderID from Table1 where (Col1 = 1234) |

**O/P -**

| ID | text_codes |
|:--|:--|
| 1 | ['ABC', 'DEF', 'EFG'] |
| 2 | ['XYZ', 'PQR'] |
| 3 | ['MNO', 'PQR'] |
| 4 | 'PQR' |
| 5 | 'XYZ' |
| 6 | [] |

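Steps I have tried so far -

I have obtained the indices of the codes by searching for the '(Col3 like' keyword and adding a fixed number of characters, which gives this intermediate output:

**O/P -**

| ID | codes_idx |
|:--|:--|
| 1 | 71;94;117 |
| 2 | 88;111 |
| 3 | 71;94 |
| 4 | 95 |
| 5 | 141 |
| 6 | |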

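Code,

    def toString(x):
        # Join the indices into one ';'-separated string
        l = list(x)
        ll = (str(i) for i in l)
        return ';'.join(ll)

    def getCodes(str1):
        # Find every position where the keyword starts, shifted by 14 characters
        mid = '(Col3 like'
        res = [i + 14 for i in range(len(str1)) if str1.startswith(mid, i)]
        return toString(res)

    result = df2_query1.rdd.map(lambda x: (x[0], getCodes(x[0])))
    result.collect()
    df_result = spark.createDataFrame(result)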

But it fails when I try to apply the substring function:

    res_list = []
    for i in range(len(str1)):
        idxi = str1.startswith(mid, i)
        res_str = substring(str1, idxi + 14, 3)
        res_list.append(res_str)

I get an AttributeError:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 64.0 failed 4 times, most recent failure: Lost task 0.3 in stage 64.0 (TID 118) (10.139.64.5 executor 1): org.apache.spark.api.python.PythonException: 'AttributeError: 'NoneType' object has no attribute '_jvm'', from <command-1021210642233982>, line 9. Full traceback below:

I am not sure what I am doing wrong. Any help is appreciated.

Thanks,

Tejas

AmanSehgal · ‎07-14-2022

What is the logic of the substring function?

Can't you use str1[idxi+14:3] for the substring?
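
A note for later readers: the "'NoneType' object has no attribute '_jvm'" error is what PySpark typically raises when a helper from pyspark.sql.functions is called inside code that runs on the executors, such as a function passed to rdd.map. Those helpers build Column expressions through the driver's SparkContext, which is not available on the workers. This assumes the substring in the question was imported from pyspark.sql.functions, since the import is not shown. Plain Python slicing has no such dependency. Keep in mind that a slice takes a start and an end position, not a start and a length, so the end should be the start plus 3. Below is a minimal sketch of the slicing idea. The function name, the PREFIX and CODE_LEN constants, and the sample query are hypothetical, and the offset is derived from the prefix length instead of a hard-coded 14, so it may differ from the offsets in the original data.

```python
# Hypothetical clause layout: the code follows "(Col3 like '%" directly.
PREFIX = "(Col3 like '%"
CODE_LEN = 3  # assumption: every code is exactly three characters long


def extract_codes_by_slicing(query):
    """Return every fixed-width code that follows PREFIX in the query text."""
    codes = []
    for i in range(len(query)):
        if query.startswith(PREFIX, i):
            start = i + len(PREFIX)
            # Slice end is an absolute position: start + length.
            codes.append(query[start:start + CODE_LEN])
    return codes


sample_query = (
    "select distinct Col1 OrderID from Table1 where "
    "((Col3 like '%ABC%') or (Col3 like '%DEF%') or (Col3 like '%EFG%'))"
)
print(extract_codes_by_slicing(sample_query))  # ['ABC', 'DEF', 'EFG']
```

Because the function uses only built-in string operations, it can be passed to rdd.map or wrapped in a UDF without touching the SparkContext.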

Tejas1987 · ‎07-17-2022

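Hello Aman,

It worked with the expression you suggested. I made a few small changes for the final solution, and it now gives me the output I wanted.

I used

    str1[int(idxi) + 14 : int(idxi) + 17]

as the expression in my function for extracting the specific values, and applied it with a map to get the required output.

Thank you!

Regards,

Tejas Parlikar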
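
For readers whose codes are not always three characters long, a regular expression is one possible alternative to fixed offsets. This is a different technique from the slicing used in the thread, not the poster's solution. The sketch below assumes each clause looks like (Col3 like '%CODE%'); the function name, the pattern, and the sample query are hypothetical.

```python
import re

# Assumed clause layout: (Col3 like '%CODE%'), keyword case ignored.
CODE_PATTERN = re.compile(r"\(Col3 like '%([^%']+)%'\)", re.IGNORECASE)


def extract_codes_by_regex(query):
    """Return all codes captured between the % wildcards of Col3 clauses."""
    return CODE_PATTERN.findall(query)


sample = (
    "select distinct Col1 OrderID from Table1 where (Col2 = 1234) and "
    "((Col3 like '%XYZ%') or (Col3 Like '%PQR%'))"
)
print(extract_codes_by_regex(sample))  # ['XYZ', 'PQR']
```

Like the slicing version, this is plain Python, so it should be usable inside a map over the rows. The trade-off is that the pattern has to match the clause layout exactly, including spacing and quoting.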


Finding multiple substrings from a DataFrame column dynamically?