Building a pandas DataFrame of features after applying variance reduction

Jack
New Contributor II

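I am building a classification model using the following DataFrame of 120,000 records (a sample of 5 records is shown):

[Image: df]

Using this data, I have built the following model: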

    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_selection import VarianceThreshold

    model = MultinomialNB()
    X_train, X_test, y_train, y_test = train_test_split(
        df2['descrp_clean'], df2['group_name'],
        random_state=0, test_size=0.25, stratify=df2['group_name'])

    # For each record, compute tf-idf
    tfidf = TfidfVectorizer(min_df=3, ngram_range=(1, 3))

    # X_train: (1) tf-idf, (2) reduce dimensionality
    x_train_tfidf = tfidf.fit_transform(X_train)
    VT_reduce = VarianceThreshold(threshold=0.000005)
    x_train_tfidf_reduced = VT_reduce.fit_transform(x_train_tfidf)

    # Fit the naive Bayes model
    clf = model.fit(x_train_tfidf_reduced, y_train)

    # X_test: apply the variance threshold
    x_test_tfidf = tfidf.transform(X_test)
    x_test_tfidf_reduced = VT_reduce.transform(x_test_tfidf)

    # Predict using the model
    y_pred = model.predict(x_test_tfidf_reduced)

    # Compare actual and predicted results
    model.score(x_test_tfidf_reduced, y_test) * 100
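As a side note on the dimensionality-reduction step in the code above, here is a minimal sketch of what VarianceThreshold does on a toy dense matrix. The matrix, the threshold of 0.001, and the names X and selector are made up for illustration and are not taken from the post; the point is only that columns whose variance does not exceed the threshold are dropped.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy feature matrix: 4 samples, 3 features.
X = np.array([
    [0.0, 1.0, 0.50],
    [0.0, 2.0, 0.50],
    [0.0, 3.0, 0.51],
    [0.0, 4.0, 0.50],
])

# Illustrative threshold, much larger than the 0.000005 used in the post.
selector = VarianceThreshold(threshold=0.001)
X_reduced = selector.fit_transform(X)

# Per-column variances computed during fit (population variance).
print(selector.variances_)

# Only columns with variance above the threshold survive.
print(X_reduced.shape)
```

In this toy case the constant first column and the nearly constant third column fall below the threshold, so only the second column should remain.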

I can create a DataFrame showing the word tokens before the variance threshold is applied:

    import pandas as pd

    X_train_tokens = tfidf.get_feature_names()  # get_feature_names_out() on scikit-learn >= 1.0
    x_train_df = pd.DataFrame(X_train_tokens)
    x_train_df.tail(5)

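After variance reduction, the number of features drops to 21,758:

[Image: df3]

Question: How can I create a DataFrame like x_train_df, with the variance reduction applied, that shows me the 21,758 remaining features?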

Accepted Solution

Dan_Z
Honored Contributor

This is more of a scikit-learn question than a Databricks question, but I think VT_reduce.get_support() may be what you are looking for:

https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html…
