Hello @kartik,
If you are able to obtain a DOMDocument object representing your HTML, then you just need to traverse it recursively and construct the data structure that you want.
Converting your HTML document into a DOMDocument should be as simple as this:
function html_to_obj($html) {
$dom = new DOMDocument();
$dom->loadHTML($html);
return element_to_obj($dom->documentElement);
}
Then, a simple traversal of $dom->documentElement which gives the kind of structure you described could look like this:
function element_to_obj($element) {
$obj = array( "tag" => $element->tagName );
foreach ($element->attributes as $attribute) {
$obj[$attribute->name] = $attribute->value;
}
foreach ($element->childNodes as $subElement) {
if ($subElement->nodeType == XML_TEXT_NODE) {
$obj["html"] = $subElement->wholeText;
}
else {
$obj["children"][] = element_to_obj($subElement);
}
}
return $obj;
}
Hope this is helpful!!